Image recognition with machine learning in Python and TensorFlow
Hello,
This is the seventh episode of my series "Automating games with Python".
On my account you can find part 1, How I made my own python bot to automate complex games, which explains my motivation and the game I'm automating. Part 2, How to control the mouse and keyboard with python for automation, digs into the core functions needed for automation.
Part 3, Image recognition with python, presents a wrapper I made to easily add image searching to a Python program and shows how to use it.
Part 4, Why I had to use machine learning for bypassing the anti-bot security, explains why simple image recognition was not enough and why I needed machine learning. Part 5, How to create your own dataset for machine learning, goes into the creation of the dataset and the challenges I faced while doing so.
Part 6, A crash course on image recognition with machine learning, covers the theory behind image recognition with ML.
The model
Our data is very easy for a neural network to recognize because the images are always pretty much the same: same angle, same contrast, same size, same colors and so on. Most of the usual complexity is absent, so we can use a very simple neural net.
source : homemade
I did not draw the 840 input nodes because there would be so many connections that the diagram would look like a solid triangle (yep... a triangle).
This is a neural network with no hidden layers. It has 840 inputs, one per pixel, since the images are 35*24 and black and white (hence no color channels). These are linked to an output layer of 9 neurons, one for each of our 9 classes:
source : homemade.
Side note about the labels :
The cross is labelled "0" in the folders (images/0/*), but 8 when it is predicted by the neural network. I swapped them by accident and then built everything on top of that, so don't get confused.
Such a small network trains very fast, since we only have 7,569 parameters to learn (840 * 9 = 7,560 weights plus 9 biases).
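As a quick sanity check of that count, here is a minimal NumPy sketch of the same single-layer model. The shapes (840 inputs, 9 classes) come from the post; the random input batch is only a placeholder, and the name `forward` is mine, not part of the original code.

```python
import numpy as np

N_INPUTS = 35 * 24   # 840 grayscale pixels per square
N_CLASSES = 9        # 9 output neurons

# Same shapes as the model described above, initialised to zero.
W = np.zeros((N_INPUTS, N_CLASSES), dtype=np.float32)
b = np.zeros(N_CLASSES, dtype=np.float32)

n_parameters = W.size + b.size   # 840 * 9 weights + 9 biases


def forward(x, W, b):
    """Return class probabilities: softmax(x @ W + b), one row per image."""
    logits = x @ W + b
    # Subtract the row maximum before exponentiating for numerical stability.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


# Placeholder batch of 4 fake images (random pixel values, not real data).
fake_batch = np.random.default_rng(0).random((4, N_INPUTS), dtype=np.float32)
probabilities = forward(fake_batch, W, b)

print(n_parameters)          # 7569
print(probabilities.shape)   # (4, 9)
```

With all weights at zero every class gets the same probability, which is exactly the starting point of the training described next.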
The code
We will use TensorFlow because it saves us from reinventing the wheel and lets us code faster, but a vanilla Python implementation would work fine as well. The code below uses the TensorFlow 1.x API (tf.placeholder, tf.Session).
Note : I introduced the Capchat class in How to create your own dataset for machine learning.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
from os import listdir
from os.path import isfile, join
import numpy as np
import tensorflow as tf
import math
def main(_):
    # Import the data
    cap = Capchat()

    # Create the model
    x = tf.placeholder(tf.float32, [None, 840])
    W = tf.Variable(tf.zeros([840, 9]), name="weights")
    b = tf.Variable(tf.zeros([9]), name="biases")
    mult = tf.matmul(x, W)  # x * W ...
    y = tf.add(mult, b, name="calc")  # ... + b

    # Define loss and optimizer
    y_ = tf.placeholder(tf.float32, [None, 9])
    # cost function
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
    # optimizer
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    # allows us to save the model later
    saver = tf.train.Saver()
    # start a session to run the network on
    sess = tf.InteractiveSession()
    # initialize global variables
    tf.global_variables_initializer().run()

    # Train for 1000 steps, notice the cap.X_train and cap.Y_train
    for _ in range(1000):
        sess.run(train_step, feed_dict={x: cap.X_train, y_: cap.Y_train})

    # Compare the predicted class (argmax of the output) with the true class (argmax of the one-hot label)
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    # Test for accuracy on the test set, notice the cap.X_test and cap.Y_test
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: cap.X_test,
                                        y_: cap.Y_test}))

    # save the model's learned weights and biases
    saver.save(sess, "./model")


if __name__ == '__main__':
    tf.app.run(main=main)
(We achieve around 99.3% accuracy on the test set.)
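Since a vanilla Python implementation would work as well, here is a rough NumPy sketch of the same training procedure: softmax regression trained with full-batch gradient descent on the cross-entropy, using the learning rate (0.5) and step count (1000) from the TensorFlow code. The data is an assumption: random synthetic "prototype" vectors stand in for cap.X_train and cap.Y_train, and I scaled them down myself so that plain gradient descent stays well behaved at that learning rate. The accuracy it prints says nothing about the real captcha dataset.

```python
import numpy as np

N_INPUTS, N_CLASSES = 840, 9
rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset: one random prototype vector per
# class plus a little noise. This is NOT the captcha data from the post.
prototypes = 0.1 * rng.standard_normal((N_CLASSES, N_INPUTS))


def make_split(n_per_class):
    """Build (inputs, one-hot labels) with n_per_class noisy samples per class."""
    labels = np.repeat(np.arange(N_CLASSES), n_per_class)
    inputs = prototypes[labels] + 0.02 * rng.standard_normal((labels.size, N_INPUTS))
    return inputs, np.eye(N_CLASSES)[labels]


def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, one_hot):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    return -np.mean(np.sum(one_hot * np.log(probs + 1e-12), axis=1))


X_train, Y_train = make_split(20)
X_test, Y_test = make_split(5)

W = np.zeros((N_INPUTS, N_CLASSES))
b = np.zeros(N_CLASSES)
learning_rate = 0.5

initial_loss = cross_entropy(softmax(X_train @ W + b), Y_train)
for _ in range(1000):
    probs = softmax(X_train @ W + b)
    # Gradient of the mean cross-entropy with respect to the logits.
    grad_logits = (probs - Y_train) / len(X_train)
    W -= learning_rate * (X_train.T @ grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0)
final_loss = cross_entropy(softmax(X_train @ W + b), Y_train)

# Same accuracy computation as in the TensorFlow code: compare argmaxes.
predicted = np.argmax(X_test @ W + b, axis=1)
accuracy = np.mean(predicted == np.argmax(Y_test, axis=1))
print(initial_loss, final_loss, accuracy)
```

The update rule is what GradientDescentOptimizer applies to this model: the gradient of softmax cross-entropy with respect to the logits is simply the predicted probabilities minus the one-hot labels.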
With that we have a trained model that can solve our anti-bot problem. Let's move on to production!
Moving to production
We have 15 squares in total: 8 around the cat and 8 around the character, of which one is hidden, so we only capture 7.
source : screenshot of the game wakfu
Let's recap what we need:
- Find which numbers are around the cat
- Find the position of those 3 numbers around the character
- Click on them
First, we grab all the squares so they can be identified, the same way we did when we captured the training data. Then we put them all into an array X.
import cv2
import numpy as np
import pyautogui
# imagesearch_small comes from the image search wrapper presented in part 3

X = np.zeros((15, 840), np.int32)  # global variable, one row per square

def preprocess(im, index):
    im = np.array(im)  # image to array
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY).flatten()  # turn to black and white and flatten
    X[index] = im

def get_images():
    pos = imagesearch_small("./cap/character.png", 0.7)
    pos1 = imagesearch_small("./cap/cat.png", 0.7)
    print("capturing the images")
    scene = pyautogui.screenshot()
    # squares around the character (each crop is 35 pixels wide and 24 high)
    character1 = scene.crop((pos[0] - 43, pos[1] - 24, pos[0] - 8, pos[1]))
    character2 = scene.crop((pos[0] - 87, pos[1] - 2, pos[0] - 52, pos[1] + 22))
    character3 = scene.crop((pos[0] - 43, pos[1] + 19, pos[0] - 8, pos[1] + 43))
    character4 = scene.crop((pos[0] + 1, pos[1] + 40, pos[0] + 36, pos[1] + 64))
    character5 = scene.crop((pos[0] + 43, pos[1] + 19, pos[0] + 78, pos[1] + 43))
    character6 = scene.crop((pos[0] + 87, pos[1] - 3, pos[0] + 122, pos[1] + 21))
    character7 = scene.crop((pos[0] + 43, pos[1] - 24, pos[0] + 78, pos[1]))
    # squares around the cat
    cat1 = scene.crop((pos1[0] - 43, pos1[1] - 24, pos1[0] - 8, pos1[1]))
    cat2 = scene.crop((pos1[0] - 87, pos1[1] - 2, pos1[0] - 52, pos1[1] + 22))
    cat3 = scene.crop((pos1[0] - 43, pos1[1] + 19, pos1[0] - 8, pos1[1] + 43))
    cat4 = scene.crop((pos1[0] + 1, pos1[1] + 40, pos1[0] + 36, pos1[1] + 64))
    cat5 = scene.crop((pos1[0] + 43, pos1[1] + 19, pos1[0] + 78, pos1[1] + 43))
    cat6 = scene.crop((pos1[0] + 87, pos1[1] - 3, pos1[0] + 122, pos1[1] + 21))
    cat7 = scene.crop((pos1[0] + 43, pos1[1] - 24, pos1[0] + 78, pos1[1]))
    cat8 = scene.crop((pos1[0] + 1, pos1[1] - 46, pos1[0] + 36, pos1[1] - 22))
    # preprocess and store it in the global X variable
    preprocess(character1, 0)
    preprocess(character2, 1)
    preprocess(character3, 2)
    preprocess(character4, 3)
    preprocess(character5, 4)
    preprocess(character6, 5)
    preprocess(character7, 6)
    preprocess(cat1, 7)
    preprocess(cat2, 8)
    preprocess(cat3, 9)
    preprocess(cat4, 10)
    preprocess(cat5, 11)
    preprocess(cat6, 12)
    preprocess(cat7, 13)
    preprocess(cat8, 14)
    return pos
This is rather ugly and could be refactored, but here I wanted to write code that can be understood even by non-coders.
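For readers who are comfortable with loops, the same crops could be driven by a table of offsets. This is only a sketch: the offsets are copied from the crop() calls above, while `SQUARE_OFFSETS`, `crop_boxes` and the example positions are names and values I made up for illustration.

```python
# (left, top) offset of each 35x24 square relative to the anchor position,
# copied from the crop() calls above. The 8th entry is the top square, which
# is only captured around the cat.
SQUARE_WIDTH, SQUARE_HEIGHT = 35, 24
SQUARE_OFFSETS = [
    (-43, -24), (-87, -2), (-43, 19), (1, 40),
    (43, 19), (87, -3), (43, -24), (1, -46),
]


def crop_boxes(pos, count):
    """Return `count` (left, top, right, bottom) boxes around `pos`."""
    boxes = []
    for dx, dy in SQUARE_OFFSETS[:count]:
        left, top = pos[0] + dx, pos[1] + dy
        boxes.append((left, top, left + SQUARE_WIDTH, top + SQUARE_HEIGHT))
    return boxes


# Example with made-up anchor positions instead of real image-search results.
character_boxes = crop_boxes((500, 300), 7)   # the 7 visible squares
cat_boxes = crop_boxes((900, 300), 8)         # all 8 squares
print(character_boxes[0])   # (457, 276, 492, 300)
```

Each box can then be handed to scene.crop(box) and preprocess(), in the same order as above (character squares first, then cat squares), so the rows of X keep the same meaning.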
Getting the predictions :
This is very easy: we load the model, get the weights and biases, and run the model:
with tf.Session() as session:
    # restore the model
    saver = tf.train.import_meta_graph("./cap/mod.meta")
    saver.restore(session, tf.train.latest_checkpoint("./cap/"))
    graph = tf.get_default_graph()
    # create the x placeholder to feed the inputs
    x = tf.placeholder(tf.float32, [None, 840])
    # get weights and biases
    W = graph.get_tensor_by_name("weights:0")
    b = graph.get_tensor_by_name("biases:0")
    # the complete network
    # tf.argmax converts the softmax output to class numbers
    y = tf.argmax(tf.nn.softmax(tf.matmul(x, W) + b), axis=1)
    P = session.run(y, feed_dict={x: X})
    # predicted class numbers, as a plain list so that .index() works later
    predictions = P.tolist()
This will output our predictions in the form
[3, 8, 2, ...], where each number is the class predicted for the corresponding square.
Now, we grab the 8 cat predictions in one array:
cat_predictions = predictions[7:]  # indices 7 to 14: the 8 squares around the cat
And put the rest in another array:
char_predictions = predictions[:7]  # indices 0 to 6: the 7 visible squares around the character
We now have the cat_predictions array, which looks like this:
[8, 8, 8, 3, 5, 8, 2, 8] (the 8s represent the crosses)
So we simply have to find which indices of the array are not eights:
indices = [i for i, value in enumerate(cat_predictions) if value != 8]
Now we know which numbers we need to click! We just need to find where they are and handle the hidden-square case:
positions = []
for indice in indices:
    try:
        number_to_find = cat_predictions[indice]
        # char_predictions.index is like a find function
        position_of_the_square = char_predictions.index(number_to_find)
        positions.append(position_of_the_square)
    except ValueError:
        # a ValueError means the number was not seen, so it must be on the hidden square
        positions.append(7)
(Again, this could be a one-liner, but I expanded it for readability.)
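For reference, a compact version of the same matching logic might look like the sketch below. The function name `squares_to_click`, the two constants and the character predictions in the example are my own illustration; only the cat example comes from the text above.

```python
CROSS = 8          # class predicted for a cross (see the side note on labels)
HIDDEN_SQUARE = 7  # index used for the hidden square around the character


def squares_to_click(cat_predictions, char_predictions):
    """Map every number shown around the cat to its square around the character."""
    char_predictions = list(char_predictions)  # accept lists or NumPy arrays
    return [
        char_predictions.index(n) if n in char_predictions else HIDDEN_SQUARE
        for n in cat_predictions
        if n != CROSS
    ]


cat = [8, 8, 8, 3, 5, 8, 2, 8]       # example from the text
char = [1, 5, 4, 7, 6, 3, 0]         # made-up predictions; 2 is on the hidden square
print(squares_to_click(cat, char))   # [5, 1, 7]
```

Here 3 is found on square 5, 5 on square 1, and 2 is not visible anywhere, so it falls back to the hidden square.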
And that's it! Now we know which squares we need to click to bypass the anti-bot protection!
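The clicking itself is not shown in this post, so here is one hedged way the square indices could be turned into screen coordinates. It assumes each square is clicked at its centre and that the hidden square (index 7) sits at the same offset as the cat's 8th square; both are my assumptions, as are the names `square_center` and `click_squares`. The click function is passed in as a parameter so the sketch can be dry-run without the game.

```python
SQUARE_WIDTH, SQUARE_HEIGHT = 35, 24
# (left, top) offsets of the squares relative to the character position,
# taken from get_images(). Index 7 is the hidden square; its offset is an
# assumption borrowed from the cat's 8th square.
CHARACTER_OFFSETS = [
    (-43, -24), (-87, -2), (-43, 19), (1, 40),
    (43, 19), (87, -3), (43, -24), (1, -46),
]


def square_center(pos, index):
    """Screen coordinates of the centre of square `index` around `pos`."""
    dx, dy = CHARACTER_OFFSETS[index]
    return (pos[0] + dx + SQUARE_WIDTH // 2, pos[1] + dy + SQUARE_HEIGHT // 2)


def click_squares(pos, positions, click):
    """Call `click(x, y)` once for every square index in `positions`."""
    for index in positions:
        x, y = square_center(pos, index)
        click(x, y)


# Dry run: record the clicks instead of sending them to the screen.
recorded = []
click_squares((500, 300), [5, 1, 7], lambda x, y: recorded.append((x, y)))
print(recorded)   # [(604, 309), (430, 310), (518, 266)]
```

In the real bot, `pos` would be the character position returned by get_images() and `click` could be pyautogui.click, which accepts x and y as its first two arguments.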
Conclusion :
After a post that was only theory, it was about time we got our hands dirty with code, and we did!
99% may not be good enough: on such a simple dataset we want 100%. The easy way to get there is to stack a few more layers, but honestly I've let this run for a long time and I have yet to see it fail. In another post we'll take it to 100% accuracy with a totally overkill deep convolutional neural network.
If you want to toy around, I've uploaded the training images and an adapted version of the code that does not require the game to GitHub: https://github.com/drov0/tensorflow-example
And if you want to try to make your own there are great tutorials from the tensorflow team :
https://www.tensorflow.org/get_started/mnist/beginners
and
https://www.tensorflow.org/get_started/mnist/pros
Martin