Image recognition with machine learning in Python and TensorFlow

in #steemstem, 9 years ago (edited)

Hello,

This is the seventh episode of my series "Automating games with Python".

You can find the earlier parts on my account:

  • Part 1, How I made my own python bot to automate complex games, explains my motivation and the game I'm automating.
  • Part 2, How to control the mouse and keyboard with python for automation, digs into the core functions needed for automation.
  • Part 3, Image recognition with python, presents a wrapper I made to easily add image searching to your Python program, and shows how to use it.
  • Part 4, Why I had to use machine learning for bypassing the anti-bot security, explains why simple image recognition was not enough and why I needed machine learning.
  • Part 5, How to create your own dataset for machine learning, covers the creation of the dataset and the challenges I faced while doing so.
  • Part 6, A crash course on image recognition with machine learning, covers the theory behind image recognition with ML.

The model

Our data is very easy for a neural network to recognize because the images are always pretty much the same: same angle, same contrast, same size, same colors, and so on. Most of the usual complexity is absent, so we can use a very simple neural net.

source : homemade

I did not draw the 840 input nodes because there would be so many connections that the diagram would look like a triangle. Curious? (Yep... a triangle.)

This is a neural network with no hidden layers. It has 840 inputs, one per pixel, since the images are 35*24 and black and white (hence no color channels). The inputs are linked to an output layer of 9 neurons, one for each of our 9 classes:




source : homemade.

Side note about the labels :
The cross is stored in folder "0" (images/0/*), but the neural network predicts it as class 8. I swapped the two by accident and then built everything on top of that, so don't let it confuse you.

Such a small network trains very fast, since we only have 7,569 parameters to learn (840 * 9 = 7,560 weights plus 9 biases).
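As a quick sanity check on that count, the whole network is a single matrix multiplication plus a bias. The NumPy sketch below is only an illustration: the variable names and the random batch are mine and do not come from the training script. It builds zero-initialised parameters with the shapes described above and counts them.

```python
import numpy as np

n_inputs = 35 * 24   # one input per grayscale pixel -> 840
n_classes = 9        # the 9 classes, one of which is the cross

W = np.zeros((n_inputs, n_classes))  # one weight per (pixel, class) pair
b = np.zeros(n_classes)              # one bias per class

n_params = W.size + b.size           # 840 * 9 + 9

# Forward pass for a batch of 4 flattened images (random stand-in data)
batch = np.random.default_rng(0).random((4, n_inputs))
logits = batch @ W + b               # shape (4, 9): one score per class

print(n_params, logits.shape)
```

Each row of `logits` holds one score per class for one image; the class with the highest score is the prediction.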

The code

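We will use TensorFlow because it saves us from reinventing the wheel and lets us code faster, but a vanilla Python implementation would work fine as well. Note that the code uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session); in TensorFlow 2.x these calls are only available under tf.compat.v1.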
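To give an idea of what a version without TensorFlow could look like, here is a minimal NumPy sketch of the same kind of model: a softmax classifier trained with full-batch gradient descent on the mean cross-entropy. It is not the code used in this series. The data is synthetic (nine noisy random "prototype" images standing in for the real squares), the function names are hypothetical, and the learning rate of 0.1 was picked for this toy data rather than taken from the real training run.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train(X, Y, lr=0.1, steps=200):
    """Full-batch gradient descent on the mean softmax cross-entropy."""
    n_features, n_classes = X.shape[1], Y.shape[1]
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    for _ in range(steps):
        probs = softmax(X @ W + b)
        grad_logits = (probs - Y) / len(X)  # gradient of the mean loss w.r.t. the logits
        W -= lr * (X.T @ grad_logits)
        b -= lr * grad_logits.sum(axis=0)
    return W, b

# Synthetic stand-in data: 9 random prototypes of 840 centred "pixels" plus noise
rng = np.random.default_rng(0)
prototypes = rng.random((9, 840)) - 0.5
labels = rng.integers(0, 9, size=270)
X = prototypes[labels] + 0.05 * rng.standard_normal((270, 840))
Y = np.eye(9)[labels]  # one-hot labels, shape (270, 9)

W, b = train(X, Y)
accuracy = np.mean(np.argmax(X @ W + b, axis=1) == labels)
print(accuracy)
```

The update rule relies on the fact that the gradient of the softmax cross-entropy with respect to the logits is simply the predicted probabilities minus the one-hot labels.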

Note : I introduced the Capchat class in How to create your own dataset for machine learning.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


import cv2
from os import listdir
from os.path import isfile, join
import numpy as np
import tensorflow as tf
import math

def main(_):
  # Import the data
  cap = Capchat()
  # Create the model
  x = tf.placeholder(tf.float32, [None, 840])
  W = tf.Variable(tf.zeros([840, 9]), name="weights")
  b = tf.Variable(tf.zeros([9]), name="biases")
  mult = tf.matmul(x, W) # x . W ...
  y = tf.add(mult, b, name="calc") # ... + b (the logits)

  # Define loss and optimizer
  y_ = tf.placeholder(tf.float32, [None, 9])

  # cost function
  cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
  # optimizer
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
  # allows to save the model later
  saver = tf.train.Saver()

  # start a session to run the network on
  sess = tf.InteractiveSession()

  # initialize global variables
  tf.global_variables_initializer().run()

  # Train for 1000 steps, notice the cap.X_train and cap.Y_train

  for _ in range(1000):
    sess.run(train_step, feed_dict={x: cap.X_train, y_: cap.Y_train})

  # Compare the predicted class (argmax of the logits) with the true class (argmax of the one-hot label)
  correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

  # Test for accuracy on the test set, notice the cap.X_test and cap.Y_test
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  print(sess.run(accuracy, feed_dict={x: cap.X_test,
                                      y_: cap.Y_test}))

  # save the model learned weights and biases
  saver.save(sess, "./model")


if __name__ == '__main__':
  tf.app.run(main=main)

(We achieve around 99.3% accuracy on the test set.)

With that, we have a trained model that can solve our anti-bot problem. Let's move on to production!

Moving to production

We have 15 squares in total: 8 around the cat and 8 around the character, but one of the character's squares is hidden, so we only capture 7 of those.


source : screenshot of the game wakfu

Let's recap what we need to do:

  • Find which numbers are around the cat
  • Find the position of those 3 numbers around the character
  • Click on them

First, we grab all the squares so they can be identified, the same way we did when we collected the training data. Then we put them all into an array X, one flattened image per row.

X = np.zeros((15, 840), np.int32) # global variable

def preprocess(im, index):
    im = np.array(im) # image to array
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY).flatten() # turn to black and white and flatten
    X[index] = im


def get_images():
    pos = imagesearch_small("./cap/character.png", 0.7)
    pos1 = imagesearch_small("./cap/cat.png", 0.7)

    print("capturing the images")

    scene = pyautogui.screenshot()

    # squares around the character
    character1 = scene.crop((pos[0] - 43,pos[1] - 24, pos[0] - 8, pos[1]))
    character2 = scene.crop((pos[0] - 87, pos[1] - 2, pos[0] - 52, pos[1] + 22))
    character3 = scene.crop((pos[0] - 43, pos[1] + 19, pos[0] - 8, pos[1] + 43))
    character4 = scene.crop((pos[0] + 1, pos[1] + 40, pos[0] + 36, pos[1] + 64))
    character5 = scene.crop((pos[0] + 43, pos[1] + 19, pos[0] + 78, pos[1] + 43))
    character6 = scene.crop((pos[0] + 87, pos[1] - 3, pos[0] + 122, pos[1] + 21))
    character7 = scene.crop((pos[0] + 43, pos[1] - 24, pos[0] + 78, pos[1]))

    # squares around the cat
    cat1 = scene.crop((pos1[0] - 43, pos1[1] - 24, pos1[0] - 8, pos1[1]))
    cat2 = scene.crop((pos1[0] - 87, pos1[1] - 2, pos1[0] - 52, pos1[1] + 22))
    cat3 = scene.crop((pos1[0] - 43, pos1[1] + 19, pos1[0] - 8, pos1[1] + 43))
    cat4 = scene.crop((pos1[0] + 1, pos1[1] + 40, pos1[0] + 36, pos1[1] + 64))
    cat5 = scene.crop((pos1[0] + 43, pos1[1] + 19, pos1[0] + 78, pos1[1] + 43))
    cat6 = scene.crop((pos1[0] + 87, pos1[1] - 3, pos1[0] + 122, pos1[1] + 21))
    cat7 = scene.crop((pos1[0] + 43, pos1[1] - 24, pos1[0] + 78, pos1[1]))
    cat8 = scene.crop((pos1[0] + 1, pos1[1] - 46, pos1[0] + 36, pos1[1] - 22))

    # preprocess and store it in the global X variable
    preprocess(character1, 0)
    preprocess(character2, 1)
    preprocess(character3, 2)
    preprocess(character4, 3)
    preprocess(character5, 4)
    preprocess(character6, 5)
    preprocess(character7, 6)

    preprocess(cat1, 7)
    preprocess(cat2, 8)
    preprocess(cat3, 9)
    preprocess(cat4, 10)
    preprocess(cat5, 11)
    preprocess(cat6, 12)
    preprocess(cat7, 13)
    preprocess(cat8, 14)
    return pos

This is rather ugly and could be refactored, but here I wanted to write code that can be understood even by non-coders.
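For readers who do want the refactored version, one possible sketch is to keep the offsets in a table and compute the crop boxes in a loop. The offsets below are read off the crop calls above (every square is 35 x 24 pixels); the names OFFSETS and crop_boxes, and the anchor coordinates in the example, are mine and purely illustrative.

```python
# Top-left corner of each square relative to the anchor found by the image search.
# The first 7 entries match character1..character7 (and cat1..cat7), the 8th is cat8.
OFFSETS = [
    (-43, -24), (-87, -2), (-43, 19), (1, 40),
    (43, 19), (87, -3), (43, -24), (1, -46),
]
WIDTH, HEIGHT = 35, 24

def crop_boxes(anchor, offsets):
    """Return (left, top, right, bottom) boxes, the format PIL's Image.crop expects."""
    x, y = anchor
    return [(x + dx, y + dy, x + dx + WIDTH, y + dy + HEIGHT) for dx, dy in offsets]

# Made-up anchor positions, standing in for the image search results
character_boxes = crop_boxes((500, 300), OFFSETS[:7])  # the 8th square is hidden
cat_boxes = crop_boxes((200, 300), OFFSETS)
print(len(character_boxes), len(cat_boxes))
```

With these boxes, the body of get_images could shrink to a single loop that calls preprocess(scene.crop(box), i) for each index i and box in character_boxes + cat_boxes, which fills the rows of X in the same order as the explicit version.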

Getting the predictions :

This is very easy. We load the model, get the weights and biases, and run the model:

with tf.Session() as session:
    # restore the model
    saver = tf.train.import_meta_graph("./cap/mod.meta")
    saver.restore(session, tf.train.latest_checkpoint("./cap/"))
    graph = tf.get_default_graph()
    # create the x variable to store the inputs
    x = tf.placeholder(tf.float32, [None, 840])
    # get weights and biases 
    W = graph.get_tensor_by_name("weights:0")
    b = graph.get_tensor_by_name("biases:0")
    # the complete network
    # tf.argmax returns the index of the most probable class for each row
    y = tf.argmax(tf.nn.softmax(tf.matmul(x, W) + b), axis=1)

    P = session.run(y, feed_dict={x: X})
    # an array of predicted class indices, one per square
    predictions = P

This outputs an array of predictions of the form

[3, 8, 2, ...], where each entry is the class predicted for one square.
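To make the last step of the network concrete, here is a small NumPy sketch of what softmax followed by argmax does. The logits are made-up numbers for three squares, not real model output.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

# Made-up logits: 3 squares (rows) and 9 classes (columns)
logits = np.array([
    [0.1, 0.2, 0.0, 4.0, 0.3, 0.1, 0.0, 0.2, 1.0],
    [0.0, 0.1, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2, 5.0],
    [0.2, 0.1, 3.5, 0.0, 0.1, 0.2, 0.3, 0.0, 0.4],
])

probabilities = softmax(logits)                   # each row sums to 1
predictions = np.argmax(probabilities, axis=1)    # index of the most probable class
print(predictions)  # [3 8 2]
```

Softmax preserves the ordering of the scores within a row, so taking the argmax of the raw logits gives the same classes; the softmax only matters if you also want the probabilities.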

Now, we grab the 8 cat predictions in one array:

cat_predictions = predictions[7:] # indices 7 to 14, the 8 squares around the cat

And put the rest (the 7 character squares) in another array:

char_predictions = predictions[:7]

We now have the cat_predictions array, which looks like this:

[8,8,8,3,5,8,2,8] (the 8s represent the crosses)

So we simply have to find which indices of the array are not eights:

indices = [i for i, x in enumerate(cat_predictions) if x != 8]

Now we know which numbers we need to click! We just need to find where they are and handle the hidden square case:

positions = []
# char_predictions is a NumPy array, which has no .index method, so convert it to a list first
char_list = list(char_predictions)
for indice in indices:
    try:
        number_to_find = cat_predictions[indice]
        # list.index is like a find function; it raises ValueError when the value is absent
        position_of_the_square = char_list.index(number_to_find)
        positions.append(position_of_the_square)
    except ValueError:
        # if there is an error, the number was not seen, so it must be positioned on the hidden square
        positions.append(7)

(Again, this could be a one-liner; it is expanded here for readability.)
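As a worked example, the matching logic can be wrapped in a single function and run on a plain list. This is a sketch rather than the code used by the bot: the function name, the two constants and the character predictions in the example are made up, while the cat predictions are the ones shown earlier.

```python
CROSS = 8          # class the network predicts for a crossed-out square
HIDDEN_SQUARE = 7  # index used for the character square that cannot be captured

def squares_to_click(predictions):
    """Map the numbers shown around the cat to square indices around the character."""
    char_predictions = [int(p) for p in predictions[:7]]
    cat_predictions = [int(p) for p in predictions[7:]]
    positions = []
    for number in cat_predictions:
        if number == CROSS:
            continue  # nothing to click for a cross
        if number in char_predictions:
            positions.append(char_predictions.index(number))
        else:
            # not visible around the character, so it must be on the hidden square
            positions.append(HIDDEN_SQUARE)
    return positions

# 7 made-up character predictions followed by the 8 cat predictions from the text
example = [1, 2, 3, 4, 6, 7, 0] + [8, 8, 8, 3, 5, 8, 2, 8]
print(squares_to_click(example))  # [2, 7, 1]
```

Here the cat shows 3, 5 and 2. The 3 and the 2 are found at indices 2 and 1 around the character, while the 5 is not visible anywhere and is therefore assigned to the hidden square.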

And that's it! Now we know which squares to click to bypass the anti-bot protection!

Conclusion :

After a post that was only theory, it was about time we got our hands dirty with code, and we did!

99% may not be good enough; on such a simple dataset we want 100%. The easy way to get there is to stack a few more layers, but honestly I've let this run for a long time and I have yet to see it fail. In another post we'll take it to 100% accuracy with a total overkill deep convolutional neural network.

If you want to toy around, I've uploaded the training images and an adapted version of the code that does not require the game to GitHub: https://github.com/drov0/tensorflow-example

And if you want to try to make your own, there are great tutorials from the TensorFlow team:

https://www.tensorflow.org/get_started/mnist/beginners
and
https://www.tensorflow.org/get_started/mnist/pros

Martin
