Lego EV3 Robot Obstacle Avoidance with Q-Learning

in #steemstem · 6 years ago (edited)

In an earlier post I described my daughter's 6th-grade science fair project to examine the exploration/exploitation trade-off in reinforcement learning. That post covers more details of how the robot learns and the software and hardware we are using to control it. This is just a quick video of one of the trials, showing the robot learning with a small amount of exploration that decays over time. In this case, the probability of taking a random move at each step (epsilon) is 10/(10+t), where t is the step number. If the robot does not take a random move, it uses its Q table (see the previous post) to take what it has learned is the best move for the state it is in. Each trial consists of 2000 steps. At the beginning of the video, t is such that there is around an 80% chance of taking a random move. After 1000 steps (the middle of the video), the chance of a random move has dropped below 1%.
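For anyone curious about the mechanics, here is a minimal sketch of that epsilon-greedy selection rule. The names here (choose_action, a q_table stored as a dict of per-action value lists) are illustrative, not taken from our actual code:

```python
import random

def epsilon(t):
    """Exploration probability at step t: 10/(10+t), decaying from 1.0."""
    return 10 / (10 + t)

def choose_action(q_table, state, t, n_actions):
    """Epsilon-greedy selection: take a random action with probability
    epsilon(t), otherwise take the highest-valued action in the Q table."""
    if random.random() < epsilon(t):
        return random.randrange(n_actions)
    values = q_table.setdefault(state, [0.0] * n_actions)
    return max(range(n_actions), key=lambda a: values[a])
```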

The robot's goal is to learn to maximize its total future discounted reward. At each step of the trial, the robot selects an action and then receives a reward: +1.0 for moving forward, +0.5 for turning right or left, and -1.0 for any other action (rotating or moving backward). Hitting a wall costs it -5.0.
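Concretely, the reward scheme and the standard tabular Q-learning update look roughly like this. The reward values match those above, but the learning rate and discount factor are placeholders, not necessarily what we used:

```python
ALPHA = 0.1  # learning rate (placeholder value)
GAMMA = 0.9  # discount factor (placeholder value)

REWARDS = {
    "forward": 1.0,
    "turn_left": 0.5,
    "turn_right": 0.5,
    "rotate": -1.0,
    "backward": -1.0,
}
WALL_PENALTY = -5.0  # applied when the front bumper's touch sensor fires

def update_q(q_table, state, action, reward, next_state, n_actions):
    """Standard tabular Q-learning: nudge Q(state, action) toward the
    one-step discounted lookahead target."""
    q = q_table.setdefault(state, [0.0] * n_actions)
    next_q = q_table.setdefault(next_state, [0.0] * n_actions)
    target = reward + GAMMA * max(next_q)
    q[action] += ALPHA * (target - q[action])
```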

To sense its environment, the robot has only one touch sensor tied to the front bumper and one ultrasonic sensor. The ultrasonic sensor can read distances from 0 to 255 cm, but to keep things simple for my daughter, the distance is discretized to an integer from 0 to 4. After a few trials and some follow-up tests with the ultrasonic sensor, we realized that the sensor cannot detect the wall when it approaches at a very acute angle; in that case it reports a distance of 200+ cm. As a result, the robot tends to learn a conservative strategy of simply going in circles when it is far from walls. Under perfect sensor conditions this would not be optimal: circling earns only 0.5 per step, or 100 over 200 steps, versus something closer to 200 under an optimal go-forward-then-turn-before-the-wall strategy. However, the ultrasonic sensor's blind spot, possibly in combination with the various parameters we have set, is making it difficult to find and maintain that strategy.
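One simple way to do that discretization is to split the raw reading into five equal bins; treat this as a sketch, since our exact cut points may have differed:

```python
def discretize_distance(cm):
    """Map a raw ultrasonic reading (0-255 cm) to a coarse state 0-4
    using five roughly equal ~51 cm bins."""
    cm = max(0, min(255, cm))
    return min(4, int(cm // 51))
```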

Another problem is that, between the sensor issue and the coarse distance discretization, the robot's world is only partially observable. The robot would benefit from a memory of recent states, but that would greatly enlarge the Q table, as the back-of-the-envelope sketch below shows.
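Assuming 5 distance bins, 2 bumper states, and (hypothetically) 5 actions, the table grows exponentially in the memory length:

```python
# Illustrative only: 5 distance bins x 2 bumper values = 10 observations.
# With a memory of the last k observations appended to the current one,
# the state space grows as 10**(k+1).
N_OBSERVATIONS = 5 * 2
N_ACTIONS = 5  # assumed count; the real action set may differ

for k in range(4):
    states = N_OBSERVATIONS ** (k + 1)
    print(f"memory of {k} past observations: "
          f"{states} states, {states * N_ACTIONS} Q-table entries")
```

Even one step of memory multiplies the table tenfold, and every extra entry is a state the robot must visit repeatedly before its value estimates become useful.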

The robot could also benefit from moving to a learning approach that uses continuous states and actions, but that requires a shift to much more complex algorithms and underlying technology, such as DDPG (Deep Deterministic Policy Gradient). That's not going to be something my 6th grader can explain to others.

Very rough code is available here: https://github.com/tjohnson250/ev3rl

To support this post please upvote, follow @toddrjohnson, and consider using one of my referral links below:

Presearch: Earn Cryptocurrency By Searching the Internet
Honeyminer: Start mining bitcoin in 1 minute

Proud member of:

[SB-Marvel-Family.gif](https://www.steemit.com/@steemitbloggers)



Great to see it working well and learning so much better. I like how smoothly it turns in the corner.

Wow, this is pretty awesome! It moves a bit like our robot vacuum!

@toddrjohnson,

This is a GREAT Science Fair Experiment.

Fascinating to see how machine learning is approached.

Boy, it really makes you appreciate your brain, doesn't it? And, just like your brain, it's all about patterns.

Great post.

Quill

