Building a bot for StarCraft II 1: Introduction

Posted in #steemstem, 6 years ago

(Cover image source: Flickr)

Last semester I did an interesting project: building a bot for StarCraft II. The bot I created specialises in collecting resources, and its goal is to optimise resource collection. For that I am using a reinforcement learning algorithm. Reinforcement learning algorithms are inspired by how humans learn: the bot (the agent) interacts with the environment and receives a reward depending on its performance, and the goal of the algorithm is to maximise the reward it receives. It does this without any prior "knowledge": in the beginning it only executes random actions, and slowly it becomes more "intelligent" and better and better at gaining rewards. It was a very interesting project and I learned a lot while working on it (like TensorFlow), but unfortunately in the end it was only mildly successful, since I don't have enough computation power at hand to run these computations.
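To illustrate the core idea of "start random, then exploit what earned rewards" outside of StarCraft II, here is a toy sketch (not part of the project code) of an epsilon-greedy agent on a multi-armed bandit. All names and numbers are made up for illustration:

```python
import random

def run_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: starts with no knowledge, explores
    randomly, and gradually favours the action whose estimated
    reward is highest."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n   # running estimate of each action's reward
    counts = [0] * n        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                          # explore: random action
        else:
            a = max(range(n), key=lambda i: estimates[i])  # exploit: best so far
        reward = true_rewards[a] + rng.gauss(0, 0.1)       # noisy reward signal
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean
    return max(range(n), key=lambda i: estimates[i])

best = run_bandit([0.2, 0.8, 0.5])
print(best)  # with enough steps the agent identifies action 1 as the best
```

The real problem is much harder (the state changes with every action), but the exploration/exploitation trade-off is the same.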

You can find the code for the project on GitHub; this mini-series is a slightly edited version of my final report. I had to split it up, since the whole report is 24 pages long, which is too much for most to read on Steem. If you want to read it all now, you can also find it on GitHub.

If you are interested in building a StarCraft II bot yourself, I guess the second part (which I'm going to publish tomorrow) will be the most interesting one, since it features an exhaustive description of the observations PySC2 provides, something I didn't find anywhere before and which is therefore (hopefully) useful for you.

Introduction

Recent advances in deep reinforcement learning have successfully mastered the game of Go [SHM+16] and achieved super-human performance in classic Atari video games [MKS+13]. After these accomplishments, artificial intelligence researchers are looking for a new grand challenge. The computer real-time strategy game StarCraft II seems to be a promising domain for research on reinforcement learning algorithms, since it provides a problem set that is more challenging than what was tackled in prior work: for example, it is a multi-agent problem with multiple players interacting, in an environment that gives only imperfect information and has a large action space [VEB+17].

This report describes my work, which deals with a subdomain of the whole problem set: building an agent that is able to harvest resources as efficiently as possible. To solve this task I implemented an agent using the asynchronous advantage actor-critic (A3C) algorithm [MBM+16] with TensorFlow. This report will first introduce the StarCraft II problem domain by giving a detailed introduction to its environment. Then I will describe my agent, its architecture, the A3C algorithm, and how I implemented it. Finally, I will analyse the performance of the agent as well as its behaviour.
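The later parts cover A3C in detail; as a preview, the quantities at its core can be sketched in a few lines. This is an illustrative toy computation, not my TensorFlow implementation: it shows the bootstrapped n-step returns and the two loss terms (actor and critic) that an advantage actor-critic optimises. All numbers below are made up:

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1},
    bootstrapped from the critic's value estimate of the last state."""
    returns, R = [], bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))

def a3c_losses(log_probs, values, returns):
    """Loss terms of an advantage actor-critic: the actor is pushed
    towards actions with positive advantage, the critic towards
    predicting the observed returns."""
    advantages = [R - v for R, v in zip(returns, values)]
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages))
    value_loss = sum(a * a for a in advantages)
    return policy_loss, value_loss

# Toy rollout of three steps with gamma = 0.9:
rs = n_step_returns([1.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9)
print(rs)  # [2.1745, 1.305, 1.45]
pl, vl = a3c_losses([-0.1, -0.2, -0.3], [2.0, 1.0, 1.5], rs)
```

In the actual agent these losses are computed over TensorFlow graph tensors and minimised by several asynchronous workers, which is what the "asynchronous" in A3C refers to.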

Bibliography

  • MBM+16 Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu.
    Asynchronous methods for deep reinforcement learning.
    In International Conference on Machine Learning, pages 1928-1937, 2016.

  • MKS+13 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller.
    Playing Atari with deep reinforcement learning.
    arXiv preprint arXiv:1312.5602, 2013.

  • SHM+16 David Silver, Aja Huang, Chris Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis.
    Mastering the game of Go with deep neural networks and tree search.
    Nature, 529:484-489, January 2016.

  • VEB+17 Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al.
    StarCraft II: A new challenge for reinforcement learning.
    arXiv preprint arXiv:1708.04782, 2017.
