Ten days ago I released the first version of my Steem Monsters bot into the wild. I've been tweaking and experimenting a lot since then. The current version pushed into Champion III and stayed at the edge of it pretty consistently.
The highest rating it reached was 3856. At the end of the season it stood at 3711, having recovered from a dip to 3630. Overall, the bot has hovered between 3600 and 3800 since the last change.
The first version of the bot simply tried to beat the enemy's last five teams in the current ruleset. Because the current API only makes the last 50 battles easy to fetch, there are rarely enough suitable matches, so I had to group similar rulesets. That helped with some of them, but rulesets like "Keep Your Distance" are too unique to be grouped with anything else.
This approach, let's call it the "naive bot", was sufficient to reach Gold III and sometimes Gold II. Then I put some money on the table and upgraded some of my splinters to (almost) max level, including the Fire, Earth, Death, Dragon and Normal Splinters and some strong units from the other Splinters (e.g. Angel of Life). The new team, along with a Breadth-First Tree Initialization of depth 3, allowed the bot to rise to Gold II and stay there consistently. But for some reason, it couldn't progress beyond that point. Win streaks were followed by long losing streaks, which prevented any progress. The "naive initialization bot" needed an improvement.
One Step Up
As I explained in my last post, the core of my bot is a Monte Carlo Tree Search (MCTS). This algorithm can search large state spaces and find local or global optima within them without looking at every single combination. In the naive initialization bot, the algorithm was only used for my team. In the next version, I applied the MCTS to both my team and the enemy team. The MCTS tries to beat my last 5 teams with the enemy's monster collection and does the same for the last 5 enemy teams with my collection. This produces two teams, one from the enemy collection and one from my collection. Afterwards, I set the algorithm to find a team which can defeat both. The Versus Bot was born. This version left the Gold League behind and set its aim at the Diamond League. It broke through the 2400 rating within minutes, quickly followed by 2500, 2600 and 2800. But that's where the Versus Bot got stuck, barely getting its feet wet in the Diamond League. It never advanced further than 2900 and sometimes fell all the way back to Gold I. Although it quickly recovered back into Diamond III, a new approach was needed to advance further.
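To make the three steps of the Versus Bot concrete, here is a minimal sketch. It is not the bot's actual code: an exhaustive search over tiny collections stands in for the MCTS, cards are plain numbers, and `best_counter`, `versus_pick` and `toy_beats` are hypothetical names made up for the illustration.

```python
from itertools import combinations


def best_counter(collection, targets, team_size, beats):
    """Return the team from `collection` that beats the most `targets`.

    Exhaustive search is only a stand-in for the MCTS here; it works
    because the toy collections are tiny.
    """
    return max(
        combinations(collection, team_size),
        key=lambda team: sum(beats(team, target) for target in targets),
    )


def versus_pick(my_cards, enemy_cards, my_last, enemy_last, team_size, beats):
    # Step 1: what would the enemy field to beat my recent teams?
    enemy_counter = best_counter(enemy_cards, my_last, team_size, beats)
    # Step 2: what would I field to beat the enemy's recent teams?
    my_counter = best_counter(my_cards, enemy_last, team_size, beats)
    # Step 3: search my collection for a team that defeats both.
    return best_counter(my_cards, [enemy_counter, my_counter], team_size, beats)


def toy_beats(team, other):
    # Assumption: a made-up battle rule, the higher card sum wins.
    return sum(team) > sum(other)


pick = versus_pick(
    my_cards=[1, 2, 3, 4],
    enemy_cards=[2, 3, 5],
    my_last=[(1, 2)],
    enemy_last=[(2, 3)],
    team_size=2,
    beats=toy_beats,
)
print(pick)
```

In the real bot each `best_counter` call would be a full tree search with a proper battle simulation, so the structure is the point here, not the toy rule.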
Learn from your mistakes
The salvation for the Versus Bot was a weak "learning" approach. I wouldn't really call it learning, but it is at least a very naive form of it. The bot stores both its own past team selections and the enemy teams, not per player but in general. They get ranked per ruleset and splinter by their win/loss count. At the start of each Monte Carlo Tree Search, the bot does the usual Breadth-First Tree Initialization and afterwards does a couple of "Depth-First" initializations with the last 5 winning teams of each splinter for the ruleset. In total, that gives 90 leads for promising teams.
This new approach did well at first but seemed to taper off the longer the bot ran. The mistake was pretty obvious: I had introduced a limit of 50 teams, but new teams were added to the end of the list. This caused the bot to only remember teams with multiple wins (which rarely happens) and old teams which were added at the beginning. One quick rewrite later, new teams were added at the start of those lists. After sorting, the strong teams still stay in front, but new teams end up ahead of older teams with the same win/loss count, which makes them "fresher" in memory. Old teams are pushed off the list and forgotten. With this change, the bot played much stronger and generated weak teams much more rarely.
With this approach, the bot left Diamond III and II behind and fought its way through Diamond I, reaching the Champion League. That's where it got into trouble again. It pushed up to a rating of 3856 but stumbled back to the edge of the Diamond League and stayed there, walking in and out of Champion III. That was just a couple of hours ago. Now I need to think up new improvements.
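A rough sketch of how such a memory could be organized, keyed by ruleset and splinter. The class and method names are hypothetical, and ranking by wins minus losses is my assumption for the example; the description above only says that teams are ranked by their win/loss count.

```python
from collections import defaultdict


class TeamMemory:
    """Win/loss counts for teams, grouped per (ruleset, splinter)."""

    def __init__(self):
        # (ruleset, splinter) -> {team: [wins, losses]}
        self._stats = defaultdict(dict)

    def record(self, ruleset, splinter, team, won):
        entry = self._stats[(ruleset, splinter)].setdefault(tuple(team), [0, 0])
        entry[0 if won else 1] += 1

    def seeds(self, ruleset, splinters, per_splinter=5):
        """Collect up to `per_splinter` winning teams for each splinter."""
        leads = []
        for splinter in splinters:
            teams = self._stats.get((ruleset, splinter), {})
            winners = [(t, w - l) for t, (w, l) in teams.items() if w > 0]
            # Assumption: rank by wins minus losses, best first.
            winners.sort(key=lambda item: item[1], reverse=True)
            leads.extend(team for team, _ in winners[:per_splinter])
        return leads


memory = TeamMemory()
memory.record("Keep Your Distance", "Fire", ("a", "b"), won=True)
memory.record("Keep Your Distance", "Fire", ("a", "b"), won=True)
memory.record("Keep Your Distance", "Fire", ("c", "d"), won=True)
memory.record("Keep Your Distance", "Fire", ("e", "f"), won=False)
memory.record("Keep Your Distance", "Earth", ("g",), won=True)
leads = memory.seeds("Keep Your Distance", ["Fire", "Earth"])
print(leads)
```

The returned leads would then be played out as the "Depth-First" initializations before the regular search iterations start.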
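The fix can be illustrated with a small sketch. It relies on Python's sort being stable, so a team inserted at the front stays ahead of older teams with the same score after sorting. The `TeamRecord` type, the `remember` helper and the wins-minus-losses score are assumptions for the example, not the bot's real code.

```python
from dataclasses import dataclass

MAX_TEAMS = 50  # the limit mentioned above


@dataclass
class TeamRecord:
    team: tuple
    wins: int = 0
    losses: int = 0

    @property
    def score(self):
        # Assumption: rank teams by wins minus losses.
        return self.wins - self.losses


def remember(memory, record, limit=MAX_TEAMS):
    """Add `record` so it ranks ahead of older teams with an equal score."""
    memory.insert(0, record)  # newest first
    # Stable sort: equal scores keep their relative order, newest in front.
    memory.sort(key=lambda r: r.score, reverse=True)
    del memory[limit:]  # the oldest of the weakest teams fall off the end
    return memory


memory = []
remember(memory, TeamRecord(("a",)), limit=3)
remember(memory, TeamRecord(("b",)), limit=3)
remember(memory, TeamRecord(("c",), wins=2), limit=3)
remember(memory, TeamRecord(("d",)), limit=3)
print([r.team for r in memory])
```

Appending to the end instead would drop every new team with an average score as soon as the list is full, which matches the tapering-off behaviour described above.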
With the current state of the bot, I suspect I need more drastic measures to advance. For now, I'll experiment with some sort of team decay, making the bot forget teams after a sufficient number of new teams have been added. This could lead to the bot improving its generated teams step by step. My thought process behind that is as follows: if a team stays at the top for a long time, it influences every team generation. If too many successful teams stay there, the bot becomes predictable and notices far too late when those teams start losing consistently. If the bot starts to forget them, let's say after 10 new teams have been added, those new teams have already been influenced by the old team and should, in theory, be better adapted to the current opponent situation.
Aside from that, an easy quick fix would be to store teams per player. If the bot encounters an opponent again, this gives it a good idea of which teams to expect.
Now on to the more drastic measures. One further way of improving would be to guide the MCTS with a classification approach, for example a Random Forest or a Neural Network which decides whether to reward, penalize or prune parts of the game tree. This would steer the MCTS towards efficient teams more quickly. The main issue with this approach is that I'd need to lower the number of MCTS iterations, because each iteration would take longer. On the one hand, this can lead to improvements, but on the other, as seen in Google's AlphaGo, it can also misguide the search and overlook better solutions. It may well lead to an overall improvement, but that is something I need to evaluate before I speculate further. I can also go down a purer Reinforcement Learning route or combine both approaches. I'll look into them over the coming days.
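One simple way the decay idea could be sketched is a fixed-size window: once `max_age` newer teams have been added, the oldest team drops out no matter how many wins it has collected. This is only an assumption about how the experiment might look, and `DecayingTeamMemory` is a hypothetical name.

```python
from collections import deque


class DecayingTeamMemory:
    """Keeps only the newest `max_age` teams, whatever their record."""

    def __init__(self, max_age=10):
        # A full deque discards from the right when appending on the left.
        self._teams = deque(maxlen=max_age)

    def add(self, team, wins=0, losses=0):
        self._teams.appendleft({"team": team, "wins": wins, "losses": losses})

    def ranked(self):
        # Best score first; newer teams win ties because sorted() is stable.
        return sorted(
            self._teams,
            key=lambda r: r["wins"] - r["losses"],
            reverse=True,
        )


memory = DecayingTeamMemory(max_age=3)
memory.add(("old",), wins=9)
memory.add(("x",))
memory.add(("y",))
before = [r["team"] for r in memory.ranked()]
memory.add(("z",))  # third newer team: ("old",) is forgotten despite 9 wins
after = [r["team"] for r in memory.ranked()]
print(before, after)
```

A softer variant could reduce a team's score with age instead of dropping it outright, which is one of the things that would need testing.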
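As a rough idea of what "reward, penalize or prune" could look like inside the selection step, the sketch below combines the standard UCB1 score with a progressive-bias style bonus from a prior. This is a technique I'm swapping in for illustration, not something the bot does today. The `prior` callable stands in for a trained classifier, and all names and thresholds are hypothetical.

```python
import math


def select_child(children, parent_visits, prior, c=1.4,
                 bias_weight=1.0, prune_below=0.05):
    """Pick the child with the best UCB1 score plus a prior bonus.

    `prior(team)` is assumed to return a win probability in [0, 1].
    Children whose prior falls below `prune_below` are skipped entirely.
    """
    best, best_score = None, -math.inf
    for child in children:
        p = prior(child["team"])
        if p < prune_below:
            continue  # prune: never explore this branch
        n = child["visits"]
        if n == 0:
            score = math.inf  # always try unvisited children first
        else:
            mean = child["reward"] / n
            explore = c * math.sqrt(math.log(parent_visits) / n)
            bonus = bias_weight * p / (n + 1)  # fades as visits grow
            score = mean + explore + bonus
        if score > best_score:
            best, best_score = child, score
    return best


children = [
    {"team": "A", "visits": 10, "reward": 6.0},
    {"team": "B", "visits": 10, "reward": 6.0},
    {"team": "C", "visits": 10, "reward": 9.0},
]
toy_prior = {"A": 0.2, "B": 0.9, "C": 0.01}.get
chosen = select_child(children, parent_visits=30, prior=toy_prior)
print(chosen["team"])
```

Note how team C is never selected even though it has the best average reward so far. That is exactly the risk mentioned above: a wrong prior can hide a good branch from the search.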
Thank you for reading this far. As my bot has been massively more successful than I initially expected, I'd like to share some of its League rewards below. The first commenter gets a Daria Dragonscale and a Black Dragon. All the other cards get distributed randomly to the next commenters in sequence. Just comment below and include your Steem Monsters username so that I can send them to you. I'll wait with the random distribution until at least ten people have commented or until tomorrow evening, whichever comes first. Then I'll distribute the cards to the people who have commented by then.
|Daria Dragonscale lvl 1||Black Dragon lvl 1|
|Naga Fire Wizard lvl 1||Imp Bowman lvl 1||Gold Wood Nymph lvl 1||Javelin Thrower lvl 2||Vampire lvl 3|
|Daria Dragonscale lvl 1||Skeletal Warrior lvl 2||Highland Archer lvl 3||Rusty Android lvl 3||Goblin Mech lvl 3|