Building a bot for StarCraft II 2: The StarCraft II Environment

cpufronz (57)in #steemstem • 8 years ago

Environment

The pysc2 framework¹ allows StarCraft 2 agents to be written in Python by providing a Python interface to the StarCraft 2 API². This interface was built with reinforcement learning in mind and provides information about the state of the current game, so it can easily be used in reinforcement learning algorithms. In the following three subsections I will introduce the observations provided by the framework, how they are connected to states, and what actions the agent can perform.

The information in this section is taken from [Lim17] and [BE17] or from reading the source code myself.

Observations

The pysc2 framework provides the following observations:

available_actions
build_queue
cargo
cargo_slots_available
control_groups
game_loop
minimap
multi_select
player
score_cumulative
screen
single_select

Probably the most important observations are screen and minimap, because they contain information about the units on the screen and on the game map, respectively. Both consist of several feature layers, as shown in figure 1. All feature layers are two-dimensional arrays whose entries hold information much like the pixels of an image.

minimap

The minimap is a low-resolution view of the whole game map. It gives an overview of what is going on in the game, but with less detail. Initially it shows the whole map, but only the terrain, including where to find resources. The position of the opponent is not shown, since it is hidden by the so-called "fog-of-war"³. The minimap is a (7, x, y) tensor, where x and y are the x- and y-resolution of the minimap. The first dimension represents the following features:

height_map: Shows the terrain level. It takes values [0,255], which gradually denote the height of the map, 0 being the bottom layer and 255 being the highest elevation of the map.
visibility: Which parts of the map are hidden, have been seen or are currently visible. It takes values [0,1,2] denoting [hidden, have been seen, currently visible].
creep: Which parts of the map are covered with Zerg creep⁴. It takes values [0,1] denoting [no creep, creep].
camera: Which part of the map is visible in the screen layer. It takes values [0,1] denoting [not visible, visible].
player_id: Which player owns the units. It takes values [0,16], where values from 0 to 15 denote the absolute player_id and 16 denotes neutral units.
player_relative: Which status the units have relative to the player. It takes values [0, 1, 2, 3, 4] denoting [background, self, ally, neutral, hostile].
selected: Which units are selected. It takes values [0,1] denoting [not selected, selected].
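
As a minimal sketch of how the (7, x, y) layout described above could be indexed, the following uses plain nested lists in place of the real pysc2 arrays. The names MinimapLayer, count_values and visibility_summary are my own illustrative helpers and not part of pysc2, and the layer order is simply assumed to follow the list above.

```python
from collections import Counter
from enum import IntEnum


class MinimapLayer(IntEnum):
    """Index of each feature layer, assumed to follow the order listed above."""
    HEIGHT_MAP = 0
    VISIBILITY = 1
    CREEP = 2
    CAMERA = 3
    PLAYER_ID = 4
    PLAYER_RELATIVE = 5
    SELECTED = 6


# Meaning of the values in the visibility layer, as described above.
VISIBILITY_NAMES = {0: "hidden", 1: "seen", 2: "visible"}


def count_values(layer):
    """Count how often each value occurs in one 2D feature layer."""
    return Counter(value for row in layer for value in row)


def visibility_summary(minimap):
    """Return how many minimap cells are hidden, seen, or currently visible."""
    counts = count_values(minimap[MinimapLayer.VISIBILITY])
    return {name: counts.get(value, 0) for value, name in VISIBILITY_NAMES.items()}


# A toy 7 x 2 x 3 minimap: every layer is all zeros except visibility.
toy_minimap = [[[0, 0, 0], [0, 0, 0]] for _ in MinimapLayer]
toy_minimap[MinimapLayer.VISIBILITY] = [[0, 1, 2], [2, 2, 0]]

summary = visibility_summary(toy_minimap)
print(summary)
```

With real observations the same indexing idea should carry over to array slicing, but the exact axis order is worth checking against the framework itself.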

screen

The screen represents the visual/spatial features of the current game. It is similar to the minimap, but shows only a part of the game map at a higher resolution. So far the screen is only represented through feature layers, but an RGB pixel representation is planned for future releases [VEB+17]. Just as on the minimap, some details are hidden by the fog-of-war if the player has no unit in the part of the map currently shown on the screen. The screen is a (17, x, y) tensor, where x and y are the x- and y-resolution of the screen and the first dimension represents the following features:

height_map: The terrain level, like height_map from minimap.
visibility_map: The visibility of the map, like visibility from minimap.
creep: Which parts of the screen are covered with Zerg creep, like creep from minimap.
power: Which parts of the screen are supplied with power⁵.
player_id: Which player owns the units on screen, like player_id from minimap.
player_relative: Which status the units on screen have relative to the player, like player_relative from minimap.
unit_type: The unit ids for all units on screen, the unit id is an integer from 0 to 894⁶.
selected: Which units on screen are selected. It takes values [0,1] denoting [not selected, selected], like selected from minimap.
unit_hit_points: The absolute hit points the units on screen have. It takes values from 0, denoting no unit, to whatever value is the current amount of hit points for units on screen.
unit_hit_points_ratio: The relative amount of hit points of the units on screen with respect to their maximum amount. It takes values from 0 to 255, where 0 denotes no unit or a destroyed unit and 255 denotes a unit with 100% hit points.
unit_energy: The absolute amount of energy points units on screen have⁷. It takes values from 0, denoting no unit or no energy, to whatever value is the current amount of energy for units on screen.
unit_energy_ratio: The relative amount of energy, similar to unit_hit_points_ratio. It takes values from 0 to 255, where 0 denotes no unit or a unit without energy and 255 denotes a unit with 100% energy points.
unit_shields: The absolute amount of shield points units on screen have⁸. It takes values from 0, denoting no unit or no shield points, to whatever value is the current amount of shield points for units on screen.
unit_shields_ratio: The relative amount of shield points, similar to unit_hit_points_ratio. It takes values from 0 to 255, where 0 denotes no unit or no shield points and 255 denotes a unit with 100% shield points.
unit_density: How many units are in this pixel.
unit_density_aa: Like unit_density, but anti-aliased with a maximum of 16 per unit per pixel. It shows how much of a pixel is covered by a unit. If multiple units are on a pixel, their proportions are summed up, to a maximum value of 256.
effects: Effects are the visualisation of an ongoing special action (e.g. healing Terran units). It takes integer values from 0 to 3687⁹.
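
To illustrate how two of the screen layers above can be combined, here is a small sketch that finds the cells showing the player's own units in player_relative and reads their health from unit_hit_points_ratio. It works on plain nested lists rather than real pysc2 arrays, and own_unit_cells and ratio_to_percent are hypothetical helper names of mine.

```python
PLAYER_SELF = 1   # value of player_relative for the player's own units
RATIO_MAX = 255   # value of a *_ratio layer that stands for 100%


def own_unit_cells(player_relative):
    """Return (row, col) for every cell that shows one of the player's units."""
    return [
        (r, c)
        for r, row in enumerate(player_relative)
        for c, value in enumerate(row)
        if value == PLAYER_SELF
    ]


def ratio_to_percent(ratio_value):
    """Map a 0..255 ratio value to 0..100 percent."""
    return 100.0 * ratio_value / RATIO_MAX


# Toy 2 x 3 layers with made-up values.
player_relative = [
    [0, 1, 0],
    [4, 0, 1],
]
hit_points_ratio = [
    [0, 255, 0],
    [128, 0, 51],
]

for r, c in own_unit_cells(player_relative):
    print((r, c), ratio_to_percent(hit_points_ratio[r][c]))
```

The same pattern applies to the energy and shield ratio layers, since they share the 0 to 255 scale.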

player

A (11) tensor showing general information about the player, with the following entries:

player_id: The ID of the player, an integer from 0 to 15.
minerals: Current count of minerals.
vespene: Current count of vespene gas.
food_used: The current amount of supply used¹⁰.
food_cap: The current maximum number of supply.
food_army: How much of the supply is used for army units.
food_workers: How much of the supply is used for worker units.
idle_worker_count: How many of the workers are currently idle.
army_count: Current number of all army units.
warp_gate_count: Current number of warp gates, only for Protoss.
larva_count: Current number of larva, only for Zerg.
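
Since the player observation is a flat vector of 11 numbers, giving the entries names makes agent code easier to read. The sketch below wraps such a vector in a namedtuple, assuming the entries arrive in the order listed above. Player, free_supply and can_afford are illustrative names of mine, not pysc2 functions, and the numbers are made up.

```python
from collections import namedtuple

# Field order assumed to match the list above.
Player = namedtuple("Player", [
    "player_id", "minerals", "vespene", "food_used", "food_cap",
    "food_army", "food_workers", "idle_worker_count", "army_count",
    "warp_gate_count", "larva_count",
])


def free_supply(player):
    """Supply that is still available before the current cap is reached."""
    return player.food_cap - player.food_used


def can_afford(player, minerals=0, vespene=0, supply=0):
    """Check whether the player has enough resources and supply for a purchase."""
    return (
        player.minerals >= minerals
        and player.vespene >= vespene
        and free_supply(player) >= supply
    )


raw_player = [1, 120, 0, 14, 15, 2, 12, 0, 1, 0, 0]
player = Player(*raw_player)

print(free_supply(player))
print(can_afford(player, minerals=100))
print(can_afford(player, minerals=50, supply=2))
```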

single_select

A (7) tensor showing information about the selected unit:

unit_type: The type of the unit.
player_relative: The status of the unit relative to the player. It takes values [0, 1, 2, 3, 4] denoting [background, self, ally, neutral, hostile].
health: The current health points of the unit as absolute number¹¹.
shields: The current shield points of the unit as absolute number¹².
energy: The current amount of energy points of the unit as absolute number¹³.
transport_slot: If the unit is transported, the amount of transport slots taken.
build_progress: If the unit is being built, the percentage of the building process that is complete.

multi_select

A (n, 7) tensor, similar to single_select, but for n selected units.
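
The (n, 7) layout can be read row by row, with each row following the single_select fields. A minimal sketch, using nested lists with made-up values and the hypothetical names UnitInfo, parse_units and weakest_unit:

```python
from collections import namedtuple

# Field order assumed to match the single_select list above.
UnitInfo = namedtuple("UnitInfo", [
    "unit_type", "player_relative", "health", "shields",
    "energy", "transport_slot", "build_progress",
])


def parse_units(rows):
    """Turn an (n, 7) table into a list of named unit records."""
    return [UnitInfo(*row) for row in rows]


def weakest_unit(units):
    """Return the unit with the lowest combined health and shields, or None."""
    if not units:
        return None
    return min(units, key=lambda unit: unit.health + unit.shields)


# Two selected units with made-up values.
multi_select = [
    [45, 1, 45, 0, 0, 0, 0],
    [45, 1, 30, 0, 0, 0, 0],
]
units = parse_units(multi_select)
print(weakest_unit(units))
```

Because build_queue and cargo are described with the same seven columns, the same parsing should apply to them.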

build_queue

A (n, 7) tensor, similar to single_select, but for all units that are in the build queue of a production building, where n is the number of units in the build queue.

cargo

A (n, 7) tensor, similar to single_select, but for n units that are in a transporter.

control_groups

A (10, 2) tensor showing the unit leader type and count for all 10 control groups. A control group of units is a group that is mapped to a certain hot-key (0-9) in order to quickly access them.

available_actions

A (n) tensor listing all actions that are available at the time of this observation. The total number of actions is quite high, but not all actions are available at any given point. The number n of actions that are available at a given point in time is actually quite low. Which actions are available depends on which unit (if any) is selected: some units have a lot of actions, others fewer (e.g. buildings generally have fewer). But there are some general actions (like moving the screen) which are always available.
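
An agent therefore has to check its intended action against this list before acting. The sketch below shows one simple way to do that. The action ids are placeholders, NO_OP = 0 is an assumption for the id of the "do nothing" action, and choose_action is a hypothetical helper rather than a pysc2 function.

```python
import random

NO_OP = 0  # assumed id of the "do nothing" action


def choose_action(preferred, available, rng=random):
    """Pick a preferred action that is currently available, else NO_OP."""
    available_set = set(available)
    valid = [action for action in preferred if action in available_set]
    return rng.choice(valid) if valid else NO_OP


# Hypothetical ids: the agent would like to use 2, 6 or 91,
# but only 0, 1 and 2 are available in this observation.
available_actions = [0, 1, 2]
print(choose_action([2, 6, 91], available_actions))  # only 2 is valid
print(choose_action([6, 91], available_actions))     # falls back to NO_OP
```

Falling back to doing nothing is only one possible policy; an agent could equally re-sample from all available actions.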

score_cumulative

A (13) tensor showing the "Blizzard score". The Blizzard score is usually presented to the player after the game and shows specific values that describe how well they played:

score: The total Blizzard score for the current step, the higher the better.
idle_production_time: Sum of the time where the player's production is stuck, because of reaching the supply cap, the lower the better¹⁴.
idle_worker_time: Sum of the time the player's worker units spend doing nothing, the lower the better.
total_value_units: Total sum of the value of all units the player built (value does not get decreased if a unit gets destroyed), the higher the better.
total_value_structures: Total sum of the value of all structures the player built (value does not get decreased if a structure gets destroyed), the higher the better.
killed_value_units: Total sum of all destroyed enemy units¹⁵, the higher the better.
killed_value_structures: Total sum of all destroyed enemy buildings¹⁶, the higher the better.
collected_minerals: Total amount of collected minerals, the higher the better.
collected_vespene: Total amount of collected gas, the higher the better.
collection_rate_minerals: Current rate of mineral collection (minerals per minute), the higher the better.
collection_rate_vespene: Current rate of gas collection (gas per minute), the higher the better.
spent_minerals: The total amount of minerals spent by the player. Whether a high value is good depends on the situation, but in general the higher the better, since it correlates with a bigger, more advanced army and more workers (better economy).
spent_vespene: The total amount of gas spent by the player. Whether a high value is good depends on the situation, but in general the higher the better, since it correlates with a bigger, more advanced army.
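
Because these values are cumulative, one possible way to turn them into a per-step signal for a learning algorithm is to take the difference between two consecutive observations. This is only a sketch of that idea under the assumption that the 13 entries arrive in the order listed above; named_score and step_reward are hypothetical names and the numbers are made up.

```python
# Field order assumed to match the list above.
SCORE_FIELDS = (
    "score", "idle_production_time", "idle_worker_time",
    "total_value_units", "total_value_structures",
    "killed_value_units", "killed_value_structures",
    "collected_minerals", "collected_vespene",
    "collection_rate_minerals", "collection_rate_vespene",
    "spent_minerals", "spent_vespene",
)


def named_score(score_cumulative):
    """Attach the field names listed above to a 13-element score vector."""
    if len(score_cumulative) != len(SCORE_FIELDS):
        raise ValueError("expected %d score entries" % len(SCORE_FIELDS))
    return dict(zip(SCORE_FIELDS, score_cumulative))


def step_reward(previous, current):
    """Reward for one step: how much the total score grew since the last step."""
    return named_score(current)["score"] - named_score(previous)["score"]


previous = [1000, 0, 5, 600, 400, 0, 0, 50, 0, 300, 0, 0, 0]
current = [1025, 0, 5, 650, 400, 0, 0, 75, 0, 320, 0, 50, 0]
print(step_reward(previous, current))
```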

game_loop

A scalar that gives the number of the current iteration in the game loop.

cargo_slots_available

A scalar that indicates how many cargo slots are available in a selected transporter.

Figure 1: Graphical representation of the different feature layers in the pysc2 framework.

States

As the exhaustive description of all observations during a single game step shows, the number of possible states is huge. Therefore only a combination of a few features qualifies as a state for the reinforcement learning algorithm. What makes the situation even more difficult is that the action space is in general continuous: a large number of actions need x- and y-coordinates for the position where they are performed.
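
One common way to make such a coordinate-based action space manageable is to restrict targets to a coarse grid. The sketch below is an illustration of that idea, not something this series prescribes: grid_targets is a hypothetical helper, and the resolution of 84 and the 4 x 4 grid are assumed example values.

```python
def grid_targets(resolution, cells):
    """Return the centre points of a cells x cells grid over a square layer."""
    centres = [(2 * i + 1) * resolution // (2 * cells) for i in range(cells)]
    return [(x, y) for y in centres for x in centres]


# An 84 x 84 layer reduced to 16 candidate target positions.
targets = grid_targets(resolution=84, cells=4)
print(len(targets), targets[0], targets[-1])
```

Each action that needs a position then has only cells * cells variants instead of one per pixel, at the cost of less precise placement.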

To reduce the agent's complexity, I will only use one of the three races to begin with. I chose the Terrans, the humans, since they are usually the race used to teach the basics of StarCraft II. Unlike the other two races, they have no restrictions on where they can build their structures.

I will use the following observations:

minimap
- visibility
- camera
screen
- player_relative
- unit_type
player
- minerals
- vespene
- food_used
- food_cap
- food_workers
- idle_worker_count
score_cumulative
- score
- idle_worker_time
- collected_minerals
- collected_vespene
- collection_rate_minerals
- collection_rate_vespene
available_actions
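
To give an idea of how a few of these features could be combined into a compact state, here is a sketch that buckets the resource counts and keeps two yes/no facts plus the set of available actions. It covers only part of the list above (the spatial layers and score values are left out), the bucket sizes are arbitrary, and bucket and make_state are hypothetical names of mine.

```python
def bucket(value, size, max_bucket):
    """Map a non-negative count to a coarse bucket index."""
    return min(value // size, max_bucket)


def make_state(player, available_actions):
    """Build a small hashable state from a few of the features listed above.

    `player` is assumed to be a dict keyed by the field names of the
    player observation.
    """
    return (
        bucket(player["minerals"], 50, 8),
        bucket(player["vespene"], 50, 8),
        player["food_cap"] - player["food_used"] > 0,  # supply left?
        player["idle_worker_count"] > 0,               # any idle worker?
        frozenset(available_actions),
    )


player = {"minerals": 130, "vespene": 0, "food_used": 14,
          "food_cap": 15, "idle_worker_count": 0}
state = make_state(player, [0, 1, 2])
q_table = {state: 0.0}  # the state can be used directly as a dictionary key
print(state)
```

Keeping the state hashable means it can serve as a key in a simple lookup table, which is convenient for tabular methods.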

Actions

As mentioned above, the action space is huge, since it is continuous. In addition, the number of different actions is rather large (524 in total). The framework tackles this issue by only allowing the agent to execute valid actions, i.e. actions that are currently available. These can be read from the available_actions tensor. To keep things simple in the beginning, only the following 15 actions will be used:

no_op: Do nothing. It requires no target position.
move_camera: Moves the camera, so it is centred around the target position. It requires a target position on the minimap.
select_point: Selects what is at the target position (the action is also executed, if there is nothing at the target position, in that case nothing is selected). It requires a target position on the screen.
select_idle_worker: Selects an idle worker (more or less randomly). It requires no target position.
Build_CommandCenter_screen: Builds a command centre. It requires a target position on the screen.
Build_Refinery_screen: Builds a refinery for gas on the screen. Note that a refinery can only be built on a vespene geyser. It requires a target position on the screen.
Build_SupplyDepot_screen: Builds a supply depot. It requires a target position on screen.
Harvest_Gather_screen: Sends a worker unit to collect resources. It requires a target position on the screen.
Harvest_Return_quick: Makes a resource-collecting worker unit return the currently collected resources to base immediately. It requires no target position.
Morph_SupplyDepot_Lower_quick: Lowers a supply depot, so it doesn't block units from passing the field it is built on. It requires no target position.
Morph_SupplyDepot_Raise_quick: Raises a previously lowered supply depot (when constructed, supply depots are always raised), so it prevents units from passing the field it is built on. It requires no target position.
Move_screen: Moves units to the given position on the screen. It requires a target position on the screen.
Move_minimap: Moves units to the given position on the minimap. It requires a target position on the minimap.
Rally_Workers_screen: Sets a rally point for workers on the screen. It requires a target position on screen.
Rally_Workers_minimap: Sets a rally point for workers on the minimap. It requires a target position on the minimap.
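
The target requirements of these 15 actions can be summarised in a small lookup table, which an agent can use to check a command before sending it. This is a sketch with hypothetical names (ACTION_TARGETS, make_command) and does not call pysc2. Harvest_Gather_screen is assumed to take a screen position, going by its name.

```python
# Target type of each of the 15 actions, following the list above.
# None means the action takes no target position.
ACTION_TARGETS = {
    "no_op": None,
    "move_camera": "minimap",
    "select_point": "screen",
    "select_idle_worker": None,
    "Build_CommandCenter_screen": "screen",
    "Build_Refinery_screen": "screen",
    "Build_SupplyDepot_screen": "screen",
    "Harvest_Gather_screen": "screen",  # assumption based on the name
    "Harvest_Return_quick": None,
    "Morph_SupplyDepot_Lower_quick": None,
    "Morph_SupplyDepot_Raise_quick": None,
    "Move_screen": "screen",
    "Move_minimap": "minimap",
    "Rally_Workers_screen": "screen",
    "Rally_Workers_minimap": "minimap",
}


def make_command(name, target=None):
    """Return (name, target) after checking that the target fits the action."""
    needs_target = ACTION_TARGETS[name] is not None
    if needs_target and target is None:
        raise ValueError(f"{name} needs a {ACTION_TARGETS[name]} position")
    if not needs_target and target is not None:
        raise ValueError(f"{name} takes no target position")
    return (name, target)


print(make_command("no_op"))
print(make_command("Move_screen", (10, 20)))
```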

Bibliography

[BE17] Blizzard Entertainment, Inc.
SC2API documentation. https://blizzard.github.io/s2client-api/, 2017.
[Online; accessed 2018-03-29].
[Lim17] DeepMind Technologies Limited.
Pysc2 - StarCraft II Learning Environment.
https://github.com/deepmind/pysc2/blob/master/docs/environment.md, 2017.
[Online; accessed 2018-03-29].
[VEB+17] Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, et al.
StarCraft II: A new challenge for reinforcement learning.
arXiv preprint arXiv:1708.04782, 2017.

Footnotes

[1] https://github.com/deepmind/pysc2 verified 2018-03-29
[2] https://github.com/Blizzard/s2client-api verified 2018-03-29
[3] Fog-of-war means that only those parts of the game world are visible where the player has a unit. If no unit is there, the visibility of that area slowly fades out so that only buildings remain visible, but no units. Changes (e.g. newly built buildings) will not be shown until the player sends a new unit to explore. This mechanism of the game encourages the player to explore the map.
[4] One of the three races in the game, the Zerg, can only build buildings if the ground is covered with "creep". Only their command centre and the gas extractor do not need to be built on creep. Creep can be extended by building special structures or by performing a special action for expansion. Zerg units move faster and regenerate when on creep. If a Zerg building is not surrounded by creep, it will take damage over time (apart from those buildings which don't need creep in the first place). This game mechanism emphasizes the organic/insect-like nature of the Zerg and adds another strategic dimension to the game when playing as Zerg, since it restricts where the player can build structures.
[5] Analogous to the Zerg's creep, Protoss structures need to be built on a part of the game map that is supplied with power. Pylons supply the surrounding fields with power, meaning that other structures can be built there. If a pylon gets destroyed, the surrounding structures shut down, i.e. become unable to operate, unless they are in reach of another pylon that supplies them with power. Like the Zerg's creep, this limits the places where Protoss can put structures, making it more difficult to expand. Just like for the Zerg, this limitation does not apply to a command centre or a gas assimilator, nor to the pylons themselves.
[6] A list of unit ids can be found here: https://github.com/Blizzard/s2client-api/blob/master/include/sc2api/sc2_typeenums.h line 25ff., verified 2017-11-29.
[7] Note that only some units have energy points, those are used to perform special actions.
[8] Note that only Protoss units have shields.
[9] A list of effect ids can be found here: https://github.com/Blizzard/s2client-api/blob/master/include/sc2api/sc2_typeenums.h line 388ff., verified 2018-03-29.
[10] Besides minerals and gas, food, also known as supply, is the third kind of resource. It limits how many units a player can have in total, if the current supply cap is reached no more new units can be produced. The maximum amount of supply is 200, but this does not mean, that every player can have maximum 200 units, since stronger units require more than 1 supply, the most supply one unit can use is 8. In order to raise this supply cap, the player has to build special buildings/units depending on which race he plays.
[11] Note that just from this value it is not evident, if the unit is damaged or not, this information can be derived from the unit_hit_points_ratio layer from the screen tensor.
[12] Note that, similar to health, it does not give information about the ratio of the shield to its maximum value; this information can be derived from the unit_shields_ratio layer of the screen tensor.
[13] Note that, similar to health, it does not give information about the ratio of the energy to its maximum value; this information can be derived from the unit_energy_ratio layer of the screen tensor.
[14] Note that for Zerg this value does not have any meaning, since the way it's calculated it's always increasing.
[15] Note that the value is not the same as for the total_value_units fields: https://blizzard.github.io/s2client-api/sc2__score_8h_source.html, line 109f., verified 2018-03-29.
[16] Note that the value is not the same as for the total_value_structures fields: https://blizzard.github.io/s2client-api/sc2__score_8h_source.html, line 109f., verified 2018-03-29.