Researchers at OpenAI developed a team of video-game bots, under the project name OpenAI Five, to play five-on-five competitive matches of Dota 2 against humans. Since then, the Dota 2 competitive esports scene has never been the same.
What is Dota 2?
Dota 2 is a strategy game played on a square map with two teams defending bases in opposite corners.
Each team’s base contains a structure called an ancient; the game ends when one team destroys the opposing team’s ancient. Teams have five players, each controlling a hero unit with unique abilities. Players gather resources such as gold from creeps, which they use to increase their hero’s power by purchasing items and improving abilities. Human players interact with the game through a keyboard, mouse, and monitor, making decisions in real time while reasoning about the long-term consequences of their actions.
OpenAI Five and Dota 2
OpenAI’s bots made their first public appearance at The International 2017, where a bot faced off against Dendi in a one-on-one game and won. This stunned the audience, since Dendi was a much-celebrated professional player. At The International 2018, OpenAI Five gave another demonstration, playing two games against the professional teams paiN Gaming and Big God. Although the bots lost both matches, the experience was critical for analyzing and adjusting OpenAI Five’s training for future games.
On April 13th, 2019, the OpenAI Five bots played against OG, champions of The International 2018, at a live event in San Francisco, where the bots won a best-of-three series (2-0). This demonstrated that the system could learn to play at the highest level of skill. The same month, a four-day online event was opened to the public, during which the bots played a total of 42,729 games, winning all but 4,075 of them.
OpenAI says, ‘Our Dota 2 AI, called OpenAI Five, learned by playing over 10,000 years of games against itself. It demonstrated the ability to achieve expert-level performance, learn human-AI cooperation, and operate at an internet-scale.’
OpenAI Five Architecture Overview
According to the paper published by OpenAI, the bots use a neural network built around a single-layer LSTM (long short-term memory) with 4096 units. The complex, multi-array observation space is processed into a single vector, which is passed through the 4096-unit LSTM; the LSTM state is then projected to obtain the policy outputs (actions and a value function). Each of the five heroes on the team is controlled by a replica of this network with nearly identical inputs, each maintaining its own hidden state. The replicas take different actions because part of the processed observation indicates which of the five heroes is being controlled.

The network produces actions through several action heads (with no human data involved), each with its own meaning: for instance, which action to select, how many ticks to delay it, and the X and Y coordinates of its target in a grid around the unit. The action heads are computed independently, which is how the system encodes the many possible combinations of actions and targets. Overall, the AI observes the world as a list of about 20,000 numbers and acts by emitting a list of eight enumeration values.
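To make the flow concrete, here is a toy NumPy sketch of that pipeline: a flattened observation goes through one LSTM cell, and independent action heads plus a value head are projected from the hidden state. The sizes and head names (`action_type`, `delay`, `target_x`, `target_y`) are illustrative stand-ins, not the exact heads from the paper, and the real model uses far larger dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, HIDDEN = 32, 64   # toy sizes; the real model flattens ~20,000 inputs into a 4096-unit LSTM

def init(shape):
    return rng.normal(0, 0.1, shape)

# LSTM cell parameters (input, forget, cell, output gates stacked in one matrix)
W = init((4 * HIDDEN, OBS_DIM + HIDDEN))
b = np.zeros(4 * HIDDEN)

# independent action heads projected from the LSTM state (names/sizes are hypothetical)
heads = {
    "action_type": init((10, HIDDEN)),   # which ability/move to use
    "delay":       init((4, HIDDEN)),    # ticks to delay the action
    "target_x":    init((9, HIDDEN)),    # grid offset around the unit
    "target_y":    init((9, HIDDEN)),
}
value_head = init((1, HIDDEN))           # value function shares the LSTM state

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def step(obs, h, c):
    """One timestep: observation + previous state -> head logits, value, new state."""
    z = W @ np.concatenate([obs, h]) + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    logits = {name: Wh @ h for name, Wh in heads.items()}
    value = float(value_head @ h)
    return logits, value, h, c

h = c = np.zeros(HIDDEN)
obs = rng.normal(size=OBS_DIM)            # flattened observation vector
logits, value, h, c = step(obs, h, c)
# each head is decoded independently (greedy here; the real policy samples)
actions = {name: int(np.argmax(v)) for name, v in logits.items()}
```

Each of the five heroes would run its own copy of `step` with its own `h, c` state, differing only in the part of the observation that marks which hero is controlled.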
Overview of the training system
According to the paper published by OpenAI, the training system consists of four primary types of machines, shown in Fig(ii) above: Forward Pass GPUs, Rollout Workers, the Controller, and Optimizers. Rollout workers run the Dota 2 game on CPUs. They communicate in a tight loop with Forward Pass GPUs, which sample actions from the policy given the current observation. Rollout workers send their data to Optimizer GPUs, which perform gradient updates. The Optimizers publish new parameter versions to storage in the Controller, from which the Forward Pass GPUs periodically pull the latest version. For further detail, take a look at the OpenAI Five paper.
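The loop between those four machine types can be sketched in a few lines of plain Python. Everything here is a stand-in: the "forward pass" picks a random action, the "gradient update" is a dummy arithmetic step, and the Controller is just a dict holding the latest parameter version, but the data flow mirrors the description above.

```python
import queue
import random

random.seed(0)

# Controller: central store for the latest published parameters
params_store = {"version": 0, "weights": 0.0}

def forward_pass(obs, weights):
    # Forward Pass GPU: sample an action from the current policy (dummy here)
    return random.choice([0, 1])

def rollout_worker(n_steps, sample_queue):
    # Rollout Worker: runs the game on CPU, queries Forward Pass GPUs each step,
    # then ships the collected trajectory to the Optimizers
    weights = params_store["weights"]          # pull latest published parameters
    trajectory = []
    for _ in range(n_steps):
        obs = random.random()                  # stand-in for a game observation
        act = forward_pass(obs, weights)
        trajectory.append((obs, act))
    sample_queue.put(trajectory)

def optimizer(sample_queue):
    # Optimizer GPU: consume rollout data, do a gradient update,
    # and publish the new parameter version to the Controller
    trajectory = sample_queue.get()
    params_store["weights"] += 0.01 * len(trajectory)  # stand-in for a gradient step
    params_store["version"] += 1

samples = queue.Queue()
for _ in range(3):                             # three rounds of the training loop
    rollout_worker(8, samples)
    optimizer(samples)
```

In the real system these components run concurrently on thousands of machines, with the queue replaced by networked experience buffers.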
The bot had a single training run lasting from June 30th, 2018 to April 22nd, 2019. Over the course of training, it played games against numerous amateur players, professional players, and professional teams in order to gauge progress. After ten months of training, using 770 ± 50 PFlop/s·days of compute, it defeated the Dota 2 world champions in a best-of-three match. Fig(iii) shows how the TrueSkill rating of OpenAI Five grew over the course of training along with the compute (in PFlop/s·days). More about this can be found in the paper.
OpenAI describes this style of learning as ‘reinforcement learning’, since the bots improve over time by playing against themselves hundreds of times a day for months.
According to OpenAI co-founder and chairman Greg Brockman, who is also the organization’s Chief Technology Officer, OpenAI Five improves by playing against itself in an accelerated virtual environment: “OpenAI Five is powered by deep reinforcement learning, which means we didn’t code it how to play. We coded it how to learn.”
While the Dota 2 game engine runs at 30 frames per second, OpenAI Five acts only on every 4th frame, which the team calls a timestep. At each timestep, OpenAI Five receives an observation from the game engine encoding the information a human player would see on the monitor, such as each unit’s health and position. Based on this data, it chooses its next move.
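This frame-skipping scheme is easy to sketch: at 30 frames per second, acting on every 4th frame yields 7.5 decisions per second of game time. The snippet below is a minimal illustration with the observation and policy stubbed out.

```python
FPS = 30          # Dota 2 engine frame rate
FRAME_SKIP = 4    # OpenAI Five acts on every 4th frame (one "timestep")

def run_episode(n_frames):
    """Count how many decisions the policy makes over n_frames of game time."""
    decisions = 0
    for frame in range(n_frames):
        if frame % FRAME_SKIP == 0:
            # observation = encode_game_state(frame)  # health, positions, ... (stub)
            # action = policy(observation)            # pick a move (stub)
            decisions += 1
    return decisions

print(run_episode(FPS))  # decisions in one second of game time
```

Acting every 4th frame, rather than every frame, keeps the decision rate manageable for the network without meaningfully limiting reaction time compared to humans.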
OpenAI Five is one of many milestones in the field of reinforcement learning.
The project demonstrated remarkable teamwork and coordination between artificial agents pursuing a shared goal. Let us hope that this research will one day find applications that prove genuinely useful, and perhaps surprising, in our everyday lives.