Throughout the history of human societies, games have played various roles, be they ritualistic, educational or, more recently, entertaining. No matter the genre, format or ultimate function, they have always had one thing in common: playing them involved us, humans. For years, creating an artificial actor that could imitate real players was only a theoretical challenge, an imaginative exercise that occupied the minds of scholars such as the famed Alan Turing, who in the 1940s and 1950s worked on a chess algorithm using just pen and paper. However, thanks to breakthrough research over the last several decades, such concepts are no longer hypothetical. With the advent of powerful processors and progress in data science (particularly computer vision and machine learning techniques), non-human actors have been trained to compete and, in an increasing number of cases, win against their Homo sapiens counterparts.
The problem at hand is by no means restricted to the realm of offline or online entertainment. The benefits of creating AI players go beyond designing a bot that might invent new gaming strategies or be a fun opponent to play against. Training such models gives us deep insights into how modern learning algorithms behave. It also shows that reinforcement learning, one of the foundations of AI, really works and might be applied to other fields of the economy and technology. How did this revolution come about? Let’s take a broad look at the major stages that have led to the current state of AI in games and gaming.
Reinforcement learning (RL) in gaming: the basics
Over the last few years, AI has proven very effective at mastering various human-created games. The most popular technique for creating models that can pick up a game’s rules and take part in it is reinforcement learning (RL). Simply put, RL lets the computer play its moves over and over and figure out how to get good at the game. The rules are not spelled out explicitly; the model relies only on feedback about whether the strategy it tried turned out to be successful. This resembles a situation where somebody figures out the rules of a game by observation only.
Training usually starts with the computer taking completely random actions and analyzing their effectiveness after the fact. This lets the algorithm explore many different strategies and build some initial knowledge of which moves are desirable and which are not. Over time, however, the chance of making a random move gradually decreases. But what does it mean for a move not to be random? When the AI makes such a move, the algorithm calculates which decision seems best at that moment, based on the experience the model has gained so far (the more games the model has played, the more accurate the estimates should be). Choosing the move that looks best lets the model put the best strategy it has learned to use. Random moves, on the other hand, are kept even late in training so that the model can keep exploring new strategies, some of which may turn out to be better than the ones it currently prefers.
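To make the interplay between random and calculated moves more concrete, here is a minimal sketch of the idea, often called an epsilon-greedy strategy. It assumes a toy game with a single decision per round and a hidden win probability for each move; the function name `train_bandit`, the parameters and the win probabilities are all made up for illustration and do not come from any of the systems discussed in this article.

```python
import random

def train_bandit(win_probs, episodes=5000, eps_start=1.0, eps_end=0.05, seed=0):
    """Epsilon-greedy learning on a toy game with one decision per round."""
    rng = random.Random(seed)
    n_actions = len(win_probs)
    value = [0.0] * n_actions  # estimated win rate of each move
    count = [0] * n_actions    # how many times each move was tried

    for episode in range(episodes):
        # The chance of a random move shrinks linearly from eps_start to eps_end.
        eps = eps_start + (eps_end - eps_start) * episode / max(1, episodes - 1)

        if rng.random() < eps:
            # Random move: explore strategies that may be underrated.
            action = rng.randrange(n_actions)
        else:
            # Calculated move: pick what looks best given experience so far.
            action = max(range(n_actions), key=lambda a: value[a])

        # The only feedback is whether the move paid off (1) or not (0).
        reward = 1.0 if rng.random() < win_probs[action] else 0.0

        # Update the running average win rate of the chosen move.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]

    return value, count

# Hypothetical game: three possible moves with hidden win probabilities.
values, counts = train_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=lambda a: values[a])
print("best move:", best, "estimated win rates:", [round(v, 2) for v in values])
```

Real game-playing agents face long sequences of decisions and use neural networks instead of a small table, but the trade-off between trying something new and using what already works is the same.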
AI vs games that are thousands of years old
This technique can be applied to any game that has rules: video, sport, card and board games alike. When it comes to classic games, the most famous achievements of modern computer science concern Go and chess.
Both games are very complex and have a rich history, which initially posed a huge challenge for AI researchers and engineers. However, in the last decade scientists created reinforcement learning engines that turned out to be up to the task. In 2016 and 2017, DeepMind’s AlphaGo defeated world-class Go champions, and in 2017 its successor AlphaZero taught itself chess and beat one of the strongest chess programs of the time. (Computers had already overtaken the human chess elite in 1997, when IBM’s Deep Blue defeated world champion Garry Kasparov, but that system relied mainly on brute-force search rather than reinforcement learning.) The Go victories became a major event in countries such as China and Korea, and it is speculated that the shock of the computer’s unexpected win over human players led to increased spending there on AI R&D and education.
Computer killed the videogame star
Things get even more interesting in the case of video games. In 2013, a new algorithm was released. Not only could it play games on the Atari console, it did so with very good results.
What’s interesting about this application is that the algorithm taught itself from scratch. The only information it had about the game was what appeared on the screen, together with the score. By playing over and over, the code learned to make decisions based solely on what was being displayed. A different but related algorithm learned to play another game, Dota 2. It also learned fast and quickly became so good that it beat a professional e-sports team.
Enter the sandbox
However, in some game genres the goal is not to beat an enemy or even to achieve the highest possible score. One of the best examples is Minecraft, one of the most popular games in the history of the industry.
Its most frequently chosen mode is the so-called survival mode. This mode has no prescribed tasks or obligations, which makes every session a non-linear, unique sandbox experience. As a player you can do many things: collect resources, craft items, build various structures, interact with other players and much more. This brief description should be enough to convince anyone that the learning approach described earlier has no obvious application here: in the end, we don’t know which actions are good or bad.
OpenAI has presented a very interesting approach to simulating a real-life player with the use of machine learning. The goal is really ambitious: the authors of the algorithm aim to make it mimic human behavior. Again, an observational method was chosen: the researchers let the program learn game rules and strategies by watching online videos of actual games played by real people.
How to outsmart one’s author
But what does it mean, in the technical sense, to make the AI recreate in-game human thinking? Three main steps are required to accomplish this task. The first is to make the program learn to predict the next action (the keyboard keys pressed and the mouse movement) from the current video frame. To do this, a set of 2,000 hours of recorded gameplay with the corresponding actions was prepared. This data was generated by contractors for this purpose alone.
The second stage was about creating correctly labeled content for the AI to learn from. The model trained in the first step was used to label videos from the internet accurately. In this case, labeling means predicting which action was taken in each frame of the recordings, which together amounted to over 70,000 hours. This phase produced a dataset with a huge number of frames and the actions taken in each of them.
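The idea of training a labeler on a small labeled set and then letting it label a much larger unlabeled one can be sketched in a few lines. The example below is only a toy under strong assumptions: each "frame" is reduced to a single number (a player position), the action names are invented, and the labeler is a simple nearest-average rule rather than the neural network a real system would need. The functions `fit_labeler` and `label_video` are hypothetical names used only here.

```python
from collections import defaultdict

def fit_labeler(labeled):
    """Learn the average frame-to-frame change for each action
    from a small, hand-labeled set of ((before, after), action) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (before, after), action in labeled:
        sums[action] += after - before
        counts[action] += 1
    return {action: sums[action] / counts[action] for action in sums}

def label_video(centroids, frames):
    """Label every transition in an unlabeled recording with the action
    whose typical change is closest to the observed change."""
    labels = []
    for before, after in zip(frames, frames[1:]):
        change = after - before
        labels.append(min(centroids, key=lambda a: abs(centroids[a] - change)))
    return labels

# Small labeled set (stands in for the contractor recordings).
labeled = [
    ((0.0, 1.0), "forward"), ((2.0, 3.1), "forward"),
    ((5.0, 4.0), "back"),    ((3.0, 2.1), "back"),
    ((1.0, 1.0), "wait"),    ((4.0, 4.1), "wait"),
]
centroids = fit_labeler(labeled)

# Larger unlabeled recording (stands in for videos from the internet).
unlabeled_video = [0.0, 1.0, 2.1, 2.0, 1.0, 0.1]
print(label_video(centroids, unlabeled_video))
# ['forward', 'forward', 'wait', 'back', 'back']
```

The point of the design is economy: labeling by hand is expensive, so only a small set is labeled manually and the model does the rest.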
The final step was the most complex: using this dataset to create an agent that decides which action to take in every frame. For this task, the researchers applied a technique called behavioral cloning. Its premise is to estimate the probability of choosing each possible action based on what happened in the past frames.
The algorithm once again turned out to be very effective. The agent it produced successfully learned to carry out various actions such as collecting resources, crafting, hunting and even… stealing items from loot chests in virtual villages. The feat was truly massive: some of the tasks it took on required over 20,000 actions to complete (one of them was crafting a diamond pickaxe).
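In its simplest form, behavioral cloning boils down to counting: for each situation seen in the demonstrations, how often did the human choose each action? The sketch below shows this tabular version. It is an illustration only: the observation names ("tree_ahead", "chest_ahead"), the actions and the functions `clone_behavior` and `act` are invented for this example, and a real agent would work on raw video frames with a neural network rather than a lookup table.

```python
from collections import Counter, defaultdict

def clone_behavior(demonstrations):
    """Tabular behavioral cloning: estimate P(action | observation)
    as the frequency of each action in the recorded demonstrations."""
    counts = defaultdict(Counter)
    for observation, action in demonstrations:
        counts[observation][action] += 1

    policy = {}
    for observation, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[observation] = {a: c / total for a, c in action_counts.items()}
    return policy

def act(policy, observation):
    """Pick the action the demonstrators chose most often in this situation."""
    probs = policy[observation]
    return max(probs, key=probs.get)

# Hypothetical (observation, action) pairs extracted from labeled videos.
demos = [
    ("tree_ahead", "attack"), ("tree_ahead", "attack"),
    ("tree_ahead", "attack"), ("tree_ahead", "forward"),
    ("chest_ahead", "use"), ("chest_ahead", "use"),
]
policy = clone_behavior(demos)
print(policy["tree_ahead"])       # {'attack': 0.75, 'forward': 0.25}
print(act(policy, "tree_ahead"))  # attack
```

Note that no reward or score appears anywhere in this code. The agent is never told which actions are good; it only learns what people tend to do, which is exactly why the approach suits a game without a clear goal.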
Off to the offline?
The spectacular (and somewhat unexpected) effectiveness of the OpenAI algorithms will most likely not be restricted to video games. As with many digital innovations, breakthrough models will be tested against problems outside virtual environments. The insights might lead researchers to explore new ideas for solving real-world problems. This is precisely one of the reasons why the researchers chose Minecraft for their experiments in the first place: it is a sandbox game with a huge variety of actions, and it is not goal-oriented. As the authors point out, video games are only one possible application of these algorithms. But why not think of some more sophisticated and more futuristic (as in: less realistic) applications? After all, even though constructing an industrial humanoid robot is extremely complex, the parallels to creating bots that mimic human behavior in Minecraft are too striking to ignore.
Author: Grzegorz Biały – a final-year Master’s degree student of mathematics at the University of Warsaw. Before joining IDEAS NCBR as an intern, he worked as a data scientist in the banking and online gambling industries. His biggest hobby is music.
Images: Youtube.com (DeepMind channel), author’s archives, Adobe Stock.