How AI defeated the World Champions of Arcade & Board Games

Learn how AI agents outperformed human prodigies in some famous board and video games using reinforcement learning.

-Written by Gautam

A Robot and a Human playing Chess
“The field of Artificial Intelligence is set to conquer most of the human disciplines; from art and literature to commerce and sociology; from computational biology and decision analysis to games and puzzles.” ~Anand Krish

I still remember that as a child I was fond of playing chess on my computer, and I loved watching the grandmasters of the game compete. One day I watched a 2003 documentary called Game Over: Kasparov and the Machine, about an IBM supercomputer called Deep Blue that defeated Garry Kasparov, the highest-rated chess player in history (at the time) and the World Champion for 15 years. The match was played over 6 games, and Deep Blue won with a score of 3½–2½ (F in the comment section for Kasparov). At that time I couldn’t understand how a simple human-made entity could outperform a human, or how a lifeless thing could possess intelligence!

I then searched the internet to understand the under-the-hood logic and implementation of such systems. That was when I first came across the term Reinforcement Learning.

Let us first look into what reinforcement learning is and how it is implemented.

What is Reinforcement Learning?

Reinforcement Learning is an area of machine learning in which an agent (a virtual player) is trained by rewarding desired behaviours and/or punishing undesired ones. It has been adopted in artificial intelligence (AI) as a way of directing learning through rewards and penalties, without needing labelled examples of the correct action.

Reinforcement Learning is studied and applied in areas such as operations research, game theory, simulation-based optimization, swarm intelligence, statistics, and genetic algorithms.

Difference between RL and other branches of ML

Let us try to understand it with an example. Suppose our RL agent is learning to play Mario.

Here our agent performs actions (move left, move right, jump, duck, etc.), and based on the action it performs in the current state, the environment (i.e. the game) rewards it (a negative value for dying, a positive value for collecting coins, power-ups, etc.). The reward differs from one state and action to another, and a value function estimates how much reward the agent can expect in the long run. The general strategy, or policy, of the agent is to maximize the total reward until the end of the game, and not die while doing it 😂.
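To make the loop of state, action, and reward concrete, here is a minimal sketch in Python. It does not use a real Mario game: `CoinCorridor`, its reward values (+5 for a coin, -10 for dying), and the two policies are all hypothetical stand-ins invented for illustration.

```python
class CoinCorridor:
    """Hypothetical stand-in for a Mario level: tiles 0..4 in a row,
    a coin on tile 2 and a pit on tile 4 (falling in ends the episode)."""

    def __init__(self):
        self.position = 0
        self.coin_taken = False

    def step(self, action):
        # action is "left" or "right"; the agent cannot walk past tile 0
        move = 1 if action == "right" else -1
        self.position = max(0, self.position + move)
        if self.position == 4:
            return self.position, -10, True    # died: negative reward, episode over
        if self.position == 2 and not self.coin_taken:
            self.coin_taken = True
            return self.position, 5, False     # coin collected: positive reward
        return self.position, 0, False


def run_episode(policy, max_steps=20):
    """Run the agent-environment loop and return the total reward."""
    env = CoinCorridor()
    state, total_reward = env.position, 0
    for _ in range(max_steps):
        action = policy(state)                    # the agent picks an action from the state
        state, reward, done = env.step(action)    # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def careful_policy(state):
    # Grab the coin, then turn back before reaching the pit
    return "right" if state < 3 else "left"


def reckless_policy(state):
    # Always run right, straight into the pit
    return "right"


print(run_episode(careful_policy))    # 5
print(run_episode(reckless_policy))   # -5
```

Both policies collect the coin, but only the careful one keeps the reward, which is the "maximize the reward and don't die" idea in miniature.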

Reinforcement Learning in Mario

Note that I have highlighted some terms in the previous paragraph; these terms are the main sub-elements of RL.

Now let us look at a formal and general definition of these core sub-elements.

Elements of Reinforcement Learning

Venn Representation of the Elements of Reinforcement Learning
  1. Policy: a mapping from perceived states of the environment to the actions to be taken in those states. The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior. It may be stochastic, specifying probabilities for each action.

  2. Agent: the component that decides what action to take. To make that decision, the agent is allowed to use any observation from the environment and any internal rules that it has.

  3. Rewards: on each time step, the environment sends the reinforcement learning agent a single number called the reward. The agent’s sole objective is to maximize the total reward it receives in the long run. The reward signal thus defines what the good and bad events are for the agent. It may be a stochastic function of the state and action.

  4. Value Function: roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow and the rewards available in those states. For example, a state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards, or the reverse could be true.

  5. Model of the environment: something that mimics the behaviour of the environment and allows inferences to be made about how the environment will behave. For example, given a state and an action, the model might predict the resultant next state and next reward. Methods for solving reinforcement learning problems that use models are called model-based methods, as opposed to simpler model-free methods, which learn purely by trial and error.
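The elements above can be sketched together in a few lines of Python. This is only an illustrative toy, assuming a hypothetical five-state corridor where the goal is the rightmost state; it uses tabular Q-learning (a model-free method), and the learning rate, discount factor, exploration rate, and episode count are arbitrary choices, not values from any of the systems discussed in this article.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (-1, 1)     # step left or step right


def step(state, action):
    """Environment: reward 1 only when the goal is reached, 0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Value function: estimated long-term reward for each (state, action) pair
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Policy: mostly greedy, sometimes random (epsilon-greedy), ties broken at random
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate towards reward + discounted future value
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


q_table = train()
print(greedy_policy(q_table))   # [1, 1, 1, 1]: always step right, towards the goal
```

Notice that only the last step is ever rewarded, yet the states far from the goal still end up with a positive value. That is the difference between reward and value described in point 4.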

So, once all these elements are defined, the RL agent is trained in the environment so that it can learn and reach human-level performance (and maybe go beyond it!).

A graph of expected reward against time (number of episodes) for Pac-Man using a Deep Q-Network

Now that we have covered the theoretical aspects of reinforcement learning, let us see how AI agents created by various organizations beat the scores of world record holders!

AI Agents breaking Records in popular Games

As reinforcement learning matured, many organizations around the world started building AI agents that can compete in an environment. Thanks to modern CPUs and GPUs with extraordinary processing power, and efficient Python libraries and frameworks, we can now create agents that, after training, can outperform humans in a broad spectrum of tasks.

Many of them performed beyond even their creators' expectations and made history!

A bully robot

Now let us explore the achievements of AI in some famous competitive games.

1) Pac-Man:

Pac-Man is possibly the world’s most recognized video game. I bet every one of you reading this article has played it at least once, be it on an arcade machine, a mobile phone, or a PC. For those of you who don’t know what Pac-Man is:

It is a maze chase video game in which the player controls the eponymous character (Pac-Man himself) through an enclosed maze. The objective is to eat all of the dots placed in the maze while avoiding four ugly colored ghosts! Sounds quite easy, but it isn’t 😇.

Pac-Man Gameplay

It was rated as one of the hardest games for an AI to beat, but an AI from Microsoft’s Maluuba team (a Canadian deep learning startup that Microsoft acquired in 2017) scored the maximum possible score of 999,990 in the Atari 2600 version of Ms. Pac-Man, beating the human record by approximately four times. Honest confession: my high score couldn’t even cross the three-digit mark.

This was achieved using a reinforcement learning method called Hybrid Reward Architecture. The team trained 150 AI agents to work in parallel to master the game. Some agents were rewarded for eating pellets, while others were rewarded for avoiding ghosts. A top agent then took feedback from all of them and used a weighted average to make decisions.

This is insane, right? Well, buckle up, there is more to come!
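The "weighted average" step can be pictured with a small Python sketch. This is not Maluuba's code: the function `aggregate`, the two sub-agents, their scores, and the weights are all hypothetical numbers chosen to show the idea of a top agent combining the preferences of specialized agents.

```python
ACTIONS = ("up", "down", "left", "right")


def aggregate(preferences, weights):
    """Combine per-agent action scores into a weighted average and pick the best action.

    preferences: {agent_name: {action: score}}
    weights:     {agent_name: weight}
    """
    total_weight = sum(weights.values())
    combined = {
        action: sum(weights[name] * scores[action]
                    for name, scores in preferences.items()) / total_weight
        for action in ACTIONS
    }
    return max(combined, key=combined.get), combined


# Hypothetical situation: a pellet lies to the right, but so does a ghost
preferences = {
    "pellet_agent": {"up": 0.2, "down": 0.0, "left": 0.1, "right": 1.0},
    "ghost_agent":  {"up": 0.0, "down": 0.0, "left": 0.0, "right": -0.6},
}

# With equal weights the pellet wins; trusting the ghost agent more changes the decision
print(aggregate(preferences, {"pellet_agent": 1.0, "ghost_agent": 1.0})[0])   # right
print(aggregate(preferences, {"pellet_agent": 1.0, "ghost_agent": 2.0})[0])   # up
```

The weights decide how much each specialist's opinion counts, so the same set of preferences can lead to a greedy move or a cautious one.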

2) Go:

Go is an abstract strategy board game for two players, played on a 19 × 19 board, in which the aim is to surround more territory than the opponent. The game was invented in China more than 2,500 years ago (the only creation of China that lasted this long) and is believed to be the oldest board game continuously played to the present day. More than 46 million people in the world know how to play it.

A Go board can be arranged in up to 3³⁶¹ ways (each of the 361 points is empty, black, or white), which is more than the square of the number of atoms in the universe! (But smaller than the number of video games I wish I could have 🙁.) This complexity, combined with the limits of our computational power, makes Go the most challenging classical game for artificial intelligence.

DeepMind Technologies, a London-based company acquired by Google in 2014, created a computer program named AlphaGo. It competed against the legendary Go player Lee Sedol, the winner of 18 world titles, who is widely considered the greatest player of the past decade. AlphaGo defeated him with a dominant score of 4–1.

AlphaGo’s victory in Seoul, South Korea, in March 2016 was watched by over 200 million people worldwide, which is a record in itself! This landmark achievement was considered a decade ahead of its time (well, someone is giving neck-and-neck competition to Tony Stark!).
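The comparison above is easy to check, since Python handles arbitrarily large integers. The only assumption here is the commonly quoted rough estimate of 10⁸⁰ atoms in the observable universe.

```python
board_points = 19 * 19                  # 361 intersections on the board
configurations = 3 ** board_points      # each point is empty, black, or white
atoms_in_universe = 10 ** 80            # rough, commonly quoted estimate (assumption)

print(board_points)                                # 361
print(len(str(configurations)))                    # 173 digits, i.e. about 1.7 x 10^172
print(configurations > atoms_in_universe ** 2)     # True
```

Note that 3³⁶¹ counts every way of filling the board, including arrangements that the rules of Go do not allow, so it is an upper bound rather than an exact count of legal positions.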

Lee Sedol vs AlphaGo (2nd game in Seoul)

AlphaGo uses a Monte Carlo tree search algorithm to find its moves, guided by knowledge previously “learned” by an artificial neural network (a deep learning method) through extensive training on both human and computer play. A neural network is trained to predict AlphaGo’s own move selections and also the winners of its games. This neural network improves the strength of the tree search, resulting in higher-quality move selection and stronger self-play in the next iteration.

After reading this, you might think that it is, after all, just a very good computer program that mimics humans based on past experience and training, so why are we calling it “intelligent”? Well, for that, read the next paragraph!

In the second game between AlphaGo and Lee Sedol, AlphaGo made a move (move 37) that no human would ever make. Players all over the world who were watching the game live commented that they would never have played such a move in its place. But the move turned out to be remarkably beautiful. As the world looked on, it perfectly demonstrated the enormously powerful and rather mysterious talents of modern Artificial Intelligence.

This suggests that today’s artificial intelligence is more than just a computer program: it shows real intelligence, and in the future it may be better than humans in all aspects. This is fascinating and frightening at the same time!

Well, I don’t know about you, but I wouldn’t like it if, in the near future, companies started hiring AIs and I became jobless! 😆 That situation would be quite similar to what you see beneath.
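To give a feel for how a neural network can guide a tree search, here is a small Python sketch of one step: choosing which move to explore next. It follows the general shape of the selection rule described for AlphaGo-style search (average result so far plus an exploration bonus scaled by the network's prior), but it is not DeepMind's code. The function `select_child`, the constant `c_puct`, the data layout, and all the numbers are illustrative assumptions.

```python
import math


def select_child(children, c_puct=1.0):
    """Pick the move to explore next.

    children: {move: {"prior": float, "visits": int, "value_sum": float}}
    "prior" stands in for the neural network's probability for the move.
    """
    parent_visits = sum(child["visits"] for child in children.values())

    def score(child):
        # Exploitation: average result of the simulations through this move so far
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        # Exploration: favour moves the network likes that have been tried less often
        u = c_puct * child["prior"] * math.sqrt(parent_visits) / (1 + child["visits"])
        return q + u

    return max(children, key=lambda move: score(children[move]))


# Hypothetical statistics for three candidate moves
children = {
    "A": {"prior": 0.6, "visits": 10, "value_sum": 5.0},
    "B": {"prior": 0.3, "visits": 0, "value_sum": 0.0},
    "C": {"prior": 0.1, "visits": 2, "value_sum": 1.6},
}

print(select_child(children))               # B: never tried, but the prior says it is promising
print(select_child(children, c_puct=0.0))   # C: with no exploration bonus, best average wins
```

A full search would repeat this selection down the tree, evaluate the position it reaches, and feed the result back into `visits` and `value_sum`; only the selection step is shown here.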

An AI robot kicking a human


To sum up, we first tried to understand what reinforcement learning is and how it differs from the other branches of machine learning. We then took a quick look at the elements of RL and at how an agent is trained in a particular environment. Finally, we dived into popular games in which AI agents performed beyond human limits, and saw how they differ from a basic computer program. Though current AIs are still very far from actual human intelligence, they are progressing at a commendable rate. Reinforcement learning is becoming more popular today because it applies so broadly to real-world problems.


© 2020 by Sally Robotics. Made with      @ BITS Pilani
