Siby Abraham is a computer scientist specialising in artificial intelligence. He is an associate professor and head of the department of mathematics and statistics at Guru Nanak Khalsa College, Mumbai.
How many years would it take a child who knows virtually no English to master it at a Shakespearean level? Assume that there is no one to teach the child, and that the child knows only the fundamentals of English grammar to begin with. Suppose also that there are no books and no help or support of any kind, human or otherwise, at any time.
Recent research in artificial intelligence (AI) suggests that, if you replaced the child with a learning computer, the time taken could be on the order of a few days.
How would a computer be able to learn by itself? Look at how a child learns to ride a bicycle, for example. Hands on the handlebar, one or two fingers stretched over the brake lever, legs over the pedals but ready to reach the ground to prevent a fall: she learns by herself. If she gets something right, a pat on the back encourages her. If she picks the wrong strategy, a stern look of displeasure discourages her and pushes her to look for a better way. Repeated cycles of success and failure guide the child to pick up cycling gradually and incrementally. This is called reinforcement learning, and it is one of the methods that machine-learning researchers use to get a computer to teach itself.
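The reward-and-discouragement loop described above can be sketched in a few lines of Python. This is a minimal "multi-armed bandit" illustration of learning from rewards, not anything taken from DeepMind's work: the learner is assumed to choose among a few hypothetical strategies, each of which succeeds with a probability the learner does not know, and it gradually favours whichever strategy earns the most reward. The function name `learn_by_trial` and the success probabilities are made up for the example.

```python
import random

def learn_by_trial(success_probs, trials=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner (hypothetical example): it tries strategies and
    keeps a running average of the reward (1 = success, 0 = failure) each earns."""
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [0] * n      # how often each strategy has been tried
    values = [0.0] * n    # running average reward for each strategy
    for _ in range(trials):
        if rng.random() < epsilon:
            choice = rng.randrange(n)                          # explore: try something at random
        else:
            choice = max(range(n), key=lambda i: values[i])    # exploit: best strategy so far
        # The "pat on the back" (1) or "stern look" (0) from the environment.
        reward = 1.0 if rng.random() < success_probs[choice] else 0.0
        counts[choice] += 1
        # Incremental mean: nudge the estimate towards the latest reward.
        values[choice] += (reward - values[choice]) / counts[choice]
    return values, counts

# Three hypothetical strategies that succeed 20%, 50% and 80% of the time.
values, counts = learn_by_trial([0.2, 0.5, 0.8])
print([round(v, 2) for v in values], counts)
```

The small `epsilon` keeps the learner occasionally trying strategies that currently look worse, so a good option is not abandoned just because it failed early on.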
A paper published in the journal Nature on October 18, 2017, reported a program that learned the game Go by itself, using reinforcement learning, in just three days, and then beat the earlier version of the program that had defeated Lee Sedol, one of the world's strongest players, by 100 games to 0. Incredible, isn't it? The researchers at DeepMind, the Google subsidiary working on AI that built the program, shared their findings on how superhuman performance is possible without human knowledge.
Go is an ancient two-player board game, around 2,500 years old, played predominantly in China, Korea and Japan. Its simple set of rules governs play on a 19-by-19 grid. The stones come in two colours: black and white. Players take turns to put down one stone at a time, each trying to occupy as many cells on the grid as possible. An opponent's stone can be 'captured' if it is surrounded by one's own stones. The game ends when one party wins or both agree to a draw.
Despite this simplicity, Go is computationally complex. Since the board has 361 cells, the first player's first move has 361 possible outcomes. The second player can then put down her first stone in any one of the remaining 360 cells. So after the first pair of moves, there are 129,960 (361 × 360) possible board positions. As the number of moves grows, the number of possibilities increases exponentially. Mathematicians have estimated that there are around 10^170 legal board positions in Go. This number is greater than the total number of atoms in the observable universe. It is also far greater than the number of valid board positions in chess, around 2^120. This is one reason why chess fell to AI in 1997, when the IBM program Deep Blue defeated the then world champion Garry Kasparov, while Go held out for another two decades.
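The arithmetic in the paragraph above can be checked with a short Python sketch. It assumes, for illustration only, that captures can be ignored (they cannot happen in the first three moves), and it uses the crude observation that every one of the 361 points is empty, black or white, giving 3^361 as a loose upper bound on board positions. The estimate of around 10^170 legal positions sits below that bound because most of those arrangements are illegal. The names `POINTS` and `opening_sequences` are hypothetical.

```python
import math

POINTS = 19 * 19  # 361 places to put a stone

def opening_sequences(moves):
    """Number of distinct move sequences for the first `moves` moves,
    ignoring captures (which cannot occur in the first three moves)."""
    # Ordered choices without repetition: 361 * 360 * ... (moves factors).
    return math.perm(POINTS, moves)

# Every point is empty, black or white, so 3**361 is a crude upper bound
# on the number of board positions (most of these are not legal).
upper_bound = 3 ** POINTS

print(opening_sequences(2))         # the 361 x 360 figure from the text
print(len(str(upper_bound)) - 1)    # power of ten of the upper bound
```

Python's arbitrary-precision integers make it possible to compute 3^361 exactly rather than approximating it with floating point.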
The program described in Nature is called AlphaGo Zero. It is the successor to AlphaGo Fan, which beat the European champion Fan Hui in October 2015. Another version, AlphaGo Lee, defeated the acclaimed Go player Lee Sedol 4-1 in March 2016. A third version, AlphaGo Master, won 60 consecutive online games against top professionals, including the top-ranked Ke Jie, in January 2017. AlphaGo Zero is the fourth and latest in the AlphaGo line, and the program became markedly stronger and more efficient at each step.
Though the four variants differ, they share some fundamentals. The first is the use of artificial neural networks (ANNs) to tackle the problem. An ANN is a computational model loosely inspired by biological neural networks. Broadly speaking, a neural network is an architecture that imitates the parallel, distributed computing and decision-making of the brain, which works by coordinating a large collection of simple units called neurons.
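To make the idea of "simple units" concrete, here is a toy two-layer network in Python. It is only an illustration of how artificial neurons combine inputs; AlphaGo's networks are far larger deep convolutional networks. The weights, biases and function names below are hypothetical values chosen purely to show the mechanics.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs passed through
    a sigmoid 'activation' that squashes the result into the range (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is many neurons reading the same inputs; each is computed
    independently of the others, which is what allows parallel computation."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical numbers: three inputs feed two hidden neurons, which feed one output.
hidden = layer([1.0, 0.0, -1.0], [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1])
output = neuron(hidden, [1.2, -0.7], -0.1)
print(hidden, output)
```

Learning, in such a network, means adjusting the weights and biases; the structure itself stays fixed.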
Second, all the variants judge the effectiveness of a move by analysing what the opponent would do in response. To do this, the program builds a tree of possible decisions, each branch splitting into smaller branches depending on the opponent's reply. The tip of each branch represents a resulting position. Once such a decision tree has been built, search algorithms can move through its branches and identify the best counter-move at any point. The algorithms also prune unpromising branches and preserve the more important ones for future reference.
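The tree search and pruning described above can be illustrated with the classic minimax algorithm with alpha-beta pruning. Note that this is a swapped-in textbook technique, not AlphaGo's method: the AlphaGo programs use a variant of Monte Carlo tree search guided by their neural networks. The tiny game tree below is hypothetical, written as nested lists whose leaves are scores from the first player's point of view.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax search with alpha-beta pruning over a toy game tree.

    A node is either a number (the score of a final position, from the
    first player's point of view) or a list of child nodes (the replies
    available from that position).
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:   # the opponent would never allow this line,
                break           # so the remaining branches are pruned
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:       # we already have a better option elsewhere
            break
    return best

# Two candidate moves, each followed by the opponent's possible replies.
tree = [[3, 5], [2, 9]]
print(alphabeta(tree, True))
```

Here the leaf 9 is never examined: once the opponent can answer the second move with a reply scoring 2, that move is already worse than the first, so the rest of its branch is pruned.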
As the program is trained, it builds such trees game after game. Over millions of games, it accumulates a forest of trees, branches and decision nodes, which helps it determine the best strategy in each new game. This is, loosely, what our brain also does when it is learning.
The first three programs were trained, at least initially, using supervised learning: they adjusted their parameters so that their outputs matched predetermined results, namely the moves that human experts had actually played. Once the programs had been trained on a large database of such expert moves, they were deemed ready to play in a real environment against a human opponent.
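Supervised learning in miniature looks like the sketch below: a model's parameters are nudged, step by step, until its predictions reproduce labelled "right answers". This is an assumption-laden toy, not AlphaGo's training code: it fits a straight line with gradient descent to hypothetical data, whereas the AlphaGo networks were far larger and learned to predict expert moves. The shared idea is gradient-based adjustment of parameters to match given targets.

```python
def train(examples, lr=0.05, epochs=2000):
    """Fit w and b so that predictions w*x + b match the labelled answers y,
    by gradient descent on the mean squared error (hypothetical example)."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical labelled data: the "right answers" follow y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
w, b = train(data)
print(round(w, 3), round(b, 3))
```

The key feature is that the targets come from outside the model, which is exactly the dependence on human-supplied knowledge that AlphaGo Zero set out to remove.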
What matters here is the domain knowledge, without which the first three AlphaGo siblings could not do anything. However, such knowledge is often very expensive to obtain and can be unreliable, and AlphaGo Zero was built to get around just such shortcomings. Zero adopted a tabula rasa approach, the philosophy of starting from a clean slate: the program began with random moves and played against itself. Like a child learning to ride a bicycle, the program received credit for moves that led to wins and black marks for moves that led to losses.
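Tabula rasa self-play can be illustrated on a much smaller game. The sketch below is not AlphaGo Zero's algorithm, which pairs a deep neural network with Monte Carlo tree search. Instead it swaps in tabular Q-learning on a made-up take-away game (a variant of Nim, assumed here for illustration): players alternately remove one to three stones, and whoever takes the last stone wins. A single table plays both sides, every estimate starts at zero, and the only feedback is winning or losing. The known winning strategy is to leave the opponent a multiple of four stones, and after training the greedy policy should rediscover it. All names are hypothetical.

```python
import random

ACTIONS = (1, 2, 3)  # assumed rule: remove 1, 2 or 3 stones per turn

def legal(pile):
    """Moves allowed when `pile` stones remain."""
    return [a for a in ACTIONS if a <= pile]

def self_play(max_pile=12, episodes=20000, alpha=0.5, epsilon=0.3, seed=1):
    """Learn the take-away game purely by playing against itself.

    q[(pile, a)] estimates the outcome (+1 win, -1 loss) for the player
    about to move. The table starts blank (all zeros) and plays both sides.
    """
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in range(1, max_pile + 1) for a in legal(p)}
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = legal(pile)
            if rng.random() < epsilon:
                a = rng.choice(moves)                          # explore
            else:
                a = max(moves, key=lambda m: q[(pile, m)])     # best-known move
            rest = pile - a
            if rest == 0:
                target = 1.0   # taking the last stone wins
            else:
                # Zero-sum game: my outcome is minus the opponent's best outcome.
                target = -max(q[(rest, m)] for m in legal(rest))
            q[(pile, a)] += alpha * (target - q[(pile, a)])
            pile = rest        # the opponent (the same table) moves next
    return q

def best_action(q, pile):
    """Greedy move according to the learned table."""
    return max(legal(pile), key=lambda m: q[(pile, m)])

q = self_play()
print({p: best_action(q, p) for p in (5, 6, 7, 9, 10, 11)})
```

Nothing about the winning strategy is written into the code; it emerges only from the win/loss signal, which is the point the paragraph above makes about Zero.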
In this way, Zero mastered Go in three days. Over that period it played 4.9 million games against itself, running 1,600 simulations of possible continuations before each move it made and taking about 0.4 seconds per move on average.
At first, it played random moves. After three hours, it played like a human beginner, focusing on immediate gains rather than long-term ones. After 19 hours, it had grasped the fundamentals of sophisticated strategies. After 70 hours, it had become almost superhuman, tackling battles at multiple locations on the board at once.
The Elo rating system, used to rate players of board games such as Go and chess, gives a clearer picture of Zero's supremacy. AlphaGo Fan has an Elo rating of around 3,200; Sedol, 3,526; Ke Jie, 3,661; and Zero (after 40 days of training), 5,100. It is the best Go player we know, and it got there without using human knowledge.
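What a rating gap means can be worked out with the standard Elo expected-score formula, E_A = 1 / (1 + 10^((R_B − R_A)/400)), which gives the share of points player A is expected to score against player B. The sketch below applies it to the ratings quoted above, assuming for illustration that they sit on a common scale; the function name is hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score for player A against player B: roughly the
    fraction of games A should win (with draws counting as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Ratings as quoted in the text, assumed to be on a common scale.
print(round(expected_score(5100, 3661), 4))   # Zero (40 days) vs Ke Jie
print(round(expected_score(3661, 3526), 4))   # Ke Jie vs Lee Sedol
```

A 400-point gap corresponds to odds of about 10 to 1, so a gap of more than 1,400 points implies that the weaker side would be expected to win only a tiny fraction of games.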
Zero's achievement is significant because its approach is not restricted to games. It demonstrates the value of adopting the tabula rasa philosophy and of learning in a way that humans do not, or rather cannot. This in turn suggests that it could help develop entirely new ways to tackle humankind's more difficult problems (e.g., predicting the 3D structures of proteins from DNA data).
At the same time, it is also bound to make us wary about the 'rise of the machines'. After a defeat in May 2017, Ke Jie said, "Last year, it was still quite human-like when it played. But this year, it became like a god of Go." Ke Jie went on to study the machine's games in detail, fine-tuned his strategies and then won 20 consecutive games against top-ranked human Go players.