Machine Beats Human at a Complex Boardgame – But Don’t Cheer Yet

Google DeepMind beat Lee Se-dol at Go three games in a row. The wins signal a triumph for its software but its hardware requirements present a bottleneck going ahead.

At the end of the third game of Go between Google's DeepMind AI and Lee Se-dol. The game ended with the AI's third successive victory and its clinching the series of five. Credit: Google DeepMind

On February 10, 1996, a computer developed by IBM called Deep Blue defeated Garry Kasparov, then the reigning world champion, at a game of chess. Deep Blue was derived from Deep Thought, designed by a group of engineers out of Carnegie Mellon University, Murray Campbell and Feng-Hsiung Hsu among them. Before his defeat, Kasparov had declared that no computer would be able to beat him. After he was beaten 2-1 in a series of games by an upgraded version of Deep Blue in 1997, he didn’t relent. He wrote in an article (paywall) that he’d witnessed the computer display a “human sense of danger” during the games. According to him, his defeat at the hands of a robot was really a defeat at the hands of a robot that was turning human.

This was an odd claim to make. Unlike DeepMind – the software built by Google that plays the boardgame Go – Deep Blue didn’t exactly display artificial intelligence. It was just very good at doing one simple thing: searching. Engineers and chess players had together coded millions of possible chess moves into the computer and, during the game with Kasparov, it had to decide which moves – selected from the database of all moves – would defeat Kasparov’s and then play them. So it didn’t exactly strategise during the game but moved like a human player intent only on winning and with an exceedingly high recalling power.

DeepMind isn’t designed to play like this. The total number of moves possible in any game of chess is 10^120 (the Shannon number) – a number far less than the total number of moves possible in a game of Go: 10^761. Even the total number of legal ‘positions’ in Go is closer to 10^170 according to a calculation performed earlier this year.
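To get a feel for the gap between these figures, here is a minimal sketch in Python that reads them as the powers of ten 10^120, 10^761 and 10^170. The rate of 200 million options per second is the figure this article quotes for Deep Blue; the assumption that a machine could simply enumerate possibilities at that rate, and the helper names, are illustrative only.

```python
# Figures quoted in the article, read as powers of ten.
CHESS_GAMES = 10**120      # the Shannon number
GO_GAMES = 10**761         # possible games of Go
GO_POSITIONS = 10**170     # legal Go positions

# Deep Blue's quoted evaluation rate; using it as an enumeration rate
# is an assumption made purely for illustration.
DEEP_BLUE_RATE = 200_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def order_of_magnitude(n: int) -> int:
    """Return the power of ten of a positive integer (number of digits minus one)."""
    return len(str(n)) - 1


def years_to_enumerate(count: int, per_second: int) -> int:
    """Whole years needed to walk through `count` items at `per_second` items a second."""
    return count // (per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print("Go games per chess game: 10^%d"
          % order_of_magnitude(GO_GAMES // CHESS_GAMES))
    print("Years to enumerate chess games: about 10^%d"
          % order_of_magnitude(years_to_enumerate(CHESS_GAMES, DEEP_BLUE_RATE)))
    print("Years to enumerate Go games: about 10^%d"
          % order_of_magnitude(years_to_enumerate(GO_GAMES, DEEP_BLUE_RATE)))
```

Python integers have arbitrary precision, so the numbers are handled exactly rather than as overflowing floats. Even for chess, exhaustive enumeration at this rate is hopeless, which is why both kinds of machine have to be selective about what they examine.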

In this context, looking for the ‘right’ move from a database containing 10^761 moves would be too difficult. So, instead of trying to recall the most ‘winning’ move in a particular situation, DeepMind devises its own winning strategies and executes them. Google, its builder, doesn’t know what strategies DeepMind can come up with in the course of a match. In fact, in its third game against Go grandmaster Lee Se-dol, DeepMind executed a strategy of moves that experts have said no human could’ve conceived of and which eventually won the game for the software. Was that artificial intelligence?

There is an obvious difference between being able to access a lot of computing power and being artificially intelligent, and choosing which kind of machine to build to solve real-world problems can appear to be a no-brainer at the outset. In our context: Deep Blue can’t learn to play Go but DeepMind can learn to play chess. However, would your decision change if particular goal-driven tests indicated that DeepMind consumes way more resources than Deep Blue does to solve certain problems? Moreover: humans are very good at solving problems, being much more efficient than the best computers at some of them, but is the human way the best way to solve a problem or simply the best way we know to solve a problem?

Feng, one of the engineers who’d built Deep Blue’s precursor, had predicted in 2007 that a brute-force machine, performing “intense analysis” a la Blue, would defeat a human Go champion by 2017. His prediction has come partially true – Lee Se-dol lost against DeepMind in the first game on March 8, 2016 – but DeepMind isn’t just a brute-force program. To understand why, let’s look at the secret behind Deep Blue’s success first.

A role for randomness

Floating brains. Credit: yumikrum/Flickr, CC BY 2.0

In a game of chess that’s already underway, Deep Blue’s next move is chosen from among a set of moves that all yield favourable results, such that the chosen move is the best of them. How is the ‘bestness’ determined? From each move on, Deep Blue evaluates a random sample selected from among all possible subsequent moves that could follow, and their consequences, in repeated cycles, visualised as a growing tree of options. Before Deep Blue was built, Feng and his colleagues found that in order to beat a human champion at chess, the computer wouldn’t have to visualise this tree all the way to the endgame but only far enough to stay a few steps ahead of how far the human would be thinking. It was also found that the more cycles ahead the computer plumbed, the more formidable it became. So in its games against Kasparov, Deep Blue was 12 cycles ahead, evaluating close to 200 million options/second (its component CPUs collectively allowed a theoretical maximum of 1 billion options/second).
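The idea of a tree of options explored only a fixed number of cycles ahead can be sketched in a few lines of Python. This is not Deep Blue's code: it is a generic depth-limited search (the textbook 'negamax' form) on a toy game chosen for brevity, in which players alternately take one to three stones and whoever takes the last stone wins. The game, the function names and the scoring scheme are all assumptions made for illustration.

```python
MOVES = (1, 2, 3)  # toy game: take 1-3 stones; taking the last stone wins


def negamax(stones: int, depth: int, counter: list) -> int:
    """Score a position for the player about to move.

    +1 means a proven win, -1 a proven loss, and 0 means the search
    ran out of depth before reaching a verdict.
    """
    counter[0] += 1  # count every position examined
    if stones == 0:
        return -1    # the opponent just took the last stone
    if depth == 0:
        return 0     # horizon reached: nothing proven either way
    return max(-negamax(stones - m, depth - 1, counter)
               for m in MOVES if m <= stones)


def best_move(stones: int, depth: int):
    """Look `depth` plies ahead (depth >= 1); return (move, score, positions examined)."""
    counter = [0]
    scored = [(-negamax(stones - m, depth - 1, counter), m)
              for m in MOVES if m <= stones]
    score, move = max(scored)
    return move, score, counter[0]


if __name__ == "__main__":
    for depth in (2, 5):
        move, score, examined = best_move(10, depth)
        print(f"depth {depth}: take {move}, score {score}, "
              f"{examined} positions examined")
```

Looking two plies ahead from ten stones, the search proves nothing and every move scores 0. At five plies it finds that taking two stones forces a win, at the price of examining many more positions. That trade, deeper lookahead for more computation, is the one the paragraph above describes.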

However, given how many more potential moves there are at each step in Go than in chess, DeepMind working like Deep Blue would’ve only been a matter of accessing as much computing power as possible. And DeepMind’s creators partly sidestepped this problem by teaching the program to make better choices in each analysis. The full details are available in a scientific paper published by the creators in Nature in January 2016; the salient point is the presence of two decision-assisting parameters: policy networks and value networks. These are much easier to calculate than the identity of a winning move, and if their values were known, DeepMind would have a clearer idea of what to look for instead of having to go all guns blazing. So maximising the potential of these parameters allows DeepMind to cap the number of options it has to evaluate at each step, and gives it two more degrees of freedom along which to strike ahead.
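A rough sketch of how two such aids can cap the work of a search, again in Python and again on a toy game (take one to three stones; taking the last stone wins). The functions `toy_policy` and `toy_value` are hand-written, hypothetical stand-ins: in the real system these roles are played by trained neural networks, whose details are in the Nature paper and are not reproduced here.

```python
MOVES = (1, 2, 3)  # toy game: take 1-3 stones; taking the last stone wins


def toy_policy(stones: int) -> list:
    """Hypothetical stand-in for a policy network: legal moves, most promising first."""
    legal = [m for m in MOVES if m <= stones]
    # Hand-coded preference: leave the opponent a multiple of four stones.
    return sorted(legal, key=lambda m: (stones - m) % 4 != 0)


def toy_value(stones: int) -> float:
    """Hypothetical stand-in for a value network: outlook for the player to move."""
    return -1.0 if stones % 4 == 0 else 1.0


def guided_search(stones: int, depth: int, top_k: int, counter: list) -> float:
    """Depth-limited search that only expands the policy's top_k moves."""
    counter[0] += 1  # count every position examined
    if stones == 0:
        return -1.0  # the opponent just took the last stone
    if depth == 0:
        return toy_value(stones)  # estimate instead of searching deeper
    candidates = toy_policy(stones)[:top_k]
    return max(-guided_search(stones - m, depth - 1, top_k, counter)
               for m in candidates)


def choose_move(stones: int, depth: int, top_k: int):
    """Return (move, score, positions examined); depth and top_k must be >= 1."""
    counter = [0]
    candidates = toy_policy(stones)[:top_k]
    scored = [(-guided_search(stones - m, depth - 1, top_k, counter), m)
              for m in candidates]
    score, move = max(scored)
    return move, score, counter[0]


if __name__ == "__main__":
    print(choose_move(10, depth=1, top_k=1))
    print(choose_move(10, depth=2, top_k=2))
```

The policy stand-in narrows the breadth of the tree (only `top_k` moves are followed from each position) and the value stand-in cuts its depth (positions at the horizon get an estimate rather than a further search). Because both stand-ins here happen to be perfect for this tiny game, a single examined position is enough to pick the winning move; learned networks are only approximate, which is why the real program still searches.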

Its makers say this way of ‘thinking’ about a problem is closer to how humans do it – and it might be so, too. David Auerbach, a software engineer based in New York, argues that randomness plays a big role when humans think up creative solutions to problems. A similar sort of randomness is suspected to have manifested in game 3 of DeepMind v. Se-dol (out of a total of five, four of which have been played), when the program executed a still-mysterious strategy and won. The role of randomness is to introduce an option that a series of logical evaluations wouldn’t have allowed for. As Auerbach writes,

Computer scientist and complexity theorist Melanie Mitchell, who studied under Douglas Hofstadter, has studied how computers can classify objects with the help of shape prototypes. These experiments use neural networks and a dictionary of ‘learned shapes’ to match new shapes against. In [2013], Mitchell made a startling discovery: If the algorithms’ dictionary was removed and replaced with a series of random shape projections to match against, the algorithms performed equally well. While intuition may suggest that the brain builds up representational archetypes which correspond to objects in the world, Mitchell’s research suggests that pure randomness may have an important role in the process of conceptualisation.

The extent to which these results speak to human analogising is unclear. But they open the possibility that our process of analogy making may be even less rational and more stochastic than we suspect, and that the deep archetypes we match against in our brain might bear far less relationship to reality than we might think.

The ‘0-to-1’ for learning machines

Fabien Giraud & Raphaël Siboni's Last Manoeuvres in the Dark is a networked field of 300 terracotta Darth Vader masks, perched on high sticks and aligned in a military formation like the Xian army. Credit: nadya/Flickr, CC BY 2.0

And so, if human intelligence is marked by the incorporation of randomness, is DeepMind going to be the Skynet we deserve?

No. In fact, we still don’t know if that moment in game 3 wasn’t the result of a bug. If it was, it wouldn’t be surprising because a similar bug was detected in Deep Blue at the start of its tourney against Kasparov in 1997, whose effects were mistaken for brilliance on the part of the computer. As Feng wrote in his book Behind Deep Blue (p. 224; 2002):

Garry, meanwhile, apparently was puzzled by Deep Blue’s last move in the game. The day after game two was a rest day, and when I surfed the web to catch up on the match news, I came across an article written by Frederic Friedel about what happened in the Kasparov camp after game one. Garry was perplexed by the move 44. … Rd1. Deep Blue played it as a result of a bug but Garry did not know that. So the whole Kasparov camp went into a very deep analysis on why the alternative move 44. … Rf5 was no good. In the end, they concluded that the reason why Deep Blue did not play 44. … Rf5 was “It probably saw mates in twenty or more [moves].” I could not help but burst out laughing.

On the other hand, if DeepMind’s moves against Se-dol were not the result of a bug but an uncanny form of reasoning… As Gideon Lichfield wrote in Quartz, “A classic fear about AI is that the machines we build to serve us will destroy us instead, not because they become sentient and malicious, but because they devise unforeseen and catastrophic ways to reach the goals we set them. … What we call common sense and logic will be revealed as small-minded prejudices, baked in by aeons of biological and social evolution, which trap us in a tiny corner of the possible intellectual universe.”

The road to evoking this response, and leaping ahead of Deep Blue, was paved with new learning techniques. If Deep Blue was built to execute if-this-then-that moves, DeepMind was taught to learn using its neural networks. In 2015, it set off ripples in computer science circles when DeepMind taught itself to play 49 classic video games. On February 4, 2016, engineers working on it announced that the program had successfully navigated a 3D maze by looking its way through. According to Kevin Kelly, a tech journalist at Wired, a machine figuring out how it could learn to play a game was the 0-to-1 moment before devices could get better at solving problems sans human intervention. The next big leap on this front would be for machines to be able to learn without human supervision.

A power guzzler

But at what cost are these feats being achieved? Even if we’re confused about its provenance, DeepMind’s achievement is an achievement de facto because we’re reassured by Go’s beautiful complexity. Then again, it still signals only a partial victory: with the wins over Se-dol, computer scientists finally have a powerful program that learns as it goes and is capable of navigating its way to a preferred solution, but the hardware it needs still hulks.

Replying separately to a question posed by Edge, David Dalrymple of the MIT Media Lab and Alexander Wissner-Gross, a scientist and inventor, pointed out a common aspect of AI advances made in 2015 that indicates a bottleneck. Dalrymple said, “Over the past few years, a raft of classic challenges in artificial intelligence which had stood unsolved for decades were conquered, almost without warning, through an approach long disparaged by AI purists for its ‘statistical’ flavour: it’s essentially about learning probability distributions from large volumes of data, rather than examining humans’ problem-solving techniques and attempting to encode them in executable form.” (Emphasis added.)

Similarly, Wissner-Gross replied, “the average elapsed time between key algorithm proposals and corresponding advances was about eighteen years, whereas the average elapsed time between key dataset availabilities and corresponding advances was less than three years, or about six times faster, suggesting that datasets might have been limiting factors in the advances. In particular, one might hypothesise that the key algorithms underlying AI breakthroughs are often latent, simply needing to be mined out of the existing literature by large, high-quality datasets and then optimised for the available hardware of the day.”

Of course, there is a confounding factor – which we can resolve in the future by determining if a learning AI can be built divorced of large databases. But in the meantime, those observing the rise and rise of DeepMind remain cautious about celebrating the presence of a new intelligence in their midst; ask yourself: how much better or worse off would DeepMind be with the addition or subtraction of some computing power?

a. Results of a tournament between different Go programs. To provide a greater challenge to AlphaGo [the Go-playing instance of DeepMind], some programs (pale upper bars) were given four handicap stones (that is, free moves at the start of every game) against all opponents. Fan Hui is a human Go player. b. Performance of AlphaGo, on a single machine, for different combinations of components. c. Scalability study in AlphaGo with search threads and GPUs, using asynchronous search (light blue) or distributed search (dark blue), for 2 s per move. Credit: doi:10.1038/nature16961

Miles Brundage, a PhD student at Arizona State University, phrased it well: “We should keep track of multiple states of the art in AI as opposed to a singular state of the art, then comparing [the performance of different configurations of DeepMind against a player], is to compare two distinct states of the art – performance given small computational power (and a small team, for that matter) and performance given massive computational power and the efforts of over a dozen of the best AI researchers in the world.” According to the paper in Nature, one Go-playing variant of DeepMind used 1,202 CPUs and 176 GPUs (graphics processing units). According to one estimate, that’s around 100,000x the computing power Deep Blue commanded.

So, simply in terms of the software, DeepMind presents a significant advance and stands to be the first strategising, boardgame-playing AI that has beaten a human champion at Go. Irrespective of anything else, this is an awesome development because Go is the most complex existing game that doesn’t include hidden or random information in its gameplay and at which humans were still the best players. However, before the techno-optimists can rally, DeepMind still has a long way to go before it is comparable to humans in delivering the same output – work-hour for work-hour, watt for watt.