Last month in Montreal, researchers huddled around a monitor at Maluuba, an artificial intelligence startup Microsoft acquired in January, to learn the answer to a minor mystery of computer science: What happens when you score a million points at classic Atari game Ms. Pac-Man? Such a question might seem to lack a certain urgency, considering the game and its original arcade version were released in 1982. But they would soon get an answer: An inhuman, machine-learning-powered player they had built was chomping toward a seven-digit score.
The moment proved somewhat anticlimactic. “It just reset to zero, it was kind of disappointing,” says Rahul Mehrotra, a program manager at Maluuba, who was part of the small crowd. But the company’s researchers say the algorithmic techniques that let their bot reach the game’s maximum possible score, 999,990, could help machines master more complex tasks.
Ms. Pac-Man has been targeted by artificial intelligence researchers for years, but no player, human or otherwise, has ever scored so high. Mehrotra says software that can learn to balance the demands of dodging four ghosts, hunting down fruit, and eating pellets could also help office workers plot a path through their own maze of competing objectives. Maluuba is focused on long-term AI research and operates more or less independently inside Microsoft, but it has to pay its way. Mehrotra imagines ideas at work in the Ms. Pac-Man bot helping users of Microsoft’s sales and business tool Dynamics prioritize sales leads, for example. That might not have the same nerd cachet as breaking the scoreboard on an Atari classic, but it could certainly be a lot more lucrative.
Atari games have become a popular testbed for researchers looking to try out ways machines could make sense of the real world. Google forked out hundreds of millions of dollars for UK startup DeepMind in 2014 after it demonstrated software that learned to play some Atari games better than an expert human, just by playing each game over and over to discover how to rack up points. The same technique, called reinforcement learning, was at work in DeepMind’s Go champion-beating system, AlphaGo.
Maluuba’s engineers fixated on Ms. Pac-Man because it is one of the games that DeepMind and others have found reinforcement learning can’t easily figure out. The game was designed back in 1982 to be tricky. Experts at the original Pac-Man could literally play with their eyes shut by memorizing the maps and the movements of the game’s monsters. In Ms. Pac-Man, the ghosts and fruit move around in unpredictable ways, forcing a player to constantly rethink what they’re doing.
Maluuba reached its historic high score by breaking up the problem. Instead of having one agent use reinforcement learning to try to digest all the game’s complexity into a single strategy, researchers created a crowd of more than 150 reinforcement learning agents, each of which learns how one element of the game, such as the fruit, the pellets, or the four ghosts, affects the score. Individual agents feed recommendations on what moves to make to a central decider, which pools their suggestions to determine what Ms. Pac-Man should do next.
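The split between many small agents and one central decider can be illustrated with a toy sketch. This is not Maluuba’s code, and the article does not say exactly how the pooling works. The class `ComponentAgent`, the function `central_decider`, the optional per-agent `weights`, and the learning-rate and discount parameters `alpha` and `gamma` are all hypothetical. The sketch assumes each agent runs ordinary tabular Q-learning on its own slice of the reward (points from a pellet, say, or a penalty from a ghost), and that the decider pools recommendations by summing every agent’s value estimate for each possible move and picking the highest.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")


class ComponentAgent:
    """Hypothetical agent that learns values for ONE game element's reward."""

    def __init__(self, name, alpha=0.5, gamma=0.9):
        self.name = name
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)  # (state, action) -> value estimate

    def values(self, state):
        # This agent's "recommendation": a value estimate for every move.
        return {a: self.q[(state, a)] for a in ACTIONS}

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update using only this agent's own reward.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def central_decider(agents, state, weights=None):
    """Pool recommendations by a (weighted) sum of per-agent action values."""
    weights = weights or {}
    totals = {a: 0.0 for a in ACTIONS}
    for agent in agents:
        w = weights.get(agent.name, 1.0)
        for action, value in agent.values(state).items():
            totals[action] += w * value
    # Highest pooled value wins; ties go to the earliest action in ACTIONS.
    return max(ACTIONS, key=lambda a: totals[a])


# Usage: moving left earns a pellet, moving right runs into a ghost.
pellet = ComponentAgent("pellet")
ghost = ComponentAgent("ghost")
pellet.update("junction", "left", reward=10.0, next_state="after_left")
ghost.update("junction", "right", reward=-100.0, next_state="after_right")
print(central_decider([pellet, ghost], "junction"))  # prints "left"
```

Each agent only has to solve a small, simple problem, and the hard part of balancing competing goals is pushed into the pooling step. Because every agent’s estimates can be printed separately, the structure also makes it easier to see which element of the game drove a given move.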
For those following along at home, it’s still too early to cross Atari games off your list of things humans can beat computers at. Maluuba’s modified reinforcement learning method isn’t expected to work so dramatically on other titles that are difficult for machines, such as the platformer Montezuma’s Revenge, in which players explore an underground pyramid. That game and some other hard ones require players to make longer-term plans, which aren’t easily discovered by trial-and-error experimentation.
Maluuba’s new trick would also require some adaptation before it could be used on other games (or tasks). A human has to decide how to divide a particular problem among the multiple agents that will work on it. And to take on Ms. Pac-Man, the software was given a feed of data describing the positions of the ghosts and other items on the screen. By contrast, DeepMind’s Atari-playing software needs to look only at the pixels on the game’s screen, more like a human player.
Silvia Ferrari, director of Duke University’s Laboratory for Intelligent Systems and Controls, says that could make Maluuba’s approach difficult to apply to real world problems. (In January her lab claimed its Ms. Pac-Man bot had set a new record for a non-human, scoring 43,720.) One of the main motivations for work on machine learning is that it can let computers figure out how to tackle a new problem with minimal, or zero, adjustment.
Harm van Seijen, a research scientist at Maluuba, counters that needing to adapt the system somewhat to the problem at hand could be a positive. One drawback of having software learn complex tasks all by itself is that it can later be difficult to figure out why it behaves a particular way, which is a big deal if it’s in charge of something like driving safely or deciding who gets a loan.
Van Seijen says a system made up of smaller components that can be inspected individually can be more transparent. “It can give you more insight and control into how the decision is made,” he says. If Maluuba’s Ms. Pac-Man bot does get reincarnated as a smarter version of the notorious Clippy, it shouldn’t be able to keep any secrets.