
DeepNash learns to play Stratego from scratch by combining game theory and model-free deep RL

Game-playing artificial intelligence (AI) systems have advanced to a new frontier. Stratego, the classic board game that’s more complex than chess and Go, and craftier than poker, has now been mastered. Published in Science, we present DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself.

DeepNash uses a novel approach, based on game theory and model-free deep reinforcement learning. Its play style converges to a Nash equilibrium, which means its play is very hard for an opponent to exploit. So hard, in fact, that DeepNash has reached an all-time top-three ranking among human experts on the world’s biggest online Stratego platform, Gravon.

Board games have historically been a measure of progress in the field of AI, allowing us to study how humans and machines develop and execute strategies in a controlled environment. Unlike chess and Go, Stratego is a game of imperfect information: players cannot directly observe the identities of their opponent's pieces.

This complexity has meant that other AI-based Stratego systems have struggled to get beyond amateur level. It also means that a very successful AI technique called “game tree search”, previously used to master many games of perfect information, is not sufficiently scalable for Stratego. For this reason, DeepNash goes far beyond game tree search altogether.

The value of mastering Stratego goes beyond gaming. In pursuit of our mission of solving intelligence to advance science and benefit humanity, we need to build advanced AI systems that can operate in complex, real-world situations with limited information of other agents and people. Our paper shows how DeepNash can be applied in situations of uncertainty and successfully balance outcomes to help solve complex problems.
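To make the Nash equilibrium idea above concrete, here is a minimal sketch that solves a toy two-by-two zero-sum “hide and probe” game in closed form. The payoff numbers and the 2x2 formula are illustrative assumptions; DeepNash does not use a formula like this, but instead approaches an approximate equilibrium in Stratego through model-free reinforcement learning.

```python
# Toy zero-sum game: a hider conceals a valuable piece on the Left or Right
# flank, and a seeker attacks one flank. Payoffs are to the seeker (the hider
# receives the negative). The numbers are made up for illustration.
#
#                  hide Left   hide Right
#   attack Left        +2          -1
#   attack Right       -1          +1

def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy Nash equilibrium of [[a, b], [c, d]] (row maximises),
    assuming the game has no pure-strategy saddle point."""
    den = a - b - c + d
    p = (d - c) / den              # probability the row player picks row 1
    q = (d - b) / den              # probability the column player picks column 1
    value = (a * d - b * c) / den  # expected payoff to the row player
    return p, q, value

a, b, c, d = 2, -1, -1, 1
p, q, value = solve_2x2_zero_sum(a, b, c, d)
print(f"seeker attacks Left with prob {p:.2f}, hider hides Left with prob {q:.2f}")
print(f"game value to the seeker: {value:+.2f}")

# Why the equilibrium mix is hard to exploit: against the hider's mix, every
# pure attack earns the seeker exactly the game value, so no deviation helps.
attack_left = q * a + (1 - q) * b
attack_right = q * c + (1 - q) * d
print(f"payoff of always attacking Left:  {attack_left:+.2f}")
print(f"payoff of always attacking Right: {attack_right:+.2f}")
```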

Stratego is a turn-based, capture-the-flag game. It’s a game of bluff and tactics, of information gathering and subtle manoeuvring. And it’s a zero-sum game, so any gain by one player represents a loss of the same magnitude for their opponent.

Stratego is challenging for AI, in part, because it’s a game of imperfect information. Both players start by arranging their 40 playing pieces in whatever starting formation they like, initially hidden from one another as the game begins. Since both players don't have access to the same knowledge, they need to balance all possible outcomes when making a decision – providing a challenging benchmark for studying strategic interactions.

Each piece has a type and a ranking, which determine what happens in battles: higher-ranking pieces win, except that the 10 (Marshal) loses when attacked by a Spy, and Bombs always win except when captured by a Miner. In a typical starting formation, the Flag is tucked away safely at the back, flanked by protective Bombs, and the two pale blue areas of the board are “lakes” that are never entered. A game in play might see, for example, Blue’s Spy capturing Red’s 10.
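As a concrete reading of the battle rule just described, here is a small sketch. The rank encodings and the “equal ranks remove both pieces” tie rule are assumptions drawn from the standard rules rather than something stated above.

```python
# Minimal battle resolution following the rule described above: the higher
# rank wins, except that a Spy defeats the 10 (Marshal) when it attacks, and
# a Bomb defeats every attacker except a Miner. Ranks are encoded 1-10
# (Spy = 1, Miner = 3, Marshal = 10); Bomb and Flag are special pieces.
SPY, MINER, MARSHAL = 1, 3, 10
BOMB, FLAG = "Bomb", "Flag"

def resolve_attack(attacker, defender):
    """Return which piece is removed from the board: 'attacker', 'defender', or 'both'."""
    if defender == FLAG:
        return "defender"                      # capturing the Flag wins the game
    if defender == BOMB:
        return "defender" if attacker == MINER else "attacker"
    if attacker == SPY and defender == MARSHAL:
        return "defender"                      # the Spy's special capture of the 10
    if attacker == defender:
        return "both"                          # equal ranks: both removed (standard rule, assumed)
    return "defender" if attacker > defender else "attacker"

assert resolve_attack(SPY, MARSHAL) == "defender"   # a Spy attacking the Marshal wins
assert resolve_attack(MARSHAL, SPY) == "defender"   # but the Marshal wins if it attacks the Spy
assert resolve_attack(MARSHAL, BOMB) == "attacker"  # Bombs beat every attacker...
assert resolve_attack(MINER, BOMB) == "defender"    # ...except a Miner
```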

The identity of an opponent's piece is typically revealed only when it meets the other player on the battlefield. This is in stark contrast to games of perfect information such as chess or Go, in which the location and identity of every piece is known to both players.

The machine learning approaches that work so well on perfect information games, such as DeepMind’s AlphaZero, are not easily transferred to Stratego. The need to make decisions with imperfect information, and the potential to bluff, makes Stratego more akin to Texas hold’em poker and requires a human-like capacity once noted by the American writer Jack London: “Life is not always a matter of holding good cards, but sometimes, playing a poor hand well.”
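One way to see what it means for identities to be revealed only in battle is to track what an unidentified opponent piece could still be after it survives an attack. The sketch below is purely illustrative: the standard piece counts, the helper names, and the uniform-over-pieces belief are assumptions, and this is not how DeepNash represents the game.

```python
# Standard Stratego piece counts per player: ranks 1 (Spy) to 10 (Marshal),
# plus Bombs and the Flag. These counts are the classic ones (an assumption
# of this sketch, not stated in the post).
PIECE_COUNTS = {1: 1, 2: 8, 3: 5, 4: 4, 5: 4, 6: 4, 7: 3, 8: 2, 9: 1, 10: 1,
                "Bomb": 6, "Flag": 1}

def survives_attack(identity, attacker_rank):
    """Would a defender of this identity win against an attacker of attacker_rank?"""
    if identity == "Flag":
        return False                   # the Flag never wins a battle
    if identity == "Bomb":
        return attacker_rank != 3      # only a Miner (rank 3) defeats a Bomb
    if identity == 10 and attacker_rank == 1:
        return False                   # a Spy captures the Marshal
    return identity > attacker_rank

def belief_after_surviving(attacker_rank, counts=PIECE_COUNTS):
    """Belief over a hidden defender's identity after it wins a battle against
    one of our pieces of attacker_rank, assuming a uniform prior over pieces."""
    consistent = {k: n for k, n in counts.items() if survives_attack(k, attacker_rank)}
    total = sum(consistent.values())
    return {k: n / total for k, n in consistent.items()}

# If a hidden piece just defeated our 7, it must be an 8, 9, 10 or a Bomb.
print(belief_after_surviving(7))
```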

The AI techniques that work so well in games like Texas hold’em don’t transfer to Stratego, however, because of the sheer length of the game – often hundreds of moves before a player wins. Reasoning in Stratego must be done over a large number of sequential actions, with no obvious insight into how each action contributes to the final outcome.

Finally, the number of possible game states (expressed as “game tree complexity”) is off the chart compared with chess, Go and poker, making it incredibly difficult to solve.
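For a rough sense of scale, game tree complexity is often estimated as (average branching factor) raised to (typical game length). The snippet below uses coarse, commonly cited approximations, and the Stratego inputs in particular are assumptions chosen only to illustrate the order of magnitude.

```python
import math

def log10_game_tree_size(branching_factor, game_length_plies):
    """log10 of branching_factor ** game_length_plies, a standard
    back-of-the-envelope estimate of game tree complexity."""
    return game_length_plies * math.log10(branching_factor)

# Rough, commonly cited inputs; the Stratego values are illustrative
# assumptions, not measurements.
games = {
    "chess":    (35, 80),    # ~35 legal moves per position, ~80 plies per game
    "Go":       (250, 150),  # ~250 legal moves, ~150 moves per game
    "Stratego": (22, 400),   # ~20-25 legal moves, but hundreds of moves per game
}
for name, (branching, length) in games.items():
    print(f"{name:9s} ~ 10^{log10_game_tree_size(branching, length):.0f} possible games")
```

Even with crude inputs like these, the exponent makes clear why exhaustively searching the Stratego game tree is out of reach.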
