DeepMind: An algorithm that learns to play Stratego like a human expert | Technology

British company DeepMind, owned by Google since 2014, has developed an algorithm capable of playing Stratego, a popular board game, like an expert human. As a team of company researchers explains in an article published today in the journal Science, DeepNash (as the tool is called) was ranked among the top three players on Gravon, an online gaming platform where this game is played. This is a milestone given the sheer complexity of the game, which combines strategy, intuition (players don’t have all the information needed to make perfect plans) and even deception. The study’s authors believe the algorithm could have applications in areas such as automated traffic optimization.

Marketed by Jumbo since the 1960s, although it was invented before World War I, Stratego was one of the few popular board games that AI had not yet mastered. The game poses a double challenge: it requires long-term strategic thinking, as in chess, but it also forces players to manage imperfect information, as in poker, because the opponent’s pieces start out hidden and are only revealed as the game progresses. This combination makes it more complex than Go, the thousand-year-old Asian game whose board allows stones to be arranged in more configurations than there are atoms in the universe. It also means that winning requires more cunning than poker, in which you don’t know your opponent’s cards either and need both intuition and mathematical skill.

Historically, games have been a good thermometer for measuring the capabilities of computer programs. They provide a strictly rule-governed environment in which these tools can develop their abilities and where success is easy to measure: just see whether they win. That makes them an ideal test bed for studying how humans and machines devise and carry out winning strategies. DeepMind therefore set its sights on Stratego, which is quite a challenge for a machine because of the information it lacks during play.

In Stratego there are 12 types of pieces with different attributes. Each player places 40 pieces on the board without knowing how the opponent has arranged theirs. (DeepMind)

DeepMind has a long track record in this field, having developed sophisticated tools that outperform humans in complex long-term strategy games with perfect information, such as Go (with AlphaGo), and in video games with imperfect information, such as StarCraft (with AlphaStar). Until now, no one had managed to build a tool capable of playing Stratego at the level of an expert human. That is no accident: the game has 10⁵³⁵ possible states, far more than Texas hold’em poker, a thoroughly studied game of incomplete information (each player only knows the cards in their own hand and those showing on the table) with 10¹⁶⁴ states, or Go, the ancient Asian game, with 10³⁶⁰.

In addition, the very first move requires considering 10⁶⁶ possible pairs of piece setups; in poker the figure is 10⁶. This problem does not arise in perfect-information games, because all the pieces are in plain view.
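The size of this hidden-information space can be estimated with a little combinatorics. Because pieces of the same type are interchangeable, the number of distinct setups for one player is a multinomial coefficient, while poker’s private information is just the two hole cards dealt to each player. The Python sketch below assumes the piece counts of the classic 40-piece Stratego army (standard rules, not listed in the article) and two-player Texas hold’em; `PIECE_COUNTS`, `stratego_setups` and `holdem_private_pairs` are illustrative names.

```python
from math import comb, factorial, log10, prod

# Classic 40-piece Stratego army for one player (standard rules; an assumption,
# since the article only says there are 12 piece types and 40 pieces).
PIECE_COUNTS = {
    "Marshal": 1, "General": 1, "Colonel": 2, "Major": 3,
    "Captain": 4, "Lieutenant": 4, "Sergeant": 4, "Miner": 5,
    "Scout": 8, "Spy": 1, "Bomb": 6, "Flag": 1,
}


def stratego_setups(counts: dict[str, int]) -> int:
    """Distinct ways to place one army on its home squares.

    Identical pieces are interchangeable, so this is the multinomial
    coefficient total! / (n_1! * n_2! * ... * n_k!).
    """
    total = sum(counts.values())
    return factorial(total) // prod(factorial(n) for n in counts.values())


def holdem_private_pairs() -> int:
    """Ordered pairs of two-card hands dealt to two players from one deck."""
    return comb(52, 2) * comb(50, 2)


one_side = stratego_setups(PIECE_COUNTS)
both_sides = one_side ** 2  # each player's setup is hidden from the other

print(f"Stratego setups per player: ~10^{log10(one_side):.1f}")
print(f"Stratego pairs of setups:   ~10^{log10(both_sides):.1f}")
print(f"Hold'em pairs of hands:     {holdem_private_pairs():,}")
```

Running it gives about 10³³ setups per player, so roughly 10⁶⁶ pairs of setups, against 1,624,350 (about 10⁶) pairs of hands in two-player Texas hold’em.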

These two particular sources of complexity meant that the approaches used by earlier game-playing programs could not be carried over to Stratego. The DeepMind team therefore developed a reinforcement learning algorithm built on theoretical models of the Nash equilibrium, the concept named after John Nash, the famous American mathematician who specialized in game theory. The tool does not try to predict the opponent’s possible moves, the usual approach in game-playing programs, because the probability tree of a game that has just started is practically endless. Instead, it devises its own strategy and then adapts it as the moves unfold.
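In a two-player zero-sum game, a Nash equilibrium is a pair of strategies from which neither player can gain by unilaterally switching, so an equilibrium strategy secures the game’s value whatever the opponent does; that is the intuition behind not trying to predict the opponent. A common way to measure how far a pair of mixed strategies is from equilibrium is its exploitability, or duality gap. The minimal Python sketch below computes it for rock-paper-scissors, a toy stand-in chosen for illustration rather than anything taken from the study; `exploitability` is a hypothetical helper.

```python
import numpy as np

# Row player's payoffs in rock-paper-scissors (rows/columns: rock, paper, scissors).
# The column player receives the negative, so the game is zero-sum.
A = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])


def exploitability(x: np.ndarray, y: np.ndarray) -> float:
    """Duality gap of mixed strategies x (row player) and y (column player).

    max_i (A y)_i: the most the row player could earn by best-responding to y.
    min_j (x A)_j: the least the column player could concede by best-responding to x.
    The gap is zero exactly when (x, y) is a Nash equilibrium.
    """
    return float((A @ y).max() - (x @ A).min())


uniform = np.full(3, 1 / 3)
always_rock = np.array([1.0, 0.0, 0.0])

print(exploitability(uniform, uniform))          # 0.0
print(exploitability(always_rock, always_rock))  # 2.0
print(exploitability(always_rock, uniform))      # 1.0
```

The uniform mix cannot be exploited, whereas a player who always picks rock can be beaten every time by an opponent who switches to paper.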


“Our paper shows how DeepNash can be applied in situations of uncertainty and successfully balance its actions to help solve complex problems,” explains Julien Perolat, lead author of the study. He and his colleagues believe that R-NaD, the algorithm behind DeepNash, could help in developing new AI applications that involve interacting with many humans who have different goals, which means the system will lack some of the information.
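R-NaD stands for Regularized Nash Dynamics. Roughly, the idea is to make learning converge by penalizing policies for straying from a reference policy, letting the learning dynamics settle at the fixed point of that regularized game, then adopting that fixed point as the new reference and repeating. The Python sketch below is a toy illustration of that loop on rock-paper-scissors, not DeepMind’s implementation: the mirror-descent update, the names `regularized_step` and `regularized_dynamics`, and the parameters `alpha`, `eta` and the iteration counts are all assumptions chosen for the example. DeepNash itself trains deep neural networks on full games of Stratego.

```python
import numpy as np

# Rock-paper-scissors, row player's payoffs; the column player gets the negative.
A = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])


def exploitability(x: np.ndarray, y: np.ndarray) -> float:
    """Duality gap of (x, y); zero exactly at a Nash equilibrium of this zero-sum game."""
    return float((A @ y).max() - (x @ A).min())


def regularized_step(p, payoff, reference, eta, alpha):
    """One mirror-descent step on the game regularized toward `reference`.

    Closed-form solution over probability vectors q of
        argmax_q  eta * <payoff, q> - alpha * eta * KL(q || reference) - KL(q || p)
    """
    logits = (np.log(p) + alpha * eta * np.log(reference) + eta * payoff) / (1.0 + alpha * eta)
    logits -= logits.max()  # shift for numerical stability; the softmax is unchanged
    q = np.exp(logits)
    return q / q.sum()


def regularized_dynamics(x, y, alpha=1.0, eta=0.1, outer_iters=100, inner_iters=200):
    """Toy loop in the spirit of R-NaD (hypothetical update rule and parameters).

    Outer loop: freeze the current policies as reference policies.
    Inner loop: learn in the game regularized toward those references until the
    policies settle near its fixed point, which then seeds the next round.
    """
    for _ in range(outer_iters):
        ref_x, ref_y = x.copy(), y.copy()
        for _ in range(inner_iters):
            payoff_x = A @ y      # row player's expected payoff for each action
            payoff_y = -(x @ A)   # column player's expected payoff for each action
            # Simultaneous update: both right-hand sides use the old x and y.
            x, y = (regularized_step(x, payoff_x, ref_x, eta, alpha),
                    regularized_step(y, payoff_y, ref_y, eta, alpha))
    return x, y


x0 = np.array([0.6, 0.3, 0.1])  # arbitrary lopsided starting strategies
y0 = np.array([0.2, 0.2, 0.6])
x, y = regularized_dynamics(x0, y0)
print("row:", np.round(x, 3), "column:", np.round(y, 3))
print(f"exploitability: start {exploitability(x0, y0):.3f}, end {exploitability(x, y):.1e}")
```

Starting from lopsided strategies, both players drift toward the uniform mix (1/3, 1/3, 1/3), the unique Nash equilibrium of rock-paper-scissors, and exploitability falls from 0.9 to nearly zero. Without the regularization term, simultaneous updates of this kind tend to cycle around the equilibrium instead of settling on it, which is what the regularization is there to prevent.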

Managing large-scale traffic to cut travel times and the associated emissions appears to be a promising application, Perolat and colleagues write in the journal Science.

In this move, the machine fools the human player with a bluff made with a Scout and manages to track down the Spy, a key piece. (DeepMind)

How to play Stratego

Stratego is enjoying a second youth thanks to the internet. The now popular board game has made its way to platforms such as Gravon, where players from all over the world face off in tense online matches.

In Stratego, two players face off, each with 40 pieces of different ranks and abilities on their side of the board. The goal is to capture the opponent’s flag or to leave the opponent with no pieces that can move. To do so, players take turns moving their movable pieces, which come in ten types corresponding to military ranks and to specialists such as miners, scouts and spies. Whenever a piece attacks an opposing piece, both are revealed. The winner, whether by higher rank or by a special ability, stays on the board; the loser is removed from the game.
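The combat rule is easy to express in code. The Python sketch below follows the standard published rules, which the article only summarizes: equal ranks remove both pieces, the Spy defeats the Marshal only when the Spy attacks, any piece other than a Miner that attacks a Bomb is destroyed, and attacking the Flag wins the game. Ranks use the same modern numbering as the article (10 = Marshal, 2 = Scout, plus 3 = Miner and 1 = Spy); `resolve_attack` and `Outcome` are illustrative names.

```python
from enum import Enum
from typing import Union


class Outcome(Enum):
    ATTACKER_WINS = "attacker wins"
    DEFENDER_WINS = "defender wins"
    BOTH_REMOVED = "both removed"
    FLAG_CAPTURED = "flag captured"


# Modern rank numbering: 10 = Marshal ... 3 = Miner, 2 = Scout, 1 = Spy.
MARSHAL, MINER, SPY = 10, 3, 1
# Bombs and the flag never move, so they can only be defenders.
BOMB, FLAG = "B", "F"


def resolve_attack(attacker: int, defender: Union[int, str]) -> Outcome:
    """Resolve one attack under the standard Stratego combat rules."""
    if attacker in (BOMB, FLAG):
        raise ValueError("bombs and the flag cannot move, so they never attack")
    if defender == FLAG:
        return Outcome.FLAG_CAPTURED
    if defender == BOMB:
        # Only a Miner can defuse a bomb; every other attacker is destroyed.
        return Outcome.ATTACKER_WINS if attacker == MINER else Outcome.DEFENDER_WINS
    if attacker == SPY and defender == MARSHAL:
        # The Spy's special ability works only when it strikes first.
        return Outcome.ATTACKER_WINS
    if attacker == defender:
        return Outcome.BOTH_REMOVED
    return Outcome.ATTACKER_WINS if attacker > defender else Outcome.DEFENDER_WINS
```

For example, `resolve_attack(2, SPY)` returns `Outcome.ATTACKER_WINS` and `resolve_attack(SPY, 2)` returns `Outcome.DEFENDER_WINS`: a Scout beats a Spy whichever of the two attacks.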

DeepNash is capable of developing unpredictable strategies and of making rewarding moves in a seemingly random manner, all intended to confuse the opponent so that they cannot draw conclusions about the machine’s playing style. In one of the games analyzed in the article, for example, it sacrificed two important pieces to discover where the opponent’s highest-ranking pieces were. This left it at a material disadvantage, but the algorithm realized that knowing where the opponent’s best pieces were gave it a 70% chance of success. It went on to win that game. On another occasion, it bluffed by chasing a high-ranking piece with a very low-ranking one, convincing the opponent that it was playing its 10 (Marshal). The opponent ended up losing the Spy (S), a strategic piece, to DeepNash’s Scout (2).

“The level of DeepNash’s play surprised me. I’d never seen a machine capable of playing Stratego like an experienced human. After playing against DeepNash myself, I wasn’t surprised that it later went on to put itself in the top three of the Gravon rankings,” said Vincent de Boer, co-author of the Science article and former Stratego world champion. “I think it would do well if they let it play in the World Championships.”
