Three questions and answers: Is NetHack the perfect AI game?

NetHack is an extremely complex and unforgiving game: as soon as the game character dies, you have to start all over again. Whether you are a beginner or a very good player, the end of a run often comes as a surprise. To find out whether deep reinforcement learning has grown up to NetHack, AI researchers hosted the NetHack Challenge last year, pitting classic, hand-programmed bots and learned AI agents against the game. Tim Rocktäschel and Heinrich Küttler explain why the game is such a good benchmark for developing and testing AI.




Tim Rocktäschel is Associate Professor at the University College London (UCL) Artificial Intelligence Center and a Scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). His research group focuses on autonomous, self-motivated learning agents in complex environments.




Heinrich Küttler is part of the founding team of Inflection AI, following positions at Facebook AI Research, DeepMind, and Google, and a doctorate in mathematical physics at LMU Munich.

Why couldn’t any of the agents master NetHack and beat the game?

Tim: Artificial intelligence has made unimaginable advances over the past decade. Computer games such as StarCraft II, Dota 2 or Minecraft and board games such as chess, Go and Diplomacy have often served as milestones for testing intelligent behavior. NetHack is one of the hardest computer games in history. Not only does it cause headaches for human players and lead to frequent character deaths, but it also presents many challenges that current AI cannot yet solve. The game is highly complex and includes hundreds of items and monsters that a successful player needs to know about. Unlike chess and Go, it is only partially observable – similar to poker, players have to form beliefs about what is probably true. NetHack is extremely long: a successful game can easily contain 100,000 actions. The game is stochastic: as in Dungeons & Dragons and many other RPGs, the outcome of actions is often left to chance. And finally, it’s procedurally generated – each game leads to different, new situations that the player has to adapt to.

Although NetHack is visually simple, we believe that AI methods that can learn NetHack will also be relevant to real-world problems. For example, it is an open problem to develop AI methods that can handle unforeseen situations in a robust manner. The fact that neither programmed bots nor trained deep reinforcement learning agents have managed to get particularly far in this game shows us that there is still a lot to explore in AI and that NetHack will remain an exciting milestone for the foreseeable future.

The fallback goal in the Challenge, short of beating the game, was scoring points. So, were the winners of the competition specifically geared towards optimizing for a high score?

Tim: Yes, and that caused problems. Collecting points leads to local optima in which AI methods can get trapped. At the end of the day, however, NetHack isn’t about points, it’s about achieving the goal of the game. There are some professional players who try to reach that goal with a minimal score – which requires extraordinary skill. As soon as you give an AI points as a reward, it tries to maximize those points. But what if it makes sense to prioritize actions that temporarily do not lead to an increase in points? A classic example in NetHack is that good players carefully consider whether they really want to kill every monster along the way. More experience points and a higher player level mean that the game sends ever stronger monsters towards the character – bad if you haven’t yet found the right gear to face these new threats.
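The tension between score and real progress can be made concrete with a toy shaped reward. The function below is purely illustrative – the feature names and weights are assumptions for the sketch, not anything used in the Challenge – but it shows the idea of discounting experience gains (which attract stronger monsters) while rewarding descent into the dungeon:

```python
def shaped_reward(score_delta: int, xp_delta: int, depth_delta: int,
                  xp_penalty: float = 0.5, depth_bonus: float = 10.0) -> float:
    """Illustrative shaped reward: value raw score, but discount
    experience gains and explicitly reward descending a dungeon level.
    All weights are hypothetical."""
    return score_delta - xp_penalty * xp_delta + depth_bonus * depth_delta

# Killing a weak monster on the way: +10 score, +5 experience, no descent.
r_kill = shaped_reward(score_delta=10, xp_delta=5, depth_delta=0)

# Sneaking past the monster and taking the stairs down instead.
r_descend = shaped_reward(score_delta=0, xp_delta=0, depth_delta=1)

assert r_descend > r_kill  # the shaped reward prefers descending
```

A pure score-maximizing agent would take the kill every time; with the (hand-tuned, and therefore itself fallible) shaping above, avoiding the fight scores higher.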

What better solution is there when reward-oriented machine learning is problematic in such cases?

Heinrich: Inherently curious agents could be a solution. Under this model, the agent simply wants to experience something new and unpredictable. One problem with this approach is the “noisy TV problem”: should the agent encounter the white noise of a television, it might find that it cannot predict the precise movement of the black and white dots and therefore stay stuck in front of this noise forever. Incidentally, neuroscientists who want to research how the human brain works face a similar problem. Karl Friston, for example, asks why people don’t just sit in a dark room, where future impressions are easiest to predict – the room stays dark.

The problem with rewards is that someone has to define them, and often what you actually want is not what you say. It is like the grand vizier and the genie in Aladdin: Aladdin tricks the grand vizier Jafar into wishing to become a djinni, since a djinni is even more powerful than he is. But when the genie grants the wish, Jafar ends up trapped in a magic lamp – because even the greatest djinni is bound to a lamp that makes him a servant. This is how classic fairy tales are reflected in modern AI research: people are still bad at really saying, or programming, what they want.

Tim and Heinrich, thank you very much for your answers. Most recently, OpenAI’s Video PreTraining method for Minecraft, which trained an AI on thousands of hours of human gameplay, made headlines. More on last year’s NetHack Challenge can be found on the competition website; the organizers also recorded their findings from the challenge in a paper.

In the “Three Questions and Answers” series, iX wants to get to the heart of today’s IT challenges – whether it’s the user’s point of view in front of the PC, the manager’s point of view or the everyday life of an administrator. Do you have suggestions from your daily practice or that of your users? Whose tips on which topic would you like to read in a nutshell? Then please write to us or leave a comment in the forum.

