Neural network learns Minecraft with YouTube videos

Artificial intelligence playing games is nothing new: chess engines in particular are well known, and AI has also proven successful in StarCraft 2. Now OpenAI has, for the first time, trained an AI to craft a diamond pickaxe in Minecraft, as SingularityHub first reported.

The first AI-generated diamond pickaxe

The San Francisco AI company is not taking the first AI steps in the world of blocks, but previous attempts by other developers have not gotten very far. The difficulty lies in the non-linear gameplay within a procedurally generated world without clear goals – very different from a chessboard or a StarCraft map. In a blog post, the nine-person development team now describes how a neural network was trained on around 70,000 hours of Minecraft videos from YouTube. Many short clips show the AI in action – climbing around, hunting for food, smelting ores or crafting items at the crafting table.

What makes this notable: all of it happens in survival mode, whereas earlier Minecraft AIs often ran in creative mode or worked in specially adapted, simplified input environments. One example is the MineRL AI competition, in which none of the 660 participating teams got as far as mining diamonds in 2019.

To the best of our knowledge, there is no published work that operates in the full, unmodified human action space, which includes drag-and-drop inventory management and item crafting.


OpenAI, meanwhile, simulates classic mouse and keyboard inputs just like a real gamer would. Real gamers would not appreciate the frame rate, however: the algorithm runs at only 20 FPS in order to get by with less computing power.
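The human-like interface described above can be sketched in a few lines. This is a toy illustration, not OpenAI's code: the action structure and the `rollout` helper are made up, but the 20 FPS tick rate matches what the article states.

```python
# Minimal sketch (not OpenAI's code): an agent acting through a
# human-like mouse+keyboard interface at a fixed 20 FPS tick rate.
from dataclasses import dataclass, field

FPS = 20                 # the article states the agent runs at only 20 FPS
FRAME_TIME = 1.0 / FPS   # 50 ms of game time per decision

@dataclass
class Action:
    """One tick of simulated human input."""
    keys: set = field(default_factory=set)     # e.g. {"w", "space"}
    mouse_dx: float = 0.0                      # camera yaw delta (pixels)
    mouse_dy: float = 0.0                      # camera pitch delta (pixels)
    buttons: set = field(default_factory=set)  # e.g. {"left"} to mine

def rollout(policy, seconds: float):
    """Query the policy once per frame for `seconds` of game time."""
    actions = []
    for frame in range(int(seconds * FPS)):
        t = frame * FRAME_TIME
        actions.append(policy(t))
    return actions

# A toy policy: hold W and pan the camera slowly to the right.
walk_forward = lambda t: Action(keys={"w"}, mouse_dx=2.0)
trace = rollout(walk_forward, seconds=1.0)
print(len(trace))  # 20 actions for one second of gameplay
```

Halving the tick rate relative to a 60 FPS human player cuts the number of forward passes per game-minute accordingly, which is the computing-power saving the article refers to.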

Video PreTraining keeps Nvidia Volta and Ampere busy for days

The paper on the project is freely accessible, as is the code via GitHub. The researchers describe the development of their artificial intelligence in detail and call their methodology Video PreTraining. First, around 2,000 hours of Minecraft video were manually annotated with the actions the player performs at any given moment – i.e. which keys they press, how they move the mouse and what they actually do in the game. This data trained an inverse dynamics model, which could then produce corresponding labels for a further 70,000 hours of Minecraft videos.
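The two-stage pipeline described above can be sketched as a toy example. The function names and the "majority action" stand-in model below are hypothetical and for illustration only; the real IDM is a large neural network:

```python
# Toy sketch of the VPT labeling pipeline (names are hypothetical):
# 1) train an inverse dynamics model on a small hand-labeled set,
# 2) use it to pseudo-label a much larger pile of raw video.
from collections import Counter

def train_idm(labeled_clips):
    """Fit a stand-in model on the hand-labeled (~2,000 h) data.

    Here we just 'learn' the majority action; the real IDM predicts
    keypresses and mouse movements per video frame.
    """
    counts = Counter(action for _, action in labeled_clips)
    majority = counts.most_common(1)[0][0]
    return lambda frame: majority

def pseudo_label(idm, unlabeled_frames):
    """Label the raw (~70,000 h) YouTube footage with the IDM."""
    return [(f, idm(f)) for f in unlabeled_frames]

labeled = [("clip_a", "mine"), ("clip_b", "mine"), ("clip_c", "walk")]
idm = train_idm(labeled)
data = pseudo_label(idm, ["yt_frame_1", "yt_frame_2"])
print(data)
```

The key economic point of the pipeline is that only the small dataset needs expensive human annotation; the IDM then scales those labels to the much larger corpus automatically.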

Overview of the Video PreTraining methodology (Image: OpenAI)

In essence, the researchers first built a model that tries to recognize which player inputs led to the sequences shown in the game footage fed to it. Individual actions were isolated, and the gameplay both before and after each action was taken into account. This is much easier for the AI than classifying behavior from the past alone – the model does not first have to predict what the player in the video is actually planning to do, the researchers explain.
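The difference between the two viewpoints – the IDM seeing frames before and after an action versus a policy seeing only the past – can be made concrete with a small sketch (illustrative only; window size `k` is a made-up parameter):

```python
# Sketch of the inverse-dynamics idea: the IDM may look at frames
# BEFORE and AFTER step t, while a causal policy sees only the past.
def idm_context(frames, t, k=2):
    """Non-causal window: past and future frames around step t."""
    return frames[max(0, t - k): t + k + 1]

def policy_context(frames, t, k=2):
    """Causal window: only frames up to and including step t."""
    return frames[max(0, t - k): t + 1]

frames = list(range(10))          # stand-ins for video frames 0..9
print(idm_context(frames, 5))     # sees the future as well
print(policy_context(frames, 5))  # past only
```

Inferring the action at step `t` is easier with the future in view: if the next frames show a block breaking, the mouse button was almost certainly held – no guess about the player's intent is needed.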

In order to utilize the wealth of unlabeled video data available on the internet, we introduce a novel, yet simple, semi-supervised imitation learning method: Video PreTraining (VPT). We start by gathering a small dataset from contractors where we record not only their video, but also the actions they took, which in our case are keypresses and mouse movements. With this data we train an inverse dynamics model (IDM), which predicts the action being taken at each step in the video. […]

We chose to validate our method in Minecraft because it first is one of the most actively played video games in the world and thus has a wealth of freely available video data and second is open-ended with a wide variety of things to do, similar to real-world applications such as computer usage. […]


A total of 32 Nvidia A100 graphics accelerators were used to learn which inputs belong to which actions – even so, the Ampere GPUs needed around four days to view and process all the material. Automatically labeling the 70,000 hours of Minecraft videos took even longer, around nine days, even though a full 720 Tesla V100 (Volta) GPUs were used.

Behavioral cloning yields the first stone pickaxe

The result: the neural network manages to walk around in Minecraft, chop down trees, process the wood and even build a crafting table. But that did not go far enough for the OpenAI developers; using behavioral cloning, the AI was to be optimized further. Fine-tuning meant feeding the algorithm more specific gameplay in order to train it for a particular behavior – for example building a small house, or the typical course of the first ten minutes of a game. A100 graphics cards were used again, but this time only 16 of them; the computation took about two days.
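Behavioral cloning boils down to maximizing the likelihood of the demonstrated actions. A minimal version of that objective, sketched in plain Python (hypothetical; the real VPT model is a large neural network operating on video frames):

```python
# Minimal behavioral-cloning objective: mean negative log-likelihood
# of the expert's actions under the policy's action distribution.
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def bc_loss(logits_per_step, expert_actions):
    """Average -log p(expert action) across timesteps."""
    total = 0.0
    for logits, a in zip(logits_per_step, expert_actions):
        probs = softmax(logits)
        total += -math.log(probs[a])
    return total / len(expert_actions)

# Two timesteps, three possible actions; the expert chose 0, then 2.
logits = [[2.0, 0.1, 0.1], [0.0, 0.0, 3.0]]
print(round(bc_loss(logits, [0, 2]), 3))
```

Fine-tuning on house-building or early-game footage simply means minimizing this same loss on that narrower slice of demonstrations, which nudges the pretrained policy toward the desired behavior.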

Impact of behavioral cloning fine-tuning (Image: OpenAI)
The skills of the AI increase with the number of training hours (Image: OpenAI)

And indeed, the researchers observed significant improvements in their AI. Felling the first tree, for example, took only about an eighth of the time. The efficiency of crafting wooden planks increased by a factor of 59; for building a crafting table it was even a factor of 215. In addition, the neural network now made wooden tools, mined stone and then crafted a stone pickaxe. For this, the algorithm was fed around 10,000 hours of video – for the wooden tools, by contrast, 100 hours of training had sufficed.

Reaching the goal with reinforcement learning

As the next and final step, the OpenAI team chose further fine-tuning with reinforcement learning. The training was now even more specific, and the AI was given a clearly defined goal: craft a diamond pickaxe. This entailed finding, mining and smelting iron ore, crafting an iron pickaxe and ultimately searching for diamonds. Thanks to the targeted training, the neural network managed all of this even faster than the average Minecraft player: the diamond pickaxe was ready after around 20 minutes.

The diamond pickaxe was only possible with fine-tuning and reinforcement learning (Image: OpenAI)

One difficulty was the intermediate goals: if the AI was tasked with searching for iron once it had crafted a stone pickaxe, it dug straight down after crafting and left the crafting table behind – which made producing the iron pickaxe later take longer. Another intermediate goal therefore became necessary: breaking down and picking up the crafting table. In addition, the various goals had to be weighted differently.
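The weighted intermediate goals described above amount to a shaped reward function. The sketch below is illustrative only – the milestone weights are made up, not OpenAI's values – but it shows the idea of rewarding each sub-goal once, with later-game milestones weighted more heavily:

```python
# Illustrative reward shaping for the diamond-pickaxe curriculum.
# Weights are invented for this sketch; OpenAI's values differ.
MILESTONES = {
    "log": 1, "planks": 2, "crafting_table": 4, "stone_pickaxe": 8,
    "pick_up_crafting_table": 8,   # the extra sub-goal described above
    "iron_ore": 16, "iron_ingot": 32, "iron_pickaxe": 64,
    "diamond": 128, "diamond_pickaxe": 256,
}

def episode_reward(events):
    """Reward each milestone once, weighted by how late-game it is."""
    seen, total = set(), 0
    for e in events:
        if e in MILESTONES and e not in seen:
            seen.add(e)
            total += MILESTONES[e]
    return total

run = ["log", "planks", "crafting_table", "stone_pickaxe",
       "pick_up_crafting_table", "iron_ore", "iron_ore"]
print(episode_reward(run))  # the repeated iron_ore counts only once
```

Without the `pick_up_crafting_table` milestone, an agent maximizing this reward has no incentive to retrieve the table before descending – exactly the failure mode the researchers observed.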

Once again, working through the necessary gameplay steps required a lot of computing power: 56,719 CPU cores and 80 GPUs were busy with machine learning for around six days. The algorithm went through 4,000 successively optimized iterations, reading and processing a total of 16.8 billion frames.

Further application in and outside of Minecraft conceivable

The nine researchers see great potential in the newly tested Video PreTraining methodology. The concept can easily be transferred to other scenarios, they say, and applications outside of video games are also conceivable – an advantage being that the AI was designed to handle generic mouse and keyboard inputs. The 2022 MineRL competition, which starts on July 1, will also serve the neural network's further development: participants can use the OpenAI model to train the AI for other specific goals – perhaps soon yielding the first netherite pickaxe.

VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet. Compared to generative video modeling or contrastive methods that would only yield representational priors, VPT offers the exciting possibility of directly learning large scale behavioral priors in more domains than just language. While we only experiment in Minecraft, the game is very open-ended and the native human interface (mouse and keyboard) is very generic, so we believe our results bode well for other similar domains, eg computer usage.

We are also open sourcing our contractor data, Minecraft environment, model code, and model weights, which we hope will aid future research into VPT. Furthermore, we have partnered with the MineRL NeurIPS competition this year. Contestants can use and fine-tune our models to try to solve many difficult tasks in Minecraft.

