Apple GAUDI: AI system turns sentences into 3D scenes



Apple has presented GAUDI, an AI system that can generate 3D interior scenes and could provide the basis for a new generation of generative AI.

So-called neural rendering brings artificial intelligence into computer graphics: AI researchers at Nvidia, for example, have shown how 3D objects can be created from photos, and Google relies on Neural Radiance Fields (NeRFs) for Immersive View and is developing NeRFs for rendering people.

So far, NeRFs have mainly been used as a kind of neural storage medium for 3D models and 3D scenes, which can then be rendered from different camera perspectives. This is how the frequently shown camera flights through a room or around an object are created. There are also early experiments with NeRFs for virtual reality experiences.

NeRFs could become the next level of generative artificial intelligence

But what if the ability to render photorealistic views from different angles could be used for generative AI? AI systems such as OpenAI’s DALL-E 2 or Google’s Imagen and Parti show the potential of controllable generative artificial intelligence for images and graphics.

A first glimpse came in late 2021 with Google’s Dream Fields, an AI system that combines the ability of NeRFs to generate 3D views with the ability of OpenAI’s CLIP to rate the content of images. The result: Dream Fields generates NeRFs that match text descriptions.

Now Apple’s AI team is showing the generative AI system GAUDI, a “neural architect for immersive 3D scene generation”.

Apple GAUDI is a specialist for interiors

While Google’s Dream Fields, for example, focuses on generating individual objects, extending generative AI to completely unconstrained 3D scenes remains an unsolved problem.

One reason is the restriction on possible camera positions: for a single object, every sensible camera position can be mapped onto a dome around it, whereas in 3D scenes the sensible camera positions are restricted by obstacles such as objects and walls. If these are not taken into account during generation, no usable 3D scenes are created.
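The contrast between the two cases can be illustrated with a minimal sketch. It is not GAUDI's method: the room dimensions, the box-shaped obstacle, and all function names are hypothetical, and simple rejection sampling stands in for the learned camera prediction the article describes next.

```python
import math
import random


def dome_camera(radius, azimuth, elevation):
    """Camera position on a dome of the given radius around an object at the origin."""
    x = radius * math.cos(elevation) * math.cos(azimuth)
    y = radius * math.cos(elevation) * math.sin(azimuth)
    z = radius * math.sin(elevation)
    return (x, y, z)


def inside_box(point, box):
    """True if the point lies inside the axis-aligned box given as (low, high) corners."""
    low, high = box
    return all(l <= c <= h for c, l, h in zip(point, low, high))


# Hypothetical scene: a 4 m x 4 m x 2.5 m room with one table-like obstacle.
ROOM = ((0.0, 0.0, 0.0), (4.0, 4.0, 2.5))
OBSTACLES = [((1.0, 1.0, 0.0), (2.0, 2.0, 1.0))]


def is_valid_scene_camera(point):
    """A scene camera must be inside the room and outside every obstacle."""
    if not inside_box(point, ROOM):
        return False
    return not any(inside_box(point, box) for box in OBSTACLES)


def sample_valid_cameras(count, seed=0):
    """Rejection sampling: draw positions in the room, keep only the valid ones."""
    rng = random.Random(seed)
    cameras = []
    while len(cameras) < count:
        candidate = tuple(rng.uniform(l, h) for l, h in zip(*ROOM))
        if is_valid_scene_camera(candidate):
            cameras.append(candidate)
    return cameras
```

For a single object, any azimuth and elevation passed to `dome_camera` gives a usable viewpoint. In the room, a candidate position has to pass `is_valid_scene_camera` first, which is the kind of constraint a scene generator has to respect.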

Apple’s GAUDI model addresses this problem with three specialized networks: a camera pose decoder predicts possible camera positions, ensuring that the output is a position that is valid for the layout of the 3D scene.

A second decoder for the scene predicts a tri-plane representation, providing a sort of 3D canvas on which the radiance field decoder draws the resulting image using the volumetric rendering equation.
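The two ideas in this step can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Apple's implementation: the function names, the nearest-neighbour lookup, and the summing of plane features are choices made here for brevity, and the neural network that would turn features into density and color is omitted. The compositing loop follows the standard discretized volume rendering used for NeRFs.

```python
import math


def triplane_features(point, planes):
    """Look up features for a 3D point in [0, 1]^3 on three axis-aligned planes.

    `planes` maps "xy", "xz" and "yz" to square grids of feature vectors.
    The point is projected onto each plane and the three features are summed
    (an assumption for this sketch; other ways of combining them are possible).
    """
    x, y, z = point
    resolution = len(planes["xy"])

    def cell(value):
        # Nearest-neighbour lookup, clamped to the last cell.
        return min(int(value * resolution), resolution - 1)

    f_xy = planes["xy"][cell(x)][cell(y)]
    f_xz = planes["xz"][cell(x)][cell(z)]
    f_yz = planes["yz"][cell(y)][cell(z)]
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray, front to back.

    densities: volume density at each sample
    colors:    RGB color at each sample
    deltas:    distance covered by each sample along the ray
    Returns the pixel color and the transmittance left behind the last sample.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

In a full system, a network would take the features returned by `triplane_features` for each sample along a ray and predict the density and color that `render_ray` then composites into a pixel.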


In experiments with four different datasets, including ARKitScenes, a dataset of interior scans, the researchers show that GAUDI can reconstruct learned views while reaching the quality of existing approaches.


Apple also shows that GAUDI can generate new camera flights through 3D interior scenes. Generation can be random, start from a source image, or be controlled by text through a text encoder, for example “go through a corridor” or “go up the stairs”.

The quality of the videos generated by GAUDI is still low and full of artifacts. But with the AI system, Apple lays another foundation for controllable generative AI systems that can render 3D objects and scenes.

A possible application: the generation of digital locations for Apple’s XR glasses. You can learn more about neural rendering in our DEEP MINDS KI podcast with Nvidia researcher Thomas Müller.

Sources: GitHub (project page), arXiv (paper)
