Google DeepMind unveils Genie 3, an AI model for generating interactive worlds

Google DeepMind has announced Genie 3, a new AI “world model” capable of generating interactive, three-dimensional environments from text prompts. According to the company’s official post, users can navigate these dynamic worlds in real time.

The model generates environments at 720p resolution and runs at 24 frames per second. Google DeepMind states that the generated worlds stay consistent for several minutes, with a visual memory of about one minute. This means that if a user looks away from an object and then looks back, the object should remain in its original state and location. By comparison, the previous version, Genie 2, supported only 10 to 20 seconds of interaction.

New capabilities and applications

A key feature introduced in Genie 3 is “promptable world events.” This allows users to modify the environment in real time using additional text commands after the initial world has been created. For example, a user could change the weather conditions or introduce new objects and characters into the scene.

Google DeepMind positions world models as a crucial technology for training AI agents. The company reports that it has tested Genie 3 by having its SIMA agent, a generalist agent for 3D virtual settings, pursue goals within the generated worlds.

Current limitations

While the technology shows progress, Google DeepMind is transparent about its current limitations. The company has identified several key areas that are still under development:

  • Action Space: The range of direct actions an agent can perform within the world is currently constrained.
  • Multiple Agents: Simulating complex interactions between multiple independent agents remains a research challenge.
  • Geographic Accuracy: Genie 3 cannot yet simulate real-world locations with perfect accuracy.
  • Text Rendering: Legible text is often rendered correctly only when it was included in the initial prompt.
  • Interaction Duration: The model supports a few minutes of continuous interaction, not extended sessions.

Genie 3 is not being released to the public. Instead, it is available as a limited research preview for a small group of academics and creators. Google DeepMind states that this approach allows it to gather feedback and better understand the technology’s risks in a controlled manner.
