Top researchers and companies in artificial intelligence, including Fei-Fei Li’s World Labs, Meta’s Yann LeCun, and Google DeepMind, are all promoting technology they call a “world model”. However, the term is being used to describe three fundamentally different approaches to building AI that can understand and interact with the world. An analysis by Entropy Town points out that this has led to growing confusion about what a world model actually is.
The term originally comes from cognitive science, where it describes the brain’s ability to build an internal simulation of reality in order to predict events and plan actions. In AI, the meaning has become ambiguous.
World Labs
World Labs, a startup led by Stanford professor Fei-Fei Li, has released a product called Marble. According to the analysis, this represents a “world model as interface”. Marble uses a technique called Gaussian splatting to turn text prompts or images into static 3D environments that a person can explore. It is primarily a tool for creating 3D assets for games or virtual reality, not a system for an AI to reason with.
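Gaussian splatting, in broad strokes, represents a scene as many translucent 3D Gaussians (position, scale, color, opacity) that are projected onto the image plane and blended front to back. The sketch below illustrates that core rendering step with a handful of isotropic splats in NumPy; it is a toy illustration of the general technique, not Marble’s actual pipeline, and every size and parameter in it is invented.

```python
import numpy as np

# Toy sketch of the Gaussian splatting rendering step: project 3D Gaussians
# onto an image and alpha-composite them front to back. Real systems add
# full covariance projection, view-dependent color, and GPU tiling.

rng = np.random.default_rng(0)
N, H, W = 200, 64, 64

positions = rng.uniform(-1, 1, (N, 3))          # (x, y, z) in camera space
positions[:, 2] += 3.0                          # push splats in front of the camera
scales = rng.uniform(0.02, 0.08, N)             # isotropic std-dev per splat
colors = rng.uniform(0, 1, (N, 3))              # RGB per splat
opacities = rng.uniform(0.3, 0.9, N)

# Pinhole projection of each Gaussian center onto an HxW image.
focal = 60.0
px = positions[:, 0] / positions[:, 2] * focal + W / 2
py = positions[:, 1] / positions[:, 2] * focal + H / 2
radii = scales / positions[:, 2] * focal        # projected std-dev in pixels

image = np.zeros((H, W, 3))
transmittance = np.ones((H, W))                 # light still passing through each pixel

ys, xs = np.mgrid[0:H, 0:W]
for i in np.argsort(positions[:, 2]):           # composite front (near) to back (far)
    # 2D Gaussian footprint of splat i over the pixel grid.
    g = np.exp(-((xs - px[i]) ** 2 + (ys - py[i]) ** 2) / (2 * radii[i] ** 2))
    alpha = np.clip(opacities[i] * g, 0, 0.999)
    image += (transmittance * alpha)[..., None] * colors[i]
    transmittance *= 1 - alpha

print("rendered image:", image.shape, "mean brightness:", image.mean().round(3))
```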
Meta
In contrast, Yann LeCun, Meta’s chief AI scientist, uses the term to describe a “world model as cognition”. His approach focuses on building an internal, predictive brain for an AI agent. This model does not produce visuals for humans. Instead, it learns a compressed, internal understanding of the world to anticipate the consequences of actions and plan ahead. This concept is central to his research on Joint Embedding Predictive Architectures, or JEPA.
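The JEPA idea can be stated compactly: rather than reconstructing future pixels, a predictor is trained to map the embedding of a context view onto the embedding of a target view, with the target encoder updated as a slow moving average rather than by gradients. The PyTorch sketch below shows that structure on toy data; the layer sizes, EMA rate, and data are illustrative assumptions, not Meta’s implementation.

```python
import torch
import torch.nn as nn

# Minimal JEPA-style training loop: predict the *embedding* of a target view
# from the embedding of a context view. The loss lives in embedding space,
# never in pixel space.

dim = 32
context_encoder = nn.Sequential(nn.Linear(64, dim), nn.ReLU(), nn.Linear(dim, dim))
target_encoder  = nn.Sequential(nn.Linear(64, dim), nn.ReLU(), nn.Linear(dim, dim))
predictor       = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

# The target encoder starts as a copy of the context encoder and receives
# no gradients; it is updated only by exponential moving average below.
target_encoder.load_state_dict(context_encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam([*context_encoder.parameters(), *predictor.parameters()], lr=1e-3)

for step in range(100):
    x = torch.randn(16, 64)                  # stand-in for the "current" observation
    y = x + 0.1 * torch.randn(16, 64)        # stand-in for the related "target" view

    z_context = context_encoder(x)
    with torch.no_grad():
        z_target = target_encoder(y)         # targets live in embedding space

    loss = nn.functional.mse_loss(predictor(z_context), z_target)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Slowly track the context encoder with an exponential moving average.
    with torch.no_grad():
        for pt, pc in zip(target_encoder.parameters(), context_encoder.parameters()):
            pt.mul_(0.99).add_(0.01 * pc)

print("final embedding-prediction loss:", loss.item())
```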
Google DeepMind
Google DeepMind occupies a middle ground with what Entropy Town calls a “world model as simulator”. Its Genie model can generate interactive, video-like worlds from a text prompt. In these simulated environments, an AI agent like DeepMind’s SIMA can train, learn about physics, and follow instructions. This turns the world model into a virtual training ground where an AI can learn skills before applying them in the real world.
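The training-ground pattern itself is simple to state: during learning, the agent never touches the real world; it queries a simulator for next-state and reward transitions and improves its policy from that synthetic experience. The sketch below illustrates the loop with a tiny one-dimensional gridworld standing in for a Genie-style generated world and tabular Q-learning standing in for an agent like SIMA; it is a schematic of the pattern, not DeepMind’s system, and every number in it is invented.

```python
import numpy as np

# Toy "world model as simulator" loop: an agent learns entirely inside a
# simulated environment before any real-world deployment.

rng = np.random.default_rng(0)
n_states, goal = 6, 5
q = np.zeros((n_states, 2))                      # actions: 0 = left, 1 = right

def simulated_step(state, action):
    """The 'simulator': maps (state, action) to (next_state, reward)."""
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    return nxt, float(nxt == goal)

for episode in range(300):
    state = 0
    for _ in range(20):
        # Epsilon-greedy: explore, and break ties randomly while q is flat.
        if rng.random() < 0.2 or q[state, 0] == q[state, 1]:
            action = int(rng.integers(2))
        else:
            action = int(q[state].argmax())
        nxt, reward = simulated_step(state, action)
        # Standard Q-learning update from purely simulated experience.
        q[state, action] += 0.1 * (reward + 0.9 * q[nxt].max() - q[state, action])
        state = nxt
        if reward:                               # reached the goal; end the episode
            break

# After training in simulation, greedy actions from states 0..4 all point
# right, toward the goal; the goal state itself is terminal.
print("greedy actions:", q.argmax(axis=1))
```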
These are three distinct bets: an interface for humans, an internal brain for agents, and a simulator for training. All three use the same label. The analysis notes the irony that funding for these advanced projects, often positioned as superior to language-only AI, was largely unlocked by the commercial success of large language models like ChatGPT.