Waymo Launches Generative 3D World Model to Train Self-Driving Cars

What’s the story?

Waymo has announced the Waymo World Model, a generative tool built on Google DeepMind technology to create photorealistic 3D simulation environments for autonomous vehicle training.

Why it matters

By generating 3D simulations of safety-critical incidents, Waymo can evaluate how its autonomous system responds to complex challenges that are difficult to capture in the real world.

The bigger picture

Waymo’s adoption of generative AI architectures illustrates how spatial computing and 3D modeling are becoming central to the development and safety validation of autonomous vehicles.

In General XR News

February 12, 2026 – Autonomous driving technology company Waymo has announced the Waymo World Model, a generative model designed to create large-scale, hyper-realistic 3D simulations for training and testing its Waymo Driver system.

Waymo stated that the model is built upon Genie 3, an advanced general-purpose world model developed by Google DeepMind that generates photorealistic and interactive 3D environments. Waymo has adapted the technology for the driving domain to generate high-fidelity, multi-sensor outputs, including both camera and lidar data.

Training Autonomous Vehicles on Virtual Encounters with Elephants

According to the company, most simulation models in the autonomous driving industry are trained from scratch using only data collected on the road, which means those systems can only learn from the limited range of situations a fleet has actually encountered.

However, with Genie 3’s world knowledge, gained from pre-training on an extremely large and diverse set of videos, Waymo is able to explore situations that its fleet has never directly observed. The Waymo World Model can generate virtually any scene, from regular, day-to-day driving to rare, long-tail scenarios, across multiple sensor modalities. This could include encounters with wildlife, extreme weather events, and natural disasters, as well as other rare, safety-critical events.

A simulation of an encounter with an elephant.

To make use of this knowledge, the Waymo World Model applies specialized post-training that transfers what was learned from 2D video into 3D lidar outputs. The system can even generate 4D point clouds (3D point clouds that evolve over time), providing precise depth signals.
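Waymo has not published the format of these outputs, but one common way to represent a 4D point cloud is as per-frame 3D lidar returns tagged with a timestamp. The Python sketch below is purely illustrative under that assumption; the SimulationFrame class and its field names are hypothetical, not Waymo’s actual data structures.

```python
# Hypothetical sketch of a multi-sensor simulation frame and a "4D" point
# cloud. These names and fields are illustrative assumptions only.
from dataclasses import dataclass
import numpy as np

@dataclass
class SimulationFrame:
    timestamp_s: float            # simulation time of this frame
    camera_image: np.ndarray      # (H, W, 3) rendered RGB image
    lidar_points: np.ndarray      # (N, 3) x, y, z lidar returns for this frame

def to_4d_point_cloud(frames: list[SimulationFrame]) -> np.ndarray:
    """Stack per-frame 3D lidar returns into a single (M, 4) array of
    (x, y, z, t) points -- one way to represent a 4D point cloud."""
    stacked = [
        np.hstack([f.lidar_points,
                   np.full((len(f.lidar_points), 1), f.timestamp_s)])
        for f in frames
    ]
    return np.concatenate(stacked, axis=0)

# Example: two synthetic frames with 5 lidar points each -> a (10, 4) cloud.
frames = [
    SimulationFrame(t, np.zeros((480, 640, 3), dtype=np.uint8), np.random.rand(5, 3))
    for t in (0.0, 0.1)
]
print(to_4d_point_cloud(frames).shape)  # (10, 4)
```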

The system provides three main mechanisms for simulation control, outlined below; an illustrative sketch of how they might be combined follows the list:

Driving action control allows for a responsive simulator that adheres to specific driving inputs.

Scene layout control allows for customization of road layouts, traffic signal states, and the behavior of other road users.

Language control is the most flexible tool, enabling adjustments to time-of-day and weather conditions, or even the generation of entirely synthetic scenes.
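Waymo has not released a public interface for the World Model, so the following Python sketch is purely hypothetical: it only illustrates how driving actions, a scene layout, and a language prompt might be combined into a single simulation request. Every class, field, and function name here is an assumption for illustration.

```python
# Purely hypothetical sketch of combining the three control mechanisms in one
# simulation request. None of these classes or functions are a real Waymo API.
from dataclasses import dataclass, field

@dataclass
class DrivingActions:
    # Driving action control: the ego vehicle's inputs over time.
    steering_rad: list[float] = field(default_factory=list)
    throttle: list[float] = field(default_factory=list)

@dataclass
class SceneLayout:
    # Scene layout control: roads, signal states, and other road users.
    road_type: str = "four_way_intersection"
    traffic_light_state: str = "green"
    agents: list[str] = field(default_factory=lambda: ["cyclist", "delivery_van"])

@dataclass
class SimulationRequest:
    actions: DrivingActions
    layout: SceneLayout
    # Language control: free-text adjustments such as weather or time of day.
    prompt: str = "heavy rain at dusk"

def run_simulation(request: SimulationRequest) -> None:
    # Placeholder for a generative rollout conditioned on all three controls.
    print(f"Rolling out '{request.prompt}' on a {request.layout.road_type} "
          f"with {len(request.actions.steering_rad)} action steps.")

run_simulation(SimulationRequest(
    actions=DrivingActions(steering_rad=[0.0, 0.05, 0.1], throttle=[0.3, 0.3, 0.2]),
    layout=SceneLayout(traffic_light_state="flashing_yellow"),
    prompt="dense fog just after sunrise",
))
```

Structuring a request this way keeps the precise controls (actions and layout) separate from the open-ended language prompt, mirroring the distinction the company draws between its more rigid and more flexible control tools.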

Additionally, the model can convert any video taken with a regular camera or standard dashcam into a multimodal simulation. This allows the company to see how the Waymo Driver would navigate specific real-world locations. Waymo stated that this process enables the highest degree of realism and factuality, since the simulations are derived from actual footage.
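Waymo has not described its ingestion pipeline, but the general idea of turning ordinary footage into simulation input can be sketched as below. The decoding step uses the real OpenCV API; the world_model.reconstruct call is a hypothetical stand-in for the generative step, not Waymo’s software.

```python
# Hypothetical sketch of the dashcam-to-simulation idea: decode ordinary video
# frames, then hand them to a (stand-in) world model that would reconstruct the
# scene as multimodal simulation data.
import cv2  # pip install opencv-python

def load_dashcam_frames(path: str, max_frames: int = 300) -> list:
    """Decode up to max_frames RGB frames from a standard video file."""
    capture = cv2.VideoCapture(path)
    frames = []
    while len(frames) < max_frames:
        ok, frame_bgr = capture.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    capture.release()
    return frames

def video_to_simulation(path: str, world_model) -> object:
    """Feed decoded video to a world model that re-renders the scene as
    camera and lidar simulation data (placeholder call)."""
    frames = load_dashcam_frames(path)
    return world_model.reconstruct(frames)  # hypothetical method
```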

Waymo stated that by placing the Waymo Driver into these simulated worlds and scenarios, it can create a more rigorous safety benchmark, helping to ensure that the company’s vehicles can navigate complex challenges before encountering them in the real world.

For more information on Waymo and its autonomous driving technology, please visit the company’s website.

Image/video credit: Waymo


About the author

Sam is the Founder and Managing Editor of Auganix, where he has spent years immersed in the XR ecosystem, tracking its evolution from early prototypes to the technologies shaping the future of human experience. While primarily covering the latest AR and VR news, his interests extend to the wider world of human augmentation, from AI and robotics to haptics, wearables, and brain–computer interfaces.