What’s the story?
SpAItial has announced Echo-2, a model that generates real-time, navigable 3D scenes from text or image inputs.
Why it matters
The model could support faster creation of interactive 3D scenes for robotics, architecture, digital twins, and immersive content.
The bigger picture
Echo-2 shows how text- and image-based generation is moving beyond flat media toward interactive, spatially consistent 3D worlds.
In General XR News
April 29, 2025 – SpAItial, a developer of physically-grounded world models, has announced ‘Echo-2’, its latest frontier model for generating immersive three-dimensional (3D) environments from text or image inputs, explorable in real time from any device.
Sequential video models predict frames one after another, a process the company states is prone to high computational demands, geometry drift, and inconsistent outputs over time. Echo-2 takes a different approach, generating a spatially persistent 3D scene from a single image or text prompt.
Rather than producing a video, the model creates a 3D-consistent scene representation that users can interact with and navigate freely. According to the company, its web demo uses 3D Gaussian Splatting (3DGS) for rendering, providing GPU-friendly performance that enables interactive viewing directly in the browser, even on modest hardware.
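SpAItial has not released the demo’s source code, but browser-based 3DGS viewing of this kind typically takes only a few lines of setup with an off-the-shelf renderer. The sketch below uses the open-source GaussianSplats3D library for three.js as a stand-in, not SpAItial’s own code; the ‘scene.splat’ path is a placeholder for an exported splat asset, and the camera values are arbitrary.

```typescript
// Illustrative only: a minimal browser 3DGS viewer using the open-source
// GaussianSplats3D library for three.js, not SpAItial’s own demo code.
import * as GaussianSplats3D from '@mkkellogg/gaussian-splats-3d';

const viewer = new GaussianSplats3D.Viewer({
  cameraUp: [0, -1, 0],              // arbitrary camera orientation
  initialCameraPosition: [0, 1, 3],  // arbitrary starting viewpoint
  initialCameraLookAt: [0, 0, 0],
});

// 'scene.splat' is a placeholder path to an exported Gaussian-splat asset.
viewer
  .addSplatScene('scene.splat', { splatAlphaRemovalThreshold: 5 })
  .then(() => viewer.start());       // begin the interactive render loop
```

Because Gaussian splats are rasterized rather than ray-traced, viewers like this run smoothly even on integrated GPUs, which is consistent with the company’s claim of interactive performance on modest hardware.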
Echo-2 is designed to bridge physical and digital environments across several industries, spanning robotics, architectural visualization, digital twins, and immersive 3D content creation. The model enables the creation of digital clones of real-world spaces (such as homes or factory environments) from a single photograph, eliminating the need for expensive 3D scanning hardware.
For robotics, Echo-2 supports Sim2Real knowledge transfer, a process by which robots train in highly realistic simulated environments before operating in the real world, enabling environment-specific training and safer deployment. SpAItial stated that the model’s generated environments can also support simulation, training data generation, and large-scale robot training.
In game development, Echo-2 can generate fully navigable 3D environments from input images or text prompts, and dynamic characters can then be integrated directly into these worlds. SpAItial noted that developers can prototype gameplay mechanics and interactive experiences within minutes of creating a 3D world.
For architecture and real estate, SpAItial stated that Echo-2 has broad applications, including the conversion of two-dimensional (2D) floor plans and blueprints into navigable 3D scenes, as well as automated virtual staging and interactive walkthroughs for property listings.
The model also includes scene understanding and editing capabilities. Echo-2 generates semantic segmentation masks to identify individual objects within a scene (such as chairs, tables, floors, and walls), enabling localized object manipulation while maintaining the overall spatial consistency of the environment. Users can remove, add, or replace objects via text prompts, with applications spanning interior design, building planning, and architectural visualization. The model can additionally restyle entire scenes to explore alternate design directions and aesthetics.
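Echo-2’s editing interface has not been published, but the mechanism described, using per-object segmentation to gate which parts of a scene an edit touches, can be sketched in a few lines. Everything below is a hypothetical illustration: the Splat type, its label field, and removeObject are assumptions, not SpAItial’s API.

```typescript
// Hypothetical sketch of mask-gated editing; none of these names come
// from SpAItial. Each splat carries a semantic label from segmentation.
type Splat = {
  position: [number, number, number];
  label: string; // e.g. 'chair', 'table', 'floor', 'wall'
};

// Remove every splat belonging to the target object while leaving the
// rest of the scene, and hence its spatial consistency, untouched.
function removeObject(scene: Splat[], targetLabel: string): Splat[] {
  return scene.filter((splat) => splat.label !== targetLabel);
}

// Usage: removeObject(scene, 'chair') deletes the chairs; walls and
// floors keep their original geometry.
```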
According to SpAItial, Echo-2 outperforms World Labs’ ‘Marble-1.1’ model across three metrics on the WorldScore benchmark for world generation: Content Alignment, Subjective Quality, and World Score.
The company stated that future versions of its Echo-2 model will incorporate dynamics and physics-based reasoning to support interactive simulations and advanced robotics training.
For more information on Echo-2 and SpAItial’s 3D world generation offerings, please visit the company’s website.
Image / video credit: SpAItial
About the author
Sam is the Founder and Managing Editor of Auganix, where he has spent years immersed in the XR ecosystem, tracking its evolution from early prototypes to the technologies shaping the future of human experience. While primarily covering the latest AR and VR news, his interests extend to the wider world of human augmentation, from AI and robotics to haptics, wearables, and brain–computer interfaces.