NVIDIA Lyra 2.0: Open Source 3D World Generation from a Single Image

April 14, 2026

NVIDIA's Spatial Intelligence Lab released Lyra 2.0, an open-source framework that converts a single photograph into a navigable, geometrically consistent 3D world. It is licensed under Apache 2.0, which permits commercial use, and ships with model weights on Hugging Face and code on GitHub.

Lyra 2.0, NVIDIA Spatial Intelligence Lab

From Image to Interactive World

Lyra 2.0 takes one input image, generates a camera-controlled video walkthrough of the scene, and reconstructs that walkthrough into 3D Gaussian Splats and surface meshes. Those outputs load directly into real-time rendering engines and physics simulators.

The system is built on Wan 2.1-14B, a Diffusion Transformer trained for video generation. It produces frames at 832×480 resolution using 35 denoising steps. A distilled variant reduces that to 4 steps for faster iteration without retraining.
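
To put the two sampling schedules in perspective, here is a rough back-of-the-envelope sketch. The step counts and resolution come from the figures above; the assumption that each denoising step costs one transformer pass is ours, and guidance or other settings not described here could change the real ratio.

```python
# Rough comparison of the full and distilled sampling schedules.
# Step counts and resolution are taken from the article; the cost model
# (one transformer pass per denoising step) is a simplifying assumption.
FULL_STEPS = 35
DISTILLED_STEPS = 4
WIDTH, HEIGHT = 832, 480


def relative_denoising_cost(steps: int, baseline_steps: int = FULL_STEPS) -> float:
    """Fraction of the baseline's denoising passes needed for `steps` steps."""
    return steps / baseline_steps


pixels_per_frame = WIDTH * HEIGHT
step_reduction = FULL_STEPS / DISTILLED_STEPS

print(f"pixels per frame: {pixels_per_frame}")
print(f"step reduction: {step_reduction:.2f}x")
print(f"distilled cost: {relative_denoising_cost(DISTILLED_STEPS):.1%} of full")
```

Under that assumption the distilled variant needs 8.75 times fewer denoising passes, though actual wall-clock speedup depends on the fixed costs around the sampler (encoding, decoding, reconstruction).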

Lyra 1.0, released in September 2025, introduced the core pipeline for 3D and 4D scene generation from single images. Lyra 2.0 extends that foundation to long-horizon exploration, letting users navigate through large spatial regions while the model maintains geometric consistency across the entire sequence. Other open-source world simulators tackling this problem include Lingbot World, which builds on Wan 2.2 for camera-controlled environment generation under the same Apache 2.0 terms.

Keeping the World Consistent

Two failure modes undermine long-horizon 3D generation. Spatial forgetting occurs when the model revisits a distant region and hallucinates geometry rather than reproducing what it already generated. Temporal drifting happens when small per-frame errors accumulate until the scene diverges from the original input.

Lyra 2.0 addresses spatial forgetting through geometry-based frame retrieval. Instead of feeding historical frames directly as image conditioning, the system routes information exclusively through 3D geometry. It retrieves historical frames with maximum visibility to the target viewpoint and establishes dense correspondences via canonical coordinate warping.
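
The retrieval idea can be illustrated with a minimal sketch. This is not Lyra's implementation: the function names, the pinhole camera model, and the scoring rule (fraction of a frame's 3D points that project inside the target image) are all assumptions made for illustration. Occlusion is ignored, and the canonical coordinate warping step is not covered.

```python
import numpy as np


def visible_fraction(points_world, world_to_cam, K, width, height):
    """Fraction of 3D points in front of the target camera and inside its image.

    Hypothetical scoring rule; no occlusion test is performed.
    """
    n = points_world.shape[0]
    homo = np.hstack([points_world, np.ones((n, 1))])   # (N, 4) homogeneous points
    cam = (world_to_cam @ homo.T).T[:, :3]              # (N, 3) camera-space points
    in_front = cam[:, 2] > 1e-6
    z = np.where(in_front, cam[:, 2], 1.0)              # dummy depth avoids divide-by-zero
    uv = (K @ cam.T).T                                  # (N, 3) unnormalized pixel coords
    u, v = uv[:, 0] / z, uv[:, 1] / z
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return float(np.mean(in_front & inside))


def retrieve_frames(frame_points, world_to_cam, K, width, height, top_k=2):
    """Return indices of the top_k most visible historical frames, plus all scores."""
    scores = [visible_fraction(p, world_to_cam, K, width, height) for p in frame_points]
    order = np.argsort(scores)[::-1][:top_k]
    return [int(i) for i in order], scores


# Toy setup: target camera at the origin looking down +z, 832x480 image.
K = np.array([[500.0, 0.0, 416.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
target_pose = np.eye(4)

frame0 = np.column_stack([np.linspace(-1, 1, 10), np.linspace(-0.5, 0.5, 10), np.full(10, 5.0)])
frame1 = frame0 * np.array([1.0, 1.0, -1.0])            # same points, behind the camera
frame2 = np.vstack([frame0[:5], frame1[:5]])            # half in front, half behind

indices, scores = retrieve_frames([frame0, frame1, frame2], target_pose, K, 832, 480)
print(indices, scores)
```

In this toy case the frame whose geometry lies fully in view ranks first and the frame behind the camera is never retrieved, which is the behavior the visibility criterion is meant to produce.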

Temporal drifting is reduced through self-augmentation training, where the model trains on its own degraded outputs. This closes the gap between training conditions and inference behavior. FramePack temporal compression runs alongside this strategy to limit how far reconstruction errors propagate across extended sequences.
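
A toy sketch of the self-augmentation idea follows. It is an assumption-heavy illustration, not the training code: the "model" is a stand-in that adds small noise to a frame, and the replacement probability and function names are hypothetical. The point is only that conditioning frames seen during training are sometimes the model's own imperfect renditions rather than clean ground truth.

```python
import numpy as np


def make_toy_generator(rng, noise_std=0.05):
    """Stand-in for the video model: returns the frame with small errors added."""
    def generate(frame):
        noisy = frame + rng.normal(0.0, noise_std, size=frame.shape)
        return np.clip(noisy, 0.0, 1.0)
    return generate


def self_augment_history(history, generate_fn, prob, rng):
    """With probability `prob`, swap each clean history frame for a model rendition."""
    augmented = [generate_fn(f) if rng.random() < prob else f for f in history]
    return np.stack(augmented)


rng = np.random.default_rng(0)
history = np.full((4, 8, 8, 3), 0.5)        # four clean ground-truth frames
generate = make_toy_generator(rng)

clean = self_augment_history(history, generate, prob=0.0, rng=rng)
degraded = self_augment_history(history, generate, prob=1.0, rng=rng)

print("max change with prob=0:", np.abs(clean - history).max())
print("frames changed with prob=1:", not np.array_equal(degraded, history))
```

A training loop would condition on the augmented history while still supervising against clean targets, so the model learns to recover from the kind of errors it will meet at inference time.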

Diverse Scene Generation

Lyra 2.0 handles a wide range of environment types from single input images, including fantasy interiors, futuristic urban landscapes, outdoor terrain, and architectural spaces.

Surreal Eastern fantasy

Futuristic street terrace

On the DL3DV and Tanks and Temples benchmarks, Lyra 2.0 scores an LPIPS of 0.552, an FID of 51.33, and a style consistency of 85.07%. These metrics measure perceptual similarity, distribution fidelity, and visual coherence across generated frames, respectively. Lower is better for LPIPS and FID, and higher is better for style consistency.

Creating Simulation-Ready 3D Environments with Interactive GUI

Lyra 2.0 ships an interactive GUI for scene exploration and camera trajectory planning. Users draw paths through the generated environment, and the model progressively extends the scene as the virtual camera moves forward.
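
How a drawn path might become a camera trajectory can be sketched as follows. The GUI's internal representation is not documented here, so everything in this example is hypothetical: it treats the path as a 2D polyline on the ground plane, resamples it at a fixed spacing, and points the camera along the direction of travel.

```python
import numpy as np


def densify_path(waypoints, step):
    """Resample a polyline so consecutive samples are at most `step` apart."""
    waypoints = np.asarray(waypoints, dtype=float)
    samples = [waypoints[0]]
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        n = max(1, int(np.ceil(np.linalg.norm(b - a) / step)))
        for i in range(1, n + 1):
            samples.append(a + (b - a) * i / n)
    return np.array(samples)


def headings(samples):
    """Yaw (radians) of the travel direction at each sample; the last repeats the previous."""
    d = np.diff(samples, axis=0)
    yaw = np.arctan2(d[:, 1], d[:, 0])
    return np.append(yaw, yaw[-1])


# An L-shaped path: 2 units along x, then 1 unit along y.
path = densify_path([(0, 0), (2, 0), (2, 1)], step=0.5)
yaw = headings(path)

for position, angle in zip(path, yaw):
    print(position, round(float(np.degrees(angle)), 1))
```

Each (position, yaw) pair stands in for one camera pose; a generator that extends the scene progressively would consume these poses in order as the virtual camera advances.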

Lyra 2.0 in NVIDIA Isaac Sim, delivery robot navigation

The GUI exports finished scenes directly to NVIDIA Isaac Sim for embodied AI and robotics training. A delivery robot navigating a new facility can be trained in a simulated version of that facility, built in minutes from a single photograph. The 3D Gaussian Splat export preserves full spatial geometry and texture detail for physics engine use.

Robot environment

Chinese temple

Staircase environment

Coastal scene

For filmmakers, the GUI offers a faster path to spatial previsualization than manual 3D modeling. Generate a scene from a location photograph, walk through it to plan shots, then export to a game engine. Explore more ways to generate and manipulate scenes in the AI FILMS Studio video workspace.

Open Source and Available for Commercial Use

Lyra 2.0 is released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. Model weights are available on Hugging Face at nvidia/Lyra-2.0. Source code is on GitHub at nv-tlabs/lyra, with 927 stars and 55 forks at publication.
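
For readers who want to fetch the weights programmatically, a minimal sketch using the `huggingface_hub` client is shown below. The repository name is the one given above; the local directory is a hypothetical choice, and the layout of the downloaded files is not described here, so check the repository's own instructions before wiring this into a pipeline.

```python
REPO_ID = "nvidia/Lyra-2.0"  # repository name as given in the article


def download_weights(local_dir: str = "checkpoints/lyra-2.0") -> str:
    """Download the model repository and return the local path.

    `local_dir` is a hypothetical default; choose any writable location.
    """
    # Imported lazily so this file can be read or imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example usage (requires `pip install huggingface_hub` and network access):
#     path = download_weights()
#     print("weights stored in", path)
```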

No managed inference endpoint exists at launch. Teams run the model locally on their own hardware. The Wan 2.1-14B base requires a GPU with sufficient VRAM for 14-billion-parameter inference. NVIDIA LongLive, the lab's earlier release for real-time interactive long video generation, shipped under non-commercial terms. Lyra 2.0's Apache 2.0 license removes that restriction for commercial production pipelines.
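
To give a sense of what "sufficient VRAM" means for a 14-billion-parameter model, the sketch below estimates the memory needed for the weights alone at a few common precisions. The precisions listed are illustrative assumptions, since the precisions Lyra 2.0 actually supports are not stated here, and the estimate excludes activations, auxiliary encoders and decoders, and the 3D reconstruction stage.

```python
PARAMS = 14e9  # parameter count of the base model, per the article

# Bytes per parameter for some common precisions (illustrative choices).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.1f} GiB")
```

At 16-bit precision the weights come to roughly 26 GiB before any working memory is counted, so total requirements will be higher than these figures.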

Image-to-3D approaches are converging across labs. Tencent's HunyuanWorld Mirror covers similar territory using Gaussian splatting for scene representation. Lyra 2.0 distinguishes itself with the anti-forgetting and anti-drifting mechanisms that sustain consistency across longer trajectories and with the Isaac Sim pipeline for simulation use.