arxiv:2606.13652

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Published on Jun 11

Submitted by Hao Zhang on Jun 15

WORLD LABS TECHNOLOGIES INC


Authors:

Abstract

World Tracing introduces a generative pixel-aligned geometry representation that predicts 3D points aligned with input pixels while completing hidden surfaces, using a diffusion transformer trained with pixel-space flow matching.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Image-to-3D methods often trade off faithfulness and completeness: depth estimators are anchored to input pixels but stop at the visible surface, while image-to-3D models generate complete shapes that are often misaligned with the input. We introduce World Tracing, a generative pixel-aligned geometry representation that predicts 3D points aligned with observed pixels while completing geometry beyond the visible surface. For each input pixel, World Tracing predicts an ordered stack of camera-space 3D points, where the first layer represents the visible surface and subsequent layers represent front-to-back intersections with occluded surfaces. We instantiate this representation with a world-tracing diffusion transformer, WT-DiT, which treats multiple geometry layers as separate denoising tokens coupled through factorized and global attention. WT-DiT is trained with pixel-space flow matching and a mixed noise schedule that balances visible-surface reconstruction with occluded-geometry generation. World Tracing achieves strong performance on visible-surface reconstruction and complete geometry generation across object, scene, and dynamic benchmarks, outperforming both depth predictors and image-to-3D generators. It also preserves 2D-to-3D correspondence, enabling text-driven 3D scene editing, geometry-conditioned novel-view video synthesis, and training-free integration with textured-mesh generators.
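The per-pixel layer stack described in the abstract can be illustrated with a small sketch. It is not the authors' code. It assumes a pinhole camera and reads "pixel-aligned" as meaning that every point in a pixel's stack lies on that pixel's camera ray, so each layer reprojects to the same pixel. The names `Pinhole` and `layer_stack` and the intrinsics are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Pinhole:
    """Hypothetical pinhole intrinsics (an assumption, not from the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float

    def unproject(self, u: float, v: float, z: float) -> Point3:
        # Camera-space point at depth z on the ray through pixel (u, v).
        return ((u - self.cx) / self.fx * z, (v - self.cy) / self.fy * z, z)

    def project(self, p: Point3) -> Tuple[float, float]:
        # Perspective projection back to pixel coordinates.
        x, y, z = p
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


def layer_stack(cam: Pinhole, u: float, v: float,
                depths: Sequence[float]) -> List[Point3]:
    """Order ray-surface intersections front to back for one pixel.

    Layer 0 is the visible surface; later layers are occluded hits.
    """
    return [cam.unproject(u, v, z) for z in sorted(depths)]


if __name__ == "__main__":
    cam = Pinhole(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    # Made-up depths at which one ray crosses surfaces, in arbitrary order.
    for layer, point in enumerate(layer_stack(cam, 400.0, 300.0, [2.5, 1.0, 1.8])):
        print(layer, point, cam.project(point))
```

Under these assumptions, every layer projects back to the pixel it was predicted for, which is the 2D-to-3D correspondence the abstract relies on for editing and novel-view synthesis.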


Community

haoz19

Paper submitter about 17 hours ago

Single-view-to-3D methods often trade off faithfulness and completeness: depth estimators are anchored to input pixels but stop at the visible surface, while image-to-3D models generate complete shapes that are often misaligned with the input. We introduce World Tracing, a generative pixel-aligned geometry representation that produces 3D points that faithfully reproduce the input image while containing complete geometry beyond the visible surface. For each input pixel, World Tracing predicts an ordered stack of camera-space 3D points, where the first layer represents the visible surface and subsequent layers represent front-to-back intersections with occluded surfaces. We instantiate this representation as a world-tracing diffusion transformer (WT-DiT) that treats multiple geometry layers as separate denoising tokens coupled through factorized and global attention. WT-DiT is trained with pixel-space flow matching, using a mixed noise schedule to balance visible-surface reconstruction against occluded-geometry generation. As a result, World Tracing demonstrates strong performance on both visible-surface reconstruction and complete geometry generation across object, scene, and dynamic benchmarks, outperforming both depth predictors and image-to-3D generators. Furthermore, because it preserves 2D-to-3D correspondence, it directly enables text-driven 3D scene editing, geometry-conditioned novel-view video synthesis, and training-free integration with textured-mesh generators.
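As a rough illustration of flow-matching training with a mixed timestep distribution, the sketch below builds one training pair. It is not the WT-DiT recipe: the text above does not specify the schedule, so the linear interpolation path, the velocity target `noise - x0`, and the two-component mixture in `sample_timestep` are all assumptions, and the function names are hypothetical. Here `x0` stands in for flattened per-pixel geometry values.

```python
import random
from typing import List, Sequence, Tuple


def sample_timestep(rng: random.Random, p_low_noise: float = 0.5) -> float:
    """Draw t in [0, 1] from a hypothetical two-component mixture.

    One component is skewed toward t = 0 (little noise), the other is
    uniform. Both components and the mixing weight are assumptions.
    """
    if rng.random() < p_low_noise:
        return rng.betavariate(1.0, 3.0)
    return rng.random()


def flow_matching_pair(x0: Sequence[float], t: float,
                       rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (x_t, v_target) for the linear path between data and noise.

    x_t = (1 - t) * x0 + t * noise, with velocity target noise - x0.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [(1.0 - t) * a + t * n for a, n in zip(x0, noise)]
    v_target = [n - a for a, n in zip(x0, noise)]
    return x_t, v_target


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.5, -1.0, 2.0]  # made-up stand-in for per-pixel geometry values
    t = sample_timestep(rng)
    x_t, v_target = flow_matching_pair(x0, t, rng)
    print(t, x_t, v_target)
```

In this convention a network would be trained to regress `v_target` from `x_t` and `t`. Timesteps near 0 keep the sample close to the data, while timesteps near 1 start from almost pure noise, so the mixture weights control how much training is spent at each noise level.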



Models citing this paper 4

Datasets citing this paper 0

