Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

Tianshi Cao; Jiawei Ren; Yuxuan Zhang; Jaewoo Seo; Jiahui Huang; Shikhar Solanki; Haotian Zhang; Mingfei Guo; Haithem Turki; Muxingzi Li; Yue Zhu; Sipeng Zhang; Zan Gojcic; Sanja Fidler; Kangxue Yin

✨ TL;DR

Asset Harvester converts sparse, real-world object observations from autonomous vehicle driving logs into complete 3D assets suitable for simulation. The system combines large-scale data curation, geometry-aware preprocessing, and a novel sparse-view-conditioned model (SparseViewDiT) to generate simulation-ready 3D objects from limited viewing angles.

01 · Problem

Closed-loop simulation is essential for autonomous vehicle (AV) development because testing and validation require interactive 3D environments. Neural scene reconstruction can convert driving logs into 3D environments, but it does not produce the complete 3D object assets needed for agent manipulation and for novel-view synthesis under large viewpoint changes. Real-world AV data makes this hard: objects are observed from a limited range of viewing angles with sparse coverage, and they are captured by heterogeneous sensors under varying conditions. Existing image-to-3D methods struggle under these constraints because they typically assume dense, controlled input views rather than the sparse, in-the-wild observations characteristic of driving logs.
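To make "limited viewing angles with sparse coverage" concrete, the sketch below estimates how much of the full circle around an object a set of camera positions actually covers. It is an illustration only and not part of the Asset Harvester pipeline: the function name `azimuth_coverage`, the 2D ground-plane simplification, the object pose given as a position plus yaw, and the 30-degree bin width are all assumptions made for this example.

```python
import math


def azimuth_coverage(cam_xy, obj_xy, obj_yaw, bin_deg=30):
    """Fraction of azimuth bins around an object seen by at least one camera.

    Hypothetical helper for illustration. Assumes 2D ground-plane positions
    in a shared world frame, obj_yaw in radians, and bin_deg dividing 360.
    """
    n_bins = 360 // bin_deg
    seen = set()
    for cx, cy in cam_xy:
        # Bearing from the object to the camera, expressed in the object frame.
        bearing = math.atan2(cy - obj_xy[1], cx - obj_xy[0]) - obj_yaw
        deg = math.degrees(bearing) % 360.0
        # The extra modulo guards against deg rounding up to exactly 360.0.
        seen.add(int(deg // bin_deg) % n_bins)
    return len(seen) / n_bins


# An ego vehicle driving past a parked object on one side only.
drive_by = [(-10.0, -3.0), (-5.0, -3.0), (0.0, -3.0), (5.0, -3.0), (10.0, -3.0)]
coverage = azimuth_coverage(drive_by, obj_xy=(0.0, 0.0), obj_yaw=0.0)
print(f"azimuth coverage: {coverage:.2f}")
```

In this drive-by example every camera sits on the same side of the object, so under half of the azimuth bins are observed. The unobserved side is the part a generative model would have to complete.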

02 · Approach

Asset Harvester employs a system-level design combining multiple components to handle real-world AV data challenges. The pipeline begins with large-scale curation of object-centric training tuples from driving logs, followed by geometry-aware preprocessing that handles heterogeneous sensor data. The core technical contribution is SparseViewDiT, a model explicitly designed for limited-angle views that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. The training recipe incorporates hybrid data curation strategies, augmentation techniques to handle data variability, and self-distillation to improve robustness. This end-to-end system transforms sparse object observations into complete, reusable 3D assets ready for simulation deployment.

03 · Key insights

What the paper shows.

01 Real-world AV data requires system-level design rather than relying on a single model component, combining data curation, preprocessing, and robust training

02 Sparse-view-conditioned multiview generation coupled with 3D Gaussian lifting enables reconstruction from limited viewing angles typical in driving scenarios

03 Geometry-aware preprocessing across heterogeneous sensors is critical for handling the variability in real-world autonomous driving data

04 Self-distillation and hybrid data augmentation strategies improve model robustness to in-the-wild conditions and sparse observations

04 · Results

The paper demonstrates that Asset Harvester successfully converts sparse object observations from real autonomous driving logs into complete, simulation-ready 3D assets. The system enables scalable extraction and conversion of objects captured under real-world conditions with limited viewing angles. The combination of SparseViewDiT with the full pipeline produces assets suitable for agent manipulation and large-viewpoint novel-view synthesis in closed-loop simulation environments, addressing key requirements for AV development and testing.

05 · Limitations

The paper does not explicitly detail quantitative performance metrics or comparison benchmarks against alternative methods. The system's dependence on multiple components (data curation, preprocessing, model training) may introduce complexity in deployment and maintenance. The approach's performance boundaries, such as the minimum number of views, the supported object categories, and the most extreme viewing angles it can handle, are not thoroughly characterized. Computational costs and scalability limits for processing large-scale driving log datasets are not discussed in detail.

✨ Generated by Claude · Apr 21, 2026 · Read the PDF for authoritative content.

