✨ TL;DR
Asset Harvester converts sparse, real-world object observations from autonomous vehicle driving logs into complete 3D assets suitable for simulation. The system combines large-scale data curation, geometry-aware preprocessing, and a novel sparse-view-conditioned model (SparseViewDiT) to generate simulation-ready 3D objects from limited viewing angles.
Closed-loop simulation is essential for autonomous vehicle development, requiring interactive 3D environments for testing and validation. Neural scene reconstruction can convert driving logs into 3D environments, but it does not produce the complete 3D object assets needed for agent manipulation and for novel-view synthesis under large viewpoint changes. Real-world AV data adds further challenges: objects are observed from a limited range of viewing angles with sparse coverage, and are captured by heterogeneous sensors under varying conditions. Existing image-to-3D methods struggle with these constraints because they typically assume dense, controlled input views rather than the sparse, in-the-wild observations characteristic of driving logs.
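To make "sparse coverage" concrete, here is a minimal sketch that measures the largest unobserved arc around an object given the azimuths of its observed views. The function name and the use of azimuth alone are assumptions made for illustration; the summary does not describe how the paper quantifies view coverage.

```python
from typing import Sequence


def largest_azimuth_gap(azimuths_deg: Sequence[float]) -> float:
    """Return the largest unobserved arc, in degrees, around an object.

    Hypothetical helper: each entry is the azimuth of one observed view,
    measured around the object's vertical axis.
    """
    if not azimuths_deg:
        return 360.0  # no views at all: the whole circle is unobserved
    angles = sorted(a % 360.0 for a in azimuths_deg)
    # Gaps between neighbouring views, in sorted order.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    # Wrap-around gap from the last view back to the first.
    gaps.append(360.0 - angles[-1] + angles[0])
    return max(gaps)


# A passing vehicle seen only from roughly head-on: most of it is never observed.
drive_by = largest_azimuth_gap([350, 0, 10])
# A controlled capture with four evenly spaced views.
turntable = largest_azimuth_gap([0, 90, 180, 270])
```

In this toy setup the drive-by views leave a 340-degree arc unobserved, while the evenly spaced capture leaves at most 90 degrees, which is the kind of gap between driving-log input and the dense views that typical image-to-3D methods assume.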
Asset Harvester employs a system-level design combining multiple components to handle real-world AV data challenges. The pipeline begins with large-scale curation of object-centric training tuples from driving logs, followed by geometry-aware preprocessing that handles heterogeneous sensor data. The core technical contribution is SparseViewDiT, a model explicitly designed for limited-angle views that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. The training recipe incorporates hybrid data curation strategies, augmentation techniques to handle data variability, and self-distillation to improve robustness. This end-to-end system transforms sparse object observations into complete, reusable 3D assets ready for simulation deployment.
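The curation step described above can be pictured with a minimal sketch that groups per-frame object observations into object-centric tuples. Every name here (`Observation`, `TrainingTuple`, `curate`, the field names, the sensor labels, and the `min_views` filter) is hypothetical and chosen for illustration; the summary does not specify the paper's actual data layout or filtering rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    """One view of a tracked object (hypothetical record layout)."""
    track_id: str   # assumed per-object track identifier
    sensor: str     # assumed sensor label, e.g. "front_wide"
    image_id: str   # assumed handle to the cropped image


@dataclass
class TrainingTuple:
    """All views collected for a single object."""
    track_id: str
    views: List[Observation] = field(default_factory=list)


def curate(observations: List[Observation], min_views: int = 1) -> List[TrainingTuple]:
    """Group observations by object and drop objects with too few views."""
    by_track: Dict[str, TrainingTuple] = {}
    for obs in observations:
        if obs.track_id not in by_track:
            by_track[obs.track_id] = TrainingTuple(obs.track_id)
        by_track[obs.track_id].views.append(obs)
    return [t for t in by_track.values() if len(t.views) >= min_views]


log = [
    Observation("car_17", "front_wide", "f0001"),
    Observation("car_17", "left_fisheye", "f0002"),
    Observation("truck_3", "front_wide", "f0003"),
]
tuples = curate(log, min_views=2)
```

With `min_views=2`, only `car_17` survives, carrying views from two different sensors. In a real pipeline the later stages (geometry-aware preprocessing, multiview generation, and 3D Gaussian lifting) would consume tuples like these.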
What the paper shows.
The paper demonstrates that Asset Harvester successfully converts sparse object observations from real autonomous driving logs into complete, simulation-ready 3D assets. The system enables scalable extraction and conversion of objects captured under real-world conditions with limited viewing angles. The combination of SparseViewDiT with the full pipeline produces assets suitable for agent manipulation and large-viewpoint novel-view synthesis in closed-loop simulation environments, addressing key requirements for AV development and testing.
The paper does not explicitly detail quantitative performance metrics or comparison benchmarks against alternative methods. The system's dependence on multiple components (data curation, preprocessing, model training) may introduce complexity in deployment and maintenance. The approach's performance boundaries regarding minimum view requirements, object categories, or extreme viewing angle limitations are not thoroughly characterized. Computational costs and scalability limits for processing large-scale driving log datasets are not discussed in detail.
✨ Generated by Claude · Apr 21, 2026 · Read the PDF for authoritative content.