PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

ICML 2026

¹HKU ²Tencent Hunyuan ³ZJU ⁴THU ⁵SJTU ⁶BUAA
* Equal Contribution Corresponding Authors

Abstract

Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a "physical architect" to plan a Hierarchical Physical Blueprint defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.

Method Overview



PhysForge is a decoupled two-stage framework for physics-grounded 3D asset generation. In Stage 1 (VLM-based Planning), a fine-tuned VLM acts as a "physical architect": taking a single image, an optional 2D mask, and TRELLIS voxels as input, it autoregressively generates a Hierarchical Physical Blueprint that specifies per-part bounding boxes, parent-child relationships, joint types, and rich physical properties such as material, mass, intrinsic function, state machines, and atomic affordances. In Stage 2 (Diffusion-based Generation), a diffusion model conditioned on this blueprint uses our novel KineVoxel Injection (KVI) mechanism to jointly synthesize high-fidelity geometry, texture, and precise kinematic parameters (joint origin, axis, and limits) within a unified denoising framework, producing functionally complete, simulation-ready assets.
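To make the blueprint contents concrete, the following is a minimal Python sketch of how such a hierarchy could be held in memory and sanity-checked. It is not PhysForge's actual schema: the class names (`Joint`, `Part`), the field names, the units, and the `validate_blueprint` helper are all assumptions made for illustration, and state machines are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Kinematic parameters of the joint linking a part to its parent (hypothetical layout)."""
    joint_type: str                              # e.g. "fixed", "revolute", "prismatic"
    origin: Vec3 = (0.0, 0.0, 0.0)               # joint origin in the parent frame
    axis: Vec3 = (0.0, 0.0, 1.0)                 # rotation or translation axis
    limits: Tuple[float, float] = (0.0, 0.0)     # lower and upper motion limits


@dataclass
class Part:
    """One node of the blueprint tree (field names are assumptions)."""
    name: str
    bbox_min: Vec3
    bbox_max: Vec3
    parent: Optional[str] = None                 # None marks the root part
    joint: Optional[Joint] = None
    material: str = ""
    mass_kg: float = 0.0
    function: str = ""
    affordances: List[str] = field(default_factory=list)


def validate_blueprint(parts: List[Part]) -> str:
    """Check that the parts form a single-rooted tree; return the root's name."""
    by_name = {p.name: p for p in parts}
    if len(by_name) != len(parts):
        raise ValueError("duplicate part names")
    roots = [p.name for p in parts if p.parent is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root part")
    for p in parts:
        if p.parent is not None and p.parent not in by_name:
            raise ValueError(f"unknown parent for {p.name}")
        if any(lo > hi for lo, hi in zip(p.bbox_min, p.bbox_max)):
            raise ValueError(f"inverted bounding box for {p.name}")
    for p in parts:
        # Walk up the parent chain; revisiting a name means a cycle.
        seen = set()
        cur = p
        while cur.parent is not None:
            if cur.name in seen:
                raise ValueError(f"cycle detected at {cur.name}")
            seen.add(cur.name)
            cur = by_name[cur.parent]
    return roots[0]


# Illustrative two-part cabinet; all values are made up.
cabinet = [
    Part("body", (0.0, 0.0, 0.0), (1.0, 0.5, 2.0), material="wood", mass_kg=20.0),
    Part("door", (0.0, 0.0, 0.0), (0.5, 0.05, 2.0), parent="body",
         joint=Joint("revolute", origin=(0.0, 0.0, 0.0), limits=(0.0, 1.57)),
         material="wood", mass_kg=4.0, affordances=["pull", "push"]),
]
root_name = validate_blueprint(cabinet)
```

A flat list keyed by parent name mirrors a structure that can be emitted one part at a time, which seems a natural fit for autoregressive generation, though the format the model actually emits is not specified here.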

Applications




PhysForge produces functionally complete, simulation-ready 3D assets that directly unlock a range of downstream applications. (a) Robotic Simulation: our assets can be imported into simulators such as RoboTwin, where the detailed part-level geometry and precise kinematic parameters allow robotic manipulators to realistically interact with functional parts. (b) Virtual Worlds: in game engines and interactive virtual worlds, every part is endowed with physics-grounded attributes, enabling developers to design sophisticated interaction logic directly. (c) Agent-Environment Interaction: our VLM-based framework opens a new modality of interaction, in which an embodied agent can directly query an asset in natural language and receive a text-based physical blueprint with bounding boxes, providing an explicit plan for manipulation.
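As an illustration of the simulation use case, the sketch below serializes part-level kinematic parameters (joint type, origin, axis, limits) into a URDF skeleton using Python's standard `xml.etree.ElementTree`. URDF is chosen here only as a common articulated-object format; the export format PhysForge actually uses, and the format RoboTwin expects, are not stated in this page. The dictionary keys, the `blueprint_to_urdf` helper, and the placeholder effort and velocity values are assumptions, and inertia tensors plus visual and collision meshes are omitted.

```python
import xml.etree.ElementTree as ET


def _fmt(vec):
    """Format a 3-vector as a space-separated URDF attribute string."""
    return " ".join(str(float(v)) for v in vec)


def blueprint_to_urdf(name, parts):
    """Build a URDF string from a list of part dicts (hypothetical schema)."""
    robot = ET.Element("robot", name=name)

    # One <link> per part, carrying only its mass.
    for p in parts:
        link = ET.SubElement(robot, "link", name=p["name"])
        inertial = ET.SubElement(link, "inertial")
        ET.SubElement(inertial, "mass", value=str(p["mass_kg"]))

    # One <joint> per non-root part, connecting it to its parent.
    for p in parts:
        if p.get("parent") is None:
            continue
        j = p["joint"]
        joint = ET.SubElement(
            robot, "joint",
            name=f'{p["parent"]}_to_{p["name"]}', type=j["type"],
        )
        ET.SubElement(joint, "parent", link=p["parent"])
        ET.SubElement(joint, "child", link=p["name"])
        ET.SubElement(joint, "origin", xyz=_fmt(j["origin"]), rpy="0 0 0")
        if j["type"] != "fixed":
            lower, upper = j["limits"]
            ET.SubElement(joint, "axis", xyz=_fmt(j["axis"]))
            # Effort and velocity are placeholder values, not predicted quantities.
            ET.SubElement(joint, "limit", lower=str(lower), upper=str(upper),
                          effort="10.0", velocity="1.0")

    return ET.tostring(robot, encoding="unicode")


# Illustrative input; all values are made up.
parts = [
    {"name": "body", "parent": None, "mass_kg": 20.0},
    {"name": "door", "parent": "body", "mass_kg": 4.0,
     "joint": {"type": "revolute", "origin": (0.5, 0.0, 0.0),
               "axis": (0, 0, 1), "limits": (0.0, 1.57)}},
]
urdf = blueprint_to_urdf("cabinet", parts)
```

A simulator-ready file would additionally need inertia tensors and mesh references for each link, which would come from the generated geometry rather than from the blueprint alone.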

BibTeX

@article{
}