Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation

CVPR 2026
1 The Chinese University of Hong Kong, Shenzhen; 2 Shanghai Jiao Tong University;
3 Nanjing University; 4 Beihang University
*Equal contribution, Corresponding author
Joint Control of Camera and Object Motion. Given a reference image, a set of 3D object trajectories, and a camera trajectory, SymphoMotion generates videos that are spatially consistent and faithfully reflect both object and camera motion.



Abstract

Controlling both camera motion and object dynamics is essential for coherent and expressive video generation, yet current methods typically handle only one motion type or rely on ambiguous 2D cues that entangle camera-induced parallax with true object movement. We present SymphoMotion, a unified motion-control framework that jointly governs camera trajectories and object dynamics within a single model. SymphoMotion features a Camera Trajectory Control mechanism that integrates explicit camera paths with geometry-aware cues to ensure stable, structurally consistent viewpoint transitions, and an Object Dynamics Control mechanism that combines 2D visual guidance with 3D trajectory embeddings to enable depth-aware, spatially coherent object manipulation. To support large-scale training and evaluation, we further construct RealCOD-25K, a comprehensive real-world dataset containing paired camera poses and object-level 3D trajectories across diverse indoor and outdoor scenes, addressing a key data gap in unified motion control. Extensive experiments and user studies show that SymphoMotion significantly outperforms existing methods in visual fidelity, camera controllability, and object-motion accuracy, establishing a new benchmark for unified motion control in video generation.

Method

Overview of SymphoMotion. SymphoMotion introduces two complementary mechanisms for camera and object motion control: Camera Trajectory Control (CTC) and Object Dynamics Control (ODC). Given a reference image, a text prompt, and the specified camera and object trajectories, CTC employs the Viewpoint Control Module (VCM) to integrate 3D geometric priors with camera motion for precise trajectory control. In parallel, ODC, powered by the Object Motion Module (OMM), combines 2D visual guidance with 3D motion cues for dynamic and spatially coherent object manipulation.
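To illustrate why 2D cues alone are ambiguous when the camera also moves, the following minimal sketch projects 3D object trajectories through a moving pinhole camera. It is not SymphoMotion's implementation: the `CameraPose` container, the `project` helper, the world-to-camera convention, and the intrinsics (`focal`, `cx`, `cy`) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraPose:
    """Hypothetical world-to-camera pose: x_cam = R @ x_world + t."""
    R: List[List[float]]  # 3x3 rotation matrix
    t: Vec3               # translation


def project(point_world: Vec3, pose: CameraPose,
            focal: float = 500.0, cx: float = 320.0, cy: float = 240.0
            ) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates with a pinhole model."""
    x, y, z = (
        sum(pose.R[i][j] * point_world[j] for j in range(3)) + pose.t[i]
        for i in range(3)
    )
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z + cx, focal * y / z + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Camera trajectory: the camera centre slides 0.1 units per frame along +x.
# With identity rotation, the world-to-camera translation is minus the centre.
camera_traj = [CameraPose(IDENTITY, (-0.1 * k, 0.0, 0.0)) for k in range(3)]

# Two 3D object trajectories (one point per frame, world coordinates).
static_object = [(0.0, 0.0, 2.0)] * 3                      # does not move
moving_object = [(0.1 * k, 0.0, 2.0) for k in range(3)]    # moves with the camera

# The 2D tracks that a purely image-space control signal would see.
static_2d = [project(p, c) for p, c in zip(static_object, camera_traj)]
moving_2d = [project(p, c) for p, c in zip(moving_object, camera_traj)]

print("static object, 2D track:", static_2d)
print("moving object, 2D track:", moving_2d)
```

In this toy setup the static object drifts across the image (from x = 320 to x = 270 pixels) purely because of camera parallax, while the object that really moves in 3D stays at the same pixel. A 2D track therefore cannot say which motion is which, whereas the explicit camera trajectory and 3D object trajectories keep the two sources of motion separate.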


BibTeX


      @article{zhang2025symphomotion,
        title={SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation},
        author={Zhang, Guiyu and Chen, Yabo and Xiang, Xunzhi and Huang, Junchao and Wang, Zhongyu and Jiang, Li},
        journal={arXiv preprint},
        year={2025}
      }