Traditional video editing methods struggle to insert objects with both temporal consistency and photorealistic appearance. We propose a hybrid pipeline for inserting 3D objects into videos that combines 3D Gaussian Splatting (3DGS) rendering for temporal consistency with 2D diffusion-based enhancement for photorealistic lighting. In this example, a virtual bracelet is inserted onto a wrist in a dynamic scene. The 3D representation keeps the bracelet temporally consistent and correctly occluded as the wrist moves, while 2D image priors enhance realism by synthesizing plausible shading. Our approach bridges the gap between 3D rendering and 2D diffusion models, achieving both temporal coherence and realism.
Our pipeline inserts a 3D bracelet into a video while maintaining temporal consistency and realistic lighting.
1) Motion and occlusion. We first compute motion and occlusion using 3D Gaussian Splatting (3DGS), leveraging 2D tracking points to align the bracelet with the wrist's motion and monocular depth maps to handle occlusions.
2) Shading-driven enhancement. We then decompose the image into albedo and shading components. A diffusion-based model refines the shading to adapt the bracelet's lighting to the scene, while the albedo preserves color consistency.
3) Temporal smoothing. Finally, we smooth the bracelet and its shadows over time, optimizing the 3DGS model and interpolating frames to ensure smooth transitions across the video.
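The occlusion handling in step 1 reduces to a per-pixel depth test: an object pixel is shown only where the rendered object is closer to the camera than the scene surface estimated by the monocular depth map. A minimal sketch (function and array names are ours, not the paper's; it also assumes the monocular depth has already been aligned to the render's scale):

```python
import numpy as np

def composite_with_occlusion(frame, render_rgb, render_depth, scene_depth, mask):
    """Composite a rendered object into a video frame, hiding pixels where
    the scene (e.g. the wrist) sits in front of the object.

    frame:        (H, W, 3) background video frame
    render_rgb:   (H, W, 3) object render (e.g. from 3DGS)
    render_depth: (H, W)    depth of the rendered object
    scene_depth:  (H, W)    monocular depth estimate of the frame,
                            assumed scale-aligned with render_depth
    mask:         (H, W)    bool, True where the object was rendered
    """
    # The object is visible only where it lies in front of the scene surface.
    visible = mask & (render_depth < scene_depth)
    out = frame.copy()
    out[visible] = render_rgb[visible]
    return out
```

In practice the hard depth test would be softened (e.g. a depth tolerance or feathered mask) to avoid flicker at occlusion boundaries.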
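Step 3's temporal smoothing can be illustrated with the simplest possible stand-in, an exponential moving average over per-frame object renders (the paper instead optimizes the 3DGS model and interpolates frames; this sketch only conveys the idea of suppressing frame-to-frame jitter):

```python
import numpy as np

def temporal_smooth(frames, alpha=0.8):
    """Blend each frame with the smoothed history to damp flicker.

    frames: list of (H, W, 3) arrays for the inserted object/shadow layer
    alpha:  weight on the current frame; alpha=1 leaves frames unchanged
    """
    out = [frames[0]]
    for f in frames[1:]:
        out.append(alpha * f + (1 - alpha) * out[-1])
    return out
```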
Our method achieves realistic bracelet insertion with proper lighting, shadows, and temporal consistency.
@inproceedings{gao2025gallery,
title={From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos},
author={Gao, Chenjian and Ding, Lihe and Han, Rui and Huang, Zhanpeng and Wang, Zibin and Xue, Tianfan},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025},
}