Understanding dynamic 3D scenes is crucial for extended reality (XR) and autonomous driving. Incorporating semantic information into 3D reconstruction enables holistic scene representations, unlocking immersive and interactive applications. To this end, we introduce TRASE, a novel tracking-free 4D segmentation method for dynamic scene understanding. TRASE learns a 4D segmentation feature field in a weakly-supervised manner, leveraging a soft-mined contrastive learning objective guided by SAM masks. The resulting feature space is semantically coherent and well-separated, and final object-level segmentation is obtained via unsupervised clustering. This enables fast editing, such as object removal, composition, and style transfer, by directly manipulating the scene's Gaussians. We evaluate TRASE on five dynamic benchmarks, demonstrating state-of-the-art segmentation performance from unseen viewpoints and its effectiveness across various interactive editing tasks.
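To make the training signal concrete, below is a minimal sketch of a SAM-mask-guided contrastive loss on rendered per-pixel features, written as a plain supervised-contrastive (InfoNCE-style) objective. The soft-mining weights of the full objective are omitted, and the function name, arguments, and temperature are illustrative placeholders rather than the released API.

```python
import torch
import torch.nn.functional as F

def mask_guided_contrastive_loss(features, mask_ids, temperature=0.1):
    """Pull rendered per-pixel features belonging to the same SAM mask
    together and push features from different masks apart.

    features : (N, D) per-pixel features sampled from a rendered feature map
    mask_ids : (N,)   SAM mask index of each sampled pixel
    """
    features = F.normalize(features, dim=-1)
    sim = features @ features.T / temperature              # (N, N) similarities
    eye = torch.eye(len(features), dtype=torch.bool, device=features.device)
    pos = (mask_ids[:, None] == mask_ids[None, :]) & ~eye  # same-mask pairs

    logits = sim.masked_fill(eye, float("-inf"))           # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)              # avoid -inf * 0 = NaN

    # Average log-likelihood over positives; anchors without positives
    # (single-pixel masks) contribute zero loss.
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```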
Given a dynamic reconstruction, we learn per-Gaussian features with our semantically-aware contrastive objective guided by SAM masks. Once the features are learned, clustering (DBSCAN) is performed directly on the Gaussian features, and the corresponding segmentation field can be rendered. We demonstrate the applicability of our representation on various scene-editing applications, including segmenting a target object via click or text prompts in our GUI, object removal, and scene composition.
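As a concrete illustration of the clustering step, the sketch below runs scikit-learn's DBSCAN directly on the learned per-Gaussian features to assign an object ID to every Gaussian. The function name and the eps/min_samples values are hypothetical placeholders to be tuned per scene, not values from our implementation.

```python
from sklearn.cluster import DBSCAN

def cluster_gaussian_features(gaussian_features, eps=0.3, min_samples=50):
    """Density-based clustering of learned per-Gaussian features.

    gaussian_features : (num_gaussians, D) array of learned features
    Returns a (num_gaussians,) array of object IDs; -1 marks noise.
    """
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(gaussian_features)

# Hypothetical usage: pick all Gaussians of one object for removal,
# composition, or style transfer by masking on its cluster ID.
# labels = cluster_gaussian_features(features)   # features: (N, D) array
# object_gaussians = labels == target_cluster_id
```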
Experiencing sync issues with the GIFs? Simply refresh the webpage to synchronize playback.
@article{li2024sadg,
  title={SADG: Segment Any Dynamic Gaussian Without Object Trackers},
  author={Li, Yun-Jin and Gladkova, Mariia and Xia, Yan and Cremers, Daniel},
  journal={arXiv preprint arXiv:2411.19290},
  year={2024}
}