SADG: Segment Any Dynamic Gaussian Without Object Trackers

¹Technical University of Munich, ²Munich Center for Machine Learning

Abstract

Understanding dynamic 3D scenes is fundamental for various applications, including extended reality (XR) and autonomous driving. Effectively integrating semantic information into 3D reconstruction enables a holistic representation that opens opportunities for immersive and interactive applications. To this end, we introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines a dynamic Gaussian Splatting representation with semantic information without relying on object IDs. We propose to learn semantically-aware features by leveraging masks generated by the Segment Anything Model (SAM) and our novel contrastive learning objective based on hard pixel mining. The learned Gaussian features can be effectively clustered without further post-processing. This enables fast computation for downstream object-level editing, such as object removal, composition, and style transfer, by manipulating the Gaussians in the scene. Due to the lack of a consistent evaluation protocol, we extend several dynamic novel-view datasets with segmentation benchmarks that allow testing of learned feature fields from unseen viewpoints. We evaluate SADG on the proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes, along with its effectiveness for downstream editing tasks.

GUI Demo

Approach

Teaser Image

Given a dynamic reconstruction, we learn Gaussian features using our contrastive, semantically-aware learning based on SAM masks. Once the features are learned, clustering (DBSCAN) is performed directly on them, and the corresponding segmentation field can be rendered. We demonstrate the applicability of our representation to various scene-editing applications, including segmentation of a target object by click or text prompt in our GUI, object removal, and scene composition.
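To make the clustering step concrete, the following is a minimal, dependency-free sketch of DBSCAN applied to per-Gaussian feature vectors, followed by object removal as a simple filter on cluster labels. It is an illustration under stated assumptions and not the SADG implementation: the toy 2D `features`, the `eps` and `min_samples` values, and the helper names `region_query` and `dbscan` are all hypothetical. In practice one would more likely use an existing library routine such as scikit-learn's `DBSCAN` on the learned high-dimensional features.

```python
import math

NOISE = -1  # label for points that end up in no cluster


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN over a list of feature vectors; returns one label per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be claimed later as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Toy stand-in for learned Gaussian features (assumed 2D for readability).
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # object A
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # object B
    (10.0, 0.0),                          # isolated Gaussian
]
labels = dbscan(features, eps=0.5, min_samples=2)

# Object removal as a filter: drop every Gaussian assigned to cluster 0.
kept = [f for f, label in zip(features, labels) if label != 0]
```

Because the labels live on the Gaussians themselves, edits such as removal or composition reduce to selecting or discarding Gaussians by label before rendering, which is why no per-frame object tracker is needed in this sketch.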

Segment Object in Novel Views by Click Prompt

Experience sync issues with the GIFs? Simply refresh the webpage to synchronize playback.

Segment Object in Novel Views by Text Prompt

Text Prompt: hands with cookie

Text Prompt: pan and stove

Text Prompt: two hands

Text Prompt: dog

Scene Editing in Dynamic Scene

Object Removal

Object Style Transfer

Object Composition

BibTeX


        @misc{li2024sadgsegmentdynamicgaussian,
          title={SADG: Segment Any Dynamic Gaussian Without Object Trackers}, 
          author={Yun-Jin Li and Mariia Gladkova and Yan Xia and Daniel Cremers},
          year={2024},
          eprint={2411.19290},
          archivePrefix={arXiv},
          primaryClass={cs.CV},
          url={https://arxiv.org/abs/2411.19290}, 
        }