LangSplat: 3D Language Gaussian Splatting
Stores distilled language features on Gaussians and splats them for open-vocabulary 3D grounding, avoiding costly NeRF volume rendering.
Authors / Team
Minghan Qin · Researcher
Year
2024
Deep Dive
LangSplat targets open-vocabulary querying in 3D by attaching language features to 3D Gaussians and rendering them with a tile-based rasterizer, the same mechanism used for RGB splatting, instead of ray marching through a NeRF. A scene-wise language autoencoder compresses CLIP embeddings into low-dimensional latents, which cuts memory relative to storing full embeddings on every Gaussian. Hierarchical semantics learned from SAM (Segment Anything Model) masks sharpen object boundaries. The paper reports a 199× speedup over LERF, a prior NeRF-based language field, at 1440 × 1080 resolution, along with improved grounding quality.
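To make the "splat features like colors" idea concrete, here is a minimal NumPy sketch of front-to-back alpha compositing applied to per-Gaussian feature vectors at a single pixel. It is an illustration under stated assumptions, not the authors' CUDA rasterizer: the function name `composite_features`, the toy features, and the opacities are all hypothetical, and the Gaussians are assumed to be already depth-sorted and projected to the pixel.

```python
import numpy as np

def composite_features(features, alphas):
    """Front-to-back alpha compositing of feature vectors for one pixel.

    features: (N, D) array, one feature per Gaussian, sorted near to far (N >= 1).
    alphas:   (N,) array of per-Gaussian opacities in [0, 1] at this pixel.
    Returns the blended (D,) feature and the (N,) blending weights.
    """
    features = np.asarray(features, dtype=np.float64)
    alphas = np.asarray(alphas, dtype=np.float64)
    # Transmittance reaching each Gaussian: product of (1 - alpha) over nearer ones.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ features, weights

# Hypothetical toy data: three Gaussians carrying 3-d language latents.
feats = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
alphas = np.array([0.5, 0.5, 1.0])
pixel_feature, weights = composite_features(feats, alphas)
print(weights)        # [0.5  0.25 0.25]
print(pixel_feature)  # [0.5  0.25 0.25]
```

The weights are exactly those used for color in standard alpha blending; only the blended quantity changes, which is why rendering a language feature map can cost about as much as rendering an RGB image.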
What we learn
- 01: Splatting language features on explicit primitives brings the cost of language-feature inference close to that of RGB splatting.
- 02: Scene-specific latents and segmentation priors mitigate the fuzzy boundaries of CLIP features in 3D.
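The scene-specific latent idea can be sketched as follows. This is a simplified stand-in, not the paper's method: a linear encoder/decoder fitted with SVD (PCA) is swapped in for a learned autoencoder, the "CLIP features" are synthetic clustered vectors, the latent size of 3 is assumed for illustration, and the query score is plain cosine similarity. All names (`encode`, `decode`, `cosine_relevance`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one scene's CLIP features: 200 unit vectors of
# dimension 512 drawn from three clusters, so they lie near a small subspace.
centers = rng.normal(size=(3, 512))
labels = rng.integers(0, 3, size=200)
clip_feats = centers[labels] + 0.01 * rng.normal(size=(200, 512))
clip_feats /= np.linalg.norm(clip_feats, axis=1, keepdims=True)

# Linear "autoencoder" fitted per scene via SVD (PCA), swapped in for an MLP.
mean = clip_feats.mean(axis=0)
_, _, vt = np.linalg.svd(clip_feats - mean, full_matrices=False)
basis = vt[:3]  # (3, 512): encoder and decoder share this basis

def encode(x):
    """512-d feature -> 3-d latent (what would be stored on each Gaussian)."""
    return (x - mean) @ basis.T

def decode(z):
    """3-d latent -> 512-d feature back in the CLIP embedding space."""
    return z @ basis + mean

def cosine_relevance(decoded, text_embedding):
    """Cosine similarity between decoded features and one text embedding."""
    decoded = decoded / np.linalg.norm(decoded, axis=-1, keepdims=True)
    text_embedding = text_embedding / np.linalg.norm(text_embedding)
    return decoded @ text_embedding

latents = encode(clip_feats)   # (200, 3) instead of (200, 512)
recon = decode(latents)

# Hypothetical text query: reuse the first cluster center as the "text embedding".
scores = cosine_relevance(recon, centers[0])
best = labels[np.argmax(scores)]  # cluster of the highest-scoring feature
```

Because the latents are fitted to a single scene, a very small code can still reconstruct that scene's features well enough for querying; the same basis would not be expected to transfer to a different scene.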
Verbatim quote
"Humans live in a 3D world and commonly use natural language to interact with a 3D scene."