Main Limitations of 3DGS
Geometric precision issues: 3DGS represents surfaces with semi-transparent Gaussian ellipsoids, making them inherently 'fuzzy', so a precise mesh cannot be extracted directly. Gaussians are discrete primitives with no topological information, which prevents direct surface-normal extraction and makes geometric editing (deformation, cutting) difficult.

Memory and storage overhead: a typical 3DGS model contains 1-5 million Gaussians with 59 parameters each, yielding files of roughly 100 MB-1 GB, far larger than NeRF's few megabytes. Rendering also requires additional memory for sorting indices and intermediate results, with peak usage reaching 2-3× the model size.
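The 59 parameters per Gaussian follow the standard 3DGS layout: 3 for position, 3 for scale, 4 for the rotation quaternion, 1 for opacity, and 48 spherical-harmonic color coefficients (degree 3 gives 16 per RGB channel). A back-of-envelope sketch of the resulting storage, assuming uncompressed float32 parameters:

```python
# Storage estimate for a 3DGS model, assuming the vanilla 3DGS parameter
# layout with degree-3 spherical harmonics and float32 storage.
pos, scale, rot, opacity = 3, 3, 4, 1
sh_coeffs = (3 + 1) ** 2 * 3          # 16 SH coefficients per RGB channel = 48
params = pos + scale + rot + opacity + sh_coeffs
print(params)                          # 59 parameters per Gaussian

bytes_per_gaussian = params * 4        # float32 = 4 bytes
for n_millions in (1, 5):
    mb = n_millions * 1_000_000 * bytes_per_gaussian / 1e6
    print(f"{n_millions}M Gaussians: about {mb:.0f} MB")
```

At 1-5 million Gaussians this gives roughly 236 MB to 1.2 GB, consistent with the 100 MB-1 GB file sizes cited above (real files vary with on-disk format and compression).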
Limited handling of reflective and transparent surfaces: 3DGS encodes color with spherical harmonics, but low-degree SH is inherently low-frequency and cannot express sharp specular highlights. Glass refraction, water-surface reflection, and subsurface scattering in translucent materials are all blind spots for 3DGS.

Limited out-of-distribution viewpoint extrapolation: quality is excellent near the training viewpoints but degrades farther away, because the Gaussians are optimized for the training views and generalize poorly to unseen regions. These limitations have sparked extensive research: SuGaR (surface-aligned Gaussians), 2D Gaussian Splatting, Ref-GS (reflective Gaussians), and other directions continue to emerge.
The Core Idea of 4D Gaussian Splatting
The original 3DGS assumes scenes are completely static, producing ghosting and blur for dynamic scenes (human motion, object movement). The core idea of 4D Gaussian Splatting (4DGS) is: instead of storing independent 3DGS for each time point, learn a Deformation Field describing how Gaussians change over time. Given canonical space Gaussian G₀ and time t, the deformation field Φ_t maps it to Gaussian G_t at time t, with deformation including position change Δμ(t), rotation change ΔR(t), and scale change ΔS(t).
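The canonical-to-deformed mapping can be sketched in a few lines. The `deformation_field` stub below is a hypothetical toy motion standing in for the learned network Φ_t; the additive-offset scheme (apply Δμ, Δr, Δs to the canonical Gaussian) follows the description above:

```python
import math
from dataclasses import dataclass

# Sketch of mapping a canonical Gaussian G0 to its deformed state G_t.
# The deformation field here is a toy sinusoidal motion; in real 4DGS it is
# a small MLP conditioned on spacetime features.

@dataclass
class Gaussian:
    mu: tuple     # 3D center
    rot: tuple    # rotation quaternion (w, x, y, z)
    scale: tuple  # per-axis scale

def deformation_field(mu, t):
    """Stand-in for Phi_t: returns additive offsets (d_mu, d_rot, d_scale)."""
    d_mu = (0.1 * math.sin(2 * math.pi * t), 0.0, 0.0)  # toy x-axis motion
    d_rot = (0.0, 0.0, 0.0, 0.0)                        # identity offset
    d_scale = (0.0, 0.0, 0.0)
    return d_mu, d_rot, d_scale

def deform(g0: Gaussian, t: float) -> Gaussian:
    d_mu, d_rot, d_scale = deformation_field(g0.mu, t)
    # A real implementation would renormalize the quaternion after the offset.
    return Gaussian(
        mu=tuple(m + d for m, d in zip(g0.mu, d_mu)),
        rot=tuple(r + d for r, d in zip(g0.rot, d_rot)),
        scale=tuple(s + d for s, d in zip(g0.scale, d_scale)),
    )

g0 = Gaussian(mu=(0.0, 0.0, 0.0), rot=(1.0, 0.0, 0.0, 0.0), scale=(1.0, 1.0, 1.0))
g_t = deform(g0, 0.25)  # the same Gaussian, displaced at time t = 0.25
```

The key design point is that only the offsets are time-dependent; the canonical Gaussians are stored once and reused for every frame.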
4DGS borrows HexPlane's idea: decompose 4D spacetime (3D space + time) into six 2D planes (three spatial: XY/XZ/YZ, plus three spatiotemporal: XT/YT/ZT), each stored as a feature grid; the feature at an arbitrary spacetime point is obtained by interpolating each plane and fusing the results. The MLP that predicts deformation parameters from HexPlane features is very small (a few layers with tens of neurons each), which preserves real-time performance. This design is compact, continuous (any moment can be rendered), and efficient, and it is the key to 4DGS expressing dynamic scenes while maintaining real-time rendering.
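The plane query and fusion step can be sketched as follows. The grid resolution, feature dimension, and random contents are placeholder assumptions; the element-wise multiplicative fusion mirrors the HexPlane-style factorization described above:

```python
import random

# Sketch of a HexPlane-style spacetime feature query: six small 2D feature
# grids (xy, xz, yz, xt, yt, zt), bilinear interpolation on each, and
# element-wise multiplicative fusion. Resolution/feature sizes are toy values.

RES, DIM = 8, 4
random.seed(0)
planes = {name: [[[random.random() for _ in range(DIM)]
                  for _ in range(RES)] for _ in range(RES)]
          for name in ("xy", "xz", "yz", "xt", "yt", "zt")}

def bilerp(grid, u, v):
    """Bilinearly interpolate a feature grid at normalized coords (u, v) in [0, 1]."""
    x, y = u * (RES - 1), v * (RES - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, RES - 1), min(y0 + 1, RES - 1)
    fx, fy = x - x0, y - y0
    return [grid[x0][y0][d] * (1 - fx) * (1 - fy) + grid[x1][y0][d] * fx * (1 - fy)
            + grid[x0][y1][d] * (1 - fx) * fy + grid[x1][y1][d] * fx * fy
            for d in range(DIM)]

def query(x, y, z, t):
    """Fuse the six plane features into one spacetime feature vector."""
    coords = {"xy": (x, y), "xz": (x, z), "yz": (y, z),
              "xt": (x, t), "yt": (y, t), "zt": (z, t)}
    feat = [1.0] * DIM
    for name, (u, v) in coords.items():
        feat = [a * b for a, b in zip(feat, bilerp(planes[name], u, v))]
    return feat  # in the real model, this is fed to the small deformation MLP

f = query(0.3, 0.7, 0.5, 0.2)
```

Because each plane is only 2D, storage grows quadratically with resolution rather than with the fourth power a dense 4D grid would require, which is where the compactness comes from.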
4DGS Training, Applications, and Future
4DGS training has two phases: first train the canonical-space 3DGS (ignoring temporal variation), then freeze the canonical Gaussians and train the deformation field. Training data can be multi-view synchronized video or single-view video (which requires stronger regularization). The loss function adds temporal-smoothness and sparsity constraints on top of the standard rendering loss. 4DGS applications include virtual-human motion capture, motion analysis (sports/medical), dynamic scene reconstruction for film effects, temporal interpolation (slow motion), and bullet-time effects.
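The loss structure can be sketched as below. The L1 rendering term and the finite-difference smoothness penalty on Gaussian centers are illustrative choices, and the weight `lam` is a placeholder; real implementations differ in their exact regularizers and weights:

```python
# Sketch of a 4DGS-style training loss: rendering term plus a temporal-
# smoothness regularizer. Both terms here are simplified stand-ins.

def l1_loss(rendered, target):
    """Per-pixel L1 rendering loss (stand-in for the full 3DGS photometric loss)."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)

def temporal_smoothness(mu_prev, mu_now, mu_next):
    """Penalize acceleration of Gaussian centers across adjacent time steps."""
    loss = 0.0
    for p, c, n in zip(mu_prev, mu_now, mu_next):
        accel = [pn - 2 * pc + pp for pp, pc, pn in zip(p, c, n)]
        loss += sum(a * a for a in accel)
    return loss / len(mu_now)

def total_loss(rendered, target, mu_prev, mu_now, mu_next, lam=0.01):
    return l1_loss(rendered, target) + lam * temporal_smoothness(mu_prev, mu_now, mu_next)

# Toy example: two Gaussians moving linearly, so the smoothness penalty
# is (numerically) zero and only the rendering term contributes.
mu_prev = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mu_now  = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]
mu_next = [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)]
loss = total_loss([0.5, 0.5], [0.4, 0.6], mu_prev, mu_now, mu_next)
```

The smoothness term is what makes single-view training workable at all: with only one camera per time step, it constrains how much the Gaussians may move between frames.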
Current 4DGS limitations: difficulty handling topological changes (objects appearing or disappearing), heavy computation and storage costs for large-scale dynamic scenes, and a purely data-driven formulation that ignores physical laws. Future research directions include physics-guided deformation (rigid-body dynamics, cloth simulation), semantic decomposition (modeling scene objects independently), combining generative capabilities with diffusion models, and real-time capture systems for live streaming and AR/VR. 3DGS is evolving from 'photo-realistic static reconstruction' toward a new era of 'unified spacetime dynamic representation'.