印刻万物 TOP3DGS

Extended notes · Theory

Anatomy of a Splat: Shape, Color, and Opacity

Imagine filling space with countless translucent colored drops. We deconstruct the core parameters of a Gaussian and see how shape, color, and orientation recreate real-world light and shadow.

Cross-checked against public sources

Position and Covariance: Defining Shape

The first fundamental parameter of a 3D Gaussian is the mean vector μ = (x, y, z), which sets its center in world space. At initialization the means are taken from the sparse point cloud produced by Structure-from-Motion (SfM); they are then optimized during training so that the rendered views match the training images.

The covariance matrix Σ defines the shape: how far the Gaussian extends and in which directions. Optimizing the entries of a 3×3 matrix directly is awkward, because a gradient update can easily produce a matrix that is no longer symmetric positive semi-definite and therefore no longer a valid covariance. 3DGS instead uses the decomposition Σ = RSSᵀRᵀ, where S is a diagonal scaling matrix with 3 parameters (sx, sy, sz) and R is a rotation matrix represented by a unit quaternion (4 parameters). Any Σ built this way is symmetric positive semi-definite by construction, and positive-definite whenever all three scales are nonzero. Because the Gaussians are anisotropic (non-spherical), a single flat ellipsoid can cover a large planar region such as part of a wall, which greatly reduces the number of Gaussians needed.
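As a minimal sketch of the decomposition above, the following builds Σ = RSSᵀRᵀ from three scales and a quaternion using only the standard library. The function names and the (w, x, y, z) quaternion ordering are assumptions made for illustration, not taken from any particular implementation.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The quaternion is normalized first, so any nonzero 4-vector is accepted.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def covariance(scale, quat):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotmat(quat)
    S = [[scale[0], 0.0, 0.0], [0.0, scale[1], 0.0], [0.0, 0.0, scale[2]]]
    M = matmul(R, S)               # M = R S
    return matmul(M, transpose(M))  # M M^T = R S S^T R^T

# A flat, wall-like splat: wide along two axes, thin along the third,
# rotated 90 degrees about the x-axis.
half = math.sqrt(0.5)
sigma = covariance((2.0, 1.0, 0.05), (half, half, 0.0, 0.0))
```

Because Σ is formed as MMᵀ, it is symmetric and positive semi-definite whatever values the optimizer proposes for the scales and the quaternion, which is the point of the parameterization.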

Spherical Harmonics: Encoding View-Dependent Color

Real-world surface appearance is view-dependent. A single fixed RGB value cannot express effects such as specular highlights or glossy reflections, which change as the camera moves. 3DGS handles this with Spherical Harmonics (SH): orthogonal basis functions defined on the sphere, which play the role there that a Fourier basis plays on a circle. Each Gaussian's color is a weighted sum of SH basis functions evaluated at the viewing direction, and the weights are learned per Gaussian.

3DGS typically uses spherical harmonics up to degree 3, which gives (3 + 1)² = 16 basis functions and therefore 16 × 3 = 48 color parameters per Gaussian (16 coefficients × 3 RGB channels). Degree 0 alone (1 coefficient per channel) can only express a uniform, view-independent color; adding degree 1 (4 coefficients per channel) captures simple directional variation; degrees 2 and 3 capture sharper view-dependent effects. At render time, evaluating SH is just polynomial arithmetic in the components of the view direction, which parallelizes well on GPUs and gives a good balance between expressiveness and computational cost.
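To make the coefficient count and the "polynomial arithmetic" concrete, here is a small sketch that evaluates SH color up to degree 1 only. The constants are the standard real SH normalization factors; the sign convention, the +0.5 offset, and the clamp at zero follow what the reference implementation is commonly understood to do and should be treated as assumptions. The function names are hypothetical.

```python
SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi)), the degree-0 basis value
SH_C1 = 0.4886025119029199   # sqrt(3 / (4 * pi)), the degree-1 scale factor

def num_sh_coeffs(degree):
    """Number of SH basis functions up to and including `degree`."""
    return (degree + 1) ** 2

def eval_sh_deg1(coeffs, direction):
    """Evaluate RGB color from degree-0 and degree-1 SH coefficients.

    coeffs:    four (r, g, b) triples, one per basis function
    direction: unit view direction (x, y, z)
    """
    x, y, z = direction
    # Degree 0 is constant; degree 1 is linear in the direction components.
    basis = [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]
    rgb = [sum(b * c[ch] for b, c in zip(basis, coeffs)) for ch in range(3)]
    # Assumed convention: shift by 0.5 and clamp negative values to zero.
    return [max(v + 0.5, 0.0) for v in rgb]

# One Gaussian whose color brightens when seen from the +z direction.
coeffs = [(1.0, 0.5, 0.0), (0.0, 0.0, 0.0), (0.3, 0.3, 0.3), (0.0, 0.0, 0.0)]
from_above = eval_sh_deg1(coeffs, (0.0, 0.0, 1.0))
from_below = eval_sh_deg1(coeffs, (0.0, 0.0, -1.0))
params_per_gaussian = num_sh_coeffs(3) * 3  # 16 coefficients x 3 channels
```

Degrees 2 and 3 add quadratic and cubic terms in x, y, z in the same way, which is why the full evaluation remains cheap per Gaussian.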

Opacity and Volume Rendering

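The final key parameter of each Gaussian is opacity α in [0, 1]. 3DGS rendering follows the volume rendering model: with the Gaussians that cover a pixel sorted front to back, the pixel color is the weighted blend C = Σ cᵢαᵢGᵢTᵢ, where Gᵢ is the value of the i-th projected 2D Gaussian at that pixel and the accumulated transmittance Tᵢ = ∏ (1 − αⱼGⱼ) is taken over all Gaussians j in front of i. This is the same image formation model as NeRF's volume rendering; the difference is that 3DGS evaluates it by sorting and alpha-blending splats instead of sampling points along rays.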
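The blend above can be sketched as a front-to-back loop over the splats covering one pixel. The tuple layout, the function name, and the early-exit threshold are assumptions chosen for illustration; G is passed in precomputed rather than evaluated from a 2D covariance.

```python
def composite(splats, min_transmittance=1e-4):
    """Front-to-back alpha compositing for a single pixel.

    splats: iterable of (depth, (r, g, b), alpha, g_value), where g_value is
            the projected 2D Gaussian evaluated at this pixel.
    Returns the blended color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: product of (1 - alpha_j * G_j) for splats in front
    for _depth, rgb, alpha, g_value in sorted(splats, key=lambda s: s[0]):
        weight = alpha * g_value
        for ch in range(3):
            color[ch] += rgb[ch] * weight * transmittance
        transmittance *= 1.0 - weight
        if transmittance < min_transmittance:
            break  # assumed early exit: splats further back are barely visible
    return color, transmittance

# A half-transparent red splat in front of an opaque blue one.
pixel, remaining = composite([
    (5.0, (0.0, 0.0, 1.0), 1.0, 1.0),  # far, blue, opaque
    (2.0, (1.0, 0.0, 0.0), 0.5, 1.0),  # near, red, half transparent
])
```

The input order does not matter here because the function sorts by depth first; in a real rasterizer the sort is done once per tile rather than per pixel.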

Opacity plays a dual role during training. As an optimization variable, it lets important Gaussians gradually become more 'solid'. As a pruning criterion, it identifies Gaussians to delete: those whose opacity falls below a small threshold (0.005 in the reference implementation) are removed to save resources. 3DGS also periodically resets all opacities to a value close to zero (every 3,000 iterations in the original schedule), forcing the model to re-learn which Gaussians are truly important, much as apoptosis clears out unneeded cells in biological systems.
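A toy sketch of the two opacity rules described above, operating on a plain list of opacity values. The 0.005 threshold comes from the text; the reset ceiling of 0.01 and the function names are assumptions for illustration.

```python
MIN_OPACITY = 0.005    # pruning threshold quoted in the text
RESET_CEILING = 0.01   # assumed value for a near-zero reset

def prune(opacities, min_opacity=MIN_OPACITY):
    """Keep only Gaussians whose opacity is at or above the threshold."""
    return [a for a in opacities if a >= min_opacity]

def reset(opacities, ceiling=RESET_CEILING):
    """Clamp every opacity down to a small ceiling so it must be re-learned."""
    return [min(a, ceiling) for a in opacities]

opacities = [0.9, 0.003, 0.4, 0.0049, 0.02]
kept = prune(opacities)    # the two nearly transparent Gaussians are dropped
after_reset = reset(kept)  # survivors all restart from the same low opacity
```

After a reset, Gaussians that matter regain opacity through the gradient, while those that do not stay near zero and are removed at the next pruning step.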

From 3D to 2D: Projection and Rasterization

The key mathematical insight is that a 3D Gaussian remains a Gaussian under an affine map. Perspective projection is not affine, so 3DGS linearizes it around each Gaussian's center: Σ₂D = JWΣ₃DWᵀJᵀ, where W is the world-to-camera (viewing) transformation and J is the Jacobian of the affine approximation of the projective transformation. Keeping only the first two rows and columns of the result gives the 2×2 image-space covariance. The footprint of every splat can therefore be computed in closed form, without tracing rays. In total, 59 parameters (position 3, scale 3, rotation 4, SH color 48, opacity 1) define a Gaussian's complete state, and scenes with millions of such Gaussians can be rendered in real time by a fast GPU rasterizer.
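The projection formula can be sketched for a simple pinhole camera as follows. This assumes focal lengths fx and fy in pixels, a 3×3 rotation W for the world-to-camera transform, and a 2×3 Jacobian, which yields the 2×2 covariance directly. All names are hypothetical.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def project_covariance(sigma3d, W, t, fx, fy):
    """Sigma_2D = J W Sigma_3D W^T J^T for a pinhole camera.

    sigma3d: 3x3 world-space covariance
    W:       3x3 rotation part of the world-to-camera transform
    t:       Gaussian center in camera space (tx, ty, tz), with tz > 0
    """
    tx, ty, tz = t
    # Jacobian of (fx * x / z, fy * y / z) evaluated at the Gaussian center.
    J = [
        [fx / tz, 0.0, -fx * tx / (tz * tz)],
        [0.0, fy / tz, -fy * ty / (tz * tz)],
    ]
    JW = matmul(J, W)
    return matmul(matmul(JW, sigma3d), transpose(JW))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
blob = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]  # std 0.1 units

near = project_covariance(blob, identity, (0.0, 0.0, 2.0), 100.0, 100.0)
far = project_covariance(blob, identity, (0.0, 0.0, 4.0), 100.0, 100.0)
```

Doubling the depth halves the on-screen standard deviation, so the variance in `far` is a quarter of that in `near`, as expected for perspective foreshortening.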

Related learning path

understand-gaussian-splatting · Module 02

Sources