
Volume Rendering, Radiance Fields, and Gaussian Splatting


Visual: rendering pipeline comparing NeRF ray sampling/alpha compositing with Gaussian splat projection/rasterization and robotics-map constraints.

Classical mapping usually asks for explicit geometry: points, planes, meshes, occupancy, or signed distance. Neural and differentiable rendering methods ask a different question: what color would a camera ray see if the scene were represented as a continuous field of density and radiance?

NeRF and 3D Gaussian Splatting are not magic replacements for geometry. They are image-formation models. Their strength is photorealistic view synthesis and differentiable optimization from images. Their weakness is that radiance, density, exposure, camera pose, dynamics, and geometry can explain each other unless the capture and priors are controlled.



2. Volume Rendering From First Principles

A camera ray is:

text
r(s) = o + s d

where o is the camera center, d is the ray direction, and s is the distance along the ray. A radiance field predicts:

text
sigma(x) = volume density
c(x, d)  = emitted/reflected color toward direction d

The probability that light reaches distance s without being absorbed is transmittance:

text
T(s) = exp( - integral_s_near^s sigma(r(u)) du )

The rendered color is:

text
C(r) = integral_s_near^s_far T(s) sigma(r(s)) c(r(s), d) ds

Discrete alpha compositing approximates this with samples:

text
alpha_i = 1 - exp(-sigma_i delta_i)
T_i = product_j<i (1 - alpha_j)
C = sum_i T_i alpha_i c_i

This is the core of NeRF-style rendering and also the conceptual bridge to Gaussian splatting alpha compositing.
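
As a minimal sketch of this discrete compositing (assuming per-sample densities, colors, and spacings along one ray are already available, e.g. from stratified sampling), the three equations map directly onto a few lines of NumPy:

python
import numpy as np

def composite_ray(sigma, color, delta):
    """Alpha-composite samples along a single ray.

    sigma: (N,) densities at the sample points
    color: (N, 3) RGB values at the sample points
    delta: (N,) distances between consecutive samples
    """
    alpha = 1.0 - np.exp(-sigma * delta)                            # alpha_i = 1 - exp(-sigma_i delta_i)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha))[:-1])   # T_i = prod_{j<i} (1 - alpha_j)
    weights = trans * alpha                                         # w_i = T_i alpha_i
    return weights @ color                                          # C = sum_i w_i c_i

# toy example: three samples, the middle one dense and red
sigma = np.array([0.0, 50.0, 0.0])
color = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
delta = np.array([0.1, 0.1, 0.1])
print(composite_ray(sigma, color, delta))   # approximately [1, 0, 0]

The same weights T_i alpha_i can also composite depth or other per-sample quantities, which is how depth maps are usually extracted from these models.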


3. NeRF

NeRF represents the field with a neural network:

text
(sigma, color) = MLP( gamma(x), gamma(d) )

where gamma is a positional encoding that helps the MLP represent high-frequency detail. Training minimizes photometric reconstruction error over known camera poses:

text
min_theta sum_pixels || C_pred(r; theta) - C_image ||^2
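
A minimal sketch of the encoding gamma and the per-pixel loss, with the MLP itself left abstract (any differentiable function mapping encoded inputs to (sigma, rgb) would slot in):

python
import numpy as np

def gamma(x, num_freqs=4):
    """Positional encoding: sin/cos of each coordinate at octave-spaced frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi       # pi, 2 pi, 4 pi, ...
    angles = x[..., None] * freqs                        # (..., D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                # (..., D * 2F)

def photometric_loss(pred_rgb, gt_rgb):
    """Mean squared reconstruction error over a batch of rays."""
    return np.mean(np.sum((pred_rgb - gt_rgb) ** 2, axis=-1))

x = np.random.rand(8, 3)                                 # 8 sample positions
print(gamma(x).shape)                                    # (8, 24): 3 coords * 2 * 4 frequencies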

The original NeRF pipeline depends on:

  • calibrated images,
  • camera poses, often from structure-from-motion,
  • static scenes or explicit dynamic modeling,
  • many rays and samples per ray,
  • differentiable rendering through the volume integral.

What NeRF Learns

NeRF does not directly learn a mesh. It learns density and view-dependent color. Geometry is often inferred from density peaks or converted to a mesh later. This is why NeRF can render compelling views while still having imperfect metric geometry, floaters, or density in empty space.
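
For example, one common post-processing step is to sample the trained density on a regular grid and threshold it into occupancy before meshing. A hedged sketch, where density_fn stands in for the trained field and the threshold is a scene-dependent tuning parameter:

python
import numpy as np

def density_to_occupancy(density_fn, bounds, resolution=64, threshold=10.0):
    """Sample a density field on a grid and threshold it into occupancy.

    density_fn: callable mapping (N, 3) points to (N,) densities
    bounds:     (min_xyz, max_xyz) axis-aligned box to sample
    threshold:  density above this value counts as occupied (scene dependent)
    """
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    axes = [np.linspace(lo[i], hi[i], resolution) for i in range(3)]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    sigma = density_fn(pts).reshape(resolution, resolution, resolution)
    return sigma > threshold   # boolean occupancy grid; a mesh can be extracted from it later

# toy field: a dense sphere of radius 0.5 around the origin
occ = density_to_occupancy(lambda p: 100.0 * (np.linalg.norm(p, axis=-1) < 0.5),
                           bounds=([-1, -1, -1], [1, 1, 1]))
print(occ.sum(), "occupied voxels")

Floaters and stray density in empty space show up directly in such a grid, which is one reason photometric quality and geometric quality should be evaluated separately.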


4. 3D Gaussian Splatting

3D Gaussian Splatting represents a scene as many anisotropic Gaussian primitives:

text
G_k = { mean mu_k, covariance Sigma_k, opacity alpha_k, color coefficients }

The 3D Gaussian density has the shape:

text
G_k(x) = exp( -0.5 * (x - mu_k)^T Sigma_k^-1 (x - mu_k) )

Each Gaussian is projected into the image as a 2D ellipse, depth-sorted (or otherwise resolved for visibility), and alpha-composited:

text
C_pixel = sum_k T_k alpha_k c_k
T_k = product_j<k (1 - alpha_j)

The major practical change from NeRF is rendering speed. Instead of sampling an MLP many times along every ray, 3DGS rasterizes optimized primitives. The canonical 2023 method initializes from sparse SfM points, optimizes positions, anisotropic covariances, opacity, and color, and performs density control by splitting/pruning Gaussians.
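
A hedged sketch of the projection step for a single Gaussian, assuming a pinhole camera and the usual first-order approximation Sigma_2D = J W Sigma W^T J^T, where W is the world-to-camera rotation and J is the Jacobian of perspective projection at the Gaussian mean:

python
import numpy as np

def project_gaussian(mu_w, Sigma_w, R_cw, t_cw, fx, fy):
    """Project one 3D Gaussian into the image plane as a 2D ellipse.

    mu_w, Sigma_w: mean and covariance in world coordinates
    R_cw, t_cw:    world-to-camera rotation and translation
    fx, fy:        focal lengths in pixels
    """
    mu_c = R_cw @ mu_w + t_cw                       # mean in the camera frame
    x, y, z = mu_c
    # Jacobian of (u, v) = (fx x / z, fy y / z) evaluated at mu_c
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    Sigma_2d = J @ R_cw @ Sigma_w @ R_cw.T @ J.T    # 2x2 covariance of the splat
    uv = np.array([fx * x / z, fy * y / z])         # projected center (principal point omitted)
    return uv, Sigma_2d

uv, S = project_gaussian(np.array([0.0, 0.0, 2.0]), 0.01 * np.eye(3),
                         np.eye(3), np.zeros(3), fx=500.0, fy=500.0)
print(uv, S)

The per-pixel compositing then reuses the same alpha/transmittance logic as Section 2, with each Gaussian's opacity modulated by its 2D ellipse footprint.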


5. Radiance Fields vs Robotics Maps

Need                             | Radiance field / 3DGS | TSDF / ESDF / occupancy
---------------------------------|-----------------------|-----------------------------------
photorealistic view synthesis    | strong                | weak to moderate
metric free-space guarantee      | weak                  | strong if sensor model is correct
collision checking               | indirect              | direct with ESDF/occupancy
dynamic obstacle semantics       | needs extra model     | layered maps and trackers
online local planning            | currently hard        | standard
differentiable image supervision | strong                | usually indirect
long-term map maintenance        | active research       | mature engineering patterns

For autonomy, radiance fields are most useful as appearance-rich scene models, simulation assets, map compression/reconstruction research, sensor simulation, and perception pretraining context. They should not be treated as certified collision maps without additional occupancy, uncertainty, and change-detection machinery.


6. Pose, Calibration, and Scale

The rendering loss assumes the ray is correct:

text
r = camera_ray(K, T_WC, pixel)

If intrinsics, extrinsics, rolling shutter, exposure, or pose are wrong, the field can compensate by growing blurry density, duplicating surfaces, or inventing view-dependent color. Joint pose/radiance optimization is possible, but it changes the problem into bundle adjustment with a very flexible scene prior.
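
A minimal sketch of the camera_ray step above, assuming a pinhole model with 3x3 intrinsics K and a 4x4 camera-to-world transform T_WC; this convention is an assumption, and exactly the kind of thing the implementation checklist asks to document:

python
import numpy as np

def camera_ray(K, T_WC, pixel):
    """Return (origin, unit direction) of the ray through a pixel.

    K:     3x3 pinhole intrinsics
    T_WC:  4x4 camera-to-world transform (camera pose in the world frame)
    pixel: (u, v) pixel coordinates
    """
    u, v = pixel
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction in the camera frame
    R, t = T_WC[:3, :3], T_WC[:3, 3]
    d_world = R @ d_cam
    return t, d_world / np.linalg.norm(d_world)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
o, d = camera_ray(K, np.eye(4), pixel=(320, 240))
print(o, d)   # origin at the world origin, ray along +z for the principal point

Any error in K or T_WC shifts every ray consistently, and the optimizer has no choice but to absorb that error into the field itself.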

Important pose failure modes:

  • SfM scale ambiguity for monocular captures,
  • rolling-shutter images treated as global shutter,
  • moving objects baked into the static field,
  • exposure/white-balance differences modeled as geometry,
  • sparse camera baselines causing depth ambiguity,
  • reflective/transparent surfaces violating simple radiance assumptions.

7. Practical Intuition

NeRF

NeRF asks: "Along this ray, where is there opacity, and what color is emitted toward this camera?" It is powerful because the renderer is differentiable and the field is continuous. It is slow because each ray needs many samples and network evaluations.

3D Gaussian Splatting

3DGS asks: "Which learned ellipsoids project into this pixel, and how do their opacities and colors composite?" It is fast because rasterization handles many primitives efficiently. It is less implicit than NeRF but still not a clean surface map by default.


8. Implementation Checklist

  • Start with calibrated images and reliable camera poses; inspect reprojection errors before training.
  • Normalize scene scale and coordinate frames; document T_world_camera convention.
  • Mask dynamic objects, mirrors, sky, and saturated regions when they would corrupt static geometry.
  • Split train/test views by viewpoint, not adjacent frames only (see the sketch after this list).
  • Track photometric metrics and geometric diagnostics separately.
  • For NeRF, tune near/far bounds and sampling so rays do not waste most samples in empty space.
  • For 3DGS, monitor Gaussian count, opacity collapse, oversized covariances, and floaters.
  • Use held-out camera paths to find overfitting to training views.
  • Do not use a radiance field as a planner map unless occupancy/free-space semantics are explicitly derived and validated.
  • Preserve original images, poses, masks, and training config for replayability.
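
As an example of the viewpoint-based split mentioned above, a hedged sketch that greedily holds out cameras that are far apart in space rather than consecutive frames (the 10% fraction is an arbitrary illustration):

python
import numpy as np

def split_by_viewpoint(cam_positions, test_fraction=0.1):
    """Greedy farthest-point selection of held-out test cameras.

    cam_positions: (N, 3) camera centers in the world frame
    Returns boolean masks (train, test).
    """
    n = len(cam_positions)
    n_test = max(1, int(round(test_fraction * n)))
    test_idx = [0]                                       # seed with an arbitrary camera
    dist = np.linalg.norm(cam_positions - cam_positions[0], axis=1)
    for _ in range(n_test - 1):
        nxt = int(np.argmax(dist))                       # farthest from all cameras chosen so far
        test_idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(cam_positions - cam_positions[nxt], axis=1))
    test = np.zeros(n, dtype=bool)
    test[test_idx] = True
    return ~test, test

train, test = split_by_viewpoint(np.random.rand(50, 3))
print(train.sum(), "train views,", test.sum(), "test views")

Holding out spatially spread cameras is what exposes overfitting to the training trajectory; adjacent-frame splits mostly measure interpolation between nearly identical views.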

9. Common Failure Modes

Symptom                         | Likely cause
--------------------------------|----------------------------------------------------------------
floaters in empty space         | pose noise, insufficient views, weak density prior
blurry surfaces                 | pose/exposure mismatch or under-capacity field
duplicated geometry             | dynamic objects, loop/pose error, rolling shutter
good novel views but bad mesh   | radiance fits the images without a clean density iso-surface
transparent objects look wrong  | simple volume model does not capture refraction/reflection
scale wrong                     | monocular pose source without metric scale
3DGS splats explode in size     | covariance optimization unconstrained or sparse views
simulator collisions wrong      | appearance field used as geometry without occupancy validation

10. How It Connects to Perception and Mapping

  • View synthesis can create realistic validation views from captured sites.
  • Differentiable rendering can supervise geometry or semantics from images.
  • Radiance fields can enrich HD maps with appearance for localization research.
  • Gaussian splats can provide fast, inspectable scene playback.
  • Neural LiDAR/radiance mapping methods borrow volume-rendering ideas but still need range, occupancy, and uncertainty models for autonomy-grade planning.

The first-principles bridge is the measurement model. Classical SLAM predicts features or ranges; radiance-field SLAM predicts pixels. Both optimize a state so predicted measurements match observed measurements.
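
A schematic illustration of that shared structure, where predict_feature and render_pixel are hypothetical stand-ins for a SLAM front end and a radiance-field renderer respectively:

python
import numpy as np

def geometric_residual(predict_feature, state, observed_uv):
    """Classical SLAM: reproject a landmark and compare with its detected pixel."""
    return predict_feature(state) - observed_uv

def photometric_residual(render_pixel, state, observed_rgb):
    """Radiance-field SLAM: render a pixel from the state and compare with the image."""
    return render_pixel(state) - observed_rgb

# both residuals are minimized over the state (poses plus landmarks or field parameters)
state = np.zeros(6)
print(geometric_residual(lambda s: s[:2], state, np.array([120.0, 80.0])))
print(photometric_residual(lambda s: np.full(3, 0.5), state, np.array([0.4, 0.6, 0.5])))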


11. Sources

Research notes compiled from public sources.