
Dynamic 4D Gaussian SLAM

Related docs: Photoreal City-Scale 4D Reconstruction, Dynamic 4D Neural/Gaussian Reconstruction, WildGS-SLAM, Dynamic-Object-Aware SLAM, Semantic SLAM, Splat-SLAM, and Gaussian Splatting for Driving.

Executive Summary

Dynamic 4D Gaussian SLAM is a 2025-2026 research wave that extends Gaussian SLAM from static 3D scenes to time-varying scenes. Instead of treating moving objects only as outliers to remove, these methods try to represent motion explicitly through dynamic Gaussians, deformation fields, motion probabilities, static/dynamic splits, or time-aware reliability estimates.

The page covers the main taxonomy: 4DGS-SLAM, 4DTAM, D4DGS-SLAM, Dy3DGS-SLAM, and DAGS-SLAM. These systems differ in sensors and modeling choices, but they share the same core problem: dynamic objects can corrupt pose tracking and pollute maps, while modeling them increases dimensionality, ambiguity, and compute.

For AVs and airside autonomy, dynamic Gaussian SLAM is important but early. It is most useful for research, offline dynamic-scene reconstruction, map cleaning, and simulation. It is not yet a production localization replacement for multi-sensor state estimation.

Keep the boundary explicit: dynamic Gaussian SLAM estimates pose while maintaining a time-aware Gaussian map. Dynamic street reconstruction methods such as Street Gaussians, DrivingGaussian, OmniRe, S3Gaussian, PVG, OG-Gaussian, and EmerNeRF usually consume externally estimated poses, tracks, or priors and produce renderable 4D scene assets. They are important adjacent methods, but they should not be counted as SLAM backbones without a live tracking and mapping loop.

Core Idea

Static Gaussian SLAM assumes one persistent scene. Dynamic 4D Gaussian SLAM adds time:

```text
Static 3DGS map:
  G_i = position, covariance, opacity, appearance

Dynamic / 4D map:
  G_i(t) = position(t), deformation(t), opacity(t), appearance(t), motion/reliability state
```
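The split above can be sketched as a per-Gaussian record. This is a minimal illustrative data structure, not the layout of any particular system; the field names and the `deform` callback are assumptions:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Gaussian:
    """Static 3DGS primitive: G_i = position, covariance, opacity, appearance."""
    position: np.ndarray    # (3,) mean in the world frame
    covariance: np.ndarray  # (3, 3) anisotropic extent
    opacity: float          # alpha in [0, 1]
    appearance: np.ndarray  # e.g. RGB or SH coefficients


@dataclass
class DynamicGaussian(Gaussian):
    """4D primitive: canonical state plus time-aware motion/reliability bookkeeping.

    A deformation field (MLP or control points) maps the canonical position
    to position(t); here we only store the extra per-Gaussian state.
    """
    motion_prob: float = 0.0  # probability this Gaussian is dynamic
    reliability: float = 1.0  # downweights this point in pose tracking
    last_seen_t: float = 0.0  # timestamp of the last supporting observation

    def position_at(self, t: float, deform) -> np.ndarray:
        """Query position(t) through an external deformation field callback."""
        return self.position + deform(self.position, t)
```

A deformation field plugged into `position_at` can be anything from a learned MLP to a simple interpolated control-point warp; the record itself stays agnostic.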

Common approaches:

  • Split Gaussians into static and dynamic sets.
  • Add MLP or control-point deformation fields.
  • Use optical flow, depth, masks, or semantics to supervise motion.
  • Maintain per-Gaussian motion probability or reliability.
  • Filter dynamic or unreliable points out of pose tracking.
  • Optimize dynamic rendering and static localization jointly.

The hard part is that pose and scene motion can explain the same image residual. Without enough depth, priors, or temporal constraints, camera motion and object motion are ambiguous.
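One common cue from the list above can be sketched concretely: compare the observed optical flow with the flow that camera ego-motion alone would induce given depth, and treat large residuals as dynamic. This is a simplified illustration, with hedged thresholds rather than anything system-specific:

```python
import numpy as np


def classify_pixels(obs_flow, ego_flow, depth, dyn_thresh=2.0):
    """Label each pixel static (0), dynamic (1), or unreliable (2).

    obs_flow, ego_flow: (H, W, 2) observed vs camera-induced optical flow (px).
    depth: (H, W) rendered or measured depth; values <= 0 mean invalid.
    A pixel whose observed flow disagrees with the ego-motion flow by more
    than dyn_thresh pixels is treated as dynamic; pixels without valid depth
    cannot be checked against ego-motion and are marked unreliable.
    """
    residual = np.linalg.norm(obs_flow - ego_flow, axis=-1)  # (H, W) px
    labels = np.zeros(depth.shape, dtype=np.uint8)           # static by default
    labels[residual > dyn_thresh] = 1                        # dynamic
    labels[depth <= 0] = 2                                   # unreliable
    return labels
```

The ambiguity noted above shows up directly here: `ego_flow` depends on the estimated pose, so a pose error inflates the residual everywhere and can mislabel static pixels as dynamic.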

Pipeline

Typical dynamic Gaussian SLAM pipeline:

  1. Ingest RGB, RGB-D, or RGB plus predicted depth.
  2. Estimate camera pose from static or reliable regions.
  3. Generate dynamic cues such as optical flow, depth disagreement, segmentation, or uncertainty.
  4. Classify pixels or Gaussians as static, dynamic, or unreliable.
  5. Build a static Gaussian map for localization support.
  6. Build dynamic Gaussians or deformation fields for moving regions.
  7. Render RGB/depth/flow from the 4D map.
  8. Optimize photometric, geometric, flow, and regularization losses.
  9. Update motion probability, reliability, or dynamic state over time.
  10. Evaluate trajectory, rendering, dynamic reconstruction, and map cleanliness.
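The steps above can be compressed into a toy per-frame loop. Everything here is an illustrative stand-in: the map containers are plain lists, the pose is translation-only, and the rendering/optimization stages (steps 7-9) are deliberately stubbed out:

```python
import numpy as np

STATIC, DYNAMIC, UNRELIABLE = 0, 1, 2


class SlamState:
    """Toy state holding a static map and a time-indexed dynamic map."""
    def __init__(self):
        self.static_map = []    # points believed to be persistent
        self.dynamic_map = {}   # t -> points observed moving at time t
        self.trajectory = []    # estimated poses


def process_frame(points, labels, pose_guess, t, state):
    """One iteration of the loop above, with the heavy components stubbed.

    points: (N, 3) back-projected observations in the camera frame.
    labels: (N,) static/dynamic/unreliable classification (steps 3-4).
    pose_guess: (3,) toy translation-only pose; a real system would refine
    it against state.static_map using only static-labeled points (step 2).
    """
    static_pts = points[labels == STATIC]
    dynamic_pts = points[labels == DYNAMIC]
    # Steps 5-6: grow the static map; log dynamic content with its timestamp
    # so it never contaminates the persistent structure.
    state.static_map.extend((static_pts + pose_guess).tolist())
    state.dynamic_map[t] = (dynamic_pts + pose_guess).tolist()
    # Steps 7-9 (render, optimize losses, update reliabilities) omitted here.
    state.trajectory.append(pose_guess)
    return state
```

The design point the sketch preserves is the separation in steps 5-6: dynamic observations are stored with timestamps rather than merged into the localization map.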

Method Taxonomy

| Method | Input | Dynamic model | Main idea |
| --- | --- | --- | --- |
| 4DGS-SLAM | RGB-D | Static/dynamic Gaussian sets plus control-point/MLP transformation fields | Reconstruct dynamic radiance fields instead of removing all dynamic content |
| 4DTAM | RGB with depth measurements or predictions | Dynamic surface Gaussians plus MLP warp field | Joint non-rigid tracking and mapping via differentiable rendering |
| D4DGS-SLAM | RGB-D-style dynamic scenes | 4DGS map with dynamics-aware InfoModule | Estimate dynamics, visibility, and reliability, then filter unstable dynamic points for tracking |
| Dy3DGS-SLAM | Monocular RGB | Probabilistic fusion of optical-flow and depth masks | Dynamic mask and motion loss for monocular dynamic Gaussian SLAM |
| DAGS-SLAM | RGB-D benchmarks | Per-Gaussian spatiotemporal motion probability | YOLO priors plus geometry and uncertainty scheduling to reduce semantic compute |

Strengths

  • Directly addresses dynamic-object contamination.
  • Can preserve useful dynamic-scene information rather than deleting everything that moves.
  • Produces time-varying visual assets useful for simulation and replay.
  • Per-Gaussian motion or reliability states are a natural fit for map QA.
  • Filtering unreliable dynamic regions can improve pose tracking.
  • DAGS-SLAM-style scheduling points toward mobile compute tradeoffs.
  • 4DTAM and 4DGS-SLAM create evaluation protocols for a difficult underexplored problem.

Limitations

  • The optimization problem is high-dimensional and ill-posed.
  • Camera ego-motion and object motion are hard to disentangle.
  • Motion masks, flow, segmentation, and depth priors can fail under blur, occlusion, lighting change, and reflective surfaces.
  • Dynamic objects that stop for long periods can be mistaken for static map structure.
  • Dynamic reconstruction quality does not guarantee pose accuracy.
  • Compute and memory grow quickly with time-varying Gaussians.
  • Many methods are benchmarked on indoor RGB-D or short dynamic sequences, not long AV routes.
  • Uncertainty outputs are not yet calibrated safety covariances.

AV Relevance

Dynamic 4D Gaussian SLAM matters to AVs because roads, depots, terminals, and airports are never perfectly static. It can help with:

  • Offline removal or modeling of dynamic objects in mapping logs.
  • Dynamic scene replay and simulation.
  • Static-background extraction from traffic-heavy routes.
  • Visual QA of ghost artifacts.
  • Research into motion-aware visual localization.

It should not replace production tracking or localization. AVs need explicit object tracking, occupancy prediction, LiDAR/radar/camera fusion, map-frame localization, and safety monitors. Dynamic Gaussian maps may become useful supporting artifacts, but the production stack must still know what is currently occupied and what is safe to drive through.

Indoor/Outdoor Notes

Indoor: Strong fit for labs, rooms, corridors, robots, people, and handheld RGB-D dynamic benchmarks. Multipath is less central than occlusion, texture, and moving people.

Outdoor: Harder because dynamic objects are larger, faster, farther away, and more numerous. Lighting and weather also vary more.

Airside: Airside scenes are a severe dynamic test: aircraft, tugs, belt loaders, buses, people, cones, baggage carts, fuel trucks, jet bridges, shadows, wet pavement, and reflections. Dynamic 4D Gaussian methods are useful for offline analysis, but not as a primary stand-approach pose source.

Comparison

| Family | What it does with dynamics | Production caveat |
| --- | --- | --- |
| Classical dynamic-aware SLAM | Removes or downweights moving objects | Less photorealistic, often stronger pose assumptions |
| WildGS-SLAM | Uncertainty-weights dynamic distractors in monocular Gaussian SLAM | Static-map extraction, not a full 4D motion field |
| 4DGS-SLAM / 4DTAM | Model dynamic geometry over time | High-dimensional and early-stage |
| Dy3DGS-SLAM | Monocular dynamic masks and motion loss | Depends on learned/estimated masks and depth |
| DAGS-SLAM | Per-Gaussian motion probability with scheduled semantics | Practical direction, still benchmark-stage |

Evaluation

Evaluate both localization and dynamic reconstruction:

  • ATE/RPE for camera tracking.
  • Static-map accuracy and ghost-object rate.
  • Dynamic-object reconstruction quality.
  • Flow/depth rendering error where ground truth exists.
  • PSNR, SSIM, LPIPS for static and dynamic views.
  • Motion-mask precision/recall.
  • Tracking robustness during occlusion.
  • Runtime, memory, and model growth over time.
  • Ability to relocalize after dynamic occlusions.

For AV/airside work, add false-static insertions, false-dynamic removal of real infrastructure, effect on downstream localization, repeated-day consistency, and tests with stopped aircraft or parked GSE that later move.
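For the trajectory side of the evaluation, ATE is conventionally computed after rigidly aligning the estimated trajectory to ground truth. A minimal sketch using the standard Kabsch/Umeyama rotation step (translation and rotation only, no scale):

```python
import numpy as np


def ate_rmse(est, gt):
    """ATE RMSE between estimated and ground-truth positions, (N, 3) each,
    after a rigid (rotation + translation) alignment of est onto gt."""
    est_c = est - est.mean(axis=0)
    gt_c = gt - gt.mean(axis=0)
    # Kabsch step: optimal rotation taking the centered estimate onto truth.
    H = est_c.T @ gt_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # guard against reflections
    aligned = est_c @ R.T + gt.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Because the alignment absorbs any global rigid offset, ATE measures trajectory shape error; RPE (not sketched here) complements it with local drift over fixed time or distance deltas.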

Implementation Notes

  • Keep static localization maps separate from dynamic replay maps.
  • Store dynamic object state with timestamps and provenance.
  • Do not let one route with parked equipment define permanent infrastructure.
  • Validate semantic and flow dependencies on local camera domains.
  • Use LiDAR/radar/RTK truth where possible to separate camera error from dynamic-scene modeling error.
  • Monitor GPU memory as a first-class metric.
  • Treat per-Gaussian motion probability as a QA signal, not a safety-certified occupancy probability.
  • For airside, use operations metadata when available to distinguish parked aircraft from infrastructure.
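The QA-signal note above can be made concrete with a simple sketch: smooth per-frame dynamic evidence into a per-Gaussian motion probability, then flag Gaussians that look dynamic yet still live in the static map as ghost candidates. The update rule and threshold are illustrative assumptions, not a calibrated filter:

```python
import numpy as np


def update_motion_prob(p, dynamic_evidence, alpha=0.2):
    """Exponential smoothing of per-Gaussian motion probability.

    p: (N,) current probabilities; dynamic_evidence: (N,) values in [0, 1]
    from this frame's cues (flow residual, mask overlap, depth disagreement).
    alpha is a smoothing factor, not a tuned or safety-certified gain.
    """
    return (1.0 - alpha) * p + alpha * dynamic_evidence


def qa_flags(p, in_static_map, thresh=0.5):
    """Flag Gaussians that look dynamic but survived into the static map:
    candidate ghosts for human review, not an automatic deletion rule."""
    return (p > thresh) & in_static_map
```

Keeping the flag a review signal rather than a deletion rule matches the note above: a parked tug that finally moves should raise a flag, not silently erase infrastructure-adjacent geometry.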

Practical Recommendation

Use dynamic 4D Gaussian SLAM for research, replay, simulation, and map-cleaning studies. For production AV localization, keep dynamic objects in the perception/tracking stack and keep static map localization tied to validated metric maps. Dynamic Gaussian maps are promising supporting evidence, not operational authority.

Sources

Research notes compiled from public sources.