Scene Flow for Dynamic Object Removal

What It Is

Scene flow estimates a 3D motion vector for each point between successive LiDAR sweeps.
For world models, the key use is not only motion prediction but also selective removal: dynamic points can be excluded from static maps, and static background can be excluded when isolating moving actors.
It is a flow-based complement to binary MOS methods such as LiDAR-MOS, 4DMOS, InsMOS, and StreamMOS.
It can also feed future occupancy and flow models such as StreamingFlow.
The output is valuable for map hygiene, occupancy clearing, actor extraction, and replay simulation.

Remove ego motion first, then estimate residual point motion between two LiDAR sweeps.
Points with coherent residual motion become dynamic candidates.
Points with near-zero residual motion and persistent occupancy become static-map candidates.
The dynamic mask removes ghosting from map fusion; the static mask can be subtracted to isolate moving objects for tracking or behavior modeling.
Scene flow keeps velocity direction and magnitude, so the world model can reason about where removed dynamic objects are going rather than only that they are moving.

Input: two consecutive LiDAR sweeps or a short temporal sequence.
Input: ego poses, calibration, timestamps, and de-skewing metadata.
Optional input: ground masks, semantic masks, object tracks, radar Doppler, or MOS masks.
Output: per-point flow_tx_m, flow_ty_m, and flow_tz_m style displacement vectors.
Output: is_dynamic or equivalent moving/static point label.
Output: static-only cloud for mapping and localization.
Output: dynamic-only cloud for actor extraction, tracking, occupancy flow, or simulation.

Synchronize LiDAR sweeps and interpolate ego poses.
De-skew rotating scans and transform sweeps into a consistent frame.
Estimate scene flow with an offline teacher such as Neural Scene Flow Priors, a distilled student such as ZeroFlow, or a supervised/self-supervised model such as DeFlow.
Convert residual flow to dynamic/static labels using magnitude, temporal consistency, and object-level clustering.
Fuse low-motion persistent points into the static map.
Route high-motion points to trackers, occupancy-flow prediction, or dynamic obstacle layers.
Keep confidence and audit logs so removed points can be reviewed when map quality or safety behavior changes.

Argoverse 2 scene flow uses 0.1 s LiDAR sweeps and object tracks to derive piecewise-rigid flow labels.
Its submission format includes per-point flow components and an is_dynamic boolean.
Argoverse 2 treats a point as dynamic when its ego-motion-removed ground-truth flow norm is greater than 0.05 m.
Flow metrics should include endpoint error, accuracy, outlier rate, and dynamic/static classification quality.
Removal metrics should include static map completeness, dynamic ghost reduction, and false removal of fixed structures.
Always report whether labels come from human/object annotations, offline NSFP-style pseudo-labels, ZeroFlow-style distillation, or a production tracker.

Class-agnostic: it can flag motion for objects not present in the road-object taxonomy.
Produces velocity vectors, not just moving/static labels.
Supports both dynamic object removal for map building and static background removal for actor isolation.
Can use unlabeled logs through NSFP or ZeroFlow-style pseudo-labeling.
Works naturally with occupancy flow and world-model prediction.
Gives a direct way to audit whether a static map contains moving-object ghosts.

Ego-pose, timestamp, or scan de-skew errors can become false scene flow.
Slow apron motion may fall below a generic dynamic threshold.
Newly revealed static surfaces can look dynamic because no correspondence existed in the previous sweep.
Occlusions, sparse far-range points, reflective aircraft surfaces, and repeated structures can produce wrong correspondences.
Removing all dynamic-looking points can erase useful structure from aircraft, doors, service equipment, or temporarily moved static assets.
Flow does not provide semantic intent; a moving baggage cart and a rolling loose object both need downstream interpretation.

High fit for maintaining static maps around stands, service roads, baggage halls, and pushback corridors.
Useful for removing tug, cart, bus, personnel, and aircraft-motion ghosts from repeated mapping runs.
Thresholds must be tuned for low-speed operations; a road benchmark dynamic cutoff can miss inching tugs or creeping belt loaders.
Aircraft are special: parked aircraft can be static map context, while pushback or taxi motion must be removed from static layers.
Radar Doppler and track logic should cross-check dynamic declarations in rain, spray, glare, and partial occlusion.
Store both removed and retained point layers so safety review can reconstruct why the world model changed.