StreamMOS

What It Is

StreamMOS is a streaming LiDAR moving object segmentation (MOS) method.
It was accepted to IEEE Robotics and Automation Letters and released on arXiv in 2024.
The method addresses temporal inconsistency across independent MOS inferences.
It uses memory across inferences rather than only fusing scans inside a single inference window.
The target task remains point-wise moving object segmentation.
It is especially relevant when a vehicle must produce continuous high-rate dynamic masks.

Maintain short-term memory of historical features.
Treat those features as spatial priors for moving objects in current inference.
Maintain long-term memory of previous predictions.
Refine current predictions at voxel and instance levels through voting over stored predictions.
Use a multi-view encoder with cascaded projection and asymmetric convolution.
Link feature propagation and prediction refinement so the same object is less likely to flicker between frames.
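
The long-term voting idea above can be sketched at the voxel level. This is a minimal illustration, not the paper's refinement rule: the class name VoxelVoteMemory, the horizon parameter, the tie-breaking choice, and the assumption that voxel keys are already expressed in a common, ego-motion-compensated frame are all hypothetical.

```python
from collections import deque


class VoxelVoteMemory:
    """Keeps the last `horizon` moving/static predictions per voxel (hypothetical helper)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.history = {}  # voxel key -> deque of 0 (static) / 1 (moving)

    def update(self, voxel_predictions):
        """Store the current per-voxel predictions and return majority-voted labels."""
        refined = {}
        for voxel, label in voxel_predictions.items():
            votes = self.history.setdefault(voxel, deque(maxlen=self.horizon))
            votes.append(label)
            moving = sum(votes)
            if 2 * moving == len(votes):
                # Tie: keep the current prediction so a new motion is not suppressed outright.
                refined[voxel] = label
            else:
                refined[voxel] = 1 if 2 * moving > len(votes) else 0
        return refined


memory = VoxelVoteMemory(horizon=3)
memory.update({(0, 0, 0): 1})
memory.update({(0, 0, 0): 1})
flicker = memory.update({(0, 0, 0): 0})  # single-frame dropout is outvoted
settled = memory.update({(0, 0, 0): 0})  # a second static frame flips the vote
```

A short horizon reacts faster to real stops; a long horizon suppresses more flicker but lags more.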

Input: streaming LiDAR scans.
Input: temporal context from previous inference states.
Input: derived multi-view representations rather than only one projection.
Output: point-wise moving/static labels for the current frame.
Output: stateful feature and prediction memory used by later frames.
Assumption: stream ordering, timestamps, and memory resets are handled correctly by the runtime.
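
The runtime assumption above can be made explicit with a small guard in front of the model. This is a sketch only: StreamGuard, its return values, and the 0.3 s gap threshold are illustrative assumptions, not part of StreamMOS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamGuard:
    """Decides whether a scan may use the stored memory (hypothetical helper)."""

    max_gap_s: float = 0.3  # assumed threshold; tune to the sensor rate
    last_stamp: Optional[float] = None

    def check(self, stamp: float) -> str:
        """Return 'ok', 'drop' (duplicate or out-of-order scan) or 'reset' (gap too large)."""
        if self.last_stamp is None:
            self.last_stamp = stamp
            return "ok"
        if stamp <= self.last_stamp:
            return "drop"  # never feed an older scan into forward-running memory
        gap = stamp - self.last_stamp
        self.last_stamp = stamp
        return "reset" if gap > self.max_gap_s else "ok"


guard = StreamGuard(max_gap_s=0.3)
decisions = [guard.check(t) for t in (0.0, 0.1, 0.05, 1.0, 1.1)]
```

On 'reset' the caller would clear both feature and prediction memory before running the current scan.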

The architecture combines a multi-view motion encoder with dual-span memory.
Short-term memory transfers feature-level information across nearby frames.
Long-term memory stores previous segmentation predictions for later voting.
Cascaded projection extracts complementary representations from LiDAR data.
Asymmetric convolution targets motion features with lower compute than heavy full 4D processing.
Instance-level voting improves object integrity when predictions are noisy.
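
Instance-level voting can be illustrated with a simplified sketch. It assumes per-point instance IDs come from some external clustering step (with -1 meaning "no instance") and aggregates whatever per-point labels are passed in; the function name instance_vote and the min_ratio parameter are hypothetical, and the paper's voting over stored predictions is not reproduced here.

```python
def instance_vote(point_labels, instance_ids, min_ratio=0.5):
    """Give every point of an instance the instance's majority label.

    point_labels: list of 0 (static) / 1 (moving) per point.
    instance_ids: list of instance IDs per point; -1 leaves the point untouched.
    """
    counts = {}  # instance id -> (moving votes, total points)
    for label, inst in zip(point_labels, instance_ids):
        if inst < 0:
            continue
        moving, total = counts.get(inst, (0, 0))
        counts[inst] = (moving + label, total + 1)

    refined = []
    for label, inst in zip(point_labels, instance_ids):
        if inst < 0:
            refined.append(label)
        else:
            moving, total = counts[inst]
            refined.append(1 if moving > min_ratio * total else 0)
    return refined


labels = [1, 1, 0, 1, 0, 0, 1]
instances = [7, 7, 7, 7, 3, 3, -1]
refined = instance_vote(labels, instances)
```

Here the one static point inside instance 7 is pulled to moving, which is the object-integrity effect described above.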

The paper evaluates on SemanticKITTI and the Sipailou Campus dataset.
Reported results show competitive MOS performance with improved temporal continuity.
The method is designed for online use where consecutive predictions are not independent.
Training still relies on labeled MOS sequences.
Evaluation should include both point-level MOS metrics and temporal consistency checks.
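
The two kinds of checks can be sketched as follows. moving_iou is the usual intersection-over-union of the moving class; flicker_rate is a hypothetical consistency measure (not a metric defined in the paper) that counts label flips between consecutive frames for keys, such as voxel indices, seen in both.

```python
def moving_iou(pred, gt):
    """IoU of the moving class (label 1) over per-point predictions and ground truth."""
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")  # undefined without moving points


def flicker_rate(frames):
    """Fraction of label flips between consecutive frames.

    frames: list of dicts mapping a key (e.g. voxel index) to a 0/1 label.
    """
    flips = compared = 0
    for prev, curr in zip(frames, frames[1:]):
        for key in prev.keys() & curr.keys():
            compared += 1
            flips += prev[key] != curr[key]
    return flips / compared if compared else 0.0


iou = moving_iou([1, 1, 0, 0], [1, 0, 1, 0])
rate = flicker_rate([{"a": 1, "b": 0}, {"a": 0, "b": 0}, {"a": 1, "b": 0}])
```

Real starts and stops also count as flips, so the rate is most meaningful when compared with the same measure computed on ground-truth labels.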
The official paper states code will be released at the NEU-REAL StreamMOS repository.

Directly targets the flicker problem seen in frame-independent MOS.
Memory is useful for briefly occluded or sparse moving objects.
More runtime-friendly than methods that require large 4D windows for every frame.
Long-term voting can stabilize segmentation without a separate tracker.
Multi-view encoding reduces reliance on any single projection geometry.
Stateful, frame-by-frame inference is conceptually close to how production streaming perception consumes LiDAR data.

Memory can propagate false positives after an object stops or leaves the scene.
Stateful inference creates reset, synchronization, and recovery concerns.
Drift in ego-motion or object alignment can corrupt memory.
Long-term voting may lag abrupt start/stop events.
Dataset performance may not transfer to low-speed apron motion without retraining.
Debugging is harder than stateless MOS because errors can be caused by prior frames.

Good fit for continuous dynamic masks around aircraft stands, where objects are often temporarily occluded.
Useful when baggage carts or personnel pass behind aircraft gear and reappear.
Needs explicit memory invalidation around localization jumps, route resets, and sensor dropouts.
Airport apron speeds are low, so temporal smoothing must not suppress slow or subtle motion onsets.
Long-term memory should be bounded near aircraft to avoid stale dynamic masks on static fuselage points.
Best used with radar Doppler or track confirmation for safety-critical moving-object declarations.
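
Bounding long-term memory can be sketched as age-based eviction. This is an assumption about how a deployment might do it, not something StreamMOS specifies; AgeBoundedMemory and max_age_s are hypothetical names.

```python
class AgeBoundedMemory:
    """Per-voxel prediction memory whose entries expire after max_age_s (hypothetical helper)."""

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s
        self.entries = {}  # voxel key -> (label, stamp of last observation)

    def observe(self, voxel, label, stamp):
        """Record the latest prediction for a voxel."""
        self.entries[voxel] = (label, stamp)

    def evict(self, now):
        """Drop entries not observed within max_age_s; return how many were removed."""
        stale = [v for v, (_, t) in self.entries.items() if now - t > self.max_age_s]
        for voxel in stale:
            del self.entries[voxel]
        return len(stale)


memory = AgeBoundedMemory(max_age_s=1.0)
memory.observe("near_fuselage", 1, stamp=0.0)
memory.observe("cart_lane", 1, stamp=0.8)
removed = memory.evict(now=1.5)
```

A shorter max_age_s could be used for a separate memory instance covering voxels close to the aircraft.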

Treat the memory state as part of the runtime safety contract.
Add diagnostics for memory age, reset events, and number of active dynamic voxels.
Run offline replay tests with dropped frames and out-of-order timestamps.
Compare against a stateless baseline such as 4DMOS on identical clips to quantify the temporal consistency gain.
Publish dynamic-mask confidence and memory confidence separately.
Keep a deterministic fallback that clears memory on localization discontinuity.
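
The diagnostics and the clear-on-discontinuity fallback can be combined in one small monitor. This is a sketch under stated assumptions: MemoryMonitor, the 2D pose input, and the fixed max_jump_m threshold are hypothetical, and a real system would likely scale the threshold by speed and frame interval.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class MemoryMonitor:
    """Tracks memory age, reset events, and active dynamic voxels (hypothetical helper)."""

    max_jump_m: float = 1.0  # assumed per-frame pose jump that counts as a discontinuity
    reset_events: int = 0
    memory_start: Optional[float] = None
    last_xy: Optional[Tuple[float, float]] = None
    dynamic_voxels: set = field(default_factory=set)

    def clear(self):
        """Deterministic fallback: forget everything and count the reset."""
        self.dynamic_voxels.clear()
        self.memory_start = None
        self.reset_events += 1

    def step(self, stamp, xy, dynamic_voxels):
        """Update with the current frame and return a diagnostics snapshot."""
        if self.last_xy is not None and math.dist(xy, self.last_xy) > self.max_jump_m:
            self.clear()
        self.last_xy = xy
        if self.memory_start is None:
            self.memory_start = stamp
        self.dynamic_voxels |= set(dynamic_voxels)
        return {
            "memory_age_s": stamp - self.memory_start,
            "reset_events": self.reset_events,
            "active_dynamic_voxels": len(self.dynamic_voxels),
        }


monitor = MemoryMonitor(max_jump_m=1.0)
first = monitor.step(0.0, (0.0, 0.0), [(1, 1, 0)])
second = monitor.step(0.1, (0.2, 0.0), [(2, 1, 0)])
after_jump = monitor.step(0.2, (5.0, 0.0), [(3, 3, 0)])  # 4.8 m jump clears memory
```

The snapshot is cheap enough to publish every frame alongside the dynamic mask.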