MotionSeg3D
What It Is
- MotionSeg3D is the IROS 2022 method "Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation."
- It predicts point-wise moving/static labels for the current LiDAR scan.
- The method sits in the same moving-object segmentation lineage as LiDAR-MOS, but adds stronger spatial-temporal fusion and point-level refinement.
- It is a practical dynamic object removal method: remove points predicted as moving before static mapping, localization map updates, or occupancy fusion.
- It complements later temporal methods such as 4DMOS, InsMOS, StreamMOS, and flow/forecasting methods such as StreamingFlow.
Core Idea
- Use two range-image branches instead of one mixed input branch.
- The appearance branch encodes the current LiDAR range image.
- The motion branch encodes residual images generated from ego-motion-compensated previous scans (sketched after this list).
- Motion-guided attention fuses the branches so temporal evidence can emphasize the parts of the current scan that are actually moving.
- A point refinement head back-projects range-view features to 3D points and uses sparse convolution to clean object borders.
- The design is coarse-to-fine: fast range-view segmentation first, point-space correction second.
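To make the motion-branch input concrete, below is a minimal numpy sketch of residual-image generation in the style MotionSeg3D inherits from LiDAR-MOS: spherically project the current scan and an ego-motion-aligned past scan into range images, then take the normalized range difference. The image size, vertical field of view, and all function names are illustrative assumptions; the released code fixes the exact parameters and normalization.

```python
import numpy as np

def range_project(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Spherical projection of an (N, 3) cloud into an (h, w) range image.

    h/w and the field of view are illustrative (KITTI-like HDL-64 values).
    """
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    r = np.linalg.norm(points, axis=1)
    yaw = -np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-8))
    u = (0.5 * (yaw / np.pi + 1.0) * w).astype(np.int32) % w
    v = np.clip((fov_up - pitch) / (fov_up - fov_down) * h, 0, h - 1).astype(np.int32)
    img = np.full((h, w), -1.0, dtype=np.float32)
    order = np.argsort(r)[::-1]          # write far points first so near returns win
    img[v[order], u[order]] = r[order]
    return img

def residual_image(cur_points, past_points, pose_cur, pose_past):
    """Normalized range residual between the current scan and an aligned past scan."""
    rel = np.linalg.inv(pose_cur) @ pose_past                  # 4x4 relative pose
    past_h = np.hstack([past_points, np.ones((len(past_points), 1))])
    past_in_cur = (rel @ past_h.T).T[:, :3]                    # past scan in current frame
    cur_r, past_r = range_project(cur_points), range_project(past_in_cur)
    valid = (cur_r > 0) & (past_r > 0)
    res = np.zeros_like(cur_r)
    res[valid] = np.abs(cur_r[valid] - past_r[valid]) / cur_r[valid]
    return res                                                 # large values = likely motion
```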
Inputs/Outputs
- Input: sequential rotating-LiDAR scans.
- Input: calibration, poses, or ego-motion estimates used to align previous scans to the current scan.
- Input: residual range images generated from current and past scans.
- Training input: SemanticKITTI-MOS labels and the authors' KITTI-Road-MOS labels.
- Output: per-point moving/static logits or labels for the current scan.
- Output: dynamic mask for removing moving objects, or static mask for preserving map-quality points.
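As a downstream-usage sketch under the assumption of per-point moving probabilities (the name `moving_prob` and the 0.5 threshold are illustrative, not from the released code), removal reduces to boolean indexing:

```python
import numpy as np

def split_scan(points, moving_prob, thresh=0.5):
    """Split an (N, 3) scan into (static, moving) subsets from per-point probabilities."""
    moving = moving_prob >= thresh
    return points[~moving], points[moving]

# static_pts feed mapping/localization; moving_pts are dropped or handed to tracking.
# static_pts, moving_pts = split_scan(scan_xyz, probs)
```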
Pipeline
- Align recent scans into the current frame and generate temporal residual images.
- Project the current scan into a range image.
- Encode current appearance and temporal residual cues with separate branches.
- Fuse multi-scale features through motion-guided attention (see the sketch after this list).
- Decode a range-view moving/static prediction.
- Back-project features and predictions to 3D points.
- Refine point labels with the point head, then threshold confidence for downstream removal.
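For the fusion step above, here is a minimal PyTorch sketch of one common motion-guided attention form: motion features produce a sigmoid gate over appearance features, with a residual connection so static detail survives. The block in the released model may be structured differently; the module name and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class MotionGuidedAttention(nn.Module):
    """Gate appearance features with a spatial attention map from motion features."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, appearance, motion):
        attn = self.gate(motion)                 # in (0, 1): residual cues highlight movers
        return appearance + appearance * attn    # residual path keeps static structure

# fused = MotionGuidedAttention(64)(app_feats, mot_feats)  # both (B, 64, H, W)
```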
Evaluation
- Primary benchmark: SemanticKITTI-MOS.
- Additional training/evaluation data: KITTI-Road-MOS labels released with the MotionSeg3D codebase.
- Main metric: point-level IoU of the moving class, usually reported alongside static-class IoU and a two-class mean (see the metric sketch after this list).
- The paper reports online operation at sensor frame rate.
- For airside use, evaluate both MOS metrics and map effects: ghost removal, loss of static structure, and moving points falsely retained as static in the map.
- Compare against LiDAR-MOS for a range-view baseline and 4DMOS for a 4D sparse-convolution baseline on the same clips.
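A minimal sketch of the headline metric: point-level IoU of the moving class, accumulated over all evaluated scans. Static-class IoU follows the same pattern with the masks inverted.

```python
import numpy as np

def moving_iou(pred, gt):
    """IoU of the moving class; pred/gt are boolean per-point arrays (True = moving)."""
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    return tp / max(tp + fp + fn, 1)   # guard against an empty class
```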
Strengths
- Improves over simple residual concatenation by explicitly separating appearance and motion branches.
- Keeps the fast range-view backbone style used by mature LiDAR segmentation stacks.
- Point-space refinement reduces boundary artifacts from range projection.
- Public code, released pretrained models, and the KITTI-Road-MOS labels make reproduction practical.
- Does not need object boxes or semantic instance IDs at inference time.
- Easier to deploy than heavier stateful or full 4D models when the vehicle already runs range-image LiDAR perception.
Failure Modes
- Residual images are only as good as ego-motion compensation, timestamp alignment, and scan de-skewing.
- Very slow apron motion can produce residuals too weak to stand out from sensor noise, so slow movers may be labeled static.
- Range projection can lose detail for multi-LiDAR rigs, non-repetitive solid-state LiDAR, and unusual vertical fields of view.
- The point refinement head improves object borders but cannot recover occluded objects or sparsely sampled far-range actors.
- Training on road datasets can bias the model away from aircraft, belt loaders, dollies, cones, and crouched personnel.
- False positives can over-remove static map structure around curbs, stand markings, jet bridges, and parked equipment.
Airside Fit
- Good first upgrade beyond LiDAR-MOS for LiDAR-only dynamic object removal on airport aprons.
- Particularly useful for cleaning SLAM or localization maps when tugs, baggage carts, buses, and ground crew pass through repeated survey routes.
- Needs airport-specific validation at 1-5 km/h and for stop/start behavior near aircraft stands.
- Multi-LiDAR vehicles should test per-sensor inference plus late fusion before forcing all sensors into one synthetic range image.
- Use conservative removal thresholds for online safety; a moving object falsely kept as static is worse than losing some static map density.
- Pair with radar Doppler, object tracking, or StreamMOS-style temporal memory before using the output as a safety-critical dynamic declaration.
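One hedged way to act on the last two points when cleaning an offline map: vote across repeated survey passes and keep a voxel only if it was observed several times and never flagged moving. The voxel size, observation count, and the function itself are illustrative assumptions layered on top of MotionSeg3D's output, not part of the method.

```python
import numpy as np
from collections import defaultdict

def vote_static(scans, voxel=0.2, min_obs=3):
    """Conservative map cleaning: a voxel survives only if seen >= min_obs times
    and never covered by a point flagged as moving.

    scans: iterable of (points Nx3 in a shared map frame, boolean moving mask).
    """
    seen, flagged = defaultdict(int), set()
    for pts, moving in scans:
        for key, m in zip(map(tuple, np.floor(pts / voxel).astype(np.int64)), moving):
            seen[key] += 1
            if m:
                flagged.add(key)
    return {k for k, n in seen.items() if n >= min_obs and k not in flagged}
```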
Sources
- MotionSeg3D paper: https://arxiv.org/abs/2207.02201
- MotionSeg3D project page: https://npucvr.github.io/MotionSeg3D/
- Official repository: https://github.com/haomo-ai/MotionSeg3D
- LiDAR-MOS lineage paper: https://arxiv.org/abs/2105.08971
- SemanticKITTI: https://semantic-kitti.org/