InsMOS

What It Is

InsMOS is an instance-aware moving object segmentation (MOS) method for LiDAR data.
It predicts point-wise moving labels while also detecting instance information for main traffic participants.
The method was published at IROS 2023 and has an official MIT-licensed implementation.
It addresses a key MOS weakness: fragmented moving labels without object-level consistency.
The authors later extended this line of work into SegNet4D.
It is best understood as MOS with instance reasoning, not a full 4D panoptic segmentation system.

Use a sequence of point clouds as input.
Quantize the temporal sequence into 4D voxels.
Run 4D sparse convolutions to extract motion features.
Inject the motion features into the current scan.
Detect instance information in the current scan so object-level motion can guide point labels.
Fuse spatio-temporal features and predicted instance information through an upsample fusion module.
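
The quantization step above can be illustrated with a minimal sketch. It covers only the mapping from a time-ordered scan sequence to sparse 4D voxel keys. The sparse convolutions, instance detection, and fusion stages are not shown. The function name `quantize_4d`, the voxel size, and the use of the scan index as the time coordinate are assumptions made for illustration, not values taken from the InsMOS code.

```python
from collections import defaultdict
from math import floor


def quantize_4d(scans, voxel_size=0.5):
    """Quantize a time-ordered scan sequence into sparse 4D voxels.

    scans: list of scans, oldest first. Each scan is a list of (x, y, z)
           tuples that are assumed to be already aligned to one common frame.
    voxel_size: spatial edge length in metres (illustrative value).

    Returns a dict mapping (ix, iy, iz, it) to the number of points in that
    voxel. Only occupied voxels are stored, which is the property that
    sparse convolution backbones rely on.
    """
    voxels = defaultdict(int)
    for t, scan in enumerate(scans):  # hypothetical choice: scan index is the time bin
        for x, y, z in scan:
            key = (
                floor(x / voxel_size),
                floor(y / voxel_size),
                floor(z / voxel_size),
                t,
            )
            voxels[key] += 1
    return dict(voxels)


# Two scans: one point stays in place, one point appears only in the later scan.
scans = [
    [(0.2, 0.2, 0.2), (0.4, 0.1, 0.3)],
    [(0.2, 0.2, 0.2), (1.3, -0.2, 0.0)],
]
occupied = quantize_4d(scans, voxel_size=0.5)
```

A voxel that is occupied at every time index hints at static structure, while occupancy that shifts across time indices is the kind of cue a 4D backbone can learn to read as motion.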

Input: sequential LiDAR point clouds.
Input: ego-motion-aligned temporal context, meaning past scans are transformed into the current scan's frame.
Training input: point MOS labels plus instance or bounding-box labels for traffic participants.
Output: per-point moving/static labels.
Output: detected instance information for vehicles, pedestrians, cyclists, or similar actor classes.
Output use case: instance-consistent dynamic masks for tracking, mapping, and prediction.
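
The ego-motion alignment listed among the inputs can be sketched as follows. This assumes each scan comes with a 4x4 sensor-to-world pose as nested lists, and the helper names are hypothetical. It is a plain rigid-transform example, not the repository's dataloader.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def align_to_current(past_points, pose_past, pose_current):
    """Express points from a past scan in the current sensor frame.

    pose_past, pose_current: assumed sensor-to-world 4x4 poses.
    past_points: list of (x, y, z) tuples in the past sensor frame.
    """
    # past sensor frame -> world -> current sensor frame
    T = matmul4(invert_rigid(pose_current), pose_past)
    return [
        tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))
        for x, y, z in past_points
    ]


identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# The sensor moved 2 m forward along x between the past and current scan.
moved_forward = [[1.0, 0.0, 0.0, 2.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]]
aligned = align_to_current([(5.0, 0.0, 0.0)], identity, moved_forward)
```

After alignment, a static point lands at the same place in every scan of the sequence, so any remaining displacement can be attributed to object motion rather than ego motion.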

The public code organizes dataloaders, model modules, scripts, and visualization utilities.
The method uses 4D sparse voxel motion encoding as the temporal backbone.
It extracts current-frame instance features to determine which detected instances are actually moving.
The upsample fusion module maps fused features back to point-wise MOS predictions.
The repository covers dataset preparation for SemanticKITTI and KITTI-road, including the added instance labels.
It is a LiDAR-only model at inference time.

The paper evaluates on the LiDAR-MOS benchmark based on SemanticKITTI.
It compares against prior state of the art for moving object segmentation.
It reports improved MOS performance by integrating instance information.
It also reports generalization to the Apollo dataset using a model pre-trained on SemanticKITTI.
Training requires more annotation structure than pure binary MOS because instance supervision is part of the design.
Evaluation remains point-level MOS, with instance quality acting as an internal mechanism rather than the main metric.
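
The point-level metric can be sketched as intersection-over-union on the moving class. The label values used here (1 for moving, 0 for static, and an optional ignore value) are placeholders chosen for illustration. The benchmark's own label IDs and evaluation script should be used for reported numbers.

```python
def moving_iou(pred, gt, moving=1, ignore=None):
    """IoU of the moving class over paired per-point labels.

    pred, gt: equal-length sequences of integer labels.
    moving:   label value treated as "moving" (placeholder value).
    ignore:   optional ground-truth value to skip, e.g. unlabeled points.
    """
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        if ignore is not None and g == ignore:
            continue
        if p == moving and g == moving:
            tp += 1  # moving point correctly predicted as moving
        elif p == moving:
            fp += 1  # static point predicted as moving
        elif g == moving:
            fn += 1  # moving point missed
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")


pred = [1, 1, 0, 0, 1]
gt = [1, 0, 0, 1, 1]
score = moving_iou(pred, gt)  # tp=2, fp=1, fn=1, so 2 / 4 = 0.5
```

Because static points that are correctly predicted as static do not enter the formula, the score is not inflated by the large static background.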

Instance reasoning improves spatial integrity of moving-object masks.
Better handles temporarily static actors by reasoning about object instances rather than isolated points.
Directly useful for track birth because moving labels can be associated with object hypotheses.
Public code and pretrained model release lower reproduction cost.
Bridges binary MOS and richer 4D segmentation without requiring full panoptic labels.
Good conceptual fit for mixed static/dynamic fleets around airport stands.

Requires instance labels or bounding boxes during training, increasing airside annotation cost.
Instance detector bias can suppress unusual airport actors and objects such as belt loaders, aircraft tugs, dollies, cones, and foreign object debris (FOD).
Very large objects such as aircraft may not match road-object scale assumptions.
Slow motion and stop-start behavior can still be ambiguous.
False instance grouping can move static points into dynamic masks or split one moving object.
Sparse far-range points reduce both instance and motion confidence.

High fit for ground support equipment (GSE) and personnel because object-level consistency matters more than isolated moving points.
Useful for distinguishing parked GSE from GSE beginning to maneuver near an aircraft.
Needs airport-specific instance taxonomy and bounding boxes before safety use.
Aircraft should probably be treated as separate static/movable structural classes, not generic vehicles.
Instance-aware masks can support apron prediction and right-of-way logic when fused with trackers.
Should be paired with conservative obstacle persistence for safety around crouched workers and occluded dollies.

Build an airside label schema that separates instance actor classes from static background.
Use SemanticKITTI-pretrained weights only as a starting point for representation transfer, not as evidence for final acceptance.
Add annotation examples for low-speed starts, reversing, stopped-before-moving, and towing interactions.
Inspect object-level false positives, not only point-level IoU.
For ROS integration, publish both point labels and instance IDs when available.
Use temporal smoothing outside the network to prevent one-frame dynamic mask flicker.
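
One way to smooth outside the network is a per-track hysteresis filter, sketched below. It assumes a separate tracker supplies stable track IDs and a per-frame moving/static vote for each instance. The class name and the frame thresholds are hypothetical and are not part of InsMOS.

```python
class MovingStateFilter:
    """Hysteresis on per-instance moving/static decisions.

    A track switches to "moving" after `on_frames` consecutive moving
    observations and back to "static" only after `off_frames` consecutive
    static observations. Making `off_frames` larger than `on_frames` keeps
    a dynamic mask alive through brief dropouts.
    """

    def __init__(self, on_frames=2, off_frames=5):
        self.on_frames = on_frames
        self.off_frames = off_frames
        self.state = {}   # track_id -> current smoothed state (True = moving)
        self.streak = {}  # track_id -> consecutive observations disagreeing with state

    def update(self, track_id, observed_moving):
        observed_moving = bool(observed_moving)
        state = self.state.get(track_id, False)
        if observed_moving == state:
            self.streak[track_id] = 0
        else:
            self.streak[track_id] = self.streak.get(track_id, 0) + 1
            needed = self.on_frames if observed_moving else self.off_frames
            if self.streak[track_id] >= needed:
                state = observed_moving
                self.streak[track_id] = 0
        self.state[track_id] = state
        return state


flt = MovingStateFilter(on_frames=2, off_frames=3)
raw = [False, True, False, True, True, False, False, True, False, False, False]
smoothed = [flt.update(7, r) for r in raw]
```

In this run the isolated moving vote in the second frame is suppressed, and the short static gaps after the track turns moving do not clear the mask until three static frames arrive in a row. The thresholds trade detection latency against flicker and would need tuning against the slow-start cases noted above.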