Instantaneous Motion Perception
What It Is
- Instantaneous Motion Perception refers here to the CVPR 2024 method "Instantaneous Perception of Moving Objects in 3D."
- The paper introduces the S'More framework for subtle motion detection and motion-flow estimation from LiDAR.
- It targets small, safety-critical object motion that begins before standard trackers produce a stable velocity.
- Examples include a parked vehicle starting to move or a vehicle stopping and reversing.
- The method is object-centric rather than full-scene panoptic segmentation.
- It includes a benchmark extracted from Waymo for small-motion evaluation.
Core Technical Idea
- Standard LiDAR motion estimation struggles with the "swimming" artifact: sparse point samples vary across frames even for static objects.
- The method uses local occupancy completion to densify the visible surface of each object, yielding a motion cue that is stable across frames.
- Occupancy completion is learned jointly with moving-object detection and motion-flow estimation.
- It avoids full object shape completion and focuses on the visible surface where motion evidence is strongest.
- The model is trained specifically in the subtle-motion regime.
- It detects and quantifies motion as soon as objects begin to move.
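The "swimming" artifact above can be illustrated with a minimal sketch (our own toy example, not code from the paper): a naive nearest-neighbor flow on a perfectly static surface comes out nonzero because each scan samples different points, and densifying the surface, a stand-in for occupancy completion, shrinks that spurious flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_surface(n):
    """Sample n points from a static wall segment at x = 5, y in [0, 2]."""
    return np.stack([np.full(n, 5.0), rng.uniform(0.0, 2.0, n)], axis=1)

def nn_flow(prev_pts, curr_pts):
    """Naive nearest-neighbor scene flow: each current point is matched to
    its closest previous point. Sparse resampling makes this nonzero even
    for a perfectly static surface -- the 'swimming' artifact."""
    d = np.linalg.norm(curr_pts[:, None] - prev_pts[None, :], axis=2)
    return prev_pts[d.argmin(axis=1)] - curr_pts

# Two independent sparse scans of the same static wall vs. two dense ones.
sparse_flow = nn_flow(sample_surface(8), sample_surface(8))
dense_flow = nn_flow(sample_surface(400), sample_surface(400))

# Densifying the visible surface shrinks the spurious flow on the static
# object, which is the intuition behind local occupancy completion.
print(np.abs(sparse_flow).mean(), np.abs(dense_flow).mean())
```

The point of the toy example is only the ordering of the two magnitudes, not their absolute values.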
Inputs and Outputs
- Input: sequential LiDAR point clouds over a short temporal window.
- Input: object proposals or objects of interest, with background and large-motion objects filtered out.
- Input assumption: ego-motion can be estimated reliably by ICP, GPS/INS, or a similar localization stack.
- Intermediate representation: per-object voxelized occupancy grids.
- Output: binary small-motion decision for each object.
- Output: point-level or object-level motion flow estimate for the moving object.
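The ego-motion input assumption can be made concrete with a small 2D sketch (illustrative, not the paper's code): previous-frame points are re-expressed in the current ego frame using the localization poses, so that any residual displacement reflects object motion rather than ego motion.

```python
import numpy as np

def se2(x, y, yaw):
    """Homogeneous 2D ego pose (world <- ego)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def compensate(prev_pts, pose_prev, pose_curr):
    """Re-express previous-frame points in the current ego frame, assuming
    ego poses from an ICP or GPS/INS localization stack (the method's input
    assumption). After compensation, static points line up across frames."""
    T = np.linalg.inv(pose_curr) @ pose_prev  # curr_ego <- prev_ego
    h = np.hstack([prev_pts, np.ones((len(prev_pts), 1))])
    return (h @ T.T)[:, :2]

# A static wall point at world (10, 0); the ego drives 1 m forward between
# frames, so the point appears at x = 9 in the current ego frame.
pose0, pose1 = se2(0, 0, 0), se2(1, 0, 0)
p_prev = np.array([[10.0, 0.0]])   # in ego frame at t0 (world == ego0 here)
p_curr = np.array([[9.0, 0.0]])    # same point seen from the ego at t1
residual = p_curr - compensate(p_prev, pose0, pose1)
print(residual)  # [[0. 0.]] -- no apparent motion after ego-motion subtraction
```

If the residual is nonzero after compensation, it is evidence of object motion; unreliable ego poses leak directly into this residual, which is why the method's ego-motion assumption matters.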
Architecture or Dataset/Pipeline
- The framework first identifies objects of interest.
- It voxelizes each object's sequential LiDAR points.
- An encoder-decoder network predicts local occupancy completion.
- Completed occupancy features feed a motion detector.
- The same features feed a motion-flow predictor.
- Occupancy supervision uses accumulated nearby-frame LiDAR points to create denser visible-surface targets.
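The supervision step above can be sketched at the voxel level (a simplified stand-in for the paper's pipeline, with our own parameter choices): accumulating an object's points over nearby frames yields a denser occupied-voxel set than any single sparse scan, which serves as the visible-surface occupancy target.

```python
import numpy as np

def voxelize(points, voxel=0.2, origin=(0.0, 0.0, 0.0)):
    """Occupied-voxel index set for one object's points."""
    idx = np.floor((points - np.asarray(origin)) / voxel).astype(int)
    return set(map(tuple, idx))

def occupancy_target(frames, voxel=0.2):
    """Denser visible-surface occupancy target built by accumulating the
    object's points over nearby (motion-aligned) frames -- a sketch of the
    paper's supervision strategy, not its exact implementation."""
    occ = set()
    for pts in frames:
        occ |= voxelize(pts, voxel)
    return occ

rng = np.random.default_rng(1)
# Three sparse scans of the same static planar surface (the x = 2 plane).
frames = [np.stack([np.full(20, 2.0),
                    rng.uniform(0, 1, 20),
                    rng.uniform(0, 1, 20)], axis=1) for _ in range(3)]

single = voxelize(frames[0])
accum = occupancy_target(frames)
print(len(single), len(accum))  # accumulation yields a denser target
```

In the real pipeline the nearby frames must first be aligned for ego motion (and object motion, for movers) before accumulation; the sketch skips that by using a static surface.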
Training and Evaluation
- The paper contributes a benchmark for subtle vehicle motion from the Waymo dataset.
- It compares against ICP, point-to-plane ICP, generalized ICP, CenterPoint, FastNSF, and tracking-style baselines.
- Evaluation focuses on detection F1 and motion estimation quality under latency-sensitive settings.
- The authors report superior performance for subtle motions compared with standard 3D motion estimation approaches.
- The published scope focuses on vehicles; the paper leaves human-category extension for future work.
- The work was published at CVPR 2024, with an open-access PDF and an arXiv version.
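For context on the classical baselines, the simplest one, translation-only point-to-point ICP, can be sketched in a few lines (our own toy implementation on ego-compensated object points; the paper's baselines are more capable variants such as point-to-plane and generalized ICP):

```python
import numpy as np

def icp_translation(src, dst, iters=20):
    """Translation-only point-to-point ICP: iteratively match each source
    point to its nearest destination point and update the translation by
    the mean residual. Returns the estimated inter-frame object motion."""
    t = np.zeros(src.shape[1])
    for _ in range(iters):
        moved = src + t
        d = np.linalg.norm(moved[:, None] - dst[None, :], axis=2)
        matches = dst[d.argmin(axis=1)]
        t += (matches - moved).mean(axis=0)
    return t

rng = np.random.default_rng(2)
obj = rng.uniform(0, 1, (50, 2))
shift = np.array([0.05, 0.0])             # 5 cm of subtle motion
est = icp_translation(obj, obj + shift)
print(est)  # close to [0.05, 0.0] in this clean, fully overlapping case
```

This clean case flatters ICP: with real LiDAR, the two scans sample different surface points, so the same swimming artifact that motivates the paper corrupts the correspondences at exactly the subtle-motion scale being measured.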
Strengths
- Directly targets early motion cues that matter before a track has several frames of evidence.
- Explicitly handles LiDAR swimming artifacts through occupancy completion.
- Complements moving-object segmentation (MOS): it estimates subtle object motion, not only moving/static point labels.
- Object-centric processing can be plugged into a detector/tracker stack.
- Particularly relevant for start/stop events near conflict points.
- The benchmark frames a useful metric for reaction-time-sensitive perception.
Failure Modes
- Requires reliable object proposals or track candidates before subtle motion inference.
- Vehicle-focused training may not transfer to pedestrians, dollies, aircraft tugs, or articulated GSE.
- Assumes ego-motion is reliable enough to subtract vehicle motion.
- Small objects with few points can still lack enough surface evidence for occupancy completion.
- Full-scene coverage is not provided; it is a targeted object module.
- It can miss hazards that are not proposed by the upstream detector.
Airside AV Fit
- High value for slow start detection around gates, crossings, and service roads.
- Relevant to apron cases where a parked tug, belt loader, baggage train, or bus begins moving at low speed.
- Needs extension beyond road vehicles to GSE shapes and personnel.
- Could run only on objects near the ego vehicle's planned path or aircraft clearance envelope.
- The reference airside AV stack's RTK/IMU/GTSAM localization is a good source of the reliable ego-motion this method assumes.
- Should be fused with radar Doppler where available, because radar gives direct radial velocity for many moving actors.
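The path-gating idea above can be sketched as a simple lateral-clearance filter (illustrative only; the clearance value and function names are our assumptions, not from the paper):

```python
import numpy as np

def near_path(obj_xy, path_xy, clearance_m=15.0):
    """Gate for the subtle-motion module: keep only objects whose center
    lies within a lateral clearance of the ego's planned path. The 15 m
    default is an illustrative placeholder, not a recommended value."""
    d = np.linalg.norm(path_xy[:, None] - np.asarray(obj_xy)[None, :], axis=2)
    return d.min(axis=0) <= clearance_m

# Straight 100 m planned path along the x-axis, sampled every metre.
path = np.stack([np.linspace(0, 100, 101), np.zeros(101)], axis=1)
objects = np.array([[30.0, 5.0],    # parked tug near the path: check it
                    [50.0, 40.0]])  # GSE far off the path: skip it
mask = near_path(objects, path)
print(mask)  # [ True False]
```

A production gate would use the aircraft clearance envelope rather than a fixed radius, but the per-object budget argument is the same: only the near-path subset pays the occupancy-completion cost.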
Implementation Notes
- Use as a second-stage module after a detector/tracker, not as a replacement for dense obstacle detection.
- Build an airside subtle-motion validation set with start, stop, reverse, inching, and tow initiation events.
- Include negative examples where LiDAR sampling causes apparent motion on static aircraft and parked GSE.
- Measure time-to-detect relative to track-velocity baselines, not only final F1.
- Publish an "early motion" flag with confidence and estimated motion vector.
- Do not trigger aggressive planning behavior from this module alone; use it to increase caution and prediction priority.
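The time-to-detect metric suggested above can be pinned down with a minimal sketch (names and frame rate are our assumptions):

```python
def time_to_detect(flags, onset_idx, frame_dt=0.1):
    """Latency metric from the implementation notes: seconds between true
    motion onset and the first frame where the module raises its
    early-motion flag. Returns None if motion is never flagged."""
    for i, flagged in enumerate(flags):
        if flagged and i >= onset_idx:
            return (i - onset_idx) * frame_dt
    return None

# Object starts moving at frame 10; the module flags it at frame 13,
# i.e. ~0.3 s later; a track-velocity baseline would typically need
# several more frames of evidence before its estimate stabilizes.
flags = [False] * 13 + [True] * 7
print(time_to_detect(flags, onset_idx=10))  # ~0.3 s
```

Reporting this alongside F1 captures exactly the reaction-time advantage the section argues for: two detectors with identical F1 can differ by several frames of latency at motion onset.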
Sources
- Paper: https://arxiv.org/abs/2405.02781
- CVF open-access paper: https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Instantaneous_Perception_of_Moving_Objects_in_3D_CVPR_2024_paper.pdf
- CVPR 2024 proceedings entry: https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Instantaneous_Perception_of_Moving_Objects_in_3D_CVPR_2024_paper.html
- Waymo Open Dataset: https://waymo.com/open/