Scene Flow Datasets and Benchmarks

Last updated: 2026-05-09

Why It Matters

Scene flow estimates 3D motion for points or voxels between frames. It is the motion primitive behind occupancy flow, dynamic object segmentation, point-cloud forecasting, and flow-aware planning. For airside autonomy, scene flow is useful because not all hazards are clean tracked boxes: a pushback tug, wing sweep, baggage train, walking ground crew, or loose object can be better represented as moving 3D structure than as a single centroid trajectory.

The benchmarks below span real LiDAR scene flow, stereo image scene flow, and synthetic data with dense supervision. Use them to separate algorithm capability from airside domain fit.
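
For orientation, here is a minimal sketch of the representation these benchmarks evaluate, assuming ego-motion-compensated flow expressed in meters per sweep; the array names, the 0.1 s sweep interval, and the 0.5 m/s dynamic cutoff are illustrative assumptions rather than any dataset's API.

```python
# Minimal sketch of the scene-flow representation: one (N, 3) flow vector per LiDAR
# point, plus a dynamic/static split derived from per-point speed. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
num_points = 5
points_t0 = rng.uniform(-50.0, 50.0, size=(num_points, 3))  # sweep at time t [m]
flow = np.zeros((num_points, 3))                             # N x 3 flow [m per sweep]
flow[0] = [0.08, 0.0, 0.0]                                   # e.g. a creeping tug point

frame_dt = 0.1                                   # assumed 0.1 s sweep interval (AV2-style)
speed = np.linalg.norm(flow, axis=1) / frame_dt  # per-point speed [m/s]
dynamic_mask = speed > 0.5                       # common 0.5 m/s dynamic/static cutoff

points_t1_pred = points_t0 + flow                # forward-warped cloud at t + frame_dt
print(speed.round(2), dynamic_mask)
```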

Dataset/Benchmark Table

| Dataset / benchmark | Source URL | Domain and sensors | Labels / task | Best use | Main transfer risk |
| --- | --- | --- | --- | --- | --- |
| Argoverse 2 3D Scene Flow | https://argoverse.github.io/user-guide/tasks/3d_scene_flow.html | AV2 LiDAR sweeps at 0.1 s intervals with ego motion, object boxes, and ground masks | N x 3 flow vectors plus dynamic/static segmentation; object boxes generate piecewise-rigid labels | Real AV LiDAR scene flow, dynamic/static point segmentation, long-range road motion | Flow labels are box-derived and road-centric; aircraft articulation and low-speed GSE are absent |
| AV2 Scene Flow Challenge / Bucket Normalized EPE | https://www.argoverse.org/sceneflow.html | AV2 and multi-dataset challenge setup with leaderboard support | Supervised and unsupervised tracks with expanded range and cross-dataset generalization emphasis | Modern evaluation protocol that reduces bias toward easy/background points | Challenge success does not guarantee airport object taxonomy or multi-LiDAR deployment |
| Waymo Scene Flow Labels | https://waymo.com/research/scalable-scene-flow-from-point-clouds-in-the-real-world/ | Waymo Open Dataset LiDAR, labels derived from tracked 3D objects | Per-point motion direction and magnitude; metrics account for ego motion and object type | Large-scale real-world LiDAR flow training and full-cloud inference evaluation | Labels are derived from tracked boxes and urban road actors, not aircraft/GSE |
| KITTI Scene Flow 2015 | https://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo | Stereo image pairs with dynamic scenes and semi-automatic ground truth | Stereo disparity at two times plus optical flow; scene flow outlier rate | Classic benchmark for image-based scene flow, optical-flow/depth consistency, and foreground/background breakdowns | Camera stereo benchmark is sparse for LiDAR-first AVs and much smaller than modern AV datasets |
| FlyingThings3D | https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets | Synthetic rendered scenes for disparity, optical flow, and scene flow | Dense synthetic ground truth for optical flow, disparity, disparity change, and scene flow | Pretraining, ablations, dense supervision, and sanity checks for geometry learning | Synthetic object motion and visual appearance must be bridged to real LiDAR/camera data |

Metrics

| Metric | Benchmark family | What to report |
| --- | --- | --- |
| EPE3D | LiDAR point scene flow | Mean and percentile Euclidean endpoint error in meters, split by static/dynamic, distance, and actor type |
| Bucket Normalized EPE | AV2 challenge-style evaluation | Bucketed error that reduces domination by background or high-frequency easy points |
| Dynamic/static segmentation F1 | AV2-style output and moving-object-segmentation (MOS) coupling | Precision, recall, and F1 for dynamic points, with speed thresholds stated explicitly |
| Outlier rate | KITTI-style scene flow | D1, D2, Fl, and SF outlier percentages; report foreground, background, all, and non-occluded where applicable |
| Flow angular and speed error | Airside planning transfer | Direction error and speed-magnitude error for slow GSE, walking personnel, and articulated equipment |
| Temporal consistency | World-model inputs | Jitter, sign flips, and frame-to-frame flow stability over multi-frame windows |
| Runtime | Deployment | Mean/P95/P99 latency on target hardware at full point-cloud size, not only on downsampled points |
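
A minimal sketch of how the per-point metrics above might be computed, assuming ego-motion-compensated predicted and ground-truth flow in meters per frame and a ground-truth dynamic mask; the function and variable names, the 0.1 s frame interval, and the 0.5 m/s threshold are assumptions for illustration, not a benchmark toolkit's API.

```python
# Illustrative per-point scene-flow metrics: EPE3D (overall and split by dynamic/static),
# direction and speed error on moving points, and dynamic/static F1 at a stated threshold.
import numpy as np

def scene_flow_metrics(pred_flow, gt_flow, gt_dynamic, frame_dt=0.1, dyn_threshold_mps=0.5):
    """pred_flow, gt_flow: (N, 3) flow in meters per frame; gt_dynamic: (N,) bool."""
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)            # EPE3D per point [m]

    pred_speed = np.linalg.norm(pred_flow, axis=1) / frame_dt    # [m/s]
    gt_speed = np.linalg.norm(gt_flow, axis=1) / frame_dt

    # Direction error is only meaningful where the ground truth actually moves.
    moving = gt_speed > 1e-3
    cos = np.einsum("ij,ij->i", pred_flow, gt_flow) / (
        np.linalg.norm(pred_flow, axis=1) * np.linalg.norm(gt_flow, axis=1) + 1e-9
    )
    angle_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Dynamic/static segmentation from predicted speed, with the threshold stated explicitly.
    pred_dynamic = pred_speed > dyn_threshold_mps
    tp = float(np.sum(pred_dynamic & gt_dynamic))
    precision = tp / max(pred_dynamic.sum(), 1)
    recall = tp / max(gt_dynamic.sum(), 1)
    f1 = 2.0 * precision * recall / max(precision + recall, 1e-9)

    return {
        "epe3d_mean_m": float(epe.mean()),
        "epe3d_dynamic_m": float(epe[gt_dynamic].mean()) if gt_dynamic.any() else float("nan"),
        "epe3d_static_m": float(epe[~gt_dynamic].mean()) if (~gt_dynamic).any() else float("nan"),
        "angle_err_deg_moving": float(angle_deg[moving].mean()) if moving.any() else float("nan"),
        "speed_err_mps": float(np.abs(pred_speed - gt_speed).mean()),
        "dynamic_f1": f1,
    }
```

Reporting these separately for static background, dynamic points, and apron-relevant actor classes keeps an easy static majority from hiding errors on the movers that matter.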

Airside/Indoor/Outdoor Transfer

| Transfer path | Useful signal | Airside gap |
| --- | --- | --- |
| AV2/Waymo road LiDAR to airside | Real LiDAR density, ego-motion compensation, tracked actor flow, dynamic/static segmentation | Low-speed apron vehicles, aircraft pushback, articulated aircraft parts, sparse open apron structure |
| KITTI stereo to airside cameras | Stereo/depth/flow consistency and foreground/background outlier accounting | Older sensor setup, small benchmark, and no airport traffic |
| FlyingThings3D to real data | Dense labels for pretraining and controlled geometric failure analysis | Synthetic-to-real domain gap in texture, LiDAR sparsity, weather, lighting, and scale |
| Scene flow to occupancy flow | Per-point motion vectors can supervise per-voxel flow and future occupancy | Voxelization can hide small FOD and thin moving structures unless resolution and labels are validated |
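
The last row above is the main bridge into occupancy-flow training, so here is a minimal sketch of averaging per-point flow into a per-voxel flow grid; the 0.2 m voxel size, grid extent, and function name are assumptions for illustration only.

```python
# Illustrative per-point -> per-voxel flow conversion: scatter-add each point's flow into
# its voxel, then divide by the point count to get a mean flow per occupied voxel.
import numpy as np

def voxelize_flow(points, flow, voxel_size=0.2,
                  extent=((-50.0, 50.0), (-50.0, 50.0), (-3.0, 5.0))):
    """points, flow: (N, 3) arrays. Returns (X, Y, Z, 3) mean flow and (X, Y, Z) counts."""
    lo = np.array([e[0] for e in extent])
    hi = np.array([e[1] for e in extent])
    dims = np.ceil((hi - lo) / voxel_size).astype(int)

    idx = np.floor((points - lo) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < dims), axis=1)        # drop points outside the grid
    idx, flow = idx[inside], flow[inside]

    flat = np.ravel_multi_index(tuple(idx.T), tuple(dims))    # one linear voxel index per point
    voxel_flow = np.zeros((int(dims.prod()), 3))
    counts = np.zeros(int(dims.prod()))
    np.add.at(voxel_flow, flat, flow)                         # scatter-add flow per voxel
    np.add.at(counts, flat, 1.0)

    occupied = counts > 0
    voxel_flow[occupied] /= counts[occupied, None]            # mean flow in occupied voxels
    return voxel_flow.reshape(*dims, 3), counts.reshape(*dims)
```

With 0.2 m voxels, a tow bar or small FOD item may occupy only a handful of cells, which is exactly the thin-structure risk flagged in the table; validate the resolution against the smallest mover you need to keep.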

Validation Guidance

  1. Establish baseline EPE3D and dynamic F1 on AV2 or Waymo before training on private airside logs.
  2. Report dynamic-object performance separately from static background. A model can look accurate by predicting mostly ego-motion on static surfaces.
  3. Add low-speed thresholds relevant to apron motion. A 0.5 m/s dynamic cutoff may miss creeping GSE, tow bars, or aircraft pushback motion.
  4. Evaluate long, thin, and articulated structures: belt-loader conveyors, baggage carts, dollies, aircraft tails/wings, jet bridges, and personnel partially occluded by equipment.
  5. Validate multi-LiDAR flow both before and after fusion. Per-sensor time offset or extrinsic error can look like scene motion; see the back-of-envelope sketch after this list.
  6. Feed scene-flow outputs into downstream occupancy, tracking, and planning tests. Standalone EPE is not enough for safety acceptance.
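
A back-of-envelope sketch for item 5, using assumed numbers: a small extrinsic yaw error between two LiDARs displaces distant static structure by roughly range times angle, and when point correspondences cross sensors between consecutive sweeps that offset reads as motion. The range, angle, and sweep interval below are illustrative assumptions.

```python
# Illustrative only: how a small inter-LiDAR yaw miscalibration can masquerade as flow.
import math

range_m = 50.0              # assumed static structure at 50 m (e.g. a terminal facade)
yaw_error_deg = 0.2         # assumed extrinsic yaw error between the two LiDARs
frame_dt = 0.1              # assumed sweep interval [s]

offset_m = range_m * math.radians(yaw_error_deg)    # ~0.17 m lateral misplacement
apparent_speed_mps = offset_m / frame_dt            # ~1.7 m/s of phantom "motion"

print(f"{offset_m:.2f} m offset -> {apparent_speed_mps:.1f} m/s apparent speed")
# Well above a 0.5 m/s dynamic cutoff, so a calibration fault can flag static structure
# as a mover; checking per-sensor flow on known-static surfaces catches this early.
```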

Sources

Compiled from the public dataset and benchmark pages linked in the table above.