Scene Flow Datasets and Benchmarks

Last updated: 2026-05-09

Why It Matters

Scene flow estimates 3D motion for points or voxels between frames. It is the motion primitive behind occupancy flow, dynamic object segmentation, point-cloud forecasting, and flow-aware planning. For airside autonomy, scene flow is useful because not all hazards are clean tracked boxes: a pushback tug, wing sweep, baggage train, walking ground crew, or loose object can be better represented as moving 3D structure than as a single centroid trajectory.

The benchmarks below span real LiDAR scene flow, stereo image scene flow, and synthetic data with dense supervision. Use them to separate algorithm capability from airside domain fit.
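
For orientation, here is a minimal sketch of the representation these benchmarks evaluate, assuming ego-motion-compensated flow expressed in meters per sweep; the array names, the 0.1 s sweep interval, and the 0.5 m/s dynamic cutoff are illustrative assumptions rather than any dataset's API.

```python
# Minimal sketch of the scene-flow representation: one (N, 3) flow vector per LiDAR
# point, plus a dynamic/static split derived from per-point speed. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
num_points = 5
points_t0 = rng.uniform(-50.0, 50.0, size=(num_points, 3))  # sweep at time t [m]
flow = np.zeros((num_points, 3))                             # N x 3 flow [m per sweep]
flow[0] = [0.08, 0.0, 0.0]                                   # e.g. a creeping tug point

frame_dt = 0.1                                   # assumed 0.1 s sweep interval (AV2-style)
speed = np.linalg.norm(flow, axis=1) / frame_dt  # per-point speed [m/s]
dynamic_mask = speed > 0.5                       # common 0.5 m/s dynamic/static cutoff

points_t1_pred = points_t0 + flow                # forward-warped cloud at t + frame_dt
print(speed.round(2), dynamic_mask)
```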

Dataset/Benchmark Table

| Dataset / benchmark | Source URL | Domain and sensors | Labels / task | Best use | Main transfer risk |
| --- | --- | --- | --- | --- | --- |
| Argoverse 2 3D Scene Flow | https://argoverse.github.io/user-guide/tasks/3d_scene_flow.html | AV2 LiDAR sweeps at 0.1 s intervals with ego motion, object boxes, and ground masks | N x 3 flow vectors plus dynamic/static segmentation; object boxes generate piecewise-rigid labels | Real AV LiDAR scene flow, dynamic/static point segmentation, long-range road motion | Flow labels are box-derived and road-centric; aircraft articulation and low-speed GSE are absent |
| AV2 Scene Flow Challenge / Bucket Normalized EPE | https://www.argoverse.org/sceneflow.html | AV2 and multi-dataset challenge setup with leaderboard support | Supervised and unsupervised tracks with expanded range and cross-dataset generalization emphasis | Modern evaluation protocol that reduces bias toward easy/background points | Challenge success does not guarantee airport object taxonomy or multi-LiDAR deployment |
| Waymo Scene Flow Labels | https://waymo.com/research/scalable-scene-flow-from-point-clouds-in-the-real-world/ | Waymo Open Dataset LiDAR, labels derived from tracked 3D objects | Per-point motion direction and magnitude; metrics account for ego motion and object type | Large-scale real-world LiDAR flow training and full-cloud inference evaluation | Labels are derived from tracked boxes and urban road actors, not aircraft/GSE |
| KITTI Scene Flow 2015 | https://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo | Stereo image pairs with dynamic scenes and semi-automatic ground truth | Stereo disparity at two times plus optical flow; scene flow outlier rate | Classic benchmark for image-based scene flow, optical-flow/depth consistency, and foreground/background breakdowns | Camera stereo benchmark is sparse for LiDAR-first AVs and much smaller than modern AV datasets |
| FlyingThings3D | https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets | Synthetic rendered scenes for disparity, optical flow, and scene flow | Dense synthetic ground truth for optical flow, disparity, disparity change, and scene flow | Pretraining, ablations, dense supervision, and sanity checks for geometry learning | Synthetic object motion and visual appearance must be bridged to real LiDAR/camera data |

Metrics

| Metric | Benchmark family | What to report |
| --- | --- | --- |
| EPE3D | LiDAR point scene flow | Mean and percentile Euclidean endpoint error in meters, split by static/dynamic, distance, and actor type |
| Bucket Normalized EPE | AV2 challenge-style evaluation | Bucketed error that reduces domination by background or high-frequency easy points |
| Dynamic/static segmentation F1 | AV2-style output and moving-object-segmentation (MOS) coupling | Precision, recall, and F1 for dynamic points, with speed thresholds stated explicitly |
| Outlier rate | KITTI-style scene flow | D1, D2, Fl, and SF outlier percentages; report foreground, background, all, and non-occluded where applicable |
| Flow angular and speed error | Airside planning transfer | Direction error and speed-magnitude error for slow GSE, walking personnel, and articulated equipment |
| Temporal consistency | World-model inputs | Jitter, sign flips, and frame-to-frame flow stability over multi-frame windows |
| Runtime | Deployment | Mean/P95/P99 latency on target hardware at full point-cloud size, not only on downsampled points |
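
A minimal sketch of how the per-point metrics above might be computed, assuming ego-motion-compensated predicted and ground-truth flow in meters per frame and a ground-truth dynamic mask; the function and variable names, the 0.1 s frame interval, and the 0.5 m/s threshold are assumptions for illustration, not a benchmark toolkit's API.

```python
# Illustrative per-point scene-flow metrics: EPE3D (overall and split by dynamic/static),
# direction and speed error on moving points, and dynamic/static F1 at a stated threshold.
import numpy as np

def scene_flow_metrics(pred_flow, gt_flow, gt_dynamic, frame_dt=0.1, dyn_threshold_mps=0.5):
    """pred_flow, gt_flow: (N, 3) flow in meters per frame; gt_dynamic: (N,) bool."""
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)            # EPE3D per point [m]

    pred_speed = np.linalg.norm(pred_flow, axis=1) / frame_dt    # [m/s]
    gt_speed = np.linalg.norm(gt_flow, axis=1) / frame_dt

    # Direction error is only meaningful where the ground truth actually moves.
    moving = gt_speed > 1e-3
    cos = np.einsum("ij,ij->i", pred_flow, gt_flow) / (
        np.linalg.norm(pred_flow, axis=1) * np.linalg.norm(gt_flow, axis=1) + 1e-9
    )
    angle_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Dynamic/static segmentation from predicted speed, with the threshold stated explicitly.
    pred_dynamic = pred_speed > dyn_threshold_mps
    tp = float(np.sum(pred_dynamic & gt_dynamic))
    precision = tp / max(pred_dynamic.sum(), 1)
    recall = tp / max(gt_dynamic.sum(), 1)
    f1 = 2.0 * precision * recall / max(precision + recall, 1e-9)

    return {
        "epe3d_mean_m": float(epe.mean()),
        "epe3d_dynamic_m": float(epe[gt_dynamic].mean()) if gt_dynamic.any() else float("nan"),
        "epe3d_static_m": float(epe[~gt_dynamic].mean()) if (~gt_dynamic).any() else float("nan"),
        "angle_err_deg_moving": float(angle_deg[moving].mean()) if moving.any() else float("nan"),
        "speed_err_mps": float(np.abs(pred_speed - gt_speed).mean()),
        "dynamic_f1": f1,
    }
```

Reporting these separately for static background, dynamic points, and apron-relevant actor classes keeps an easy static majority from hiding errors on the movers that matter.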

Airside/Indoor/Outdoor Transfer

| Transfer path | Useful signal | Airside gap |
| --- | --- | --- |
| AV2/Waymo road LiDAR to airside | Real LiDAR density, ego-motion compensation, tracked actor flow, dynamic/static segmentation | Low-speed apron vehicles, aircraft pushback, articulated aircraft parts, sparse open apron structure |
| KITTI stereo to airside cameras | Stereo/depth/flow consistency and foreground/background outlier accounting | Older sensor setup, small benchmark, and no airport traffic |
| FlyingThings3D to real data | Dense labels for pretraining and controlled geometric failure analysis | Synthetic-to-real domain gap in texture, LiDAR sparsity, weather, lighting, and scale |
| Scene flow to occupancy flow | Per-point motion vectors can supervise per-voxel flow and future occupancy | Voxelization can hide small FOD and thin moving structures unless resolution and labels are validated |
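
The last row above is the main bridge into occupancy-flow training, so here is a minimal sketch of averaging per-point flow into a per-voxel flow grid; the 0.2 m voxel size, grid extent, and function name are assumptions for illustration only.

```python
# Illustrative per-point -> per-voxel flow conversion: scatter-add each point's flow into
# its voxel, then divide by the point count to get a mean flow per occupied voxel.
import numpy as np

def voxelize_flow(points, flow, voxel_size=0.2,
                  extent=((-50.0, 50.0), (-50.0, 50.0), (-3.0, 5.0))):
    """points, flow: (N, 3) arrays. Returns (X, Y, Z, 3) mean flow and (X, Y, Z) counts."""
    lo = np.array([e[0] for e in extent])
    hi = np.array([e[1] for e in extent])
    dims = np.ceil((hi - lo) / voxel_size).astype(int)

    idx = np.floor((points - lo) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < dims), axis=1)        # drop points outside the grid
    idx, flow = idx[inside], flow[inside]

    flat = np.ravel_multi_index(tuple(idx.T), tuple(dims))    # one linear voxel index per point
    voxel_flow = np.zeros((int(dims.prod()), 3))
    counts = np.zeros(int(dims.prod()))
    np.add.at(voxel_flow, flat, flow)                         # scatter-add flow per voxel
    np.add.at(counts, flat, 1.0)

    occupied = counts > 0
    voxel_flow[occupied] /= counts[occupied, None]            # mean flow in occupied voxels
    return voxel_flow.reshape(*dims, 3), counts.reshape(*dims)
```

With 0.2 m voxels, a tow bar or small FOD item may occupy only a handful of cells, which is exactly the thin-structure risk flagged in the table; validate the resolution against the smallest mover you need to keep.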

Validation Guidance

  1. Establish baseline EPE3D and dynamic F1 on AV2 or Waymo before training on private airside logs.
  2. Report dynamic-object performance separately from static background. A model can look accurate by predicting mostly ego-motion on static surfaces.
  3. Add low-speed thresholds relevant to apron motion. A 0.5 m/s dynamic cutoff may miss creeping GSE, tow bars, or aircraft pushback motion.
  4. Evaluate long, thin, and articulated structures: belt-loader conveyors, baggage carts, dollies, aircraft tails/wings, jet bridges, and personnel partially occluded by equipment.
  5. Validate multi-LiDAR flow both before and after fusion. Per-sensor time offset or extrinsic error can look like scene motion; see the back-of-envelope sketch after this list.
  6. Feed scene-flow outputs into downstream occupancy, tracking, and planning tests. Standalone EPE is not enough for safety acceptance.
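
A back-of-envelope sketch for item 5, using assumed numbers: a small extrinsic yaw error between two LiDARs displaces distant static structure by roughly range times angle, and when point correspondences cross sensors between consecutive sweeps that offset reads as motion. The range, angle, and sweep interval below are illustrative assumptions.

```python
# Illustrative only: how a small inter-LiDAR yaw miscalibration can masquerade as flow.
import math

range_m = 50.0              # assumed static structure at 50 m (e.g. a terminal facade)
yaw_error_deg = 0.2         # assumed extrinsic yaw error between the two LiDARs
frame_dt = 0.1              # assumed sweep interval [s]

offset_m = range_m * math.radians(yaw_error_deg)    # ~0.17 m lateral misplacement
apparent_speed_mps = offset_m / frame_dt            # ~1.7 m/s of phantom "motion"

print(f"{offset_m:.2f} m offset -> {apparent_speed_mps:.1f} m/s apparent speed")
# Well above a 0.5 m/s dynamic cutoff, so a calibration fault can flag static structure
# as a mover; checking per-sensor flow on known-static surfaces catches this early.
```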

Sources

Compiled from the public dataset and benchmark pages linked in the table above.