Multi-Sensor Calibration Release Benchmark

Last updated: 2026-05-09

Purpose

This protocol defines release benchmarking for multi-sensor calibration packages used by perception, SLAM/localization, mapping, and runtime assurance. A calibration package is released only when it is accurate enough, compatible with the active artifact set, monitored in the field, and proven not to degrade downstream safety metrics.

Benchmark Scope

| Pair or transform | Metrics | Downstream effect |
| --- | --- | --- |
| Camera intrinsics | Reprojection error, distortion residual, coverage | Image detection, projection, LiDAR-camera calibration |
| LiDAR-camera extrinsics | Reprojection residual, edge alignment, calibration-status confidence | Fusion, unknown/OOD projection, semantic point labeling |
| LiDAR-LiDAR extrinsics | Overlap ICP residual, rotation/translation delta, fitness | Point-cloud aggregation, occupancy, map building |
| LiDAR-IMU extrinsics/time | Motion excitation residual, time offset, deskew quality | SLAM, localization, velocity, mapping |
| Radar-camera/LiDAR | Association residual, velocity consistency, time offset | Adverse-weather perception and tracking |
| Sensor-kit-to-base | Survey/CAD delta, ego box consistency, route clearance | Collision envelope and planner clearance |
| Map datum/frame | Survey residual, tile seam alignment, geofence delta | Localization and route/geofence correctness |

Benchmark Tiers

| Tier | Purpose | Required for |
| --- | --- | --- |
| K-B0 smoke | Schema, frame tree, serial, and manifest checks | Every calibration package |
| K-B1 lab target | Target-based static validation | New install or repaired sensor |
| K-B2 route targetless | Airside route/overlap validation | Production release |
| K-B3 fault injection | Known extrinsic/time perturbations | Monitor qualification and threshold changes |
| K-B4 downstream replay | Perception-SLAM metrics under candidate calibration | Behavior-affecting release |
| K-B5 field watch | Canary drift and alert performance | Site/fleet promotion |
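
The "Required for" column can be read as a selection rule. The following is a minimal sketch of that reading, not part of the protocol: `ReleaseContext`, its field names, and `required_tiers` are hypothetical, and a real program may define the triggers differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseContext:
    # Hypothetical flags, one per "Required for" condition in the tier table.
    new_install_or_repair: bool = False
    production_release: bool = False
    monitor_or_threshold_change: bool = False
    behavior_affecting: bool = False
    site_or_fleet_promotion: bool = False


def required_tiers(ctx: ReleaseContext) -> list[str]:
    """Return the tier IDs the table requires for this release context."""
    tiers = ["K-B0"]  # smoke checks apply to every calibration package
    if ctx.new_install_or_repair:
        tiers.append("K-B1")
    if ctx.production_release:
        tiers.append("K-B2")
    if ctx.monitor_or_threshold_change:
        tiers.append("K-B3")
    if ctx.behavior_affecting:
        tiers.append("K-B4")
    if ctx.site_or_fleet_promotion:
        tiers.append("K-B5")
    return tiers


# Example: a behavior-affecting production release on unchanged hardware.
print(required_tiers(ReleaseContext(production_release=True, behavior_affecting=True)))
```

Keeping the rule as plain data plus one function makes it easy to review against the table when the tier list changes.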

Procedure

  1. Freeze candidate calibration package, active map, model, runtime, sensor firmware, and benchmark manifest.
  2. Run K-B0 to verify package structure, sensor serials, TF tree hash, schema, and compatibility.
  3. Run K-B1 using approved targets/fixtures when the physical installation has changed.
  4. Run K-B2 on route data with sufficient overlap, feature diversity, speed, lighting, and operational context.
  5. Run K-B3 perturbations for each monitored sensor pair and verify monitor state/action.
  6. Run K-B4 downstream replay for object detection, occupancy/free-space, localization, and map alignment.
  7. Run K-B5 canary watch and compare drift metrics to release envelope.
  8. Publish the release recommendation: pass, pass with ODD restriction, inconclusive, or block.
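
The procedure above can be sketched as an ordered run that combines per-tier results into one published outcome. This is an illustrative sketch under stated assumptions: the `Outcome` enum, the `run_tiers` helper, the rule that the worst tier result wins, the ranking of "inconclusive" above a restricted pass, and the early stop on a block are all choices made here, not requirements of the protocol.

```python
from enum import IntEnum
from typing import Callable


class Outcome(IntEnum):
    # Ordered by severity. Ranking INCONCLUSIVE above a restricted pass
    # is an assumption of this sketch.
    PASS = 0
    PASS_WITH_ODD_RESTRICTION = 1
    INCONCLUSIVE = 2
    BLOCK = 3


def run_tiers(
    tiers: list[tuple[str, Callable[[], Outcome]]],
) -> tuple[Outcome, list[tuple[str, Outcome]]]:
    """Run tiers in order and return (overall outcome, per-tier log)."""
    log: list[tuple[str, Outcome]] = []
    overall = Outcome.PASS
    for name, runner in tiers:
        outcome = runner()
        log.append((name, outcome))
        overall = max(overall, outcome)  # worst result so far wins
        if outcome == Outcome.BLOCK:
            break  # assumption: later tiers are skipped once a tier blocks
    return overall, log


# Example with stub runners standing in for the real tier jobs.
overall, log = run_tiers([
    ("K-B0", lambda: Outcome.PASS),
    ("K-B2", lambda: Outcome.PASS_WITH_ODD_RESTRICTION),
    ("K-B4", lambda: Outcome.PASS),
])
print(overall.name, [(name, o.name) for name, o in log])
```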

Metrics

| Metric | Definition | Release interpretation |
| --- | --- | --- |
| Reprojection RMSE | Pixel error for projected 3D features/targets | Must pass by camera, range, and image region |
| LiDAR overlap residual | Registration error in shared FOV | Must pass with feature-quality prerequisite |
| Extrinsic delta | Difference from previous approved transform | Large unexplained delta requires physical inspection |
| Time offset | Estimated sensor-to-reference offset | Must be inside timestamp validation envelope |
| Deskew residual | Motion distortion left after time/extrinsic correction | Blocks SLAM/map release if high |
| Downstream ATE/RPE | Localization error under candidate calibration | No critical route regression |
| False-free-space delta | Change in protected-zone false-free-space | Any regression triggers safety review; blocker if false-free-space occurs |
| Unknown/OOD action delta | Change in unknown/OOD detection/action | Cannot suppress safety-relevant unknowns |
| Monitor detection latency | Time from injected drift to red/yellow state | Must be before unsafe consumer output |
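
Two of these metrics have simple closed forms, sketched here for illustration. The sketch assumes an ideal pinhole camera without distortion, takes RMSE over per-point Euclidean pixel distances, and reports the rotation part of the extrinsic delta as the angle of the relative rotation, θ = arccos((tr(R_prevᵀ R_new) − 1) / 2). The function names and sample values are hypothetical, and a program may define either metric differently.

```python
import math


def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixels (no distortion)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_rmse(points_cam, observed_px, fx, fy, cx, cy):
    """RMSE of Euclidean pixel distance between projections and observations."""
    squared = 0.0
    for point, (u_obs, v_obs) in zip(points_cam, observed_px):
        u, v = project_pinhole(point, fx, fy, cx, cy)
        squared += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(squared / len(points_cam))


def rotation_delta_deg(r_prev, r_new):
    """Angle of the relative rotation between two 3x3 rotation matrices."""
    # trace(r_prev^T @ r_new) equals the element-wise product sum.
    trace = sum(r_prev[i][j] * r_new[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.degrees(math.acos(cos_angle))


def translation_delta_m(t_prev, t_new):
    """Euclidean distance between two translation vectors."""
    return math.dist(t_prev, t_new)


# Illustrative values only.
rmse = reprojection_rmse(
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
    [(50.0, 50.0), (103.0, 54.0)],
    fx=100.0, fy=100.0, cx=50.0, cy=50.0,
)
print(round(rmse, 3))
```

Because the release interpretation requires a pass by camera, range, and image region, a single aggregate RMSE like this would be computed once per bin rather than once per package.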

Suggested Gate Pattern

Exact thresholds are program-specific. A defensible release pattern is:

| Gate | Pass condition | Block condition |
| --- | --- | --- |
| CAL-0 provenance | Package links sensors, vehicle, firmware, tool, route, operator, signatures | Unknown sensor or untraceable transform |
| CAL-1 static accuracy | Target/CAD/survey residuals inside installation envelope | Baseline is already outside tolerance |
| CAL-2 route robustness | Residuals pass in representative airside geometry | Pass only in easy lab scene |
| CAL-3 downstream no-regress | Localization, occupancy, object projection, map QA do not regress in critical slices | Calibration improves residual but harms safety metric |
| CAL-4 monitor action | Drift/time perturbations detected and consumed by runtime response | Monitor logs but planner/runtime does not act |
| CAL-5 compatibility | Package activates only on eligible vehicles, sensors, maps, and runtime | OTA can install on wrong serial or sensor kit |
| CAL-6 field watch | Canary residuals and alert rates remain inside envelope | Drift cluster or unresolved false-free-space event |
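
As one illustration of the pattern, CAL-3 can be checked slice by slice so that an improved calibration residual cannot hide a downstream regression. The sketch below is hypothetical: the slice names, metric names, tolerance values, and the convention that lower is better for every metric are all assumptions.

```python
def downstream_regressions(baseline, candidate, tolerance):
    """List (slice, metric) pairs where the candidate is worse than baseline.

    baseline, candidate: {slice_name: {metric_name: value}}, lower is better.
    tolerance: {metric_name: allowed increase}; metrics not listed get 0.
    """
    regressions = []
    for slice_name, base_metrics in baseline.items():
        cand_metrics = candidate.get(slice_name)
        if cand_metrics is None:
            # A critical slice that was not replayed counts as a failure.
            regressions.append((slice_name, "missing slice"))
            continue
        for metric, base_value in base_metrics.items():
            cand_value = cand_metrics.get(metric)
            allowed = base_value + tolerance.get(metric, 0.0)
            if cand_value is None or cand_value > allowed:
                regressions.append((slice_name, metric))
    return regressions


# Illustrative numbers: localization improves, but a false-free-space
# event appears, so the gate must not pass.
baseline = {"stand_approach": {"ate_m": 0.10, "false_free_space_events": 0}}
candidate = {"stand_approach": {"ate_m": 0.08, "false_free_space_events": 1}}
tolerance = {"ate_m": 0.02}  # no tolerance entry for false-free-space events

found = downstream_regressions(baseline, candidate, tolerance)
print("CAL-3:", "pass" if not found else f"block {found}")
```

Defaulting unlisted metrics to zero tolerance keeps safety metrics strict unless someone explicitly relaxes them.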

Fault Injection

| Injection | Expected result |
| --- | --- |
| Camera yaw/pitch perturbation | Calibration monitor red/yellow and projection error increase |
| LiDAR vertical translation perturbation | Overlap residual or occupancy/free-space monitor detects inconsistency |
| LiDAR-IMU time offset | Deskew/localization metric degrades and timing/calibration monitor fires |
| Wrong sensor serial package | Compatibility matrix blocks activation |
| Feature-poor route segment | Monitor reports unknown/prerequisite failure, not false green |
| Wet/glare route slice | Downstream free-space and OOD conservatism remain valid |
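
For injections that should drive a monitor to red/yellow, the detection latency can be read from a replay timeline. The following is a minimal sketch under assumptions: the `(timestamp, state)` event format, the state names, and the `first_unsafe_output_s` input are hypothetical and would come from the replay tooling in practice.

```python
from typing import Iterable, Optional

ALERT_STATES = ("yellow", "red")


def detection_time_s(
    injected_at_s: float, monitor_events: Iterable[tuple[float, str]]
) -> Optional[float]:
    """Time of the first yellow/red state at or after the injection.

    monitor_events must be sorted by timestamp. Returns None if the
    monitor never alerts after the injection.
    """
    for timestamp_s, state in monitor_events:
        if timestamp_s >= injected_at_s and state in ALERT_STATES:
            return timestamp_s
    return None


def detected_before_unsafe_output(
    injected_at_s: float,
    monitor_events: Iterable[tuple[float, str]],
    first_unsafe_output_s: float,
) -> bool:
    """True only if the alert precedes the first unsafe consumer output."""
    detected_at = detection_time_s(injected_at_s, monitor_events)
    return detected_at is not None and detected_at < first_unsafe_output_s


# Illustrative timeline: drift injected at t = 1.0 s.
events = [(0.0, "green"), (1.0, "green"), (1.4, "yellow"), (2.0, "red")]
detected_at = detection_time_s(1.0, events)
print("latency_s:", None if detected_at is None else round(detected_at - 1.0, 3))
print("in time:", detected_before_unsafe_output(1.0, events, first_unsafe_output_s=1.5))
```

A missing alert returns `None` rather than a large number, so a monitor that never fires cannot be mistaken for a slow one.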

Evidence Artifacts

| Artifact | Contents |
| --- | --- |
| Benchmark manifest | Candidate package, active artifacts, routes, datasets, sensor serials |
| Calibration report | Metrics by pair, route, range, image region, feature prerequisite |
| Downstream report | Detection/free-space/localization/map deltas and failure packets |
| Fault-injection report | Perturbation values, detection latency, monitor action |
| Compatibility record | Manifest eligibility and blocked negative tests |
| Canary report | Residual trends, alerts, maintenance findings, closure |
| Release recommendation | Pass, restricted pass, inconclusive, or block |
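
One way to make the benchmark manifest verifiable is to record a content digest when the artifact set is frozen and to recompute it before each tier runs. This is a sketch under assumptions: the `BenchmarkManifest` fields, the identifier strings, and the choice of canonical JSON hashed with SHA-256 are illustrative, not a schema defined by this protocol.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class BenchmarkManifest:
    # Hypothetical fields mirroring the items frozen before benchmarking.
    calibration_package: str
    map_version: str
    model_version: str
    runtime_version: str
    sensor_firmware: str
    sensor_serials: tuple[str, ...]
    routes: tuple[str, ...]


def manifest_digest(manifest: BenchmarkManifest) -> str:
    """SHA-256 over a canonical JSON encoding of the manifest."""
    payload = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder identifiers for illustration only.
frozen = BenchmarkManifest(
    calibration_package="cal-candidate",
    map_version="map-A",
    model_version="model-A",
    runtime_version="runtime-A",
    sensor_firmware="fw-A",
    sensor_serials=("LIDAR-001", "CAM-001"),
    routes=("route-1",),
)
digest_at_freeze = manifest_digest(frozen)

# A later change to any frozen artifact produces a different digest.
drifted = replace(frozen, map_version="map-B")
print(manifest_digest(drifted) == digest_at_freeze)
```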

Related Documents

  • 40-runtime-systems/software-operations/sensor-calibration-fleet-ops.md
  • 20-av-platform/sensors/calibration-tracking.md
  • 20-av-platform/sensors/multi-lidar-calibration.md
  • 60-safety-validation/verification-validation/slam-map-benchmark-protocol.md
  • 60-safety-validation/verification-validation/perception-slam-leaderboard-interpretation.md
  • 60-safety-validation/runtime-assurance/monitor-qualification-evidence.md

Sources

Public research notes collected from public sources.