Perception-SLAM Map Reliability Evidence Case

Last updated: 2026-05-09

Purpose

This evidence case defines the safety argument and artifact package required to release a perception-SLAM stack and its HD/localization map for airside autonomous ground vehicle operation. It treats map reliability as a safety property: the vehicle must localize, perceive, and reason about free space without being misled by stale infrastructure, transient aircraft/GSE, sensor corruption, map ghosts, or uncalibrated uncertainty.

This file is the top-level evidence wrapper for the supporting protocols cited under Argument Structure.

Safety Claim

Claim MSR-1: For the approved airside ODD, the released perception-SLAM stack and map package provide localization, obstacle interpretation, static-world representation, and uncertainty signals that are reliable enough to support the operational safety case.

This claim is accepted only when the evidence below is complete, reproducible, and traceable to a specific vehicle configuration, map version, software build, sensor calibration, ODD, and release gate decision.

Argument Structure

| Argument node | Required evidence | Acceptance rule |
| --- | --- | --- |
| ODD is explicit | Airport, route class, speed envelope, lighting, weather, GNSS availability, apron/taxiway/service-road zone, aircraft proximity class | ODD file and scenario taxonomy are versioned; no test result is accepted without ODD tags |
| Map is geometrically reliable | Ground-truth survey, repeated traversals, ATE/RPE, scan-to-map residuals, loop-closure error, map tile QA | All mandatory benchmark slices pass the SLAM map benchmark protocol |
| Map is semantically reliable | Static/movable/dynamic/FOD/hazard layer labels, reviewer agreement, false-free-space analysis | No critical false-free-space defect in release-candidate tiles |
| Localization failures are bounded | Degradation detection, covariance/score calibration, relocalization latency, fallback behavior | Failures trigger speed reduction, controlled stop, map quarantine, or remote review within design limits |
| Perception and SLAM withstand credible corruptions | Sensor dropout, beam loss, fog/rain/spray, time skew, extrinsic drift, GNSS denial, stale map, moving-object injection | Corruption campaign passes release thresholds in the robustness protocol |
| Statistical claims are defensible | Pre-registered metrics, confidence intervals, sequential test rules, sample independence controls | Statistical release decision follows perception-slam-statistical-validity-protocol.md |
| Uncertainty is useful operationally | Calibration curves, ECE/NLL/Brier, conformal coverage, risk bins, alert precision | Calibration gates pass in uncertainty-calibration-perception-slam-release-gates.md |
| Fleet monitoring closes the loop | Data contract, event triggers, map defect reports, SGO-style incident metadata, post-release dashboards | Fleet data satisfies perception-slam-fleet-data-contract.md |
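
The "Uncertainty is useful operationally" row names ECE as one of the calibration measures. As a minimal sketch, assuming the common equal-width-bin definition of expected calibration error and confidences in [0, 1], it could be computed as below. The function name, the bin count, and the sample inputs are illustrative assumptions, not taken from the calibration protocol.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count; the real protocol may differ
) -> float:
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equal-length, non-empty inputs")

    conf_sum = [0.0] * n_bins
    hits = [0] * n_bins
    count = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hits[idx] += int(ok)
        count[idx] += 1

    total = len(confidences)
    ece = 0.0
    for s, h, k in zip(conf_sum, hits, count):
        if k:  # empty bins contribute nothing
            ece += (k / total) * abs(h / k - s / k)
    return ece


# Hypothetical data: four detections claimed at 0.9 confidence, one of them wrong.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
# Hypothetical data: two detections at 0.5 confidence, one right and one wrong.
calibrated = expected_calibration_error([0.5, 0.5], [True, False])
```

In this sketch the first call reports a gap of about 0.15 between claimed confidence and observed accuracy, while the second reports no gap. A release gate would evaluate this per risk slice rather than on pooled data.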

ODD and Airside Relevance

The evidence case must be sliced by airside operating context, not only by aggregate mileage:

| ODD slice | Reliability concern | Minimum evidence |
| --- | --- | --- |
| Apron stand approach | Aircraft geometry, temporary GSE, personnel, chocks, cones, jet bridge occlusion | Stand-specific map QA, aircraft-present/absent traversals, movable-static classification review |
| Service road transit | Repeated route localization, occluding parked vehicles, road markings, speed bumps | Multi-session localization drift and relocalization tests |
| Taxiway crossing support | Geofence precision, clearance state, line marking interpretation, wide open spaces with weak features | Map-to-geofence consistency, GNSS/INS degradation tests, route-level hazard review |
| Depot/maintenance area | Dense parked vehicles, charging equipment, frequent layout changes | High-frequency map-change detection and quarantine evidence |
| Night/wet/heavy rain | LiDAR/camera degradation, reflections, missing ground returns | Weather-tagged benchmark and corruption campaign results |
| De-icing or jet blast adjacency | Spray, thermal distortion, debris, visibility changes | Operational exclusion or explicit sensor robustness evidence |
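
To illustrate why results are sliced by operating context instead of pooled, the sketch below groups test results by ODD slice and rejects any result that carries no ODD tag. The `SliceResult` record, its field names, and the sample data are hypothetical and are not defined by this evidence case.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SliceResult:
    scenario_id: str
    odd_slice: Optional[str]  # e.g. "apron_stand_approach"; None means untagged
    passed: bool


def pass_rate_by_slice(
    results: Iterable[SliceResult],
) -> Tuple[Dict[str, float], List[str]]:
    """Return per-slice pass rates and the IDs of rejected (untagged) results."""
    tallies = defaultdict(lambda: [0, 0])  # slice -> [passed, total]
    rejected: List[str] = []
    for r in results:
        if not r.odd_slice:
            # No ODD tag: the result is not accepted as evidence.
            rejected.append(r.scenario_id)
            continue
        tallies[r.odd_slice][0] += int(r.passed)
        tallies[r.odd_slice][1] += 1
    rates = {name: p / t for name, (p, t) in tallies.items()}
    return rates, rejected


# Hypothetical campaign: service road results look fine, the apron result does not.
campaign = [SliceResult(f"sr-{i}", "service_road_transit", True) for i in range(9)]
campaign.append(SliceResult("apron-0", "apron_stand_approach", False))
campaign.append(SliceResult("untagged-0", None, True))

rates, rejected = pass_rate_by_slice(campaign)
```

In this made-up campaign the pooled pass rate over tagged results is 90%, yet the apron slice passes none of its scenarios, which is the situation slicing is meant to expose.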

Evidence Artifacts

Each release candidate creates an immutable evidence bundle:

| Artifact | Owner | Required content |
| --- | --- | --- |
| Evidence manifest | Safety validation lead | Build ID, map ID, calibration ID, vehicle config, ODD, test campaign IDs, approval status |
| Map package manifest | Mapping lead | Tile hashes, coordinate frames, source sessions, survey reference, semantic layer versions |
| Dataset manifest | Data platform owner | Bag/MCAP IDs, scenario tags, weather/light tags, sensor health, privacy/export controls |
| Benchmark report | V&V lead | Metric tables, confidence intervals, failed slices, residual risk notes |
| Robustness campaign report | Perception owner | Corruption matrix, fault injection seeds, severity levels, observed failure modes |
| Calibration report | ML/perception owner | Reliability diagrams, ECE/NLL/Brier, conformal coverage, calibration-set provenance |
| Shadow-mode report | Fleet ops owner | Disengagements, interventions, map-localization alerts, route exposure, operator notes |
| Defect disposition log | Safety board | Open defects, severity, mitigations, waiver rationale, expiry date |
| Release decision record | Release manager | Gate results, sign-offs, rollback target, post-release monitoring window |
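
One way to make an evidence manifest tamper-evident is to freeze its fields and derive a content digest from a canonical serialization. The sketch below assumes JSON with sorted keys and SHA-256. The class, field names, and sample IDs are hypothetical; the source specifies only which content the manifest must carry, not its encoding.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class EvidenceManifest:
    build_id: str
    map_id: str
    calibration_id: str
    vehicle_config: str
    odd_id: str
    campaign_ids: Tuple[str, ...]
    approval_status: str


def manifest_digest(manifest: EvidenceManifest) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical identifiers, for illustration only.
candidate = EvidenceManifest(
    build_id="build-0001",
    map_id="map-0001",
    calibration_id="cal-0001",
    vehicle_config="vehicle-a",
    odd_id="odd-0001",
    campaign_ids=("campaign-1", "campaign-2"),
    approval_status="pending",
)
digest = manifest_digest(candidate)
```

Any change to a field, such as a different map ID, yields a different digest, so a release decision record can reference the digest instead of trusting a mutable file path.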

Core Metrics

| Metric | Definition | Safety interpretation |
| --- | --- | --- |
| Absolute trajectory error (ATE) | Global pose error against ground truth or surveyed trajectory | Long-horizon localization and map alignment health |
| Relative pose error (RPE) | Local motion error over fixed intervals | Short-horizon stability relevant to planning and control |
| Scan-to-map residual | Distance/intensity/semantic residual between current scan and map | Online indicator of stale map or localization degradation |
| Relocalization success | Fraction of localization-loss events recovered within time/distance budget | Ability to recover without unsafe drift |
| False-free-space rate | Cases where the map/perception stack marks occupied or hazardous space as traversable | Critical safety metric; zero tolerance in protected zones |
| Static preservation | Fraction of valid static structure retained across map updates | Protects localization features and infrastructure geometry |
| Dynamic ghost rate | Transient aircraft/GSE/person points promoted to static map | Prevents stale obstacles and localization bias |
| Movable-static review precision | Correct routing of temporary barriers, cones, carts, and parked GSE to review/quarantine | Prevents unsafe automatic map publication |
| Uncertainty coverage | Empirical error rate inside declared pose/object/map confidence sets | Determines whether runtime monitors can trust uncertainty |
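
The difference between ATE and RPE is easiest to see on a small example. The sketch below is a simplified, translation-only version in 2D: it assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame, and it ignores rotation, which a full SE(3) evaluation would include. Function names and the sample trajectory are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ate_rmse(est: List[Point], gt: List[Point]) -> float:
    """RMSE of absolute position error over matched poses."""
    if len(est) != len(gt) or not est:
        raise ValueError("need equal-length, non-empty trajectories")
    sq = [(ex - gx) ** 2 + (ey - gy) ** 2 for (ex, ey), (gx, gy) in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))


def rpe_rmse(est: List[Point], gt: List[Point], delta: int = 1) -> float:
    """RMSE of the error in relative displacement over a fixed step interval."""
    if len(est) != len(gt):
        raise ValueError("trajectories must have equal length")
    n = len(est) - delta
    if delta < 1 or n < 1:
        raise ValueError("trajectory too short for the chosen interval")
    sq = []
    for i in range(n):
        d_est = (est[i + delta][0] - est[i][0], est[i + delta][1] - est[i][1])
        d_gt = (gt[i + delta][0] - gt[i][0], gt[i + delta][1] - gt[i][1])
        sq.append((d_est[0] - d_gt[0]) ** 2 + (d_est[1] - d_gt[1]) ** 2)
    return math.sqrt(sum(sq) / n)


# Hypothetical straight-line drive with a constant 0.5 m lateral offset.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
estimate = [(x, y + 0.5) for x, y in ground_truth]

ate = ate_rmse(estimate, ground_truth)
rpe = rpe_rmse(estimate, ground_truth, delta=1)
```

Here the constant offset produces an ATE of 0.5 m while the RPE is zero: local motion is tracked perfectly, but global alignment to the map is off. This is why the two metrics carry different safety interpretations.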

Release Gates

| Gate | Entry condition | Pass condition | Block condition |
| --- | --- | --- | --- |
| G0 configuration freeze | Candidate software, sensor calibration, and map package are hashed | No untracked binary/model/map artifacts | Missing config traceability |
| G1 offline benchmark | Public/proxy and internal airside datasets are processed | All critical metrics pass by ODD slice | Any critical false-free-space or localization-loss defect |
| G2 corruption and fault injection | Baseline benchmark has passed | No unmitigated catastrophic/high-severity failure under credible corruptions | Sensor fault produces silent overconfidence |
| G3 statistical validity | Sample plan and metric definitions are locked | Confidence/credible intervals meet the statistical protocol | Cherry-picked data, leaked test set, or underpowered claim |
| G4 uncertainty calibration | Independent calibration and test partitions exist | ECE/NLL/conformal coverage gates pass by risk slice | Overconfident errors near aircraft, people, or geofence boundaries |
| G5 shadow mode | Vehicle operates non-autonomously or under safety operator | No unresolved critical events; intervention rate below threshold | Repeated unexplained localization or map inconsistency alerts |
| G6 safety board release | All reports are complete | Safety, mapping, perception, data, fleet ops, and release owner sign | Open critical defect without approved ODD restriction |
| G7 post-release watch | OTA/map release is deployed to limited fleet | 7-day watch passes with no new critical regression | Rollback, route disable, or map quarantine triggered |
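
The gate table can be read as an ordered checklist in which the first gate that is missing, failed, or has a recorded block condition stops the release. The sketch below encodes that reading. Strict G0-to-G7 ordering and the `GateResult` shape are assumptions made for illustration; the source states entry, pass, and block conditions but not a data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

GATE_ORDER: Tuple[str, ...] = ("G0", "G1", "G2", "G3", "G4", "G5", "G6", "G7")


@dataclass(frozen=True)
class GateResult:
    passed: bool
    block_reasons: Tuple[str, ...] = ()  # recorded block conditions, if any


def first_blocking_gate(
    results: Dict[str, GateResult],
    required: Sequence[str] = GATE_ORDER,
) -> Optional[str]:
    """Return the first required gate that blocks the release, or None if all pass."""
    for gate in required:
        result = results.get(gate)
        # A gate with no recorded result is treated as blocking, not as passed.
        if result is None or not result.passed or result.block_reasons:
            return gate
    return None


# Hypothetical campaign in which calibration (G4) records a block condition.
results = {gate: GateResult(passed=True) for gate in GATE_ORDER}
all_clear = first_blocking_gate(results)
results["G4"] = GateResult(
    passed=False,
    block_reasons=("overconfident errors near geofence boundary",),
)
blocked_at = first_blocking_gate(results)
```

Treating a missing result as blocking keeps the default conservative: a gate that was never run cannot be mistaken for one that passed.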

Failure Modes Covered

| Failure mode | Detection evidence | Mitigation evidence |
| --- | --- | --- |
| Stale map after stand layout change | Map-change detector, reviewer report, scan-to-map residual | Tile quarantine, temporary ODD restriction, remote operator bulletin |
| Aircraft/GSE ghost in static map | Dynamic rejection test, aircraft-present/absent pair comparison | Movable-static layer, human review, map publication block |
| False free space around FOD or chocks | FOD/hazard slice evaluation, airside dynamic map cleaning checks | Preserve as current-world alert, reduce speed, route block |
| GNSS denial or multipath | GNSS dropout and spoofing tests | LiDAR-inertial fallback, covariance inflation, controlled stop |
| Sensor extrinsic drift | Calibration residual monitors, cross-sensor consistency | Maintenance hold, calibration refresh, fault isolation |
| Time synchronization skew | Timestamp fault injection, residual jump detection | Time-sync alert, data invalidation, degraded mode |
| Weather-induced overconfidence | Rain/fog/wet-ground corruption tests and calibration bins | Sensor health gating, radar/thermal fallback, ODD restriction |
| Loop closure into wrong place | Repeated-route benchmark, topological consistency checks | Candidate loop review, map rollback, relocalization guard |
| Overfit benchmark release | Dataset lineage, holdout controls, pre-registration | Locked test set, blind review, fleet validation |

Owner Handoffs

| From | To | Handoff package |
| --- | --- | --- |
| Mapping | V&V | Map package, source traversals, survey reference, map-change log |
| Perception/SLAM | V&V | Build, model weights, calibration, runtime diagnostics schema |
| V&V | Safety board | Benchmark, robustness, calibration, and statistical decision reports |
| Data platform | V&V and safety | Dataset manifests, data quality report, retention/legal flags |
| Fleet operations | Safety board | Shadow-mode exposure, interventions, operator reports, route restrictions |
| Safety board | Release management | Signed release decision, ODD restrictions, rollback conditions |
| Release management | Fleet operations | Deployment plan, watch metrics, rollback package, communication notes |

Minimum Evidence by Release Type

| Release type | Required gates | Notes |
| --- | --- | --- |
| Patch with no perception/SLAM/map behavior change | G0, regression subset, G6, G7 | Requires static proof that behavior is unaffected |
| Map tile update inside existing airport | G0, G1 tile benchmark, G3 sample check, G5 limited shadow, G6, G7 | Use stricter rules near aircraft stands and taxiway crossings |
| New airport or materially new ODD | G0 through G7 | Treat as a new validation campaign |
| Sensor calibration update | G0, G1, G2 extrinsic/time-sync faults, G4, G6, G7 | Include cross-sensor alignment and calibration drift evidence |
| Model/perception update affecting uncertainty | G0 through G7 | Calibration and conformal evidence cannot be waived |
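
The release-type table above can be expressed as a lookup that reports which required gates a candidate has not yet passed. The sketch below is one possible encoding. The dictionary keys, the `regression_subset` pseudo-gate ID, and the choice to reject unknown release types are all assumptions for illustration, and the scoped variants (tile benchmark, sample check, limited shadow) are collapsed to their parent gate IDs.

```python
from typing import Dict, Iterable, List, Tuple

ALL_GATES: Tuple[str, ...] = tuple(f"G{i}" for i in range(8))  # G0 .. G7

# Hypothetical keys mirroring the rows of the release-type table.
REQUIRED_GATES: Dict[str, Tuple[str, ...]] = {
    "patch_no_behavior_change": ("G0", "regression_subset", "G6", "G7"),
    "map_tile_update": ("G0", "G1", "G3", "G5", "G6", "G7"),
    "new_airport_or_odd": ALL_GATES,
    "sensor_calibration_update": ("G0", "G1", "G2", "G4", "G6", "G7"),
    "model_update_affecting_uncertainty": ALL_GATES,
}


def missing_gates(release_type: str, passed: Iterable[str]) -> List[str]:
    """List required gates not yet passed, in the order the table gives them."""
    if release_type not in REQUIRED_GATES:
        # Assumed policy: an unclassified release is an error, not a free pass.
        raise ValueError(f"unknown release type: {release_type!r}")
    passed_set = set(passed)
    return [g for g in REQUIRED_GATES[release_type] if g not in passed_set]


# Hypothetical map tile update that has cleared G0, G1, and G3 so far.
outstanding = missing_gates("map_tile_update", ["G0", "G1", "G3"])
```

For this example the outstanding list is G5, G6, and G7, so the tile update still needs limited shadow mode, safety board sign-off, and the post-release watch.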
