
Replay and Scenario Mining Operations

Last updated: 2026-05-09

Why It Matters

Autonomous fleet logs contain many routine miles and a small number of high-value moments. Scenario mining turns uncurated logs into replayable evidence: near conflicts, strange object interactions, failed localization, blocked routes, rare weather, confusing ground markings, and other long-tail cases that should become regression tests.

This page covers the operational loop from mined fleet event to replayable scenario asset. It does not define simulator physics or the full safety validation strategy.

Operating Model

  1. Ingest candidate events from triggers, operator notes, incident reports, model disagreement, anomaly detectors, and natural-language scenario search.
  2. Index clips with map context, ego trajectory, actor tracks, weather, lighting, airport zone, model versions, and intervention metadata.
  3. Mine scenarios using both rule queries and embedding or language search. Argoverse's scenario-mining task frames the problem as retrieving specific safety-relevant scenarios from large multi-modal logs localized to HD maps.
  4. Normalize each accepted scenario into a scenario record: intent, actors, dynamic sequence, trigger conditions, ODD tags, source clip, and expected system response.
  5. Represent dynamic replay intent using ASAM OpenSCENARIO concepts where practical: entities, storyboard, maneuvers, events, actions, triggers, conditions, and external road-network references.
  6. Represent object and scene annotations using ASAM OpenLABEL-compatible fields where practical: object identity, class, 2D/3D geometry, segmentation, relations, actions, intentions, and taxonomy references.
  7. Promote scenarios by state: candidate, triaged, replay_ready, regression_required, retired.
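Steps 4 and 7 can be sketched in code as a record type plus an explicit promotion state machine. This is a minimal illustration, assuming a Python pipeline; the field names, class names, and the transition map are assumptions, not the project's actual schema.

```python
# Hypothetical sketch of a scenario record (step 4) and its promotion
# states (step 7). All names and the transition map are illustrative.
from dataclasses import dataclass
from enum import Enum


class ScenarioState(str, Enum):
    CANDIDATE = "candidate"
    TRIAGED = "triaged"
    REPLAY_READY = "replay_ready"
    REGRESSION_REQUIRED = "regression_required"
    RETIRED = "retired"


# Allowed forward transitions; retirement is reachable from every live state.
ALLOWED_TRANSITIONS = {
    ScenarioState.CANDIDATE: {ScenarioState.TRIAGED, ScenarioState.RETIRED},
    ScenarioState.TRIAGED: {ScenarioState.REPLAY_READY, ScenarioState.RETIRED},
    ScenarioState.REPLAY_READY: {ScenarioState.REGRESSION_REQUIRED, ScenarioState.RETIRED},
    ScenarioState.REGRESSION_REQUIRED: {ScenarioState.RETIRED},
    ScenarioState.RETIRED: set(),
}


@dataclass
class ScenarioRecord:
    scenario_id: str
    intent: str                 # e.g. "yield to crossing tug at taxiway junction"
    actors: list[str]
    odd_tags: list[str]         # e.g. ["night", "rain", "apron"]
    source_clip: str            # immutable reference back to the raw log
    expected_response: str      # measurable acceptance criterion
    state: ScenarioState = ScenarioState.CANDIDATE

    def promote(self, new_state: ScenarioState) -> None:
        """Move to a new state, rejecting transitions not in the map."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

Encoding the transitions explicitly makes illegal promotions (for example, skipping triage) fail loudly instead of silently corrupting suite state.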

Evidence Artifacts

| Artifact | Minimum contents | Owner |
|---|---|---|
| Scenario mining query | Query text or rule, search index version, time window, filters, requester | Scenario curator |
| Candidate clip manifest | Source log IDs, timestamps, map version, sensor availability, model versions | Data platform |
| Triage record | Why the clip matters, duplicate check, severity, regression priority | Safety validation |
| Scenario metadata | Actors, maneuvers, triggers, ODD tags, expected behavior, acceptance metric | Scenario curator |
| Annotation package | OpenLABEL-style labels, taxonomy version, QA result, reviewer | Label operations |
| Replay package | Simulator version, maps, seed, initial state, scenario file, runtime config | Simulation owner |
| Regression result | Pass/fail, metric deltas, videos, logs, model version, waiver if any | Safety validation |
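The replay-package row above implies a completeness check before a scenario is promoted. A minimal sketch, assuming a dict-shaped manifest; the field names are illustrative, not a defined schema:

```python
# Hypothetical completeness check for a replay-package manifest, based on
# the "minimum contents" column above. Field names are assumptions.
REQUIRED_REPLAY_FIELDS = {
    "simulator_version", "maps", "seed",
    "initial_state", "scenario_file", "runtime_config",
}


def missing_replay_fields(manifest: dict) -> set[str]:
    """Return the required fields that are absent (or None) in the manifest."""
    return {k for k in REQUIRED_REPLAY_FIELDS if manifest.get(k) is None}
```

A promotion gate could then refuse any scenario whose manifest returns a non-empty set, which is one way to enforce the "require replay package before promotion" control.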

Acceptance Checks

  • Every replay scenario links back to immutable raw log, map, label, and processing snapshots.
  • Scenario metadata has enough structure for search, replay selection, and coverage accounting.
  • Scenario labels use a controlled taxonomy and record the schema version.
  • The replay package can be executed by a clean worker with no manually staged local files.
  • The expected behavior is measurable: clearance, stop distance, yield behavior, route recovery, localization bound, or intervention avoidance.
  • Regression-required scenarios are included in release gates before a model can be promoted.
  • Retired scenarios keep a reason, replacement scenario if any, and last passing release.
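One way to make the "expected behavior is measurable" check concrete is to encode each criterion as a (metric, comparator, threshold) triple and evaluate it against the replay run's metrics. The metric names and thresholds below are illustrative assumptions, not the project's actual criteria:

```python
# Hypothetical encoding of quantitative pass criteria so review is
# objective rather than subjective. Metrics and thresholds are examples.
import operator

COMPARATORS = {">=": operator.ge, "<=": operator.le}


def evaluate(criteria: list[tuple[str, str, float]],
             metrics: dict[str, float]) -> dict[str, bool]:
    """Return per-criterion pass/fail for one replay run."""
    return {
        metric: COMPARATORS[cmp](metrics[metric], threshold)
        for metric, cmp, threshold in criteria
    }


# Example: clearance must stay above 3 m, stop distance under 15 m.
criteria = [("min_clearance_m", ">=", 3.0), ("stop_distance_m", "<=", 15.0)]
```

Storing criteria in this shape alongside the scenario metadata also lets the regression result record metric deltas per criterion rather than a single opaque pass/fail.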

Failure Modes

| Failure mode | Consequence | Control |
|---|---|---|
| Scenario remains a video bookmark | Cannot run regression or measure improvement | Require replay package before promotion |
| Query results are not versioned | Mining cannot be repeated after index changes | Store query and index version |
| Duplicate scenarios flood the suite | Release gates become slow without added coverage | Cluster and deduplicate before promotion |
| Labels drift across teams | Scenario semantics change over time | Version taxonomy and run label QA |
| Replay omits map or weather context | Test no longer represents the field event | Store map, zone, weather, lighting, and initial state |
| Expected behavior is vague | Review becomes subjective | Define quantitative pass criteria |
| Scenario suite only includes failures | Overfits to known bad cases and misses normal behavior | Maintain balanced coverage by ODD and maneuver |
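The "cluster and deduplicate before promotion" control can be sketched as a greedy similarity filter over scenario embeddings. The embedding source and the 0.95 threshold are assumptions; a production pipeline would tune both:

```python
# Hypothetical greedy deduplication over scenario embeddings: keep a
# scenario only if it is not near-identical to one already kept.
# The similarity threshold (0.95) is an illustrative assumption.
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def deduplicate(embeddings: dict[str, list[float]],
                threshold: float = 0.95) -> list[str]:
    """Return scenario IDs that survive greedy near-duplicate filtering."""
    kept: list[str] = []
    for scenario_id, emb in embeddings.items():
        if all(cosine(emb, embeddings[k]) < threshold for k in kept):
            kept.append(scenario_id)
    return kept
```

Greedy filtering is order-dependent and quadratic in suite size; for large suites an approximate nearest-neighbor index would be a more realistic choice, but the control is the same: duplicates must be collapsed before they reach release gates.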

Related Pages

  • 50-cloud-fleet/mlops/data-flywheel-airside.md
  • 50-cloud-fleet/data-platform/fleet-data-pipeline.md
  • 30-autonomy-stack/simulation/simulators-for-airside.md
  • 30-autonomy-stack/end-to-end-driving/airside-autonomy-benchmark-spec.md
  • 60-safety-validation/verification-validation/airside-scenario-taxonomy.md
  • 60-safety-validation/verification-validation/shadow-mode.md
  • 60-safety-validation/verification-validation/testing-validation-methodology.md

Sources

Compiled from research notes gathered from publicly available sources.