# Model Governance and Release Evidence
Last updated: 2026-05-09
## Why It Matters
An autonomy model release is not just a better checkpoint. It is a controlled change to vehicle behavior, data assumptions, safety evidence, runtime compatibility, and rollback posture. The release system must prove which model version is approved, what data and tests support it, where it is allowed to run, and how the fleet can return to the previous safe version.
This page covers model release evidence only. It does not replace OTA controls, software supply-chain evidence, or the safety case; it defines the MLOps evidence packet that those systems consume.
## Operating Model
- Register every deployable model in a model registry before release review. Use immutable model versions and mutable aliases such as `candidate`, `shadow`, `champion`, and `rollback` for deployment routing.
- Attach release metadata to the model version: training run ID, code commit, dataset snapshots, label schema, feature schema, calibration package, runtime container, hardware target, and ODD scope.
- Treat release approval as a claims-and-evidence review. The release claim states what improved, what did not regress, which ODD is covered, and which operational risk is being reduced.
- Require named approval from the model owner, data owner, runtime owner, safety owner, and fleet operations owner before moving the `champion` alias.
- Move through gates: offline metrics, scenario replay, shadow execution, limited canary, fleet expansion. Each gate either promotes, holds, or rejects the exact model version.
- Keep rollback executable. The rollback model must be compatible with the active runtime, map schema, calibration schema, and config bundle.
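The immutable-version-plus-mutable-alias scheme above can be sketched as a minimal in-memory registry. This is an illustrative stand-in for a real registry such as MLflow, not its API; all names, fields, and values are invented.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: a registered version record never changes
class ModelVersion:
    name: str
    version: int
    run_id: str            # training provenance
    dataset_snapshot: str  # e.g. an Iceberg/DVC snapshot ID
    odd_scope: str

@dataclass
class Registry:
    versions: dict = field(default_factory=dict)  # (name, version) -> ModelVersion
    aliases: dict = field(default_factory=dict)   # (name, alias)   -> version number

    def register(self, mv: ModelVersion) -> None:
        key = (mv.name, mv.version)
        if key in self.versions:
            raise ValueError("immutable versions may not be overwritten")
        self.versions[key] = mv

    def set_alias(self, name: str, alias: str, version: int) -> None:
        # Aliases are mutable routing labels; they may only point at
        # versions that already exist in the registry.
        if (name, version) not in self.versions:
            raise KeyError("alias must point at a registered immutable version")
        self.aliases[(name, alias)] = version

    def resolve(self, name: str, alias: str) -> ModelVersion:
        return self.versions[(name, self.aliases[(name, alias)])]

reg = Registry()
reg.register(ModelVersion("detector", 7, "run-abc", "iceberg:snap-991", "apron/day/dry"))
reg.set_alias("detector", "candidate", 7)
# After all gates pass, promotion is a single alias move:
reg.set_alias("detector", "champion", 7)
```

Deployment code resolves `champion` at load time, so promotion and rollback are both alias moves with no artifact mutation.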
## Evidence Artifacts
| Artifact | Minimum contents | Owner |
|---|---|---|
| Model registry record | Registered model, immutable version, aliases, tags, release notes | MLOps |
| Training provenance | Run ID, code commit, dependency lock, training config, random seeds, hardware | Model owner |
| Dataset manifest | Iceberg/DVC snapshot IDs, label schema, excluded data, leakage checks | Data owner |
| Evaluation report | Primary metrics, calibration, uncertainty, class slices, airport and weather slices | Model owner |
| Scenario replay report | Required scenario suite, new mined scenarios, failures, waivers | Safety validation |
| Shadow-mode report | Disagreement with champion, intervention correlation, latency and resource use | Fleet operations |
| Safety case link | Claim IDs supported by this release and evidence IDs attached to each claim | Safety owner |
| Release decision record | Approvers, residual risks, rollout plan, rollback trigger, expiry date | Release manager |
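A release decision record from the table above might be represented as a simple structured record with a completeness check before review. The field names and values here are illustrative assumptions, not a mandated schema.

```python
from datetime import date

# Hypothetical release decision record; all values are invented examples.
decision = {
    "model": "detector",
    "version": 7,
    "approvers": ["model-owner", "data-owner", "runtime-owner",
                  "safety-owner", "fleet-ops-owner"],
    "residual_risks": ["low-light recall regression on cargo apron, mitigated by speed cap"],
    "rollout_plan": "shadow -> canary (5 vehicles) -> fleet expansion",
    "rollback_trigger": "intervention rate above 2x baseline over 24h",
    "expiry": date(2026, 11, 9),  # evidence must be revalidated after this date
}

REQUIRED = {"model", "version", "approvers", "residual_risks",
            "rollout_plan", "rollback_trigger", "expiry"}

def is_complete(record: dict) -> bool:
    # A packet is reviewable only if every required field is present and non-empty.
    return REQUIRED <= record.keys() and all(record[k] for k in REQUIRED)
```

Storing the record alongside the model version (for example as registry tags) keeps the decision auditable next to the artifact it approves.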
## Acceptance Checks
- The model can be loaded by registry alias and by immutable version.
- The model version has dataset, code, config, and runtime provenance sufficient to rebuild or explain the release.
- The evaluation report includes both aggregate metrics and operational slices for airport zone, lighting, weather, vehicle platform, and object class.
- No critical scenario replay regression is open without an approved safety waiver and an explicit operational mitigation.
- Shadow-mode evidence covers the same ODD requested for release.
- The release packet states which previous model version is the rollback target and verifies runtime compatibility.
- The deployment decision references the relevant safety case claims and technical documentation record.
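Several of these checks can run automatically before human review. The sketch below evaluates a hypothetical release packet against a subset of them; every field name is an assumption for illustration.

```python
def acceptance_failures(packet: dict) -> list:
    """Return a list of human-readable failures; empty means the checks pass."""
    failures = []
    # Alias resolution must land on the exact reviewed immutable version.
    if packet.get("alias_version") != packet.get("immutable_version"):
        failures.append("alias does not resolve to the reviewed immutable version")
    # Provenance must be sufficient to rebuild or explain the release.
    for key in ("dataset_ids", "code_commit", "runtime_bundle"):
        if not packet.get(key):
            failures.append("missing provenance: " + key)
    # Shadow-mode evidence must cover the same ODD requested for release.
    if packet.get("shadow_odd") != packet.get("release_odd"):
        failures.append("shadow evidence does not cover the requested ODD")
    # A rollback target must be declared.
    if not packet.get("rollback_version"):
        failures.append("no executable rollback target declared")
    return failures

# Illustrative packet that passes the automated subset of checks.
packet = {
    "alias_version": 7, "immutable_version": 7,
    "dataset_ids": ["iceberg:snap-991"], "code_commit": "d4e5f6",
    "runtime_bundle": "trt-runtime-10.1",
    "shadow_odd": "apron/day/dry", "release_odd": "apron/day/dry",
    "rollback_version": 6,
}
```

Checks that need judgment, such as waiver quality or slice coverage adequacy, stay with the human gate; the automated subset only blocks obviously incomplete packets.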
## Failure Modes
| Failure mode | Consequence | Control |
|---|---|---|
| Alias moved without evidence | Fleet runs a model that was not reviewed | Require signed release decision before alias mutation |
| Metric-only approval | Model improves averages while regressing rare safety cases | Gate on scenario replay and ODD slices |
| Dataset snapshot missing | Release cannot be reproduced or audited | Block release unless dataset IDs are immutable |
| Shadow evidence from a different ODD | Approval does not support target deployment | Tie evidence to airport, route, weather, and vehicle class |
| Runtime incompatibility | Model passes offline tests but fails on vehicle | Validate TensorRT/ONNX/runtime bundle before canary |
| Rollback model not executable | Recovery depends on a manual hotfix | Keep rollback alias and compatible artifact bundle current |
| Approval expires silently | Old evidence is reused after data or ODD drift | Require evidence expiry and periodic revalidation |
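The first and last controls in the table, a signed release decision before alias mutation and evidence expiry, can be enforced in one guard. This is a sketch with stubbed signature handling; the decision fields are assumptions, and real signature verification would use an actual signing scheme.

```python
from datetime import date

def move_champion(aliases: dict, version: int, decision: dict) -> None:
    """Mutate the champion alias only under a current, signed release decision."""
    if not decision:
        raise PermissionError("no release decision record for this version")
    if decision.get("version") != version:
        raise PermissionError("decision record covers a different model version")
    if not decision.get("signatures"):  # stub: presence check, not cryptographic
        raise PermissionError("release decision is unsigned")
    if decision.get("expiry", date.min) < date.today():
        raise PermissionError("release decision has expired; revalidate evidence")
    aliases["champion"] = version

# Illustrative use: promote version 7 over the current champion.
aliases = {"champion": 6, "rollback": 5}
signed = {"version": 7,
          "signatures": ["safety-owner:sig-a1", "fleet-ops-owner:sig-b2"],
          "expiry": date(2099, 1, 1)}
move_champion(aliases, 7, signed)
```

Putting this guard in the only code path allowed to mutate aliases makes "alias moved without evidence" a mechanical impossibility rather than a process reminder.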
## Related Repository Docs
- 40-runtime-systems/ml-deployment/production-ml-deployment.md
- 40-runtime-systems/ml-deployment/av-cicd-devops-pipeline.md
- 50-cloud-fleet/mlops/data-flywheel-airside.md
- 50-cloud-fleet/ota/software-update-management-system-ops.md
- 60-safety-validation/safety-case/safety-case-evidence-traceability.md
- 60-safety-validation/verification-validation/testing-validation-methodology.md
- 60-safety-validation/standards-certification/eu-ai-act-machinery-compliance-dossier.md
## Sources
- MLflow, "Model Registry Workflows." https://www.mlflow.org/docs/latest/ml/model-registry/workflow/
- Waymo, "Safe to Deploy: How We Know The Waymo Driver Is Ready For The Road," 2025-06. https://waymo.com/blog/2025/06/safe-to-deploy/
- Waymo, "Building a credible case for safety: Waymo's approach for the determination of absence of unreasonable risk." https://waymo.com/research/building-a-credible-case-for-safety-waymos-appro/
- Regulation (EU) 2024/1689, Artificial Intelligence Act, Articles 10-12 and Annex IV. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689
- ISO/IEC 5259-5:2025, "Artificial intelligence - Data quality for analytics and machine learning (ML) - Part 5: Data quality governance framework." https://www.iso.org/standard/84150.html