# Safety Case Evidence Traceability

Last updated: 2026-05-09

A safety case is only useful if its claims, assumptions, hazards, requirements, tests, field evidence, and change decisions can be traced. Autonomous systems change continuously through software, model, map, configuration, calibration, ODD, and operating-procedure updates. The safety case must therefore be a living evidence graph, not a static PDF assembled for a single audit.

## Practical Evidence and Artifact Model

Use stable evidence IDs across the repository, issue tracker, CI, data lake, model registry, map registry, incident system, and release system.

| Artifact | Minimum fields | Links to |
|---|---|---|
| Claim record | Claim ID, statement, scope, owner, confidence, status, review date | Parent/child claims, evidence, assumptions |
| Hazard record | Hazard ID, operational scenario, severity, exposure, controllability or risk class, mitigations | Requirements, tests, incidents |
| Requirement record | Requirement ID, source, safety/security/ML/system type, acceptance criteria | Hazards, design, tests, releases |
| Assumption record | Assumption ID, rationale, validity condition, monitor, expiry/review trigger | Claims and operating evidence |
| Evidence record | Evidence ID, type, source, date, version, quality rating, result, limitations | Claims, requirements, release |
| Scenario record | Scenario ID, ODD slice, parameters, coverage rationale, pass/fail criteria | Hazards, tests, datasets |
| Dataset record | Dataset ID, source clips, labeling policy, redaction state, split, coverage, exclusions | ML claims and model releases |
| Model/map/config record | Artifact ID, manifest, validation bundle, deployment cohorts, rollback | Requirements and release |
| Incident record | Incident ID, timeline, active artifact versions, root cause, CAPA | Hazards, assumptions, tests |
| Change impact record | Change ID, affected claims, required evidence refresh, approval | SUMS and safety board |

Evidence should be rated for quality. A track test with calibrated measurement and pass/fail criteria is stronger than an informal demo video. A simulation campaign is only useful if scenario coverage, simulator validity, and pass/fail criteria are explicit.
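
As one concrete shape for this, the evidence record can be carried as a small typed schema with an explicit quality rating. This is a minimal sketch; the field names and the three-tier rating scale are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field
from enum import Enum

class QualityRating(Enum):
    # Illustrative tiers; calibrate them to your own review process.
    STRONG = "strong"      # calibrated measurement, explicit pass/fail criteria
    MODERATE = "moderate"  # systematic, but with stated coverage gaps
    WEAK = "weak"          # informal demo, anecdote, or missing acceptance criteria

@dataclass
class EvidenceRecord:
    evidence_id: str       # stable ID shared across CI, data lake, release system
    evidence_type: str     # e.g. "simulation", "track-test", "field-data"
    source: str            # producing system or team
    date: str              # ISO 8601
    version: str
    quality: QualityRating
    result: str            # pass/fail/partial, with a pointer to details
    limitations: list[str] = field(default_factory=list)
    claim_ids: list[str] = field(default_factory=list)        # claims this supports
    requirement_ids: list[str] = field(default_factory=list)  # requirements it verifies
```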

## Traceability Model

The minimum trace graph:

```text
ODD -> Hazard -> Safety goal -> Requirement -> Design control
    -> Verification/validation evidence -> Release decision
    -> Field monitoring evidence -> Safety-case update
```

For ML components:

```text
System safety requirement -> ML safety requirement -> Data requirement
    -> Dataset version -> Training run -> Model artifact -> Verification set
    -> Integration test -> Shadow/canary evidence -> Field monitor
```

For maps and configuration:

```text
Site ODD assumption -> Map/config requirement -> Map diff/config change
    -> Replay/site validation -> Release approval -> Vehicle active manifest
    -> Field disagreement/incident monitor
```
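
These chains are easiest to enforce when the trace graph is actual data rather than a diagram. A minimal sketch, assuming string IDs and typed edges (all names hypothetical):

```python
from collections import defaultdict

# Directed edges of the trace graph: (from_id, relation, to_id).
# IDs and relation names are illustrative, not a prescribed vocabulary.
edges = [
    ("ODD-URBAN-1", "identifies", "HAZ-014"),
    ("HAZ-014", "drives", "SG-3"),
    ("SG-3", "refines-to", "REQ-112"),
    ("REQ-112", "verified-by", "EV-2091"),
    ("EV-2091", "supports", "REL-2026.05"),
]

outgoing = defaultdict(list)
for src, rel, dst in edges:
    outgoing[src].append((rel, dst))

def downstream(node_id: str) -> set[str]:
    """All artifacts reachable from node_id by following trace links."""
    seen, stack = set(), [node_id]
    while stack:
        current = stack.pop()
        for _, nxt in outgoing.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# A hazard with no downstream evidence is an orphan in the safety case.
assert any(n.startswith("EV-") for n in downstream("HAZ-014"))
```

Reachability queries like this are what the automated checks later in this page run over.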

## Deployment Operations

### 1. Baseline the safety case at each release

Every production release should freeze:

- Top-level safety claims and claim status.
- Active hazard list and mitigations.
- Requirements included in the release.
- Evidence versions used for release approval.
- Known limitations and accepted residual risks.
- Active ODD and site assumptions.
- Vehicle/software/model/map/config/calibration manifests.

The baseline does not stop future work. It gives incident investigators and auditors a precise answer to "what was believed and approved at the time?"
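
A baseline can be as simple as a canonical snapshot of those items, content-hashed so it cannot drift after approval. A sketch under assumed field names and IDs:

```python
import hashlib
import json

# Hypothetical baseline content; the keys mirror the list above.
baseline = {
    "release": "2026.05",
    "claims": {"CLM-1": "approved", "CLM-7": "approved-with-limitations"},
    "hazards": ["HAZ-014", "HAZ-022"],
    "requirements": ["REQ-112", "REQ-140"],
    "evidence_versions": {"EV-2091": "v3"},
    "residual_risks": ["night-rain glare accepted per SB-2026-04"],
    "odd_assumptions": ["ASM-09"],
    "manifests": {"software": "sw-9.2.1", "model": "mdl-44", "map": "map-2026w18"},
}

# Freeze the baseline as a content-addressed snapshot so that "what was
# believed and approved at the time" has exactly one canonical answer.
canonical = json.dumps(baseline, sort_keys=True).encode()
baseline_id = "BASE-" + hashlib.sha256(canonical).hexdigest()[:12]
print(baseline_id)
```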

### 2. Require change impact analysis

Changes that can affect safety-case evidence include:

- Code, firmware, model, map, configuration, calibration, or parameter changes.
- ODD expansion, new site, new route, new vehicle variant, new sensor mount.
- New operator workflow, teleoperation mode, dispatch rule, or maintenance procedure.
- Field evidence showing an assumption is false or weaker than expected.
- Supplier component, vulnerability, or cybersecurity architecture change.

The change impact record should name affected claims and either attach refreshed evidence or explicitly justify why existing evidence remains valid.
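
That rule is mechanically checkable. A minimal sketch of the record and its completeness check (field names are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class ChangeImpactRecord:
    change_id: str
    affected_claim_ids: list[str]
    refreshed_evidence_ids: list[str] = field(default_factory=list)
    validity_justification: str = ""  # why existing evidence still holds
    approved_by: str = ""

def validate(record: ChangeImpactRecord) -> list[str]:
    """Return blocking findings; an empty list means the record is complete."""
    findings = []
    if not record.affected_claim_ids:
        findings.append("no affected claims named")
    if not record.refreshed_evidence_ids and not record.validity_justification:
        findings.append("neither refreshed evidence nor a validity justification attached")
    if not record.approved_by:
        findings.append("missing safety-board approval")
    return findings
```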

### 3. Automate trace checks

At minimum, CI or release tooling should fail or warn when any of the following holds (a minimal gate sketch follows the list):

- A safety requirement has no verification evidence.
- A high-severity hazard has no mitigated residual-risk decision.
- A model release lacks dataset lineage and verification results.
- A map release lacks a semantic diff and validation report.
- An incident CAPA references no hazard, requirement, or safety-case change.
- Evidence is older than its review interval.
- A claim depends on an expired assumption.
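
The gate can run against the same record store the rest of the tooling uses. A toy version with in-memory dicts standing in for the evidence graph; every ID and field here is illustrative:

```python
from datetime import date

# Toy record store; in practice these are queries against the evidence graph.
requirements = {"REQ-112": {"evidence": ["EV-2091"]}, "REQ-140": {"evidence": []}}
hazards = {"HAZ-014": {"severity": "high", "residual_risk_decision": None}}
evidence = {"EV-2091": {"review_due": date(2026, 1, 1)}}
claims = {"CLM-1": {"assumptions": ["ASM-09"]}}
assumptions = {"ASM-09": {"expired": True}}

def trace_gate() -> list[str]:
    """Return blocking findings; an empty list lets the release proceed."""
    findings = []
    for rid, req in requirements.items():
        if not req["evidence"]:
            findings.append(f"{rid}: safety requirement has no verification evidence")
    for hid, haz in hazards.items():
        if haz["severity"] == "high" and haz["residual_risk_decision"] is None:
            findings.append(f"{hid}: high-severity hazard lacks a residual-risk decision")
    for eid, ev in evidence.items():
        if ev["review_due"] < date.today():
            findings.append(f"{eid}: evidence is older than its review interval")
    for cid, claim in claims.items():
        if any(assumptions[a]["expired"] for a in claim["assumptions"]):
            findings.append(f"{cid}: claim depends on an expired assumption")
    return findings

if findings := trace_gate():
    raise SystemExit("\n".join(findings))
```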

### 4. Review with independent challenge

Use safety board reviews for major releases and targeted reviews for minor changes. The reviewer should challenge:

- Is the claim scoped tightly enough?
- Does the evidence actually support the claim?
- Are assumptions monitored in the field?
- Are negative results and limitations recorded?
- Did the release change invalidate previous evidence?
- Are cybersecurity and privacy constraints represented where they affect safety?

## Evidence Patterns

| Evidence type | Good evidence | Weak evidence |
|---|---|---|
| Simulation | Scenario catalog, parameter ranges, simulator validity, pass/fail criteria, results | "Ran many miles" without coverage or acceptance criteria |
| Track/site test | Instrumented test, calibrated targets, repeatability, environmental notes | One-off demo |
| Field data | Versioned fleet metrics, exposure denominator, incident/near-miss linkage | Anecdotes or aggregate miles only |
| ML verification | Frozen dataset, coverage analysis, slice metrics, robustness tests, lineage | Single headline accuracy metric |
| Runtime monitor | Requirement-to-monitor mapping, threshold rationale, false positive/negative analysis | Uncalibrated alert |
| Safety analysis | HARA/STPA/FMEA/SOTIF with trace to controls and tests | Spreadsheet detached from design and evidence |
| Cybersecurity | TARA, SBOM/CVE/VEX, penetration or red-team evidence, incident drills | Scanner output with no disposition |
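
One way to make the good/weak split mechanical is to require type-specific fields before evidence may support a claim. The field sets below paraphrase the "good evidence" column and are assumptions, not a standard:

```python
# Minimum fields per evidence type before it may support a claim;
# the field sets are illustrative paraphrases of the table above.
REQUIRED_FIELDS = {
    "simulation": {"scenario_catalog", "parameter_ranges", "simulator_validity", "pass_fail_criteria"},
    "track_test": {"instrumentation", "calibrated_targets", "repeatability", "environment_notes"},
    "field_data": {"fleet_metric_version", "exposure_denominator", "incident_linkage"},
    "ml_verification": {"frozen_dataset", "coverage_analysis", "slice_metrics", "lineage"},
}

def missing_fields(evidence_type: str, fields: set[str]) -> set[str]:
    """Fields that must be added before the evidence counts as 'good'."""
    return REQUIRED_FIELDS.get(evidence_type, set()) - fields

# A simulation campaign reported only as mileage is flagged as weak.
print(missing_fields("simulation", {"scenario_catalog"}))
```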

## Risks and Failure Modes

| Failure mode | Consequence | Control |
|---|---|---|
| Static PDF safety case | Release and field evidence drift from claims | Evidence graph with release baselines |
| Orphaned hazards | High-risk items lack requirements or tests | Trace coverage dashboard |
| Argument by volume | Large evidence set hides weak support | Evidence quality rating and claim-level review |
| ML data lineage missing | Model cannot be reproduced or audited | Dataset IDs, split manifests, model registry links |
| Assumptions unmonitored | ODD or site changes silently invalidate claims | Assumption owners, expiry, field monitors |
| Incident CAPA not linked | Lessons do not update assurance | Incident-to-hazard/claim change request |
| Tool links rot | Audit cannot retrieve evidence | Immutable evidence store and stable IDs |
| Security/safety split | Cyber compromise risks omitted from safety case | Shared hazards for security-controlled safety functions |
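
For the link-rot row, one common control is content-addressed IDs: derive the evidence ID from the artifact's bytes, so a stored link can never silently point at altered evidence. A minimal sketch:

```python
import hashlib

def evidence_id(payload: bytes, prefix: str = "EV") -> str:
    """Content-addressed evidence ID: the ID changes iff the artifact changes."""
    return f"{prefix}-{hashlib.sha256(payload).hexdigest()[:16]}"

report = b"track test 2026-04-30: braking distance results ..."
print(evidence_id(report))  # stable for this exact artifact
```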

## Related

- 60-safety-validation/standards-certification/certification-guide.md
- 60-safety-validation/safety-case/failure-modes-analysis.md
- 60-safety-validation/safety-case/incident-reporting-post-market-monitoring.md
- 60-safety-validation/verification-validation/testing-validation-methodology.md
- 60-safety-validation/runtime-assurance/runtime-verification-monitoring.md
- 50-cloud-fleet/ota/software-update-management-system-ops.md
- 50-cloud-fleet/map-operations/hd-map-lifecycle-operations.md
- 40-runtime-systems/ml-deployment/production-ml-deployment.md

## Sources

Research notes compiled from publicly available sources.