Edge Runtime Supervision and Configuration Management

Last updated: 2026-05-09

Why It Matters

Vehicle edge software fails in operationally boring ways: a stale config, a missing container, a bad route between modules, a local override left after field support, or an offline device that never receives the intended deployment. Runtime supervision and configuration management must make the desired state explicit, compare it with reported state, and keep the vehicle safe when cloud connectivity is unavailable.

This page covers edge desired-state operations for applications, runtime modules, config bundles, and local supervision. It does not replace secure update signing or safety controller design.

Operating Model

  1. Define one signed desired-state manifest per vehicle class and operational design domain (ODD). The manifest includes module images, versions, config bundle hashes, routes, environment, resources, restart policy, and rollback target.
  2. Deploy through an edge orchestrator. For example, AWS IoT Greengrass deploys components and their configurations to things or thing groups. Azure IoT Edge uses deployment manifests that set desired properties on the module twins of $edgeAgent, $edgeHub, and custom modules. Eclipse Kanto uses desired-state specifications and domain update agents.
  3. Run a local supervisor that compares desired state, reported state, process health, resource health, and dependency health.
  4. Keep offline behavior explicit. The vehicle should continue with the last approved manifest, preserve local reported state, and sync state when connectivity returns.
  5. Separate safety-critical runtime state from convenience configuration. Any config that can alter autonomy behavior, speed limits, ODD limits, sensor use, or planner policy requires release approval.
  6. Roll out by rings: lab, single vehicle, airport canary, limited fleet, full fleet. Each ring has a hold period and health threshold.
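
The comparison in step 3 can be sketched as a diff between the desired manifest and the supervisor's reported state. This is a minimal illustration under stated assumptions, not any orchestrator's API: `ModuleState`, `diff_state`, the module names, and the digest values are all hypothetical, and a real supervisor would also compare routes, resources, and health.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleState:
    """Hypothetical per-module record shared by desired and reported state."""
    image_digest: str
    config_hash: str


def diff_state(desired: dict, reported: dict) -> dict:
    """Return module names that are missing, unexpected, or drifted."""
    desired_names = set(desired)
    reported_names = set(reported)
    return {
        # In the manifest but not running on the vehicle.
        "missing": sorted(desired_names - reported_names),
        # Running on the vehicle but absent from the manifest.
        "unexpected": sorted(reported_names - desired_names),
        # Present in both, but image digest or config hash differs.
        "drifted": sorted(
            name
            for name in desired_names & reported_names
            if desired[name] != reported[name]
        ),
    }


# Illustrative values only.
desired = {
    "perception": ModuleState("sha256:aaa", "cfg-1"),
    "planner": ModuleState("sha256:bbb", "cfg-2"),
}
reported = {
    "perception": ModuleState("sha256:aaa", "cfg-1"),
    "planner": ModuleState("sha256:bbb", "cfg-OLD"),   # stale config
    "debug-shell": ModuleState("sha256:ccc", "none"),  # left after field support
}
report = diff_state(desired, reported)
```

Here `report` flags `planner` as drifted and `debug-shell` as unexpected. Keeping the three categories separate lets a drift report distinguish a stale config from a module that should not be running at all.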

Evidence Artifacts

| Artifact | Minimum contents | Owner |
| --- | --- | --- |
| Desired-state manifest | Modules, versions, routes, config hashes, resources, restart policy | Runtime owner |
| Config bundle | Typed settings, schema version, default values, ODD scope, signer | Config owner |
| Deployment record | Target vehicles/groups, rollout ring, start/end time, result, failures | Fleet operations |
| Reported-state snapshot | Running modules, image digests, config hashes, health, uptime | Edge supervisor |
| Drift report | Desired vs reported diff, local overrides, stale devices, pending restarts | Runtime SRE |
| Offline-state log | Last approved manifest, offline duration, queued state changes, sync result | Fleet operations |
| Rollback record | Trigger, previous manifest, affected vehicles, recovery verification | Release manager |
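
Several artifacts above carry config hashes, which only compare reliably if the same bundle always produces the same hash. One way to sketch that is to hash a canonical JSON encoding. The canonicalization scheme, the `config_hash` helper, and the bundle fields below are assumptions for illustration. The sketch does not cover signing, which this page leaves to the secure update process.

```python
import hashlib
import json


def config_hash(bundle: dict) -> str:
    """Hash a config bundle over a canonical JSON encoding (assumed scheme)."""
    # Sorted keys and fixed separators make the encoding independent of
    # dictionary insertion order and whitespace.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical bundles: same content in a different key order, then a changed value.
bundle_a = {"schema_version": 3, "max_speed_mps": 4.0, "odd_scope": "apron"}
bundle_b = {"odd_scope": "apron", "max_speed_mps": 4.0, "schema_version": 3}
bundle_c = {**bundle_a, "max_speed_mps": 6.0}

same = config_hash(bundle_a) == config_hash(bundle_b)
changed = config_hash(bundle_a) != config_hash(bundle_c)
```

Both `same` and `changed` evaluate to `True`: key order does not affect the hash, while any changed value does. The desired-state manifest and the reported-state snapshot can then be compared by hash alone.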

Acceptance Checks

  • Every running module and config hash appears in the active desired-state manifest.
  • The supervisor reports module health, restart count, resource saturation, route status, and config drift.
  • Vehicles with stale or unknown reported state are excluded from rollout expansion.
  • When an offline vehicle reconnects, it deterministically either accepts or rejects the latest approved manifest.
  • Config schema validation runs before deployment and on the vehicle before activation.
  • A local override cannot persist without a ticket, expiry, and visible drift status.
  • Rollback activation is tested for the same vehicle class and runtime version before fleet rollout.
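
The check that stale or unknown vehicles are excluded from rollout expansion could look like the sketch below. The `VehicleReport` type, the `split_for_expansion` helper, the 15-minute freshness window, and the vehicle IDs are hypothetical. A real gate would also apply the ring's health threshold and hold period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class VehicleReport:
    vehicle_id: str
    last_report: Optional[datetime]  # None: the vehicle has never reported


def split_for_expansion(reports, now, max_age=timedelta(minutes=15)):
    """Split vehicles into rollout candidates and excluded vehicles with a reason."""
    eligible, excluded = [], {}
    for r in reports:
        if r.last_report is None:
            excluded[r.vehicle_id] = "unknown"
        elif now - r.last_report > max_age:
            excluded[r.vehicle_id] = "stale"
        else:
            eligible.append(r.vehicle_id)
    return eligible, excluded


# Illustrative fleet snapshot with a fixed "now" so the result is repeatable.
now = datetime(2026, 5, 9, 12, 0, tzinfo=timezone.utc)
reports = [
    VehicleReport("tug-01", now - timedelta(minutes=2)),
    VehicleReport("tug-02", now - timedelta(hours=3)),
    VehicleReport("tug-03", None),
]
eligible, excluded = split_for_expansion(reports, now)
```

Only `tug-01` remains eligible. Returning the exclusion reason alongside the ID keeps the decision visible in the deployment record rather than silently shrinking the target group.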

Failure Modes

| Failure mode | Consequence | Control |
| --- | --- | --- |
| Desired state stored only in cloud UI | Cannot reconstruct what was intended | Store signed manifests in version control and release records |
| Module is healthy but route is broken | Sensor or telemetry path silently fails | Supervise routes and dependencies, not only processes |
| Offline device misses config update | Fleet runs mixed behavior without visibility | Report last manifest and offline duration on reconnect |
| Local support override remains active | Vehicle behavior diverges from evidence | Drift detection with expiry and release-manager review |
| Rollout target group is too broad | Bad config reaches too many vehicles | Ringed rollout with health gates |
| Config schema is weak | Runtime accepts invalid units or missing fields | Typed schema validation before activation |
| Restart loop hides root cause | Service appears managed but unavailable | Alert on restart rate and preserve failure logs |
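
The restart-loop control above depends on alerting on restart rate, not only on whether the process is currently up. A minimal sliding-window sketch follows. The `RestartRateMonitor` class and its default thresholds are assumptions, not values taken from any particular supervisor.

```python
from collections import deque


class RestartRateMonitor:
    """Flag a module whose restarts exceed a limit within a time window."""

    def __init__(self, max_restarts: int = 3, window_s: float = 300.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._restarts = deque()  # restart timestamps in seconds, oldest first

    def record_restart(self, timestamp_s: float) -> bool:
        """Record a restart; return True if the alert threshold is exceeded."""
        self._restarts.append(timestamp_s)
        # Drop restarts that have fallen out of the window.
        while self._restarts and timestamp_s - self._restarts[0] > self.window_s:
            self._restarts.popleft()
        return len(self._restarts) > self.max_restarts


monitor = RestartRateMonitor(max_restarts=3, window_s=300.0)
# Four restarts within three minutes, then one isolated restart much later.
alerts = [monitor.record_restart(t) for t in (0.0, 60.0, 120.0, 180.0, 1000.0)]
```

With these inputs the fourth restart raises the alert and the isolated later restart does not. A restart policy alone would report the module as managed in both cases, which is the failure mode the table describes.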

Related Documents

  • 40-runtime-systems/software-operations/on-vehicle-supply-chain-runtime-security.md
  • 50-cloud-fleet/ota/software-update-management-system-ops.md
  • 50-cloud-fleet/ota/ota-fleet-management.md
  • 40-runtime-systems/ml-deployment/production-ml-deployment.md
  • 40-runtime-systems/monitoring-observability/teleoperation-systems.md
  • 50-cloud-fleet/operations/fleet-sre-incident-response.md
  • 60-safety-validation/runtime-assurance/fail-operational-architecture.md

Sources

Public research notes collected from public sources.