Vehicle Middleware: DDS, SOME/IP, and Zenoh

Last updated: 2026-05-09

Why It Matters

Vehicle middleware is the contract between autonomy software, production ECUs, diagnostics, logging, simulation, and fleet systems. The wrong boundary turns middleware into a safety hazard: discovery storms, stale samples, serialization mismatches, or unbounded queues can look like perception, planning, or actuator failures.

For an AV stack, DDS, SOME/IP, and Zenoh should not be treated as interchangeable message buses. DDS is a data-centric publish/subscribe middleware suited to ROS 2 graphs and QoS-controlled autonomy data. SOME/IP is the common AUTOSAR service interface for production ECU services. Zenoh suits routed pub/sub/query patterns across edge, depot, cloud, and constrained networks. The architecture should make these roles explicit and keep safety-critical actuation protected by independent watchdogs and end-to-end checks.

Architecture Decisions

Boundary: ROS 2 / DDS domain
  Use: High-rate autonomy topics, component composition, local service calls, rosbag/MCAP replay, QoS experiments.
  Avoid: Treating default DDS discovery and QoS as production-safe without profiling.

Boundary: AUTOSAR / SOME/IP domain
  Use: Versioned services to vehicle ECUs, diagnostics gateways, vehicle state services, command/status APIs.
  Avoid: Streaming raw sensor firehoses or unbounded perception debug blobs through ECU service interfaces.

Boundary: Zenoh domain
  Use: Fleet edge routing, store/query access, cross-subnet robotics data, depot tools, selective upload metadata.
  Avoid: Direct safety actuation or replacing local safety supervision.

Boundary: Safety channel
  Use: Heartbeat, E-stop, brake enable, speed limit, geofence status, safety controller command validation (a watchdog sketch follows this table).
  Avoid: Depending on any middleware alone as the safety mechanism.
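
The Safety channel row deserves a concrete shape. Below is a minimal heartbeat-watchdog sketch using only the C++ standard library; the class name, sequence scheme, and timeout are illustrative, not taken from any AUTOSAR or ROS API. The trip condition lives outside DDS, SOME/IP, and Zenoh, so any of them can fail and the vehicle still reaches a bounded safe response.

    #include <chrono>
    #include <cstdint>

    // Illustrative single-threaded watchdog: the safety controller trips
    // when the autonomy side stops feeding it, regardless of which
    // middleware carried (or dropped) the heartbeat.
    class HeartbeatMonitor {
    public:
        using Clock = std::chrono::steady_clock;

        explicit HeartbeatMonitor(std::chrono::milliseconds timeout)
            : timeout_(timeout), last_(Clock::now()) {}

        // Called by the receive path for each heartbeat that arrives.
        void feed(std::uint32_t sequence) {
            if (sequence <= last_sequence_) return;  // stale or replayed
            last_sequence_ = sequence;
            last_ = Clock::now();
        }

        // Polled from the safety controller's own cyclic task.
        bool expired() const { return Clock::now() - last_ > timeout_; }

    private:
        std::chrono::milliseconds timeout_;
        Clock::time_point last_;
        std::uint32_t last_sequence_ = 0;
    };

Because the monitor uses a monotonic clock and a per-source sequence, neither wall-clock jumps nor replayed heartbeats can keep a dead link looking alive.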

Recommended layout:

Autonomy graph
ROS 2 nodes <-> RMW DDS or RMW Zenoh
        |
        +-- middleware gateway with explicit schema and QoS mapping
        |
        +-- AUTOSAR Adaptive service boundary
        |       +-- SOME/IP services: vehicle state, diagnostics, actuator APIs
        |       +-- DDS service communication where AUTOSAR interoperability requires it
        |
        +-- Fleet edge boundary
                +-- Zenoh router/store/query: logs, metadata, depot tooling

The gateway is a controlled product, not glue code. It owns schema conversion, timestamp preservation, QoS downgrade rules, sequence numbers, health state, version negotiation, and replay behavior. It should reject messages it cannot map exactly.
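
A minimal sketch of the reject-on-inexact-mapping rule follows. The message types, field names, frame string, and schema version are hypothetical placeholders, not real interface definitions:

    #include <cstdint>
    #include <optional>
    #include <string>

    // Hypothetical wire types on each side of the gateway.
    struct RosOdometry {
        std::uint64_t stamp_ns;       // ROS time base, preserved downstream
        std::string frame_id;
        double speed_mps;             // metres per second
        std::uint32_t schema_version;
    };

    struct EcuVehicleSpeed {
        std::uint64_t stamp_ns;       // preserved, never re-stamped
        std::uint32_t sequence;
        std::uint16_t speed_cmps;     // centimetres per second
    };

    constexpr std::uint32_t kSupportedSchema = 3;

    // Exact mapping or nothing: any field the gateway cannot represent
    // faithfully rejects the whole message, never coerces it.
    std::optional<EcuVehicleSpeed> to_ecu(const RosOdometry& in,
                                          std::uint32_t seq) {
        if (in.schema_version != kSupportedSchema) return std::nullopt;  // version mismatch
        if (in.frame_id != "base_link") return std::nullopt;             // wrong coordinate frame
        if (in.stamp_ns == 0) return std::nullopt;                       // unset time base
        const double cmps = in.speed_mps * 100.0;
        if (cmps < 0.0 || cmps > 65535.0) return std::nullopt;           // would silently saturate
        return EcuVehicleSpeed{in.stamp_ns, seq,
                               static_cast<std::uint16_t>(cmps)};
    }

Each std::nullopt should increment a per-reason counter that feeds the gateway's typed degraded state rather than being logged and forgotten.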

Evidence Artifacts

  • Interface matrix covering ROS topic/service/action names, DDS IDL, AUTOSAR service interfaces, SOME/IP service IDs, event groups, methods, and fields.
  • QoS contract for each autonomy topic: reliability, durability, history depth, deadline, lifespan, liveliness, and maximum serialized size (a code sketch of one such contract follows this list).
  • ARXML, IDL, and generated code version records tied to a software release.
  • Gateway mapping tests showing exact field conversions, units, coordinate frames, time bases, covariance semantics, and error handling.
  • Discovery and startup captures for cold boot, node restart, service restart, and a depot laptop joining the network.
  • Latency and jitter histograms for sensor-to-fusion, planner-to-controller, controller-to-gateway, and gateway-to-ECU paths.
  • Rosbag or MCAP replay traces proving middleware configuration can reproduce critical scenarios without hidden live-network dependencies.
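
As an illustration of a QoS contract pinned in code rather than left to rmw defaults, here is a sketch using the rclcpp API; the topic name and every numeric value are placeholders to be replaced by profiled figures:

    #include <chrono>
    #include <cstdlib>
    #include <rclcpp/rclcpp.hpp>

    using namespace std::chrono_literals;

    // One topic's QoS contract, spelled out policy by policy.
    rclcpp::QoS planner_cmd_qos() {
        return rclcpp::QoS(rclcpp::KeepLast(1))                // history depth
            .reliable()                                        // reliability
            .durability_volatile()                             // no late-joiner replay
            .deadline(rclcpp::Duration(20ms))                  // publish period bound
            .lifespan(rclcpp::Duration(50ms))                  // stale samples expire
            .liveliness(RMW_QOS_POLICY_LIVELINESS_AUTOMATIC)
            .liveliness_lease_duration(rclcpp::Duration(100ms));
    }

    // Incompatible QoS between endpoints is surfaced as an rclcpp event;
    // failing fast makes a mismatch release-blocking, not a warning.
    rclcpp::SubscriptionOptions planner_cmd_options() {
        rclcpp::SubscriptionOptions opts;
        opts.event_callbacks.incompatible_qos_callback =
            [](rclcpp::QOSRequestedIncompatibleQoSInfo&) {
                std::abort();  // configuration error: refuse to run degraded
            };
        return opts;
    }

Maximum serialized size has no rclcpp knob here; it stays in the written contract and is enforced by gateway checks and profiling.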

Acceptance Checks

  • Adding a diagnostic tool, RViz instance, or depot client cannot starve the control graph or change safety-controller timing.
  • QoS mismatches are detected at startup and surfaced as release-blocking configuration errors.
  • Gateway loss produces a typed degraded state and bounded safe response, not silent stale commands.
  • Every command crossing into the production ECU domain carries sequence, timestamp, source identity, and application-level validity checks (see the envelope sketch after this list).
  • Large sensor messages have an explicit maximum size and a defined backpressure policy.
  • Service version changes are backward-compatible or blocked by manifest checks.
  • Replay can reconstruct middleware-visible state for a safety event with the same schemas and time base used on vehicle.
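
One way to make the command-envelope check concrete; everything here (type names, field widths, limits) is a hypothetical sketch, not a production interface:

    #include <cstdint>

    // Hypothetical envelope for commands crossing into the ECU domain.
    // The field set mirrors the checklist above.
    struct CommandEnvelope {
        std::uint32_t sequence;    // monotonically increasing per source
        std::uint64_t stamp_ns;    // sender time base (gPTP-synchronized)
        std::uint32_t source_id;   // authenticated source identity
        double value;              // application payload, e.g. torque request
    };

    class CommandGate {
    public:
        // Returns true only if the command passes every envelope check.
        bool accept(const CommandEnvelope& cmd, std::uint64_t now_ns) {
            if (cmd.source_id != expected_source_) return false;    // identity
            if (cmd.sequence <= last_sequence_) return false;       // replay/stale
            if (now_ns < cmd.stamp_ns) return false;                // future stamp
            if (now_ns - cmd.stamp_ns > max_age_ns_) return false;  // too old
            if (cmd.value < min_value_ || cmd.value > max_value_)   // validity
                return false;
            last_sequence_ = cmd.sequence;
            return true;
        }

    private:
        std::uint32_t expected_source_ = 0x42;          // illustrative
        std::uint32_t last_sequence_ = 0;
        std::uint64_t max_age_ns_ = 50'000'000;         // 50 ms
        double min_value_ = -1.0, max_value_ = 1.0;
    };

The gate sits on the ECU side of the middleware boundary, so the checks hold even when the transport misbehaves.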

Failure Modes

Failure mode: DDS discovery storm
  Symptom: CPU spike, missed deadlines, high multicast traffic.
  Control: Static peers, participant limits, domain isolation, discovery server or router where appropriate.

Failure mode: QoS mismatch
  Symptom: Subscriber receives nothing or receives stale samples.
  Control: Contract tests and startup compatibility checks.

Failure mode: Stale command replay
  Symptom: Old command is accepted after reconnect.
  Control: Lifespan, sequence checks, command lease, safety-controller timeout.

Failure mode: Serialization drift
  Symptom: Fields swapped, unit mismatch, dropped covariance.
  Control: Schema lockstep, generated code, golden sample tests.

Failure mode: Bridge backpressure
  Symptom: Logger or fleet link blocks autonomy traffic.
  Control: Separate queues, drop policy by criticality, network shaping (a bounded-queue sketch follows this table).

Failure mode: Time-base drift
  Symptom: Fusion or incident replay cannot align samples.
  Control: Hardware timestamping, gPTP monitoring, clock-offset logs.

Failure mode: Duplicate service instance
  Symptom: Two gateways offer the same vehicle service.
  Control: Service registry checks and single-writer ownership rules.

Failure mode: Security bypass
  Symptom: Unauthorized client publishes command-like data.
  Control: Network segmentation, DDS/Zenoh security where used, application authorization, ECU-side validation.
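
For the bridge-backpressure row, a minimal sketch of per-criticality bounded queues, assuming a drop-oldest policy; the capacities and element type are illustrative:

    #include <cstddef>
    #include <deque>
    #include <utility>

    // Bounded drop-oldest queue. The bridge holds one instance per
    // criticality class, so a stalled fleet link can only displace
    // other logging samples, never control-path messages.
    template <typename Msg>
    class BoundedQueue {
    public:
        explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

        // Returns false when an old sample was discarded to make room,
        // so callers can count drops per criticality class.
        bool push(Msg msg) {
            bool dropped = false;
            if (q_.size() >= capacity_) {
                q_.pop_front();
                dropped = true;
            }
            q_.push_back(std::move(msg));
            return !dropped;
        }

        bool pop(Msg& out) {
            if (q_.empty()) return false;
            out = std::move(q_.front());
            q_.pop_front();
            return true;
        }

    private:
        std::size_t capacity_;
        std::deque<Msg> q_;
    };

    // Sizing is policy: the control queue is small and sized to the
    // deadline budget; the logging queue is large and allowed to drop.
    BoundedQueue<int> control_queue(8);
    BoundedQueue<int> logging_queue(4096);

When the fleet link stalls, only the logging queue overflows, and the drop counters show exactly where traffic was shed.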
