Sensor Dropout, Latency, and Jitter Stress Protocol
Last updated: 2026-05-09
Purpose
This protocol validates localization, SLAM, perception fusion, and safety monitors under missing, delayed, bursty, and reordered sensor data. It covers both sensor-origin faults and runtime delivery faults: network congestion, middleware queue pressure, CPU/GPU saturation, logging overhead, driver stalls, and replay bursts. A passing system must degrade predictably, bound its uncertainty, reject stale data, and preserve enough evidence for operational response.
This protocol complements time-sync fault injection, timestamp shift sweep, replay time semantics and TF/message-filter validation, SLAM integrity under timing errors, and the perception-SLAM corruption protocol.
Stress Model
| Stress | Injection method | Expected safe behavior |
|---|---|---|
| Random dropout | Drop messages independently by topic and rate | Fusion handles sparse data or degrades without false certainty |
| Burst dropout | Remove contiguous frames or packets | Localization availability drops only within approved bounds, or the safety monitor commands a stop |
| Latency offset | Delay topic delivery without changing source timestamp | Stale-data monitor rejects or marks output uncertain |
| Latency jitter | Add variable delivery delay | Queues and synchronizers avoid unbounded lag |
| Reordering | Deliver older messages after newer messages | Consumers reject out-of-order data or process deterministically |
| Duplication | Replay duplicate frames or transforms | Deduplication or idempotent processing prevents false confidence |
| Rate collapse | Lower sensor publish rate or driver poll rate | Topic state monitor and fusion health detect degraded input |
| Compute saturation | Add CPU/GPU/memory/logging load | Real-time watchdogs trip before stale outputs reach planner |
| Middleware queue pressure | Reduce queue depth, change QoS, or burst playback | Drops are visible and bounded |
| Mixed fault | Combine dropout with jitter and timing skew | Safety action remains conservative under realistic compound faults |
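The delivery-side injection methods in the table can be sketched as a single seeded injector. The following is a minimal illustration only; the `FaultInjector` and `Stamped` names are hypothetical and not part of any ROS 2 or middleware API. Note that reordering falls out naturally once jitter makes a later message's delivery time precede an earlier one.

```python
import random
from dataclasses import dataclass

@dataclass
class Stamped:
    stamp: float      # source timestamp, seconds
    payload: object

class FaultInjector:
    """Seeded delivery-fault injector: random dropout, fixed delay, jitter.

    Given the same seed and schedule, it reproduces the exact same fault
    pattern, which is what the Required Setup table demands of the injector.
    """
    def __init__(self, seed, drop_rate=0.0, delay=0.0, jitter=0.0):
        self.rng = random.Random(seed)
        self.drop_rate = drop_rate   # probability each message is dropped
        self.delay = delay           # fixed added delivery delay, seconds
        self.jitter = jitter         # uniform extra delay in [0, jitter]

    def inject(self, messages):
        out = []
        for msg in messages:
            if self.rng.random() < self.drop_rate:
                continue  # random dropout: message never delivered
            dt = self.delay + self.rng.uniform(0.0, self.jitter)
            out.append((msg.stamp + dt, msg))
        # Deliver in delivery-time order; jitter can therefore reorder
        # messages relative to their source stamps.
        out.sort(key=lambda pair: pair[0])
        return out
```

Because the injector is seeded, two runs with the same seed and parameters produce byte-identical fault schedules, so a stressed run can be replayed exactly for a failure packet.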
Required Setup
| Item | Requirement |
|---|---|
| Candidate stack | Frozen build, map, calibration, QoS profiles, queue sizes, and runtime parameters |
| Fault injector | Deterministic topic/network/runtime injector with seed and exact fault schedule |
| Replay/HIL coverage | Offline replay for breadth, HIL/SIL for middleware behavior, closed course for critical physical confirmation |
| Ground truth | Pose, objects, occupancy, and route/zone labels for each tested slice |
| Runtime telemetry | Per-topic rate, inter-arrival time, source stamp age, callback latency, queue depth, executor delay, CPU/GPU/memory |
| Monitor set | Topic state, stale-data, time sync, TF/message-filter, localization integrity, safety action, and data recorder health |
| Evidence capture | Rosbag2/MCAP with raw and derived topics, diagnostics, fault schedule, and host logs |
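The runtime telemetry row above asks for per-topic rate, inter-arrival time, and source stamp age. A minimal sketch of such a per-topic tracker, assuming a hypothetical `TopicTelemetry` class rather than any existing diagnostics package:

```python
class TopicTelemetry:
    """Tracks arrivals for one topic: publish rate, inter-arrival gaps,
    and source-stamp age (host receive time minus source stamp)."""
    def __init__(self, name):
        self.name = name
        self.arrivals = []   # host receive times, seconds
        self.ages = []       # receive time minus source stamp, seconds

    def record(self, source_stamp, receive_time):
        self.arrivals.append(receive_time)
        self.ages.append(receive_time - source_stamp)

    def inter_arrival(self):
        """Gaps between consecutive arrivals; feeds the jitter metric."""
        return [b - a for a, b in zip(self.arrivals, self.arrivals[1:])]

    def rate_hz(self):
        """Mean observed rate over the recorded window."""
        if len(self.arrivals) < 2:
            return 0.0
        span = self.arrivals[-1] - self.arrivals[0]
        return (len(self.arrivals) - 1) / span if span > 0 else 0.0

    def max_stamp_age(self):
        """Worst observed staleness; compared against the age budget."""
        return max(self.ages) if self.ages else 0.0
```

In practice these counters would be fed from subscription callbacks and exported to the fleet telemetry pipeline; the class above only shows the bookkeeping.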
Fault Matrix
| Modality | Dropout cases | Latency/jitter cases | High-risk metric |
|---|---|---|---|
| LiDAR | Packet/beam/frame dropout, partial sensor blackout | Frame delay, point packet skew, motion-compensation delay | Scan-match residual, false free space, pose drift |
| Camera | Frame drops, exposure pipeline stalls | Variable image arrival, multi-camera skew | Misprojected detections, track age, fusion drops |
| Radar | Track dropout, return thinning | Delayed tracks while actors are moving | Dynamic obstacle position error |
| IMU | Sample loss, burst gaps | Delayed or jittered high-rate samples | Preintegration residual, yaw/velocity drift |
| GNSS/INS | Fix loss, degraded status, delayed fixes | Stale absolute updates | Global pose jump, covariance consistency |
| Wheel odometry | Encoder drop, duplicated ticks | Delayed odom during acceleration/turning | Local odometry drift and slip misclassification |
| TF | Missing transforms, delayed frame tree updates | Future/past transform availability | Extrapolation failures and wrong-frame outputs |
| Map runtime | Tile lookup delay, stale overlay, failed tile load | Map bundle/tile metadata delay | Route/geofence mismatch and map quarantine recall |
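The TF row flags past/future extrapolation failures when transforms are missing or delayed. A toy fixed-window transform cache illustrates why both failure directions occur; the `TransformBuffer` class is a hypothetical simplification, not the tf2 buffer API:

```python
class TransformBuffer:
    """Minimal fixed-window transform cache: a lookup succeeds only if the
    query time falls inside the span of currently cached transform stamps."""
    def __init__(self, cache_seconds):
        self.cache_seconds = cache_seconds
        self.stamps = []   # sorted stamps of received transforms

    def insert(self, stamp):
        self.stamps.append(stamp)
        cutoff = stamp - self.cache_seconds
        # Evict transforms older than the cache window.
        self.stamps = [s for s in self.stamps if s >= cutoff]

    def can_lookup(self, query):
        if not self.stamps:
            return "empty"
        if query < self.stamps[0]:
            return "extrapolation_past"    # data already aged out of cache
        if query > self.stamps[-1]:
            return "extrapolation_future"  # transform not yet received
        return "ok"
```

Delayed frame-tree updates shrink the usable span from the future side, while a small cache under queue pressure shrinks it from the past side, which is why both dropout and latency stress must be swept against TF consumers.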
Procedure
- Freeze the candidate build, map, calibration, QoS, queue, and fault-injection manifest.
- Run clean baseline replay/HIL for every dataset slice and record deterministic output hashes where possible.
- Inject single-modality dropout sweeps at light, moderate, severe, and release-boundary levels.
- Inject latency sweeps using fixed delay, variable jitter, burst delay, and reordering.
- Run compute-stress cases with logging enabled at the production event-capture tier.
- Run compound cases for the highest-risk sensor pairs, such as LiDAR plus IMU jitter, camera plus LiDAR skew, and GNSS plus wheel odometry dropout.
- Compare shifted/stressed output against clean baseline and ground truth.
- Verify monitors, dashboards, and safety actions trip within budget.
- Create failure packets for any wrong-pose, false-free-space, stale-obstacle, or missing-diagnostic event.
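The baseline and comparison steps above rely on deterministic output hashes. One way to sketch that comparison, assuming JSON-serializable output messages and bitwise-reproducible floats (the function names are illustrative):

```python
import hashlib
import json

def output_digest(messages):
    """Stable digest of one deterministic output stream (e.g. a replay
    slice). Keys are sorted so dict ordering cannot change the hash."""
    h = hashlib.sha256()
    for msg in messages:
        h.update(json.dumps(msg, sort_keys=True).encode())
    return h.hexdigest()

def compare_runs(baseline, stressed):
    """Given {topic: digest} maps for a clean baseline and a stressed run,
    return the topics whose outputs diverged."""
    return sorted(t for t in baseline if stressed.get(t) != baseline[t])
```

A divergence by itself is not a failure; the point is to know exactly which outputs the injected stress touched, so downstream metric deltas can be attributed to the fault schedule rather than to nondeterminism.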
Metrics
| Metric | Definition | Required slicing |
|---|---|---|
| Topic availability | Fraction of expected messages received and accepted | Sensor, route, ODD, fault severity |
| Inter-arrival jitter | Distribution of time between message arrivals | p50, p95, p99, p99.9 and burst max |
| Source stamp age | Host receive or processing time minus source stamp | Sensor and consumer node |
| End-to-end latency | Source stamp to safety-relevant output or control input | p50, p95, p99, p99.9 |
| Queue age and depth | Age of oldest queued message and queue occupancy | Node and executor |
| Message-filter miss rate | Messages dropped due to sync window, queue, or no transform | Filter instance and topic pair |
| TF availability | Transform lookup success/failure, extrapolation past/future, cache miss | Frame pair and consumer |
| Localization availability | Fraction of time the pose estimate remains valid within the alert/protection limit | Route and fault severity |
| Pose integrity | Pose error, covariance/protection level, NEES/NIS, residuals | ODD slice and modality fault |
| Perception safety | False free space, missed obstacle, stale track use, object age | Critical actor class and zone |
| Safety action latency | Fault start to alert, degrade, stop, or route hold | Fault type and severity |
| Data evidence completeness | Presence of raw, diagnostics, fault schedule, and monitor traces | Every release-critical run |
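The jitter and latency rows require tail percentiles (p50 through p99.9) plus a burst maximum. A minimal nearest-rank implementation, shown only to pin down the convention; production pipelines would typically use a stats library instead:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100], over a non-empty sample set."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

def jitter_summary(inter_arrivals):
    """Summarize inter-arrival gaps at the percentiles the table requires."""
    return {
        "p50": percentile(inter_arrivals, 50),
        "p95": percentile(inter_arrivals, 95),
        "p99": percentile(inter_arrivals, 99),
        "p99.9": percentile(inter_arrivals, 99.9),
        "burst_max": max(inter_arrivals),
    }
```

Note that p99.9 is only meaningful with on the order of thousands of samples per slice; for short runs, the burst maximum is the more trustworthy tail indicator.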
Pass and Block Gates
| Gate | Pass condition | Block condition |
|---|---|---|
| DLJ0 baseline | Clean baseline meets existing localization, perception, and runtime gates | Baseline already violates release thresholds |
| DLJ1 topic health | Dropout, rate collapse, and burst gaps are detected within monitor budget | Topic loss is invisible to diagnostics |
| DLJ2 stale rejection | Delayed, reordered, or duplicate data is rejected or clearly marked stale | Consumer accepts stale/future data as current |
| DLJ3 bounded queueing | Queue age and executor latency stay below approved output age limits or trigger degraded mode | Queue backlog grows while outputs remain nominal |
| DLJ4 graceful degradation | Accuracy/availability degrades according to severity and uncertainty increases with error | Confidence remains nominal while error grows |
| DLJ5 safety output | No high-confidence false free space or stale obstacle near aircraft, people, FOD, or geofence | Any critical stale/false-safe output |
| DLJ6 recovery | After fault removal, outputs recover without map-frame discontinuity or unexplained relocalization | Recovery creates a pose jump or hidden map mismatch |
| DLJ7 observability | Fleet telemetry contains enough fields to explain the stress event | Missing topic, queue, timestamp, or monitor evidence |
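The DLJ2 gate can be made concrete with a small acceptance check. This is a sketch of the idea, not a reference implementation; a real consumer would additionally mark its own outputs degraded rather than silently dropping input:

```python
class StaleDataGate:
    """Rejects stale, future-stamped, out-of-order, or duplicate inputs
    instead of treating them as current (the DLJ2 gate condition).

    max_age is the approved source-stamp age budget in seconds.
    """
    def __init__(self, max_age):
        self.max_age = max_age
        self.last_stamp = None

    def accept(self, stamp, now):
        if stamp > now:
            return False   # future-stamped: clock or sync fault
        if now - stamp > self.max_age:
            return False   # stale beyond the approved age budget
        if self.last_stamp is not None and stamp <= self.last_stamp:
            return False   # out-of-order or duplicate delivery
        self.last_stamp = stamp
        return True
```

Every rejection should also increment a visible diagnostic counter; a gate that drops data silently would itself fail DLJ1 and DLJ7.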
Operational Response
| Condition | Response |
|---|---|
| Brief dropout inside validated envelope | Continue mission, increment reliability counter, retain compressed event context |
| Sustained dropout or rate collapse | Degrade speed/route, increase uncertainty, create maintenance or sensor-health ticket |
| Stale safety-critical input | Controlled stop or remote-assist handoff; mark session unsafe for map building |
| Compute-induced latency spike | Reduce logging tier if approved, shed non-safety workload, or controlled stop if latency persists |
| Repeated route-specific jitter | Open infrastructure/network investigation and pause route expansion |
| Map-building run with timing/data stress | Quarantine generated map tile and exclude log from release dataset until reviewed |
Evidence Artifacts
| Artifact | Contents |
|---|---|
| Stress manifest | Fault type, sensor/topic, severity, start/end time, seed, expected monitor response |
| Runtime metrics report | Rates, latency percentiles, jitter, queue age, executor delay, CPU/GPU/memory |
| Fusion health report | Message-filter drops, TF failures, sync matches, object/track age, stale-data rejects |
| Localization report | Pose error, covariance/protection level, residuals, availability, recovery |
| Safety action report | Alert/degrade/stop timing and planner/control response |
| Failure packet | Minimal reproducible slice, fault schedule, screenshots/plots, defect ID |
| Release disposition | Pass, block, inconclusive, or pass with ODD/route/logging restriction |
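For concreteness, one stress-manifest entry matching the fields listed above might look like the following. The schema, topic name, and threshold values are illustrative assumptions, not a fixed format:

```yaml
- fault: latency_jitter
  topic: /sensing/lidar/points        # hypothetical topic name
  severity: moderate
  start_time_s: 120.0
  end_time_s: 180.0
  seed: 1337
  params:
    base_delay_ms: 40
    jitter_ms: 25
  expected_monitor_response:
    - stale_data_monitor: warn within 500 ms
    - topic_state_monitor: flag rate degradation
```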
Sources
- ROS 2 Kilted documentation, message_filters: https://docs.ros.org/en/kilted/p/message_filters/
- ROS 2 Kilted documentation, tf2 MessageFilter: https://docs.ros.org/en/kilted/Tutorials/Intermediate/Tf2/Using-Stamped-Datatypes-With-Tf2-Ros-MessageFilter.html
- ROS 2 Kilted documentation, Quality of Service settings: https://docs.ros.org/en/kilted/Concepts/Intermediate/About-Quality-of-Service-Settings.html
- rosbag2 README and playback options: https://github.com/ros2/rosbag2
- Autoware topic state monitor: https://autowarefoundation.github.io/autoware_universe/main/system/autoware_topic_state_monitor/
- Autoware diagnostics API: https://autowarefoundation.github.io/autoware-documentation/latest/design/autoware-interfaces/ad-api/features/diagnostics/
- Autoware NDT scan matcher documentation: https://autowarefoundation.github.io/autoware_core/main/localization/autoware_ndt_scan_matcher/
- MCAP specification: https://mcap.dev/spec
- Ouster sensor time synchronization guide: https://static.ouster.dev/sensor-docs/image_route1/image_route2/time_sync/time-sync.html