Sensor Dropout, Latency, and Jitter Stress Protocol

Last updated: 2026-05-09

Purpose

This protocol validates localization, SLAM, perception fusion, and safety monitors under missing, delayed, bursty, and reordered sensor data. It covers both sensor-origin faults and runtime delivery faults: network congestion, middleware queues, CPU/GPU saturation, logging overhead, driver stalls, and replay bursts. A passing system must degrade predictably, bound uncertainty, reject stale data, and preserve enough evidence for operational response.

This protocol complements time-sync fault injection, timestamp shift sweep, replay time semantics and TF/message-filter validation, SLAM integrity under timing errors, and the perception-SLAM corruption protocol.

Stress Model

| Stress | Injection method | Expected safe behavior |
| --- | --- | --- |
| Random dropout | Drop messages independently by topic and rate | Fusion handles sparse data or degrades without false certainty |
| Burst dropout | Remove contiguous frames or packets | Localization availability drops only inside approved bounds or monitor stops |
| Latency offset | Delay topic delivery without changing source timestamp | Stale-data monitor rejects or marks output uncertain |
| Latency jitter | Add variable delivery delay | Queues and synchronizers avoid unbounded lag |
| Reordering | Deliver older messages after newer messages | Consumers reject out-of-order data or process deterministically |
| Duplication | Replay duplicate frames or transforms | Deduplication or idempotent processing prevents false confidence |
| Rate collapse | Lower sensor publish rate or driver poll rate | Topic state monitor and fusion health detect degraded input |
| Compute saturation | Add CPU/GPU/memory/logging load | Real-time watchdogs trip before stale outputs reach planner |
| Middleware queue pressure | Reduce queue depth, change QoS, or burst playback | Drops are visible and bounded |
| Mixed fault | Combine dropout with jitter and timing skew | Safety action remains conservative under realistic compound faults |
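The dropout, latency-offset, jitter, and reordering rows above can be combined in one deterministic injector. The sketch below is illustrative, not a real injector API: the class name and knobs are assumptions. The key property it demonstrates is the one the Required Setup table demands, namely that a fixed seed reproduces the exact fault schedule.

```python
import random

class StressInjector:
    """Illustrative deterministic stress injector (names are assumptions).
    Same seed => identical fault schedule on every replay."""

    def __init__(self, seed, drop_rate=0.0, base_delay_s=0.0, jitter_s=0.0):
        self._rng = random.Random(seed)   # seeded RNG => reproducible schedule
        self.drop_rate = drop_rate        # random-dropout probability per message
        self.base_delay_s = base_delay_s  # fixed latency offset
        self.jitter_s = jitter_s          # uniform delivery jitter on top of offset

    def schedule(self, source_times):
        """Map source times to delivery times; None marks a dropped message.
        Jitter larger than the publish period also yields reordered delivery."""
        delivered = []
        for t in source_times:
            if self._rng.random() < self.drop_rate:
                delivered.append(None)  # random dropout: message never delivered
            else:
                delivered.append(t + self.base_delay_s
                                 + self._rng.uniform(0.0, self.jitter_s))
        return delivered

inj = StressInjector(seed=7, drop_rate=0.2, base_delay_s=0.05, jitter_s=0.03)
plan = inj.schedule([i * 0.1 for i in range(10)])
```

Because the source timestamp inside each message is left untouched and only delivery time moves, the same schedule exercises both the stale-data monitor and the synchronizer queues.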

Required Setup

| Item | Requirement |
| --- | --- |
| Candidate stack | Frozen build, map, calibration, QoS profiles, queue sizes, and runtime parameters |
| Fault injector | Deterministic topic/network/runtime injector with seed and exact fault schedule |
| Replay/HIL coverage | Offline replay for breadth, HIL/SIL for middleware behavior, closed course for critical physical confirmation |
| Ground truth | Pose, objects, occupancy, and route/zone labels for each tested slice |
| Runtime telemetry | Per-topic rate, inter-arrival time, source stamp age, callback latency, queue depth, executor delay, CPU/GPU/memory |
| Monitor set | Topic state, stale-data, time sync, TF/message-filter, localization integrity, safety action, and data recorder health |
| Evidence capture | Rosbag2/MCAP with raw and derived topics, diagnostics, fault schedule, and host logs |

Fault Matrix

| Modality | Dropout cases | Latency/jitter cases | High-risk metric |
| --- | --- | --- | --- |
| LiDAR | Packet/beam/frame dropout, partial sensor blackout | Frame delay, point packet skew, motion-compensation delay | Scan-match residual, false free space, pose drift |
| Camera | Frame drops, exposure pipeline stalls | Variable image arrival, multi-camera skew | Misprojected detections, track age, fusion drops |
| Radar | Track dropout, return thinning | Delayed tracks while actors are moving | Dynamic obstacle position error |
| IMU | Sample loss, burst gaps | Delayed or jittered high-rate samples | Preintegration residual, yaw/velocity drift |
| GNSS/INS | Fix loss, degraded status, delayed fixes | Stale absolute updates | Global pose jump, covariance consistency |
| Wheel odometry | Encoder drop, duplicated ticks | Delayed odom during acceleration/turning | Local odometry drift and slip misclassification |
| TF | Missing transforms, delayed frame tree updates | Future/past transform availability | Extrapolation failures and wrong-frame outputs |
| Map runtime | Tile lookup delay, stale overlay, failed tile load | Map bundle/tile metadata delay | Route/geofence mismatch and map quarantine recall |
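Several rows of the fault matrix (delayed fixes, future/past transforms, duplicated ticks) reduce to the same consumer-side question: should this message be used at all? A minimal sketch of such a guard, assuming scalar timestamps in seconds and thresholds chosen per consumer, might look like this:

```python
class StaleDataMonitor:
    """Sketch of a consumer-side guard that rejects stale, future-stamped,
    reordered, and duplicate messages. Thresholds are illustrative."""

    def __init__(self, max_age_s, max_future_s=0.0):
        self.max_age_s = max_age_s        # oldest acceptable source stamp age
        self.max_future_s = max_future_s  # clock-skew tolerance for future stamps
        self._last_stamp = float("-inf")  # newest stamp accepted so far

    def accept(self, source_stamp, receive_time):
        age = receive_time - source_stamp
        if age > self.max_age_s:
            return False  # stale: reject so fusion marks its output uncertain
        if -age > self.max_future_s:
            return False  # future stamp (clock skew): reject
        if source_stamp <= self._last_stamp:
            return False  # reordered or duplicate: reject
        self._last_stamp = source_stamp
        return True
```

A rejected message should still be counted in telemetry; silent rejection would fail the observability gate just as surely as silent acceptance fails the stale-rejection gate.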

Procedure

  1. Freeze the candidate build, map, calibration, QoS, queue, and fault-injection manifest.
  2. Run clean baseline replay/HIL for every dataset slice and record deterministic output hashes where possible.
  3. Inject single-modality dropout sweeps at light, moderate, severe, and release-boundary levels.
  4. Inject latency sweeps using fixed delay, variable jitter, burst delay, and reordering.
  5. Run compute-stress cases with logging enabled at the production event-capture tier.
  6. Run compound cases for the highest-risk sensor pairs, such as LiDAR plus IMU jitter, camera plus LiDAR skew, and GNSS plus wheel odometry dropout.
  7. Compare shifted/stressed output against clean baseline and ground truth.
  8. Verify monitors, dashboards, and safety actions trip within budget.
  9. Create failure packets for any wrong-pose, false-free-space, stale-obstacle, or missing-diagnostic event.
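Step 3's severity sweep is easiest to audit when the manifest is generated, not hand-written. The sketch below expands one fault type into per-(topic, severity) run specs; severity names, rates, topic names, and field names are illustrative assumptions, not a fixed schema.

```python
# Hypothetical severity ladder for random dropout (rates are illustrative).
SEVERITIES = {"light": 0.01, "moderate": 0.05, "severe": 0.20,
              "release_boundary": 0.35}

def build_dropout_sweep(topics, seed=42):
    """Expand step 3 into one deterministic run spec per (topic, severity)."""
    runs = []
    for topic in topics:
        for name, rate in SEVERITIES.items():
            runs.append({
                "fault": "random_dropout",
                "topic": topic,
                "severity": name,
                "drop_rate": rate,
                "seed": seed,  # fixed seed keeps every replay reproducible
                "expected_monitor": "topic_state",
            })
    return runs

sweep = build_dropout_sweep(["/lidar/points", "/imu/data"])
```

Recording the expected monitor alongside each run makes step 8 mechanical: a run passes only if the named monitor actually tripped within budget.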

Metrics

| Metric | Definition | Required slicing |
| --- | --- | --- |
| Topic availability | Fraction of expected messages received and accepted | Sensor, route, ODD, fault severity |
| Inter-arrival jitter | Distribution of time between message arrivals | p50, p95, p99, p99.9 and burst max |
| Source stamp age | Host receive or processing time minus source stamp | Sensor and consumer node |
| End-to-end latency | Source stamp to safety-relevant output or control input | p50, p95, p99, p99.9 |
| Queue age and depth | Age of oldest queued message and queue occupancy | Node and executor |
| Message-filter miss rate | Messages dropped due to sync window, queue, or no transform | Filter instance and topic pair |
| TF availability | Transform lookup success/failure, extrapolation past/future, cache miss | Frame pair and consumer |
| Localization availability | Time pose remains valid inside alert/protection limit | Route and fault severity |
| Pose integrity | Pose error, covariance/protection level, NEES/NIS, residuals | ODD slice and modality fault |
| Perception safety | False free space, missed obstacle, stale track use, object age | Critical actor class and zone |
| Safety action latency | Fault start to alert, degrade, stop, or route hold | Fault type and severity |
| Data evidence completeness | Presence of raw, diagnostics, fault schedule, and monitor traces | Every release-critical run |
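The inter-arrival jitter and source stamp age metrics above can be computed offline from recorded arrival times. A minimal sketch, using nearest-rank percentiles (one of several valid percentile conventions; pick one and keep it fixed across releases):

```python
import math

def percentile(sorted_vals, q):
    """Nearest-rank percentile on a pre-sorted list, 0 < q <= 1."""
    idx = min(len(sorted_vals) - 1, math.ceil(q * len(sorted_vals)) - 1)
    return sorted_vals[idx]

def jitter_report(arrival_times):
    """Inter-arrival jitter distribution for one topic, sliced as required
    above: p50/p95/p99/p99.9 plus the burst maximum."""
    deltas = sorted(b - a for a, b in zip(arrival_times, arrival_times[1:]))
    report = {name: percentile(deltas, q)
              for name, q in [("p50", 0.50), ("p95", 0.95),
                              ("p99", 0.99), ("p99.9", 0.999)]}
    report["burst_max"] = deltas[-1]  # worst single gap, catches bursts
    return report

def stamp_age(source_stamp, receive_time):
    """Source stamp age: host receive time minus the message's source stamp."""
    return receive_time - source_stamp
```

Note that arrival-time jitter and stamp age are independent failure signals: a topic can arrive at a perfectly regular rate while every message carries a stale source stamp.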

Pass and Block Gates

| Gate | Pass condition | Block condition |
| --- | --- | --- |
| DLJ0 baseline | Clean baseline meets existing localization, perception, and runtime gates | Baseline already violates release thresholds |
| DLJ1 topic health | Dropout, rate collapse, and burst gaps are detected within monitor budget | Topic loss is invisible to diagnostics |
| DLJ2 stale rejection | Delayed, reordered, or duplicate data is rejected or clearly marked stale | Consumer accepts stale/future data as current |
| DLJ3 bounded queueing | Queue age and executor latency stay below approved output age limits or trigger degraded mode | Queue backlog grows while outputs remain nominal |
| DLJ4 graceful degradation | Accuracy/availability degrades according to severity and uncertainty increases with error | Confidence remains nominal while error grows |
| DLJ5 safety output | No high-confidence false free space or stale obstacle near aircraft, people, FOD, or geofence | Any critical stale/false-safe output |
| DLJ6 recovery | After fault removal, outputs recover without map-frame discontinuity or unexplained relocalization | Recovery creates a pose jump or hidden map mismatch |
| DLJ7 observability | Fleet telemetry contains enough fields to explain the stress event | Missing topic, queue, timestamp, or monitor evidence |
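Gate DLJ3 is the most mechanical of these and illustrates how a gate can be coded directly against the recorded telemetry. A sketch, assuming queue-age samples in seconds and a per-consumer approved output age limit:

```python
def evaluate_dlj3(queue_age_samples, output_age_limit_s, degraded_mode_active):
    """Sketch of gate DLJ3 (bounded queueing): queue age must stay below the
    approved output age limit, or the stack must have entered degraded mode.
    Returns 'pass' or 'block'; signature and names are illustrative."""
    worst = max(queue_age_samples)
    if worst <= output_age_limit_s:
        return "pass"  # queueing bounded throughout the run
    # Limit exceeded: acceptable only if degraded mode was actually entered,
    # otherwise backlog grew while outputs stayed nominal, which is the block
    # condition in the table above.
    return "pass" if degraded_mode_active else "block"
```

The same pattern (threshold check plus an explicit "did the mitigation fire" flag) extends to DLJ1 and DLJ2, whereas DLJ4-DLJ6 generally need ground-truth comparison rather than telemetry alone.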

Operational Response

| Condition | Response |
| --- | --- |
| Brief dropout inside validated envelope | Continue mission, increment reliability counter, retain compressed event context |
| Sustained dropout or rate collapse | Degrade speed/route, increase uncertainty, create maintenance or sensor-health ticket |
| Stale safety-critical input | Controlled stop or remote-assist handoff; mark session unsafe for map building |
| Compute-induced latency spike | Reduce logging tier if approved, shed non-safety workload, or controlled stop if latency persists |
| Repeated route-specific jitter | Open infrastructure/network investigation and pause route expansion |
| Map-building run with timing/data stress | Quarantine generated map tile and exclude log from release dataset until reviewed |
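The first four rows of the response table form an ordered triage: safety-critical staleness dominates, then persistent latency, then dropout duration against the validated envelope. A sketch of that ordering, with hypothetical action names and thresholds (this is not a production policy):

```python
def triage(dropout_s, envelope_s, stale_safety_input=False,
           persistent_latency=False):
    """Illustrative operational triage mirroring the response table.
    Action names and the argument shapes are assumptions."""
    if stale_safety_input:
        return "controlled_stop"    # stale safety-critical input wins outright
    if persistent_latency:
        return "shed_load_or_stop"  # compute-induced latency spike persists
    if dropout_s <= envelope_s:
        return "continue_and_count" # brief dropout inside validated envelope
    return "degrade_and_ticket"     # sustained dropout or rate collapse
```

Encoding the priority order explicitly matters under mixed faults: a compound event should always resolve to the most conservative applicable response, never to whichever monitor happened to fire first.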

Evidence Artifacts

| Artifact | Contents |
| --- | --- |
| Stress manifest | Fault type, sensor/topic, severity, start/end time, seed, expected monitor response |
| Runtime metrics report | Rates, latency percentiles, jitter, queue age, executor delay, CPU/GPU/memory |
| Fusion health report | Message-filter drops, TF failures, sync matches, object/track age, stale-data rejects |
| Localization report | Pose error, covariance/protection level, residuals, availability, recovery |
| Safety action report | Alert/degrade/stop timing and planner/control response |
| Failure packet | Minimal reproducible slice, fault schedule, screenshots/plots, defect ID |
| Release disposition | Pass, block, inconclusive, or pass with ODD/route/logging restriction |
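The data-evidence-completeness metric can be enforced as a simple set check over the artifact table before a run is admitted to release review. A sketch, with hypothetical artifact keys:

```python
# Release-critical artifacts, keyed from the table above (names illustrative).
REQUIRED_ARTIFACTS = {
    "stress_manifest",
    "runtime_metrics_report",
    "fusion_health_report",
    "localization_report",
    "safety_action_report",
}

def missing_evidence(present_artifacts):
    """Return the set of missing release-critical artifacts.
    An empty result means the run's evidence is complete."""
    return REQUIRED_ARTIFACTS - set(present_artifacts)
```

Failure packets and the release disposition are deliberately excluded from the required set here, since they are produced after review rather than during the run.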

Sources

Compiled from publicly available research notes and engineering sources.