Online Perception Runtime Monitoring and ODD Boundary Enforcement
Detecting Silent ML Degradation, Input/Output Anomaly Detection, Cross-Modal Consistency, and Automated Graceful Degradation for Airport Airside Autonomous GSE
Last updated: 2026-04-11
Key Takeaway: Sensor hardware can be fully functional while perception ML models silently produce unreliable outputs -- domain shift from a new airport, seasonal lighting changes, novel aircraft types, or adversarial natural conditions (jet exhaust shimmer, de-icing spray optical effects, puddle reflections) cause neural network degradation that no hardware diagnostic can detect. This document defines the perception quality monitoring layer that sits between hardware sensor health (covered in
sensor-degradation-health-monitoring.md) and formal temporal logic safety monitors (covered in runtime-verification-monitoring.md). The system continuously evaluates whether ML model outputs are trustworthy by monitoring input distribution drift, output consistency patterns, cross-modal agreement, OOD detection scores, calibration stability, and temporal anomaly trends -- all within a <5ms compute budget on Orin AGX. A composite Perception Health Score (PHS) drives graduated system responses from normal operation through speed reduction, increased margins, teleop escalation, to safe stop. When combined with the existing Simplex architecture (neural AC + classical Frenet BC), this monitoring layer provides the missing link between "sensors are working" and "perception outputs are correct" -- the gap that no existing airside AV competitor has addressed, and that ISO 3691-4, UL 4600, and the EU AI Act increasingly demand evidence for.
Table of Contents
- The Silent Degradation Problem
- Architecture Overview
- Input Distribution Monitoring
- Output Consistency Checking
- Cross-Modal Consistency
- OOD Detection Integration
- ODD Boundary Monitoring
- Perception Confidence Aggregation
- Calibration Drift Detection
- Temporal Anomaly Detection
- Response Actions and Graceful Degradation
- Implementation on Orin
- Certification and Safety Case
- Implementation Roadmap
- Key Takeaways
- References
1. The Silent Degradation Problem
1.1 Why Perception Fails Without Sensor Faults
Hardware sensor health monitoring (see 20-av-platform/sensors/sensor-degradation-health-monitoring.md) catches physical degradation: dirty lenses, reduced point counts, intensity drops, angular coverage loss. But perception can fail catastrophically even when every sensor is physically perfect:
| Failure Mode | Sensor Status | ML Model Behavior | Consequence |
|---|---|---|---|
| Domain shift (new airport) | All nominal | Trained on Airport A's tarmac texture, Airport B has different reflectivity | Detection confidence drops 15-30%, false negatives on distant objects |
| Seasonal lighting change | All nominal | Winter sun angle creates shadow patterns never seen in training | Shadow regions misclassified as obstacles or vice versa |
| Novel aircraft type | All nominal | A380 wingspan never in training data | Bounding box regression fails, clearance computed wrong |
| Jet exhaust shimmer | All nominal | Thermal distortion warps point cloud geometry at 50-100m behind engines | Ghost detections, split objects, position errors up to 2m |
| De-icing spray optical effects | LiDAR sees normal point count | Glycol droplet scatter creates dense false point returns at 5-15m | False obstacle wall, phantom emergency stops |
| Puddle reflections | All nominal | Specular reflection creates mirrored point cloud below ground plane | Ground plane estimation fails, false obstacle below vehicle |
| Model staleness | All nominal | Distribution drifts over weeks as airport operations change | Gradual mAP decline of 0.5-1% per month undetected |
| Adversarial weather combination | All nominal | Low sun + wet tarmac + light fog -- individually tolerable, combined unseen | Compound degradation exceeding any single-condition training |
The critical insight: sensor health monitoring answers "are my sensors working?" while perception monitoring answers "are my ML models producing trustworthy outputs?" Both are necessary. Neither is sufficient alone.
1.2 The Monitoring Gap
Current safety stack of the reference airside AV platform, as covered by existing documents:
Hardware Health Runtime Verification Safety Architecture
┌─────────────┐ ┌──────────────────┐ ┌─────────────┐
│ sensor-degr. │ │ runtime-verif. │ │ simplex │
│ health-mon. │ │ monitoring │ │ architecture│
├─────────────┤ ├──────────────────┤ ├─────────────┤
│ Point count │ │ STL temporal │ │ Neural AC │
│ Max range │ │ logic monitors │ │ Classical BC│
│ Intensity │ │ OOD detection │ │ Arbitrator │
│ Angular cov. │ │ Shield synthesis │ │ │
│ Beam uniform.│ │ Geofence │ │ │
└──────┬───────┘ └────────┬─────────┘ └──────┬──────┘
│ │ │
│ ┌───────────────────┤ │
│ │ GAP: No ML │ │
│ │ perception │ │
│ │ quality monitor │ │
│ └───────────────────┘ │
│ │
"Are sensors "Is the system "Which stack
physically violating safety controls the
   working?"          specifications?"       vehicle?"

This document fills the gap: online perception quality monitoring that detects when ML models are producing unreliable outputs even though hardware is fine and no explicit safety specification has been violated yet.
1.3 Distinction from Related Systems
| System | What It Monitors | Detection Method | This Document |
|---|---|---|---|
| Sensor health (sensor-degradation-health-monitoring.md) | Physical sensor function | Point count, range, intensity statistics | Assumes sensors are healthy; monitors ML output quality |
| Runtime verification (runtime-verification-monitoring.md) | Safety specification compliance | STL temporal logic over system state | Catches violations after they occur; this catches degradation before violations |
| Weather-adaptive ODD (weather-adaptive-odd-management.md) | Environmental conditions | METAR, sensor feeds, weather classification | Monitors environment; this monitors perception response to environment |
| Uncertainty quantification (uncertainty-quantification-calibration.md) | Per-prediction confidence | MC-Dropout, evidential, conformal | Provides per-detection uncertainty; this aggregates into system-level health |
| This document | ML perception quality | Input/output statistics, cross-modal consistency, OOD scores, calibration drift | Continuous trustworthiness assessment of the perception pipeline as a whole |
1.4 Threat Model for Perception Degradation
We categorize perception degradation by onset speed and scope:
Onset Speed
Sudden Gradual
(<1 second) (hours/days/weeks)
┌──────────────────┬──────────────────────┐
Local │ Jet exhaust │ Seasonal lighting │
(one │ De-icing spray │ Tarmac re-surfacing │
sensor │ Puddle reflection │ Sensor aging │
or │ Bird strike │ Calibration drift │
zone) │ Sun glare │ Vegetation growth │
├──────────────────┼──────────────────────┤
Global │ Fog onset │ Distribution drift │
(all │ Heavy rain burst │ Model staleness │
percep.│ Power brownout │ New fleet GSE types │
) │ GPU thermal throt. │ Airport layout change │
       └──────────────────┴──────────────────────┘

Each quadrant requires different monitoring strategies:
- Sudden + Local: Per-frame cross-modal checks, spatial anomaly detection
- Sudden + Global: Input distribution monitoring, ODD state machine
- Gradual + Local: Calibration drift detection, per-sensor trend analysis
- Gradual + Global: Temporal anomaly detection (CUSUM/EWMA), fleet comparison
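These pairings can be captured as a small dispatch table. The enum values and monitor labels below are illustrative shorthand for the strategies listed above, not a production monitor registry:

```python
from enum import Enum

class Onset(Enum):
    SUDDEN = "sudden"
    GRADUAL = "gradual"

class Scope(Enum):
    LOCAL = "local"
    GLOBAL = "global"

# Illustrative quadrant -> monitoring-strategy mapping (mirrors Section 1.4).
MONITOR_DISPATCH = {
    (Onset.SUDDEN, Scope.LOCAL):   ["cross_modal_per_frame", "spatial_anomaly"],
    (Onset.SUDDEN, Scope.GLOBAL):  ["input_distribution", "odd_state_machine"],
    (Onset.GRADUAL, Scope.LOCAL):  ["calibration_drift", "per_sensor_trend"],
    (Onset.GRADUAL, Scope.GLOBAL): ["cusum_ewma_temporal", "fleet_comparison"],
}

def monitors_for(onset: Onset, scope: Scope) -> list:
    """Return the monitor families responsible for a degradation quadrant."""
    return MONITOR_DISPATCH[(onset, scope)]
```

The point of the table form is that every quadrant has at least one owner; a degradation mode with no assigned monitor family would be a coverage gap.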
2. Architecture Overview
2.1 Monitoring Pipeline
Perception Pipeline (existing)
┌──────────────────────────────────────────────────────────────────┐
│ LiDAR Points ──► PointPillars ──► CenterPoint ──► Detections │
│ │ │ │ │
│ │ Backbone Features │ │
│ │ (BEV grid) │ │
│ │ │ │ │
└───────┼─────────────────┼───────────────────────────────┼────────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────────┐
│ PERCEPTION MONITORING LAYER │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────────┐ │
│ │ INPUT │ │ FEATURE │ │ OUTPUT │ │
│ │ MONITORS │ │ MONITORS │ │ MONITORS │ │
│ ├─────────────┤ ├──────────────┤ ├───────────────────────┤ │
│ │ Point density│ │ BEV stats │ │ Detection count CUSUM │ │
│ │ Intensity │ │ Mahalanobis │ │ Bbox size EWMA │ │
│ │ Spatial cov. │ │ Energy score │ │ Class distribution │ │
│ │ Ground plane │ │ Feature norm │ │ Confidence calibration│ │
│ └──────┬──────┘ └──────┬───────┘ │ Track consistency │ │
│ │ │ └───────────┬───────────┘ │
│ │ │ │ │
│ ┌──────┴──────────────┴────────────────────┴───────────────┐ │
│ │ CROSS-MODAL CONSISTENCY │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ LiDAR vs Radar detection agreement │ │
│ │ Multi-LiDAR overlap consistency │ │
│ │ GTSAM innovation sequence monitoring │ │
│ │ Tracking prediction vs observation residuals │ │
│ └──────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴──────────────────────────────┐ │
│ │ TEMPORAL ANOMALY DETECTION │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ CUSUM on per-minute statistics │ │
│ │ EWMA on per-hour trends │ │
│ │ Calibration drift (multi-LiDAR mutual information) │ │
│ └──────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴──────────────────────────────┐ │
│ │ PERCEPTION HEALTH SCORE (PHS) │ │
│ │ Bayesian fusion of all monitor outputs │ │
│ │ PHS ∈ [0, 1] │ │
│ └──────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴──────────────────────────────┐ │
│ │ ODD BOUNDARY ENFORCEMENT │ │
│ │ State machine: NORMAL → DEGRADED → ... │ │
│ └──────────────────────────┬────────────────────────────────┘ │
└─────────────────────────────┼────────────────────────────────────┘
│
▼
Response Actions
              (speed, margins, mode, teleop, stop)

2.2 Compute Budget Allocation
Total budget: <5ms per perception cycle (10 Hz = 100ms cycle, 5% allocated to monitoring).
| Monitor Category | Budget | Runs On | Frequency |
|---|---|---|---|
| Input distribution monitors | 0.5 ms | GPU (piggyback on perception) | Every frame |
| Feature-space OOD | 0.8 ms | GPU (uses existing backbone features) | Every frame |
| Output consistency (CUSUM/EWMA) | 0.3 ms | CPU | Every frame |
| Cross-modal consistency | 0.8 ms | CPU | Every frame |
| Calibration drift | 0.2 ms | CPU | Every 10th frame (1 Hz) |
| Temporal anomaly (long-term) | 0.1 ms | CPU | Every 60th frame (~0.17 Hz) |
| PHS aggregation + ODD state machine | 0.3 ms | CPU | Every frame |
| Total | 3.0 ms typical, 4.2 ms worst-case | Mixed | -- |
This leaves roughly 2 ms of headroom in the typical case (0.8 ms in the worst case) within the 5 ms budget for future monitors.
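The frequency column of the budget table can be sketched as a frame-index dispatch at the 10 Hz base rate. The group names are shorthand for the table rows, not production identifiers:

```python
def monitors_due(frame_idx: int) -> list:
    """Monitor groups that run on a given frame (10 Hz base rate).

    Mirrors the frequency column of the budget table: most monitors run
    every frame; calibration drift runs at 1 Hz; long-term temporal
    anomaly detection runs roughly every 6 seconds.
    """
    due = ["input_distribution", "feature_ood", "output_consistency",
           "cross_modal", "phs_and_odd"]      # every frame
    if frame_idx % 10 == 0:                    # every 10th frame (1 Hz)
        due.append("calibration_drift")
    if frame_idx % 60 == 0:                    # every 60th frame (~0.17 Hz)
        due.append("temporal_anomaly")
    return due
```

Staggering the slow monitors this way keeps the worst-case frame (where all groups coincide) at the 4.2 ms figure rather than paying every cost on every cycle.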
2.3 ROS Node Architecture
┌──────────────────────────┐
│ /perception_monitor │
│ (C++ nodelet, 10 Hz) │
├──────────────────────────┤
Subscribes: │ │ Publishes:
/rslidar/points_* ────────────►│ Input monitors │───► /monitor/input_health
/perception/backbone_features ─►│ Feature OOD │───► /monitor/ood_score
/perception/detections ────────►│ Output consistency │───► /monitor/output_health
/radar/detections ─────────────►│ Cross-modal checks │───► /monitor/cross_modal
/gtsam/innovation ─────────────►│ Calibration drift │───► /monitor/calibration
/odom/fused ───────────────────►│ Temporal analysis │───► /monitor/temporal_health
│ │
│ PHS aggregation ────────│───► /monitor/perception_health
│ ODD state machine ──────│───► /monitor/odd_state
│ │───► /monitor/response_action
└──────────────────────────┘
│
▼
┌──────────────────────────┐
│ /safety_monitor/ │
│ (Existing arbitrator │
│ from Simplex arch.) │
└──────────────────────────┘

The perception monitor runs as a C++ nodelet in the same process as the perception pipeline to access backbone features at zero serialization cost. The nodelet publishes diagnostic topics at 10 Hz and a latched /monitor/odd_state topic for the Simplex arbitrator.
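The PHS aggregation stage that feeds /monitor/perception_health can be sketched as a weighted fusion of per-monitor anomaly scores. This is a simplified linear stand-in for the Bayesian fusion named in the Section 2.1 diagram, and the weights below are illustrative placeholders, not calibrated values:

```python
# Illustrative PHS fusion: each monitor reports an anomaly score in [0, 1]
# (0 = nominal, 1 = severely anomalous). PHS = 1 - weighted anomaly.
# Weights are placeholders, NOT calibrated production values.
PHS_WEIGHTS = {
    "input_health":    0.20,
    "ood_score":       0.25,
    "output_health":   0.15,
    "cross_modal":     0.25,
    "calibration":     0.10,
    "temporal_health": 0.05,
}

def perception_health_score(anomaly: dict) -> float:
    """Fuse per-monitor anomaly scores into a PHS in [0, 1] (1 = healthy)."""
    weighted = sum(PHS_WEIGHTS[k] * min(1.0, max(0.0, anomaly.get(k, 0.0)))
                   for k in PHS_WEIGHTS)
    return 1.0 - weighted
```

In the real nodelet the fused score drives the ODD state machine thresholds rather than being consumed directly by downstream nodes.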
3. Input Distribution Monitoring
3.1 Rationale
The simplest and most robust way to detect perception degradation is to check whether the model's input looks like what it was trained on. If the input distribution has shifted beyond what the model has seen, the outputs are unreliable regardless of confidence scores. Input monitoring catches problems that output monitoring cannot: a model may produce confident but wrong predictions on OOD inputs.
3.2 Point Cloud Statistical Monitors
For each incoming LiDAR frame (aggregated across 4-8 RoboSense sensors), we compute lightweight statistics and compare against reference distributions established during commissioning/validation.
Monitor 1: Point Density Histogram
Bin the aggregated point cloud into a 2D BEV grid and compute the density histogram. Compare against the reference via KL divergence.
// point_density_monitor.hpp
#pragma once
#include <array>
#include <cmath>
#include <vector>
struct PointDensityMonitor {
// BEV grid: 200m x 200m at 2m resolution = 100x100 = 10,000 bins
static constexpr int GRID_SIZE = 100;
static constexpr float CELL_SIZE = 2.0f; // meters
static constexpr float GRID_ORIGIN = -100.0f; // meters from ego
static constexpr int NUM_BINS = GRID_SIZE * GRID_SIZE;
// Reference distribution (set during commissioning)
std::array<float, NUM_BINS> ref_density{};
float ref_total_points = 0.0f;
// Current frame statistics
std::array<int, NUM_BINS> current_counts{};
int current_total = 0;
// Smoothed KL divergence
float kl_ewma = 0.0f;
static constexpr float EWMA_ALPHA = 0.05f; // ~20 frame smoothing
// Thresholds (calibrated during commissioning)
float kl_warn_threshold = 0.15f; // DEGRADED
float kl_alert_threshold = 0.35f; // RESTRICTED
float kl_critical_threshold = 0.60f; // SUSPENDED
void set_reference(const std::array<float, NUM_BINS>& ref, float total) {
ref_density = ref;
ref_total_points = total;
}
void reset_frame() {
current_counts.fill(0);
current_total = 0;
}
void add_point(float x, float y) {
int gx = static_cast<int>((x - GRID_ORIGIN) / CELL_SIZE);
int gy = static_cast<int>((y - GRID_ORIGIN) / CELL_SIZE);
if (gx >= 0 && gx < GRID_SIZE && gy >= 0 && gy < GRID_SIZE) {
current_counts[gy * GRID_SIZE + gx]++;
current_total++;
}
}
// Compute symmetrized KL divergence between current and reference
float compute_kl_divergence() {
if (current_total == 0 || ref_total_points == 0.0f) return 1.0f;
float kl = 0.0f;
float inv_current = 1.0f / static_cast<float>(current_total);
float inv_ref = 1.0f / ref_total_points;
for (int i = 0; i < NUM_BINS; i++) {
// Add Laplace smoothing to avoid log(0)
float p = static_cast<float>(current_counts[i]) * inv_current + 1e-7f;
float q = ref_density[i] * inv_ref + 1e-7f;
// Symmetrized KL: 0.5 * (KL(P||Q) + KL(Q||P))
kl += 0.5f * (p * std::log(p / q) + q * std::log(q / p));
}
// Update EWMA
kl_ewma = EWMA_ALPHA * kl + (1.0f - EWMA_ALPHA) * kl_ewma;
return kl_ewma;
}
enum class Status { NORMAL, DEGRADED, RESTRICTED, SUSPENDED };
Status get_status() const {
if (kl_ewma >= kl_critical_threshold) return Status::SUSPENDED;
if (kl_ewma >= kl_alert_threshold) return Status::RESTRICTED;
if (kl_ewma >= kl_warn_threshold) return Status::DEGRADED;
return Status::NORMAL;
}
};

Computational cost: The BEV binning piggybacks on existing PointPillars voxelization (which already iterates over all points). The KL divergence over 10,000 bins costs ~0.05ms on a single ARM core.
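For offline analysis of logged density histograms, the same symmetrized, Laplace-smoothed KL computation can be expressed in a few lines of Python. This is a sketch mirroring the C++ above, not the onboard implementation:

```python
import math

def symmetrized_kl(current_counts, ref_counts, eps=1e-7):
    """0.5 * (KL(P||Q) + KL(Q||P)) between two histograms.

    Both inputs are raw per-bin counts; each is normalized internally and
    Laplace-smoothed with eps to avoid log(0), as in the C++ monitor.
    """
    total_p = sum(current_counts)
    total_q = sum(ref_counts)
    if total_p == 0 or total_q == 0:
        return 1.0  # degenerate frame: treat as maximally anomalous
    kl = 0.0
    for c, r in zip(current_counts, ref_counts):
        p = c / total_p + eps
        q = r / total_q + eps
        kl += 0.5 * (p * math.log(p / q) + q * math.log(q / p))
    return kl
```

Identical histograms score exactly zero, and fully disjoint histograms score well above the 0.60 critical threshold, which is the behavior the EWMA-smoothed onboard version relies on.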
Monitor 2: Intensity Distribution
LiDAR intensity changes with surface wetness, contamination, and atmospheric conditions. Track the intensity histogram and detect shifts.
struct IntensityDistMonitor {
// 256-bin intensity histogram (RoboSense reports 0-255)
static constexpr int NUM_BINS = 256;
std::array<int, NUM_BINS> current_hist{};
std::array<float, NUM_BINS> ref_hist{};
int frame_count = 0;
  // Distribution-shift metric: Wasserstein-1 (earth mover's) distance,
  // computed from histogram CDFs in linear time -- a cheaper, more robust
  // alternative to KL or MMD for multimodal intensity distributions.
  // Note: ref_hist must be pre-normalized to sum to 1.
  float compute_wasserstein1() {
    float cdf_current = 0.0f, cdf_ref = 0.0f;
float wasserstein = 0.0f;
float total_current = 0.0f;
for (int i = 0; i < NUM_BINS; i++) {
total_current += current_hist[i];
}
if (total_current < 100) return 1.0f; // Too few points
float inv_total = 1.0f / total_current;
for (int i = 0; i < NUM_BINS; i++) {
cdf_current += current_hist[i] * inv_total;
cdf_ref += ref_hist[i];
wasserstein += std::abs(cdf_current - cdf_ref);
}
return wasserstein / NUM_BINS;
}
};

Monitor 3: Spatial Coverage Map
Track which angular sectors have sufficient point density. Detects partial blockage, sensor misalignment, or environmental blind spots (e.g., rain reducing range in specific directions).
struct SpatialCoverageMonitor {
// 36 azimuth sectors x 8 range rings
static constexpr int NUM_AZIMUTH = 36; // 10 degrees each
static constexpr int NUM_RANGE = 8; // 0-10, 10-20, ..., 60-70, 70+ meters
static constexpr int NUM_CELLS = NUM_AZIMUTH * NUM_RANGE;
std::array<int, NUM_CELLS> current_counts{};
std::array<float, NUM_CELLS> ref_fill_ratio{}; // Expected fill per cell
float coverage_score() {
// Fraction of cells that have at least 50% of expected points
int sufficient = 0;
for (int i = 0; i < NUM_CELLS; i++) {
float expected = ref_fill_ratio[i] * 100.0f; // Normalized
if (expected < 5.0f) continue; // Skip cells with few expected points
if (current_counts[i] >= 0.5f * expected) sufficient++;
}
int active_cells = 0;
for (int i = 0; i < NUM_CELLS; i++) {
if (ref_fill_ratio[i] * 100.0f >= 5.0f) active_cells++;
}
return (active_cells > 0) ?
static_cast<float>(sufficient) / active_cells : 0.0f;
}
};

3.3 BEV Feature Statistics
After the PointPillars backbone produces BEV features (typically a (C, H, W) tensor, e.g., 64 channels x 200 x 200), we compute lightweight statistics from the feature map. This is near-zero cost because the features are already in GPU memory.
# bev_feature_monitor.py -- runs as CUDA kernel alongside perception
import torch
class BEVFeatureMonitor:
"""Monitor backbone BEV feature statistics for distribution shift."""
def __init__(self, num_channels=64, ewma_alpha=0.02):
self.alpha = ewma_alpha
self.C = num_channels
# Reference statistics (set during commissioning)
self.ref_mean = None # (C,)
self.ref_var = None # (C,)
self.ref_skew = None # (C,)
self.ref_kurt = None # (C,)
self.ref_l2_norm = None # scalar
# Running EWMA of current statistics
self.ewma_mean = None
self.ewma_var = None
def set_reference(self, mean, var, skew, kurt, l2_norm):
self.ref_mean = mean
self.ref_var = var
self.ref_skew = skew
self.ref_kurt = kurt
self.ref_l2_norm = l2_norm
self.ewma_mean = mean.clone()
self.ewma_var = var.clone()
@torch.no_grad()
def update(self, bev_features: torch.Tensor) -> dict:
"""Compute feature statistics from BEV tensor.
Args:
bev_features: (C, H, W) BEV feature map from backbone
Returns:
dict with anomaly scores
"""
C, H, W = bev_features.shape
flat = bev_features.view(C, -1) # (C, H*W)
# Per-channel statistics -- all computed in single pass
current_mean = flat.mean(dim=1) # (C,)
current_var = flat.var(dim=1) # (C,)
current_l2 = flat.norm(dim=1).mean() # scalar
# Standardized Mahalanobis-like distance from reference
if self.ref_var is not None:
mean_shift = ((current_mean - self.ref_mean) ** 2 /
(self.ref_var + 1e-8)).mean().item()
var_ratio = (current_var / (self.ref_var + 1e-8))
var_shift = ((var_ratio - 1.0) ** 2).mean().item()
l2_shift = abs(current_l2.item() - self.ref_l2_norm.item()) / \
(self.ref_l2_norm.item() + 1e-8)
else:
mean_shift = 0.0
var_shift = 0.0
l2_shift = 0.0
# Update EWMA
if self.ewma_mean is not None:
self.ewma_mean = self.alpha * current_mean + \
(1 - self.alpha) * self.ewma_mean
self.ewma_var = self.alpha * current_var + \
(1 - self.alpha) * self.ewma_var
# Composite feature anomaly score (0 = normal, 1 = severely anomalous)
feature_anomaly = min(1.0,
0.4 * sigmoid_scale(mean_shift, center=2.0, steepness=1.5) +
0.3 * sigmoid_scale(var_shift, center=1.5, steepness=2.0) +
0.3 * sigmoid_scale(l2_shift, center=0.5, steepness=3.0)
)
return {
'mean_shift': mean_shift,
'var_shift': var_shift,
'l2_shift': l2_shift,
'feature_anomaly_score': feature_anomaly,
}
import math

def sigmoid_scale(x, center=1.0, steepness=2.0):
    """Map [0, inf) to (0, 1) with configurable center and steepness."""
    if isinstance(x, float):
        return 1.0 / (1.0 + math.exp(-steepness * (x - center)))
    return 1.0 / (1.0 + torch.exp(-steepness * (x - center)))

Cost: ~0.2ms on Orin GPU (fused kernel), executed in the same CUDA stream as the backbone immediately after feature extraction. Zero additional data transfer.
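The sigmoid_scale helper acts as a soft threshold: an input equal to center maps to 0.5, and steepness controls how quickly the score saturates toward 0 or 1. A float-only restatement for quick offline sanity checks (equivalent math, hypothetical standalone form):

```python
import math

def sigmoid_scale(x: float, center: float = 1.0, steepness: float = 2.0) -> float:
    """Soft threshold: x = center maps to 0.5; saturates toward 0 and 1.

    Used to convert unbounded shift statistics (mean shift, variance
    ratio shift, L2 shift) into comparable [0, 1] anomaly contributions.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (x - center)))
```

The per-statistic centers in the composite score (2.0 for mean shift, 1.5 for variance shift, 0.5 for L2 shift) are the points at which each statistic contributes half of its maximum weight.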
3.4 Reference Distribution Establishment
Reference distributions are established during commissioning at each airport:
- Collection phase (first 2 weeks of supervised operation): Record input statistics under nominal conditions across all times of day, representative weather conditions
- Stratification: Separate references for day/night, dry/wet, summer/winter
- Percentile-based thresholds: Set warning at 95th percentile of observed KL/MMD during commissioning, alert at 99th, critical at 99.9th
- Fleet sharing: Upload reference distributions to cloud; new airports bootstrap from most-similar existing airport, then refine locally
# reference_calibration.py -- run offline after commissioning
import numpy as np
from scipy.stats import percentileofscore
def calibrate_thresholds(kl_values: np.ndarray,
conditions: np.ndarray) -> dict:
"""Calibrate monitor thresholds from commissioning data.
Args:
kl_values: (N,) KL divergence values from N commissioning frames
conditions: (N,) condition labels ('day_dry', 'night_wet', etc.)
Returns:
Per-condition threshold dictionary
"""
thresholds = {}
for condition in np.unique(conditions):
mask = conditions == condition
values = kl_values[mask]
thresholds[condition] = {
'warn': float(np.percentile(values, 95)),
'alert': float(np.percentile(values, 99)),
'critical': float(np.percentile(values, 99.9)),
'n_samples': int(mask.sum()),
}
    return thresholds

4. Output Consistency Checking
4.1 Rationale
Even without ground truth, perception outputs exhibit statistical regularity. A model operating correctly on an airport apron produces a stable number of detections per frame, consistent bounding box sizes for known object types, stable class distributions, and smooth tracking trajectories. Deviations from these patterns -- sudden drops in detection count, anomalous bounding box sizes, class distribution shifts -- indicate potential model degradation.
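To see why the sequential detectors in the following subsections outperform naive per-frame thresholding, consider a persistent one-sigma drop in detection count: no single frame crosses a 3-sigma limit, yet a lower CUSUM with the conventional k = 0.5σ, h = 4σ settings (the same convention used in Section 4.2) alarms within a second at 10 Hz. A minimal Python sketch:

```python
def cusum_lower(observations, mean, sigma, k_factor=0.5, h_factor=4.0):
    """Index of the first lower-CUSUM alarm over a sequence, or None.

    Detects persistent downward shifts in the mean (e.g., missed objects)
    that per-frame thresholding never flags.
    """
    k = k_factor * sigma   # allowance (slack)
    h = h_factor * sigma   # decision threshold
    s_low = 0.0
    for i, x in enumerate(observations):
        s_low = max(0.0, s_low - (x - mean) - k)
        if s_low > h:
            return i
    return None
```

With mean 35 and sigma 5, a steady stream of 30-detection frames (each individually only one sigma low, well inside a 3-sigma band) triggers the alarm on the ninth frame, i.e., within ~0.9 s at 10 Hz.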
4.2 CUSUM Detector for Detection Count
The Cumulative Sum (CUSUM) algorithm detects small persistent shifts in a signal's mean. It is more sensitive than simple thresholding and has well-characterized statistical properties.
// cusum_monitor.hpp
#pragma once
#include <algorithm>
#include <cmath>
class CUSUMMonitor {
public:
// Parameters
float target_mean; // Expected value (from commissioning)
float sigma; // Expected standard deviation
float k; // Allowance parameter (slack, typically 0.5*sigma)
float h; // Decision threshold (typically 4-5*sigma)
// State
float s_high = 0.0f; // Upper CUSUM (detects increase)
float s_low = 0.0f; // Lower CUSUM (detects decrease)
bool alarm_high = false;
bool alarm_low = false;
int frames_since_alarm = 0;
CUSUMMonitor(float mean, float std, float allowance_factor = 0.5f,
float threshold_factor = 4.0f)
: target_mean(mean), sigma(std),
k(allowance_factor * std),
h(threshold_factor * std) {}
struct Result {
bool alarm;
bool direction_high; // true = increase, false = decrease
float cusum_high;
float cusum_low;
float deviation; // How far from target (in sigma units)
};
Result update(float observation) {
float deviation = (observation - target_mean) / sigma;
// Upper CUSUM: detects shift upward (e.g., spike in detections = ghosts)
s_high = std::max(0.0f, s_high + (observation - target_mean) - k);
// Lower CUSUM: detects shift downward (e.g., drop = missed objects)
s_low = std::max(0.0f, s_low - (observation - target_mean) - k);
alarm_high = (s_high > h);
alarm_low = (s_low > h);
if (alarm_high || alarm_low) {
frames_since_alarm = 0;
} else {
frames_since_alarm++;
}
return {
alarm_high || alarm_low,
alarm_high,
s_high,
s_low,
deviation
};
}
void reset() {
s_high = 0.0f;
s_low = 0.0f;
alarm_high = false;
alarm_low = false;
}
};

Application to detection count:
// detection_count_monitor.cpp
struct DetectionCountMonitor {
// Separate CUSUM monitors per class
CUSUMMonitor total_count{35.0f, 8.0f}; // ~35 objects typical on busy apron
CUSUMMonitor aircraft_count{2.0f, 1.0f}; // 1-3 aircraft at nearby stands
CUSUMMonitor personnel_count{8.0f, 4.0f}; // Highly variable
CUSUMMonitor gse_count{12.0f, 5.0f}; // Ground support equipment
CUSUMMonitor unknown_count{1.0f, 1.5f}; // Should be rare
struct FrameResult {
bool any_alarm;
bool detection_drop; // Most dangerous: missing objects
bool detection_spike; // Possible ghosts/false positives
bool unknown_spike; // Novel objects appearing
float worst_deviation;
};
FrameResult update(int total, int aircraft, int personnel,
int gse, int unknown) {
auto r_total = total_count.update(static_cast<float>(total));
auto r_aircraft = aircraft_count.update(static_cast<float>(aircraft));
auto r_personnel = personnel_count.update(static_cast<float>(personnel));
auto r_gse = gse_count.update(static_cast<float>(gse));
auto r_unknown = unknown_count.update(static_cast<float>(unknown));
return {
r_total.alarm || r_aircraft.alarm || r_personnel.alarm,
r_total.alarm && !r_total.direction_high, // Drop in total
r_total.alarm && r_total.direction_high, // Spike in total
r_unknown.alarm && r_unknown.direction_high, // Unknown objects
std::max({std::abs(r_total.deviation),
std::abs(r_aircraft.deviation),
std::abs(r_personnel.deviation)})
};
}
};

4.3 EWMA for Bounding Box Size Distribution
Exponentially Weighted Moving Average (EWMA) tracks gradual shifts in bounding box dimensions. A shrinking average box size may indicate the model is "losing" distant detections. Growing boxes may indicate false merging of nearby objects.
// ewma_monitor.hpp
#pragma once
#include <cmath>
class EWMAMonitor {
public:
float lambda; // Smoothing factor (0.05-0.3)
float target; // Target value
float sigma; // Process standard deviation
float L; // Control limit factor (typically 2.5-3.0)
float ewma_value;
float control_limit;
bool alarm = false;
EWMAMonitor(float target, float sigma, float lambda = 0.1f,
float L = 3.0f)
: lambda(lambda), target(target), sigma(sigma), L(L),
ewma_value(target) {
// EWMA control limits (asymptotic)
// UCL/LCL = target +/- L * sigma * sqrt(lambda / (2 - lambda))
control_limit = L * sigma * std::sqrt(lambda / (2.0f - lambda));
}
bool update(float observation) {
ewma_value = lambda * observation + (1.0f - lambda) * ewma_value;
alarm = std::abs(ewma_value - target) > control_limit;
return alarm;
}
float deviation_ratio() const {
return std::abs(ewma_value - target) / control_limit;
}
};
// Apply to bounding box monitoring
struct BBoxSizeMonitor {
// Track mean bounding box dimensions per class
// Aircraft: L~40m, W~35m, H~12m (large variance by type)
// GSE: L~4m, W~2m, H~2.5m
// Personnel: L~0.6m, W~0.6m, H~1.7m
EWMAMonitor aircraft_length{40.0f, 15.0f, 0.05f};
EWMAMonitor gse_length{4.0f, 1.5f, 0.1f};
EWMAMonitor personnel_height{1.7f, 0.2f, 0.1f};
// Also track confidence score distribution
EWMAMonitor mean_confidence{0.82f, 0.08f, 0.05f};
bool update(const std::vector<Detection>& detections) {
// Compute per-class mean dimensions from current frame
// ... (aggregate by class, compute means, feed to EWMA)
// Returns true if any EWMA is in alarm state
return aircraft_length.alarm || gse_length.alarm ||
personnel_height.alarm || mean_confidence.alarm;
}
};

4.4 Class Distribution Anomaly Detection
Track the empirical class distribution across a sliding window and compare it against the expected distribution using a chi-squared goodness-of-fit test.
struct ClassDistributionMonitor {
static constexpr int NUM_CLASSES = 10; // Airside class taxonomy
// 0: Aircraft, 1: Baggage cart, 2: Belt loader, 3: Catering truck,
// 4: Fuel truck, 5: Pushback tug, 6: Personnel, 7: Cone/barrier,
// 8: FOD, 9: Unknown
static constexpr int WINDOW_SIZE = 100; // 10 seconds at 10 Hz
std::array<std::array<int, NUM_CLASSES>, WINDOW_SIZE> history{};
int write_idx = 0;
bool window_full = false;
// Expected class proportions (from commissioning data)
std::array<float, NUM_CLASSES> expected_proportions = {
0.05f, // Aircraft (few but large)
0.15f, // Baggage cart
0.08f, // Belt loader
0.05f, // Catering truck
0.03f, // Fuel truck
0.06f, // Pushback tug
0.35f, // Personnel
0.15f, // Cone/barrier
0.03f, // FOD
0.05f // Unknown
};
float chi_squared_threshold = 25.0f; // ~p<0.005 for 9 dof
float compute_chi_squared() {
if (!window_full) return 0.0f;
// Sum class counts over window
std::array<float, NUM_CLASSES> observed{};
float total = 0.0f;
for (int i = 0; i < WINDOW_SIZE; i++) {
for (int c = 0; c < NUM_CLASSES; c++) {
observed[c] += history[i][c];
total += history[i][c];
}
}
if (total < 50.0f) return 0.0f; // Insufficient data
float chi2 = 0.0f;
for (int c = 0; c < NUM_CLASSES; c++) {
float expected = expected_proportions[c] * total;
if (expected < 1.0f) continue; // Avoid division by near-zero
float diff = observed[c] - expected;
chi2 += (diff * diff) / expected;
}
return chi2;
}
};

4.5 Track Consistency Monitoring
Tracking quality degrades before detection quality -- ID switches, track fragmentation, and ghost tracks are early indicators of perception problems.
struct TrackConsistencyMonitor {
// Track lifecycle statistics over sliding window
int window_frames = 300; // 30 seconds
// Per-window counters
int id_switches = 0;
int track_births = 0;
int track_deaths = 0;
int ghost_tracks = 0; // Tracks <3 frames then lost
int total_tracks = 0;
// CUSUM monitors on rates
CUSUMMonitor id_switch_rate{0.5f, 0.3f}; // Per second
CUSUMMonitor fragmentation_rate{0.2f, 0.15f}; // Ghost tracks / total
struct TrackHealth {
float id_switch_score; // 0 = normal, 1 = severe
float fragmentation_score; // 0 = normal, 1 = severe
float composite;
};
TrackHealth evaluate() {
float seconds = window_frames / 10.0f;
float switch_rate = id_switches / seconds;
float frag_rate = (total_tracks > 0) ?
static_cast<float>(ghost_tracks) / total_tracks : 0.0f;
auto r_switch = id_switch_rate.update(switch_rate);
auto r_frag = fragmentation_rate.update(frag_rate);
float switch_score = std::min(1.0f,
std::abs(r_switch.deviation) / 3.0f);
float frag_score = std::min(1.0f,
std::abs(r_frag.deviation) / 3.0f);
return {
switch_score,
frag_score,
0.6f * switch_score + 0.4f * frag_score
};
}
};

4.6 Occupancy Grid Fill Ratio
For occupancy-based perception (see 30-autonomy-stack/localization-mapping/maps/realtime-occupancy-grid-mapping.md), the fill ratio -- fraction of occupied voxels -- should be stable for a given scene type. Sudden changes indicate perception anomalies.
struct OccupancyFillMonitor {
EWMAMonitor near_fill{0.12f, 0.04f, 0.1f}; // 0-20m: ~12% occupied
EWMAMonitor mid_fill{0.06f, 0.03f, 0.1f}; // 20-50m: ~6% occupied
EWMAMonitor far_fill{0.02f, 0.02f, 0.1f}; // 50-100m: ~2% occupied
bool update(float near, float mid, float far) {
bool a = near_fill.update(near);
bool b = mid_fill.update(mid);
bool c = far_fill.update(far);
return a || b || c;
}
};

5. Cross-Modal Consistency
5.1 Rationale
When two independent perception channels disagree, at least one is wrong. Cross-modal consistency checking uses agreement between sensors/models as a proxy for correctness without requiring ground truth. This is the most powerful perception monitoring technique for multi-sensor vehicles like the reference airside fleet.
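The agreement computation can be sketched compactly. This mirrors the greedy nearest-neighbor association of the LidarRadarConsistency class below, stripped to 2D positions and the agreement ratio (no confidence gating):

```python
import math

def agreement_ratio(lidar_xy, radar_xy, max_dist=3.0):
    """matched / (lidar_only + radar_only + matched) via greedy NN pairing.

    1.0 = every detection is corroborated by the other modality;
    0.0 = the two modalities report entirely disjoint scenes.
    """
    radar_used = [False] * len(radar_xy)
    matched = 0
    for lx, ly in lidar_xy:
        best, best_j = max_dist, -1
        for j, (rx, ry) in enumerate(radar_xy):
            if radar_used[j]:
                continue
            d = math.hypot(lx - rx, ly - ry)
            if d < best:
                best, best_j = d, j
        if best_j >= 0:
            radar_used[best_j] = True
            matched += 1
    # lidar_only + radar_only + matched == len(lidar) + len(radar) - matched
    total = len(lidar_xy) + len(radar_xy) - matched
    return matched / total if total > 0 else 1.0
```

A falling agreement ratio does not identify which modality is wrong, only that at least one is; attributing the fault is the job of the per-modality input monitors in Section 3.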
5.2 LiDAR vs Radar Detection Agreement
4D imaging radar (Continental ARS548, see 20-av-platform/sensors/4d-radar.md) provides independent object detections. Compare against LiDAR detections.
// cross_modal_consistency.hpp
#pragma once
#include <vector>
#include <cmath>
struct Detection3D {
float x, y, z;
float length, width, height;
float confidence;
int class_id;
int sensor_source; // 0=LiDAR, 1=Radar, 2=Camera, 3=Thermal
};
class LidarRadarConsistency {
public:
// Association parameters
float max_association_dist = 3.0f; // meters (generous for radar accuracy)
float min_confidence = 0.3f; // Ignore low-confidence detections
struct ConsistencyResult {
int lidar_only; // Detections seen by LiDAR but not radar
int radar_only; // Detections seen by radar but not LiDAR
int both; // Matched detections
float position_rmse; // RMSE of matched detection positions
float agreement_ratio; // both / (lidar_only + radar_only + both)
float score; // 0 = complete disagreement, 1 = perfect agreement
};
ConsistencyResult compute(
const std::vector<Detection3D>& lidar_dets,
const std::vector<Detection3D>& radar_dets
) {
// Greedy nearest-neighbor association (sufficient at 10 Hz)
std::vector<bool> radar_matched(radar_dets.size(), false);
int matched = 0;
int lidar_count = 0;
float position_error_sum = 0.0f;
for (const auto& ld : lidar_dets) {
if (ld.confidence < min_confidence) continue;
lidar_count++;
float best_dist = max_association_dist;
int best_idx = -1;
for (size_t j = 0; j < radar_dets.size(); j++) {
if (radar_matched[j]) continue;
if (radar_dets[j].confidence < min_confidence) continue;
float dx = ld.x - radar_dets[j].x;
float dy = ld.y - radar_dets[j].y;
float dist = std::sqrt(dx * dx + dy * dy);
if (dist < best_dist) {
best_dist = dist;
best_idx = static_cast<int>(j);
}
}
if (best_idx >= 0) {
radar_matched[best_idx] = true;
matched++;
position_error_sum += best_dist * best_dist;
}
}
int radar_count = 0;
for (size_t j = 0; j < radar_dets.size(); j++) {
if (radar_dets[j].confidence >= min_confidence) radar_count++;
}
int lidar_only = lidar_count - matched;
int radar_only = radar_count - matched;
int total = lidar_only + radar_only + matched;
float agreement = (total > 0) ?
static_cast<float>(matched) / total : 1.0f;
float rmse = (matched > 0) ?
std::sqrt(position_error_sum / matched) : 0.0f;
// Score: high agreement + low position error = good
// Weight agreement more heavily (detection-level disagreement is
// more concerning than position-level disagreement)
float score = 0.7f * agreement +
0.3f * std::max(0.0f, 1.0f - rmse / max_association_dist);
return {lidar_only, radar_only, matched, rmse, agreement, score};
}
};
5.3 Multi-LiDAR Overlap Consistency
With 4-8 RoboSense LiDARs having overlapping fields of view, objects in overlap zones should be detected consistently by both sensors. Disagreement indicates calibration drift or sensor-specific degradation.
struct MultiLidarOverlapMonitor {
// For each LiDAR pair with overlapping FOV, track detection agreement
// reference airside AV stack typical: pairs (0,1), (1,2), (2,3), (3,0) for 4 front-facing
static constexpr int MAX_PAIRS = 12; // C(8,2) worst case, typically 4-6 active
struct PairResult {
int lidar_a_id;
int lidar_b_id;
float agreement_ratio;
float position_consistency; // Std dev of matched positions
};
// Per-pair EWMA monitors
std::array<EWMAMonitor, MAX_PAIRS> pair_agreement;
int num_active_pairs = 0;
float worst_pair_score() {
float worst = 1.0f;
for (int i = 0; i < num_active_pairs; i++) {
worst = std::min(worst,
1.0f - pair_agreement[i].deviation_ratio());
}
return worst;
}
};
5.4 GTSAM Innovation Sequence Monitoring
The GTSAM factor graph optimizer (see 30-autonomy-stack/localization-mapping/overview/mapping-and-localization.md) produces innovation sequences -- the difference between predicted and observed measurements. Under nominal conditions, innovations should be zero-mean Gaussian with covariance matching the measurement noise model. Departure from this indicates either localization failure or sensor/perception degradation.
struct GTSAMInnovationMonitor {
// Monitor innovation statistics from GTSAM factor graph
// Innovations should be zero-mean with known covariance
// Normalized Innovation Squared (NIS) test
// NIS = innovation^T * S^-1 * innovation, where S = innovation covariance
// Under nominal conditions, NIS ~ chi-squared(dof)
    EWMAMonitor nis_mean{3.0f, 1.5f, 0.1f}; // Mean NIS should track dof (e.g., 3 for 3D)
    CUSUMMonitor nis_cusum{3.0f, 1.5f}; // Detects sustained NIS drift away from dof
// Per-sensor type innovation tracking
struct SensorInnovation {
float mean_nis;
float max_nis;
int outlier_count; // NIS > chi2_threshold
float chi2_threshold; // 99th percentile of chi2(dof)
};
SensorInnovation lidar_vgicp{0, 0, 0, 11.34f}; // dof=3, p=0.01
SensorInnovation imu{0, 0, 0, 16.81f}; // dof=6, p=0.01
SensorInnovation gps{0, 0, 0, 11.34f}; // dof=3, p=0.01
SensorInnovation wheel_odom{0, 0, 0, 9.21f}; // dof=2, p=0.01
float localization_health() {
// Combine sensor innovation health scores
auto score = [](const SensorInnovation& si) -> float {
if (si.mean_nis < si.chi2_threshold * 0.5f) return 1.0f;
if (si.mean_nis < si.chi2_threshold) return 0.7f;
if (si.mean_nis < si.chi2_threshold * 2.0f) return 0.4f;
return 0.1f;
};
return 0.4f * score(lidar_vgicp) +
0.2f * score(imu) +
0.25f * score(gps) +
0.15f * score(wheel_odom);
}
};
5.5 Tracking Prediction vs Observation Residuals
The multi-object tracker predicts where objects will be in the next frame based on motion models. Large residuals between predictions and actual observations indicate either perception noise or tracker model mismatch.
struct TrackingResidualMonitor {
// For each active track, compute prediction-observation residual
// Large residuals = perception noise or tracker degradation
EWMAMonitor mean_residual_xy{0.15f, 0.08f, 0.1f}; // meters
EWMAMonitor mean_residual_v{0.3f, 0.2f, 0.1f}; // m/s
CUSUMMonitor large_residual_rate{0.05f, 0.03f}; // Fraction >1m
void update(const std::vector<float>& residuals_xy,
const std::vector<float>& residuals_v) {
if (residuals_xy.empty()) return;
float mean_xy = 0.0f, mean_v = 0.0f;
int large_count = 0;
for (size_t i = 0; i < residuals_xy.size(); i++) {
mean_xy += residuals_xy[i];
if (i < residuals_v.size()) mean_v += residuals_v[i];
if (residuals_xy[i] > 1.0f) large_count++;
}
mean_xy /= residuals_xy.size();
if (!residuals_v.empty()) mean_v /= residuals_v.size();
float large_rate = static_cast<float>(large_count) / residuals_xy.size();
mean_residual_xy.update(mean_xy);
mean_residual_v.update(mean_v);
large_residual_rate.update(large_rate);
}
float health_score() {
float xy_score = 1.0f - std::min(1.0f,
mean_residual_xy.deviation_ratio());
float rate_score = large_residual_rate.alarm ? 0.3f : 1.0f;
return 0.6f * xy_score + 0.4f * rate_score;
}
};
5.6 Expected Cross-Modal Agreement Rates
Calibrated during commissioning per airport:
| Comparison | Normal Agreement | Degraded Threshold | Restricted Threshold | Notes |
|---|---|---|---|---|
| LiDAR vs Radar (objects >1m) | 75-90% | <65% | <50% | Radar misses small objects normally |
| Multi-LiDAR overlap | 90-98% | <85% | <75% | High agreement expected |
| Track prediction residual (xy) | 0.10-0.20m | >0.35m | >0.60m | Depends on object speed |
| GTSAM NIS (LiDAR VGICP) | 1.5-4.5 | >8.0 | >15.0 | Chi-squared with 3 dof |
| GTSAM NIS (GPS) | 1.0-5.0 | >10.0 | >20.0 | Higher during multipath |
| Occupancy fill stability (near) | +/-2% | >+/-5% | >+/-10% | Per-frame variation |
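These commissioning thresholds can be consumed by a small runtime check. The sketch below maps a measured cross-modal statistic to an ODD level using the table's values; the `agreement_to_level` helper, its dictionary keys, and the exact constants restate the table and are illustrative rather than part of the stack:

```python
def agreement_to_level(metric: str, value: float) -> str:
    """Map a measured cross-modal statistic to an ODD level using the
    degraded/restricted thresholds from the table above (illustrative).
    Agreement-style metrics degrade downward; residual/NIS metrics
    degrade upward."""
    # metric -> (degraded, restricted, lower_is_worse)
    thresholds = {
        'lidar_radar_agreement': (0.65, 0.50, True),
        'multi_lidar_overlap':   (0.85, 0.75, True),
        'track_residual_xy_m':   (0.35, 0.60, False),
        'gtsam_nis_lidar':       (8.0, 15.0, False),
        'gtsam_nis_gps':         (10.0, 20.0, False),
    }
    degraded, restricted, lower_is_worse = thresholds[metric]
    if lower_is_worse:
        if value < restricted:
            return 'RESTRICTED'
        return 'DEGRADED' if value < degraded else 'NORMAL'
    if value > restricted:
        return 'RESTRICTED'
    return 'DEGRADED' if value > degraded else 'NORMAL'
```

In practice each metric would feed the corresponding `ODDParameter` in Section 7 rather than a standalone helper; this only makes the table's semantics executable.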
6. OOD Detection Integration
6.1 Methods for Airside Perception
Three complementary OOD detection methods, each with different strengths (see 30-autonomy-stack/perception/overview/uncertainty-quantification-calibration.md for mathematical foundations):
Method 1: Energy-based OOD Score
Uses the logits from the detection head directly. No additional model required.
# energy_ood.py
import torch
import torch.nn.functional as F
class EnergyOODDetector:
"""Energy-based OOD detection (Liu et al., NeurIPS 2020).
Free energy E(x) = -T * log(sum_c exp(f_c(x) / T))
In-distribution: low energy. OOD: high energy.
"""
def __init__(self, temperature=1.0, in_dist_percentiles=None):
self.T = temperature
# Thresholds set from in-distribution calibration data
self.energy_mean = 0.0
self.energy_std = 1.0
if in_dist_percentiles is not None:
self.energy_95 = in_dist_percentiles['p95']
self.energy_99 = in_dist_percentiles['p99']
else:
self.energy_95 = float('inf')
self.energy_99 = float('inf')
@torch.no_grad()
def compute_energy(self, logits: torch.Tensor) -> torch.Tensor:
"""Compute free energy from classification logits.
Args:
logits: (N, C) classification logits for N detections, C classes
Returns:
energy: (N,) free energy per detection (lower = more in-distribution)
"""
return -self.T * torch.logsumexp(logits / self.T, dim=1)
def score_frame(self, logits: torch.Tensor) -> dict:
"""Score entire frame for OOD content."""
if logits.shape[0] == 0:
return {'mean_energy': 0.0, 'max_energy': 0.0,
'ood_fraction': 0.0, 'ood_score': 0.0}
energies = self.compute_energy(logits)
mean_e = energies.mean().item()
max_e = energies.max().item()
ood_fraction = (energies > self.energy_95).float().mean().item()
# Normalized OOD score: 0 = fully in-distribution, 1 = fully OOD
normalized = ((mean_e - self.energy_mean) /
(self.energy_std + 1e-8))
ood_score = min(1.0, max(0.0, normalized / 3.0)) # Clip at 3 sigma
return {
'mean_energy': mean_e,
'max_energy': max_e,
'ood_fraction': ood_fraction,
'ood_score': ood_score,
        }
Cost: ~0.1ms (logits already computed by detection head).
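To make the energy formula concrete without a trained detector, the NumPy sketch below restates the free-energy computation and checks the ordering the OOD score relies on: confident (peaked) logits yield lower energy than flat, ambiguous logits. Shapes and values are illustrative.

```python
import numpy as np

def free_energy(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """E(x) = -T * log(sum_c exp(f_c(x) / T)), computed stably per row.
    Lower energy = more in-distribution, matching EnergyOODDetector."""
    z = logits / T
    m = z.max(axis=1, keepdims=True)  # Subtract row max for stability
    return -T * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

# A confident (peaked) detection yields lower energy than flat,
# ambiguous logits.
confident = np.array([[8.0, 0.0, 0.0]])
ambiguous = np.array([[0.1, 0.1, 0.1]])
assert free_energy(confident)[0] < free_energy(ambiguous)[0]
```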
Method 2: Mahalanobis Distance in Feature Space
Measures how far the current feature representation is from the training distribution in the learned feature space.
# mahalanobis_ood.py
import torch
class MahalanobisOODDetector:
"""Mahalanobis distance OOD detection (Lee et al., NeurIPS 2018).
Compute distance from class-conditional Gaussian fitted to
penultimate layer features of the trained model.
"""
def __init__(self, num_classes=10, feature_dim=256):
self.num_classes = num_classes
self.feature_dim = feature_dim
# Class-conditional means and shared precision matrix
# Set from training data features
self.class_means = None # (C, D) -- C classes, D feature dim
self.precision = None # (D, D) -- shared inverse covariance
# Calibration thresholds
self.threshold_warn = 50.0
self.threshold_alert = 100.0
self.threshold_critical = 200.0
def fit(self, features: torch.Tensor, labels: torch.Tensor):
"""Fit class-conditional Gaussians from training features.
Args:
features: (N, D) penultimate layer features
labels: (N,) class labels
"""
self.class_means = torch.zeros(self.num_classes, self.feature_dim)
covariance = torch.zeros(self.feature_dim, self.feature_dim)
for c in range(self.num_classes):
mask = labels == c
if mask.sum() == 0:
continue
class_features = features[mask]
self.class_means[c] = class_features.mean(dim=0)
centered = class_features - self.class_means[c]
covariance += centered.T @ centered
covariance /= features.shape[0]
# Add small ridge for numerical stability
covariance += 1e-4 * torch.eye(self.feature_dim)
self.precision = torch.linalg.inv(covariance)
@torch.no_grad()
def compute_distance(self, features: torch.Tensor) -> torch.Tensor:
"""Compute minimum Mahalanobis distance across all classes.
Args:
features: (N, D) features for N detections
Returns:
distances: (N,) minimum Mahalanobis distance per detection
"""
# (N, C, D) - broadcast subtraction
diff = features.unsqueeze(1) - self.class_means.unsqueeze(0)
# (N, C) - Mahalanobis distance per class
# d = diff^T @ precision @ diff
mahal = torch.einsum('ncd,de,nce->nc', diff, self.precision, diff)
# Minimum across classes
return mahal.min(dim=1).values
def score_frame(self, features: torch.Tensor) -> dict:
if features.shape[0] == 0:
return {'mean_mahal': 0.0, 'max_mahal': 0.0, 'ood_score': 0.0}
distances = self.compute_distance(features)
mean_d = distances.mean().item()
max_d = distances.max().item()
# Normalize to [0, 1]
ood_score = min(1.0, mean_d / self.threshold_alert)
return {
'mean_mahal': mean_d,
'max_mahal': max_d,
'ood_score': ood_score,
        }
Cost: ~0.5ms on Orin GPU for 50 detections with 256-dim features.
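The class-conditional fit can be checked end-to-end on synthetic features. The NumPy sketch below restates the single-class case (fit mean and precision on samples, then score a near point against a far one); the data is synthetic and purely illustrative:

```python
import numpy as np

# Synthetic stand-in for penultimate-layer features of a single class.
rng = np.random.default_rng(0)
train_features = rng.normal(0.0, 1.0, size=(500, 4))

# Fit: class mean and precision, with the same ridge as in fit() above.
mean = train_features.mean(axis=0)
cov = np.cov(train_features, rowvar=False) + 1e-4 * np.eye(4)
precision = np.linalg.inv(cov)

def mahalanobis_sq(x: np.ndarray) -> float:
    """Squared Mahalanobis distance to the fitted class Gaussian."""
    d = x - mean
    return float(d @ precision @ d)

# A feature vector near the training mean scores far lower than one
# well outside the training distribution.
assert mahalanobis_sq(np.zeros(4)) < mahalanobis_sq(np.full(4, 10.0))
```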
Method 3: Feature Norm as Lightweight Proxy
The simplest OOD indicator: the L2 norm of backbone features tends to be lower for OOD inputs. Nearly free to compute.
import torch

class FeatureNormOOD:
    """L2 feature norm as OOD proxy -- nearly zero cost."""
def __init__(self, ref_mean_norm=12.5, ref_std_norm=2.0):
self.ref_mean = ref_mean_norm
self.ref_std = ref_std_norm
@torch.no_grad()
def score(self, features: torch.Tensor) -> float:
"""Score frame by mean feature norm deviation."""
norm = features.norm(dim=-1).mean().item()
z_score = abs(norm - self.ref_mean) / self.ref_std
        return min(1.0, z_score / 3.0)  # 3-sigma → 1.0
6.2 Threshold Calibration
Critical design decision: how to set OOD thresholds without an OOD validation set.
Approach 1: Percentile-based (simple, conservative)
import numpy as np

def calibrate_ood_thresholds(in_distribution_scores: np.ndarray) -> dict:
"""Set thresholds from in-distribution scores only.
Logic: if it's more extreme than 95% of in-distribution,
treat as potentially OOD. Beyond 99.9%, treat as definitely OOD.
"""
return {
'novel_but_safe': float(np.percentile(in_distribution_scores, 95)),
'likely_ood': float(np.percentile(in_distribution_scores, 99)),
'definitely_ood': float(np.percentile(in_distribution_scores, 99.9)),
    }
Approach 2: Conformal prediction (distribution-free guarantees)
def conformal_ood_threshold(calibration_scores: np.ndarray,
alpha: float = 0.01) -> float:
"""Set threshold with P(false alarm) <= alpha guarantee.
Conformal prediction: if we observe n calibration scores,
the (1-alpha)(1+1/n) quantile gives a threshold with
coverage guarantee.
"""
n = len(calibration_scores)
quantile_level = min(1.0, (1.0 - alpha) * (1.0 + 1.0 / n))
threshold = float(np.quantile(calibration_scores, quantile_level))
    return threshold
With 1,000 commissioning frames and alpha=0.01, this gives a threshold whose false alarm rate is at most 1% in expectation, regardless of the score distribution.
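The guarantee can be sanity-checked empirically. The sketch below restates the conformal rule and measures the false-alarm rate on fresh in-distribution scores; the Gaussian scores are synthetic and illustrative only:

```python
import numpy as np

def conformal_ood_threshold(calibration_scores: np.ndarray,
                            alpha: float = 0.01) -> float:
    # Same rule as above: the (1 - alpha)(1 + 1/n) quantile of the
    # calibration scores bounds the false alarm rate by alpha.
    n = len(calibration_scores)
    q = min(1.0, (1.0 - alpha) * (1.0 + 1.0 / n))
    return float(np.quantile(calibration_scores, q))

rng = np.random.default_rng(42)
calibration = rng.normal(0.0, 1.0, size=1000)  # commissioning OOD scores
thr = conformal_ood_threshold(calibration, alpha=0.01)

# Fresh in-distribution scores rarely exceed the threshold.
fresh = rng.normal(0.0, 1.0, size=10000)
false_alarm_rate = float((fresh > thr).mean())
assert false_alarm_rate < 0.05
```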
6.3 Distinguishing "Novel but Safe" from "Novel and Dangerous"
Not all OOD inputs are dangerous. A new type of baggage cart never seen in training is OOD but still a standard GSE vehicle. A stray construction vehicle is OOD and potentially dangerous due to unknown dynamics.
Decision Matrix for OOD Detections:
Localized OOD Global OOD
(one/few detections) (entire scene shift)
┌──────────────────────┬──────────────────────┐
Low │ Log + continue │ Input distribution │
severity │ (new GSE variant, │ drift. Flag for │
(object- │ new marking) │ offline review. │
level OOD │ Treat as "unknown │ Adjust reference. │
score │ obstacle" class │ │
moderate) │ │ │
├──────────────────────┼──────────────────────┤
High │ Conservative │ Domain shift or │
severity │ treatment: │ environmental event. │
(detection │ Stop, 5m clearance, │ Enter DEGRADED mode. │
with very │ flag for teleop. │ Speed reduction. │
high OOD │ Object could be │ Request human │
score) │ anything. │ oversight. │
└──────────────────────┴──────────────────────┘
Implementation: combine per-detection OOD score with spatial extent and temporal persistence.
enum class OODCategory {
IN_DISTRIBUTION,
NOVEL_BENIGN, // New object type, tracking normally
NOVEL_UNCERTAIN, // New object type, behavior unclear
OOD_DANGEROUS, // High OOD + unexpected behavior
GLOBAL_SHIFT, // Entire scene is OOD
};
OODCategory classify_ood(float ood_score, float track_stability,
float global_input_shift, int frame_persistence) {
if (ood_score < 0.3f) return OODCategory::IN_DISTRIBUTION;
if (global_input_shift > 0.5f) return OODCategory::GLOBAL_SHIFT;
// Object-level OOD
if (track_stability > 0.8f && frame_persistence > 30) {
// OOD but tracks stably -- probably new GSE type
return OODCategory::NOVEL_BENIGN;
}
if (ood_score > 0.7f || track_stability < 0.4f) {
return OODCategory::OOD_DANGEROUS;
}
return OODCategory::NOVEL_UNCERTAIN;
}
7. ODD Boundary Monitoring
7.1 Formal ODD Specification
The ODD (Operational Design Domain) is defined as a machine-readable constraint set. Each parameter has four operating zones with explicit thresholds. This section complements 60-safety-validation/runtime-assurance/weather-adaptive-odd-management.md by focusing on perception-driven ODD assessment rather than environmental weather feeds.
# odd_specification.py
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict
class ODDLevel(IntEnum):
NORMAL = 0 # Full capability
DEGRADED = 1 # Reduced speed, increased margins
RESTRICTED = 2 # Minimal operation, teleop standby
SUSPENDED = 3 # Safe stop required
@dataclass
class ODDParameter:
"""Single ODD parameter with four-level thresholds."""
name: str
unit: str
# Thresholds -- direction indicates whether higher or lower is worse
normal_min: float = float('-inf')
normal_max: float = float('inf')
degraded_min: float = float('-inf')
degraded_max: float = float('inf')
restricted_min: float = float('-inf')
restricted_max: float = float('inf')
# Beyond restricted thresholds = SUSPENDED
# Hysteresis (prevent oscillation at boundaries)
hysteresis: float = 0.0
# Current smoothed value
_current: float = 0.0
_level: ODDLevel = ODDLevel.NORMAL
def evaluate(self, value: float) -> ODDLevel:
self._current = value
# Determine raw level
if (value >= self.normal_min and value <= self.normal_max):
raw_level = ODDLevel.NORMAL
elif (value >= self.degraded_min and value <= self.degraded_max):
raw_level = ODDLevel.DEGRADED
elif (value >= self.restricted_min and value <= self.restricted_max):
raw_level = ODDLevel.RESTRICTED
else:
raw_level = ODDLevel.SUSPENDED
# Apply hysteresis: fast degradation, slow recovery
if raw_level > self._level:
# Degrading: immediate transition
self._level = raw_level
elif raw_level < self._level:
# Recovering: require sustained improvement (handled by state machine)
pass # State machine handles recovery delay
return self._level
# Complete airside ODD specification
AIRSIDE_ODD_SPEC: Dict[str, ODDParameter] = {
# === Perception-derived parameters ===
'lidar_effective_range': ODDParameter(
name='LiDAR Effective Detection Range',
unit='meters',
normal_min=60.0, normal_max=float('inf'),
degraded_min=40.0, degraded_max=float('inf'),
restricted_min=20.0, restricted_max=float('inf'),
hysteresis=5.0,
),
'perception_health_score': ODDParameter(
name='Perception Health Score (PHS)',
unit='ratio [0-1]',
normal_min=0.7, normal_max=1.0,
degraded_min=0.5, degraded_max=1.0,
restricted_min=0.3, restricted_max=1.0,
hysteresis=0.05,
),
'ood_score': ODDParameter(
name='Global OOD Score',
unit='ratio [0-1]',
normal_min=0.0, normal_max=0.3,
degraded_min=0.0, degraded_max=0.5,
restricted_min=0.0, restricted_max=0.7,
hysteresis=0.05,
),
'cross_modal_agreement': ODDParameter(
name='Cross-Modal Detection Agreement',
unit='ratio [0-1]',
normal_min=0.65, normal_max=1.0,
degraded_min=0.50, degraded_max=1.0,
restricted_min=0.35, restricted_max=1.0,
hysteresis=0.05,
),
'detection_count_stability': ODDParameter(
name='Detection Count CUSUM Status',
unit='sigma',
normal_min=0.0, normal_max=2.0,
degraded_min=0.0, degraded_max=3.5,
restricted_min=0.0, restricted_max=5.0,
hysteresis=0.3,
),
# === Environment-derived parameters ===
# (Complement weather-adaptive-odd-management.md)
'visibility_range': ODDParameter(
name='Meteorological Visibility',
unit='meters',
normal_min=2000.0, normal_max=float('inf'),
degraded_min=500.0, degraded_max=float('inf'),
restricted_min=200.0, restricted_max=float('inf'),
hysteresis=100.0,
),
'precipitation_rate': ODDParameter(
name='Precipitation Intensity',
unit='mm/hr',
normal_min=0.0, normal_max=2.5,
degraded_min=0.0, degraded_max=7.5,
restricted_min=0.0, restricted_max=15.0,
hysteresis=0.5,
),
'wind_speed': ODDParameter(
name='Surface Wind Speed',
unit='knots',
normal_min=0.0, normal_max=20.0,
degraded_min=0.0, degraded_max=35.0,
restricted_min=0.0, restricted_max=50.0,
hysteresis=3.0,
),
'ambient_temperature': ODDParameter(
name='Ambient Temperature',
unit='celsius',
normal_min=-5.0, normal_max=40.0,
degraded_min=-15.0, degraded_max=48.0,
restricted_min=-25.0, restricted_max=55.0,
hysteresis=2.0,
),
# === Operational parameters ===
'traffic_density': ODDParameter(
name='Proximate Object Count',
unit='count within 50m',
normal_min=0.0, normal_max=25.0,
degraded_min=0.0, degraded_max=40.0,
restricted_min=0.0, restricted_max=60.0,
hysteresis=3.0,
),
'calibration_health': ODDParameter(
name='Multi-LiDAR Calibration Health',
unit='ratio [0-1]',
normal_min=0.8, normal_max=1.0,
degraded_min=0.6, degraded_max=1.0,
restricted_min=0.4, restricted_max=1.0,
hysteresis=0.05,
),
}
7.2 ODD State Machine with Hysteresis
┌──────────────────────────────────────┐
│ │
▼ │
┌──────────────┐ Any param enters DEGRADED │
│ │ (immediate) │
────────► │ NORMAL │ ─────────────────────┐ │
│ v_max=25 │ │ │
│ margin=1x │ ▼ │
└──────────────┘ ┌──────────────┐ │
▲ │ │ │
│ │ DEGRADED │ │
│ All params │ v_max=15 │──┘
│ NORMAL for │ margin=1.5x │ All params NORMAL
│ T_recover=30s │ │ for 30s
│ └──────┬───────┘
│ │
│ Any param enters │ Any param enters
│ RESTRICTED │ RESTRICTED
│ (immediate) │ (immediate)
│ ▼
│ ┌──────────────┐
│ │ │
│ │ RESTRICTED │
│ │ v_max=8 │
│ │ margin=2.5x │
│ │ teleop req │
│ └──────┬───────┘
│ │
│ All params │ Any param enters
│ ≤DEGRADED │ SUSPENDED
│ for 60s │ (immediate)
│ ▼
│ ┌──────────────┐
│ │ │
│ │ SUSPENDED │
│ │ v_max=0 │
│ │ safe stop │
│ │ maint. req │
│ └──────┬───────┘
│ │
│ All params ≤RESTRICTED │
│ for 120s + human ack │
└─────────────────────────┘
Key design principles:
- Fast degradation, slow recovery: Transition to worse state is immediate; recovery requires sustained good conditions for a hysteresis period
- Worst-parameter drives state: The system state is at least as bad as the worst individual parameter
- SUSPENDED requires human acknowledgment to recover: prevents automated oscillation in edge conditions
- Recovery times increase with severity: DEGRADED→NORMAL: 30s, RESTRICTED→DEGRADED: 60s, SUSPENDED→RESTRICTED: 120s + human ack
// odd_state_machine.hpp
#pragma once
#include <chrono>
#include <algorithm>
#include <map>
#include <string>
enum class ODDState { NORMAL, DEGRADED, RESTRICTED, SUSPENDED };
class ODDStateMachine {
public:
struct Config {
// Recovery hold times (seconds)
float degraded_to_normal_hold = 30.0f;
float restricted_to_degraded_hold = 60.0f;
float suspended_to_restricted_hold = 120.0f;
// Speed limits per state (km/h)
float normal_speed = 25.0f;
float degraded_speed = 15.0f;
float restricted_speed = 8.0f;
float suspended_speed = 0.0f;
// Safety margin multipliers
float normal_margin = 1.0f;
float degraded_margin = 1.5f;
float restricted_margin = 2.5f;
};
ODDStateMachine(Config config = {}) : config_(config) {}
struct StateOutput {
ODDState state;
float max_speed_kmh;
float margin_multiplier;
bool teleop_requested;
bool safe_stop_required;
bool maintenance_required;
std::string worst_parameter;
float time_in_state_s;
};
StateOutput update(
const std::map<std::string, ODDState>& param_states,
float dt_seconds,
bool human_ack_received = false
) {
// Find worst parameter state
ODDState worst = ODDState::NORMAL;
std::string worst_param = "none";
for (const auto& [name, state] : param_states) {
if (static_cast<int>(state) > static_cast<int>(worst)) {
worst = state;
worst_param = name;
}
}
// State transition logic
ODDState target = worst;
if (target > current_state_) {
// Degradation: immediate
current_state_ = target;
recovery_timer_ = 0.0f;
time_in_state_ = 0.0f;
} else if (target < current_state_) {
// Recovery: requires sustained improvement
recovery_timer_ += dt_seconds;
float required_hold = get_recovery_hold(current_state_);
bool can_recover = true;
if (current_state_ == ODDState::SUSPENDED) {
can_recover = human_ack_received;
}
if (recovery_timer_ >= required_hold && can_recover) {
// Step down one level (not jump directly to NORMAL)
current_state_ = static_cast<ODDState>(
static_cast<int>(current_state_) - 1);
recovery_timer_ = 0.0f;
time_in_state_ = 0.0f;
}
} else {
recovery_timer_ = 0.0f;
}
time_in_state_ += dt_seconds;
return {
current_state_,
get_max_speed(current_state_),
get_margin(current_state_),
current_state_ >= ODDState::RESTRICTED,
current_state_ == ODDState::SUSPENDED,
current_state_ == ODDState::SUSPENDED &&
time_in_state_ > 300.0f, // 5 min suspended = maintenance
worst_param,
time_in_state_
};
}
private:
Config config_;
ODDState current_state_ = ODDState::NORMAL;
float recovery_timer_ = 0.0f;
float time_in_state_ = 0.0f;
float get_recovery_hold(ODDState state) const {
switch (state) {
case ODDState::DEGRADED: return config_.degraded_to_normal_hold;
case ODDState::RESTRICTED: return config_.restricted_to_degraded_hold;
case ODDState::SUSPENDED: return config_.suspended_to_restricted_hold;
default: return 0.0f;
}
}
float get_max_speed(ODDState state) const {
switch (state) {
case ODDState::NORMAL: return config_.normal_speed;
case ODDState::DEGRADED: return config_.degraded_speed;
case ODDState::RESTRICTED: return config_.restricted_speed;
case ODDState::SUSPENDED: return config_.suspended_speed;
default: return 0.0f;
}
}
float get_margin(ODDState state) const {
switch (state) {
case ODDState::NORMAL: return config_.normal_margin;
case ODDState::DEGRADED: return config_.degraded_margin;
case ODDState::RESTRICTED: return config_.restricted_margin;
case ODDState::SUSPENDED: return config_.restricted_margin;
default: return config_.restricted_margin;
}
}
};
7.3 LiDAR Effective Range Estimation
A critical perception-derived ODD parameter. The effective detection range is estimated without ground truth by measuring the maximum range at which the point cloud still has sufficient density for detection.
struct EffectiveRangeEstimator {
// Estimate effective LiDAR detection range from point cloud density
// Method: find the range beyond which point density drops below
// the minimum required for reliable object detection
static constexpr float MIN_DENSITY_PER_M2 = 2.0f; // Points/m^2
static constexpr int NUM_RANGE_BINS = 20; // 5m bins from 0-100m
static constexpr float BIN_SIZE = 5.0f;
float estimate(const std::vector<float>& point_ranges) {
// Bin points by range
std::array<int, NUM_RANGE_BINS> counts{};
for (float r : point_ranges) {
int bin = static_cast<int>(r / BIN_SIZE);
if (bin >= 0 && bin < NUM_RANGE_BINS) {
counts[bin]++;
}
}
// Find outermost bin with sufficient density
// Area of ring at range r with width dr = 2*pi*r*dr
float effective_range = 10.0f; // Minimum safe range
for (int i = 0; i < NUM_RANGE_BINS; i++) {
float r_center = (i + 0.5f) * BIN_SIZE;
float ring_area = 2.0f * 3.14159f * r_center * BIN_SIZE;
float density = counts[i] / ring_area;
if (density >= MIN_DENSITY_PER_M2) {
effective_range = (i + 1) * BIN_SIZE;
}
}
return effective_range;
}
};
7.4 NOTAM Integration
Active NOTAMs constrain the ODD spatially. See 70-operations-domains/airside/operations/ground-control-instructions.md for NOTAM parsing.
# notam_odd_constraints.py
from dataclasses import dataclass
from typing import Optional

@dataclass
class NOTAMConstraint:
    """NOTAM-derived ODD spatial constraint."""
    notam_id: str
    zone_polygon: list  # [(lat, lon), ...] or apron-relative coords
    constraint_type: str  # 'CLOSED', 'SPEED_REDUCED', 'CAUTION', 'CONSTRUCTION'
    max_speed_override: Optional[float] = None  # km/h, None = use ODD state speed
    margin_override: Optional[float] = None  # multiplier
    effective_from: str = ""
    effective_to: str = ""
def apply_notam_constraints(odd_output: dict,
active_notams: list,
vehicle_position: tuple) -> dict:
"""Apply NOTAM constraints on top of perception-derived ODD."""
for notam in active_notams:
if point_in_polygon(vehicle_position, notam.zone_polygon):
if notam.constraint_type == 'CLOSED':
odd_output['state'] = 'SUSPENDED'
odd_output['reason'] = f'NOTAM {notam.notam_id}: zone closed'
elif notam.max_speed_override is not None:
odd_output['max_speed'] = min(
odd_output['max_speed'], notam.max_speed_override)
if notam.margin_override is not None:
odd_output['margin_multiplier'] = max(
odd_output['margin_multiplier'], notam.margin_override)
    return odd_output
8. Perception Confidence Aggregation
8.1 The Perception Health Score (PHS)
The PHS combines all individual monitor outputs into a single score in [0, 1] that drives system behavior. A value of 1.0 indicates full confidence in perception outputs; 0.0 indicates no confidence.
Design requirements:
- Conservative: PHS should decrease when any individual monitor degrades
- Smooth: No sudden jumps from minor fluctuations
- Interpretable: Each component's contribution traceable
- Fast: <0.1ms computation
8.2 Bayesian Fusion Architecture
We use a weighted geometric mean formulation that naturally captures the "weakest link" property: if any single monitor reports severe degradation, the overall PHS drops significantly even if other monitors are healthy.
// perception_health_score.hpp
#pragma once
#include <cmath>
#include <array>
#include <algorithm>
#include <string>
struct MonitorScore {
std::string name;
float score; // [0, 1] where 1 = healthy
float weight; // Importance weight
float confidence; // How much to trust this monitor [0, 1]
};
class PerceptionHealthScore {
public:
// Monitor weights (sum to 1.0, calibrated from validation data)
static constexpr int NUM_MONITORS = 8;
struct Weights {
float input_distribution = 0.10f;
float feature_ood = 0.15f;
float output_consistency = 0.15f;
float cross_modal = 0.20f; // Highest weight: independent check
float track_consistency = 0.10f;
float calibration_health = 0.10f;
float localization_health = 0.10f;
float temporal_trend = 0.10f;
};
struct PHSResult {
float phs; // [0, 1] composite score
float phs_smoothed; // EWMA-smoothed PHS
std::array<float, NUM_MONITORS> component_scores;
std::array<float, NUM_MONITORS> component_contributions;
std::string worst_component;
float worst_score;
};
PHSResult compute(
float input_dist_score, // From Section 3
float feature_ood_score, // From Section 6 (inverted: 1 = healthy)
float output_consist_score, // From Section 4
float cross_modal_score, // From Section 5
float track_consist_score, // From Section 4.5
float calibration_score, // From Section 9
float localization_score, // From Section 5.4
float temporal_score // From Section 10
) {
        // OOD-style inputs are already inverted by the caller
        // (1 = healthy); all inputs should be in [0, 1]
std::array<float, NUM_MONITORS> scores = {
input_dist_score,
feature_ood_score,
output_consist_score,
cross_modal_score,
track_consist_score,
calibration_score,
localization_score,
temporal_score,
};
std::array<float, NUM_MONITORS> weights = {
weights_.input_distribution,
weights_.feature_ood,
weights_.output_consistency,
weights_.cross_modal,
weights_.track_consistency,
weights_.calibration_health,
weights_.localization_health,
weights_.temporal_trend,
};
static const std::array<std::string, NUM_MONITORS> names = {
"input_distribution",
"feature_ood",
"output_consistency",
"cross_modal",
"track_consistency",
"calibration",
"localization",
"temporal_trend",
};
// Weighted geometric mean: PHS = prod(s_i ^ w_i)
// Equivalent to exp(sum(w_i * log(s_i)))
// This ensures that any single low score pulls down the overall PHS
float log_phs = 0.0f;
float worst = 1.0f;
int worst_idx = 0;
std::array<float, NUM_MONITORS> contributions;
for (int i = 0; i < NUM_MONITORS; i++) {
float s = std::max(0.01f, std::min(1.0f, scores[i]));
log_phs += weights[i] * std::log(s);
contributions[i] = weights[i] * std::log(s);
if (s < worst) {
worst = s;
worst_idx = i;
}
}
float phs = std::exp(log_phs);
// Additional floor: if ANY component is critically low, cap PHS
// This prevents a single catastrophic failure from being
// averaged away by many healthy monitors
if (worst < 0.2f) {
phs = std::min(phs, 0.3f); // Floor if any component critical
}
// EWMA smoothing
phs_smoothed_ = phs_alpha_ * phs + (1.0f - phs_alpha_) * phs_smoothed_;
return {
phs,
phs_smoothed_,
scores,
contributions,
names[worst_idx],
worst,
};
}
// Get system response recommendation based on PHS
struct ResponseRecommendation {
float max_speed_kmh;
float margin_multiplier;
bool request_teleop;
bool request_safe_stop;
};
ResponseRecommendation get_response(float phs_smoothed) const {
if (phs_smoothed >= 0.7f) {
return {25.0f, 1.0f, false, false}; // NORMAL
} else if (phs_smoothed >= 0.5f) {
// Linear interpolation between NORMAL and DEGRADED
float t = (0.7f - phs_smoothed) / 0.2f;
return {
25.0f - t * 10.0f, // 25 → 15 km/h
1.0f + t * 0.5f, // 1.0 → 1.5x margins
false, false
};
} else if (phs_smoothed >= 0.3f) {
float t = (0.5f - phs_smoothed) / 0.2f;
return {
15.0f - t * 7.0f, // 15 → 8 km/h
1.5f + t * 1.0f, // 1.5 → 2.5x margins
true, false // Request teleop
};
} else {
return {0.0f, 2.5f, true, true}; // SAFE STOP
}
}
private:
Weights weights_;
float phs_smoothed_ = 1.0f;
float phs_alpha_ = 0.1f; // EWMA smoothing factor
};
8.3 PHS Behavior Examples
| Scenario | Input | Feature OOD | Output | Cross-Modal | Calib | Loc. | PHS | Action |
|---|---|---|---|---|---|---|---|---|
| Normal operation | 0.95 | 0.90 | 0.92 | 0.88 | 0.95 | 0.93 | 0.91 | Normal |
| New aircraft type | 0.85 | 0.60 | 0.80 | 0.70 | 0.95 | 0.93 | 0.74 | Normal (edge) |
| De-icing spray nearby | 0.50 | 0.65 | 0.55 | 0.60 | 0.90 | 0.85 | 0.55 | Degraded, slow to 15 |
| Heavy rain onset | 0.40 | 0.50 | 0.45 | 0.45 | 0.80 | 0.70 | 0.41 | Restricted, 8 km/h |
| Domain shift (new airport) | 0.30 | 0.35 | 0.40 | 0.55 | 0.90 | 0.80 | 0.38 | Restricted, teleop |
| Multiple sensor degradation | 0.20 | 0.25 | 0.30 | 0.20 | 0.50 | 0.40 | 0.19 | Safe stop |
| Calibration drift + fog | 0.45 | 0.55 | 0.50 | 0.40 | 0.30 | 0.60 | 0.36 | Restricted |
8.4 PHS Logging and Audit Trail
Every PHS computation is logged for post-hoc analysis and certification evidence:
struct PHSLogEntry {
double timestamp;
float phs_raw;
float phs_smoothed;
std::array<float, 8> component_scores;
std::string worst_component;
ODDState odd_state;
float max_speed_commanded;
float margin_multiplier;
bool teleop_requested;
// Serialize to compact binary for rosbag
std::vector<uint8_t> serialize() const;
};
At 10 Hz, this produces approximately 1.2 KB/s of monitoring data -- negligible relative to LiDAR data volumes.
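The serialize() method above is declared but not defined; one possible fixed-width encoding is sketched below. It assumes the worst component is stored as a uint8 monitor index and the ODD state as a uint8 enum value (both assumptions, chosen so every record has the same size and can be appended to a flat file or rosbag message without a length prefix):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-width variant of PHSLogEntry (field names mirror the struct above).
struct PHSRecord {
    double timestamp;
    float phs_raw;
    float phs_smoothed;
    float component_scores[8];
    uint8_t worst_component_idx;   // index into the monitor name table
    uint8_t odd_state;             // ODDState as integer
    float max_speed_commanded;
    float margin_multiplier;
    uint8_t teleop_requested;

    std::vector<uint8_t> serialize() const {
        std::vector<uint8_t> out;
        out.reserve(59);
        auto put = [&out](const void* p, size_t n) {
            const uint8_t* b = static_cast<const uint8_t*>(p);
            out.insert(out.end(), b, b + n);
        };
        put(&timestamp, sizeof timestamp);               // 8 bytes
        put(&phs_raw, sizeof phs_raw);                   // 4
        put(&phs_smoothed, sizeof phs_smoothed);         // 4
        put(component_scores, sizeof component_scores);  // 32
        put(&worst_component_idx, 1);
        put(&odd_state, 1);
        put(&max_speed_commanded, sizeof max_speed_commanded);  // 4
        put(&margin_multiplier, sizeof margin_multiplier);      // 4
        put(&teleop_requested, 1);
        return out;  // 59 bytes per record
    }
};
```

At 59 bytes per record plus ROS message framing, 10 Hz logging lands in the same ballpark as the ~1.2 KB/s figure above.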
9. Calibration Drift Detection
9.1 The Calibration Problem
Multi-LiDAR extrinsic calibration drifts due to thermal expansion, vibration, and mechanical settling (see 20-av-platform/sensors/multi-lidar-calibration.md). A 0.1 degree error produces 17 cm error at 100m -- enough to cause ghost detections and split objects that degrade perception without triggering sensor health alarms.
9.2 Mutual Information Between Overlapping FOVs
For LiDAR pairs with overlapping fields of view, compute the mutual information (MI) of their point clouds in the overlap region. Well-calibrated sensors produce high MI; drifted calibration produces low MI.
// calibration_drift_monitor.hpp
struct CalibrationDriftMonitor {
// For each LiDAR pair with overlap, track mutual information
struct LidarPair {
int lidar_a;
int lidar_b;
// Overlap region in vehicle frame (BEV bounding box)
float overlap_x_min, overlap_x_max;
float overlap_y_min, overlap_y_max;
// Reference MI from factory calibration
float ref_mi;
// EWMA of current MI
EWMAMonitor mi_monitor;
// Ground plane consistency
// Well-calibrated sensors agree on ground plane within overlap
float ref_ground_z_diff; // Expected difference (~0)
EWMAMonitor ground_z_monitor;
};
std::vector<LidarPair> pairs;
// Compute MI via 2D binned histogram
// Each bin = (height from sensor A, height from sensor B) for
// points in the overlap region
float compute_overlap_mi(
const std::vector<float>& heights_a, // From LiDAR A
const std::vector<float>& heights_b // From LiDAR B (transformed to A frame)
) {
static constexpr int MI_BINS = 20;
static constexpr float Z_MIN = -2.0f, Z_MAX = 5.0f;
static constexpr float BIN_SIZE = (Z_MAX - Z_MIN) / MI_BINS;
// Joint histogram
std::array<std::array<int, MI_BINS>, MI_BINS> joint{};
int total = 0;
// Simple approach: voxelize overlap region, compare z-values
// in matching voxels between sensor A and sensor B
size_t n = std::min(heights_a.size(), heights_b.size());
for (size_t i = 0; i < n; i++) {
int ba = static_cast<int>((heights_a[i] - Z_MIN) / BIN_SIZE);
int bb = static_cast<int>((heights_b[i] - Z_MIN) / BIN_SIZE);
if (ba >= 0 && ba < MI_BINS && bb >= 0 && bb < MI_BINS) {
joint[ba][bb]++;
total++;
}
}
if (total < 100) return 0.0f;
// Marginals
std::array<int, MI_BINS> margin_a{}, margin_b{};
for (int i = 0; i < MI_BINS; i++) {
for (int j = 0; j < MI_BINS; j++) {
margin_a[i] += joint[i][j];
margin_b[j] += joint[i][j];
}
}
// MI = sum p(a,b) * log(p(a,b) / (p(a)*p(b)))
float mi = 0.0f;
float inv_total = 1.0f / total;
for (int i = 0; i < MI_BINS; i++) {
for (int j = 0; j < MI_BINS; j++) {
if (joint[i][j] == 0) continue;
float pab = joint[i][j] * inv_total;
float pa = margin_a[i] * inv_total;
float pb = margin_b[j] * inv_total;
mi += pab * std::log(pab / (pa * pb + 1e-10f) + 1e-10f);
}
}
return mi;
}
struct CalibrationHealth {
float overall_score; // [0, 1]
int degraded_pairs; // Number of pairs with low MI
std::string worst_pair; // "lidar_2_3"
bool recalibration_needed;
};
CalibrationHealth evaluate() {
float min_score = 1.0f;
int degraded = 0;
std::string worst = "";
for (auto& pair : pairs) {
float ratio = pair.mi_monitor.ewma_value /
(pair.ref_mi + 1e-6f);
float score = std::min(1.0f, ratio);
if (score < min_score) {
min_score = score;
worst = "lidar_" + std::to_string(pair.lidar_a) +
"_" + std::to_string(pair.lidar_b);
}
if (score < 0.7f) degraded++;
}
return {
min_score,
degraded,
worst,
min_score < 0.5f || degraded >= 2
};
}
};
9.3 Ground Plane Consistency
A simpler but effective calibration check: fit a ground plane to each LiDAR's points independently and compare. Well-calibrated sensors produce ground planes that agree within 1-2 cm.
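The per-LiDAR fit itself can be sketched as an ordinary least-squares solve of the 3x3 normal equations (a sketch that assumes the input points are already ground-classified, e.g. RANSAC inliers or height-gated cells; production code would also check the residual spread before trusting the fit):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

struct Plane { float a, b, c; };  // z = a*x + b*y + c

// Least-squares fit of z = a*x + b*y + c via the normal equations,
// solved with Cramer's rule on the symmetric 3x3 system A^T A x = A^T z.
Plane fit_ground_plane(const std::vector<std::array<float, 3>>& pts) {
    double sxx = 0, sxy = 0, sx = 0, syy = 0, sy = 0, n = 0;
    double sxz = 0, syz = 0, sz = 0;
    for (const auto& p : pts) {
        sxx += p[0] * p[0]; sxy += p[0] * p[1]; sx += p[0];
        syy += p[1] * p[1]; sy += p[1]; n += 1;
        sxz += p[0] * p[2]; syz += p[1] * p[2]; sz += p[2];
    }
    auto det3 = [](const double m[3][3]) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    };
    const double M[3][3] = {{sxx, sxy, sx}, {sxy, syy, sy}, {sx, sy, n}};
    const double rhs[3] = {sxz, syz, sz};
    const double D = det3(M);
    if (std::fabs(D) < 1e-9) return {0.f, 0.f, 0.f};  // degenerate geometry
    Plane out{};
    float* coef[3] = {&out.a, &out.b, &out.c};
    for (int k = 0; k < 3; k++) {
        double Mk[3][3];
        std::memcpy(Mk, M, sizeof Mk);
        for (int r = 0; r < 3; r++) Mk[r][k] = rhs[r];  // replace column k
        *coef[k] = static_cast<float>(det3(Mk) / D);
    }
    return out;
}
```

Fitting one such plane per LiDAR and feeding the results to compare_planes below closes the loop.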
struct GroundPlaneConsistency {
// Fit ground plane z = ax + by + c per LiDAR
// Compare coefficients across sensors
struct PlaneParams {
float a, b, c; // z = ax + by + c
};
float compare_planes(const PlaneParams& p1, const PlaneParams& p2) {
// Angular difference between plane normals
// Normal of z = ax + by + c is (-a, -b, 1) / ||(-a, -b, 1)||
float dot = (-p1.a * -p2.a + -p1.b * -p2.b + 1.0f);
float n1 = std::sqrt(p1.a * p1.a + p1.b * p1.b + 1.0f);
float n2 = std::sqrt(p2.a * p2.a + p2.b * p2.b + 1.0f);
float cos_angle = dot / (n1 * n2);
float angle_deg = std::acos(std::min(1.0f, cos_angle)) * 57.2958f;
// Height offset at ego position (x=0, y=0)
float z_diff = std::abs(p1.c - p2.c);
// Scores decay linearly to 0 at 1.0 deg angular error / 5 cm height offset
float angle_score = std::max(0.0f, 1.0f - angle_deg / 1.0f);
float z_score = std::max(0.0f, 1.0f - z_diff / 0.05f);
return 0.5f * angle_score + 0.5f * z_score;
}
};
9.4 Static Object Position Consistency
Track known static objects (light poles, buildings, parked aircraft) across multiple LiDARs. If the same static object's position disagrees between sensors, calibration has drifted.
struct StaticObjectConsistency {
// Compare position of static landmarks across LiDARs
// A static object detected by LiDAR A at (x_a, y_a) and by
// LiDAR B at (x_b, y_b) should satisfy ||(x_a, y_a) - (x_b, y_b)|| < epsilon
float max_acceptable_error = 0.10f; // 10 cm for well-calibrated
float warning_threshold = 0.15f; // 15 cm = early warning
float alarm_threshold = 0.25f; // 25 cm = calibration drift confirmed
EWMAMonitor static_error_monitor{0.05f, 0.03f, 0.05f};
float update(const std::vector<float>& cross_sensor_errors) {
// No matched landmarks this cycle: no evidence of drift either way,
// so return a neutral score rather than a failure
if (cross_sensor_errors.empty()) return 1.0f;
float mean_error = 0.0f;
for (float e : cross_sensor_errors) mean_error += e;
mean_error /= cross_sensor_errors.size();
static_error_monitor.update(mean_error);
// Return health score
if (mean_error < max_acceptable_error) return 1.0f;
if (mean_error < warning_threshold) return 0.7f;
if (mean_error < alarm_threshold) return 0.4f;
return 0.1f;
}
};
10. Temporal Anomaly Detection
10.1 Rationale
Some perception degradation modes are invisible at the per-frame level but become apparent over longer time horizons: gradual decrease in effective detection range, slowly increasing false positive rate, seasonal performance changes. These require statistical monitoring over minutes, hours, and days.
10.2 Multi-Timescale Monitoring Architecture
Per-Frame (10 Hz) Per-Minute (0.017 Hz) Per-Hour (0.00028 Hz)
┌───────────────┐ ┌───────────────┐ ┌──────────────────┐
│ Raw monitor │ ──────► │ 1-min stats │ ──────► │ Hourly trends │
│ values │ buffer │ (mean, var, │ buffer │ (moving avg, │
│ │ 600 │ min, max, │ 60 min │ trend slope, │
│ │ samples │ percentiles) │ entries │ change points) │
└───────────────┘ └───────┬───────┘ └────────┬─────────┘
│ │
▼ ▼
CUSUM on 1-min Linear regression
aggregates on hourly values
(detects drift (detects seasonal
over minutes)              trends over hours)
10.3 Minute-Scale Anomaly Detection
// temporal_anomaly.hpp
struct MinuteStatsBuffer {
static constexpr int SAMPLES_PER_MINUTE = 600; // 10 Hz * 60s
static constexpr int BUFFER_MINUTES = 60; // 1 hour of history
struct MinuteStats {
float mean;
float variance;
float min_val;
float max_val;
float p10, p50, p90; // Percentiles
int count;
};
std::array<MinuteStats, BUFFER_MINUTES> history{};
int write_idx = 0;
int num_valid = 0;
// Running accumulator for current minute
std::vector<float> current_minute_samples;
void add_sample(float value) {
current_minute_samples.push_back(value);
if (current_minute_samples.size() >= SAMPLES_PER_MINUTE) {
flush_minute();
}
}
void flush_minute() {
if (current_minute_samples.empty()) return;
int n = current_minute_samples.size();
std::sort(current_minute_samples.begin(),
current_minute_samples.end());
MinuteStats stats;
stats.count = n;
stats.min_val = current_minute_samples.front();
stats.max_val = current_minute_samples.back();
stats.p10 = current_minute_samples[n / 10];
stats.p50 = current_minute_samples[n / 2];
stats.p90 = current_minute_samples[n * 9 / 10];
float sum = 0.0f, sum_sq = 0.0f;
for (float v : current_minute_samples) {
sum += v;
sum_sq += v * v;
}
stats.mean = sum / n;
stats.variance = std::max(0.0f, sum_sq / n - stats.mean * stats.mean); // Guard against negative variance from float cancellation
history[write_idx] = stats;
write_idx = (write_idx + 1) % BUFFER_MINUTES;
if (num_valid < BUFFER_MINUTES) num_valid++;
current_minute_samples.clear();
}
};
// CUSUM on minute-level statistics to detect drift
struct TemporalDriftDetector {
// Track multiple metrics over time
MinuteStatsBuffer detection_count_buffer;
MinuteStatsBuffer mean_confidence_buffer;
MinuteStatsBuffer effective_range_buffer;
MinuteStatsBuffer cross_modal_agreement_buffer;
// CUSUM on minute means
CUSUMMonitor detection_count_trend{35.0f, 3.0f, 0.5f, 5.0f};
CUSUMMonitor confidence_trend{0.82f, 0.04f, 0.5f, 4.0f};
CUSUMMonitor range_trend{65.0f, 5.0f, 0.5f, 4.0f};
CUSUMMonitor agreement_trend{0.85f, 0.05f, 0.5f, 4.0f};
struct TemporalHealth {
float score; // [0, 1]
bool drift_detected;
std::string drift_type; // Which metric is drifting
float drift_magnitude; // In sigma units
};
TemporalHealth evaluate_minute() {
bool any_drift = false;
float worst_deviation = 0.0f;
std::string worst_metric = "none";
auto check = [&](CUSUMMonitor& mon, MinuteStatsBuffer& buf,
const std::string& name) {
if (buf.num_valid == 0) return;
int latest = (buf.write_idx - 1 + buf.BUFFER_MINUTES) %
buf.BUFFER_MINUTES;
auto result = mon.update(buf.history[latest].mean);
if (result.alarm) any_drift = true;
if (std::abs(result.deviation) > worst_deviation) {
worst_deviation = std::abs(result.deviation);
worst_metric = name;
}
};
check(detection_count_trend, detection_count_buffer, "detection_count");
check(confidence_trend, mean_confidence_buffer, "confidence");
check(range_trend, effective_range_buffer, "effective_range");
check(agreement_trend, cross_modal_agreement_buffer, "cross_modal");
float score = std::max(0.0f,
1.0f - worst_deviation / 5.0f); // 5 sigma → score 0
return {score, any_drift, worst_metric, worst_deviation};
}
};
10.4 Hour-Scale Trend Detection
For gradual degradation over hours (model staleness, seasonal drift), fit a linear trend to hourly aggregated statistics.
struct HourlyTrendDetector {
static constexpr int HOURS_TRACKED = 24;
struct HourlyEntry {
float mean_phs; // Average PHS over the hour
float mean_ood_score;
float mean_range;
int num_samples;
double timestamp;
};
std::array<HourlyEntry, HOURS_TRACKED> hourly_history{};
int num_hours = 0;
struct TrendResult {
float phs_slope; // PHS change per hour (negative = degrading)
float range_slope; // Effective range change per hour
float ood_slope; // OOD score change per hour (positive = degrading)
bool significant_trend; // p-value < 0.05 equivalent
};
TrendResult compute_trends() {
if (num_hours < 4) {
return {0.0f, 0.0f, 0.0f, false};
}
// Simple linear regression: y = a + b*x
// Using accumulated hours as x (0, 1, 2, ...)
int n = std::min(num_hours, HOURS_TRACKED);
float sum_x = 0, sum_y_phs = 0, sum_y_range = 0, sum_y_ood = 0;
float sum_xx = 0, sum_xy_phs = 0, sum_xy_range = 0, sum_xy_ood = 0;
for (int i = 0; i < n; i++) {
float x = static_cast<float>(i);
sum_x += x;
sum_xx += x * x;
sum_y_phs += hourly_history[i].mean_phs;
sum_xy_phs += x * hourly_history[i].mean_phs;
sum_y_range += hourly_history[i].mean_range;
sum_xy_range += x * hourly_history[i].mean_range;
sum_y_ood += hourly_history[i].mean_ood_score;
sum_xy_ood += x * hourly_history[i].mean_ood_score;
}
float denom = n * sum_xx - sum_x * sum_x;
if (std::abs(denom) < 1e-6f) {
return {0.0f, 0.0f, 0.0f, false};
}
float slope_phs = (n * sum_xy_phs - sum_x * sum_y_phs) / denom;
float slope_range = (n * sum_xy_range - sum_x * sum_y_range) / denom;
float slope_ood = (n * sum_xy_ood - sum_x * sum_y_ood) / denom;
// Significance: slope magnitude > 2 * standard error
// Simplified: flag if PHS dropping > 0.02/hour or range > 2m/hour
bool significant = (slope_phs < -0.02f) ||
(slope_range < -2.0f) ||
(slope_ood > 0.02f);
return {slope_phs, slope_range, slope_ood, significant};
}
};
10.5 Fleet Comparison
Compare this vehicle's perception health to the fleet average. If one vehicle degrades while others are fine, the problem is vehicle-specific (sensor, calibration). If all vehicles degrade simultaneously, the problem is environmental or model-global.
# fleet_comparison.py -- runs on fleet manager, not on vehicle
import numpy as np
class FleetPerceptionComparator:
"""Compare individual vehicle PHS against fleet baseline."""
def __init__(self, fleet_size: int, z_threshold: float = 2.5):
self.fleet_size = fleet_size
self.z_threshold = z_threshold
def identify_outliers(self,
vehicle_phs: dict # {vehicle_id: float}
) -> dict:
"""Find vehicles whose PHS is significantly below fleet average."""
values = list(vehicle_phs.values())
if len(values) < 3:
return {}
fleet_mean = np.mean(values)
fleet_std = max(np.std(values), 0.01)
outliers = {}
for vid, phs in vehicle_phs.items():
z = (fleet_mean - phs) / fleet_std
if z > self.z_threshold:
outliers[vid] = {
'phs': phs,
'fleet_mean': fleet_mean,
'z_score': z,
'likely_cause': 'vehicle_specific' if z > 4.0
else 'possible_vehicle_specific'
}
# Check if fleet-wide degradation
if fleet_mean < 0.6:
for vid in vehicle_phs:
if vid not in outliers:
outliers[vid] = {
'phs': vehicle_phs[vid],
'fleet_mean': fleet_mean,
'z_score': 0.0,
'likely_cause': 'fleet_wide_degradation'
}
return outliers
11. Response Actions and Graceful Degradation
11.1 Response Decision Tree
PHS + ODD State ──► Response Decision
│
├── PHS >= 0.7 AND ODD = NORMAL
│ └── Full autonomous operation
│ Speed: 25 km/h, Margins: 1.0x
│ Stack: Neural AC active
│
├── PHS >= 0.5 AND PHS < 0.7 (OR ODD = DEGRADED)
│ └── Degraded autonomous operation
│ Speed: 15-25 km/h (linear interpolation)
│ Margins: 1.0-1.5x
│ Stack: Neural AC with increased CBF margins
│ Logging: Elevated (full rosbag)
│ Fleet: Notify fleet manager
│
├── PHS >= 0.3 AND PHS < 0.5 (OR ODD = RESTRICTED)
│ └── Restricted operation, teleop standby
│ Speed: 8-15 km/h
│ Margins: 1.5-2.5x
│ Stack: Consider switching to Classical BC (Frenet)
│ Teleop: Request operator attention
│ Mission: May abort non-critical missions
│
├── PHS < 0.3 (OR ODD = SUSPENDED)
│ └── Safe stop
│ Speed: Controlled deceleration to 0
│ Stack: Classical BC (Frenet) for stopping maneuver
│ Teleop: Urgent request
│ Maintenance: Auto-generate ticket
│ Fleet: Reassign missions to other vehicles
│
└── SPECIAL CASES (override above):
│
├── Any single monitor score < 0.1
│ └── Immediate RESTRICTED regardless of PHS
│
├── Calibration health < 0.3
│ └── Immediate RESTRICTED + schedule recalibration
│
├── Localization health < 0.2 (GTSAM innovation > 20)
│ └── Immediate safe stop (cannot trust position)
│
└── Cross-modal agreement < 0.3 AND declining
└── Immediate RESTRICTED (fundamental perception failure)
11.2 Speed Reduction Curve
The speed limit is a continuous function of PHS, not discrete steps:
float compute_safe_speed(float phs, float odd_speed_limit,
float current_speed) {
// Piecewise linear speed curve based on PHS
float phs_speed;
if (phs >= 0.7f) {
phs_speed = 25.0f; // Full speed (NORMAL band, matching get_response)
} else if (phs >= 0.5f) {
// Linear from 25 to 15 km/h (DEGRADED band)
phs_speed = 15.0f + (phs - 0.5f) / 0.2f * 10.0f;
} else if (phs >= 0.3f) {
// Linear from 15 to 8 km/h (RESTRICTED band)
phs_speed = 8.0f + (phs - 0.3f) / 0.2f * 7.0f;
} else {
phs_speed = 0.0f; // Stop
}
// Take minimum of PHS-derived speed and ODD speed limit
float target = std::min(phs_speed, odd_speed_limit);
// Rate limit: max deceleration 2 m/s^2 for comfort
// (safety stops use higher deceleration)
float max_decel_per_frame = 2.0f * 0.1f * 3.6f; // m/s^2 * dt * km/h per m/s
float limited = std::max(target, current_speed - max_decel_per_frame);
return std::max(0.0f, limited);
}
11.3 Safety Margin Adjustment
Margins scale inversely with PHS:
struct SafetyMarginAdjustment {
// Base margins from Frenet planner configuration
float base_lateral_margin = 1.5f; // meters
float base_longitudinal_margin = 3.0f; // meters
float base_aircraft_margin = 5.0f; // meters (nose/wing)
float base_personnel_margin = 2.5f; // meters
struct AdjustedMargins {
float lateral;
float longitudinal;
float aircraft;
float personnel;
};
AdjustedMargins compute(float phs, float margin_multiplier) {
// Multiplier from ODD state machine (1.0, 1.5, 2.5)
// Additional PHS-proportional scaling
float phs_factor = 1.0f + (1.0f - phs) * 1.0f; // 1.0-2.0x
float total_mult = margin_multiplier * phs_factor;
return {
base_lateral_margin * total_mult,
base_longitudinal_margin * total_mult,
base_aircraft_margin * std::max(total_mult, 1.5f), // Never reduce aircraft margins
base_personnel_margin * std::max(total_mult, 1.0f), // Never reduce personnel margins
};
}
};
11.4 Simplex Integration
The perception monitor integrates with the existing Simplex architecture (see 60-safety-validation/runtime-assurance/simplex-safety-architecture.md):
Perception Monitor Output
│
▼
┌─────────────────────┐
│ Simplex Decision │
│ Module │
├─────────────────────┤
│ │
│ IF PHS > 0.5: │───► Neural AC (advanced controller)
│ Use neural stack │ with adjusted speed/margins
│ │
│ IF PHS 0.3-0.5: │───► Classical BC (Frenet planner)
│ Switch to Frenet │ as primary controller
│ fallback │
│ │
│ IF PHS < 0.3: │───► Safe stop via BC
│ Emergency stop │ then request teleop
│ │
│ IF PHS < 0.1: │───► Hardware E-stop
│ Complete failure │ via safety MCU
│ │
└─────────────────────┘
The Frenet planner (420 candidates/cycle, Stanley lateral control) does not use neural networks and thus does not degrade with the same failure modes. It is always available as a fallback:
// simplex_perception_integration.hpp
enum class ActiveController {
NEURAL_AC, // Normal: neural perception + planning
CLASSICAL_BC, // Fallback: Frenet planner + GTSAM
SAFE_STOP, // Controlled stop using BC
E_STOP, // Hardware emergency stop
};
ActiveController select_controller(float phs_smoothed,
ODDState odd_state,
bool localization_healthy) {
// Localization failure overrides everything
if (!localization_healthy) return ActiveController::SAFE_STOP;
// ODD SUSPENDED always means stop
if (odd_state == ODDState::SUSPENDED) return ActiveController::SAFE_STOP;
// PHS-based selection
if (phs_smoothed >= 0.5f) return ActiveController::NEURAL_AC;
if (phs_smoothed >= 0.3f) return ActiveController::CLASSICAL_BC;
if (phs_smoothed >= 0.1f) return ActiveController::SAFE_STOP;
return ActiveController::E_STOP;
}
11.5 Teleop Escalation Protocol
Level 1: Monitoring Notification (PHS < 0.7)
├── Fleet dashboard shows degraded vehicle
├── Operator sees amber indicator
└── No immediate action required
Level 2: Attention Request (PHS < 0.5)
├── Audible alert to assigned operator
├── Vehicle camera feeds displayed prominently
├── Vehicle reduces to restricted speed
└── Operator has 30 seconds to acknowledge
Level 3: Teleop Request (PHS < 0.3 or no ack)
├── Vehicle executes safe stop
├── Priority teleop connection established
├── Operator assumes control
└── Rosbag flagged for review
Level 4: Fleet Escalation (multiple vehicles degraded)
├── Fleet manager alerts shift supervisor
├── Environmental condition suspected
├── Consider fleet-wide speed reduction
└── Engage maintenance team
12. Implementation on Orin
12.1 Compute Budget Breakdown
Detailed timing analysis on NVIDIA Orin AGX (275 TOPS, 12-core ARM A78AE, Ampere GPU):
| Monitor | Where | Latency | Memory | Frequency | Notes |
|---|---|---|---|---|---|
| Point density KL | CPU (1 core) | 0.05 ms | 80 KB | 10 Hz | BEV grid 100x100 |
| Intensity Wasserstein | CPU (1 core) | 0.02 ms | 2 KB | 10 Hz | 256-bin histogram |
| Spatial coverage | CPU (1 core) | 0.03 ms | 5 KB | 10 Hz | 36x8 grid |
| BEV feature stats | GPU (piggyback) | 0.20 ms | 0 KB extra | 10 Hz | Fused with backbone |
| Energy OOD | GPU (piggyback) | 0.10 ms | 0 KB extra | 10 Hz | Uses existing logits |
| Mahalanobis OOD | GPU | 0.50 ms | 2 MB | 10 Hz | Precision matrix |
| Detection CUSUM | CPU (1 core) | 0.01 ms | 1 KB | 10 Hz | Per-class CUSUM |
| BBox size EWMA | CPU (1 core) | 0.01 ms | 1 KB | 10 Hz | Per-class EWMA |
| Class distribution chi2 | CPU (1 core) | 0.02 ms | 4 KB | 10 Hz | 100-frame window |
| Track consistency | CPU (1 core) | 0.05 ms | 10 KB | 10 Hz | Per-track stats |
| LiDAR-Radar agreement | CPU (1 core) | 0.30 ms | 20 KB | 10 Hz | Nearest-neighbor |
| Multi-LiDAR overlap | CPU (1 core) | 0.20 ms | 15 KB | 10 Hz | Per-pair MI |
| GTSAM innovation | CPU (1 core) | 0.05 ms | 5 KB | 10 Hz | Reuses GTSAM data |
| Track residuals | CPU (1 core) | 0.05 ms | 5 KB | 10 Hz | Per-track |
| Calibration MI | CPU (1 core) | 0.20 ms | 50 KB | 1 Hz | Every 10th frame |
| Ground plane check | CPU (1 core) | 0.10 ms | 10 KB | 1 Hz | Every 10th frame |
| Temporal CUSUM | CPU (1 core) | 0.01 ms | 500 KB | ~0.17 Hz | Minute buffer |
| Hourly trends | CPU (1 core) | 0.01 ms | 5 KB | ~0.0003 Hz | Hourly |
| PHS aggregation | CPU (1 core) | 0.01 ms | 1 KB | 10 Hz | Weighted geometric |
| ODD state machine | CPU (1 core) | 0.01 ms | 1 KB | 10 Hz | State transitions |
| Total | Mixed | ~2.0 ms typical | ~700 KB | -- | Within 5 ms budget |
12.2 CUDA Stream Organization
GPU Execution Timeline (per 100ms cycle):
Stream 0 (Perception):
├─ Voxelization (2.0 ms)
├─ PointPillars backbone (3.5 ms) ──► Features available
├─ CenterPoint head (1.5 ms) ──► Detections available
└─ Total: ~7.0 ms
Stream 1 (Monitoring, concurrent with Stream 0 after features):
├─ BEV feature stats (0.2 ms) [after backbone completes]
├─ Energy OOD (0.1 ms) [after head completes]
├─ Mahalanobis OOD (0.5 ms) [after features available]
└─ Total: ~0.8 ms (overlapped with perception)
CPU threads (concurrent with GPU):
Thread 1: Input monitors (point density, intensity, coverage) -- 0.1 ms
Thread 2: Output monitors (CUSUM, EWMA, chi2, tracks) -- 0.15 ms
Thread 3: Cross-modal (LiDAR-radar, multi-LiDAR, GTSAM) -- 0.6 ms
Thread 4: Aggregation + ODD state machine -- 0.05 ms
Effective added latency to perception pipeline: ~0.8 ms (GPU monitors overlap with perception; CPU monitors run in parallel).
12.3 ROS Nodelet Implementation
// perception_monitor_nodelet.cpp
#include <nodelet/nodelet.h>
#include <pluginlib/class_list_macros.h>
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>
#include <std_msgs/Float32.h>
#include <diagnostic_msgs/DiagnosticArray.h>
#include <thread>
#include <mutex>
// Include all monitors defined in previous sections
#include "perception_monitor/point_density_monitor.hpp"
#include "perception_monitor/intensity_dist_monitor.hpp"
#include "perception_monitor/spatial_coverage_monitor.hpp"
#include "perception_monitor/cusum_monitor.hpp"
#include "perception_monitor/ewma_monitor.hpp"
#include "perception_monitor/cross_modal_consistency.hpp"
#include "perception_monitor/calibration_drift_monitor.hpp"
#include "perception_monitor/odd_state_machine.hpp"
#include "perception_monitor/perception_health_score.hpp"
#include "perception_monitor/temporal_anomaly.hpp"
namespace perception_monitor {
class PerceptionMonitorNodelet : public nodelet::Nodelet {
public:
void onInit() override {
ros::NodeHandle& nh = getNodeHandle();
ros::NodeHandle& pnh = getPrivateNodeHandle();
// Load configuration
loadConfig(pnh);
// Subscribers
sub_pointcloud_ = nh.subscribe(
"/pointcloud_aggregator/output", 1,
&PerceptionMonitorNodelet::pointcloudCallback, this);
sub_detections_ = nh.subscribe(
"/perception/detections", 1,
&PerceptionMonitorNodelet::detectionsCallback, this);
sub_radar_ = nh.subscribe(
"/radar/detections", 1,
&PerceptionMonitorNodelet::radarCallback, this);
sub_gtsam_ = nh.subscribe(
"/gtsam/innovation", 1,
&PerceptionMonitorNodelet::gtsamCallback, this);
sub_tracks_ = nh.subscribe(
"/perception/tracks", 1,
&PerceptionMonitorNodelet::tracksCallback, this);
// Publishers
pub_phs_ = nh.advertise<std_msgs::Float32>(
"/monitor/perception_health", 1);
pub_odd_state_ = nh.advertise<std_msgs::String>(
"/monitor/odd_state", 1, true); // Latched
pub_response_ = nh.advertise<perception_monitor::ResponseAction>(
"/monitor/response_action", 1);
pub_diagnostics_ = nh.advertise<diagnostic_msgs::DiagnosticArray>(
"/monitor/diagnostics", 1);
// 10 Hz timer for periodic evaluation
timer_ = nh.createTimer(ros::Duration(0.1),
&PerceptionMonitorNodelet::timerCallback, this);
NODELET_INFO("Perception monitor initialized");
}
private:
void loadConfig(ros::NodeHandle& pnh) {
// Load reference distributions, thresholds, weights
// from YAML parameter file
pnh.param("kl_warn_threshold",
density_monitor_.kl_warn_threshold, 0.15f);
pnh.param("kl_alert_threshold",
density_monitor_.kl_alert_threshold, 0.35f);
// ... (all parameters)
}
void timerCallback(const ros::TimerEvent&) {
std::lock_guard<std::mutex> lock(mutex_);
// Compute PHS from all monitors
auto phs_result = phs_.compute(
input_health_,
1.0f - ood_score_, // Invert: OOD high = bad, PHS high = good
output_health_,
cross_modal_health_,
track_health_,
calibration_health_,
localization_health_,
temporal_health_
);
// Update ODD state machine
std::map<std::string, ODDState> param_states;
param_states["perception_health"] = phs_to_odd(phs_result.phs_smoothed);
param_states["cross_modal"] = score_to_odd(cross_modal_health_);
param_states["calibration"] = score_to_odd(calibration_health_);
param_states["localization"] = score_to_odd(localization_health_);
auto odd_result = odd_sm_.update(param_states, 0.1f);
// Publish results
publish_phs(phs_result);
publish_odd(odd_result);
publish_response(phs_result, odd_result);
publish_diagnostics(phs_result, odd_result);
}
// Member variables
std::mutex mutex_;
// Monitors
PointDensityMonitor density_monitor_;
IntensityDistMonitor intensity_monitor_;
SpatialCoverageMonitor coverage_monitor_;
DetectionCountMonitor detection_monitor_;
BBoxSizeMonitor bbox_monitor_;
ClassDistributionMonitor class_monitor_;
TrackConsistencyMonitor track_monitor_;
LidarRadarConsistency cross_modal_;
CalibrationDriftMonitor calibration_monitor_;
GTSAMInnovationMonitor gtsam_monitor_;
TemporalDriftDetector temporal_detector_;
// Aggregation
PerceptionHealthScore phs_;
ODDStateMachine odd_sm_;
// Current scores (updated by callbacks)
float input_health_ = 1.0f;
float ood_score_ = 0.0f;
float output_health_ = 1.0f;
float cross_modal_health_ = 1.0f;
float track_health_ = 1.0f;
float calibration_health_ = 1.0f;
float localization_health_ = 1.0f;
float temporal_health_ = 1.0f;
// ROS handles
ros::Subscriber sub_pointcloud_, sub_detections_, sub_radar_;
ros::Subscriber sub_gtsam_, sub_tracks_;
ros::Publisher pub_phs_, pub_odd_state_, pub_response_, pub_diagnostics_;
ros::Timer timer_;
};
} // namespace perception_monitor
PLUGINLIB_EXPORT_CLASS(perception_monitor::PerceptionMonitorNodelet,
nodelet::Nodelet)
12.4 Launch File
<!-- perception_monitor.launch -->
<launch>
<!-- Load reference distributions for this airport -->
<arg name="airport" default="heathrow"/>
<arg name="condition" default="auto"/> <!-- auto, day_dry, night_wet, etc. -->
<rosparam command="load"
file="$(find perception_monitor)/config/$(arg airport)_reference.yaml"/>
<rosparam command="load"
file="$(find perception_monitor)/config/thresholds.yaml"/>
<!-- Nodelet manager (share with perception for zero-copy) -->
<node pkg="nodelet" type="nodelet" name="perception_monitor"
args="load perception_monitor/PerceptionMonitorNodelet
/perception_nodelet_manager"
output="screen">
<param name="airport" value="$(arg airport)"/>
<param name="condition" value="$(arg condition)"/>
</node>
<!-- Rosbag logger for monitoring data (lightweight) -->
<node pkg="rosbag" type="record" name="monitor_logger"
args="/monitor/perception_health
/monitor/odd_state
/monitor/response_action
/monitor/diagnostics
-o /data/logs/monitor_
--split --duration=3600">
</node>
</launch>
12.5 Power and Thermal Considerations
The monitoring layer adds minimal power draw:
| Component | Power Draw | Notes |
|---|---|---|
| GPU monitors | ~0.5W additional | Marginal work in existing CUDA context |
| CPU monitors (4 threads) | ~1.0W additional | Light computation, mostly idle |
| Memory | Negligible | ~700 KB total |
| Total | ~1.5W | <3% of Orin's 50W MAXN budget |
Thermal impact is negligible. The monitoring layer does not contribute meaningfully to GPU or CPU thermal load.
13. Certification and Safety Case
13.1 Standards Mapping
| Standard | Requirement | How This System Addresses It |
|---|---|---|
| ISO 3691-4:2023 Section 4.12 | Sensor performance monitoring | Input distribution + sensor health = continuous performance verification |
| ISO 3691-4:2023 Section 4.8 | Safe state on detection of unsafe condition | ODD state machine SUSPENDED = safe stop |
| ISO 3691-4:2023 Section 4.10 | Speed limitation | PHS-proportional speed reduction |
| UL 4600 Section 10.6 | Monitor ML model behavior | Output consistency CUSUM/EWMA + OOD detection |
| UL 4600 Section 10.3 | Runtime monitoring framework | Complete multi-monitor PHS architecture |
| UL 4600 Section 10.7 | Self-diagnostic reporting | Diagnostic topic + rosbag logging |
| UL 4600 Section 12.2 | Update validation | Temporal anomaly detection catches post-OTA regression |
| EU AI Act Article 9(8) | Monitoring post-deployment | Fleet PHS comparison + temporal trends |
| EU AI Act Article 14 | Human oversight | Teleop escalation protocol |
| EU AI Act Article 15(4) | Accuracy, robustness monitoring | PHS aggregation + ODD enforcement |
| ISO 21448 (SOTIF) | Triggering conditions for functional insufficiency | OOD detection identifies triggering conditions at runtime |
| ISO 26262 ASIL | Independence of monitoring from monitored function | Separate nodelet, separate CUDA stream, independent computation path |
| EU Machinery Reg. 2023/1230 Annex III 1.2.1 | Self-monitoring safety function | PHS = quantitative self-monitoring metric |
13.2 Independence Argument
For the monitoring to be credible as a safety function, it must be independent from the perception it monitors:
| Aspect | Perception Pipeline | Monitoring Layer | Independence |
|---|---|---|---|
| Code | PointPillars C++ / TensorRT | Separate C++ nodelet | Separate codebase, separate build |
| GPU execution | CUDA Stream 0 | CUDA Stream 1 | Stream-level isolation |
| CPU execution | Perception threads | Monitor threads (separate cores) | Core affinity enforced |
| Data flow | Publishes detections | Subscribes to detections | Read-only access to perception |
| Failure mode | GPU hang, model crash | CPU-only fallback viable | CPU monitors continue if GPU fails |
| Development team | Perception engineers | Safety engineers | Organizational independence |
13.3 Evidence Collection
The monitoring system automatically generates certification evidence:
| Evidence Type | Source | Rate |
|---|---|---|
| PHS time series | /monitor/perception_health | 10 Hz |
| ODD state transitions | /monitor/odd_state | Event-based |
| Response actions taken | /monitor/response_action | Event-based |
| Monitor component details | /monitor/diagnostics | 10 Hz |
| Threshold calibration records | Config YAML + commissioning | Once per airport |
| False alarm analysis | Offline fleet analytics | Monthly |
| Detection of known degradation | Validation test results | Per-release |
| WCET timing analysis | Per-monitor profiling | Per-release |
| Monitor coverage analysis | Which degradation modes are detectable | Per-release |
13.4 Safety Integrity Considerations
The perception monitoring layer is part of the safety function chain. Its required integrity level depends on the ASIL decomposition:
Full System Safety Requirement: ASIL B(D) (from ISO 26262 decomposition, see fail-operational-architecture.md)
Decomposition:
- Neural Perception + Planning (AC): ASIL QM (monitored)
- Classical Frenet (BC): ASIL B
- Safety Monitor (including PHS): ASIL B
- Hardware Safety MCU: ASIL D
The Perception Monitor contributes to the Safety Monitor (ASIL B):
- Systematic capability: SC 2 (ISO 26262 Part 4)
- Random hardware failure: N/A (software-only)
- Diagnostic coverage: >90% (based on Section 1 threat model)
- Latency: <5 ms (verified by profiling)
- Availability: 99.99% (monitor runs on CPU even if GPU fails)
13.5 Validation Protocol
To validate the monitoring system itself:
- Fault injection testing: Inject known degradation conditions (reduce point cloud density, shift intensity distribution, inject wrong bounding boxes, corrupt calibration) and verify that the monitoring system detects each within its specified latency.
- False positive analysis: Run the monitors on extensive nominal driving data (1,000+ hours) and measure false alarm rates. Target: <0.1% false RESTRICTED, <0.01% false SUSPENDED.
- Coverage analysis: For each degradation mode in Section 1.1, verify that at least one monitor detects it. Document any residual undetectable modes.
- Timing verification: Profile each monitor component on Orin under worst-case load. Verify the total stays <5 ms at the 99.9th percentile.
```python
# fault_injection_validation.py
class FaultInjectionTest:
    """Systematic validation of perception monitoring."""

    test_cases = [
        {
            'name': 'gradual_point_reduction',
            'injection': 'reduce_point_count_by_50pct_over_60s',
            'expected_detection': 'input_distribution KL alarm',
            'max_detection_latency_s': 15.0,
        },
        {
            'name': 'sudden_intensity_shift',
            'injection': 'multiply_intensity_by_0.5',
            'expected_detection': 'input intensity Wasserstein alarm',
            'max_detection_latency_s': 3.0,
        },
        {
            'name': 'calibration_drift_0.2deg',
            'injection': 'rotate_lidar_2_extrinsic_by_0.2_deg',
            'expected_detection': 'calibration MI alarm',
            'max_detection_latency_s': 30.0,
        },
        {
            'name': 'false_detection_injection',
            'injection': 'add_10_random_ghost_detections',
            'expected_detection': 'detection_count CUSUM high alarm',
            'max_detection_latency_s': 5.0,
        },
        {
            'name': 'class_confusion',
            'injection': 'relabel_30pct_gse_as_unknown',
            'expected_detection': 'class_distribution chi2 alarm',
            'max_detection_latency_s': 10.0,
        },
        {
            'name': 'cross_modal_disagreement',
            'injection': 'shift_radar_detections_by_3m',
            'expected_detection': 'cross_modal agreement < 0.5',
            'max_detection_latency_s': 3.0,
        },
        {
            'name': 'confidence_score_inflation',
            'injection': 'add_0.15_to_all_confidences',
            'expected_detection': 'confidence EWMA alarm',
            'max_detection_latency_s': 20.0,
        },
        {
            'name': 'track_fragmentation',
            'injection': 'drop_every_3rd_detection_frame',
            'expected_detection': 'track_consistency fragmentation alarm',
            'max_detection_latency_s': 10.0,
        },
    ]

    def run_all(self, rosbag_path: str) -> dict:
        """Run all fault injection tests on a recorded rosbag."""
        results = {}
        for tc in self.test_cases:
            # run_single replays the bag with the injection applied and
            # returns the elapsed time until the expected alarm fires.
            detection_time = self.run_single(rosbag_path, tc)
            passed = detection_time <= tc['max_detection_latency_s']
            results[tc['name']] = {
                'passed': passed,
                'detection_time_s': detection_time,
                'max_allowed_s': tc['max_detection_latency_s'],
            }
        return results
```
14. Implementation Roadmap
14.1 Phased Deployment
| Phase | Duration | Cost | Deliverables | PHS Coverage |
|---|---|---|---|---|
| Phase 1: Core Monitors | 4 weeks | $8-12K | Input distribution (KL, Wasserstein), output CUSUM/EWMA, PHS aggregation, ODD state machine | 50% of degradation modes |
| Phase 2: Cross-Modal | 3 weeks | $7-10K | LiDAR-radar agreement, multi-LiDAR overlap, GTSAM innovation monitor | 70% of degradation modes |
| Phase 3: OOD + Calibration | 3 weeks | $8-12K | Energy OOD, Mahalanobis OOD, calibration MI, ground plane consistency | 85% of degradation modes |
| Phase 4: Temporal + Fleet | 2 weeks | $5-8K | Minute/hour temporal analysis, fleet comparison, hourly trend detector | 90%+ of degradation modes |
| Phase 5: Validation | 3 weeks | $7-10K | Fault injection suite, false alarm analysis, timing verification, FMEA | Validation evidence |
| Total | 15 weeks | $35-52K | Complete monitoring system | 90%+ coverage |
14.2 Prerequisites
- PointPillars backbone must expose intermediate BEV features via shared memory or ROS topic (for feature-space monitors)
- Continental ARS548 radar integration (for cross-modal checks; if radar not yet installed, Phase 2 is LiDAR-only)
- GTSAM must publish innovation/residual data (minor modification to existing localization node)
- Commissioning data collection process (2 weeks of nominal operation at target airport)
14.3 Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| High false alarm rate | Medium | Operators distrust the system | Extensive commissioning calibration; EWMA smoothing; configurable thresholds |
| Undetectable degradation mode | Low | Silent failure | Defense-in-depth: multiple independent monitors; fault injection testing |
| Compute budget overrun | Low | Monitoring slows perception | Profiling-driven development; monitors are individually budget-capped |
| Reference distribution staleness | Medium | Thresholds become loose/tight | Monthly reference update from fleet data; automated recalibration |
| GPU failure kills GPU-side monitors | Low | Loss of OOD detection | CPU-only fallback: input + output + cross-modal monitors still functional |
15. Key Takeaways
Perception can fail silently while sensors are healthy: Domain shift, model staleness, adversarial natural conditions, and novel object types cause neural network degradation that hardware diagnostics cannot detect. This is the gap between sensor health monitoring and runtime verification.
Input distribution monitoring catches problems before outputs degrade: KL divergence on point density histograms (0.05ms), Wasserstein distance on intensity distributions (0.02ms), and BEV feature statistics (0.2ms GPU) detect when inputs have shifted from the training distribution before the model starts producing errors.
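The histogram KL check reduces to a few lines; the bin counts and smoothing constant below are illustrative, not the deployed configuration.

```python
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between two histograms given as raw bin counts.

    Additive smoothing (eps) avoids log(0) when a bin is empty in
    either the live or the reference histogram.
    """
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

At runtime, `p_counts` would be the live point-density histogram for the current window and `q_counts` the commissioning-time reference; an alarm fires when the divergence exceeds the calibrated threshold.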
Output consistency checking is ground-truth-free: CUSUM on detection counts, EWMA on bounding box sizes, chi-squared on class distributions, and track fragmentation rates detect anomalous output patterns without needing labels. The CUSUM detector is particularly effective for the "sudden drop in detections" failure mode that is most dangerous.
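A two-sided CUSUM over per-frame detection counts is compact enough to show in full; the slack `k` and decision threshold `h` values here are generic textbook defaults, not the commissioned thresholds.

```python
class CusumDetector:
    """Two-sided CUSUM change detector (Page, 1954)."""

    def __init__(self, target_mean, k=0.5, h=5.0):
        self.mu = target_mean  # expected value under nominal conditions
        self.k = k             # slack: ignore deviations smaller than k
        self.h = h             # decision threshold for raising an alarm
        self.g_pos = 0.0
        self.g_neg = 0.0

    def update(self, x):
        """Feed one sample; return True if a shift is detected."""
        self.g_pos = max(0.0, self.g_pos + (x - self.mu - self.k))
        self.g_neg = max(0.0, self.g_neg + (self.mu - x - self.k))
        return self.g_pos > self.h or self.g_neg > self.h
```

The one-sided `g_neg` statistic is the path that catches the dangerous "sudden drop in detections" mode: a sustained deficit accumulates linearly and crosses `h` within a few frames.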
Cross-modal consistency is the most powerful monitor: Agreement between independently derived perception channels (LiDAR vs radar, multi-LiDAR overlap, tracker predictions vs observations) provides a proxy for correctness that requires no reference distribution and catches novel failure modes.
The Perception Health Score (PHS) aggregates everything into one actionable number: Weighted geometric mean of 8 component monitors, ranging from 0 (no confidence) to 1 (full confidence). Drives continuous speed reduction curve, margin adjustment, mode transitions, and teleop escalation.
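A minimal sketch of the weighted geometric mean, assuming equal weights by default; the epsilon floor (an implementation assumption) keeps a single zero-scored monitor from producing `log(0)` while still collapsing the PHS toward zero.

```python
import math

def perception_health_score(scores, weights=None):
    """Weighted geometric mean of monitor scores, each in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(scores)
    assert len(scores) == len(weights)
    eps = 1e-6  # floor so a hard-zero component cannot produce log(0)
    total_w = sum(weights)
    log_sum = sum(w * math.log(max(s, eps))
                  for s, w in zip(scores, weights))
    return math.exp(log_sum / total_w)
```

The geometric (rather than arithmetic) mean is the important design choice: one failing monitor drags the aggregate down sharply instead of being averaged away by seven healthy ones.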
The ODD state machine provides hysteresis: Fast degradation (immediate), slow recovery (30-120 seconds of sustained improvement). Prevents oscillation at boundaries. SUSPENDED state requires human acknowledgment.
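The asymmetric hysteresis can be sketched as a two-state fragment of the full machine; the thresholds and tick count below are placeholders, and only the NOMINAL/RESTRICTED pair is modeled.

```python
class OddStateMachine:
    """Fast-degrade / slow-recover hysteresis (illustrative subset)."""

    def __init__(self, degrade_below=0.5, recover_above=0.6,
                 recovery_ticks=30):
        self.degrade_below = degrade_below
        self.recover_above = recover_above
        self.recovery_ticks = recovery_ticks  # sustained healthy updates
        self.state = 'NOMINAL'
        self._streak = 0

    def update(self, phs):
        if phs < self.degrade_below:
            self.state = 'RESTRICTED'  # degrade immediately, no delay
            self._streak = 0
        elif self.state == 'RESTRICTED':
            if phs >= self.recover_above:
                self._streak += 1  # recovery must be sustained
                if self._streak >= self.recovery_ticks:
                    self.state = 'NOMINAL'
                    self._streak = 0
            else:
                self._streak = 0  # any dip restarts the recovery clock
        return self.state
```

Note the recovery threshold sits above the degradation threshold, so a PHS hovering at the boundary cannot toggle the state every frame.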
Complete monitoring fits in <5ms and 700 KB on Orin: GPU monitors piggyback on existing perception (0.8ms additional GPU in separate stream), CPU monitors run concurrently on spare cores (1.2ms across 4 threads). Total effective added latency to the perception pipeline is <1ms.
Simplex integration is natural: PHS > 0.5 = neural AC, PHS 0.3-0.5 = classical BC (Frenet), PHS < 0.3 = safe stop, PHS < 0.1 = hardware E-stop. The Frenet planner is always available as fallback because it does not use neural networks.
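The PHS-to-mode mapping above is a pure threshold function; a sketch with the stated breakpoints (mode names are illustrative):

```python
def select_control_mode(phs: float) -> str:
    """Map the Perception Health Score to a Simplex control mode."""
    if phs < 0.1:
        return 'HARDWARE_ESTOP'   # perception wholly untrustworthy
    if phs < 0.3:
        return 'SAFE_STOP'        # controlled stop via classical stack
    if phs <= 0.5:
        return 'CLASSICAL_BC'     # Frenet baseline controller
    return 'NEURAL_AC'            # nominal neural advanced controller
```

In practice this function would be composed with the ODD state machine's hysteresis rather than acting on the raw PHS, so a momentary dip does not flap between controllers.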
Temporal monitoring catches gradual degradation: Per-minute CUSUM on aggregated statistics detects drift over minutes to hours. Hourly trend analysis catches seasonal performance changes. Fleet comparison distinguishes vehicle-specific from environmental degradation.
Calibration drift detection prevents a common hidden failure: Mutual information between overlapping LiDAR FOVs, ground plane consistency, and static object position consistency detect extrinsic calibration drift before it causes perception errors. A 0.1 degree drift creates 17 cm error at 100m.
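The 17 cm figure follows directly from the small-angle geometry of an extrinsic rotation error; a quick check:

```python
import math

def extrinsic_error_m(range_m: float, drift_deg: float) -> float:
    """Lateral position error at a given range from an angular
    extrinsic drift: error = range * tan(drift)."""
    return range_m * math.tan(math.radians(drift_deg))

# 0.1 degree of drift at 100 m range gives roughly 0.17 m of error.
```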
Implementation cost $35-52K over 15 weeks: Phased from core monitors (4 weeks, $8-12K) through validation (3 weeks, $7-10K). No additional hardware required. Phase 1 alone provides 50% degradation mode coverage.
Certification evidence is generated automatically: Every PHS computation, ODD state transition, and response action is logged at 10 Hz. Fault injection validation suite provides systematic coverage evidence. Standards mapping covers ISO 3691-4, UL 4600, EU AI Act, and SOTIF.
16. References
Academic
- Liu, W. et al. (2020). Energy-based Out-of-Distribution Detection. NeurIPS 2020. [Energy OOD scoring]
- Lee, K. et al. (2018). A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks. NeurIPS 2018. [Mahalanobis distance OOD]
- Page, E.S. (1954). Continuous Inspection Schemes. Biometrika. [CUSUM algorithm]
- Roberts, S.W. (1959). Control Chart Tests Based on Geometric Moving Averages. Technometrics. [EWMA algorithm]
- Vovk, V. et al. (2005). Algorithmic Learning in a Random World. Springer. [Conformal prediction foundations]
- Gretton, A. et al. (2012). A Kernel Two-Sample Test. JMLR. [MMD for distribution comparison]
- Hendrycks, D. and Gimpel, K. (2017). A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. ICLR 2017.
- Sun, Y. et al. (2021). ReAct: Out-of-Distribution Detection with Rectified Activations. NeurIPS 2021.
- Angelopoulos, A. and Bates, S. (2022). A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. Foundations and Trends in Machine Learning.
- Yang, J. et al. (2024). Generalized Out-of-Distribution Detection: A Survey. International Journal of Computer Vision.
Standards
- ISO 3691-4:2023. Industrial trucks -- Safety requirements and verification -- Part 4: Driverless industrial trucks and their systems.
- UL 4600:2022. Standard for Evaluation of Autonomous Products.
- ISO 21448:2022. Road vehicles -- Safety of the intended functionality.
- ISO 26262:2018. Road vehicles -- Functional safety.
- EU AI Act (Regulation 2024/1689).
- EU Machinery Regulation 2023/1230.
Repository Cross-References
- 20-av-platform/sensors/sensor-degradation-health-monitoring.md -- Physical sensor health diagnostics
- 60-safety-validation/runtime-assurance/runtime-verification-monitoring.md -- STL temporal logic monitors, shield synthesis
- 60-safety-validation/runtime-assurance/weather-adaptive-odd-management.md -- Environmental ODD state machine, METAR integration
- 60-safety-validation/runtime-assurance/simplex-safety-architecture.md -- Dual-stack AC/BC architecture
- 60-safety-validation/runtime-assurance/fail-operational-architecture.md -- ASIL decomposition, redundancy patterns
- 30-autonomy-stack/perception/overview/uncertainty-quantification-calibration.md -- Epistemic/aleatoric decomposition, conformal prediction
- 20-av-platform/sensors/multi-lidar-calibration.md -- Calibration procedures, thermal drift compensation
- 30-autonomy-stack/localization-mapping/overview/mapping-and-localization.md -- GTSAM factor graph, innovation sequences
- 20-av-platform/sensors/4d-radar.md -- Continental ARS548 integration
- 60-safety-validation/standards-certification/functional-safety-software.md -- MISRA C, ISO 26262 Part 6 development
- 60-safety-validation/verification-validation/airside-scenario-taxonomy.md -- ISO 34502 scenario classification
- 50-cloud-fleet/observability/fleet-anomaly-root-cause-attribution.md -- Fleet-level anomaly attribution
- 30-autonomy-stack/localization-mapping/maps/realtime-occupancy-grid-mapping.md -- Occupancy grid monitoring context
- 60-safety-validation/verification-validation/testing-validation-methodology.md -- Validation protocol, fault injection