
Vehicle Thermal Management

Last updated: 2026-05-09

Why It Matters

Thermal management is an autonomy availability and safety issue, not a comfort feature. Compute throttling increases perception latency, hot SSDs lose sustained write performance, cold sensors drift or fog, LiDAR windows ice over, and sealed electronics can fail from condensation after washdown or de-icing exposure.

AV platforms should treat heat as a first-class resource alongside power, network bandwidth, and compute. The thermal system needs a heat budget, thermal instrumentation, a derating ladder, a service procedure, and evidence that worst-case ODD conditions do not silently invalidate timing, recording, or perception assumptions.

Architecture Decisions

Decision | Practical rule
Thermal zoning | Separate compute/recorder cooling, sensor-window conditioning, battery/HV cooling, cabin/service electronics, and environmental enclosure control.
Cooling method | Use passive or fanless conduction for low-power sealed controllers, forced air only where filtered service access is acceptable, and liquid cooling for dense GPU/recorder loads.
Derating | Tie thermal state to ODD restrictions: reduce model rate, disable nonessential logging, shed heaters, reduce speed, or safe stop before watchdog budgets are violated.
Condensation | Monitor humidity and dew point inside sealed enclosures; use venting, heaters, conformal coating, drains, and warm-up rules where required.
Sensor windows | Heat and clean optical surfaces based on visibility need, not a static timer. Thermal camera, visible camera, LiDAR, and radar covers have different failure signatures.
Storage | Keep NVMe drives inside a temperature and airflow/liquid-cooling envelope that preserves sustained write during event capture.
Service | Make coolant, fan, filter, pump, and heat-spreader maintenance visible in diagnostics and fleet scheduling.
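The condensation rule depends on knowing the dew point inside each sealed enclosure. A minimal sketch, assuming a Magnus-type approximation with the commonly cited coefficients b = 17.62 and c = 243.12 °C, plus a hypothetical 3 °C required margin between the coldest internal surface and the dew point. `dew_point_c`, `condensation_margin_c`, and `startup_allowed` are illustrative names, not an existing API.

```python
import math

# Magnus approximation coefficients over liquid water (b dimensionless, c in °C).
MAGNUS_B = 17.62
MAGNUS_C = 243.12
MIN_MARGIN_C = 3.0  # assumed warm-up policy: coldest surface must sit 3 °C above dew point


def dew_point_c(air_temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point (°C) from enclosure air temperature and relative humidity."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100] percent")
    gamma = math.log(rel_humidity_pct / 100.0) + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def condensation_margin_c(coldest_surface_c: float, air_temp_c: float,
                          rel_humidity_pct: float) -> float:
    """Positive: coldest surface is above the dew point. Negative: condensation is likely."""
    return coldest_surface_c - dew_point_c(air_temp_c, rel_humidity_pct)


def startup_allowed(coldest_surface_c: float, air_temp_c: float,
                    rel_humidity_pct: float) -> bool:
    """Hypothetical warm-up gate: hold power-up until the dew-point margin is sufficient."""
    return condensation_margin_c(coldest_surface_c, air_temp_c, rel_humidity_pct) >= MIN_MARGIN_C


print(round(dew_point_c(25.0, 60.0), 1))  # 16.7
print(startup_allowed(10.0, 25.0, 60.0))  # False
```

With enclosure air at 25 °C and 60 % RH, the dew point is roughly 16.7 °C, so a board that is still at 10 °C after a cold soak keeps the gate closed until the enclosure heater has warmed it.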

Thermal control should publish a typed state used by autonomy and fleet operations:

NORMAL -> WATCH -> DERATED -> SAFE_STOP -> SERVICE_REQUIRED

Inputs:
SoC temp, GPU clocks, SSD temp, coolant inlet/outlet, pump tach,
fan tach, enclosure humidity, branch current, sensor-window temp,
ambient temp, solar load estimate, and heater state.
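One way to publish this typed state is to classify each input against ordered thresholds and take the worst result. The sketch below assumes hypothetical limits for SoC, SSD, and coolant outlet temperature, a fixed 3 °C hysteresis so the state does not chatter at a boundary, and an assumed policy in which a cooling hardware fault forces at least DERATED and escalates to SERVICE_REQUIRED once the vehicle has reached SAFE_STOP. The action names in `ACTIONS` are placeholders for the derating ladder, not a real autonomy-stack interface.

```python
from dataclasses import dataclass
from enum import IntEnum


class ThermalState(IntEnum):
    # Ordered so that max() returns the most severe state.
    NORMAL = 0
    WATCH = 1
    DERATED = 2
    SAFE_STOP = 3
    SERVICE_REQUIRED = 4


@dataclass(frozen=True)
class Limits:
    watch_c: float
    derate_c: float
    safe_stop_c: float


# Hypothetical limits; real values come from datasheets and hot-soak testing.
LIMITS = {
    "soc_temp_c": Limits(85.0, 95.0, 105.0),
    "ssd_temp_c": Limits(60.0, 70.0, 80.0),
    "coolant_outlet_c": Limits(55.0, 65.0, 75.0),
}
HYSTERESIS_C = 3.0  # assumed value

# Placeholder derating-ladder actions per state.
ACTIONS = {
    ThermalState.NORMAL: [],
    ThermalState.WATCH: ["alert_fleet"],
    ThermalState.DERATED: ["reduce_model_rate", "disable_nonessential_logging",
                           "shed_heaters", "reduce_speed", "alert_fleet"],
    ThermalState.SAFE_STOP: ["safe_stop", "alert_fleet"],
    ThermalState.SERVICE_REQUIRED: ["block_mission", "schedule_service", "alert_fleet"],
}


def level_for(value, lim, previous):
    """Severity implied by one reading. While the overall state is already at a level,
    the reading only needs to stay within HYSTERESIS_C of that level's threshold."""
    def at_or_above(threshold, state):
        if previous >= state:
            return value >= threshold - HYSTERESIS_C
        return value >= threshold

    if at_or_above(lim.safe_stop_c, ThermalState.SAFE_STOP):
        return ThermalState.SAFE_STOP
    if at_or_above(lim.derate_c, ThermalState.DERATED):
        return ThermalState.DERATED
    if at_or_above(lim.watch_c, ThermalState.WATCH):
        return ThermalState.WATCH
    return ThermalState.NORMAL


def classify(readings, cooling_fault, previous=ThermalState.NORMAL):
    """Worst state across all readings under the assumed fault policy described above.
    SERVICE_REQUIRED latches until a service action clears it (not modeled here)."""
    if previous == ThermalState.SERVICE_REQUIRED:
        return previous
    worst = ThermalState.DERATED if cooling_fault else ThermalState.NORMAL
    for name, value in readings.items():
        worst = max(worst, level_for(value, LIMITS[name], previous))
    if cooling_fault and previous == ThermalState.SAFE_STOP:
        return ThermalState.SERVICE_REQUIRED
    return worst


state = classify({"soc_temp_c": 96.0, "ssd_temp_c": 58.0}, cooling_fault=False)
print(state.name)  # DERATED
```

Taking the worst input means a single hot SSD can derate the vehicle, consistent with treating recorder write performance as an availability concern. Applying hysteresis against the overall previous state is deliberately conservative; per-input hysteresis would let the state relax sooner.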

Evidence Artifacts

  • Thermal budget for every heat source: compute, network switches, recorders, sensors, heaters, pumps, DC/DC converters, chargers, and sealed enclosures.
  • Cooling architecture drawing with heat paths, cold plates, TIM stackups, airflow, coolant lines, pumps, radiators, filters, drains, and leak sensors.
  • Hot-soak and cold-start test reports for the full mission profile, including idle, low-speed operation, charging, logging burst, and sensor cleaning.
  • SoC, GPU, DLA, SSD, and network-switch telemetry traces under maximum perception, planning, and recording load.
  • Derating ladder validation proving warnings occur before throttling breaks control or logging latency budgets.
  • Condensation and washdown test evidence for sealed compute and sensor enclosures.
  • Maintenance evidence: filter interval, coolant service interval, pump/fan lifetime, and leak-detection response.
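The thermal budget artifact reduces to a per-zone sum of worst-case dissipation compared against the cooling capacity available at the worst-case ODD ambient. A minimal sketch, assuming hypothetical zone names, wattages, capacities, and a 15 % minimum headroom; real numbers come from component datasheets and hot-soak measurements, not from this example.

```python
from dataclasses import dataclass

MIN_MARGIN = 0.15  # assumed policy: keep at least 15 % cooling headroom per zone


@dataclass(frozen=True)
class HeatSource:
    name: str
    zone: str
    worst_case_w: float  # sustained dissipation at maximum load, in watts


def zone_margins(sources, capacity_w):
    """Return {zone: (load_w, capacity_w, margin)} with margin = (capacity - load) / capacity."""
    loads = {}
    for src in sources:
        loads[src.zone] = loads.get(src.zone, 0.0) + src.worst_case_w
    missing = set(loads) - set(capacity_w)
    if missing:
        # A heat source with no declared cooling path is itself a budget error.
        raise KeyError(f"no cooling capacity declared for zones: {sorted(missing)}")
    return {
        zone: (loads.get(zone, 0.0), cap, (cap - loads.get(zone, 0.0)) / cap)
        for zone, cap in capacity_w.items()
    }


def over_budget(margins, min_margin=MIN_MARGIN):
    """Zones whose headroom is below the required margin."""
    return sorted(zone for zone, (_, _, m) in margins.items() if m < min_margin)


# Hypothetical example values, not measured data.
sources = [
    HeatSource("gpu_module", "compute", 500.0),
    HeatSource("recorder", "compute", 60.0),
    HeatSource("network_switch", "compute", 40.0),
    HeatSource("window_heaters", "sensor_windows", 100.0),
    HeatSource("dcdc_converter", "power", 90.0),
]
capacity = {"compute": 750.0, "sensor_windows": 110.0, "power": 150.0}
margins = zone_margins(sources, capacity)
print(over_budget(margins))  # ['sensor_windows']
```

Capacity here is a single number per zone for brevity; in practice it falls as ambient temperature and solar load rise, so the check would be repeated for each corner of the ODD thermal profile.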

Acceptance Checks

  • Worst-case ODD thermal profile runs without uncontrolled compute throttling or recorder write collapse.
  • Any thermal derate produces an explicit autonomy state change and fleet alert.
  • Safe-stop timing remains valid during hot soak, cold start, and power-limited operation.
  • Sensor-window heaters can clear expected fog, frost, ice, or water film within the approved ODD startup time.
  • A pump, fan, or coolant-flow fault is detected before temperatures exceed the approved operating envelope.
  • Condensation inside sealed electronics is either prevented or detected before power-up into a high-risk state.
  • Thermal service tasks are tied to logged counters, not only calendar time.
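Tying service to logged counters can be as simple as checking a usage counter first and a calendar backstop second. A sketch under assumed data shapes: `ServiceRule`, `ServiceRecord`, and `due_tasks` are hypothetical, and the limits in the example are placeholders rather than recommended intervals.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServiceRule:
    task: str
    usage_limit: float  # e.g. fan or pump run-hours since last service
    max_days: int       # calendar backstop for slow degradation such as coolant ageing


@dataclass(frozen=True)
class ServiceRecord:
    task: str
    usage_since_service: float
    last_service: date


def due_tasks(rules, records, today):
    """Return (task, reason) pairs; the usage counter is checked before the calendar."""
    by_task = {rec.task: rec for rec in records}
    due = []
    for rule in rules:
        rec = by_task.get(rule.task)
        if rec is None:
            due.append((rule.task, "no service record"))
        elif rec.usage_since_service >= rule.usage_limit:
            due.append((rule.task, "usage counter"))
        elif (today - rec.last_service).days >= rule.max_days:
            due.append((rule.task, "calendar"))
    return due


rules = [
    ServiceRule("compute_air_filter", usage_limit=500.0, max_days=180),
    ServiceRule("coolant", usage_limit=4000.0, max_days=730),
    ServiceRule("pump", usage_limit=10000.0, max_days=1825),
]
records = [
    ServiceRecord("compute_air_filter", 520.0, date(2026, 1, 1)),
    ServiceRecord("coolant", 1000.0, date(2024, 1, 1)),
]
print(due_tasks(rules, records, date(2026, 5, 9)))
```

A missing record is reported as due rather than silently skipped, so a newly installed pump or fan without logged counters shows up in fleet scheduling.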

Failure Modes

Failure mode | Detection | Safe response
Compute thermal throttling | SoC temp, clock-rate drop, over-current or TDP throttle flag | Reduce nonessential workloads and speed; safe stop if timing budget is at risk.
SSD write cliff | Drive temp, SMART data, write latency, recorder queue growth | Reduce bulk logging, preserve DSSAD/event stream, alert fleet.
Pump or coolant failure | Flow sensor, pump tach, inlet/outlet delta, leak sensor | Derate compute, stop mission if cooling reserve is insufficient.
Fan or filter blockage | Fan tach, pressure/airflow proxy, rising enclosure temp | Shed load, schedule service, avoid dusty/wet ODD if coverage depends on it.
Condensation | Humidity/dew point, enclosure temp crossing dew point, leakage current | Delay startup, warm enclosure, block mission if safety electronics are affected.
Sensor heater stuck on | Branch current high, local temperature high | Isolate heater, restrict adverse-weather ODD, inspect window and branch wiring.
Sensor heater stuck off | Window temp low, perception quality degradation, current low | Restrict fog/ice/rain ODD and route to service.
Solar load underestimation | Ambient/enclosure divergence, repeated derates during idle | Update ODD thermal model and add shielding or cooling margin.
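Pump or coolant failure detection can cross-check the flow sensor, pump tach, and inlet/outlet delta against the steady-state heat balance Q = ρ · V̇ · c_p · (T_out - T_in). The sketch assumes approximate properties for a 50/50 ethylene-glycol/water coolant near 40 °C and hypothetical thresholds; the function names are illustrative. The heat-balance check should only run once temperatures have settled, because during warm-up heat goes into thermal mass rather than into the coolant.

```python
# Approximate 50/50 ethylene-glycol/water properties near 40 °C (assumed; use supplier data).
COOLANT_DENSITY_KG_PER_L = 1.06
COOLANT_CP_J_PER_KG_K = 3500.0


def heat_removed_w(flow_l_per_min, inlet_c, outlet_c):
    """Heat carried away by the loop: Q = rho * V_dot * c_p * (T_out - T_in)."""
    mass_flow_kg_per_s = flow_l_per_min / 60.0 * COOLANT_DENSITY_KG_PER_L
    return mass_flow_kg_per_s * COOLANT_CP_J_PER_KG_K * (outlet_c - inlet_c)


def coolant_faults(flow_l_per_min, inlet_c, outlet_c, pump_rpm, electrical_load_w,
                   min_flow_l_per_min=4.0, min_pump_rpm=1000.0, min_capture_ratio=0.6):
    """Return fault tags; all thresholds are hypothetical placeholders."""
    faults = []
    if pump_rpm < min_pump_rpm:
        faults.append("pump_tach_low")
    if flow_l_per_min < min_flow_l_per_min:
        faults.append("flow_low")
    # At steady state most cold-plate power should appear as coolant heat; a large
    # shortfall points to a flow or temperature sensor fault, or heat not reaching the loop.
    if electrical_load_w > 0:
        captured = heat_removed_w(flow_l_per_min, inlet_c, outlet_c)
        if captured < min_capture_ratio * electrical_load_w:
            faults.append("heat_balance_low")
    return faults


print(round(heat_removed_w(6.0, 40.0, 44.0)))  # 1484
print(coolant_faults(6.0, 40.0, 44.0, pump_rpm=2500.0, electrical_load_w=1500.0))  # []
print(coolant_faults(1.0, 40.0, 44.0, pump_rpm=200.0, electrical_load_w=1500.0))
# ['pump_tach_low', 'flow_low', 'heat_balance_low']
```

Checking the heat balance in addition to tach and flow catches cases where one sensor looks plausible while the loop is not actually carrying the load.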
