
Kodiak AI -- Non-ML and Hybrid-ML Perception Techniques

Exhaustive Technical Analysis of Classical Computer Vision, Signal Processing, and Hybrid Methods in the Kodiak Driver Perception Stack

March 2026


Table of Contents

Sensor Signal Processing

  1. Luminar Iris LiDAR Signal Processing
  2. Hesai 360-Degree LiDAR Signal Processing
  3. ZF 4D Imaging Radar Signal Processing
  4. Camera ISP Pipeline

Calibration

  1. SensorPod Calibration Architecture
  2. LiDAR-Camera Cross-Calibration
  3. Radar Calibration
  4. Online Calibration

Lightweight Map Classical Processing

  1. Sparse Map Generation
  2. Map-Disagreement Handling
  3. Real-Time Road Model
  4. Lane Detection

Highway-Specific Classical Methods

  1. Long-Range Processing
  2. Highway Merge Detection
  3. Construction Zone Processing
  4. Cut-In Detection

State Estimation and Tracking

  1. Highway-Speed Tracking
  2. Data Association
  3. Track Prediction
  4. Multi-Sensor Track Fusion

Dual Compute Classical Workload

  1. NVIDIA DRIVE Orin Classical Processing
  2. Ambarella CV3-AD685 Classical Processing
  3. NXP ACE Safety Computer

Military/Defense Perception

  1. Off-Road Perception
  2. Threat Detection
  3. GPS-Denied Navigation
  4. Degraded Sensor Operation

Safety Classical Components

  1. 1,000+ Safety Checks at 10 Hz
  2. ACE Fallback System
  3. Geometric Collision Checking

1. Luminar Iris LiDAR Signal Processing

The 1550nm InGaAs Detector Signal Chain

The Luminar Iris employs a 1550nm fiber laser source paired with an Indium Gallium Arsenide (InGaAs) photodetector -- a compound semiconductor material with a bandgap specifically tuned to absorb photons in the short-wave infrared (SWIR) range around 1550nm. The signal chain from laser emission to point cloud output is entirely classical signal processing:

Laser Emission: Two fiber laser tips emit at 1550nm. The 1550nm wavelength is eye-safe at 1,000,000x the pulse energy of 905nm systems, because the human cornea absorbs 1550nm light before it reaches the retina. This allows dramatically higher laser power, which directly determines maximum detection range.

Scanning: A 2-axis scanning mirror system steers the laser beam across a 120-degree horizontal by up to 28-degree vertical field of view. Critically, only the laser beam is scanned -- not the entire device -- allowing software-configurable scan patterns including both uniform and Gaussian density distributions. The scan pattern can be dynamically adjusted: denser at the horizon (where most objects of interest exist) and sparser at the top and bottom of the FOV.

Receiver and ASIC Processing: The returned photons strike the InGaAs photodetector, which converts them to electrical current via the photoelectric effect. This analog signal feeds into Luminar's custom-designed mixed-signal ASIC -- described by Luminar as "the most sensitive, highest dynamic range InGaAs receiver of its kind in the world." The ASIC performs:

Processing Stage | Classical Method | Purpose
Transimpedance amplification | Analog current-to-voltage conversion | Convert photodetector current to measurable voltage
Analog filtering | Bandpass filtering | Reject out-of-band noise
Time-of-flight measurement | Threshold detection / waveform analysis | Determine round-trip time to compute range
Multi-return detection | Peak detection on return waveform | Identify multiple reflecting surfaces per pulse
Reflectivity extraction | Amplitude measurement | Measure returned energy to compute per-point reflectivity
Noise rejection | Signal-to-noise thresholding | Reject returns below detection threshold

The ASIC acquires and processes gigabits of data per second, converting raw analog returns into digital point measurements. Luminar specifically chose a custom mixed-signal ASIC over off-the-shelf ADC chips for "better performance for significantly less cost."

Multi-Return Processing for Semi-Transparent Objects

Multi-return capability is a purely classical signal processing technique where the receiver analyzes the full return waveform from a single laser pulse rather than just the first or strongest peak. When a pulse hits a semi-transparent object (rain, fog, dust, vegetation, glass, chain-link fencing), part of the energy reflects back while the remaining energy continues to a solid surface behind it. The ASIC detects multiple peaks in the return waveform:

  • First return: The nearest reflecting surface (e.g., raindrops, fog particles, tree canopy)
  • Strongest return: The surface reflecting the most energy (typically the primary target)
  • Last return: The most distant reflecting surface (e.g., the ground behind vegetation)

This is critical for highway trucking perception because:

  • Rain and fog produce many first-returns from water droplets; multi-return processing lets the system "see through" precipitation to the solid objects behind
  • Semi-transparent objects like chain-link fences, highway guardrails with gaps, and construction netting are resolved as partial occluders rather than solid walls
  • Dust clouds in the Permian Basin oilfield operations scatter the laser; multi-return processing recovers the target behind the dust
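The peak-detection step can be sketched as follows; the synthetic waveform, Gaussian pulse shapes, and threshold are illustrative assumptions, not Luminar's actual detector logic:

```python
import numpy as np

def detect_returns(waveform, threshold):
    """Find local maxima above threshold in a digitized return waveform.

    Returns (first, strongest, last) sample indices, or None if no peaks.
    """
    peaks = [
        i for i in range(1, len(waveform) - 1)
        if waveform[i] > threshold
        and waveform[i] >= waveform[i - 1]
        and waveform[i] > waveform[i + 1]
    ]
    if not peaks:
        return None
    strongest = max(peaks, key=lambda i: waveform[i])
    return peaks[0], strongest, peaks[-1]

# Synthetic waveform: a weak return (e.g. fog) at sample 20,
# a strong return (e.g. a vehicle) at sample 80.
t = np.arange(200)
waveform = (0.3 * np.exp(-0.5 * ((t - 20) / 3) ** 2)
            + 1.0 * np.exp(-0.5 * ((t - 80) / 3) ** 2))
first, strongest, last = detect_returns(waveform, threshold=0.1)
```

A production detector would also resolve overlapping returns and report sub-sample peak times, but the first/strongest/last bookkeeping is the same.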

Per-Point Reflectivity Processing

Beyond (x, y, z) position, each point carries a reflectivity value -- a measurement of how much laser energy was reflected back to the detector. This is classical radiometric processing: the ASIC measures the amplitude of the return pulse relative to the transmitted pulse energy, accounting for the inverse-square range attenuation. The result is a camera-like reflectance image with >300 points per square degree resolution.

Reflectivity provides texture-like information that aids classical and ML-based classification:

  • Retroreflective road signs and lane markings appear very bright
  • Dark asphalt and dark-colored vehicles appear dim
  • Rubber tire treads on the road have different reflectivity than asphalt
  • The reflectivity contrast between lane markings and road surface enables classical lane detection from LiDAR data alone
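A minimal sketch of the inverse-square compensation described above, normalizing to an arbitrary reference range; the real radiometric model would also account for atmospheric attenuation and detector response:

```python
def calibrated_reflectivity(return_amplitude, tx_energy, range_m, ref_range=100.0):
    """Normalize return amplitude by transmitted energy and 1/R^2 falloff.

    Scaling to a reference range makes reflectivity range-independent.
    """
    return (return_amplitude / tx_energy) * (range_m / ref_range) ** 2

# Same surface at 100 m and 200 m: raw amplitude drops 4x with range,
# but the compensated reflectivity is identical.
r1 = calibrated_reflectivity(0.08, tx_energy=1.0, range_m=100.0)
r2 = calibrated_reflectivity(0.02, tx_energy=1.0, range_m=200.0)
```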

600m Range Processing

At maximum range (600m), the number of returned photons drops dramatically due to inverse-square law attenuation. The ASIC must perform extremely sensitive signal detection at the single-photon level. Classical techniques applied at long range include:

  • Matched filtering: Correlating the return waveform with the known transmitted pulse shape to maximize SNR
  • Temporal averaging: Accumulating returns from multiple pulses at the same scan angle to build up signal
  • Adaptive thresholding: Lowering detection thresholds at long range (accepting higher false alarm rate) and relying on temporal consistency (multiple scans confirming the same return) to validate detections

At 600m, a sedan may produce only a handful of LiDAR points. The signal processing chain must reliably distinguish these sparse, weak returns from noise floor fluctuations.
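A matched-filter sketch with a synthetic pulse and noise floor (pulse shape and SNR are chosen for illustration):

```python
import numpy as np

def matched_filter(received, pulse):
    """Correlate the received waveform with the known transmit pulse shape."""
    return np.correlate(received, pulse, mode="same")

rng = np.random.default_rng(0)
pulse = np.exp(-0.5 * ((np.arange(9) - 4) / 1.5) ** 2)  # known pulse shape
received = rng.normal(0, 0.1, 400)                       # noise floor
received[250:259] += 0.5 * pulse                         # weak echo centered at 254

score = matched_filter(received, pulse)
peak = int(np.argmax(score))                             # near sample 254
```

Because the filter integrates energy across the whole pulse, the correlation peak stands well above the noise even when individual samples do not.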


2. Hesai 360-Degree LiDAR Signal Processing

Mechanical Spinning LiDAR Signal Chain

The Hesai LiDARs (likely OT128 or similar 128-channel automotive-grade units) on the Kodiak trucks use a fundamentally different architecture from the Luminar Iris. These are spinning mechanical LiDARs: the entire optical assembly rotates 360 degrees to provide surround coverage.

Signal Processing Pipeline:

  1. Laser firing sequence: 128 laser diodes fire in a precisely timed sequence as the optical assembly rotates. Each channel has a fixed vertical angle offset, collectively spanning 40 degrees vertically. The firing timing is synchronized to the rotation encoder, ensuring consistent angular sampling.

  2. Time-of-flight measurement: Each channel has a dedicated receiver that measures the round-trip time of the laser pulse. At 905nm (Hesai's standard wavelength), silicon APD (Avalanche Photodiode) detectors convert photons to electrical signal.

  3. Dual-return processing: The Hesai OT128 supports dual-return mode, effectively doubling point cloud density from 3.45 million points/second (single return) to 6.91 million points/second (dual return). This is the same classical multi-return waveform analysis as described for Luminar, detecting multiple peaks in each return pulse.

  4. Point cloud assembly: Raw timing and angle measurements are converted to Cartesian (x, y, z) coordinates using the calibrated angular offsets and measured range. This is classical coordinate transformation: each point's position is computed from (range, azimuth_angle, elevation_angle) in the sensor frame.
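The coordinate transformation in step 4 can be sketched as follows; the axis convention is an assumption for the sketch (Hesai's actual frame definition may differ):

```python
import math

def polar_to_cartesian(range_m, azimuth_deg, elevation_deg):
    """Convert a (range, azimuth, elevation) measurement to sensor-frame x, y, z.

    Assumed convention: x forward, y left, z up; azimuth measured from +x in
    the horizontal plane, elevation measured up from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return x, y, z

# A point 100 m out, 30 degrees left, level with the sensor:
x, y, z = polar_to_cartesian(100.0, 30.0, 0.0)
```

In practice the per-channel calibrated angular offsets are added to the encoder azimuth before this transformation.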

Hesai Intelligent Point Cloud Engine (IPE)

Hesai's proprietary IPE is a significant classical signal processing subsystem embedded in the LiDAR sensor itself. The IPE operates at the firmware level, processing 24.6 billion samples per second, and performs:

IPE Function | Classical Method | Performance
Rain detection | Per-return waveform classification | Real-time pixel-level flagging
Fog detection | Return pattern analysis (diffuse vs. specular) | Pixel-level classification
Exhaust fume detection | Multi-return temporal analysis | Real-time marking
Water splash detection | Spatial clustering of transient returns | Per-frame flagging
Environmental noise filtering | Statistical outlier rejection | >99.9% noise rejection in adverse weather

The IPE decodes laser return waveforms with nanosecond-level precision to distinguish environmental noise (rain, fog, dust) from actual solid objects. This is critical for Kodiak's trucking operations because:

  • Following another truck generates constant road spray in wet conditions
  • Permian Basin dust is persistent and dense
  • The IPE filters these before the data reaches Kodiak's software stack, reducing computational load

The noise filtering is entirely classical: statistical analysis of return waveform characteristics (pulse width, amplitude, temporal profile) to classify each return as "solid object" vs. "environmental noise." Fog returns have characteristically broader, weaker waveforms than solid object returns. Rain returns appear as randomly distributed, transient single-frame detections that do not persist across consecutive scans.

Environmental Validation

The Hesai OT128 undergoes classical reliability testing:

  • UV aging
  • IP6K7 and IP6K9K dust/water ingress ratings
  • Mechanical shock and vibration (critical for truck-mounted operation)
  • Thermal cycling with water splash
  • Salt spray exposure
  • Low-temperature cold start

These tests validate that the classical signal processing chain remains calibrated and functional across the full operating envelope of highway trucking.


3. ZF 4D Imaging Radar Signal Processing

FMCW Signal Generation and Reception

The ZF Full Range 4D Radar operates at 77 GHz using Frequency Modulated Continuous Wave (FMCW) modulation. The entire radar signal processing chain from RF emission to point cloud output is classical signal processing -- no ML is involved until the radar detections enter Kodiak's perception software.

Chirp Generation: The radar transmits a "chirp" -- a signal whose frequency increases linearly over time (a ramp). Each chirp sweeps across a bandwidth of up to several GHz (typically 1-4 GHz for 77 GHz automotive radar), with the sweep rate and bandwidth determining range resolution:

Range resolution = c / (2 * bandwidth)

For a 4 GHz bandwidth: range resolution = 3.75 cm.

Beat Signal Processing: When the transmitted chirp reflects off an object and returns, it is mixed with the currently transmitting chirp. Because the transmit frequency has changed during the round-trip time, the mixed signal produces a "beat frequency" that is directly proportional to the object's range:

f_beat = (2 * range * sweep_rate) / c

This mixing is an entirely analog, classical operation performed in the radar's RF front-end.
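The two relations above, evaluated for representative numbers (the 40-microsecond chirp duration is an assumed value for illustration):

```python
C = 3.0e8  # speed of light (approximate), m/s

def range_resolution(bandwidth_hz):
    """delta_R = c / (2 * B)"""
    return C / (2 * bandwidth_hz)

def beat_frequency(target_range_m, sweep_rate_hz_per_s):
    """f_beat = 2 * R * S / c"""
    return 2 * target_range_m * sweep_rate_hz_per_s / C

# A 4 GHz sweep in an assumed 40 us chirp -> sweep rate 1e14 Hz/s.
dr = range_resolution(4e9)                   # 0.0375 m = 3.75 cm
fb = beat_frequency(150.0, 4e9 / 40e-6)      # beat tone for a target at 150 m
```

For the target at 150 m this yields a 100 MHz beat tone, comfortably within the sampling reach of automotive radar ADCs.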

MIMO Virtual Array Processing

The ZF radar achieves its 192-channel count through cascaded MIMO (Multiple-Input Multiple-Output) operation using 4 monolithic microwave integrated circuits (MMICs), each containing multiple transmit (TX) and receive (RX) channels:

Parameter | Value
MMICs | 4 cascaded
TX channels per MMIC | 3
RX channels per MMIC | 4
Physical channels | 12 TX + 16 RX = 28
Virtual channels | 12 x 16 = 192
Resolution improvement | 16x over conventional 12-channel radar

Virtual Array Formation: MIMO radar transmits orthogonal waveforms from each TX antenna (using time-division, frequency-division, or Doppler-division multiplexing) and receives on all RX antennas simultaneously. Each TX-RX pair creates a "virtual" antenna element. With 12 TX and 16 RX antennas, the virtual array has 192 elements arranged in a 2D grid that provides angular resolution in both azimuth and elevation.

Cascaded MMIC Synchronization: The master MMIC generates the local oscillator (LO) signal and distributes it to the three slave MMICs via a daisy-chain approach. This ensures phase coherence across all 192 virtual channels, which is essential for accurate beamforming. Any phase mismatch between MMICs would distort the angular measurements.

Digital Beamforming

After the beat signals from all 192 virtual channels are digitized, classical digital beamforming extracts angular information:

  1. Range FFT: A Fast Fourier Transform converts each channel's time-domain beat signal into the frequency domain, where each frequency bin corresponds to a specific range. This produces a range profile for each virtual channel.

  2. Doppler FFT: A second FFT across consecutive chirps within a frame extracts velocity (Doppler) information. The phase change between consecutive chirps from the same target encodes the target's radial velocity:

v_radial = (lambda * f_doppler) / 2

At 77 GHz (lambda = 3.9mm), even small velocity differences produce measurable Doppler shifts.

  3. Range-Doppler Map: The 2D FFT produces a Range-Doppler map for each virtual channel -- a 2D matrix where each cell contains the complex amplitude of any target at that (range, velocity) combination.

  4. Angle FFT / DOA Estimation: A third FFT (or more sophisticated Direction-of-Arrival algorithm) across the virtual array elements extracts azimuth and elevation angles. The phase difference between virtual array elements encodes the angle of arrival:

theta = arcsin(lambda * phase_difference / (2 * pi * d))

where d is the virtual array element spacing.

For the 2D virtual array, this produces both azimuth and elevation angles -- the fourth dimension that distinguishes 4D imaging radar from conventional 3D radar. More sophisticated super-resolution DOA algorithms (MUSIC, ESPRIT, Capon beamforming) may be used to exceed the diffraction-limited angular resolution of the physical aperture.
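The range and Doppler FFT steps can be sketched on a synthetic single-target beat signal; integer-valued range and Doppler bins are chosen so the FFT peak lands exactly on one cell:

```python
import numpy as np

# Synthetic beat signal for one target: N samples per chirp, M chirps per frame.
N, M = 64, 32
range_bin, doppler_bin = 10, 5   # the target's true bins (assumed for the sketch)

n = np.arange(N)
m = np.arange(M)
# Each chirp carries a complex tone at the beat frequency (encodes range),
# with a chirp-to-chirp phase ramp (encodes Doppler velocity).
signal = np.exp(2j * np.pi * (range_bin * n[None, :] / N
                              + doppler_bin * m[:, None] / M))

# Range FFT along fast time (axis 1), then Doppler FFT along slow time (axis 0).
rd_map = np.fft.fft(np.fft.fft(signal, axis=1), axis=0)
peak = np.unravel_index(np.argmax(np.abs(rd_map)), rd_map.shape)
```

In the real radar this 2D FFT runs per virtual channel, and the angle FFT is then taken across the 192 channels at each Range-Doppler cell.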

CFAR Detection

After beamforming, the Range-Doppler-Angle data cube contains energy at each (range, velocity, azimuth, elevation) cell. The Constant False Alarm Rate (CFAR) detector determines which cells contain actual targets versus noise:

CFAR Algorithm: For each cell under test (CUT), the algorithm estimates the local noise power from surrounding reference cells and sets a detection threshold as a multiple of that noise estimate:

threshold = alpha * noise_estimate

where alpha is chosen to maintain a desired false alarm probability.

Common CFAR variants used in automotive radar:

CFAR Variant | Method | Strengths | Weaknesses
CA-CFAR (Cell Averaging) | Averages reference cells | Optimal in uniform noise | Fails near strong targets (masking)
OS-CFAR (Ordered Statistic) | Uses kth-ordered reference cell | Robust to interfering targets | Higher computational cost
GO-CFAR (Greatest Of) | Takes max of leading/trailing estimates | Good at clutter edges | Conservative (misses weak targets)
SO-CFAR (Smallest Of) | Takes min of leading/trailing estimates | Sensitive (finds weak targets) | Higher false alarm rate at edges

The ZF radar likely uses a hybrid or adaptive CFAR strategy that switches between variants depending on the local environment. In highway scenarios, strong returns from guardrails, bridge abutments, and large trucks can mask weaker returns from smaller vehicles -- a classical radar problem that OS-CFAR partially addresses.
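A minimal 1D CA-CFAR sketch; the training/guard window sizes and threshold multiplier alpha are illustrative, not ZF production parameters:

```python
import numpy as np

def ca_cfar(power, num_train=8, num_guard=2, alpha=5.0):
    """1D cell-averaging CFAR: flag cells exceeding alpha * local noise mean.

    Guard cells around the cell under test are excluded from the noise estimate.
    """
    n = len(power)
    detections = []
    for cut in range(num_train + num_guard, n - num_train - num_guard):
        lead = power[cut - num_guard - num_train: cut - num_guard]
        trail = power[cut + num_guard + 1: cut + num_guard + 1 + num_train]
        noise = (np.sum(lead) + np.sum(trail)) / (2 * num_train)
        if power[cut] > alpha * noise:
            detections.append(cut)
    return detections

rng = np.random.default_rng(1)
power = rng.exponential(1.0, 200)   # exponentially distributed noise power
power[60] += 40.0                    # strong target
power[130] += 25.0                   # weaker target

hits = ca_cfar(power)                # contains 60 and 130 (plus rare false alarms)
```

alpha trades detection sensitivity against false alarm rate; an OS-CFAR variant would replace the mean with an ordered statistic of the training cells.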

Doppler Velocity Extraction

The Doppler velocity measurement is perhaps the most valuable classical output of the radar for Kodiak's perception stack. Unlike LiDAR and cameras, radar provides instantaneous radial velocity for every detection:

  • Velocity accuracy: Typically 0.1-0.3 m/s for automotive radar
  • Velocity range: The maximum unambiguous velocity is determined by chirp repetition rate
  • Velocity resolution: Determined by the coherent processing interval (number of chirps per frame)

The Doppler velocity is the radial component (toward/away from the radar). For targets moving laterally (crossing traffic, merging vehicles), the radial velocity is a fraction of the true velocity, requiring geometric correction based on the target's angular position.

Stationary vs. Moving Target Discrimination: By comparing the measured Doppler velocity to the ego-vehicle's velocity (from the IMU/GPS), the radar classifies every detection as stationary or moving. Stationary detections (guardrails, signs, bridges) have a Doppler velocity equal to the ego-vehicle's speed projected along the line of sight. This classical ego-motion compensation is performed entirely in the radar's signal processing chain.
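The ego-motion compensation test can be sketched as follows; the Doppler sign convention (closing targets negative) and the tolerance are assumptions for the sketch:

```python
import math

def is_stationary(v_doppler, azimuth_deg, v_ego, tol=0.4):
    """Classify a radar detection as stationary via ego-motion compensation.

    A stationary object's Doppler equals -v_ego * cos(azimuth): it appears to
    close at the ego speed projected along the line of sight.
    """
    expected = -v_ego * math.cos(math.radians(azimuth_deg))
    return abs(v_doppler - expected) < tol

v_ego = 28.0  # m/s (~63 mph)

# A guardrail dead ahead closes at exactly ego speed -> stationary.
# A vehicle ahead closing 3 m/s slower than a static object would -> moving.
a = is_stationary(-28.0, 0.0, v_ego)
b = is_stationary(-25.0, 0.0, v_ego)
```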


4. Camera ISP Pipeline

Image Signal Processing Chain

All 12 cameras on the Gen 6 Kodiak truck route their raw sensor data through the Image Signal Processing (ISP) pipeline on the Ambarella CV3-AD685 SoC. The ISP is a cascade of classical image processing algorithms that convert the raw Bayer-pattern sensor readout into a clean, color-calibrated image suitable for both ML inference and classical computer vision:

ISP Stage | Classical Algorithm | Purpose
Black level subtraction | Per-pixel offset correction | Remove sensor dark current
Defective pixel correction | Neighbor interpolation | Replace known bad pixels
Lens shading correction | Radial gain map | Compensate for lens vignetting
Demosaicing | Bayer interpolation (edge-directed) | Convert Bayer RGGB to full RGB
White balance | Gray-world or illuminant estimation | Correct for lighting color temperature
Color correction | 3x3 matrix multiplication | Map sensor color space to sRGB
Noise reduction | Spatial and temporal NLM filtering | Reduce sensor noise
HDR tone mapping | Local/global tone mapping operators | Compress dynamic range for display/inference
Gamma correction | Power-law transfer function | Perceptual linearization
Sharpening | Unsharp mask or edge enhancement | Recover detail lost in noise reduction
Distortion correction | Polynomial radial/tangential model | Remove lens geometric distortion

HDR Processing for Highway Conditions

Highway driving presents extreme dynamic range challenges that are handled entirely by classical ISP processing:

Tunnel transitions: When a truck enters a tunnel on a bright day, the scene dynamic range can exceed 120 dB -- the bright highway outside the tunnel entrance versus the dim tunnel interior. Standard cameras capture only ~60-70 dB. The Ambarella ISP addresses this through:

  1. Multi-exposure fusion: The camera sensor captures multiple exposures per frame (short, medium, long). The ISP fuses these into a single HDR image where both bright and dark regions retain detail.
  2. Local tone mapping: Rather than applying a global brightness curve, the ISP adjusts brightness locally -- keeping the bright sky properly exposed while boosting the dark tunnel interior. This is a classical operation using bilateral filtering or similar edge-preserving smoothing to compute local brightness estimates.
  3. Temporal smoothing: The ISP applies temporal filtering to avoid brightness flickering during rapid transitions, ensuring smooth exposure adaptation.
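A toy two-exposure fusion along the lines of step 1; the saturation threshold and exposure gain are illustrative, and production ISPs use more exposures with smoother blending weights:

```python
import numpy as np

def fuse_exposures(short_exp, long_exp, gain, sat=0.95):
    """Merge two linear exposures into one HDR image.

    Where the long exposure is saturated, fall back to the short exposure
    scaled by the exposure ratio `gain` (= long/short exposure time).
    """
    use_long = (long_exp < sat).astype(float)
    return use_long * long_exp + (1.0 - use_long) * short_exp * gain

short_exp = np.array([0.05, 0.12, 0.99 / 16])   # bright regions still unsaturated
long_exp = np.array([0.80, 0.99, 0.99])         # last two pixels clipped
hdr = fuse_exposures(short_exp, long_exp, gain=16.0)
```

The result is a linear radiance image whose values can exceed 1.0; local tone mapping (step 2) then compresses it back into displayable range.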

Sun glare: Direct sunlight at low angles (common on east-west Texas highways at sunrise/sunset) produces intense saturation in portions of the image. The ISP's HDR pipeline prevents the entire image from being overwhelmed by:

  • Short-exposure frames capture the bright glare region without saturation
  • Long-exposure frames capture the rest of the scene with proper brightness
  • The fused HDR image preserves detail everywhere

Night driving: The ISP's enhanced dynamic range processing handles the contrast between bright headlights/taillights (which can saturate the sensor) and the near-zero ambient illumination on the road surface. Classical temporal noise reduction accumulates signal across frames to reduce sensor noise in dark regions.

Classical Camera Processing Operations

Beyond the ISP, several classical computer vision operations run on the camera data before ML inference:

  • Image undistortion: Removing lens distortion using calibrated intrinsic parameters (polynomial distortion model)
  • Image rectification: Aligning stereo camera pairs for depth estimation
  • Image cropping and resizing: Preparing regions of interest for ML detectors
  • Color space conversion: Converting between YUV (the ISP's typical output format) and RGB (network input)
  • Image pyramid generation: Creating multi-scale representations for scale-invariant detection

These operations are entirely classical and consume significant compute resources, which is why they are often offloaded to dedicated hardware (the PVA on NVIDIA Orin, or the ISP on Ambarella).
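The undistortion step can be sketched with the polynomial (Brown-Conrady) model and its standard fixed-point inversion; the coefficients here are made-up values of plausible magnitude:

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to
    normalized image coordinates."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd

def undistort(xd, yd, k1, k2, p1, p2, iters=10):
    """Invert the forward model by fixed-point iteration."""
    x, y = xd, yd
    for _ in range(iters):
        ex, ey = distort(x, y, k1, k2, p1, p2)
        x, y = x + (xd - ex), y + (yd - ey)
    return x, y

# Round trip: distort an ideal point, then recover it.
xd, yd = distort(0.3, -0.2, k1=-0.1, k2=0.01, p1=0.001, p2=-0.001)
x, y = undistort(xd, yd, -0.1, 0.01, 0.001, -0.001)
```

For mild automotive-lens distortion the iteration converges in a handful of steps, which is why it is cheap enough to precompute as a per-pixel remap table.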


5. SensorPod Calibration Architecture

Patent-Pending Pre-Calibration System

The SensorPod's pre-calibration system is a critical piece of classical calibration engineering. Every SensorPod ships from the factory with all sensor extrinsic parameters (relative positions and orientations) precisely measured and stored. This eliminates the need for field calibration after SensorPod replacement.

Factory Calibration Process (inferred from patent filings and public descriptions):

  1. Intrinsic calibration: Each individual sensor (camera, LiDAR, radar) is calibrated for its internal parameters:

    • Cameras: focal length, principal point, distortion coefficients (using checkerboard/ChArUco targets)
    • LiDAR: channel-by-channel angular offsets, range bias, timing offsets
    • Radar: antenna phase center locations, MMIC-to-MMIC phase offsets
  2. Intra-pod extrinsic calibration: The relative 6-DOF pose (3 translation + 3 rotation) between every sensor pair within the pod is measured:

    • Luminar-to-Hesai: Measured using overlapping FOV regions with common targets
    • LiDAR-to-camera: Target-based calibration using calibration boards visible to both modalities
    • Radar-to-LiDAR: Using corner reflectors visible to both sensors
  3. Pod-to-vehicle extrinsic calibration: Because SensorPods mount to standardized mirror brackets, the pod-to-vehicle transformation is defined by the mechanical mounting interface plus a small per-installation adjustment.

Thermal Stability Design

Long-haul trucking subjects sensors to severe thermal cycling:

  • Engine heat radiating upward near the cab
  • Desert ambient temperatures exceeding 50 degrees C (122 degrees F)
  • Night temperatures dropping below 0 degrees C (32 degrees F) in winter
  • Rapid thermal transients when driving between sun and shade

Thermal expansion can shift sensor alignments by millimeters -- enough to cause significant calibration drift. Kodiak addresses this through:

  • Material selection: The SensorPod enclosure and internal mounting structures use materials with matched or low coefficients of thermal expansion
  • Thermal isolation: Critical optical components are thermally isolated from the external enclosure
  • Compensation models: Classical thermal models predict calibration shift as a function of temperature and apply corrections

Vibration Resistance

Highway trucking generates persistent vibration from:

  • Engine and drivetrain vibration transmitted through the cab structure
  • Road surface irregularities (expansion joints, potholes, rough pavement)
  • Aerodynamic buffeting at highway speeds (60-80 mph)
  • Trailer coupling dynamics

Kodiak's SensorPod design addresses vibration through:

  • Aerospace/military-grade connectors: The main power/data connector is rated for aviation, aerospace, and military applications -- designed to maintain connection integrity under sustained vibration
  • Rigid internal mounting: Sensors are rigidly mounted to minimize relative movement between sensors within the pod
  • Vibration testing: SensorPods are validated through standardized vibration profiles that simulate worst-case trucking conditions

Three-Bolt Mounting Precision

The mechanical interface between the SensorPod and the truck uses exactly three bolts. Three bolts define a plane, providing a repeatable, deterministic mounting that returns the SensorPod to the same position and orientation every time it is installed. This mechanical repeatability is what makes pre-calibration viable -- the pod-to-vehicle transformation is stable across installations.


6. LiDAR-Camera Cross-Calibration

Extrinsic Calibration Methods

Cross-calibrating LiDAR and cameras requires determining the 6-DOF rigid transformation (rotation matrix R and translation vector t) between the LiDAR coordinate frame and each camera coordinate frame. For Kodiak's SensorPods, this calibration must be accurate to sub-centimeter translation and sub-0.1-degree rotation to ensure that LiDAR points project correctly onto camera images.

Target-Based Calibration (Factory):

The standard factory approach uses calibration targets visible to both sensors:

  1. Checkerboard / ChArUco boards: The camera detects corner features using classical corner detection (Harris, Shi-Tomasi). The LiDAR detects the board as a planar surface. The transformation that best aligns the LiDAR plane normal and centroid with the camera-detected board pose is computed via:

    • Point-to-plane correspondence establishment
    • Least-squares optimization of the extrinsic parameters
    • Reprojection error minimization
  2. Multiple board poses: The calibration target is placed at many different positions and orientations (typically 20-50 poses) to ensure observability of all 6 DOF parameters and reduce sensitivity to noise.

  3. Nonlinear refinement: After initial estimation (e.g., via PnP algorithm for camera pose, RANSAC plane fitting for LiDAR), a joint nonlinear optimization (Levenberg-Marquardt) refines all parameters simultaneously to minimize the total reprojection error across all poses and all sensor pairs.
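The least-squares alignment can be sketched with point-to-point correspondences and the classical SVD (Kabsch) solution; the pipeline described above uses point-to-plane residuals and joint nonlinear refinement, so this is a simplified stand-in:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares R, t aligning src points to dst (Kabsch / SVD)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic correspondences: e.g. board corners seen in the LiDAR frame,
# transformed into a camera frame by a small yaw plus a translation.
rng = np.random.default_rng(2)
lidar_pts = rng.uniform(-5, 5, (20, 3))
theta = np.radians(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.05, 0.02])
cam_pts = lidar_pts @ R_true.T + t_true

R, t = fit_rigid_transform(lidar_pts, cam_pts)   # recovers R_true, t_true
```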

Targetless Verification (On-Road):

Once deployed, the system can verify calibration quality without dedicated targets by exploiting natural scene features:

  • Edge alignment: Comparing LiDAR depth discontinuities (object edges) with camera image edges. Well-calibrated systems show tight alignment; miscalibration manifests as systematic shifts.
  • Mutual information: Computing the statistical dependence between LiDAR reflectivity and camera intensity. Correct calibration maximizes mutual information because the same surface produces correlated reflectivity and intensity values.
  • Feature correspondence: Classical feature detectors (SIFT, ORB) on camera images matched with projected LiDAR reflectivity images.
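The mutual-information check can be sketched with a histogram estimator; the bin count and the synthetic correlated/decorrelated data are illustrative:

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of mutual information between two intensity arrays (nats)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
reflectivity = rng.uniform(0, 1, 5000)
aligned = reflectivity + rng.normal(0, 0.05, 5000)   # well-calibrated: correlated
shuffled = rng.permutation(aligned)                  # miscalibrated: decorrelated

mi_good = mutual_information(reflectivity, aligned)
mi_bad = mutual_information(reflectivity, shuffled)  # mi_good >> mi_bad
```

Calibration search maximizes this score over candidate extrinsic perturbations; a drop in the score over time flags drift.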

Luminar-to-Camera Calibration

The Luminar Iris produces a particularly high-resolution reflectivity image (>300 points per square degree) that closely resembles a camera image. This makes Luminar-to-camera calibration especially robust because:

  • The reflectivity image provides rich texture for feature matching
  • Road markings, signs, and vehicle surfaces create strong features in both modalities
  • The high angular resolution minimizes quantization effects in point-to-pixel correspondence

Hesai-to-Camera Calibration

The Hesai spinning LiDARs have lower angular resolution than the Luminar Iris but provide 360-degree coverage. Calibrating Hesai to the side-facing and rear-facing cameras requires:

  • Coverage overlap between the LiDAR and camera FOVs
  • Careful handling of the motion distortion inherent in spinning LiDARs (the LiDAR scans over ~100ms per rotation, during which the truck moves ~3m at highway speed)
  • Ego-motion compensation to align the LiDAR points to the camera exposure timestamp

7. Radar Calibration

ZF 4D Radar Alignment

Radar calibration is distinctly different from LiDAR or camera calibration because radar operates on fundamentally different physics (RF reflection rather than optical). The key calibration parameters for the ZF 4D radar are:

Mounting Angle Estimation: The most critical radar calibration parameter. Even a 1-degree azimuth misalignment causes a lateral position error of 5.2m at 300m range. Mounting angle calibration methods include:

  1. Static alignment: Performed in a controlled environment using known radar targets (corner reflectors) at calibrated positions. The radar's measured angles are compared to the known target angles, and the mounting offset is computed.

  2. Dynamic alignment (motion-based): While the vehicle drives, stationary objects (guardrails, signs, buildings) should have a Doppler velocity exactly equal to the ego-vehicle's velocity projected along the line of sight. Any systematic angular offset between the predicted and measured Doppler velocities reveals the mounting angle error. This is a classical least-squares estimation:

mounting_angle = argmin sum((v_doppler_measured - v_ego * cos(theta_measured + mounting_angle))^2)
  3. Track alignment: Using extended driving data, the radar tracks of stationary objects are fit to straight lines (for highway guardrails) or known geometric shapes. The mounting angle that best aligns these tracks with the expected geometry is estimated.
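The dynamic-alignment least squares can be sketched as a grid search over candidate offsets; the Doppler sign convention (closing negative) and the search range are assumptions for the sketch:

```python
import numpy as np

def estimate_mounting_offset(theta_meas, v_doppler, v_ego, search_deg=3.0):
    """Grid-search the azimuth mounting offset that minimizes the Doppler
    residuals of stationary detections (the least-squares criterion above)."""
    candidates = np.radians(np.linspace(-search_deg, search_deg, 601))
    costs = [np.sum((v_doppler - (-v_ego * np.cos(theta_meas + c))) ** 2)
             for c in candidates]
    return np.degrees(candidates[int(np.argmin(costs))])

# Simulate stationary guardrail detections seen through a 0.8 deg mounting error.
rng = np.random.default_rng(4)
true_offset = np.radians(0.8)
theta_true = rng.uniform(-0.6, 0.6, 200)      # true azimuths (rad)
v_ego = 28.0
v_doppler = -v_ego * np.cos(theta_true)       # ideal stationary Doppler
theta_meas = theta_true - true_offset         # radar reports biased angles

offset_deg = estimate_mounting_offset(theta_meas, v_doppler, v_ego)
```

A production estimator would solve this with closed-form or gradient-based least squares and accumulate over many frames, but the residual model is the same.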

MMIC Phase Calibration: For the cascaded 4-MMIC configuration, the relative phase between MMICs must be calibrated to maintain coherent beamforming. This is typically done at the radar module level during manufacturing using classical phase comparison techniques with a known reference signal.

Range Bias Calibration: A constant offset between measured and true range, caused by signal processing delays, cable lengths, and ASIC timing. Measured against known-distance targets and stored as a scalar correction.

Radar-to-Vehicle Frame Transformation

The radar's (range, azimuth, elevation, Doppler) measurements must be transformed to the vehicle coordinate frame. This requires:

  • The radar mounting position (x, y, z) in the vehicle frame
  • The radar mounting angles (yaw, pitch, roll) relative to the vehicle frame
  • These are measured during SensorPod factory calibration and stored alongside the LiDAR and camera extrinsics

8. Online Calibration

Why Online Calibration Is Necessary

Factory calibration establishes initial sensor alignment, but several factors cause calibration to drift during operation:

Drift Source | Mechanism | Magnitude
Thermal cycling | Differential thermal expansion of mounting structures | 0.1-1.0 mm translation, 0.01-0.1 degrees rotation
Vibration-induced settling | Fasteners and mounts shift under sustained vibration | Cumulative over thousands of miles
Mechanical shock | Potholes, railroad crossings, loading dock bumps | Sudden shifts of 1+ mm possible
SensorPod replacement | Even with three-bolt mounting, installation variability exists | Up to several mm if bolts not fully torqued

For a system that fuses data from 22 sensors across two SensorPods, even sub-millimeter calibration drift can cause:

  • LiDAR points projecting to wrong camera pixels (misclassification)
  • Radar detections not correlating with LiDAR detections of the same object (data association failure)
  • Inconsistent position estimates from different sensors (tracker instability)

Post-Drive Calibration Analysis

Kodiak has described a post-drive calibration analysis system where, after every drive, the system analyzes Tracker performance and identifies systematic biases in individual detectors. The Tracker tells each detector: "During the last run, you were systematically biased by a few centimeters; please adjust your measurements accordingly."

This is classical statistical estimation:

  1. For each detector, collect the residuals between the detector's measurements and the Tracker's fused estimates (which are more accurate because they combine all sensors)
  2. Compute the mean residual -- this represents a systematic bias (calibration drift)
  3. If the mean residual exceeds a threshold, apply a correction to the detector's extrinsic parameters
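The three steps above can be sketched as a small residual-analysis routine. This is an illustrative reconstruction, not Kodiak's implementation; the 2 cm threshold and all names are assumptions:

```python
import numpy as np

def estimate_detector_bias(detector_positions, tracker_positions, threshold=0.02):
    """Post-drive bias estimation for one detector.

    detector_positions, tracker_positions: (N, 2) arrays of matched (x, y)
    detector measurements and fused Tracker estimates, in metres.
    Returns the correction to apply (zero if below threshold) and the
    standard error of the mean residual.
    """
    residuals = np.asarray(tracker_positions) - np.asarray(detector_positions)
    mean_residual = residuals.mean(axis=0)                 # systematic bias estimate
    std_error = residuals.std(axis=0, ddof=1) / np.sqrt(len(residuals))
    if np.linalg.norm(mean_residual) > threshold:
        return mean_residual, std_error                    # "adjust your measurements"
    return np.zeros(2), std_error                          # within tolerance; no correction
```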

Real-Time Calibration Monitoring

Kodiak has indicated that future versions will apply calibration corrections in real-time. The classical approaches for online calibration monitoring include:

Reprojection consistency: LiDAR points projected onto camera images should align with detected features (edges, lane markings). The running average of the alignment error serves as a calibration health metric.

Cross-sensor detection consistency: When multiple sensors detect the same object, the position estimates should agree within the sensors' noise characteristics. Systematic disagreement indicates calibration drift.

Ego-motion consistency: Each sensor can independently estimate the vehicle's ego-motion (LiDAR via scan matching, cameras via visual odometry, radar via Doppler). Disagreements between ego-motion estimates from different sensors indicate relative calibration drift.

Temperature-compensated models: Classical thermal expansion models predict calibration shift based on measured temperatures at various points in the SensorPod structure. These predicted shifts are proactively applied as corrections.


9. Sparse Map Generation

What Sparse Maps Contain

Kodiak's sparse maps store only essential highway information, measured in kilobytes per mile:

| Data Layer | Content | Classical Generation Method |
|---|---|---|
| Geometric | Lane boundary locations, road edges | Classical lane detection + SLAM |
| Topological | Road connectivity (which lanes connect where) | Graph extraction from lane geometry |
| Semantic | Speed limits, road surface type, lane attributes | Sign reading + manual annotation |

This contrasts with HD maps that store full 3D geometry, every curb, every sign, and every lane marking -- requiring megabytes per mile and dedicated mapping vehicles.

How Sparse Maps Are Generated

Sparse map generation relies heavily on classical computer vision and geometry:

  1. Lane boundary extraction: As Kodiak trucks drive routes, their cameras detect lane markings using classical and ML-based lane detection. The detected lane boundaries are recorded with their GPS-referenced positions.

  2. Multi-pass aggregation: Because lane detection from a single pass is noisy, multiple passes over the same road are aggregated. Classical statistical methods (weighted averaging, outlier rejection) produce a consensus lane boundary position from multiple observations.

  3. Geometric fitting: The aggregated lane boundary points are fit to parametric curves:

    • Straight segments: Least-squares line fitting
    • Curved segments: Clothoid (Euler spiral) fitting or polynomial fitting
    • Transitions: Smooth interpolation between segments
  4. Topological graph construction: Lane connectivity (which lane connects to which at merges, splits, and exits) is extracted from the geometric data using classical graph algorithms.

  5. Compression: The parametric representation is inherently compact -- a lane boundary described by a sequence of clothoid parameters requires far less storage than a dense point cloud.
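Steps 2 and 3 can be sketched as follows: per-station outlier rejection across passes, then a least-squares fit to the consensus boundary. The MAD-based rejection rule and all parameter values are illustrative assumptions, not Kodiak's method:

```python
import numpy as np

def aggregate_passes(passes, n_sigma=2.0):
    """Fuse lane-boundary observations from multiple drives of one road.

    passes: list of (N,) arrays of lateral offsets sampled at common
    longitudinal stations.  Rejects outlier passes per station using a
    median-absolute-deviation gate, then averages the survivors.
    """
    stack = np.vstack(passes)                              # (n_passes, N)
    med = np.median(stack, axis=0)
    mad = np.median(np.abs(stack - med), axis=0) + 1e-9
    inlier = np.abs(stack - med) <= n_sigma * 1.4826 * mad  # 1.4826: MAD -> sigma
    w = inlier.astype(float)
    return (stack * w).sum(axis=0) / w.sum(axis=0)

def fit_lane_polynomial(x, y, degree=3):
    """Least-squares parametric fit of the aggregated boundary (step 3)."""
    return np.polyfit(x, y, degree)
```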

Fleet-Sourced Map Updates

Because the maps are just kilobytes per mile, updates can be distributed to the entire fleet over-the-air almost daily. The map update pipeline is:

  1. Production trucks detect changes (new lane markings, shifted lanes, new construction) through perception
  2. The changed data is uploaded to the cloud
  3. Classical map merging algorithms integrate the new observations with the existing map
  4. Updated maps are pushed to all trucks in the fleet

This turns every Kodiak truck into a mapping vehicle -- no dedicated mapping fleet is needed.


10. Map-Disagreement Handling

Discrepancy Detection

The Kodiak Driver continuously compares what it perceives in real-time with what the sparse map predicts. This comparison is a classical hypothesis testing problem:

Lane boundary comparison: The real-time lane detector reports lane markings at positions (x_perceived, y_perceived). The sparse map predicts lane markings at positions (x_map, y_map). The system computes:

discrepancy = ||(x_perceived, y_perceived) - (x_map, y_map)||

If the discrepancy exceeds a threshold (accounting for localization uncertainty and perception noise), a map-perception disagreement is declared.

Lane count comparison: If the map says there are 3 lanes but perception detects 4, or vice versa, this is a structural disagreement that indicates a road change.

Road geometry comparison: If the perceived road curvature differs significantly from the mapped curvature, this may indicate a traffic shift or new road alignment.
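The lane boundary comparison reduces to a gate that widens with the combined localization and perception uncertainty. A minimal sketch (parameter names and the 3-sigma gate are assumptions):

```python
import math

def map_disagreement(perceived, mapped, sigma_loc, sigma_perc, n_sigma=3.0):
    """Hypothesis test on one lane-marking point.

    perceived, mapped: (x, y) positions in metres; sigma_loc and sigma_perc
    are the 1-sigma localisation and perception uncertainties.  Returns
    (disagreement_flag, discrepancy).
    """
    discrepancy = math.dist(perceived, mapped)
    gate = n_sigma * math.sqrt(sigma_loc**2 + sigma_perc**2)
    return discrepancy > gate, discrepancy
```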

Trust Hierarchy

When disagreement is detected, the system follows the "perception over priors" principle with a classical decision logic:

  1. Perception wins: The system trusts real-time sensor data over stored map data
  2. Uncertainty increases: The planner is notified that map reliability is reduced, causing more conservative driving behavior
  3. Temporary map built: An on-the-fly local map is constructed from real-time perception
  4. Fleet notification: The discrepancy is reported to the cloud for fleet-wide map update

On-the-Fly Map Construction

When the stored map is invalidated (e.g., by a new construction zone), the system builds a temporary map using only real-time perception:

  1. Lane detection: Camera and LiDAR-based lane detection provides current lane boundaries
  2. Road edge detection: LiDAR detects curbs, barriers, and road surface boundaries
  3. Vehicle trajectories: The paths of other vehicles provide evidence of drivable lanes
  4. Cone/barrel lines: Construction cones and barrels are clustered into lines that define temporary lane boundaries

This temporary map is "good enough for the Kodiak Driver to drive safely until the map matches again."


11. Real-Time Road Model

Classical Road Geometry Estimation

When maps are insufficient or stale, the Kodiak Driver must estimate road geometry in real-time using classical parametric road models:

Polynomial Road Model: The simplest classical model represents each lane boundary as a polynomial function of longitudinal distance:

y(x) = c0 + c1*x + c2*x^2 + c3*x^3

where:

  • c0 = lateral offset (distance from ego-vehicle to lane boundary)
  • c1 = heading angle relative to lane boundary
  • c2 = half the curvature at x = 0 (for this cubic, curvature kappa ≈ 2*c2, where kappa = 1/radius)
  • c3 = one sixth of the curvature rate (kappa' ≈ 6*c3, the change of curvature with distance)

The coefficients are estimated from detected lane marking points using weighted least-squares fitting, with weights decreasing with distance (farther detections are less reliable).
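The distance-weighted fit can be sketched directly with `np.polyfit`, whose `w` argument takes inverse standard deviations. The noise model (sigma growing linearly with range) and its constants are illustrative assumptions:

```python
import numpy as np

def fit_road_polynomial(x, y, sigma0=0.05, range_scale=50.0):
    """Weighted cubic fit y(x) = c0 + c1*x + c2*x^2 + c3*x^3.

    x, y: longitudinal distances and lateral offsets of detected lane-marking
    points (m).  Weights decay with distance, reflecting the lower
    reliability of far detections.
    """
    x = np.asarray(x, dtype=float)
    sigma = sigma0 * (1.0 + x / range_scale)       # assumed range-dependent noise
    c3, c2, c1, c0 = np.polyfit(x, y, deg=3, w=1.0 / sigma)
    return dict(c0=c0, c1=c1, c2=c2, c3=c3)
```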

Clothoid (Euler Spiral) Road Model: A more physically accurate model used in road design, where curvature varies linearly with arc length:

kappa(s) = kappa_0 + kappa_1 * s

This models the fact that real roads transition smoothly between straight sections and curves -- the curvature does not jump instantaneously. Clothoid fitting from lane marking detections uses nonlinear least-squares optimization (Gauss-Newton or Levenberg-Marquardt).

Multi-Lane Road Model: For highway driving, the system estimates a full road model comprising:

  • Left and right lane boundaries for the ego lane
  • Adjacent lane boundaries (for lane change feasibility)
  • Road edges (barriers, curbs, shoulders)
  • Road surface normal (for grade estimation on hills)

Ground Plane Estimation

Separating ground points from obstacle points in the LiDAR point cloud is a fundamental classical preprocessing step:

RANSAC-Based Ground Plane Fitting: The Random Sample Consensus (RANSAC) algorithm iteratively fits a plane to the LiDAR points:

  1. Randomly sample 3 points
  2. Fit a plane through them
  3. Count inliers (points within a threshold distance of the plane)
  4. Repeat; keep the plane with the most inliers

For highway roads, the ground is approximately planar locally. However, grades, crown, and superelevation require multi-plane or curved surface fitting for robust ground segmentation.
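The four RANSAC steps above translate almost line-for-line into code. A minimal single-plane sketch (iteration count and inlier threshold are typical assumed values):

```python
import numpy as np

def ransac_ground_plane(points, iters=200, inlier_dist=0.15, rng=None):
    """RANSAC plane fit on an (N, 3) point array.

    Returns ((n, d), inlier_mask) with unit normal n and offset d such that
    n . p + d ~= 0 for ground points.
    """
    rng = np.random.default_rng(rng)
    best_mask, best_model = None, None
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]  # step 1
        n = np.cross(p2 - p1, p3 - p1)                                  # step 2
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                         # degenerate (collinear) sample
        n /= norm
        d = -n.dot(p1)
        mask = np.abs(points @ n + d) < inlier_dist                     # step 3
        if best_mask is None or mask.sum() > best_mask.sum():           # step 4
            best_mask, best_model = mask, (n, d)
    return best_model, best_mask
```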

Alternatives to RANSAC:

  • Region-wise PCA ground fitting: Divides the point cloud into sectors and fits ground planes per sector using Principal Component Analysis. 2x faster than RANSAC.
  • Grid-based height thresholding: Projects points to a 2D grid and classifies each cell based on the height distribution of points falling in it.
  • Ring-based methods: For spinning LiDARs, processes each laser ring independently, identifying ground points by comparing consecutive returns within each ring.

Ground plane estimation is the first step in the perception pipeline -- it separates the problem into "road surface" (for drivability assessment) and "above-ground objects" (for detection and tracking).


12. Lane Detection

Classical Lane Detection Methods

Lane detection is central to Kodiak's perception-over-priors philosophy because the system localizes primarily using perceived lane markings, like a human driver. The lane detection pipeline combines classical and ML techniques:

Classical Pre-Processing:

  1. Region of interest selection: Only the road surface region of the image is processed, reducing computation. The ROI is defined by the estimated road plane.
  2. Perspective transformation (IPM): An Inverse Perspective Mapping transforms the camera image to a top-down (bird's eye view) representation where lane markings appear as parallel lines rather than converging lines. This is a classical projective transformation using the known camera extrinsics and the estimated ground plane.
  3. Gradient computation: Sobel or Scharr operators detect intensity gradients that indicate lane marking edges.
  4. Color thresholding: Classical color space conversion (RGB to HSV or HLS) and thresholding to isolate yellow and white lane markings.

Classical Lane Line Fitting: After candidate lane marking pixels are identified (by ML segmentation or classical thresholding), the lane boundaries are fit using:

  • Sliding window search: Starting from a histogram peak at the image bottom, a sliding window traces the lane marking upward through the image
  • Polynomial fitting: A 2nd or 3rd order polynomial is fit to the lane marking pixels using least-squares
  • RANSAC line/curve fitting: Robust fitting that rejects outlier pixels (e.g., from road debris, shadows, tire marks)

Temporal filtering: Lane detections from consecutive frames are filtered using classical state estimation (Kalman filter or exponential moving average) to produce smooth, stable lane boundary estimates. This prevents the lane model from "jumping" due to single-frame detection noise.
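Of the two classical options named above, the exponential moving average is the simpler; a sketch with an added single-frame jump gate (alpha and the gate value are assumptions):

```python
import numpy as np

class LaneSmoother:
    """EMA filter over lane-polynomial coefficients [c0, c1, c2, c3].

    Rejects implausible single-frame jumps in lateral offset c0, preventing
    the lane model from "jumping" on one bad detection.
    """
    def __init__(self, alpha=0.3, jump_gate=0.5):
        self.alpha = alpha
        self.jump_gate = jump_gate   # max plausible frame-to-frame c0 change (m)
        self.state = None

    def update(self, coeffs):
        coeffs = np.asarray(coeffs, dtype=float)
        if self.state is None:
            self.state = coeffs
        elif abs(coeffs[0] - self.state[0]) <= self.jump_gate:
            self.state = self.alpha * coeffs + (1 - self.alpha) * self.state
        # else: hold previous state; single-frame outlier rejected
        return self.state
```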

LiDAR-Based Lane Detection

The Luminar Iris's reflectivity channel provides an alternative lane detection modality that is independent of cameras:

  • Lane markings (paint, thermoplastic, tape) have significantly higher reflectivity than asphalt
  • A simple reflectivity threshold on the LiDAR point cloud identifies lane marking points
  • These points are fit using the same polynomial or clothoid models as camera-detected markings
  • LiDAR-based lane detection works identically day and night (active illumination) and is unaffected by sun glare

Sensor Fusion for Lane Detection

The system fuses camera-detected and LiDAR-detected lane boundaries using classical weighted averaging:

  • Camera provides dense, high-resolution lane marking detection at short to medium range
  • LiDAR provides lighting-independent lane marking detection with precise range measurement
  • Radar guardrail detections provide road edge constraints
  • The fused lane model is more robust than any single sensor's estimate

13. Long-Range Processing

The Point Density Problem

LiDAR point density drops with the square of distance. For the Luminar Iris:

| Range | Approximate Point Density | Points on a Sedan | Detection Challenge |
|---|---|---|---|
| 50m | Very high | Hundreds | Easy detection and classification |
| 100m | High | ~100 | Reliable detection |
| 200m | Moderate | ~25 | Classification uncertain |
| 400m | Low | ~6 | Detection marginal, classification difficult |
| 600m | Very low | ~1-3 | Near detection threshold |

At 600m, a sedan subtends less than 0.1 degrees of azimuth -- only a few LiDAR resolution cells. Detection at this range depends on classical compensation methods.

Classical Compensation Methods

Temporal accumulation: Over consecutive frames, the system accumulates LiDAR returns from the same spatial region. Even if a single frame produces only 1-2 points on a distant object, 10 frames produce 10-20 points -- enough for reliable detection. This requires precise ego-motion compensation to align points from different frames into a common coordinate system.

Ego-motion compensation: At 65 mph, the truck moves ~2.9m per LiDAR frame (at 10 Hz). Multi-frame accumulation requires compensating for this motion using the ego-vehicle's IMU and GPS data. Classical coordinate transformations (rotation + translation) align each frame's points to a reference frame.
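The accumulation-plus-compensation step amounts to chaining rigid transforms: each frame is lifted into the world frame using its pose, then re-expressed in the latest sensor frame. A minimal sketch (the pose representation is an assumption):

```python
import numpy as np

def accumulate_frames(frames, poses):
    """Align several LiDAR frames into the latest frame's coordinates.

    frames: list of (N_i, 3) point arrays, each in its own sensor frame.
    poses:  list of (R, t) world-from-sensor transforms (3x3 rotation,
    3-vector translation) from the IMU/GPS ego-motion solution.
    """
    R_ref, t_ref = poses[-1]
    aligned = []
    for pts, (R, t) in zip(frames, poses):
        world = pts @ R.T + t                    # sensor -> world
        aligned.append((world - t_ref) @ R_ref)  # world -> latest sensor frame
    return np.vstack(aligned)
```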

Adaptive detection thresholds: Classical detection theory (Neyman-Pearson) trades off between probability of detection (Pd) and probability of false alarm (Pfa). At long range, the system can lower the detection threshold (accepting more false alarms) and use temporal persistence (requiring the detection to appear in multiple consecutive frames) to maintain high Pd while controlling Pfa.
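Temporal persistence is commonly implemented as an M-of-N rule: a candidate is confirmed only if it appears in at least m of the last n frames. A sketch with assumed m=3, n=5:

```python
from collections import deque

class PersistenceConfirmer:
    """M-of-N confirmation for long-range candidate detections.

    A lowered detection threshold raises both Pd and Pfa; requiring m hits
    in the last n frames restores a low effective false-alarm rate while
    keeping the per-frame threshold permissive.
    """
    def __init__(self, m=3, n=5):
        self.m = m
        self.hits = deque(maxlen=n)

    def update(self, detected: bool) -> bool:
        self.hits.append(detected)
        return sum(self.hits) >= self.m
```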

Sensor fusion for range extension: At 300-600m, LiDAR may have only a few points on an object, but narrow-FOV cameras can resolve the object as many pixels. Classical cross-modal validation correlates sparse LiDAR detections with camera image patches to confirm or reject long-range LiDAR detections.

Radar as long-range anchor: The ZF 4D radar detects moving vehicles at 300+ meters with native Doppler velocity. A moving vehicle at 350m that has not yet been reliably detected by LiDAR may already have a confirmed radar detection with velocity information. This radar detection primes the perception system to look for corresponding LiDAR returns at the predicted position.

Progressive Refinement Strategy

Kodiak's long-range perception follows a progressive refinement strategy that is fundamentally sequential and classical in its logic:

  1. 600m: First detection (LiDAR); low confidence; "something is there"
  2. 400m: Increasing LiDAR density; initial classification; camera begins resolving object
  3. 300m: Radar velocity confirms moving/stopped; 4D radar provides elevation
  4. 200m: Full multi-sensor coverage; high-confidence classification
  5. 100m: Near-field; detailed shape, behavior prediction, trajectory planning

This staged approach provides multiple independent opportunities to detect every object before the truck reaches it.


14. Highway Merge Detection

Geometric Detection of Merge Zones

Highway merges are detected through classical geometric analysis:

Lane boundary divergence/convergence: The lane detection system identifies points where:

  • A new lane boundary appears (on-ramp merging into the highway)
  • An existing lane boundary ends (lane drop)
  • Lane boundaries diverge (exit ramp)

These topological changes in the lane structure are detected by tracking the number of lane boundaries over time and identifying transitions. This is a classical pattern recognition problem on the lane model output.

Road geometry analysis: On-ramps have characteristic geometric features:

  • A curved entry lane with decreasing radius (acceleration lane)
  • A taper where the acceleration lane narrows and merges
  • A gore area (the painted triangle separating the ramp from the highway)

The system detects these features from the perceived lane geometry and classifies the road configuration.

Adjacent Vehicle Tracking for Merge

When a merge zone is detected, the perception system intensifies tracking of vehicles in the merge lane:

  • Hesai 360-degree LiDAR: Provides surround detection of merging vehicles approaching from the side
  • Side-mounted radar (Gen 6): Provides long-range detection and velocity measurement of vehicles on the on-ramp
  • Side cameras: Visual detection and classification of merging vehicles

Classical tracking (Kalman filtering) of merging vehicles estimates their future trajectory -- will they accelerate to merge ahead or decelerate to merge behind the truck? This prediction uses the constant velocity or constant acceleration motion model, extrapolating the vehicle's current state forward in time.

Gap Assessment

For the truck to execute its own lane change (to accommodate merging traffic or to exit), the system performs classical gap assessment:

  1. Track all vehicles in the target lane within a relevant range window
  2. Extrapolate their positions forward using constant velocity prediction
  3. Compute the available gap as a function of time
  4. Compare the gap to the minimum safe gap (function of truck speed, truck length, relative velocities)

This is a classical trajectory analysis problem solved entirely with deterministic geometry and kinematics.
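The four gap-assessment steps can be sketched deterministically. All parameter values (lane window, truck length, time margin) are illustrative assumptions:

```python
def gap_at_time(tracks, t, lane_window=(-150.0, 150.0)):
    """Largest free longitudinal gap in the target lane at time t.

    tracks: list of (s0, v) pairs -- longitudinal position relative to the
    ego vehicle (m) and velocity (m/s) of vehicles in the target lane,
    extrapolated with constant velocity (steps 1-3).
    Returns (gap_length, (gap_start, gap_end)).
    """
    positions = sorted(s0 + v * t for s0, v in tracks)
    edges = [lane_window[0]] + positions + [lane_window[1]]
    return max((b - a, (a, b)) for a, b in zip(edges, edges[1:]))

def gap_is_safe(gap_m, ego_speed, truck_length=22.0, time_margin=1.5):
    """Step 4: minimum safe gap as truck length plus time-based margins."""
    return gap_m >= truck_length + 2 * time_margin * ego_speed
```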


15. Construction Zone Processing

Cone and Barrel Clustering

Construction zones are defined by physical delineators -- cones, barrels (channelizing drums), and barriers. The perception system must detect these individual objects and then group them into meaningful lane boundaries:

Classical Point Cloud Processing:

  1. Ground removal: RANSAC or grid-based ground plane estimation removes road surface points
  2. Euclidean clustering: The remaining above-ground points are grouped into clusters based on spatial proximity. Points within a distance threshold (e.g., 0.3m) are assigned to the same cluster. Standard algorithm: DBSCAN or KD-tree-based region growing.
  3. Cluster classification: Each cluster's geometric properties (height, width, aspect ratio, point count) are compared to known profiles:
    • Traffic cone: ~0.7m height, ~0.3m base, conical shape
    • Channelizing drum: ~1.0m height, ~0.5m diameter, cylindrical
    • Jersey barrier: ~0.8m height, elongated, flat top
  4. Line fitting: Individual cones/barrels are fit to lines (RANSAC line fitting on cluster centroids) to estimate temporary lane boundaries. The line through cone/barrel centroids defines the new lane edge.
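Steps 2 and 4 above can be sketched with a naive region-growing clusterer (a stand-in for KD-tree/DBSCAN) and a least-squares line through the resulting centroids; a production system would use RANSAC for the line fit, as the text notes. The 0.3 m threshold matches the value cited above:

```python
import numpy as np

def euclidean_cluster(points, eps=0.3):
    """O(n^2) region-growing Euclidean clustering on (N, 2) points."""
    labels = -np.ones(len(points), dtype=int)
    cluster = 0
    for i in range(len(points)):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], cluster
        while stack:
            j = stack.pop()
            for k in np.where(np.linalg.norm(points - points[j], axis=1) < eps)[0]:
                if labels[k] < 0:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels

def fit_cone_line(centroids):
    """Least-squares line y = m*x + b through cone/barrel centroids."""
    m, b = np.polyfit(centroids[:, 0], centroids[:, 1], 1)
    return m, b
```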

Camera-Based Construction Detection:

  • Color-based detection: Orange cones and barrels are distinctive in HSV color space, enabling classical color thresholding as a complementary detection method
  • Sign reading: Temporary construction signs (speed limits, lane closures, "ROAD WORK AHEAD") are detected and read
  • Worker detection: Construction workers are detected as pedestrians with classification augmented by synthetic training data from Scale AI

Temporary Lane Estimation

Once cones and barrels are detected and grouped into lines, the system estimates temporary lane geometry:

  1. Cone/barrel lines on opposite sides of traffic define lane boundaries
  2. The space between adjacent cone lines is classified as a drivable lane
  3. The width is verified against expected lane width ranges (typically 3.0-3.7m)
  4. The resulting lane model replaces the stale map data for this road segment

Work Zone Geometry

Construction zones have characteristic geometric configurations that the system recognizes:

| Configuration | Detection Method |
|---|---|
| Lane closure (merge taper) | One cone line converges toward another; lane count decreases |
| Lane shift | Both cone lines shift laterally; lane maintains width but moves |
| Contraflow (crossover) | Lanes cross the median into opposing traffic direction |
| Reduced speed zone | Speed limit signs detected; cone spacing may indicate reduced speed |
| Work zone end | Cone lines terminate; normal lane markings resume |

The system uses classical finite state machine logic to track its state within the construction zone: approaching, entering, traversing, exiting.

Fleet Propagation

When a truck encounters a construction zone that is not in its map:

  1. The temporary lane model is captured with GPS coordinates and timestamps
  2. This data is uploaded to the cloud
  3. Classical map-merging algorithms integrate the construction zone into the fleet-wide map
  4. The updated map is pushed to all trucks, so subsequent trucks are prepared before they arrive

16. Cut-In Detection

Classical Motion Analysis for Cut-In Prediction

A cut-in occurs when a vehicle from an adjacent lane moves into the ego-vehicle's lane ahead of it. At highway speeds, this is a safety-critical scenario because it suddenly reduces following distance. Early detection of cut-in intent is essential for smooth braking response.

Lateral Velocity Monitoring

The primary classical indicator of an impending cut-in is lateral velocity -- movement toward the ego lane:

  1. Track lateral position over time: The Tracker maintains a time series of each tracked vehicle's lateral position relative to lane boundaries
  2. Compute lateral velocity: Numerical differentiation (with smoothing) estimates the rate of lateral position change
  3. Threshold detection: When lateral velocity exceeds a threshold (indicating movement toward the ego lane), the system flags the vehicle as a potential cut-in

This is a classical signal detection problem: distinguishing intentional lateral movement (cut-in) from normal lane-keeping oscillations (noise). The threshold must be set to balance early detection (low threshold = early warning but more false alarms) against false alarm rate.
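The three steps above reduce to smoothed numerical differentiation plus a threshold. The window length and 0.4 m/s threshold are illustrative assumptions:

```python
import numpy as np

def lateral_velocity(lateral_positions, dt=0.05, window=5):
    """Smoothed lateral velocity from a time series of lateral offsets.

    lateral_positions: offsets in metres, positive toward the ego lane,
    sampled at the 20 Hz tracker rate (dt = 0.05 s).  Averaging finite
    differences over a short window suppresses single-frame noise.
    """
    y = np.asarray(lateral_positions, dtype=float)
    return (np.diff(y[-(window + 1):]) / dt).mean()

def cut_in_flag(lateral_positions, v_thresh=0.4, dt=0.05):
    """Threshold test: sustained lateral speed toward the ego lane."""
    return lateral_velocity(lateral_positions, dt) > v_thresh
```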

Prototype Trajectory Matching

A more sophisticated classical approach uses prototype trajectory matching:

  1. A database of typical lane change trajectories is built from real driving data (clustered using agglomerative hierarchical clustering)
  2. The observed vehicle trajectory is compared to each prototype using Dynamic Time Warping (DTW) or similar sequence matching
  3. The similarity score indicates the probability that the vehicle is executing a lane change
  4. High similarity to a "lane change left" prototype when the vehicle is to the right of the ego lane indicates an impending cut-in

Time-to-Lane-Crossing (TLC)

A classical geometric metric that estimates when the vehicle will cross the lane boundary:

TLC = lateral_distance_to_lane_boundary / lateral_velocity

When TLC drops below a critical threshold (e.g., 2 seconds), the system initiates defensive measures (alerting the planner, pre-staging braking).

Radar Velocity Advantage

The ZF 4D radar provides instantaneous radial velocity for the potential cut-in vehicle. If the vehicle is in an adjacent lane and has a velocity vector that points toward the ego lane, the radial velocity component measured by the radar reflects this lateral component. This provides a direct, physics-based measurement of cut-in intent that supplements the camera and LiDAR-based lateral position tracking.


17. Highway-Speed Tracking

State Vector and Motion Models

The Kodiak Vision Tracker maintains a state vector for every tracked object. Based on Kodiak's references to information theory and state estimation, and the standard practice in autonomous vehicle tracking, the state vector likely includes:

x = [px, py, pz, vx, vy, vz, ax, ay, heading, yaw_rate, length, width, height]

where:

  • (px, py, pz): Position in vehicle frame
  • (vx, vy, vz): Velocity components
  • (ax, ay): Acceleration components
  • heading: Object heading angle
  • yaw_rate: Rate of heading change
  • (length, width, height): Object dimensions

Motion Models for Highway Tracking

At highway speeds, the choice of motion model significantly affects tracking performance. Classical models used:

Constant Velocity (CV): Assumes the object maintains constant velocity. State prediction:

px(t+dt) = px(t) + vx(t)*dt
py(t+dt) = py(t) + vy(t)*dt

Simple and effective for vehicles maintaining speed in a lane. Fails during acceleration, braking, or lane changes.

Constant Acceleration (CA): Adds acceleration to the state:

vx(t+dt) = vx(t) + ax(t)*dt
px(t+dt) = px(t) + vx(t)*dt + 0.5*ax(t)*dt^2

Better for braking and accelerating vehicles.

Constant Turn Rate and Velocity (CTRV): Models a vehicle moving at constant speed along a circular arc:

heading(t+dt) = heading(t) + yaw_rate*dt
px(t+dt) = px(t) + (v/yaw_rate) * (sin(heading + yaw_rate*dt) - sin(heading))
py(t+dt) = py(t) + (v/yaw_rate) * (cos(heading) - cos(heading + yaw_rate*dt))

Essential for tracking vehicles executing lane changes (which follow curved paths).

Constant Turn Rate and Acceleration (CTRA): Extends CTRV with acceleration. Most complex but most accurate for vehicles simultaneously turning and accelerating (e.g., a vehicle accelerating through a highway curve).
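The CTRV equations above can be implemented directly; the only subtlety is the v/yaw_rate term, which is undefined at zero turn rate and needs the straight-line (CV) fallback. A minimal sketch:

```python
import math

def ctrv_predict(state, dt):
    """One CTRV prediction step.

    state: (px, py, v, heading, yaw_rate), angles in radians.
    """
    px, py, v, h, w = state
    if abs(w) < 1e-6:
        # CV limit: straight-line motion when the turn rate is ~zero
        px += v * math.cos(h) * dt
        py += v * math.sin(h) * dt
    else:
        # Circular-arc propagation, as in the equations above
        px += (v / w) * (math.sin(h + w * dt) - math.sin(h))
        py += (v / w) * (math.cos(h) - math.cos(h + w * dt))
    return (px, py, v, h + w * dt, w)
```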

Extended/Unscented Kalman Filter

Because CTRV and CTRA models are nonlinear, the standard Kalman Filter cannot be used directly. The Tracker likely uses:

  • Extended Kalman Filter (EKF): Linearizes the nonlinear models around the current state estimate using Jacobian matrices. Computationally efficient but may introduce linearization errors for strongly nonlinear dynamics.

  • Unscented Kalman Filter (UKF): Uses sigma points to propagate the state distribution through the nonlinear model without linearization. More accurate than EKF for nonlinear models, at higher computational cost.

Highway-Speed-Specific Challenges

| Challenge | Classical Solution |
|---|---|
| High ego-velocity | Precise ego-motion compensation using IMU at high rate (100+ Hz) |
| High relative velocity | Large process noise to accommodate rapid range changes |
| Long tracks | Objects tracked for 30+ seconds (600m to passing) require stable track management |
| Sparse long-range detections | Large prediction uncertainty at long range; covariance grows rapidly between updates |
| Lane-change detection | CTRV model captures curved motion; lateral velocity in state vector detects intent |
| 20 Hz update rate | 50ms between updates; at 65 mph relative, objects move ~1.5m between frames |

Multi-Model Tracking

For robust highway tracking, the Tracker may employ an Interacting Multiple Model (IMM) filter that runs several motion models in parallel (CV, CA, CTRV) and maintains a probability weight for each model. When a vehicle is driving straight, the CV model dominates. When it begins a lane change, the CTRV model's weight increases. This adaptive model selection is a classical Bayesian estimation technique.


18. Data Association

The Assignment Problem

Data association -- matching new sensor detections to existing tracks -- is a classical combinatorial optimization problem. At each timestep, the Tracker receives N detections and maintains M tracks. It must determine which detection corresponds to which track, and identify:

  • Detections that match existing tracks (track updates)
  • Detections that match no existing track (new track initialization)
  • Tracks with no matching detection (track prediction only; possible track termination)

Gating

Before solving the full assignment problem, classical gating eliminates implausible associations:

  1. Position gate: A detection is only considered for association with a track if it falls within the track's predicted position uncertainty ellipse (typically the Mahalanobis distance is below a threshold, e.g., 3-sigma).

  2. Velocity gate: If the detection has velocity information (radar Doppler), it must be consistent with the track's predicted velocity.

  3. Size gate: Detection dimensions must be compatible with the tracked object's estimated dimensions.

  4. Classification gate: A detection classified as "truck" is not associated with a track classified as "motorcycle."

At highway speeds with long-range sparse detections, gating is challenging because:

  • Prediction uncertainty grows large at long range
  • Multiple targets may fall within the same gate
  • Sparse LiDAR detections at 400-600m have high position uncertainty
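The position gate in step 1 is a Mahalanobis test against the innovation covariance; inflating that covariance at long range widens the gate automatically, which is how the challenges above are absorbed. A sketch (the 9.21 gate is the chi-square 99% point for 2 degrees of freedom, an assumed but standard choice):

```python
import numpy as np

def in_gate(detection, track_pred, S, gate=9.21):
    """Mahalanobis position gate for one detection-track pair.

    detection, track_pred: (2,) position vectors; S: 2x2 innovation
    covariance (predicted track uncertainty plus measurement noise).
    Returns (inside_gate, squared_mahalanobis_distance).
    """
    nu = np.asarray(detection, dtype=float) - np.asarray(track_pred, dtype=float)
    d2 = nu @ np.linalg.solve(S, nu)
    return d2 <= gate, d2
```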

Assignment Algorithms

Classical algorithms for solving the gated assignment problem:

Hungarian Algorithm (Munkres): Finds the optimal one-to-one assignment that minimizes total cost. The cost for each detection-track pair is typically the Mahalanobis distance (statistical distance accounting for uncertainty).

Joint Probabilistic Data Association (JPDA): Rather than making hard assignments, JPDA computes a weighted combination of all plausible associations. Each track's state is updated using a weighted average of all detections in its gate, with weights proportional to association probabilities. This is more robust in cluttered environments where multiple detections fall near a single track.

Multi-Hypothesis Tracking (MHT): Maintains multiple hypotheses about the data association and defers the decision until more data resolves the ambiguity. The most rigorous but most computationally expensive approach.

Kodiak's Tracker, built on information-theoretic principles, likely uses a Bayesian approach similar to JPDA or MHT -- "assessing what conclusion best explains the whole body of evidence, without prejudice, rules, or hard-coding."
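For the Hungarian option, `scipy.optimize.linear_sum_assignment` solves the gated assignment directly. A sketch that also surfaces unmatched detections (new tracks) and unmatched tracks (prediction-only); the gate value and large-cost masking are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, gate=9.21):
    """Optimal detection-to-track assignment (Hungarian / Munkres).

    cost: (n_detections, n_tracks) matrix of squared Mahalanobis distances.
    Pairs whose cost exceeds the gate are disallowed.  Returns
    (matches, new_detection_indices, coasting_track_indices).
    """
    masked = np.where(cost <= gate, cost, 1e9)     # forbid out-of-gate pairs
    rows, cols = linear_sum_assignment(masked)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    new_dets = sorted(set(range(cost.shape[0])) - {r for r, _ in matches})
    coasting = sorted(set(range(cost.shape[1])) - {c for _, c in matches})
    return matches, new_dets, coasting
```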

Highway-Specific Association Challenges

| Scenario | Challenge | Classical Solution |
|---|---|---|
| Closely spaced vehicles in traffic | Detections from adjacent vehicles may be ambiguous | Use velocity and heading differences to disambiguate |
| Occluded vehicles | A vehicle hidden behind a truck produces no detection | Track prediction maintains the track; Tracker acknowledges high uncertainty |
| LiDAR-to-radar association at long range | LiDAR and radar position accuracy differ at 300m+ | Inflate association gates at long range; use velocity for confirmation |
| Track handoff between sensors | Object transitions from Luminar-only to Hesai coverage | Maintain unified track across sensor transitions |

19. Track Prediction

Forward Prediction for Gating and Planning

Track prediction -- projecting tracked objects forward in time -- serves two purposes:

  1. Gating (short-term): Predicting where each track will be at the next sensor update (50ms ahead) to establish the search region for data association
  2. Planning (medium-term): Predicting where tracked objects will be 2-8 seconds ahead to inform motion planning

Classical Prediction Models

Constant velocity prediction: The simplest model. Projects the current velocity forward:

predicted_position(t+dt) = current_position + velocity * dt

Accurate for vehicles maintaining speed in a lane (the most common highway scenario).

Constant turn rate prediction: For vehicles executing lane changes or following curved roads:

predicted_position(t+dt) = CTRV_propagation(current_state, dt)

This produces a curved predicted trajectory that better matches lane-change and curve-following behavior.

Lane-constrained prediction: Uses the estimated lane geometry to constrain predictions. A vehicle is predicted to follow the lane unless lateral velocity indicates a lane change. This combines the motion model with the road model to produce more realistic predictions:

  • A vehicle in a curve is predicted to follow the lane curvature
  • A vehicle with zero lateral velocity is predicted to stay centered in its lane
  • A vehicle with lateral velocity is predicted to execute a lane change

Prediction Uncertainty Growth

A critical classical result: prediction uncertainty grows with time. The predicted position covariance at time t+dt includes:

  • The propagated state covariance (current uncertainty projected forward)
  • Process noise (additional uncertainty from unmodeled behavior -- the vehicle might brake, accelerate, or change lanes)

At highway speeds, 3 seconds of prediction can result in substantial uncertainty:

  • Longitudinal uncertainty: ~10-30m (the vehicle might brake or accelerate)
  • Lateral uncertainty: ~3-5m (the vehicle might change lanes)

The planner must account for this growing uncertainty, which is why the Tracker's explicit uncertainty representation is so important.
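The growth of prediction uncertainty follows from the standard covariance propagation P' = F P F^T + Q. A 1-D constant-velocity sketch with white-noise acceleration (the spectral density value is an assumption) shows position uncertainty growing with the prediction horizon even from a perfect initial estimate:

```python
import numpy as np

def predict_covariance(P, dt, q_acc=3.0):
    """Propagate a [position, velocity] covariance forward by dt seconds.

    q_acc: white-noise acceleration spectral density (m^2/s^3), modelling
    the unmodeled behaviour (braking, accelerating) mentioned above.
    """
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])                       # constant-velocity transition
    Q = q_acc * np.array([[dt**3 / 3, dt**2 / 2],
                          [dt**2 / 2, dt]])          # discretized process noise
    return F @ P @ F.T + Q
```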


20. Multi-Sensor Track Fusion

Kodiak Vision Fusion Architecture

Kodiak Vision treats all sensors as primary and fuses their detections using a shared information-theoretic framework. The fusion architecture can be characterized as a centralized tracker with multi-detector input:

Luminar Iris #1 ----> Detectors A, B ------+
Luminar Iris #2 ----> Detectors C, D ------+
Hesai LiDAR #1 -----> Detector E ----------+
Hesai LiDAR #2 -----> Detector F ----------+---> Kodiak Vision Tracker ---> Fused Tracks
ZF Radar #1-#6 -----> Detectors G-L -------+
Cameras 1-12 -------> Detectors M, N, O ---+

Every sensor measurement passes through multiple independent software detectors, and all detector outputs are expressed in the common information-theoretic language before reaching the Tracker.

Information-Theoretic Fusion

The Tracker's mathematical framework, drawing from Shannon, Wiener, Kolmogorov, and Chapman, operates on probability distributions rather than point estimates:

  1. Each detection is a likelihood function: Rather than saying "there is a vehicle at (100m, 2m)," a detector says "here is the probability distribution over vehicle positions given my measurement." This distribution captures both the measured position and the measurement uncertainty.

  2. Bayesian update: The Tracker combines the prior track state (from prediction) with the new detection likelihood using Bayes' theorem:

posterior = prior * likelihood / normalization

This is the mathematical core of Kalman filtering and its extensions.

  3. Multi-sensor fusion: When multiple sensors detect the same object, their likelihood functions are combined:
  • LiDAR provides tight spatial uncertainty (precise position) but no native velocity
  • Radar provides tight velocity uncertainty (precise Doppler) but looser spatial uncertainty
  • Camera provides tight angular uncertainty (precise bearing) but no native depth
  • The fused estimate combines the best of each sensor
  4. Conflict resolution: When sensors disagree (e.g., camera says "no object" but LiDAR says "object at 200m"), the Tracker evaluates the relative credibility of each sensor's evidence. The information-theoretic framework naturally down-weights unreliable or conflicting evidence without hard-coded rules.
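For a single scalar state, the Bayesian update above has a closed form: the product of two Gaussian likelihoods is a Gaussian whose mean is the inverse-variance-weighted average. A toy sketch with assumed, illustrative sensor variances:

```python
def fuse_gaussians(mu1, var1, mu2, var2):
    """Product of two Gaussian likelihoods (the scalar Kalman update)."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    var = 1.0 / (w1 + w2)
    return var * (w1 * mu1 + w2 * mu2), var

# Assumed variances: LiDAR range is tight, radar range is looser
mu, var = fuse_gaussians(100.2, 0.05**2, 99.0, 1.0**2)
# The fused mean sits near the LiDAR value, and the fused variance
# is smaller than either sensor's alone.
```

This is why the fused estimate "combines the best of each sensor": each modality dominates the state dimensions where its variance is smallest.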

Cross-Modal Verification

When multiple physically independent sensor modalities agree on an object's presence and state, the confidence is much higher than any single sensor alone. The Tracker exploits three independent physical principles:

| Sensor | Physical Principle | What It Measures Best |
| --- | --- | --- |
| LiDAR | Light time-of-flight | 3D position, shape, reflectivity |
| Radar | RF Doppler reflection | Radial velocity, range (weather-robust) |
| Camera | Optical imaging | Color, texture, classification, angular position |

Cross-verified detections (confirmed by 2+ independent physics) have dramatically lower false positive and false negative rates than single-sensor detections.

Sensor Transition Handling

Objects moving through the Kodiak truck's sensor coverage experience transitions between sensor modalities:

  1. 600-300m: Luminar-only detection (forward-facing long-range)
  2. 300-200m: Luminar + radar + narrow-FOV camera
  3. 200-50m: Luminar + Hesai + radar + cameras (full coverage)
  4. 50-0m: Hesai + radar + wide-FOV cameras (Luminar may not cover near-field)
  5. Alongside: Hesai + side radar + side cameras (vehicle passing)
  6. Behind: Hesai + rear radar + rear cameras

The Tracker maintains continuous track identity across these transitions. An object first detected by Luminar at 400m is the same track when Hesai picks it up at 200m. Classical track prediction bridges the gap when an object briefly falls between sensor coverage zones.


21. NVIDIA DRIVE Orin Classical Processing

CPU Architecture

The NVIDIA DRIVE Orin SoC contains 12 Arm Cortex-A78AE CPU cores. While the GPU handles ML inference workloads, these CPU cores are responsible for significant classical processing:

| CPU Workload | Classical Method |
| --- | --- |
| Tracker core loop | Kalman/UKF prediction and update at 20 Hz |
| Data association | Hungarian algorithm / JPDA |
| Road model estimation | Polynomial / clothoid fitting |
| Coordinate transformations | Rotation matrices, frame conversions |
| Safety checks | Rule-based monitoring at 10 Hz |
| Ego-motion estimation | IMU integration, GPS fusion |
| Map matching | Lane-to-map correlation |
| Trajectory planning | Optimization, constraint checking |
| Communication | Sensor data routing, telemetry |

Programmable Vision Accelerator (PVA)

The Orin includes a dedicated hardware accelerator for classical computer vision -- the PVA (Programmable Vision Accelerator). This is a VLIW SIMD DSP specifically optimized for image processing and computer vision operations:

| PVA Specification | Value |
| --- | --- |
| Architecture | Vector SIMD VLIW DSP |
| Vector Processing Units (VPUs) | 2 per PVA |
| VLIW width | 7-way (2x scalar + 2x vector + 3x memory simultaneously) |
| INT8 performance | 2048 GMACs per PVA instance |
| FP32 performance | 32 GMACs per PVA instance |
| Memory bandwidth | Up to 15 GB/s per DMA |

Classical CV operations that run on the PVA:

  • Image remapping (undistortion): Correcting lens distortion using calibrated parameters
  • Cropping and resizing: Preparing image regions for ML detectors
  • Color conversion: YUV to RGB conversion for neural network input
  • Layout conversion: Block-linear to pitch-linear format for memory efficiency
  • Image pyramid generation: Multi-scale image representations for detection
  • Sparse optical flow: Feature tracking between consecutive frames
  • Feature detection: Harris/Shi-Tomasi corner detection
  • Feature tracking: KLT (Kanade-Lucas-Tomasi) tracker for visual odometry
  • Median filtering: Outlier rejection in post-processing
  • Non-maximum suppression (NMS): Post-processing for detection outputs

The PVA runs concurrently with the GPU and CPU, offloading classical CV workloads to free GPU resources for ML inference. In production deployments (e.g., NIO's PVA usage on Orin), transferring classical operations to PVA achieved a 10% reduction in overall GPU resource usage.

Deep Learning Accelerator (DLA)

Orin also includes two DLA engines for efficient fixed-function neural network inference. While not classical processing, the DLA frees the GPU for other workloads, indirectly increasing the compute budget available for classical processing on the CPU and PVA.


22. Ambarella CV3-AD685 Classical Processing

General Vector Processor (GVP)

The Ambarella CV3-AD685 includes a dedicated classical processing engine within its CVflow architecture -- the General Vector Processor (GVP). This is specifically designed to offload classical computer vision and radar processing:

| GVP Capability | Purpose |
| --- | --- |
| Classical computer vision | Traditional CV algorithms that don't require neural networks |
| HD radar processing | Signal processing for 4D radar data |
| Floating-point algorithms | Algorithms too computationally expensive for CPUs |
| Sensor fusion | Classical fusion algorithms combining camera, radar, and LiDAR data |

The GVP provides:

  • New floating-point general vector processor: Purpose-built for algorithms with intensive floating-point computation
  • Optimization for HD radar: Specific enhancements for processing high-definition radar signals, including FFT computation, CFAR detection, and beamforming
  • CPU offloading: Floating-point-intensive algorithms that would otherwise overload the Arm Cortex CPUs are offloaded to the GVP

Dense Stereo and Optical Flow Engine

The CV3-AD685 includes a dedicated hardware engine for two foundational classical computer vision operations:

Dense stereo: Computes per-pixel depth from stereo camera pairs. This is a classical correspondence matching problem where each pixel in the left image is matched to its corresponding pixel in the right image, and the disparity (pixel difference) is converted to depth using the known stereo baseline:

depth = (focal_length * baseline) / disparity

Dense optical flow: Computes per-pixel motion vectors between consecutive frames. This classical operation estimates how each pixel has moved from one frame to the next, providing:

  • Ego-motion estimation (if the scene is stationary)
  • Moving object detection (pixels whose flow is inconsistent with ego-motion)
  • Velocity estimation for tracked objects (visual velocity cues)

These hardware engines compute stereo and optical flow at frame rate with minimal CPU/GPU involvement.
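The stereo relation above maps directly to code. The focal length and baseline below are assumed, illustrative optics, not the CV3-AD685's actual configuration:

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Classical pinhole stereo: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("non-positive disparity has no finite depth")
    return focal_px * baseline_m / disparity_px

# Assumed optics: 1200 px focal length, 0.30 m baseline
d4 = stereo_depth(4.0, 1200.0, 0.30)   # 90 m at 4 px disparity
d3 = stereo_depth(3.0, 1200.0, 0.30)   # 120 m at 3 px disparity
```

Note that a single pixel of disparity difference near 100 m shifts the estimated depth by tens of meters -- stereo depth precision degrades quadratically with range, which is why LiDAR and radar dominate long-range measurement.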

ISP Pipeline (On-Chip)

The CV3-AD685's on-chip ISP is an extensive classical processing pipeline:

  • Handles all 12 camera streams simultaneously
  • Premium image quality processing described as suitable for "rain, fog, and darkness"
  • HDR processing for the extreme dynamic range of highway driving (tunnels, sun glare, night)
  • Low-light enhancement for nighttime driving
  • Real-time exposure control and auto-white-balance

Workload Split Between Ambarella and NVIDIA

The two compute platforms divide the classical workload:

| Workload | Primary Processor | Rationale |
| --- | --- | --- |
| Camera ISP | Ambarella CV3-AD685 | On-chip ISP optimized for camera processing |
| Camera classical CV | Ambarella GVP | Purpose-built classical CV accelerator |
| Dense stereo / optical flow | Ambarella hardware engine | Dedicated hardware for these operations |
| Radar signal processing | Ambarella GVP | Optimized for HD radar algorithms |
| Camera ML inference | Ambarella CVflow NVP | 20x faster than CV2 for neural networks |
| LiDAR point cloud processing | NVIDIA Orin GPU | GPU-parallel point cloud algorithms |
| Tracker / fusion | NVIDIA Orin CPU | Complex sequential logic |
| Planning / prediction | NVIDIA Orin CPU + GPU | Mixed classical and ML workload |
| Classical image preprocessing | NVIDIA Orin PVA | Undistortion, cropping, color conversion |
| Safety monitoring | NXP ACE (independent) | ASIL-D certified classical logic |

23. NXP ACE Safety Computer

Hardware Architecture

The Kodiak Actuation Control Engine (ACE) is a custom-designed safety computer that operates independently from the main autonomy system. It integrates NXP automotive processors certified to the highest ISO 26262 safety integrity level (ASIL-D):

| Component | Role | Safety Function |
| --- | --- | --- |
| NXP S32G3 | Vehicle network processor | Safe actuation of braking, steering, and throttle |
| NXP S32K3 | Safety co-processor | Power distribution, battery charging, safety HMI |
| NXP VR5510 | Multi-channel PMIC | Power generation with functional safety voltage monitoring |
| NXP PF53 | Core supply regulator | Power supply for S32G3 core |

ASIL-D Compliance

ASIL-D (Automotive Safety Integrity Level D) is the highest safety classification in ISO 26262, corresponding to:

  • Failure rate: Fewer than 10 failures in 1 billion hours of operation
  • Development process: Formal requirements specification, formal design verification, structural code coverage, systematic testing
  • Hardware metrics: Single-point fault metric > 99%, latent fault metric > 90%

This level of certification requires that the ACE's processing is deterministic, verifiable, and simple enough to be formally analyzed. This inherently means classical processing -- deterministic rule-based logic rather than ML inference, which cannot be formally verified to ASIL-D standards.

What the ACE Processes

The ACE performs classical, deterministic safety functions:

  1. Actuation control: Translates trajectory commands into physical actuator signals for:

    • Triple-redundant pneumatic braking (three independent actuators)
    • Dual-redundant steering (two ZF steering actuators)
    • Throttle control
  2. Sensor health monitoring: Independent sensor data access allows the ACE to verify that sensors are functioning

  3. Communication watchdog: The ACE monitors the communication link with the main Kodiak Driver compute. If communication is lost (hardware failure, software crash, cable disconnect), the ACE automatically initiates fallback

  4. Actuation integrity: The ACE verifies that commanded actuations (brake pressure, steering angle) match actual actuator responses. Discrepancies indicate actuator failure.

  5. Power management: The S32K3 co-processor monitors power distribution and battery charging, ensuring electrical system integrity

Why the ACE Must Be Classical

The ACE cannot use ML-based processing because:

  • ML models are not deterministically verifiable (required for ASIL-D)
  • ML models can produce unexpected outputs on out-of-distribution inputs
  • Formal verification of neural networks is an unsolved problem at the scale needed for ASIL-D
  • The safety case requires provable worst-case behavior, which classical rule-based logic provides

The ACE's processing is entirely rule-based: IF (communication_lost) THEN (execute_fallback). IF (brake_pressure < commanded_pressure) THEN (flag_actuator_failure). This deterministic logic can be formally verified and exhaustively tested.

Dual ACE Redundancy

Gen 4+ Kodiak trucks include two ACE units. If one ACE fails, the second independently maintains vehicle control. This dual-redundancy is classical fault-tolerance architecture: the probability of both ACEs failing simultaneously is the product of their individual failure probabilities, achieving extremely low system-level failure rates.


24. Off-Road Perception

Military Terrain Classification

Kodiak's military perception system (deployed on the Ford F-150 prototype and Textron RIPSAW M3) must handle terrain that has no road markings, no lane structure, and no standardized appearance. The perception challenge shifts from "detect objects on a road" to "determine where is drivable":

Classical Terrain Analysis Methods:

| Method | Sensor | What It Determines |
| --- | --- | --- |
| Geometry-based traversability | LiDAR | Slope angle, step height, roughness, gap width |
| Appearance-based classification | Camera | Terrain type (dirt, gravel, mud, rock, vegetation, water) |
| Proprioceptive feedback | IMU | Actual vibration and traction (validates predicted traversability) |
| Ground surface estimation | LiDAR + IMU | Drivable surface geometry, obstacles above ground |

Slope and Roughness Analysis: The LiDAR point cloud is divided into a 2D grid. For each grid cell, classical statistics are computed:

  • Mean height (ground elevation)
  • Height variance (roughness)
  • Maximum slope to neighboring cells (traversability constraint)
  • Step height at cell boundaries (obstacle detection)

A grid cell is classified as "traversable" if slope < max_slope, roughness < max_roughness, and step_height < max_step.
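The per-cell classification above can be sketched with grid statistics. The cell size and all three thresholds here are assumed, illustrative values, not Kodiak's tuning:

```python
import numpy as np

def classify_traversable(mean_height, roughness, cell_size=0.5,
                         max_slope_deg=20.0, max_roughness=0.05,
                         max_step=0.3):
    """Per-cell traversability from classical grid statistics.

    mean_height: 2D array of per-cell ground elevation (m).
    roughness:   2D array of per-cell height spread (m).
    """
    # Slope from finite differences of the elevation grid
    dzdy, dzdx = np.gradient(mean_height, cell_size)
    slope_deg = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

    # Step height: largest elevation jump to a 4-neighbor
    step = np.zeros_like(mean_height)
    d_row = np.abs(np.diff(mean_height, axis=0))
    d_col = np.abs(np.diff(mean_height, axis=1))
    step[:-1, :] = np.maximum(step[:-1, :], d_row)
    step[1:, :] = np.maximum(step[1:, :], d_row)
    step[:, :-1] = np.maximum(step[:, :-1], d_col)
    step[:, 1:] = np.maximum(step[:, 1:], d_col)

    return ((slope_deg < max_slope_deg)
            & (roughness < max_roughness)
            & (step < max_step))
```

A 0.5 m elevation jump between adjacent 0.5 m cells fails both the step and slope tests, so the cells on either side of the edge are marked non-traversable.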

Negative Obstacle Detection: Unlike highway driving where all obstacles are above the road surface, off-road terrain includes negative obstacles -- ditches, gullies, drop-offs. These are detected by classical analysis of the LiDAR height grid:

  • A cell significantly below its neighbors indicates a depression
  • The depth and width of the depression determine if it is traversable

Dust Handling

Kodiak specifically hardened its system against dust during off-road testing:

  • The Hesai IPE filters dust returns at the sensor level (>99.9% environmental noise rejection)
  • Multi-return processing sees through dust to solid surfaces behind
  • The ZF 4D radar operates through dust with minimal attenuation (4mm wavelength >> dust particle size)
  • The SensorPod self-cleaning system (compressed air, water) clears lens contamination
  • DefensePod hardening provides additional environmental protection

Off-Road Path Planning Without Lane Structure

Without lane markings or road structure, the system navigates using:

  • Waypoint following (GPS-based when available)
  • Terrain-relative navigation (matching perceived terrain to known maps)
  • Lead vehicle following (following the path of a manned vehicle ahead in convoy operations)
  • Obstacle avoidance on traversability costmaps

25. Threat Detection

Classical Obstacle Detection for Military

Military threat detection extends beyond standard autonomous driving obstacles to include threats specific to combat and reconnaissance operations:

Geometric Anomaly Detection: Classical point cloud analysis identifies objects that do not match the expected terrain surface:

  • Objects protruding above the ground surface (rocks, logs, debris, potential IEDs)
  • Disturbed earth (changes in ground surface geometry that may indicate buried objects)
  • Wire-like structures (detected as thin, linear point clusters)

Multi-Sensor Anomaly Fusion: The classical approach fuses geometric anomalies from LiDAR with:

  • Radar cross-section analysis (metallic objects have characteristically high RCS)
  • Camera texture analysis (disturbed earth has different visual appearance from undisturbed terrain)
  • Thermal anomalies (from thermal cameras, if integrated in DefensePods)

Classical Change Detection: Comparing the current terrain to a prior reference (from a previous pass or a pre-mission map):

  • Grid-based subtraction of current elevation model from reference elevation model
  • Changes exceeding a threshold indicate new objects or terrain disturbance
  • This is a classical remote sensing technique adapted for ground-vehicle perspective
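The grid-subtraction step above is a one-line operation; the 0.25 m threshold below is an assumed, illustrative value:

```python
import numpy as np

def detect_changes(current_dem, reference_dem, threshold_m=0.25):
    """Grid-based elevation change detection against a reference model."""
    return np.abs(current_dem - reference_dem) > threshold_m

ref = np.zeros((10, 10))
cur = ref.copy()
cur[4:6, 4:6] += 0.4      # a new 0.4 m-tall object or disturbed earth
changes = detect_changes(cur, ref)
```

In practice the two elevation models must first be co-registered (aligned in position and heading) before subtraction, which is itself a classical correlation problem.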

Limitations

Kodiak has not publicly disclosed specific IED detection algorithms. The military perception system uses the same Kodiak Driver software as the commercial trucking system, adapted for off-road environments. Specific threat detection capabilities are likely classified or proprietary.


26. GPS-Denied Navigation

Classical Navigation Without GPS

Kodiak's perception-over-priors philosophy inherently supports GPS-denied operation because the system is designed to navigate using real-time perception rather than relying on GPS-based localization against HD maps. The classical techniques for GPS-denied navigation include:

Inertial Navigation System (INS): The truck's IMU (accelerometers + gyroscopes) provides dead-reckoning navigation by integrating acceleration and angular velocity:

position(t+dt) = position(t) + velocity(t)*dt + 0.5*acceleration(t)*dt^2
velocity(t+dt) = velocity(t) + acceleration(t)*dt
heading(t+dt) = heading(t) + yaw_rate(t)*dt

INS drift is the fundamental problem: integration of noisy accelerometer data causes position error to grow quadratically with time. A tactical-grade IMU can maintain ~1% of distance traveled accuracy, meaning after 1 km, the position error is ~10m.
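The quadratic error growth follows directly from the integration equations above. A sketch, with an assumed, illustrative accelerometer bias:

```python
def dead_reckon(accel_samples, dt):
    """Integrate 1D acceleration into velocity and position (pure INS)."""
    pos, vel = 0.0, 0.0
    for a in accel_samples:
        pos += vel * dt + 0.5 * a * dt * dt
        vel += a * dt
    return pos, vel

dt = 0.01                 # 100 Hz IMU
bias = 0.01               # assumed constant accelerometer bias, m/s^2
n = 10_000                # 100 s of integration

true_pos, _ = dead_reckon([0.0] * n, dt)   # truth: vehicle stationary
est_pos, _ = dead_reckon([bias] * n, dt)   # the same data with bias
error = est_pos - true_pos                 # grows as 0.5 * bias * t^2
```

A mere 0.01 m/s^2 bias produces 50 m of position error after 100 seconds -- which is why raw INS must be continuously corrected by visual odometry, LiDAR odometry, or terrain matching.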

Visual Odometry (VO): Classical visual odometry estimates ego-motion by tracking visual features between consecutive camera frames:

  1. Detect features in frame N (Harris corners, FAST, ORB)
  2. Track features in frame N+1 (KLT tracker, optical flow)
  3. Compute the essential matrix from feature correspondences
  4. Decompose the essential matrix into rotation and translation
  5. Scale from stereo or from the known ground plane

This provides a drift-reduced ego-motion estimate that can correct INS drift.

LiDAR Odometry (SLAM): Classical scan matching algorithms (ICP -- Iterative Closest Point, or NDT -- Normal Distributions Transform) align consecutive LiDAR scans to estimate ego-motion:

  1. Current scan is aligned to previous scan (or a local map) using iterative optimization
  2. The transformation that best aligns the two scans provides the ego-motion estimate
  3. Loop closure detection (recognizing previously visited locations) can correct accumulated drift

Terrain-Relative Navigation: For military operations, the system can localize against known terrain elevation models:

  • The perceived terrain (from LiDAR) is matched against a stored digital elevation model (DEM)
  • Classical template matching or correlation techniques find the best alignment
  • This provides absolute position without GPS, given an accurate DEM
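The correlation step can be sketched as an exhaustive sum-of-squared-differences search. The search radius and grid sizes are illustrative assumptions:

```python
import numpy as np

def match_terrain(perceived, dem, search=5):
    """Exhaustive SSD correlation of a perceived elevation patch against
    a stored DEM. Offsets are relative to the nominal alignment at
    dem[search:search+h, search:search+w]; returns the best (drow, dcol)."""
    h, w = perceived.shape
    best_ssd, best_off = np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r0, c0 = search + dr, search + dc
            window = dem[r0:r0 + h, c0:c0 + w]
            ssd = float(np.sum((window - perceived) ** 2))
            if ssd < best_ssd:
                best_ssd, best_off = ssd, (dr, dc)
    return best_off
```

Terrain matching only works where the terrain has distinctive relief; over featureless flat ground the SSD surface is ambiguous, which is why terrain corrections are treated as intermittent in the fusion below.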

Sensor Fusion for GPS-Denied Localization

Classical sensor fusion (typically an Extended Kalman Filter) combines:

  • INS (high rate, drifts over time)
  • Visual odometry (moderate rate, drifts slowly)
  • LiDAR odometry (moderate rate, very accurate locally)
  • Wheel odometry (high rate, affected by slip)
  • Terrain matching (intermittent, provides absolute corrections)

The EKF optimally weights each source based on its noise characteristics, producing a fused position estimate that is more accurate than any individual source.


27. Degraded Sensor Operation

Classical Fallback Hierarchy

When sensors are damaged, degraded, or obscured, the perception system must gracefully degrade rather than fail. This is handled by classical fault-tolerance logic:

Sensor Health Classification (determined at 10 Hz by the safety monitoring system):

  • Nominal: Full capability
  • Degraded: Reduced performance (e.g., rain on camera lens, dust on LiDAR)
  • Failed: No usable output

Degradation Response:

| Scenario | Remaining Sensors | Classical Response |
| --- | --- | --- |
| One camera failed | 11 cameras + 4 LiDAR + 6 radar | Reduce reliance on failed camera's FOV; other sensors compensate |
| One LiDAR failed | 12 cameras + 3 LiDAR + 6 radar | Other LiDAR + radar + cameras maintain coverage; possible blind spot |
| One radar failed | 12 cameras + 4 LiDAR + 5 radar | Other radars cover similar FOV; velocity estimation slightly degraded |
| One SensorPod failed | Half the sensors (surviving pod) | Significant degradation; reduced FOV; initiate controlled pullover |
| Heavy rain/fog | Cameras degraded, LiDAR degraded, radar functional | Radar becomes primary sensor; increase following distance; reduce speed |
| Night driving | Cameras degraded (darkness) | LiDAR and radar (active sensors) become primary |

Classical Uncertainty Propagation

The Kodiak Vision Tracker's information-theoretic framework handles degraded sensors naturally:

  • A degraded sensor's detections carry higher uncertainty
  • The Tracker automatically down-weights high-uncertainty detections
  • The overall fused estimate remains valid but with increased uncertainty
  • Increased uncertainty propagates to the planner, which responds with more conservative behavior

This is classical Bayesian inference in action -- the mathematics naturally handles varying sensor reliability without requiring explicit sensor-failure rules.

Military-Specific Degradation

In military operations, additional degradation sources include:

  • Battle damage to DefensePods
  • Electromagnetic interference (EMI) affecting radar
  • Laser dazzle or jamming affecting LiDAR
  • Smoke and obscurants degrading all optical sensors

The classical response is the same: the Tracker down-weights unreliable sensors, and the system operates with whatever sensors remain functional. The radar's robustness to visual obscurants (smoke, dust) makes it particularly valuable as a last-resort sensor.


28. 1,000+ Safety Checks at 10 Hz

What Is Monitored

Ten times per second, the Kodiak Driver evaluates the performance of more than 1,000 safety-critical processes and components. These checks are overwhelmingly rule-based (classical logic), not ML-based, because they must be deterministic and verifiable.

The 1,000+ checks span two domains:

Truck Platform Checks (classical threshold monitoring):

| Component | Classical Check | Failure Response |
| --- | --- | --- |
| Engine | RPM within expected range, temperature nominal | Reduce speed or initiate pullover |
| Oil levels | Pressure above minimum threshold | Initiate pullover |
| Tire pressure | All tires within safe range (TPMS) | Reduce speed or pullover |
| Braking system | Air pressure adequate, actuator response | Fallback to redundant braking |
| Steering | Actuator response matches commanded angle | Fallback to redundant steering |
| Electrical system | Voltage, current within bounds | Switch to backup power |
| Cooling | Coolant temperature within range | Reduce load or pullover |

Autonomy System Checks (classical process monitoring):

| Component | Classical Check | Failure Response |
| --- | --- | --- |
| Sensor health | Each of 22 sensors producing data at expected rate | Down-weight or bypass failed sensor |
| Sensor data quality | Signal-to-noise ratio, point density, image exposure | Flag degraded sensor for Tracker |
| Software process health | Each process running, not hung, producing outputs | Restart process or initiate fallback |
| Compute health | CPU/GPU temperature, utilization, memory | Reduce load or initiate fallback |
| Communication links | Data flowing between compute nodes | Initiate fallback if link lost |
| Detector outputs | Each detector producing detections at expected rate | Flag missing detector for Tracker |
| Tracker consistency | World model internally consistent, tracks well-formed | Flag inconsistency for investigation |
| Cross-sensor agreement | LiDAR, radar, camera detections consistent | Flag disagreement, increase uncertainty |
| Localization health | GPS quality, map correlation confidence | Increase uncertainty in position |
| Planning sanity | Planned trajectory feasible, within vehicle dynamics limits | Reject infeasible plan, replan |

Rule-Based vs. ML-Based Checks

The vast majority of these checks are classical rule-based logic:

IF sensor_frame_rate < minimum_threshold THEN mark_sensor_degraded
IF compute_temperature > thermal_limit THEN reduce_compute_load
IF communication_latency > max_latency THEN initiate_fallback
IF brake_pressure < minimum_safe THEN activate_redundant_brake

This rule-based approach is chosen because:

  1. Rules are deterministically verifiable (required for ASIL-D safety case)
  2. Rules have guaranteed worst-case execution time (real-time constraint)
  3. Rules are transparent and auditable (regulatory requirement)
  4. Rules do not produce unexpected outputs (unlike ML models)
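The rules above translate directly into table-driven classical logic. A minimal sketch -- every telemetry field name and threshold here is an illustrative assumption, not a Kodiak value:

```python
# Each rule: (name, trip-condition over telemetry, response).
RULES = [
    ("sensor_frame_rate", lambda t: t["frame_rate_hz"] < 9.0, "mark_sensor_degraded"),
    ("compute_temp", lambda t: t["gpu_temp_c"] > 95.0, "reduce_compute_load"),
    ("comm_latency", lambda t: t["latency_ms"] > 100.0, "initiate_fallback"),
    ("brake_pressure", lambda t: t["brake_kpa"] < 550.0, "activate_redundant_brake"),
]

def run_checks(telemetry):
    """Evaluate every rule each 100 ms cycle; return tripped responses."""
    return [response for _, tripped, response in RULES if tripped(telemetry)]

nominal = {"frame_rate_hz": 10.0, "gpu_temp_c": 70.0,
           "latency_ms": 12.0, "brake_kpa": 800.0}
```

Because every rule is a pure threshold comparison, the worst-case execution time of the whole table is trivially bounded -- the property that makes a 1,000-rule sweep certifiable and schedulable at 10 Hz.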

10 Hz Execution Rate

The 10 Hz rate (100ms period) is chosen to balance:

  • Responsiveness: Detecting a failure within 100ms allows corrective action before the situation becomes dangerous
  • Computational overhead: Running 1,000+ checks at higher frequency would consume excessive compute
  • Physical dynamics: Most mechanical failures (tire blowout, brake loss, engine failure) develop over timescales > 100ms

At 65 mph, the truck moves ~2.9m per 100ms cycle. A brake failure detected within 100ms allows intervention before the truck has traveled an additional 2.9m -- well within the control system's response margin.


29. ACE Fallback System

Dual-Path Planning Architecture

The Kodiak Driver always plans two paths simultaneously:

  1. Nominal trajectory: The planned route to the destination (normal driving)
  2. Fallback trajectory: A safe pullover path ready for immediate execution

Both trajectories are computed at every planning cycle (multiple times per second). The fallback trajectory is always available and always valid -- if a fault is detected, the system can immediately switch from executing the nominal trajectory to executing the fallback trajectory.

Fallback Execution

When the ACE determines that a fallback is necessary (based on the 1,000+ safety checks), the execution follows a classical control sequence:

  1. Fault detection: A safety check fails (sensor, compute, communication, or vehicle component)
  2. Trajectory switch: The controller transitions from executing the nominal trajectory to executing the fallback trajectory
  3. Controlled deceleration: The truck decelerates smoothly (not emergency braking unless imminent collision)
  4. Lane change to shoulder: If safe, the truck changes lanes to the right and onto the shoulder
  5. Stop: The truck comes to a controlled stop on the shoulder
  6. Hazard notification: Hazard lights are activated; the remote operations center is notified

ACE Independence

The ACE executes the fallback without any input from the main Kodiak Driver computer. This is critical because the most dangerous failure mode is a complete crash of the main autonomy software. In that scenario:

  • The main compute produces no output
  • The ACE detects the communication loss
  • The ACE independently executes the fallback using its own sensor access and the pre-planned fallback trajectory
  • The ACE controls braking, steering, and throttle through direct actuator connections

This is analogous to a biological brainstem reflex -- fast, automatic, and independent of higher-level processing.

Classical Control Theory

The ACE's trajectory following is a classical control problem:

  • Lateral control: PID or pure-pursuit controller maintains the vehicle on the fallback trajectory
  • Longitudinal control: Speed controller manages deceleration profile
  • Actuator control: Direct control of brake pressure, steering angle, and throttle position

These controllers are deterministic, formally verified, and certified to ASIL-D standards.
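The lateral controller can be sketched with the standard pure-pursuit geometry (one of the two formulations named above); the wheelbase in the usage below is an assumed value:

```python
import math

def pure_pursuit_steer(ego_x, ego_y, ego_heading, target_x, target_y,
                       wheelbase):
    """Classical pure pursuit: steering angle toward a lookahead point."""
    # Express the lookahead point in the vehicle frame
    dx, dy = target_x - ego_x, target_y - ego_y
    c, s = math.cos(-ego_heading), math.sin(-ego_heading)
    lx = c * dx - s * dy
    ly = s * dx + c * dy
    ld2 = lx * lx + ly * ly            # lookahead distance squared
    curvature = 2.0 * ly / ld2         # pure-pursuit arc curvature
    return math.atan(wheelbase * curvature)
```

A lookahead point dead ahead yields zero steering; a point offset left (positive lateral) yields a positive angle. The controller re-evaluates this every cycle as the vehicle advances along the fallback trajectory.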

Testing and Validation

Kodiak validates fallback performance through:

  • Fault injection testing: Deliberately injecting faults (sensor failures, software crashes, communication drops) during controlled testing
  • Scenario variation: Testing fallback in various conditions (near objects, on curves, in traffic, on straight highways)
  • Simulation: Millions of simulated fallback scenarios via Applied Intuition simulation platform and Kodiak's Breakpoint AI tool

30. Geometric Collision Checking

Time-to-Collision (TTC) Computation

TTC is the most fundamental classical collision metric, computing the time until contact between the ego-vehicle and another object under the assumption of constant velocities:

TTC = relative_distance / relative_velocity

For a Kodiak Class 8 truck at highway speed:

| Scenario | Relative Distance | Relative Velocity | TTC | Required Action |
| --- | --- | --- | --- | --- |
| Approaching stopped vehicle | 400m | 29 m/s (65 mph) | 13.8s | Begin planning evasive maneuver |
| Approaching stopped vehicle | 200m | 29 m/s | 6.9s | Execute lane change or begin braking |
| Approaching stopped vehicle | 100m | 29 m/s | 3.4s | Emergency braking |
| Cut-in at 30m | 30m | 10 m/s | 3.0s | Emergency braking |
| Lead vehicle hard braking | 100m | 15 m/s (closing) | 6.7s | Match braking |

The radar provides the most accurate TTC computation because it measures relative velocity directly via Doppler, while LiDAR and camera must differentiate range over time to estimate velocity.
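The TTC formula is a one-liner; the only subtlety is handling a gap that is opening rather than closing:

```python
def time_to_collision(range_m, closing_speed_mps):
    """Constant-velocity TTC; None when the gap is opening or constant."""
    if closing_speed_mps <= 0.0:
        return None
    return range_m / closing_speed_mps

# Reproducing rows of the table above
ttc_far = time_to_collision(400.0, 29.0)    # ~13.8 s
ttc_cutin = time_to_collision(30.0, 10.0)   # 3.0 s
```

The constant-velocity assumption makes TTC a conservative screening metric; the planner refines it with the full predicted trajectories described in Section 19.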

Truck-Specific Geometric Considerations

Collision checking for a Class 8 truck is more complex than for a passenger car because of the truck's dimensions and articulation:

| Dimension | Value | Impact on Collision Checking |
| --- | --- | --- |
| Cab length | ~6m | Front collision zone |
| Trailer length | ~16m (53-foot) | Total vehicle length ~22m |
| Width | ~2.6m | Tight lane clearance |
| Height | ~4.1m | Overhead clearance (bridges, signs) |
| Turning radius | Large | Wide swept path in curves and lane changes |
| Articulation | Cab-trailer joint | Trailer tracks differently than cab in turns |

Swept Volume Analysis

When the truck changes lanes or follows curved roads, it sweeps through a volume of space that is larger than the static vehicle footprint. Classical swept volume computation:

  1. Kinematic model: The truck cab follows the planned trajectory. The trailer follows the cab's path with an offset due to the articulation joint (tractor-trailer kinematic model).

  2. Volume computation: At each time step along the trajectory, the vehicle's footprint (2D or 3D bounding box) is placed at the predicted position and orientation. The union of all footprints defines the swept volume.

  3. Collision check: The swept volume is checked against the predicted positions of all tracked objects. If any object's predicted volume intersects the truck's swept volume, a collision risk is flagged.

  4. Clearance margins: Safety margins are added to account for:

    • Position uncertainty of tracked objects
    • Ego-vehicle trajectory tracking error
    • Wind effects on the trailer
    • Object motion during the maneuver
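A minimal 2D sketch of steps 1-3, treating each tracked object as a point and using the vehicle dimensions from the table above (the 0.5 m margin is an assumed value):

```python
import math

def point_in_footprint(px, py, x, y, heading, length, width):
    """Test a point against one oriented rectangular footprint."""
    c, s = math.cos(-heading), math.sin(-heading)
    dx, dy = px - x, py - y
    lx = c * dx - s * dy          # point expressed in the vehicle frame
    ly = s * dx + c * dy
    return abs(lx) <= length / 2.0 and abs(ly) <= width / 2.0

def swept_collision(poses, obstacle, length=22.0, width=2.6, margin=0.5):
    """Union-of-footprints check: does any pose along the trajectory put
    the margin-inflated footprint over the obstacle point?"""
    ox, oy = obstacle
    return any(point_in_footprint(ox, oy, x, y, h,
                                  length + 2.0 * margin,
                                  width + 2.0 * margin)
               for x, y, h in poses)
```

A full implementation intersects the swept volume with each object's own predicted extent and models the trailer's offset track separately; the point test above is the kernel both reduce to.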

Braking Distance Computation

Classical physics determines the truck's stopping distance:

stopping_distance = (v^2) / (2 * mu * g) + v * reaction_time

where:

  • v = speed (m/s)
  • mu = tire-road friction coefficient (0.7-0.9 dry, 0.3-0.5 wet, 0.1-0.3 ice)
  • g = gravitational acceleration (9.81 m/s^2)
  • reaction_time = system response time (~0.1-0.3s for autonomous)

For a loaded 80,000-pound truck at 65 mph on dry pavement:

  • Braking deceleration: ~5 m/s^2 (loaded truck)
  • Stopping distance: ~85m (280 feet) from braking initiation
  • Plus reaction distance: ~9m (30 feet)
  • Total: ~94m (310 feet)

At 65 mph on wet pavement:

  • Braking deceleration: ~3 m/s^2
  • Braking distance: ~140m (460 feet)
  • Plus reaction distance: ~9m
  • Total stopping distance: ~149m (490 feet)

These classical calculations directly inform the following distance the planner maintains and the trigger distance for emergency braking.
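The two worked examples can be reproduced with a short sketch. It uses the achievable deceleration directly rather than mu*g, reflecting that a loaded truck's brakes typically cannot exploit the full tire friction limit; the 0.3 s reaction time is the upper end of the range quoted above.

```python
MPS_PER_MPH = 0.44704  # exact conversion factor

def stopping_distance(speed_mph, decel_mps2, reaction_time_s=0.3):
    """Total stopping distance = reaction distance + braking distance:
    v * t_reaction + v^2 / (2 * a)."""
    v = speed_mph * MPS_PER_MPH
    return v * reaction_time_s + v ** 2 / (2.0 * decel_mps2)

print(round(stopping_distance(65, 5.0)))  # 93 m (dry, loaded truck)
print(round(stopping_distance(65, 3.0)))  # 149 m (wet)
```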

Overhead Clearance Checking

The 4D radar's elevation measurement enables classical overhead clearance checking:

  1. The radar detects an object ahead (bridge, sign gantry, overpass)
  2. The elevation measurement places the object above the road surface
  3. Classical geometry: if the object's elevation exceeds the truck height plus a safety margin, the object is classified as an overhead structure and the path beneath it is clear
  4. If the clearance is insufficient, the system alerts or stops

This resolves a class of false-positive problems that plague 3D radar: overhead structures that appear as road-level obstacles because 3D radar cannot measure elevation.
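The clearance test itself is a one-line geometric comparison. The sketch below assumes the ~4.1 m truck height from the dimensions table; the 0.3 m safety margin and the function name are illustrative, not Kodiak's actual values.

```python
def overhead_clearance_ok(object_elevation_m, truck_height_m=4.1, margin_m=0.3):
    """Classify a 4D-radar return with measured elevation: True means the
    object sits high enough above the road to be an overhead structure the
    truck can pass under; False means insufficient clearance (alert or stop)."""
    return object_elevation_m > truck_height_m + margin_m

print(overhead_clearance_ok(5.5))  # True: a bridge deck at 5.5 m clears the truck
print(overhead_clearance_ok(4.2))  # False: insufficient clearance
```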


Summary: Classical vs. ML Workload Distribution

The Kodiak perception stack is a deeply hybrid system where classical methods and ML methods are tightly integrated. The following table summarizes the division:

Processing Layer                                    | Classical / ML / Hybrid
Sensor signal processing (LiDAR, radar, camera ISP) | Entirely classical
Calibration (factory and online)                    | Entirely classical
Ground plane estimation                             | Primarily classical
Lane detection                                      | Hybrid (classical preprocessing + ML segmentation + classical fitting)
Object detection                                    | Primarily ML
Object tracking (Kodiak Vision Tracker)             | Primarily classical (information-theoretic Bayesian estimation)
Data association                                    | Entirely classical
Track prediction                                    | Primarily classical (Kalman prediction)
Multi-sensor fusion                                 | Primarily classical (Bayesian fusion)
Road model estimation                               | Primarily classical
Construction zone geometry                          | Hybrid (ML detection + classical clustering/fitting)
Safety monitoring (1,000+ checks)                   | Entirely classical (rule-based)
ACE fallback                                        | Entirely classical (ASIL-D certified)
Collision checking                                  | Entirely classical (geometry + kinematics)
VLM scene understanding                             | Entirely ML
Sparse map generation                               | Hybrid (ML lane detection + classical aggregation)
GPS-denied navigation                               | Primarily classical (INS, VO, SLAM)
Off-road traversability                             | Hybrid (classical geometry + ML classification)

The safety-critical components (ACE, safety monitoring, collision checking, calibration) are entirely classical because they must be deterministically verifiable. The perception components (detection, tracking, fusion) use classical mathematical frameworks (information theory, Bayesian estimation) as the backbone, with ML models providing inputs that the classical framework integrates and validates. The most ML-heavy components (VLMs, deep learning detectors) augment rather than replace the classical foundation.

This architecture -- classical foundations augmented by ML -- is deliberate. As Kodiak has stated, the system is built on "fundamental mathematics instead of clever software tricks," allowing it to incorporate advances in deep learning "without rewriting our codebase."


Sources


Public research notes collected from public sources.