Kodiak AI -- Non-ML and Hybrid-ML Perception Techniques
Exhaustive Technical Analysis of Classical Computer Vision, Signal Processing, and Hybrid Methods in the Kodiak Driver Perception Stack
March 2026
Table of Contents
Sensor Signal Processing
- Luminar Iris LiDAR Signal Processing
- Hesai 360-Degree LiDAR Signal Processing
- ZF 4D Imaging Radar Signal Processing
- Camera ISP Pipeline
Calibration
Lightweight Map Classical Processing
Highway-Specific Classical Methods
State Estimation and Tracking
Dual Compute Classical Workload
- NVIDIA DRIVE Orin Classical Processing
- Ambarella CV3-AD685 Classical Processing
- NXP ACE Safety Computer
Military/Defense Perception
Safety Classical Components
1. Luminar Iris LiDAR Signal Processing
The 1550nm InGaAs Detector Signal Chain
The Luminar Iris employs a 1550nm fiber laser source paired with an Indium Gallium Arsenide (InGaAs) photodetector -- a compound semiconductor material with a bandgap specifically tuned to absorb photons in the short-wave infrared (SWIR) range around 1550nm. The signal chain from laser emission to point cloud output is entirely classical signal processing:
Laser Emission: Two fiber laser tips emit at 1550nm. The 1550nm wavelength remains eye-safe at roughly 1,000,000x the pulse energy of 905nm systems, because the cornea and lens absorb 1550nm light before it reaches the retina. This permits dramatically higher laser power, which directly determines maximum detection range.
Scanning: A 2-axis scanning mirror system steers the laser beam across a 120-degree horizontal by up to 28-degree vertical field of view. Critically, only the laser beam is scanned -- not the entire device -- allowing software-configurable scan patterns including both uniform and Gaussian density distributions. The scan pattern can be dynamically adjusted: denser at the horizon (where most objects of interest exist) and sparser at the top and bottom of the FOV.
Receiver and ASIC Processing: The returned photons strike the InGaAs photodetector, which converts them to electrical current via the photoelectric effect. This analog signal feeds into Luminar's custom-designed mixed-signal ASIC -- described by Luminar as "the most sensitive, highest dynamic range InGaAs receiver of its kind in the world." The ASIC performs:
| Processing Stage | Classical Method | Purpose |
|---|---|---|
| Transimpedance amplification | Analog current-to-voltage conversion | Convert photodetector current to measurable voltage |
| Analog filtering | Bandpass filtering | Reject out-of-band noise |
| Time-of-flight measurement | Threshold detection / waveform analysis | Determine round-trip time to compute range |
| Multi-return detection | Peak detection on return waveform | Identify multiple reflecting surfaces per pulse |
| Reflectivity extraction | Amplitude measurement | Measure returned energy to compute per-point reflectivity |
| Noise rejection | Signal-to-noise thresholding | Reject returns below detection threshold |
The ASIC acquires and processes gigabits of data per second, converting raw analog returns into digital point measurements. Luminar specifically chose a custom mixed-signal ASIC over off-the-shelf ADC chips for "better performance for significantly less cost."
Multi-Return Processing for Semi-Transparent Objects
Multi-return capability is a purely classical signal processing technique where the receiver analyzes the full return waveform from a single laser pulse rather than just the first or strongest peak. When a pulse hits a semi-transparent object (rain, fog, dust, vegetation, glass, chain-link fencing), part of the energy reflects back while the remaining energy continues to a solid surface behind it. The ASIC detects multiple peaks in the return waveform:
- First return: The nearest reflecting surface (e.g., raindrops, fog particles, tree canopy)
- Strongest return: The surface reflecting the most energy (typically the primary target)
- Last return: The most distant reflecting surface (e.g., the ground behind vegetation)
This is critical for highway trucking perception because:
- Rain and fog produce many first-returns from water droplets; multi-return processing lets the system "see through" precipitation to the solid objects behind
- Semi-transparent objects like chain-link fences, highway guardrails with gaps, and construction netting are resolved as partial occluders rather than solid walls
- Dust clouds in the Permian Basin oilfield operations scatter the laser; multi-return processing recovers the target behind the dust
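The peak-extraction step can be illustrated with a short sketch. This is a minimal, hypothetical reconstruction using scipy -- the sample rate, thresholds, and pulse shapes are assumptions for illustration, not Luminar ASIC parameters (the real ASIC does this in mixed-signal hardware):

```python
import numpy as np
from scipy.signal import find_peaks

def extract_returns(waveform, dt_ns=0.5, noise_floor=None):
    """Find first/strongest/last returns in a digitized LiDAR return waveform.
    Illustrative sketch: the sample period, 5x-noise height threshold, and
    2 ns minimum peak separation are assumed values."""
    if noise_floor is None:
        noise_floor = np.median(np.abs(waveform))       # robust noise estimate
    peaks, props = find_peaks(waveform, height=5 * noise_floor,
                              distance=max(1, int(2.0 / dt_ns)))
    if len(peaks) == 0:
        return None
    c = 0.299792458                                     # speed of light, m/ns
    ranges = peaks * dt_ns * c / 2.0                    # round-trip time -> range
    amplitudes = props["peak_heights"]
    return {
        "first": (ranges[0], amplitudes[0]),
        "strongest": (ranges[np.argmax(amplitudes)], amplitudes.max()),
        "last": (ranges[-1], amplitudes[-1]),
    }

# Synthetic waveform: weak, broad fog return at ~20 m; strong solid target at ~80 m
t = np.arange(0, 1400) * 0.5                            # time axis, ns
wf = (0.3 * np.exp(-((t - 133) ** 2) / 50)              # fog return
      + 1.0 * np.exp(-((t - 534) ** 2) / 8))            # solid return
wf += 0.02 * np.abs(np.random.default_rng(0).normal(size=t.size))
print(extract_returns(wf, dt_ns=0.5))
```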
Per-Point Reflectivity Processing
Beyond (x, y, z) position, each point carries a reflectivity value -- a measurement of how much laser energy was reflected back to the detector. This is classical radiometric processing: the ASIC measures the amplitude of the return pulse relative to the transmitted pulse energy, accounting for the inverse-square range attenuation. The result is a camera-like reflectance image with >300 points per square degree resolution.
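A minimal sketch of the inverse-square range compensation, with an assumed reference range and none of the optical/detector-gain calibration terms a real sensor would apply:

```python
import numpy as np

def range_compensated_reflectivity(amplitude, range_m, r_ref=100.0):
    """Scale raw return amplitude by (r/r_ref)^2 so a surface of a given
    reflectance yields the same value at any range. Illustrative only."""
    return amplitude * (np.asarray(range_m) / r_ref) ** 2

# A retroreflective sign at 200 m vs. asphalt at 20 m
print(range_compensated_reflectivity(np.array([0.5, 0.8]), np.array([200.0, 20.0])))
```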
Reflectivity provides texture-like information that aids classical and ML-based classification:
- Retroreflective road signs and lane markings appear very bright
- Dark asphalt and dark-colored vehicles appear dim
- Rubber tire treads on the road have different reflectivity than asphalt
- The reflectivity contrast between lane markings and road surface enables classical lane detection from LiDAR data alone
600m Range Processing
At maximum range (600m), the number of returned photons drops dramatically due to inverse-square law attenuation. The ASIC must perform extremely sensitive signal detection at the single-photon level. Classical techniques applied at long range include:
- Matched filtering: Correlating the return waveform with the known transmitted pulse shape to maximize SNR
- Temporal averaging: Accumulating returns from multiple pulses at the same scan angle to build up signal
- Adaptive thresholding: Lowering detection thresholds at long range (accepting higher false alarm rate) and relying on temporal consistency (multiple scans confirming the same return) to validate detections
At 600m, a sedan may produce only a handful of LiDAR points. The signal processing chain must reliably distinguish these sparse, weak returns from noise floor fluctuations.
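A matched filter is straightforward to sketch in software, though the real receiver implements it in the ASIC. The pulse shape, noise level, and return amplitude below are assumptions:

```python
import numpy as np

def matched_filter(waveform, pulse):
    """Correlate the received waveform with the known transmitted pulse.
    The matched filter (time-reversed pulse) is the linear filter that
    maximizes SNR for a known pulse in white noise."""
    return np.convolve(waveform, pulse[::-1], mode="same")

rng = np.random.default_rng(1)
pulse = np.exp(-((np.arange(20) - 10) ** 2) / 8.0)   # assumed transmit pulse shape
wf = rng.normal(0, 1.0, 2000)                        # noise floor
wf[1500:1520] += 3.0 * pulse                         # weak long-range return
out = matched_filter(wf, pulse)
print("peak at sample", int(np.argmax(out)))         # near 1510: pulse energy is
                                                     # concentrated well above noise
```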
2. Hesai 360-Degree LiDAR Signal Processing
Mechanical Spinning LiDAR Signal Chain
The Hesai LiDARs (likely OT128 or similar 128-channel automotive-grade units) on the Kodiak trucks use a fundamentally different architecture from the Luminar Iris. These are spinning mechanical LiDARs: the entire optical assembly rotates 360 degrees to provide surround coverage.
Signal Processing Pipeline:
Laser firing sequence: 128 laser diodes fire in a precisely timed sequence as the optical assembly rotates. Each channel has a fixed vertical angle offset, collectively spanning 40 degrees vertically. The firing timing is synchronized to the rotation encoder, ensuring consistent angular sampling.
Time-of-flight measurement: Each channel has a dedicated receiver that measures the round-trip time of the laser pulse. At 905nm (Hesai's standard wavelength), silicon APD (Avalanche Photodiode) detectors convert photons to electrical signal.
Dual-return processing: The Hesai OT128 supports dual-return mode, effectively doubling point cloud density from 3.45 million points/second (single return) to 6.91 million points/second (dual return). This is the same classical multi-return waveform analysis as described for Luminar, detecting multiple peaks in each return pulse.
Point cloud assembly: Raw timing and angle measurements are converted to Cartesian (x, y, z) coordinates using the calibrated angular offsets and measured range. This is classical coordinate transformation: each point's position is computed from (range, azimuth_angle, elevation_angle) in the sensor frame.
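The coordinate transformation can be written compactly. The angle conventions below are assumptions -- actual per-channel offsets come from the factory calibration table:

```python
import numpy as np

def spherical_to_cartesian(r, azimuth, elevation):
    """Convert (range, azimuth, elevation) in the sensor frame to (x, y, z).
    Convention assumed here: azimuth about +z from +x, elevation above the
    xy-plane; real sensor conventions vary per datasheet."""
    cos_el = np.cos(elevation)
    x = r * cos_el * np.cos(azimuth)
    y = r * cos_el * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)

# One firing of a 128-channel LiDAR at a single encoder angle
ranges = np.full(128, 50.0)                              # meters
elevations = np.radians(np.linspace(-20.0, 20.0, 128))   # 40-degree vertical span
azimuth = np.radians(90.0)                               # rotation encoder angle
print(spherical_to_cartesian(ranges, azimuth, elevations)[:2])
```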
Hesai Intelligent Point Cloud Engine (IPE)
Hesai's proprietary IPE is a significant classical signal processing subsystem embedded in the LiDAR sensor itself. The IPE operates at the firmware level, processing 24.6 billion samples per second, and performs:
| IPE Function | Classical Method | Performance |
|---|---|---|
| Rain detection | Per-return waveform classification | Real-time pixel-level flagging |
| Fog detection | Return pattern analysis (diffuse vs. specular) | Pixel-level classification |
| Exhaust fume detection | Multi-return temporal analysis | Real-time marking |
| Water splash detection | Spatial clustering of transient returns | Per-frame flagging |
| Environmental noise filtering | Statistical outlier rejection | >99.9% noise rejection in adverse weather |
The IPE decodes laser return waveforms with nanosecond-level precision to distinguish environmental noise (rain, fog, dust) from actual solid objects. This is critical for Kodiak's trucking operations because:
- Following another truck generates constant road spray in wet conditions
- Permian Basin dust is persistent and dense
- The IPE filters these before the data reaches Kodiak's software stack, reducing computational load
The noise filtering is entirely classical: statistical analysis of return waveform characteristics (pulse width, amplitude, temporal profile) to classify each return as "solid object" vs. "environmental noise." Fog returns have characteristically broader, weaker waveforms than solid object returns. Rain returns appear as randomly distributed, transient single-frame detections that do not persist across consecutive scans.
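The flavor of this classification can be sketched as a toy decision rule. The thresholds are invented for illustration and bear no relation to Hesai's firmware values:

```python
import numpy as np

def classify_return(pulse_width_ns, amplitude, persisted_scans):
    """Toy per-return classifier in the spirit of waveform-based noise
    filtering. All thresholds are hypothetical, not Hesai IPE values."""
    if pulse_width_ns > 10.0 and amplitude < 0.2:
        return "fog"                      # characteristically broad, weak return
    if persisted_scans < 2:
        return "rain/splash"              # transient, single-frame detection
    return "solid object"

print(classify_return(pulse_width_ns=14.0, amplitude=0.1, persisted_scans=1))
print(classify_return(pulse_width_ns=4.0, amplitude=0.7, persisted_scans=5))
```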
Environmental Validation
The Hesai OT128 undergoes classical reliability testing:
- UV aging
- IP6K7 and IP6K9K dust/water ingress ratings
- Mechanical shock and vibration (critical for truck-mounted operation)
- Thermal cycling with water splash
- Salt spray exposure
- Low-temperature cold start
These tests validate that the classical signal processing chain remains calibrated and functional across the full operating envelope of highway trucking.
3. ZF 4D Imaging Radar Signal Processing
FMCW Signal Generation and Reception
The ZF Full Range 4D Radar operates at 77 GHz using Frequency Modulated Continuous Wave (FMCW) modulation. The entire radar signal processing chain from RF emission to point cloud output is classical signal processing -- no ML is involved until the radar detections enter Kodiak's perception software.
Chirp Generation: The radar transmits a "chirp" -- a signal whose frequency increases linearly over time (a ramp). Each chirp sweeps across a bandwidth of several GHz (typically 1-4 GHz for 77 GHz automotive radar), with the sweep rate and bandwidth determining range resolution:
Range resolution = c / (2 * bandwidth)
For a 4 GHz bandwidth: range resolution = 3.75 cm.
Beat Signal Processing: When the transmitted chirp reflects off an object and returns, it is mixed with the currently transmitting chirp. Because the transmit frequency has changed during the round-trip time, the mixed signal produces a "beat frequency" that is directly proportional to the object's range:
f_beat = (2 * range * sweep_rate) / c
This mixing is an entirely analog, classical operation performed in the radar's RF front-end.
MIMO Virtual Array Processing
The ZF radar achieves its 192-channel count through cascaded MIMO (Multiple-Input Multiple-Output) operation using 4 monolithic microwave integrated circuits (MMICs), each containing multiple transmit (TX) and receive (RX) channels:
| Parameter | Value |
|---|---|
| MMICs | 4 cascaded |
| TX channels per MMIC | 3 |
| RX channels per MMIC | 4 |
| Physical channels | 12 TX + 16 RX = 28 |
| Virtual channels | 12 x 16 = 192 |
| Resolution improvement | 16x over conventional 12-channel radar |
Virtual Array Formation: MIMO radar transmits orthogonal waveforms from each TX antenna (using time-division, frequency-division, or Doppler-division multiplexing) and receives on all RX antennas simultaneously. Each TX-RX pair creates a "virtual" antenna element. With 12 TX and 16 RX antennas, the virtual array has 192 elements arranged in a 2D grid that provides angular resolution in both azimuth and elevation.
Cascaded MMIC Synchronization: The master MMIC generates the local oscillator (LO) signal and distributes it to the three slave MMICs via a daisy-chain approach. This ensures phase coherence across all 192 virtual channels, which is essential for accurate beamforming. Any phase mismatch between MMICs would distort the angular measurements.
Digital Beamforming
After the beat signals from all 192 virtual channels are digitized, classical digital beamforming extracts angular information:
Range FFT: A Fast Fourier Transform converts each channel's time-domain beat signal into the frequency domain, where each frequency bin corresponds to a specific range. This produces a range profile for each virtual channel.
Doppler FFT: A second FFT across consecutive chirps within a frame extracts velocity (Doppler) information. The phase change between consecutive chirps from the same target encodes the target's radial velocity:
v_radial = (lambda * f_doppler) / 2
At 77 GHz (lambda = 3.9mm), even small velocity differences produce measurable Doppler shifts.
Range-Doppler Map: The 2D FFT produces a Range-Doppler map for each virtual channel -- a 2D matrix where each cell contains the complex amplitude of any target at that (range, velocity) combination.
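The range and Doppler FFT stages can be demonstrated end-to-end on a synthetic single-channel frame. All parameters below (sample rate, chirp slope, timing) are illustrative, not ZF Full Range values:

```python
import numpy as np

# One virtual channel of a simulated FMCW frame: 128 chirps x 256 samples
n_chirps, n_samples = 128, 256
fs = 10e6                  # complex ADC sample rate (Hz)
slope = 30e12              # chirp slope (Hz/s)
fc, c = 77e9, 3e8          # carrier frequency, speed of light
T_chirp = 40e-6            # chirp repetition interval (s)

target_range, target_vel = 40.0, -12.0        # meters, m/s (closing)
f_beat = 2 * target_range * slope / c         # range -> beat frequency (8 MHz)
f_dopp = 2 * target_vel * fc / c              # velocity -> Doppler (-6.16 kHz)

t = np.arange(n_samples) / fs
m = np.arange(n_chirps)[:, None]
frame = np.exp(2j * np.pi * (f_beat * t[None, :] + f_dopp * T_chirp * m))

range_fft = np.fft.fft(frame, axis=1)                            # range FFT per chirp
rd_map = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)  # Doppler FFT across chirps

d_bin, r_bin = np.unravel_index(np.argmax(np.abs(rd_map)), rd_map.shape)
est_range = r_bin * (fs / n_samples) * c / (2 * slope)
est_vel = (d_bin - n_chirps // 2) / (n_chirps * T_chirp) * (c / fc) / 2
print(f"range ~{est_range:.1f} m, radial velocity ~{est_vel:.1f} m/s")
```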
Angle FFT / DOA Estimation: A third FFT (or more sophisticated Direction-of-Arrival algorithm) across the virtual array elements extracts azimuth and elevation angles. The phase difference between virtual array elements encodes the angle of arrival:
theta = arcsin(lambda * phase_difference / (2 * pi * d))
where d is the virtual array element spacing.
For the 2D virtual array, this produces both azimuth and elevation angles -- the fourth dimension that distinguishes 4D imaging radar from conventional 3D radar. More sophisticated super-resolution DOA algorithms (MUSIC, ESPRIT, Capon beamforming) may be used to exceed the diffraction-limited angular resolution of the physical aperture.
CFAR Detection
After beamforming, the Range-Doppler-Angle data cube contains energy at each (range, velocity, azimuth, elevation) cell. The Constant False Alarm Rate (CFAR) detector determines which cells contain actual targets versus noise:
CFAR Algorithm: For each cell under test (CUT), the algorithm estimates the local noise power from surrounding reference cells and sets a detection threshold as a multiple of that noise estimate:
threshold = alpha * noise_estimate
where alpha is chosen to maintain a desired false alarm probability.
Common CFAR variants used in automotive radar:
| CFAR Variant | Method | Strengths | Weaknesses |
|---|---|---|---|
| CA-CFAR (Cell Averaging) | Averages reference cells | Optimal in uniform noise | Fails near strong targets (masking) |
| OS-CFAR (Ordered Statistic) | Uses kth-ordered reference cell | Robust to interfering targets | Higher computational cost |
| GO-CFAR (Greatest Of) | Takes max of leading/trailing estimates | Good at clutter edges | Conservative (misses weak targets) |
| SO-CFAR (Smallest Of) | Takes min of leading/trailing estimates | Sensitive (finds weak targets) | Higher false alarm rate at edges |
The ZF radar likely uses a hybrid or adaptive CFAR strategy that switches between variants depending on the local environment. In highway scenarios, strong returns from guardrails, bridge abutments, and large trucks can mask weaker returns from smaller vehicles -- a classical radar problem that OS-CFAR partially addresses.
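A textbook CA-CFAR over a 1-D power profile (e.g., one Doppler row of a range-Doppler map), with illustrative training/guard sizes and false-alarm rate:

```python
import numpy as np

def ca_cfar(power, n_train=16, n_guard=4, pfa=1e-4):
    """1-D cell-averaging CFAR. Classical textbook form; all parameters are
    illustrative, not ZF's."""
    n = len(power)
    n_ref = 2 * n_train
    # Threshold multiplier for CA-CFAR with exponentially distributed noise power
    alpha = n_ref * (pfa ** (-1.0 / n_ref) - 1.0)
    detections = []
    for cut in range(n_train + n_guard, n - n_train - n_guard):
        lead = power[cut - n_guard - n_train : cut - n_guard]
        trail = power[cut + n_guard + 1 : cut + n_guard + 1 + n_train]
        noise = (lead.sum() + trail.sum()) / n_ref      # local noise estimate
        if power[cut] > alpha * noise:
            detections.append(cut)
    return detections

rng = np.random.default_rng(2)
profile = rng.exponential(1.0, 512)      # noise power, exponential per bin
profile[200] += 40.0                     # strong target
profile[340] += 15.0                     # weaker target
print(ca_cfar(profile))                  # expect detections near bins 200 and 340
```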
Doppler Velocity Extraction
The Doppler velocity measurement is perhaps the most valuable classical output of the radar for Kodiak's perception stack. Unlike LiDAR and cameras, radar provides instantaneous radial velocity for every detection:
- Velocity accuracy: Typically 0.1-0.3 m/s for automotive radar
- Velocity range: The maximum unambiguous velocity is determined by chirp repetition rate
- Velocity resolution: Determined by the coherent processing interval (number of chirps per frame)
The Doppler velocity is the radial component (toward/away from the radar). For targets moving laterally (crossing traffic, merging vehicles), the radial velocity is a fraction of the true velocity, requiring geometric correction based on the target's angular position.
Stationary vs. Moving Target Discrimination: By comparing the measured Doppler velocity to the ego-vehicle's velocity (from the IMU/GPS), the radar classifies every detection as stationary or moving. Stationary detections (guardrails, signs, bridges) have a Doppler velocity equal to the ego-vehicle's speed projected along the line of sight. This classical ego-motion compensation is performed entirely in the radar's signal processing chain.
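A minimal sketch of that discrimination, using one possible sign convention (Doppler negative for closing targets) and an assumed noise tolerance:

```python
import numpy as np

def is_stationary(v_doppler, theta, v_ego, tol=0.4):
    """Classify a radar detection as stationary by comparing measured Doppler
    to the ego-velocity projected along the line of sight at azimuth theta.
    Convention: closing targets have negative Doppler; tol (m/s) is assumed."""
    expected = -v_ego * np.cos(theta)    # a stationary point closes at v_ego*cos(theta)
    return abs(v_doppler - expected) < tol

v_ego = 29.0                             # ~65 mph in m/s
print(is_stationary(-28.8, np.radians(5.0), v_ego))   # guardrail post: True
print(is_stationary(-3.0,  np.radians(2.0), v_ego))   # slow lead vehicle: False
```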
4. Camera ISP Pipeline
Image Signal Processing Chain
All 12 cameras on the Gen 6 Kodiak truck route their raw sensor data through the Image Signal Processing (ISP) pipeline on the Ambarella CV3-AD685 SoC. The ISP is a cascade of classical image processing algorithms that convert the raw Bayer-pattern sensor readout into a clean, color-calibrated image suitable for both ML inference and classical computer vision:
| ISP Stage | Classical Algorithm | Purpose |
|---|---|---|
| Black level subtraction | Per-pixel offset correction | Remove sensor dark current |
| Defective pixel correction | Neighbor interpolation | Replace known bad pixels |
| Lens shading correction | Radial gain map | Compensate for lens vignetting |
| Demosaicing | Bayer interpolation (edge-directed) | Convert Bayer RGGB to full RGB |
| White balance | Gray-world or illuminant estimation | Correct for lighting color temperature |
| Color correction | 3x3 matrix multiplication | Map sensor color space to sRGB |
| Noise reduction | Spatial and temporal NLM filtering | Reduce sensor noise |
| HDR tone mapping | Local/global tone mapping operators | Compress dynamic range for display/inference |
| Gamma correction | Power-law transfer function | Perceptual linearization |
| Sharpening | Unsharp mask or edge enhancement | Recover detail lost in noise reduction |
| Distortion correction | Polynomial radial/tangential model | Remove lens geometric distortion |
HDR Processing for Highway Conditions
Highway driving presents extreme dynamic range challenges that are handled entirely by classical ISP processing:
Tunnel transitions: When a truck enters a tunnel on a bright day, the scene dynamic range can exceed 120 dB -- the bright highway outside the tunnel entrance versus the dim tunnel interior. Standard cameras capture only ~60-70 dB. The Ambarella ISP addresses this through:
- Multi-exposure fusion: The camera sensor captures multiple exposures per frame (short, medium, long). The ISP fuses these into a single HDR image where both bright and dark regions retain detail.
- Local tone mapping: Rather than applying a global brightness curve, the ISP adjusts brightness locally -- keeping the bright sky properly exposed while boosting the dark tunnel interior. This is a classical operation using bilateral filtering or similar edge-preserving smoothing to compute local brightness estimates.
- Temporal smoothing: The ISP applies temporal filtering to avoid brightness flickering during rapid transitions, ensuring smooth exposure adaptation.
Sun glare: Direct sunlight at low angles (common on east-west Texas highways at sunrise/sunset) produces intense saturation in portions of the image. The ISP's HDR pipeline prevents the entire image from being overwhelmed by:
- Short-exposure frames capture the bright glare region without saturation
- Long-exposure frames capture the rest of the scene with proper brightness
- The fused HDR image preserves detail everywhere
Night driving: The ISP's enhanced dynamic range processing handles the contrast between bright headlights/taillights (which can saturate the sensor) and the near-zero ambient illumination on the road surface. Classical temporal noise reduction accumulates signal across frames to reduce sensor noise in dark regions.
Classical Camera Processing Operations
Beyond the ISP, several classical computer vision operations run on the camera data before ML inference:
- Image undistortion: Removing lens distortion using calibrated intrinsic parameters (polynomial distortion model)
- Image rectification: Aligning stereo camera pairs for depth estimation
- Image cropping and resizing: Preparing regions of interest for ML detectors
- Color space conversion: Converting from YUV (the ISP's output format) to RGB (network input)
- Image pyramid generation: Creating multi-scale representations for scale-invariant detection
These operations are entirely classical and consume significant compute resources, which is why they are often offloaded to dedicated hardware (the PVA on NVIDIA Orin, or the ISP on Ambarella).
5. SensorPod Calibration Architecture
Patent-Pending Pre-Calibration System
The SensorPod's pre-calibration system is a critical piece of classical calibration engineering. Every SensorPod ships from the factory with all sensor extrinsic parameters (relative positions and orientations) precisely measured and stored. This eliminates the need for field calibration after SensorPod replacement.
Factory Calibration Process (inferred from patent filings and public descriptions):
Intrinsic calibration: Each individual sensor (camera, LiDAR, radar) is calibrated for its internal parameters:
- Cameras: focal length, principal point, distortion coefficients (using checkerboard/ChArUco targets)
- LiDAR: channel-by-channel angular offsets, range bias, timing offsets
- Radar: antenna phase center locations, MMIC-to-MMIC phase offsets
Intra-pod extrinsic calibration: The relative 6-DOF pose (3 translation + 3 rotation) between every sensor pair within the pod is measured:
- Luminar-to-Hesai: Measured using overlapping FOV regions with common targets
- LiDAR-to-camera: Target-based calibration using calibration boards visible to both modalities
- Radar-to-LiDAR: Using corner reflectors visible to both sensors
Pod-to-vehicle extrinsic calibration: Because SensorPods mount to standardized mirror brackets, the pod-to-vehicle transformation is defined by the mechanical mounting interface plus a small per-installation adjustment.
Thermal Stability Design
Long-haul trucking subjects sensors to severe thermal cycling:
- Engine heat radiating upward near the cab
- Desert ambient temperatures exceeding 50 degrees C (122 degrees F)
- Night temperatures dropping below 0 degrees C (32 degrees F) in winter
- Rapid thermal transients when driving between sun and shade
Thermal expansion can shift sensor alignments by millimeters -- enough to cause significant calibration drift. Kodiak addresses this through:
- Material selection: The SensorPod enclosure and internal mounting structures use materials with matched or low coefficients of thermal expansion
- Thermal isolation: Critical optical components are thermally isolated from the external enclosure
- Compensation models: Classical thermal models predict calibration shift as a function of temperature and apply corrections
Vibration Resistance
Highway trucking generates persistent vibration from:
- Engine and drivetrain vibration transmitted through the cab structure
- Road surface irregularities (expansion joints, potholes, rough pavement)
- Aerodynamic buffeting at highway speeds (60-80 mph)
- Trailer coupling dynamics
Kodiak's SensorPod design addresses vibration through:
- Aerospace/military-grade connectors: The main power/data connector is rated for aviation, aerospace, and military applications -- designed to maintain connection integrity under sustained vibration
- Rigid internal mounting: Sensors are rigidly mounted to minimize relative movement between sensors within the pod
- Vibration testing: SensorPods are validated through standardized vibration profiles that simulate worst-case trucking conditions
Three-Bolt Mounting Precision
The mechanical interface between the SensorPod and the truck uses exactly three bolts. Three bolts define a plane and provide a repeatable, deterministic mounting that ensures the SensorPod returns to the same position and orientation every time it is installed. This mechanical repeatability is what makes pre-calibration viable -- the pod-to-vehicle transformation is stable across installations.
6. LiDAR-Camera Cross-Calibration
Extrinsic Calibration Methods
Cross-calibrating LiDAR and cameras requires determining the 6-DOF rigid transformation (rotation matrix R and translation vector t) between the LiDAR coordinate frame and each camera coordinate frame. For Kodiak's SensorPods, this calibration must be accurate to sub-centimeter translation and sub-0.1-degree rotation to ensure that LiDAR points project correctly onto camera images.
Target-Based Calibration (Factory):
The standard factory approach uses calibration targets visible to both sensors:
Checkerboard / ChArUco boards: The camera detects corner features using classical corner detection (Harris, Shi-Tomasi). The LiDAR detects the board as a planar surface. The transformation that best aligns the LiDAR plane normal and centroid with the camera-detected board pose is computed via:
- Point-to-plane correspondence establishment
- Least-squares optimization of the extrinsic parameters
- Reprojection error minimization
Multiple board poses: The calibration target is placed at many different positions and orientations (typically 20-50 poses) to ensure observability of all 6 DOF parameters and reduce sensitivity to noise.
Nonlinear refinement: After initial estimation (e.g., via PnP algorithm for camera pose, RANSAC plane fitting for LiDAR), a joint nonlinear optimization (Levenberg-Marquardt) refines all parameters simultaneously to minimize the total reprojection error across all poses and all sensor pairs.
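The quantity being minimized is easy to state in code. A minimal reprojection-error sketch with toy intrinsics and extrinsics -- a full calibration would wrap this in a nonlinear optimizer such as scipy.optimize.least_squares over (R, t):

```python
import numpy as np

def reprojection_error(points_lidar, pixels, R, t, K):
    """Mean pixel error after projecting LiDAR points into a camera with the
    standard pinhole model. All matrices here are toy values."""
    p_cam = (R @ points_lidar.T).T + t            # LiDAR frame -> camera frame
    uvw = (K @ p_cam.T).T                         # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]                 # perspective divide
    return np.linalg.norm(uv - pixels, axis=1).mean()

K = np.array([[1200.0, 0, 960], [0, 1200.0, 540], [0, 0, 1]])  # assumed intrinsics
R, t = np.eye(3), np.array([0.05, 0.0, 0.0])     # assumed extrinsics (5 cm offset)
pts = np.array([[2.0, 0.5, 20.0], [-1.0, 0.2, 35.0]])          # board corners, LiDAR frame
pix = (K @ ((R @ pts.T).T + t).T).T
pix = pix[:, :2] / pix[:, 2:3]                   # synthetic ground-truth pixels
print(reprojection_error(pts, pix, R, t, K))     # ~0 for perfect calibration
```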
Targetless Verification (On-Road):
Once deployed, the system can verify calibration quality without dedicated targets by exploiting natural scene features:
- Edge alignment: Comparing LiDAR depth discontinuities (object edges) with camera image edges. Well-calibrated systems show tight alignment; miscalibration manifests as systematic shifts.
- Mutual information: Computing the statistical dependence between LiDAR reflectivity and camera intensity. Correct calibration maximizes mutual information because the same surface produces correlated reflectivity and intensity values.
- Feature correspondence: Classical feature detectors (SIFT, ORB) on camera images matched with projected LiDAR reflectivity images.
Luminar-to-Camera Calibration
The Luminar Iris produces a particularly high-resolution reflectivity image (>300 points per square degree) that closely resembles a camera image. This makes Luminar-to-camera calibration especially robust because:
- The reflectivity image provides rich texture for feature matching
- Road markings, signs, and vehicle surfaces create strong features in both modalities
- The high angular resolution minimizes quantization effects in point-to-pixel correspondence
Hesai-to-Camera Calibration
The Hesai spinning LiDARs have lower angular resolution than the Luminar Iris but provide 360-degree coverage. Calibrating Hesai to the side-facing and rear-facing cameras requires:
- Coverage overlap between the LiDAR and camera FOVs
- Careful handling of the motion distortion inherent in spinning LiDARs (the LiDAR scans over ~100ms per rotation, during which the truck moves ~3m at highway speed)
- Ego-motion compensation to align the LiDAR points to the camera exposure timestamp
7. Radar Calibration
ZF 4D Radar Alignment
Radar calibration is distinctly different from LiDAR or camera calibration because radar operates on fundamentally different physics (RF reflection rather than optical). The key calibration parameters for the ZF 4D radar are:
Mounting Angle Estimation: The most critical radar calibration parameter. Even a 1-degree azimuth misalignment causes a lateral position error of 5.2m at 300m range. Mounting angle calibration methods include:
Static alignment: Performed in a controlled environment using known radar targets (corner reflectors) at calibrated positions. The radar's measured angles are compared to the known target angles, and the mounting offset is computed.
Dynamic alignment (motion-based): While the vehicle drives, stationary objects (guardrails, signs, buildings) should have a Doppler velocity exactly equal to the ego-vehicle's velocity projected along the line of sight. Any systematic angular offset between the predicted and measured Doppler velocities reveals the mounting angle error. This is a classical least-squares estimation:
mounting_angle = argmin sum((v_doppler_measured - v_ego * cos(theta_measured + mounting_angle))^2)
Track alignment: Using extended driving data, the radar tracks of stationary objects are fit to straight lines (for highway guardrails) or known geometric shapes. The mounting angle that best aligns these tracks with the expected geometry is estimated.
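The dynamic-alignment least-squares problem above can be sketched directly with synthetic stationary-target data; the Doppler sign convention follows the formula in the text:

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_mounting_angle(v_doppler, theta_meas, v_ego):
    """Least-squares mounting-angle estimate from stationary-target Doppler,
    using the residual model in the text. Sketch with synthetic data."""
    def residuals(angle):
        return v_doppler - v_ego * np.cos(theta_meas + angle)
    return least_squares(residuals, x0=np.array([0.0])).x[0]

rng = np.random.default_rng(3)
true_offset = np.radians(1.5)                      # unknown mounting error
theta = rng.uniform(-0.6, 0.6, 200)                # stationary targets across the FOV
v_ego = 27.0
v_dopp = v_ego * np.cos(theta + true_offset) + rng.normal(0, 0.1, 200)
print(np.degrees(estimate_mounting_angle(v_dopp, theta, v_ego)))  # ~1.5 degrees
```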
MMIC Phase Calibration: For the cascaded 4-MMIC configuration, the relative phase between MMICs must be calibrated to maintain coherent beamforming. This is typically done at the radar module level during manufacturing using classical phase comparison techniques with a known reference signal.
Range Bias Calibration: A constant offset between measured and true range, caused by signal processing delays, cable lengths, and ASIC timing. Measured against known-distance targets and stored as a scalar correction.
Radar-to-Vehicle Frame Transformation
The radar's (range, azimuth, elevation, Doppler) measurements must be transformed to the vehicle coordinate frame. This requires:
- The radar mounting position (x, y, z) in the vehicle frame
- The radar mounting angles (yaw, pitch, roll) relative to the vehicle frame
- These are measured during SensorPod factory calibration and stored alongside the LiDAR and camera extrinsics
8. Online Calibration
Why Online Calibration Is Necessary
Factory calibration establishes initial sensor alignment, but several factors cause calibration to drift during operation:
| Drift Source | Mechanism | Magnitude |
|---|---|---|
| Thermal cycling | Differential thermal expansion of mounting structures | 0.1-1.0 mm translation, 0.01-0.1 degrees rotation |
| Vibration-induced settling | Fasteners and mounts shift under sustained vibration | Cumulative over thousands of miles |
| Mechanical shock | Potholes, railroad crossings, loading dock bumps | Sudden shifts of 1+ mm possible |
| SensorPod replacement | Even with three-bolt mounting, installation variability exists | Up to several mm if bolts not fully torqued |
For a system that fuses data from 22 sensors across two SensorPods, even sub-millimeter calibration drift can cause:
- LiDAR points projecting to wrong camera pixels (misclassification)
- Radar detections not correlating with LiDAR detections of the same object (data association failure)
- Inconsistent position estimates from different sensors (tracker instability)
Post-Drive Calibration Analysis
Kodiak has described a post-drive calibration analysis system where, after every drive, the system analyzes Tracker performance and identifies systematic biases in individual detectors. The Tracker tells each detector: "During the last run, you were systematically biased by a few centimeters; please adjust your measurements accordingly."
This is classical statistical estimation:
- For each detector, collect the residuals between the detector's measurements and the Tracker's fused estimates (which are more accurate because they combine all sensors)
- Compute the mean residual -- this represents a systematic bias (calibration drift)
- If the mean residual exceeds a threshold, apply a correction to the detector's extrinsic parameters
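In code, this reduces to a mean over per-detection residuals. The 3 cm threshold below is an assumed value:

```python
import numpy as np

def post_drive_bias(detector_positions, tracker_positions, threshold=0.03):
    """Estimate a detector's systematic bias against the fused Tracker
    estimate over a whole drive; apply a correction only above threshold."""
    residuals = np.asarray(detector_positions) - np.asarray(tracker_positions)
    bias = residuals.mean(axis=0)
    correction = bias if np.linalg.norm(bias) > threshold else np.zeros_like(bias)
    return bias, correction

rng = np.random.default_rng(4)
truth = rng.uniform(0, 100, (5000, 2))
det = truth + np.array([0.05, -0.01]) + rng.normal(0, 0.2, (5000, 2))  # biased detector
bias, corr = post_drive_bias(det, truth)
print(bias, corr)   # bias ~[0.05, -0.01]; correction applied (norm > 3 cm)
```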
Real-Time Calibration Monitoring
Kodiak has indicated that future versions will apply calibration corrections in real-time. The classical approaches for online calibration monitoring include:
Reprojection consistency: LiDAR points projected onto camera images should align with detected features (edges, lane markings). The running average of the alignment error serves as a calibration health metric.
Cross-sensor detection consistency: When multiple sensors detect the same object, the position estimates should agree within the sensors' noise characteristics. Systematic disagreement indicates calibration drift.
Ego-motion consistency: Each sensor can independently estimate the vehicle's ego-motion (LiDAR via scan matching, cameras via visual odometry, radar via Doppler). Disagreements between ego-motion estimates from different sensors indicate relative calibration drift.
Temperature-compensated models: Classical thermal expansion models predict calibration shift based on measured temperatures at various points in the SensorPod structure. These predicted shifts are proactively applied as corrections.
9. Sparse Map Generation
What Sparse Maps Contain
Kodiak's sparse maps store only essential highway information, measured in kilobytes per mile:
| Data Layer | Content | Classical Generation Method |
|---|---|---|
| Geometric | Lane boundary locations, road edges | Classical lane detection + SLAM |
| Topological | Road connectivity (which lanes connect where) | Graph extraction from lane geometry |
| Semantic | Speed limits, road surface type, lane attributes | Sign reading + manual annotation |
This contrasts with HD maps that store full 3D geometry, every curb, every sign, and every lane marking -- requiring megabytes per mile and dedicated mapping vehicles.
How Sparse Maps Are Generated
Sparse map generation relies heavily on classical computer vision and geometry:
Lane boundary extraction: As Kodiak trucks drive routes, their cameras detect lane markings using classical and ML-based lane detection. The detected lane boundaries are recorded with their GPS-referenced positions.
Multi-pass aggregation: Because lane detection from a single pass is noisy, multiple passes over the same road are aggregated. Classical statistical methods (weighted averaging, outlier rejection) produce a consensus lane boundary position from multiple observations.
Geometric fitting: The aggregated lane boundary points are fit to parametric curves:
- Straight segments: Least-squares line fitting
- Curved segments: Clothoid (Euler spiral) fitting or polynomial fitting
- Transitions: Smooth interpolation between segments
Topological graph construction: Lane connectivity (which lane connects to which at merges, splits, and exits) is extracted from the geometric data using classical graph algorithms.
Compression: The parametric representation is inherently compact -- a lane boundary described by a sequence of clothoid parameters requires far less storage than a dense point cloud.
Fleet-Sourced Map Updates
Because the maps are just kilobytes per mile, updates can be distributed to the entire fleet over-the-air almost daily. The map update pipeline is:
- Production trucks detect changes (new lane markings, shifted lanes, new construction) through perception
- The changed data is uploaded to the cloud
- Classical map merging algorithms integrate the new observations with the existing map
- Updated maps are pushed to all trucks in the fleet
This turns every Kodiak truck into a mapping vehicle -- no dedicated mapping fleet is needed.
10. Map-Disagreement Handling
Discrepancy Detection
The Kodiak Driver continuously compares what it perceives in real-time with what the sparse map predicts. This comparison is a classical hypothesis testing problem:
Lane boundary comparison: The real-time lane detector reports lane markings at positions (x_perceived, y_perceived). The sparse map predicts lane markings at positions (x_map, y_map). The system computes:
discrepancy = ||(x_perceived, y_perceived) - (x_map, y_map)||
If the discrepancy exceeds a threshold (accounting for localization uncertainty and perception noise), a map-perception disagreement is declared.
Lane count comparison: If the map says there are 3 lanes but perception detects 4, or vice versa, this is a structural disagreement that indicates a road change.
Road geometry comparison: If the perceived road curvature differs significantly from the mapped curvature, this may indicate a traffic shift or new road alignment.
Trust Hierarchy
When disagreement is detected, the system follows the "perception over priors" principle with a classical decision logic:
- Perception wins: The system trusts real-time sensor data over stored map data
- Uncertainty increases: The planner is notified that map reliability is reduced, causing more conservative driving behavior
- Temporary map built: An on-the-fly local map is constructed from real-time perception
- Fleet notification: The discrepancy is reported to the cloud for fleet-wide map update
On-the-Fly Map Construction
When the stored map is invalidated (e.g., by a new construction zone), the system builds a temporary map using only real-time perception:
- Lane detection: Camera and LiDAR-based lane detection provides current lane boundaries
- Road edge detection: LiDAR detects curbs, barriers, and road surface boundaries
- Vehicle trajectories: The paths of other vehicles provide evidence of drivable lanes
- Cone/barrel lines: Construction cones and barrels are clustered into lines that define temporary lane boundaries
This temporary map is "good enough for the Kodiak Driver to drive safely until the map matches again."
11. Real-Time Road Model
Classical Road Geometry Estimation
When maps are insufficient or stale, the Kodiak Driver must estimate road geometry in real-time using classical parametric road models:
Polynomial Road Model: The simplest classical model represents each lane boundary as a polynomial function of longitudinal distance:
y(x) = c0 + c1*x + c2*x^2 + c3*x^3
where:
- c0 = lateral offset (distance from ego-vehicle to lane boundary)
- c1 = heading angle relative to lane boundary (small-angle approximation)
- c2 = half the local curvature (curvature, 1/radius, is approximately 2*c2 at the vehicle)
- c3 = one-sixth the curvature rate (change of curvature with distance is approximately 6*c3)
The coefficients are estimated from detected lane marking points using weighted least-squares fitting, with weights decreasing with distance (farther detections are less reliable).
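A weighted cubic fit in a few lines; the exponential weight model is an assumption, and the coefficient interpretations follow the list above:

```python
import numpy as np

def fit_lane_boundary(x, y, x_decay=120.0):
    """Weighted cubic fit y(x) = c0 + c1*x + c2*x^2 + c3*x^3 with weights
    decaying with distance. The weight model is an assumed example."""
    w = np.exp(-x / x_decay)                        # far detections weighted less
    c3, c2, c1, c0 = np.polyfit(x, y, deg=3, w=w)   # polyfit returns high->low order
    return c0, c1, c2, c3

rng = np.random.default_rng(5)
x = np.linspace(5, 150, 60)                         # longitudinal distance, m
y_true = 1.8 + 0.01 * x + 2e-4 * x**2               # gentle curve to the left
y = y_true + rng.normal(0, 0.05 + 0.002 * x)        # noise grows with range
c0, c1, c2, c3 = fit_lane_boundary(x, y)
print(f"offset={c0:.2f} m, heading={c1:.3f} rad, curvature ~{2*c2:.5f} 1/m")
```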
Clothoid (Euler Spiral) Road Model: A more physically accurate model used in road design, where curvature varies linearly with arc length:
kappa(s) = kappa_0 + kappa_1 * s
This models the fact that real roads transition smoothly between straight sections and curves -- the curvature does not jump instantaneously. Clothoid fitting from lane marking detections uses nonlinear least-squares optimization (Gauss-Newton or Levenberg-Marquardt).
Multi-Lane Road Model: For highway driving, the system estimates a full road model comprising:
- Left and right lane boundaries for the ego lane
- Adjacent lane boundaries (for lane change feasibility)
- Road edges (barriers, curbs, shoulders)
- Road surface normal (for grade estimation on hills)
Ground Plane Estimation
Separating ground points from obstacle points in the LiDAR point cloud is a fundamental classical preprocessing step:
RANSAC-Based Ground Plane Fitting: The Random Sample Consensus (RANSAC) algorithm iteratively fits a plane to the LiDAR points:
- Randomly sample 3 points
- Fit a plane through them
- Count inliers (points within a threshold distance of the plane)
- Repeat; keep the plane with the most inliers
For highway roads, the ground is approximately planar locally. However, grades, crown, and superelevation require multi-plane or curved surface fitting for robust ground segmentation.
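A minimal RANSAC plane-fit loop over a synthetic highway scene (the 15 cm inlier threshold is an assumed value):

```python
import numpy as np

def ransac_ground_plane(points, n_iters=200, inlier_thresh=0.15, rng=None):
    """Classical 3-point RANSAC plane fit returning (normal, d) and inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = None, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                        # degenerate (collinear) sample
        n = n / norm
        d = -n @ p0
        dist = np.abs(points @ n + d)       # point-to-plane distances
        inliers = dist < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers

rng = np.random.default_rng(6)
ground = np.c_[rng.uniform(0, 80, (3000, 2)), rng.normal(0, 0.03, 3000)]  # z ~ 0
car = rng.uniform([20, -1, 0.3], [24, 1, 1.6], (300, 3))                  # above-ground
plane, inliers = ransac_ground_plane(np.vstack([ground, car]))
print(plane[0], inliers.sum())   # normal ~[0, 0, +-1]; ~3000 ground inliers
```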
Alternatives to RANSAC:
- Region-wise PCA ground fitting: Divides the point cloud into sectors and fits ground planes per sector using Principal Component Analysis. 2x faster than RANSAC.
- Grid-based height thresholding: Projects points to a 2D grid and classifies each cell based on the height distribution of points falling in it.
- Ring-based methods: For spinning LiDARs, processes each laser ring independently, identifying ground points by comparing consecutive returns within each ring.
Ground plane estimation is the first step in the perception pipeline -- it separates the problem into "road surface" (for drivability assessment) and "above-ground objects" (for detection and tracking).
12. Lane Detection
Classical Lane Detection Methods
Lane detection is central to Kodiak's perception-over-priors philosophy because the system localizes primarily using perceived lane markings, like a human driver. The lane detection pipeline combines classical and ML techniques:
Classical Pre-Processing:
- Region of interest selection: Only the road surface region of the image is processed, reducing computation. The ROI is defined by the estimated road plane.
- Perspective transformation (IPM): An Inverse Perspective Mapping transforms the camera image to a top-down (bird's eye view) representation where lane markings appear as parallel lines rather than converging lines. This is a classical projective transformation using the known camera extrinsics and the estimated ground plane.
- Gradient computation: Sobel or Scharr operators detect intensity gradients that indicate lane marking edges.
- Color thresholding: Classical color space conversion (RGB to HSV or HLS) and thresholding to isolate yellow and white lane markings.
Classical Lane Line Fitting: After candidate lane marking pixels are identified (by ML segmentation or classical thresholding), the lane boundaries are fit using:
- Sliding window search: Starting from a histogram peak at the image bottom, a sliding window traces the lane marking upward through the image
- Polynomial fitting: A 2nd or 3rd order polynomial is fit to the lane marking pixels using least-squares
- RANSAC line/curve fitting: Robust fitting that rejects outlier pixels (e.g., from road debris, shadows, tire marks)
Temporal filtering: Lane detections from consecutive frames are filtered using classical state estimation (Kalman filter or exponential moving average) to produce smooth, stable lane boundary estimates. This prevents the lane model from "jumping" due to single-frame detection noise.
LiDAR-Based Lane Detection
The Luminar Iris's reflectivity channel provides an alternative lane detection modality that is independent of cameras:
- Lane markings (paint, thermoplastic, tape) have significantly higher reflectivity than asphalt
- A simple reflectivity threshold on the LiDAR point cloud identifies lane marking points
- These points are fit using the same polynomial or clothoid models as camera-detected markings
- LiDAR-based lane detection works identically day and night (active illumination) and is unaffected by sun glare
Sensor Fusion for Lane Detection
The system fuses camera-detected and LiDAR-detected lane boundaries using classical weighted averaging:
- Camera provides dense, high-resolution lane marking detection at short to medium range
- LiDAR provides lighting-independent lane marking detection with precise range measurement
- Radar guardrail detections provide road edge constraints
- The fused lane model is more robust than any single sensor's estimate
13. Long-Range Processing
The Point Density Problem
LiDAR point density drops with the square of distance. For the Luminar Iris:
| Range | Approximate Point Density | Points on a Sedan | Detection Challenge |
|---|---|---|---|
| 50m | Very high | Hundreds | Easy detection and classification |
| 100m | High | ~100 | Reliable detection |
| 200m | Moderate | ~25 | Classification uncertain |
| 400m | Low | ~6 | Detection marginal, classification difficult |
| 600m | Very low | ~1-3 | Near detection threshold |
At 600m, a sedan viewed from behind subtends only about 0.2 degrees of azimuth -- a few LiDAR resolution cells. Detection at this range depends on the classical compensation methods described below.
Classical Compensation Methods
Temporal accumulation: Over consecutive frames, the system accumulates LiDAR returns from the same spatial region. Even if a single frame produces only 1-2 points on a distant object, 10 frames produce 10-20 points -- enough for reliable detection. This requires precise ego-motion compensation to align points from different frames into a common coordinate system.
Ego-motion compensation: At 65 mph, the truck moves ~2.9m per LiDAR frame (at 10 Hz). Multi-frame accumulation requires compensating for this motion using the ego-vehicle's IMU and GPS data. Classical coordinate transformations (rotation + translation) align each frame's points to a reference frame.
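A sketch of multi-frame accumulation with ego-motion compensation, using synthetic poses in place of the IMU/GPS solution:

```python
import numpy as np

def accumulate_frames(frames, poses):
    """Transform each frame's points into a common reference frame using the
    ego pose (R, t) at that frame's timestamp, then stack them."""
    out = [(R @ pts.T).T + t for pts, (R, t) in zip(frames, poses)]
    return np.vstack(out)

def yaw_pose(yaw, x, y):
    R = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                  [np.sin(yaw),  np.cos(yaw), 0],
                  [0, 0, 1]])
    return R, np.array([x, y, 0.0])

# Truck at 29 m/s with a 10 Hz LiDAR: ego advances 2.9 m per frame.
# A distant object at (600, 0) yields only ~1 point per frame.
rng = np.random.default_rng(7)
frames, poses = [], []
for k in range(10):
    obj_in_sensor = np.array([[600.0 - 2.9 * k, 0.0, 1.0]]) + rng.normal(0, 0.1, (1, 3))
    frames.append(obj_in_sensor)
    poses.append(yaw_pose(0.0, 2.9 * k, 0.0))   # ego pose at frame k (world frame)
cloud = accumulate_frames(frames, poses)
print(cloud.mean(axis=0))   # ~[600, 0, 1]: 10 accumulated points on the object
```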
Adaptive detection thresholds: Classical detection theory (Neyman-Pearson) trades off between probability of detection (Pd) and probability of false alarm (Pfa). At long range, the system can lower the detection threshold (accepting more false alarms) and use temporal persistence (requiring the detection to appear in multiple consecutive frames) to maintain high Pd while controlling Pfa.
Sensor fusion for range extension: At 300-600m, LiDAR may have only a few points on an object, but narrow-FOV cameras can resolve the object as many pixels. Classical cross-modal validation correlates sparse LiDAR detections with camera image patches to confirm or reject long-range LiDAR detections.
Radar as long-range anchor: The ZF 4D radar detects moving vehicles at 300+ meters with native Doppler velocity. A moving vehicle at 350m that has not yet been reliably detected by LiDAR may already have a confirmed radar detection with velocity information. This radar detection primes the perception system to look for corresponding LiDAR returns at the predicted position.
Progressive Refinement Strategy
Kodiak's long-range perception follows a progressive refinement strategy that is fundamentally sequential and classical in its logic:
- 600m: First detection (LiDAR); low confidence; "something is there"
- 400m: Increasing LiDAR density; initial classification; camera begins resolving object
- 300m: Radar velocity confirms moving/stopped; 4D radar provides elevation
- 200m: Full multi-sensor coverage; high-confidence classification
- 100m: Near-field; detailed shape, behavior prediction, trajectory planning
This staged approach provides multiple independent opportunities to detect every object before the truck reaches it.
14. Highway Merge Detection
Geometric Detection of Merge Zones
Highway merges are detected through classical geometric analysis:
Lane boundary divergence/convergence: The lane detection system identifies points where:
- A new lane boundary appears (on-ramp merging into the highway)
- An existing lane boundary ends (lane drop)
- Lane boundaries diverge (exit ramp)
These topological changes in the lane structure are detected by tracking the number of lane boundaries over time and identifying transitions. This is a classical pattern recognition problem on the lane model output.
Road geometry analysis: On-ramps have characteristic geometric features:
- A curved entry lane with decreasing radius (acceleration lane)
- A taper where the acceleration lane narrows and merges
- A gore area (the painted triangle separating the ramp from the highway)
The system detects these features from the perceived lane geometry and classifies the road configuration.
Adjacent Vehicle Tracking for Merge
When a merge zone is detected, the perception system intensifies tracking of vehicles in the merge lane:
- Hesai 360-degree LiDAR: Provides surround detection of merging vehicles approaching from the side
- Side-mounted radar (Gen 6): Provides long-range detection and velocity measurement of vehicles on the on-ramp
- Side cameras: Visual detection and classification of merging vehicles
Classical tracking (Kalman filtering) of merging vehicles estimates their future trajectory -- will they accelerate to merge ahead or decelerate to merge behind the truck? This prediction uses the constant velocity or constant acceleration motion model, extrapolating the vehicle's current state forward in time.
Gap Assessment
For the truck to execute its own lane change (to accommodate merging traffic or to exit), the system performs classical gap assessment:
- Track all vehicles in the target lane within a relevant range window
- Extrapolate their positions forward using constant velocity prediction
- Compute the available gap as a function of time
- Compare the gap to the minimum safe gap (function of truck speed, truck length, relative velocities)
This is a classical trajectory analysis problem solved entirely with deterministic geometry and kinematics.
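A minimal gap-assessment check under constant-velocity extrapolation; the 60 m minimum gap and 8 s horizon are assumed values, not Kodiak parameters:

```python
import numpy as np

def gap_available(lead, trail, t_horizon=8.0, dt=0.5, min_gap=60.0):
    """Check whether the gap between lead and trail vehicles in the target
    lane stays above a minimum over the horizon, using constant-velocity
    extrapolation of both tracks."""
    for t in np.arange(0.0, t_horizon, dt):
        gap = (lead["s"] + lead["v"] * t) - (trail["s"] + trail["v"] * t)
        if gap < min_gap:
            return False
    return True

lead = {"s": 80.0, "v": 28.0}    # longitudinal position (m) and speed (m/s)
trail = {"s": -20.0, "v": 34.0}  # faster vehicle approaching from behind
print(gap_available(lead, trail))  # closing at 6 m/s: gap falls below 60 m -> False
```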
15. Construction Zone Processing
Cone and Barrel Clustering
Construction zones are defined by physical delineators -- cones, barrels (channelizing drums), and barriers. The perception system must detect these individual objects and then group them into meaningful lane boundaries:
Classical Point Cloud Processing:
- Ground removal: RANSAC or grid-based ground plane estimation removes road surface points
- Euclidean clustering: The remaining above-ground points are grouped into clusters based on spatial proximity. Points within a distance threshold (e.g., 0.3m) are assigned to the same cluster. Standard algorithm: DBSCAN or KD-tree-based region growing.
- Cluster classification: Each cluster's geometric properties (height, width, aspect ratio, point count) are compared to known profiles:
- Traffic cone: ~0.7m height, ~0.3m base, conical shape
- Channelizing drum: ~1.0m height, ~0.5m diameter, cylindrical
- Jersey barrier: ~0.8m height, elongated, flat top
- Line fitting: Individual cones/barrels are fit to lines (RANSAC line fitting on cluster centroids) to estimate temporary lane boundaries. The line through cone/barrel centroids defines the new lane edge.
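A sketch of the clustering-then-line-fitting chain on synthetic cone returns, using DBSCAN as one standard Euclidean-clustering implementation:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(8)
# Synthetic above-ground points: 8 cones spaced 6 m apart along a taper line
centers = np.array([[10.0 + 6 * i, 3.5 - 0.15 * i] for i in range(8)])
points = np.vstack([c + rng.normal(0, 0.05, (20, 2)) for c in centers])

# Euclidean clustering: points within 0.3 m belong to the same object
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(points)
centroids = np.array([points[labels == k].mean(axis=0)
                      for k in range(labels.max() + 1)])

# Fit the temporary lane edge as a line through cone centroids
slope, intercept = np.polyfit(centroids[:, 0], centroids[:, 1], deg=1)
print(f"{len(centroids)} cones; lane edge: y = {slope:.3f} x + {intercept:.2f}")
```

Geometric checks on each cluster (height, width, point count against the cone/drum/barrier profiles above) would precede the line fit in a full pipeline.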
Camera-Based Construction Detection:
- Color-based detection: Orange cones and barrels are distinctive in HSV color space, enabling classical color thresholding as a complementary detection method
- Sign reading: Temporary construction signs (speed limits, lane closures, "ROAD WORK AHEAD") are detected and read
- Worker detection: Construction workers are detected as pedestrians with classification augmented by synthetic training data from Scale AI
Temporary Lane Estimation
Once cones and barrels are detected and grouped into lines, the system estimates temporary lane geometry:
- Cone/barrel lines on opposite sides of traffic define lane boundaries
- The space between adjacent cone lines is classified as a drivable lane
- The width is verified against expected lane width ranges (typically 3.0-3.7m)
- The resulting lane model replaces the stale map data for this road segment
Work Zone Geometry
Construction zones have characteristic geometric configurations that the system recognizes:
| Configuration | Detection Method |
|---|---|
| Lane closure (merge taper) | One cone line converges toward another; lane count decreases |
| Lane shift | Both cone lines shift laterally; lane maintains width but moves |
| Contraflow (crossover) | Lanes cross the median into opposing traffic direction |
| Reduced speed zone | Speed limit signs detected; cone spacing may indicate reduced speed |
| Work zone end | Cone lines terminate; normal lane markings resume |
The system uses classical finite state machine logic to track its state within the construction zone: approaching, entering, traversing, exiting.
Fleet Propagation
When a truck encounters a construction zone that is not in its map:
- The temporary lane model is captured with GPS coordinates and timestamps
- This data is uploaded to the cloud
- Classical map-merging algorithms integrate the construction zone into the fleet-wide map
- The updated map is pushed to all trucks, so subsequent trucks are prepared before they arrive
16. Cut-In Detection
Classical Motion Analysis for Cut-In Prediction
A cut-in occurs when a vehicle from an adjacent lane moves into the ego-vehicle's lane ahead of it. At highway speeds, this is a safety-critical scenario because it suddenly reduces following distance. Early detection of cut-in intent is essential for smooth braking response.
Lateral Velocity Monitoring
The primary classical indicator of an impending cut-in is lateral velocity -- movement toward the ego lane:
- Track lateral position over time: The Tracker maintains a time series of each tracked vehicle's lateral position relative to lane boundaries
- Compute lateral velocity: Numerical differentiation (with smoothing) estimates the rate of lateral position change
- Threshold detection: When lateral velocity exceeds a threshold (indicating movement toward the ego lane), the system flags the vehicle as a potential cut-in
This is a classical signal detection problem: distinguishing intentional lateral movement (cut-in) from normal lane-keeping oscillations (noise). The threshold must be set to balance early detection (low threshold = early warning but more false alarms) against false alarm rate.
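The lateral-velocity estimator reduces to a smoothed numerical derivative. The smoothing factor and the ~0.5 m/s decision threshold below are assumed values:

```python
import numpy as np

def lateral_velocity(lat_positions, dt=0.05, alpha=0.3):
    """Smoothed numerical derivative of lateral position (20 Hz track updates
    assumed). alpha is an exponential-moving-average smoothing factor."""
    v_raw = np.diff(lat_positions) / dt
    v = np.zeros_like(v_raw)
    v[0] = v_raw[0]
    for i in range(1, len(v_raw)):
        v[i] = alpha * v_raw[i] + (1 - alpha) * v[i - 1]   # EMA smoothing
    return v

rng = np.random.default_rng(9)
t = np.arange(0, 2.0, 0.05)
lane_keep = 0.1 * np.sin(2 * np.pi * 0.3 * t) + rng.normal(0, 0.01, t.size)
cut_in = -1.8 + 0.9 * t + rng.normal(0, 0.01, t.size)      # drifting toward ego lane
print(f"lane-keeping: {lateral_velocity(lane_keep)[-1]:+.2f} m/s")
print(f"cut-in:       {lateral_velocity(cut_in)[-1]:+.2f} m/s (exceeds ~0.5 m/s threshold)")
```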
Prototype Trajectory Matching
A more sophisticated classical approach uses prototype trajectory matching:
- A database of typical lane change trajectories is built from real driving data (clustered using agglomerative hierarchical clustering)
- The observed vehicle trajectory is compared to each prototype using Dynamic Time Warping (DTW) or similar sequence matching
- The similarity score indicates the probability that the vehicle is executing a lane change
- High similarity to a "lane change left" prototype when the vehicle is to the right of the ego lane indicates an impending cut-in
Time-to-Lane-Crossing (TLC)
A classical geometric metric that estimates when the vehicle will cross the lane boundary:
TLC = lateral_distance_to_lane_boundary / lateral_velocity
When TLC drops below a critical threshold (e.g., 2 seconds), the system initiates defensive measures (alerting the planner, pre-staging braking).
Radar Velocity Advantage
The ZF 4D radar provides instantaneous radial velocity for the potential cut-in vehicle. If a vehicle in an adjacent lane develops a velocity component toward the ego lane, a fraction of that lateral motion (proportional to the sine of the target's azimuth offset) appears directly in the measured radial velocity. This provides a direct, physics-based measurement of cut-in intent that supplements the camera- and LiDAR-based lateral position tracking.
17. Highway-Speed Tracking
State Vector and Motion Models
The Kodiak Vision Tracker maintains a state vector for every tracked object. Based on Kodiak's references to information theory and state estimation, and the standard practice in autonomous vehicle tracking, the state vector likely includes:
x = [px, py, pz, vx, vy, vz, ax, ay, heading, yaw_rate, length, width, height]
where:
- (px, py, pz): Position in vehicle frame
- (vx, vy, vz): Velocity components
- (ax, ay): Acceleration components
- heading: Object heading angle
- yaw_rate: Rate of heading change
- (length, width, height): Object dimensions
Motion Models for Highway Tracking
At highway speeds, the choice of motion model significantly affects tracking performance. Classical models used:
Constant Velocity (CV): Assumes the object maintains constant velocity. State prediction:
px(t+dt) = px(t) + vx(t)*dt
py(t+dt) = py(t) + vy(t)*dt
Simple and effective for vehicles maintaining speed in a lane. Fails during acceleration, braking, or lane changes.
Constant Acceleration (CA): Adds acceleration to the state:
vx(t+dt) = vx(t) + ax(t)*dt
px(t+dt) = px(t) + vx(t)*dt + 0.5*ax(t)*dt^2
Better for braking and accelerating vehicles.
Constant Turn Rate and Velocity (CTRV): Models a vehicle moving at constant speed along a circular arc:
heading(t+dt) = heading(t) + yaw_rate*dt
px(t+dt) = px(t) + (v/yaw_rate) * (sin(heading + yaw_rate*dt) - sin(heading))
py(t+dt) = py(t) + (v/yaw_rate) * (cos(heading) - cos(heading + yaw_rate*dt))
Essential for tracking vehicles executing lane changes (which follow curved paths).
Constant Turn Rate and Acceleration (CTRA): Extends CTRV with acceleration. Most complex but most accurate for vehicles simultaneously turning and accelerating (e.g., a vehicle accelerating through a highway curve).
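A CTRV prediction step is compact enough to write out; the near-zero yaw-rate fallback avoids the division by yaw_rate in the equations above:

```python
import numpy as np

def ctrv_predict(state, dt):
    """One CTRV prediction step. State: [px, py, v, heading, yaw_rate].
    Falls back to straight-line (CV-like) motion when yaw_rate ~ 0."""
    px, py, v, h, w = state
    if abs(w) < 1e-6:
        px += v * np.cos(h) * dt
        py += v * np.sin(h) * dt
    else:
        px += (v / w) * (np.sin(h + w * dt) - np.sin(h))
        py += (v / w) * (np.cos(h) - np.cos(h + w * dt))
    return np.array([px, py, v, h + w * dt, w])

# Vehicle at 30 m/s beginning a lane change (gentle 0.05 rad/s yaw rate)
state = np.array([0.0, 0.0, 30.0, 0.0, 0.05])
for _ in range(20):                   # predict 1 s ahead at 20 Hz (dt = 50 ms)
    state = ctrv_predict(state, 0.05)
print(state[:2])                      # ~[30.0, 0.75]: curved path, lateral drift
```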
Extended/Unscented Kalman Filter
Because CTRV and CTRA models are nonlinear, the standard Kalman Filter cannot be used directly. The Tracker likely uses:
Extended Kalman Filter (EKF): Linearizes the nonlinear models around the current state estimate using Jacobian matrices. Computationally efficient but may introduce linearization errors for strongly nonlinear dynamics.
Unscented Kalman Filter (UKF): Uses sigma points to propagate the state distribution through the nonlinear model without linearization. More accurate than EKF for nonlinear models, at higher computational cost.
Highway-Speed-Specific Challenges
| Challenge | Classical Solution |
|---|---|
| High ego-velocity | Precise ego-motion compensation using IMU at high rate (100+ Hz) |
| High relative velocity | Large process noise to accommodate rapid range changes |
| Long tracks | Objects tracked for 30+ seconds (600m to passing) require stable track management |
| Sparse long-range detections | Large prediction uncertainty at long range; covariance grows rapidly between updates |
| Lane-change detection | CTRV model captures curved motion; lateral velocity in state vector detects intent |
| 20 Hz update rate | 50ms between updates; at 65 mph relative, objects move ~1.5m between frames |
Multi-Model Tracking
For robust highway tracking, the Tracker may employ an Interacting Multiple Model (IMM) filter that runs several motion models in parallel (CV, CA, CTRV) and maintains a probability weight for each model. When a vehicle is driving straight, the CV model dominates. When it begins a lane change, the CTRV model's weight increases. This adaptive model selection is a classical Bayesian estimation technique.
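A compact sketch of the model-probability bookkeeping at the heart of one IMM cycle follows (the full algorithm also mixes the per-model state estimates, omitted here for brevity; all numbers are illustrative):

```python
import numpy as np

def imm_model_weights(weights, likelihoods, transition):
    """One IMM cycle of model-probability bookkeeping (simplified).

    weights:     prior probability of each motion model (CV, CA, CTRV)
    likelihoods: p(measurement | model) from each model's filter update
    transition:  Markov matrix of model-switch probabilities
    """
    predicted = transition.T @ weights        # a priori model probabilities
    posterior = predicted * likelihoods       # Bayes update per model
    return posterior / posterior.sum()        # normalize

# A lane change begins: the CTRV filter suddenly explains the data best.
w = imm_model_weights(
    weights=np.array([0.7, 0.2, 0.1]),        # CV dominated while straight
    likelihoods=np.array([0.05, 0.10, 0.60]), # CTRV now fits the curve
    transition=np.array([[0.90, 0.05, 0.05],
                         [0.05, 0.90, 0.05],
                         [0.05, 0.05, 0.90]]))
print(w)   # ~[0.24, 0.16, 0.60] -- the CTRV weight now dominates
```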
18. Data Association
The Assignment Problem
Data association -- matching new sensor detections to existing tracks -- is a classical combinatorial optimization problem. At each timestep, the Tracker receives N detections and maintains M tracks. It must determine which detection corresponds to which track, and identify:
- Detections that match existing tracks (track updates)
- Detections that match no existing track (new track initialization)
- Tracks with no matching detection (track prediction only; possible track termination)
Gating
Before solving the full assignment problem, classical gating eliminates implausible associations:
Position gate: A detection is only considered for association with a track if it falls within the track's predicted position uncertainty ellipse (typically the Mahalanobis distance is below a threshold, e.g., 3-sigma).
Velocity gate: If the detection has velocity information (radar Doppler), it must be consistent with the track's predicted velocity.
Size gate: Detection dimensions must be compatible with the tracked object's estimated dimensions.
Classification gate: A detection classified as "truck" is not associated with a track classified as "motorcycle."
At highway speeds with long-range sparse detections, gating is challenging because:
- Prediction uncertainty grows large at long range
- Multiple targets may fall within the same gate
- Sparse LiDAR detections at 400-600m have high position uncertainty
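A minimal position gate over the Mahalanobis distance, using the 3-sigma threshold mentioned above; the wide covariance in the example mirrors the long-range case (all values illustrative):

```python
import numpy as np

def gate(detection_xy, track_pred_xy, innovation_cov, threshold=3.0):
    """Position gate: accept a detection only if its Mahalanobis distance
    to the track's predicted position is below `threshold` (in sigmas)."""
    residual = detection_xy - track_pred_xy
    d2 = residual @ np.linalg.inv(innovation_cov) @ residual
    return np.sqrt(d2) < threshold

# At long range the predicted covariance is large, so the gate opens wide:
S_far = np.diag([9.0, 4.0])   # innovation covariance in m^2 (illustrative)
print(gate(np.array([501.0, 2.5]), np.array([500.0, 1.0]), S_far))  # True
```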
Assignment Algorithms
Classical algorithms for solving the gated assignment problem:
Hungarian Algorithm (Munkres): Finds the optimal one-to-one assignment that minimizes total cost. The cost for each detection-track pair is typically the Mahalanobis distance (statistical distance accounting for uncertainty).
Joint Probabilistic Data Association (JPDA): Rather than making hard assignments, JPDA computes a weighted combination of all plausible associations. Each track's state is updated using a weighted average of all detections in its gate, with weights proportional to association probabilities. This is more robust in cluttered environments where multiple detections fall near a single track.
Multi-Hypothesis Tracking (MHT): Maintains multiple hypotheses about the data association and defers the decision until more data resolves the ambiguity. The most rigorous but most computationally expensive approach.
Kodiak's Tracker, built on information-theoretic principles, likely uses a Bayesian approach similar to JPDA or MHT -- "assessing what conclusion best explains the whole body of evidence, without prejudice, rules, or hard-coding."
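For the hard-assignment case, SciPy's linear_sum_assignment implements the Hungarian/Munkres method directly. The sketch below combines it with the gate above; the cost values and pricing-out constant are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, gate=3.0, big=1e6):
    """Optimal one-to-one detection-to-track assignment (Hungarian/Munkres).

    cost[i, j] is the Mahalanobis distance between detection i and track j.
    Gated-out pairs are priced out with `big`, and any assignment landing
    on a priced-out pair is rejected afterwards.
    """
    priced = np.where(cost < gate, cost, big)
    rows, cols = linear_sum_assignment(priced)
    matches = [(i, j) for i, j in zip(rows, cols) if priced[i, j] < big]
    unmatched_dets = set(range(cost.shape[0])) - {i for i, _ in matches}
    unmatched_trks = set(range(cost.shape[1])) - {j for _, j in matches}
    return matches, unmatched_dets, unmatched_trks  # update / init / coast

cost = np.array([[0.4, 5.0],   # detection 0 clearly belongs to track 0
                 [4.8, 0.9],   # detection 1 to track 1
                 [6.2, 7.1]])  # detection 2 gates out -> new track
print(associate(cost))         # ([(0, 0), (1, 1)], {2}, set())
```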
Highway-Specific Association Challenges
| Scenario | Challenge | Classical Solution |
|---|---|---|
| Closely spaced vehicles in traffic | Detections from adjacent vehicles may be ambiguous | Use velocity and heading differences to disambiguate |
| Occluded vehicles | A vehicle hidden behind a truck produces no detection | Track prediction maintains the track; Tracker acknowledges high uncertainty |
| LiDAR-to-radar association at long range | LiDAR and radar position accuracy differ at 300m+ | Inflate association gates at long range; use velocity for confirmation |
| Track handoff between sensors | Object transitions from Luminar-only to Hesai coverage | Maintain unified track across sensor transitions |
19. Track Prediction
Forward Prediction for Gating and Planning
Track prediction -- projecting tracked objects forward in time -- serves two purposes:
- Gating (short-term): Predicting where each track will be at the next sensor update (50ms ahead) to establish the search region for data association
- Planning (medium-term): Predicting where tracked objects will be 2-8 seconds ahead to inform motion planning
Classical Prediction Models
Constant velocity prediction: The simplest model. Projects the current velocity forward:
predicted_position(t+dt) = current_position + velocity * dt
Accurate for vehicles maintaining speed in a lane (the most common highway scenario).
Constant turn rate prediction: For vehicles executing lane changes or following curved roads:
predicted_position(t+dt) = CTRV_propagation(current_state, dt)
This produces a curved predicted trajectory that better matches lane-change and curve-following behavior.
Lane-constrained prediction: Uses the estimated lane geometry to constrain predictions. A vehicle is predicted to follow the lane unless lateral velocity indicates a lane change. This combines the motion model with the road model to produce more realistic predictions:
- A vehicle in a curve is predicted to follow the lane curvature
- A vehicle with zero lateral velocity is predicted to stay centered in its lane
- A vehicle with lateral velocity is predicted to execute a lane change
Prediction Uncertainty Growth
A critical classical result: prediction uncertainty grows with time. The predicted position covariance at time t+dt includes:
- The propagated state covariance (current uncertainty projected forward)
- Process noise (additional uncertainty from unmodeled behavior -- the vehicle might brake, accelerate, or change lanes)
At highway speeds, 3 seconds of prediction can result in substantial uncertainty:
- Longitudinal uncertainty: ~10-30m (the vehicle might brake or accelerate)
- Lateral uncertainty: ~3-5m (the vehicle might change lanes)
The planner must account for this growing uncertainty, which is why the Tracker's explicit uncertainty representation is so important.
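The growth is easy to reproduce numerically. Below is a sketch for the lateral channel under an assumed white-acceleration process-noise model (all values illustrative); three seconds of pure prediction lands near the lateral figure quoted above:

```python
import numpy as np

dt, q = 0.05, 0.5                         # 20 Hz cycle; noise intensity (assumed)
F = np.array([[1.0, dt],
              [0.0, 1.0]])                # state: [lateral position, lateral velocity]
Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                  [dt**3 / 2, dt**2]])    # discrete white-acceleration noise

P = np.diag([0.04, 0.25])                 # tight estimate at the last update
for _ in range(int(3.0 / dt)):            # 3 s of prediction, no measurements
    P = F @ P @ F.T + Q
print(np.sqrt(P[0, 0]))                   # ~2.6 m lateral 1-sigma after 3 s
```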
20. Multi-Sensor Track Fusion
Kodiak Vision Fusion Architecture
Kodiak Vision treats all sensors as primary and fuses their detections using a shared information-theoretic framework. The fusion architecture can be characterized as a centralized tracker with multi-detector input:
Luminar Iris #1 ----> Detectors A, B ----+
Luminar Iris #2 ----> Detectors C, D ----|
Hesai LiDAR #1 -----> Detector E --------|
Hesai LiDAR #2 -----> Detector F --------+---> Kodiak Vision Tracker ---> Fused Tracks
ZF Radar #1-6 ------> Detectors G-L -----|
Camera 1-12 --------> Detectors M, N, O -+
Every sensor measurement passes through multiple independent software detectors, and all detector outputs are expressed in the common information-theoretic language before reaching the Tracker.
Information-Theoretic Fusion
The Tracker's mathematical framework, drawing from Shannon, Wiener, Kolmogorov, and Chapman, operates on probability distributions rather than point estimates:
Each detection is a likelihood function: Rather than saying "there is a vehicle at (100m, 2m)," a detector says "here is the probability distribution over vehicle positions given my measurement." This distribution captures both the measured position and the measurement uncertainty.
Bayesian update: The Tracker combines the prior track state (from prediction) with the new detection likelihood using Bayes' theorem:
posterior = prior * likelihood / normalization
This is the mathematical core of Kalman filtering and its extensions.
Multi-sensor fusion: When multiple sensors detect the same object, their likelihood functions are combined:
- LiDAR provides tight spatial uncertainty (precise position) but no native velocity
- Radar provides tight velocity uncertainty (precise Doppler) but looser spatial uncertainty
- Camera provides tight angular uncertainty (precise bearing) but no native depth
- The fused estimate combines the best of each sensor
Conflict resolution: When sensors disagree (e.g., camera says "no object" but LiDAR says "object at 200m"), the Tracker evaluates the relative credibility of each sensor's evidence. The information-theoretic framework naturally down-weights unreliable or conflicting evidence without hard-coded rules.
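In one dimension, combining independent Gaussian likelihoods reduces to a product in information form; the same arithmetic generalizes to the multivariate Kalman update. A minimal sketch (sensor variances are illustrative):

```python
import numpy as np

def fuse_gaussians(means, variances):
    """Product of independent Gaussian likelihoods (information form).

    The fused precision is the sum of the individual precisions, so the
    tightest sensor dominates the dimension it measures best.
    """
    w = 1.0 / np.asarray(variances)           # precisions
    mean = np.sum(w * np.asarray(means)) / np.sum(w)
    return mean, 1.0 / np.sum(w)

# Range to a lead vehicle: LiDAR is tight, radar looser at this distance.
mean, var = fuse_gaussians(means=[201.2, 199.0], variances=[0.04, 4.0])
print(mean, var)   # ~201.18 m, variance ~0.0396 -- LiDAR dominates range
```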
Cross-Modal Verification
When multiple physically independent sensor modalities agree on an object's presence and state, the confidence is much higher than any single sensor alone. The Tracker exploits three independent physical principles:
| Sensor | Physical Principle | What It Measures Best |
|---|---|---|
| LiDAR | Light time-of-flight | 3D position, shape, reflectivity |
| Radar | RF Doppler reflection | Radial velocity, range (weather-robust) |
| Camera | Optical imaging | Color, texture, classification, angular position |
Cross-verified detections (confirmed by 2+ independent physics) have dramatically lower false positive and false negative rates than single-sensor detections.
Sensor Transition Handling
Objects moving through the Kodiak truck's sensor coverage experience transitions between sensor modalities:
- 600-300m: Luminar-only detection (forward-facing long-range)
- 300-200m: Luminar + radar + narrow-FOV camera
- 200-50m: Luminar + Hesai + radar + cameras (full coverage)
- 50-0m: Hesai + radar + wide-FOV cameras (Luminar may not cover near-field)
- Alongside: Hesai + side radar + side cameras (vehicle passing)
- Behind: Hesai + rear radar + rear cameras
The Tracker maintains continuous track identity across these transitions. An object first detected by Luminar at 400m is the same track when Hesai picks it up at 200m. Classical track prediction bridges the gap when an object briefly falls between sensor coverage zones.
21. NVIDIA DRIVE Orin Classical Processing
CPU Architecture
The NVIDIA DRIVE Orin SoC contains 12 Arm Cortex-A78AE CPU cores. While the GPU handles ML inference workloads, these CPU cores are responsible for significant classical processing:
| CPU Workload | Classical Method |
|---|---|
| Tracker core loop | Kalman/UKF prediction and update at 20 Hz |
| Data association | Hungarian algorithm / JPDA |
| Road model estimation | Polynomial / clothoid fitting |
| Coordinate transformations | Rotation matrices, frame conversions |
| Safety checks | Rule-based monitoring at 10 Hz |
| Ego-motion estimation | IMU integration, GPS fusion |
| Map matching | Lane-to-map correlation |
| Trajectory planning | Optimization, constraint checking |
| Communication | Sensor data routing, telemetry |
Programmable Vision Accelerator (PVA)
The Orin includes a dedicated hardware accelerator for classical computer vision -- the PVA (Programmable Vision Accelerator). This is a VLIW SIMD DSP specifically optimized for image processing and computer vision operations:
| PVA Specification | Value |
|---|---|
| Architecture | Vector SIMD VLIW DSP |
| Vector Processing Units (VPUs) | 2 per PVA |
| VLIW width | 7-way (2xScalar + 2xVector + 3xMemory simultaneously) |
| INT8 performance | 2048 GMACs per PVA instance |
| FP32 performance | 32 GMACs per PVA instance |
| Memory bandwidth | Up to 15 GB/s per DMA |
Classical CV operations that run on the PVA:
- Image remapping (undistortion): Correcting lens distortion using calibrated parameters
- Cropping and resizing: Preparing image regions for ML detectors
- Color conversion: YUV to RGB conversion for neural network input
- Layout conversion: Block-linear to pitch-linear format for memory efficiency
- Image pyramid generation: Multi-scale image representations for detection
- Sparse optical flow: Feature tracking between consecutive frames
- Feature detection: Harris/Shi-Tomasi corner detection
- Feature tracking: KLT (Kanade-Lucas-Tomasi) tracker for visual odometry
- Median filtering: Outlier rejection in post-processing
- Non-maximum suppression (NMS): Post-processing for detection outputs
The PVA runs concurrently with the GPU and CPU, offloading classical CV workloads to free GPU resources for ML inference. In production deployments (e.g., NIO's PVA usage on Orin), transferring classical operations to PVA achieved a 10% reduction in overall GPU resource usage.
Deep Learning Accelerator (DLA)
Orin also includes two DLA engines for efficient fixed-function neural network inference. While not classical processing, the DLA frees the GPU for other workloads, indirectly increasing the compute budget available for classical processing on the CPU and PVA.
22. Ambarella CV3-AD685 Classical Processing
General Vector Processor (GVP)
The Ambarella CV3-AD685 includes a dedicated classical processing engine within its CVflow architecture -- the General Vector Processor (GVP). This is specifically designed to offload classical computer vision and radar processing:
| GVP Capability | Purpose |
|---|---|
| Classical computer vision | Traditional CV algorithms that don't require neural networks |
| HD radar processing | Signal processing for 4D radar data |
| Floating-point algorithms | Algorithms too computationally expensive for CPUs |
| Sensor fusion | Classical fusion algorithms combining camera, radar, and LiDAR data |
The GVP provides:
- New floating-point general vector processor: Purpose-built for algorithms with intensive floating-point computation
- Optimization for HD radar: Specific enhancements for processing high-definition radar signals, including FFT computation, CFAR detection, and beamforming
- CPU offloading: Floating-point-intensive algorithms that would otherwise overload the Arm Cortex CPUs are offloaded to the GVP
Dense Stereo and Optical Flow Engine
The CV3-AD685 includes a dedicated hardware engine for two foundational classical computer vision operations:
Dense stereo: Computes per-pixel depth from stereo camera pairs. This is a classical correspondence matching problem where each pixel in the left image is matched to its corresponding pixel in the right image, and the disparity (pixel difference) is converted to depth using the known stereo baseline:
depth = (focal_length * baseline) / disparity
Dense optical flow: Computes per-pixel motion vectors between consecutive frames. This classical operation estimates how each pixel has moved from one frame to the next, providing:
- Ego-motion estimation (if the scene is stationary)
- Moving object detection (pixels whose flow is inconsistent with ego-motion)
- Velocity estimation for tracked objects (visual velocity cues)
These hardware engines compute stereo and optical flow at frame rate with minimal CPU/GPU involvement.
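The disparity-to-depth conversion above is a one-liner in practice. A sketch with illustrative intrinsics (the focal length and baseline below are assumptions, not the CV3-AD685's values):

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map to metric depth.

    Zero-disparity pixels (no match / infinitely far) map to inf.
    """
    disparity = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return focal_length_px * baseline_m / disparity

# Illustrative numbers: 2000 px focal length, 0.3 m stereo baseline.
print(disparity_to_depth([60.0, 12.0, 3.0], 2000.0, 0.3))  # [10., 50., 200.] m
```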
ISP Pipeline (On-Chip)
The CV3-AD685's on-chip ISP is an extensive classical processing pipeline:
- Handles all 12 camera streams simultaneously
- Premium image quality processing described as suitable for "rain, fog, and darkness"
- HDR processing for the extreme dynamic range of highway driving (tunnels, sun glare, night)
- Low-light enhancement for nighttime driving
- Real-time exposure control and auto-white-balance
Workload Split Between Ambarella and NVIDIA
The two compute platforms divide the classical workload:
| Workload | Primary Processor | Rationale |
|---|---|---|
| Camera ISP | Ambarella CV3-AD685 | On-chip ISP optimized for camera processing |
| Camera classical CV | Ambarella GVP | Purpose-built classical CV accelerator |
| Dense stereo / optical flow | Ambarella hardware engine | Dedicated hardware for these operations |
| Radar signal processing | Ambarella GVP | Optimized for HD radar algorithms |
| Camera ML inference | Ambarella CVflow NVP | 20x faster than CV2 for neural networks |
| LiDAR point cloud processing | NVIDIA Orin GPU | GPU-parallel point cloud algorithms |
| Tracker / fusion | NVIDIA Orin CPU | Complex sequential logic |
| Planning / prediction | NVIDIA Orin CPU + GPU | Mixed classical and ML workload |
| Classical image preprocessing | NVIDIA Orin PVA | Undistortion, cropping, color conversion |
| Safety monitoring | NXP ACE (independent) | ASIL-D certified classical logic |
23. NXP ACE Safety Computer
Hardware Architecture
The Kodiak Actuation Control Engine (ACE) is a custom-designed safety computer that operates independently from the main autonomy system. It integrates NXP automotive processors certified to the highest ISO 26262 safety integrity level (ASIL-D):
| Component | Role | Safety Function |
|---|---|---|
| NXP S32G3 | Vehicle network processor | Safe actuation of braking, steering, and throttle |
| NXP S32K3 | Safety co-processor | Power distribution, battery charging, safety HMI |
| NXP VR5510 | Multi-channel PMIC | Power generation with functional safety voltage monitoring |
| NXP PF53 | Core supply regulator | Power supply for S32G3 core |
ASIL-D Compliance
ASIL-D (Automotive Safety Integrity Level D) is the highest safety classification in ISO 26262, corresponding to:
- Failure rate: Fewer than 10 failures in 1 billion hours of operation
- Development process: Formal requirements specification, formal design verification, structural code coverage, systematic testing
- Hardware metrics: Single-point fault metric > 99%, latent fault metric > 90%
This level of certification requires that the ACE's processing is deterministic, verifiable, and simple enough to be formally analyzed. This inherently means classical processing -- deterministic rule-based logic rather than ML inference, which cannot be formally verified to ASIL-D standards.
What the ACE Processes
The ACE performs classical, deterministic safety functions:
Actuation control: Translates trajectory commands into physical actuator signals for:
- Triple-redundant pneumatic braking (three independent actuators)
- Dual-redundant steering (two ZF steering actuators)
- Throttle control
Sensor health monitoring: Independent sensor data access allows the ACE to verify that sensors are functioning
Communication watchdog: The ACE monitors the communication link with the main Kodiak Driver compute. If communication is lost (hardware failure, software crash, cable disconnect), the ACE automatically initiates fallback
Actuation integrity: The ACE verifies that commanded actuations (brake pressure, steering angle) match actual actuator responses. Discrepancies indicate actuator failure.
Power management: The S32K3 co-processors monitor power distribution and battery charging, ensuring electrical system integrity
Why the ACE Must Be Classical
The ACE cannot use ML-based processing because:
- ML models are not deterministically verifiable (required for ASIL-D)
- ML models can produce unexpected outputs on out-of-distribution inputs
- Formal verification of neural networks is an unsolved problem at the scale needed for ASIL-D
- The safety case requires provable worst-case behavior, which classical rule-based logic provides
The ACE's processing is entirely rule-based: IF (communication_lost) THEN (execute_fallback). IF (brake_pressure < commanded_pressure) THEN (flag_actuator_failure). This deterministic logic can be formally verified and exhaustively tested.
Dual ACE Redundancy
Gen 4+ Kodiak trucks include two ACE units. If one ACE fails, the second independently maintains vehicle control. This dual-redundancy is classical fault-tolerance architecture: the probability of both ACEs failing simultaneously is the product of their individual failure probabilities, achieving extremely low system-level failure rates.
24. Off-Road Perception
Military Terrain Classification
Kodiak's military perception system (deployed on the Ford F-150 prototype and Textron RIPSAW M3) must handle terrain that has no road markings, no lane structure, and no standardized appearance. The perception challenge shifts from "detect objects on a road" to "determine where is drivable":
Classical Terrain Analysis Methods:
| Method | Sensor | What It Determines |
|---|---|---|
| Geometry-based traversability | LiDAR | Slope angle, step height, roughness, gap width |
| Appearance-based classification | Camera | Terrain type (dirt, gravel, mud, rock, vegetation, water) |
| Proprioceptive feedback | IMU | Actual vibration and traction (validates predicted traversability) |
| Ground surface estimation | LiDAR + IMU | Drivable surface geometry, obstacles above ground |
Slope and Roughness Analysis: The LiDAR point cloud is divided into a 2D grid. For each grid cell, classical statistics are computed:
- Mean height (ground elevation)
- Height variance (roughness)
- Maximum slope to neighboring cells (traversability constraint)
- Step height at cell boundaries (obstacle detection)
A grid cell is classified as "traversable" if slope < max_slope, roughness < max_roughness, and step_height < max_step.
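A minimal sketch of that grid-cell classification follows; the thresholds are illustrative, and np.roll's edge wraparound is ignored for brevity:

```python
import numpy as np

def classify_cells(z_mean, z_var, cell_size=0.5,
                   max_slope=0.35, max_rough=0.05, max_step=0.30):
    """Grid-cell traversability from classical statistics.

    z_mean: 2D array of per-cell mean ground height (m)
    z_var:  2D array of per-cell height variance (roughness proxy)
    A cell is traversable only if slope, roughness, and step-height
    constraints all pass.
    """
    # Slope: magnitude of the height gradient across the grid
    gy, gx = np.gradient(z_mean, cell_size)
    slope = np.hypot(gx, gy)
    # Step height: max absolute difference to any 4-connected neighbor
    step = np.zeros_like(z_mean)
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        step = np.maximum(step, np.abs(z_mean - np.roll(z_mean, shift, axis)))
    return (slope < max_slope) & (z_var < max_rough) & (step < max_step)
```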
Negative Obstacle Detection: Unlike highway driving where all obstacles are above the road surface, off-road terrain includes negative obstacles -- ditches, gullies, drop-offs. These are detected by classical analysis of the LiDAR height grid:
- A cell significantly below its neighbors indicates a depression
- The depth and width of the depression determine if it is traversable
Dust Handling
Kodiak specifically hardened its system against dust during off-road testing:
- The Hesai IPE filters dust returns at the sensor level (>99.9% environmental noise rejection)
- Multi-return processing sees through dust to solid surfaces behind
- The ZF 4D radar operates through dust with minimal attenuation (4mm wavelength >> dust particle size)
- The SensorPod self-cleaning system (compressed air, water) clears lens contamination
- DefensePod hardening provides additional environmental protection
Off-Road Path Planning Without Lane Structure
Without lane markings or road structure, the system navigates using:
- Waypoint following (GPS-based when available)
- Terrain-relative navigation (matching perceived terrain to known maps)
- Lead vehicle following (following the path of a manned vehicle ahead in convoy operations)
- Obstacle avoidance on traversability costmaps
25. Threat Detection
Classical Obstacle Detection for Military
Military threat detection extends beyond standard autonomous driving obstacles to include threats specific to combat and reconnaissance operations:
Geometric Anomaly Detection: Classical point cloud analysis identifies objects that do not match the expected terrain surface:
- Objects protruding above the ground surface (rocks, logs, debris, potential IEDs)
- Disturbed earth (changes in ground surface geometry that may indicate buried objects)
- Wire-like structures (detected as thin, linear point clusters)
Multi-Sensor Anomaly Fusion: The classical approach fuses geometric anomalies from LiDAR with:
- Radar cross-section analysis (metallic objects have characteristically high RCS)
- Camera texture analysis (disturbed earth has different visual appearance from undisturbed terrain)
- Thermal anomalies (from thermal cameras, if integrated in DefensePods)
Classical Change Detection: Comparing the current terrain to a prior reference (from a previous pass or a pre-mission map):
- Grid-based subtraction of current elevation model from reference elevation model
- Changes exceeding a threshold indicate new objects or terrain disturbance
- This is a classical remote sensing technique adapted for ground-vehicle perspective
Limitations
Kodiak has not publicly disclosed specific IED detection algorithms. The military perception system uses the same Kodiak Driver software as the commercial trucking system, adapted for off-road environments. Specific threat detection capabilities are likely classified or proprietary.
26. GPS-Denied Navigation
Classical Navigation Without GPS
Kodiak's perception-over-priors philosophy inherently supports GPS-denied operation because the system is designed to navigate using real-time perception rather than relying on GPS-based localization against HD maps. The classical techniques for GPS-denied navigation include:
Inertial Navigation System (INS): The truck's IMU (accelerometers + gyroscopes) provides dead-reckoning navigation by integrating acceleration and angular velocity:
position(t+dt) = position(t) + velocity(t)*dt + 0.5*acceleration(t)*dt^2
velocity(t+dt) = velocity(t) + acceleration(t)*dt
heading(t+dt) = heading(t) + yaw_rate(t)*dt
INS drift is the fundamental problem: integrating noisy accelerometer data causes position error to grow quadratically with time. A tactical-grade IMU can hold position error to roughly 1% of distance traveled, so after 1 km the error is ~10m.
Visual Odometry (VO): Classical visual odometry estimates ego-motion by tracking visual features between consecutive camera frames:
- Detect features in frame N (Harris corners, FAST, ORB)
- Track features in frame N+1 (KLT tracker, optical flow)
- Compute the essential matrix from feature correspondences
- Decompose the essential matrix into rotation and translation
- Scale from stereo or from the known ground plane
This provides a drift-reduced ego-motion estimate that can correct INS drift.
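These five steps map onto standard OpenCV primitives. Below is a hedged sketch of one VO step; the parameters are illustrative and this is not Kodiak's implementation:

```python
import cv2
import numpy as np

def visual_odometry_step(prev_gray, curr_gray, K):
    """One classical VO step: track corners, estimate the essential
    matrix, recover relative rotation and (unit-scale) translation.

    K is the 3x3 camera intrinsic matrix from calibration. Translation
    scale must come from stereo or a known ground plane, as noted above.
    """
    pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                   qualityLevel=0.01, minDistance=10)
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts0, None)
    ok = status.ravel() == 1
    p0, p1 = pts0[ok], pts1[ok]
    E, inliers = cv2.findEssentialMat(p0, p1, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=inliers)
    return R, t   # ego-rotation and direction of travel between frames
```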
LiDAR Odometry (SLAM): Classical scan matching algorithms (ICP -- Iterative Closest Point, or NDT -- Normal Distributions Transform) align consecutive LiDAR scans to estimate ego-motion:
- Current scan is aligned to previous scan (or a local map) using iterative optimization
- The transformation that best aligns the two scans provides the ego-motion estimate
- Loop closure detection (recognizing previously visited locations) can correct accumulated drift
Terrain-Relative Navigation: For military operations, the system can localize against known terrain elevation models:
- The perceived terrain (from LiDAR) is matched against a stored digital elevation model (DEM)
- Classical template matching or correlation techniques find the best alignment
- This provides absolute position without GPS, given an accurate DEM
Sensor Fusion for GPS-Denied Localization
Classical sensor fusion (typically an Extended Kalman Filter) combines:
- INS (high rate, drifts over time)
- Visual odometry (moderate rate, drifts slowly)
- LiDAR odometry (moderate rate, very accurate locally)
- Wheel odometry (high rate, affected by slip)
- Terrain matching (intermittent, provides absolute corrections)
The EKF optimally weights each source based on its noise characteristics, producing a fused position estimate that is more accurate than any individual source.
27. Degraded Sensor Operation
Classical Fallback Hierarchy
When sensors are damaged, degraded, or obscured, the perception system must gracefully degrade rather than fail. This is handled by classical fault-tolerance logic:
Sensor Health Classification (determined at 10 Hz by the safety monitoring system):
- Nominal: Full capability
- Degraded: Reduced performance (e.g., rain on camera lens, dust on LiDAR)
- Failed: No usable output
Degradation Response:
| Scenario | Remaining Sensors | Classical Response |
|---|---|---|
| One camera failed | 11 cameras + 4 LiDAR + 6 radar | Reduce reliance on failed camera's FOV; other sensors compensate |
| One LiDAR failed | 12 cameras + 3 LiDAR + 6 radar | Other LiDAR + radar + cameras maintain coverage; possible blind spot |
| One radar failed | 12 cameras + 4 LiDAR + 5 radar | Other radars cover similar FOV; velocity estimation slightly degraded |
| One SensorPod failed | Half sensors from other pod | Significant degradation; reduced FOV; initiate controlled pullover |
| Heavy rain/fog | Cameras degraded, LiDAR degraded, radar functional | Radar becomes primary sensor; increase following distance; reduce speed |
| Camera-only night | Cameras degraded (darkness) | LiDAR and radar (active sensors) become primary |
Classical Uncertainty Propagation
The Kodiak Vision Tracker's information-theoretic framework handles degraded sensors naturally:
- A degraded sensor's detections carry higher uncertainty
- The Tracker automatically down-weights high-uncertainty detections
- The overall fused estimate remains valid but with increased uncertainty
- Increased uncertainty propagates to the planner, which responds with more conservative behavior
This is classical Bayesian inference in action -- the mathematics naturally handles varying sensor reliability without requiring explicit sensor-failure rules.
Military-Specific Degradation
In military operations, additional degradation sources include:
- Battle damage to DefensePods
- Electromagnetic interference (EMI) affecting radar
- Laser dazzle or jamming affecting LiDAR
- Smoke and obscurants degrading all optical sensors
The classical response is the same: the Tracker down-weights unreliable sensors, and the system operates with whatever sensors remain functional. The radar's robustness to visual obscurants (smoke, dust) makes it particularly valuable as a last-resort sensor.
28. 1,000+ Safety Checks at 10 Hz
What Is Monitored
Ten times per second, the Kodiak Driver evaluates the performance of more than 1,000 safety-critical processes and components. These checks are overwhelmingly rule-based (classical logic), not ML-based, because they must be deterministic and verifiable.
The 1,000+ checks span two domains:
Truck Platform Checks (classical threshold monitoring):
| Component | Classical Check | Failure Response |
|---|---|---|
| Engine | RPM within expected range, temperature nominal | Reduce speed or initiate pullover |
| Oil levels | Pressure above minimum threshold | Initiate pullover |
| Tire pressure | All tires within safe range (TPMS) | Reduce speed or pullover |
| Braking system | Air pressure adequate, actuator response | Fallback to redundant braking |
| Steering | Actuator response matches commanded angle | Fallback to redundant steering |
| Electrical system | Voltage, current within bounds | Switch to backup power |
| Cooling | Coolant temperature within range | Reduce load or pullover |
Autonomy System Checks (classical process monitoring):
| Component | Classical Check | Failure Response |
|---|---|---|
| Sensor health | Each of 22 sensors producing data at expected rate | Down-weight or bypass failed sensor |
| Sensor data quality | Signal-to-noise ratio, point density, image exposure | Flag degraded sensor for Tracker |
| Software process health | Each process running, not hung, producing outputs | Restart process or initiate fallback |
| Compute health | CPU/GPU temperature, utilization, memory | Reduce load or initiate fallback |
| Communication links | Data flowing between compute nodes | Initiate fallback if link lost |
| Detector outputs | Each detector producing detections at expected rate | Flag missing detector for Tracker |
| Tracker consistency | World model internally consistent, tracks well-formed | Flag inconsistency for investigation |
| Cross-sensor agreement | LiDAR, radar, camera detections consistent | Flag disagreement, increase uncertainty |
| Localization health | GPS quality, map correlation confidence | Increase uncertainty in position |
| Planning sanity | Planned trajectory feasible, within vehicle dynamics limits | Reject infeasible plan, replan |
Rule-Based vs. ML-Based Checks
The vast majority of these checks are classical rule-based logic:
IF sensor_frame_rate < minimum_threshold THEN mark_sensor_degraded
IF compute_temperature > thermal_limit THEN reduce_compute_load
IF communication_latency > max_latency THEN initiate_fallback
IF brake_pressure < minimum_safe THEN activate_redundant_brake
This rule-based approach is chosen because:
- Rules are deterministically verifiable (required for ASIL-D safety case)
- Rules have guaranteed worst-case execution time (real-time constraint)
- Rules are transparent and auditable (regulatory requirement)
- Rules do not produce unexpected outputs (unlike ML models)
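A minimal sketch of what such a rule table and monitor loop might look like; the names, thresholds, and responses below are illustrative, not Kodiak's:

```python
import time

# Illustrative rule table: (name, predicate over a status snapshot, response).
CHECKS = [
    ("sensor_frame_rate", lambda s: s["lidar_hz"] >= 9.5,  "mark_sensor_degraded"),
    ("compute_thermal",   lambda s: s["gpu_temp_c"] <= 95, "reduce_compute_load"),
    ("comm_latency",      lambda s: s["link_ms"] <= 50,    "initiate_fallback"),
    ("brake_pressure",    lambda s: s["brake_psi"] >= 90,  "activate_redundant_brake"),
]

def run_checks(status):
    """Evaluate every rule against the snapshot; return responses to fire.
    Deterministic, with bounded execution time -- no learned components."""
    return [response for _, rule, response in CHECKS if not rule(status)]

for _ in range(3):                        # 10 Hz cadence; 3 cycles shown
    status = {"lidar_hz": 10.0, "gpu_temp_c": 71, "link_ms": 8, "brake_psi": 120}
    for action in run_checks(status):
        print("TRIGGER:", action)
    time.sleep(0.1)                       # 100 ms period
```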
10 Hz Execution Rate
The 10 Hz rate (100ms period) is chosen to balance:
- Responsiveness: Detecting a failure within 100ms allows corrective action before the situation becomes dangerous
- Computational overhead: Running 1,000+ checks at higher frequency would consume excessive compute
- Physical dynamics: Most mechanical failures (tire blowout, brake loss, engine failure) develop over timescales > 100ms
At 65 mph, the truck moves ~2.9m per 100ms cycle. A brake failure detected within 100ms allows intervention before the truck has traveled an additional 2.9m -- well within the control system's response margin.
29. ACE Fallback System
Dual-Path Planning Architecture
The Kodiak Driver always plans two paths simultaneously:
- Nominal trajectory: The planned route to the destination (normal driving)
- Fallback trajectory: A safe pullover path ready for immediate execution
Both trajectories are computed at every planning cycle (multiple times per second). The fallback trajectory is always available and always valid -- if a fault is detected, the system can immediately switch from executing the nominal trajectory to executing the fallback trajectory.
Fallback Execution
When the ACE determines that a fallback is necessary (based on the 1,000+ safety checks), the execution follows a classical control sequence:
- Fault detection: A safety check fails (sensor, compute, communication, or vehicle component)
- Trajectory switch: The controller transitions from executing the nominal trajectory to executing the fallback trajectory
- Controlled deceleration: The truck decelerates smoothly (not emergency braking unless imminent collision)
- Lane change to shoulder: If safe, the truck changes lanes to the right and onto the shoulder
- Stop: The truck comes to a controlled stop on the shoulder
- Hazard notification: Hazard lights are activated; the remote operations center is notified
ACE Independence
The ACE executes the fallback without any input from the main Kodiak Driver computer. This is critical because the most dangerous failure mode is a complete crash of the main autonomy software. In that scenario:
- The main compute produces no output
- The ACE detects the communication loss
- The ACE independently executes the fallback using its own sensor access and the pre-planned fallback trajectory
- The ACE controls braking, steering, and throttle through direct actuator connections
This is analogous to a biological brainstem reflex -- fast, automatic, and independent of higher-level processing.
Classical Control Theory
The ACE's trajectory following is a classical control problem:
- Lateral control: PID or pure-pursuit controller maintains the vehicle on the fallback trajectory
- Longitudinal control: Speed controller manages deceleration profile
- Actuator control: Direct control of brake pressure, steering angle, and throttle position
These controllers are deterministic, formally verified, and certified to ASIL-D standards.
Testing and Validation
Kodiak validates fallback performance through:
- Fault injection testing: Deliberately injecting faults (sensor failures, software crashes, communication drops) during controlled testing
- Scenario variation: Testing fallback in various conditions (near objects, on curves, in traffic, on straight highways)
- Simulation: Millions of simulated fallback scenarios via Applied Intuition simulation platform and Kodiak's Breakpoint AI tool
30. Geometric Collision Checking
Time-to-Collision (TTC) Computation
TTC is the most fundamental classical collision metric, computing the time until contact between the ego-vehicle and another object under the assumption of constant velocities:
TTC = relative_distance / relative_velocity
For a Kodiak Class 8 truck at highway speed:
| Scenario | Relative Distance | Relative Velocity | TTC | Required Action |
|---|---|---|---|---|
| Approaching stopped vehicle | 400m | 29 m/s (65 mph) | 13.8s | Begin planning evasive maneuver |
| Approaching stopped vehicle | 200m | 29 m/s | 6.9s | Execute lane change or begin braking |
| Approaching stopped vehicle | 100m | 29 m/s | 3.4s | Emergency braking |
| Cut-in at 30m | 30m | 10 m/s | 3.0s | Emergency braking |
| Lead vehicle hard braking | 100m | 15 m/s (closing) | 6.7s | Match braking |
The radar provides the most accurate TTC computation because it measures relative velocity directly via Doppler, while LiDAR and camera must differentiate range over time to estimate velocity.
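The table rows above reduce to a guarded division; a quick check in code:

```python
def time_to_collision(rel_distance_m, closing_speed_mps):
    """Constant-velocity TTC; inf when the gap is opening."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return rel_distance_m / closing_speed_mps

# Reproduce the table rows above (65 mph ~ 29 m/s):
for d, v in [(400, 29), (200, 29), (100, 29), (30, 10), (100, 15)]:
    print(f"{d:>4} m at {v} m/s -> TTC {time_to_collision(d, v):.1f} s")
```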
Truck-Specific Geometric Considerations
Collision checking for a Class 8 truck is more complex than for a passenger car because of the truck's dimensions and articulation:
| Dimension | Value | Impact on Collision Checking |
|---|---|---|
| Cab length | ~6m | Front collision zone |
| Trailer length | ~16m (53-foot) | Total vehicle length ~22m |
| Width | ~2.6m | Tight lane clearance |
| Height | ~4.1m | Overhead clearance (bridges, signs) |
| Turning radius | Large | Wide swept path in curves and lane changes |
| Articulation | Cab-trailer joint | Trailer tracks differently than cab in turns |
Swept Volume Analysis
When the truck changes lanes or follows curved roads, it sweeps through a volume of space that is larger than the static vehicle footprint. Classical swept volume computation:
Kinematic model: The truck cab follows the planned trajectory. The trailer follows the cab's path with an offset due to the articulation joint (tractor-trailer kinematic model).
Volume computation: At each time step along the trajectory, the vehicle's footprint (2D or 3D bounding box) is placed at the predicted position and orientation. The union of all footprints defines the swept volume.
Collision check: The swept volume is checked against the predicted positions of all tracked objects. If any object's predicted volume intersects the truck's swept volume, a collision risk is flagged.
Clearance margins: Safety margins are added to account for:
- Position uncertainty of tracked objects
- Ego-vehicle trajectory tracking error
- Wind effects on the trailer
- Object motion during the maneuver
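A minimal 2D sketch of the footprint-union approach, using the shapely geometry library. The vehicle dimensions come from the table above; the trajectory, margin, and parked-object polygon are illustrative, and trailer articulation is ignored for brevity:

```python
import numpy as np
from shapely.geometry import Polygon
from shapely.ops import unary_union

def footprint(x, y, heading, length=22.0, width=2.6, margin=0.5):
    """Oriented, margin-padded rectangle for the vehicle at one pose."""
    hl, hw = length / 2 + margin, width / 2 + margin
    corners = np.array([[hl, hw], [hl, -hw], [-hl, -hw], [-hl, hw]])
    c, s = np.cos(heading), np.sin(heading)
    R = np.array([[c, -s], [s, c]])
    return Polygon(corners @ R.T + [x, y])

def swept_volume(trajectory):
    """Union of footprints along the sampled trajectory."""
    return unary_union([footprint(x, y, h) for x, y, h in trajectory])

# Lane change at ~29 m/s, 3.5 m of lateral motion over 4 s (0.5 s samples):
traj = [(29.0 * t, 3.5 * t / 4.0, 0.03) for t in np.arange(0.0, 4.5, 0.5)]
parked = Polygon([(60, 2.5), (65, 2.5), (65, 5.0), (60, 5.0)])  # object ahead
print(swept_volume(traj).intersects(parked))  # True -> collision risk flagged
```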
Braking Distance Computation
Classical physics determines the truck's stopping distance:
stopping_distance = (v^2) / (2 * mu * g) + v * reaction_time
where:
- v = speed (m/s)
- mu = tire-road friction coefficient (0.7-0.9 dry, 0.3-0.5 wet, 0.1-0.3 ice)
- g = gravitational acceleration (9.81 m/s^2)
- reaction_time = system response time (~0.1-0.3s for autonomous)
For a loaded 80,000-pound truck at 65 mph on dry pavement:
- Braking deceleration: ~5 m/s^2 (loaded truck)
- Stopping distance: ~85m (280 feet) from braking initiation
- Plus reaction distance: ~9m (30 feet)
- Total: ~94m (310 feet)
At 65 mph on wet pavement:
- Braking deceleration: ~3 m/s^2
- Stopping distance: ~140m (460 feet)
- Plus reaction distance: ~9m
- Total: ~149m (490 feet)
These classical calculations directly inform the following distance the planner maintains and the trigger distance for emergency braking.
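A quick check of the figures above, using the quoted decelerations directly rather than mu*g:

```python
def stopping_distance(v_mps, decel_mps2, reaction_s=0.3):
    """Reaction distance (constant speed) plus braking distance (v^2 / 2a)."""
    return v_mps * reaction_s + v_mps**2 / (2.0 * decel_mps2)

v = 29.0                              # 65 mph
print(stopping_distance(v, 5.0))      # dry, loaded:  ~93 m
print(stopping_distance(v, 3.0))      # wet:         ~149 m
```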
Overhead Clearance Checking
The 4D radar's elevation measurement enables classical overhead clearance checking:
- The radar detects an object ahead (bridge, sign gantry, overpass)
- The elevation measurement places the object above the road surface
- Classical geometry: if the object's elevation > truck height + safety margin, it is cleared
- If the clearance is insufficient, the system alerts or stops
This resolves a class of false-positive problems that plague 3D radar: overhead structures that appear as road-level obstacles because 3D radar cannot measure elevation.
Summary: Classical vs. ML Workload Distribution
The Kodiak perception stack is a deeply hybrid system where classical methods and ML methods are tightly integrated. The following table summarizes the division:
| Processing Layer | Classical / ML / Hybrid |
|---|---|
| Sensor signal processing (LiDAR, radar, camera ISP) | Entirely classical |
| Calibration (factory and online) | Entirely classical |
| Ground plane estimation | Primarily classical |
| Lane detection | Hybrid (classical preprocessing + ML segmentation + classical fitting) |
| Object detection | Primarily ML |
| Object tracking (Kodiak Vision Tracker) | Primarily classical (information-theoretic Bayesian estimation) |
| Data association | Entirely classical |
| Track prediction | Primarily classical (Kalman prediction) |
| Multi-sensor fusion | Primarily classical (Bayesian fusion) |
| Road model estimation | Primarily classical |
| Construction zone geometry | Hybrid (ML detection + classical clustering/fitting) |
| Safety monitoring (1,000+ checks) | Entirely classical (rule-based) |
| ACE fallback | Entirely classical (ASIL-D certified) |
| Collision checking | Entirely classical (geometry + kinematics) |
| VLM scene understanding | Entirely ML |
| Sparse map generation | Hybrid (ML lane detection + classical aggregation) |
| GPS-denied navigation | Primarily classical (INS, VO, SLAM) |
| Off-road traversability | Hybrid (classical geometry + ML classification) |
The safety-critical components (ACE, safety monitoring, collision checking, calibration) are entirely classical because they must be deterministically verifiable. The perception components (detection, tracking, fusion) use classical mathematical frameworks (information theory, Bayesian estimation) as the backbone, with ML models providing inputs that the classical framework integrates and validates. The most ML-heavy components (VLMs, deep learning detectors) augment rather than replace the classical foundation.
This architecture -- classical foundations augmented by ML -- is deliberate. As Kodiak has stated, the system is built on "fundamental mathematics instead of clever software tricks," allowing it to incorporate advances in deep learning "without rewriting our codebase."
Sources
Kodiak AI Official
- Kodiak AI -- Technology Overview
- Kodiak AI -- Kodiak Vision: Turning Data into Insight
- Kodiak AI -- Sparse Maps: Doing More with Less
- Kodiak AI -- 4D Radar
- Kodiak AI -- SensorPods
- Kodiak AI -- Fallback System
- Kodiak AI -- NXP Integration
- Kodiak AI -- Navigating Construction
- Kodiak AI -- Military Ground Superiority
- Kodiak AI -- Safety, First and Always
- Kodiak AI -- Gen 5 Truck
- Kodiak AI -- Smarter, Faster, Just as Safe
- Kodiak AI -- Bosch Partnership
- Kodiak AI -- VLMs Take the Wheel
- Kodiak AI -- Q4 2025 Earnings Call
Kodiak AI Medium Blog
- Kodiak Vision: Turning Data into Insight
- Kodiak Sparse Maps: Doing More with Less
- Taking Self-Driving Trucks to New Dimensions with 4D Radar
Sensor Vendor Sources
- Luminar Technologies -- Technology
- Luminar Technologies -- InGaAs Receiver Acquisition
- Luminar Iris -- AutonomouStuff
- Hesai OT128 Specifications
- ZF Imaging Radar
- ZF 4D Full-Range Radar Press Release
Compute Platform Sources
- Ambarella CV3-AD685 Announcement
- Ambarella -- Kodiak Selection
- NVIDIA DRIVE Orin -- PVA SDK
- NVIDIA -- PVA Engine Optimization Blog
- NXP -- Kodiak Integration
- NXP -- ASIL-D Safety
Academic and Technical References
- 4D Millimeter-Wave Radar in Autonomous Driving: A Survey (arXiv)
- CFAR Detection in Automotive Radar
- Road Geometry Estimation Using Clothoid Models
- Ground Segmentation for Automotive LiDAR: A Survey
- LiDAR-Camera Extrinsic Calibration Review
- Online Calibration for 4D Imaging Radar
- Radar Mounting Angle Estimation
- Camera ISP for Autonomous Driving
- HDR Tone Mapping in Automotive Systems
- Off-Road Terrain Classification Survey
- GPS-Denied Navigation Survey
- Speed-Guided Learnable Kalman Filter for Tracking
- Cut-In Behavior Prediction
- Multi-Sensor Fusion Review
- Camera-LiDAR Spatiotemporal Calibration
Industry Coverage
- FreightWaves -- Kodiak Lightweight Mapping
- FreightWaves -- Kodiak NXP Safety
- TechCrunch -- Kodiak Military Prototype
- IoT World Today -- RIPSAW M3
- Inside GNSS -- Kodiak Lightweight Mapping for PNT
- Electronic Design -- Kodiak NXP Integration