
Kodiak AI -- Perception Stack Deep Dive

Exhaustive Technical Analysis -- March 2026


Table of Contents

  1. "Perception Over Priors" Philosophy
  2. Sensor Evolution: Gen 4 - Gen 5 - Gen 6
  3. Luminar Iris LiDAR Integration
  4. Hesai 360-Degree LiDAR
  5. ZF 4D Imaging Radar
  6. Camera Perception
  7. Multi-Sensor Fusion: Kodiak Vision
  8. 3D Object Detection
  9. Object Tracking: The Kodiak Vision Tracker
  10. Highway-Specific Perception
  11. Construction Zone Perception
  12. Adverse Weather Perception
  13. Night Driving Perception
  14. SensorPod Architecture
  15. Dual Compute for Perception
  16. Lightweight Mapping Perception
  17. Military/Defense Perception
  18. VLM (Vision Language Model) Integration
  19. Auto-Labeling and Training Data Pipeline
  20. ARM Safety Metric and Perception's Role
  21. Key Patents
  22. Perception Team

1. "Perception Over Priors" Philosophy

Core Principle

Kodiak's perception system is built on a foundational philosophy the company calls "perception over priors": the Kodiak Driver trusts its eyes (live sensor data) before trusting its memory (pre-built maps or stored expectations). This principle permeates every layer of the perception stack, from sensor fusion to planning-level decision-making, and represents a deliberate departure from the HD-map-dependent approaches used by many autonomous vehicle competitors.

The inspiration is drawn directly from human driving. When a person drives, even if they use a map for navigation, they rely primarily on what they can see through the windshield. If the map says there are two lanes but the driver sees three, the driver trusts their eyes. Kodiak's autonomous system operates the same way.

Why Kodiak Minimizes HD Map Dependency

Traditional HD map approaches suffer from several critical weaknesses that are especially damaging in the long-haul trucking domain:

Problem | Impact on HD Maps
Staleness | HD maps go stale almost immediately after creation. Road construction, lane reconfigurations, and seasonal changes can invalidate map data within days
Scale | Maintaining centimeter-accurate HD maps across hundreds of thousands of highway miles requires dedicated mapping fleets and enormous operational cost
Build time | Creating HD maps requires specialized mapping vehicles and many hours of driving; updating them requires repeating the process
Brittleness | Systems that rely heavily on HD maps can fail catastrophically when encountering unmapped changes, because the map is treated as ground truth
Team overhead | Large dedicated mapping teams are required for ongoing maintenance

Kodiak's sparse maps, by contrast, contain only essential highway information measured in kilobytes per mile rather than megabytes:

  • Geometric data: Lane boundary locations
  • Topological data: Road connectivity
  • Semantic data: Speed limits, road attributes

This forces the perception system to shoulder the burden of real-time environment understanding -- and Kodiak considers this a strength, not a compromise.

How Real-Time Perception Replaces Map Priors

When the sparse map and real-time perception disagree, the Kodiak Driver does what a human would do: it trusts what it sees. The system actively detects mismatches between stored map data and reality through continuous comparison. When a discrepancy is identified, the perception system can construct a temporary "on-the-fly" map that is "good enough for the Kodiak Driver to drive safely until the map matches again."

This capability is critical in scenarios such as:

  • Construction zones where lanes have shifted, closed, or been temporarily rerouted
  • New road markings that differ from mapped data
  • Unmapped roads such as the private lease roads in the Permian Basin oilfield
  • Seasonal changes where road features (vegetation, snow cover) alter the environment

The system localizes the truck using lane markings and other perceivable cues rather than relying on absolute map coordinates. If perception conflicts with the pre-built map, the system is designed to trust what it perceives and can share updated map information with the rest of the fleet over the air.

Technical Implications

The "perception over priors" philosophy has several deep technical consequences:

  1. Perception must be self-sufficient: The system cannot lean on map priors to simplify detection; it must detect lane boundaries, road edges, and obstacles entirely from sensor data when maps are absent or stale
  2. Uncertainty must be explicit: Because the system cannot assume the map is correct, every perception output must carry an honest uncertainty estimate that the planner can act on
  3. Rapid route expansion: New routes can be driven without extensive pre-mapping campaigns -- perception handles the unknown
  4. The map is one input among many: The map is never treated as ground truth; it is treated as a prior belief to be confirmed or overridden by live perception

As Kodiak has stated: "The Kodiak Driver can drive safely without maps, but can utilize maps in order to improve performance."


2. Sensor Evolution: Gen 4 - Gen 5 - Gen 6

Generation 4 (2021) -- 14 Sensors

The Gen 4 truck, unveiled in September 2021, established Kodiak's core sensor philosophy with 14 sensors distributed across three mounting locations:

Component | Details
Mounting locations | Center pod (front roofline) + two mirror-mounted SensorPods
LiDAR (long-range) | 1x Luminar Iris (center pod, forward-facing)
LiDAR (360-degree) | 2x Hesai spinning LiDAR (mirror pods)
Radar | 4x ZF Full Range 4D Radar
Cameras | 7 cameras (wide + narrow FOV)
Total | 14 sensors

The center pod was a slim-profile enclosure mounted on the front roofline that housed the forward-facing Luminar Iris LiDAR and a wide FOV camera. The mirror pods each contained one Hesai LiDAR, two ZF radars, and cameras.

Key innovation at Gen 4: First integration of ZF 4D Full Range Radar and Luminar Iris LiDAR together, connected to the NVIDIA DRIVE platform.

Generation 5 (April 2023) -- 18 Sensors

Gen 5 represented a major architectural redesign focused on eliminating the center pod and increasing redundancy:

Change | From (Gen 4) | To (Gen 5)
Total sensors | 14 | 18 (+4)
Center pod | Present | Removed
Luminar Iris placement | 1x in center pod | 2x in mirror SensorPods (one per side)
LiDAR total | 3 | 4 (+1 Hesai or redistributed)
Camera count | 7 | 10 (+3)
Radar count | 4 | 4 (unchanged)
Blind spot cameras | None | 2x wide-angle on hood-mounted mirrors

Key changes and why:

  1. Center pod elimination: The rooftop center pod added build time, maintenance complexity, and aerodynamic drag. Removing it cut upfit time significantly and simplified field service. All autonomy-critical sensors moved into the mirror-mounted SensorPods.

  2. Luminar Iris doubled: By placing one Luminar Iris in each mirror SensorPod, Gen 5 doubled the long-range LiDAR coverage. Each pod now had its own forward-facing long-range LiDAR, providing redundancy -- if one pod failed, the other still had long-range forward perception.

  3. Three new cameras added: Two wide-angle cameras were added to hood-mounted mirrors to cover blind spots around the cab. A third camera was added for additional coverage, bringing the total to 10.

  4. Ambarella CV2 SoC introduced: The Ambarella CV2 perception system-on-chip was added to handle all camera data processing, improving long-range image quality and unlocking improved dynamic range for nighttime driving.

  5. Processing power gains: 130% more GPU processing power, 60% more CPU capacity, 50% reduction in power system size.

Generation 6 (CES 2024) -- 22 Sensors

Gen 6 was the production-ready, driverless-ready configuration -- the first truck designed from the ground up for scaled deployment without a human driver:

Sensor Type | Count | Change from Gen 5
Cameras | 12 | +2
LiDAR | 4 | Upgraded automotive-grade units
Radar | 6 | +2 side-mounted units
Total | 22 | +4

Key changes and why:

  1. Two additional radars (side-mounted): Added for long-range side detection, critical for highway merge and lane-change scenarios where vehicles approach from lateral angles.

  2. Two additional cameras: Enhanced coverage for blind spots and supplementary forward/rear perception, bringing total to 12 across multiple FOVs.

  3. Upgraded LiDAR units: Automotive-grade LiDAR sensors in the SensorPods with improved reliability for production deployment.

  4. Full redundancy achieved: Every safety-critical function (braking, steering, power, compute, perception) has full redundancy. Triple-redundant braking with three pneumatic actuators. Dual-redundant steering with two ZF actuators. Fully isolated power subsystems.

  5. Compute improvements over Gen 1: 2x GPU processor cores, 1.6x processing speed, 3x memory, 2.75x bandwidth.

  6. Ambarella CV3-AD685 integration planned: Next-generation AI domain controller SoC for continuous perception and ML improvement.

  7. Redundant LTE links: Dual cellular connections to command centers in Texas and California.

Sensor Count Summary

Generation | Year | Cameras | LiDAR | Radar | Total | Center Pod
Gen 4 | 2021 | 7 | 3 | 4 | 14 | Yes
Gen 5 | 2023 | 10 | 4 | 4 | 18 | No
Gen 6 | 2024 | 12 | 4 | 6 | 22 | No

3. Luminar Iris LiDAR Integration

Sensor Specifications

The Luminar Iris is a 1550nm wavelength LiDAR specifically designed for automotive-grade autonomous driving applications.

Specification | Value
Wavelength | 1550 nm (eye-safe; 17x more photons allowed than sub-1000nm wavelengths)
Maximum range | 600 meters
Resolution | >300 points per square degree (camera-like)
Horizontal FOV | 120 degrees
Vertical FOV | Up to 26-28 degrees (dynamic)
Scanning method | 2-axis scanning mirrors (scans only the laser, not the entire device)
Operating temperature | -40°C to +85°C
Form factor | Automotive-grade, compact
Reflectance data | High-resolution reflectance provides an additional imaging dimension

Why 1550nm Matters for Trucking

The 1550nm wavelength is critical for Kodiak's highway trucking application because:

  • Eye safety at high power: International eye safety standards allow 1550nm lasers to emit 17x more photons than 905nm lasers. More photons means more returned signal at long range, which directly enables the 600m detection capability.
  • Long-range detection: At highway speeds of 65-80 mph, a fully loaded Class 8 truck needs 500+ feet to stop. The 600m range provides roughly 17-20 seconds of detection time at those speeds, giving the system adequate time to plan and execute avoidance maneuvers.
  • Reflectance imaging: The Iris measures the amount of energy reflected from each target, producing a camera-like reflectance image that provides an additional dimension beyond 3D point positions. This reflectance data improves detection and classification algorithms by providing texture-like information.

Integration in the Kodiak Driver

Gen 4 placement: Single Luminar Iris in the center pod, providing one forward-facing long-range LiDAR channel.

Gen 5+ placement: Two Luminar Iris units, one in each mirror-mounted SensorPod. This change doubled long-range LiDAR coverage and provided critical redundancy -- if one SensorPod is damaged or obstructed, the other still has full long-range forward perception.

Point Cloud Processing

The Iris generates dense, high-resolution point clouds at ranges up to 600m. The processing pipeline involves:

  1. Raw point cloud ingestion: The LiDAR produces point cloud frames with (x, y, z) position, intensity/reflectance, and timing data
  2. Multiple detector processing: Each LiDAR frame is processed through multiple independent software detectors, each using different methodologies to identify objects
  3. Detection output: Detectors output object hypotheses with position, size, classification, and confidence estimates
  4. Fusion input: Detector outputs are passed to the Kodiak Vision Tracker in a standardized information-theoretic format for fusion with other sensor modalities

The Iris's camera-like resolution of 300+ points per square degree means that at long range (200-600m), objects such as pedestrians, motorcycles, and stopped vehicles are represented by enough points for reliable detection and classification -- essential for highway safety where early detection of distant hazards is paramount.
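
To make that concrete, the following back-of-the-envelope estimate (not Kodiak's code) shows roughly how many points a sensor delivering 300 points per square degree would return from a sedan-sized target at various ranges. The sedan dimensions and the flat-target, small-angle approximation are assumptions for illustration only.

```python
import math

def points_on_target(width_m: float, height_m: float, range_m: float,
                     points_per_sq_deg: float = 300.0) -> float:
    """Estimate how many LiDAR points land on a flat target facing the sensor.

    For a small angular extent, the target's solid angle in square degrees is
    approximately (width / range) * (height / range), with each axis converted
    from radians to degrees.
    """
    az_deg = math.degrees(width_m / range_m)   # horizontal angular extent
    el_deg = math.degrees(height_m / range_m)  # vertical angular extent
    return points_per_sq_deg * az_deg * el_deg

# Rear profile of a typical sedan (assumed ~1.8 m wide x 1.5 m tall).
for r in (600, 400, 200, 100):
    print(f"{r:4d} m: ~{points_on_target(1.8, 1.5, r):6.1f} points per frame")
```

The estimate falls to a handful of points at 600 m and grows to hundreds by 100 m, which is one reason long-range detections are treated as hypotheses to be refined over successive frames rather than final classifications.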


4. Hesai 360-Degree LiDAR

Role in the Sensor Suite

While the Luminar Iris sensors provide high-resolution forward detection out to 600m, the two Hesai 360-degree spinning LiDARs provide comprehensive surround coverage -- detecting objects on the sides, rear, and near-field around the truck.

Complementary Coverage Architecture

The Luminar-Hesai pairing creates a layered LiDAR coverage architecture:

Coverage Zone | Primary Sensor | Range | Purpose
Forward long-range (0-600m) | Luminar Iris (x2) | Up to 600m | Early detection of distant hazards, stopped vehicles, pedestrians
Surround / 360-degree | Hesai (x2) | Full 360-degree | Side, rear, and near-field coverage; lane-change safety; merge detection

The Hesai LiDARs are mounted in the mirror SensorPods (one per pod), providing spinning 360-degree coverage from each side of the truck cab. Together, the two units create overlapping 360-degree coverage around the entire vehicle.

Processing Pipeline

The Hesai LiDAR data follows the same standardized processing pipeline as Luminar data:

  1. Point cloud frames are generated at high frequency with full 360-degree coverage
  2. Multiple detectors process each frame independently
  3. Detector outputs are formatted in the common information-theoretic language defined by Kodiak Vision
  4. The Tracker fuses Hesai detections with Luminar, radar, and camera detections

Significance of Hesai's Capabilities

Hesai's 360-degree LiDARs (likely from their Pandar or QT/OT series) provide:

  • Full surround awareness: Critical for highway driving where vehicles merge from adjacent lanes, approach from behind, or occupy blind spots alongside the 53-foot trailer
  • Near-field coverage: Detecting objects close to the truck that may be outside the Luminar Iris FOV (e.g., vehicles in adjacent lanes, pedestrians near the cab during low-speed maneuvers)
  • Redundancy: Even if both Luminar Iris units are degraded (e.g., by weather), the Hesai LiDARs provide fallback 3D perception in all directions
  • Automotive-grade reliability: Designed for the harsh vibration, temperature, and duty-cycle requirements of commercial trucking

The partnership between Kodiak and Hesai was announced in March 2021, with Hesai LiDARs integrated into the mirror-mounted SensorPods to enhance side and rear detection capability and provide redundancy for the Kodiak Vision perception system.


5. ZF 4D Imaging Radar

What Makes It "4D"

Traditional automotive radar operates in three dimensions: horizontal position (azimuth), distance (range), and velocity (Doppler). The critical limitation is that traditional radar cannot measure the vertical position (elevation) of an object -- it sees the world effectively from a top-down view.

The ZF Full Range Radar adds elevation as a fourth dimension, measuring:

  1. Range (distance to object)
  2. Azimuth (horizontal angle)
  3. Elevation (vertical angle / height)
  4. Doppler (radial velocity)

This fourth dimension is transformative for highway trucking perception because it resolves a class of problems that are otherwise extremely dangerous.
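
As a minimal illustration of why elevation matters, the sketch below classifies a stationary return well above the truck's travel envelope as an overhead structure rather than an in-path obstacle. The field names, sensor height, and clearance threshold are assumptions for illustration; this is not ZF's interface or Kodiak's logic.

```python
from dataclasses import dataclass
import math

@dataclass
class RadarDetection:
    range_m: float            # distance to the return
    azimuth_deg: float        # horizontal angle
    elevation_deg: float      # vertical angle -- the fourth dimension
    ground_speed_mps: float   # Doppler velocity after ego-motion compensation

def height_above_road(det: RadarDetection, sensor_height_m: float = 2.5) -> float:
    """Approximate height of the return above the road surface (flat-road assumption)."""
    return sensor_height_m + det.range_m * math.sin(math.radians(det.elevation_deg))

def is_overhead_structure(det: RadarDetection, clearance_m: float = 4.5) -> bool:
    """Treat a stationary return well above the truck's travel envelope
    (e.g. a bridge deck or sign gantry) as clutter rather than an obstacle."""
    return height_above_road(det) > clearance_m and abs(det.ground_speed_mps) < 0.5

bridge = RadarDetection(range_m=270.0, azimuth_deg=0.0, elevation_deg=1.5, ground_speed_mps=0.0)
stopped_car = RadarDetection(range_m=270.0, azimuth_deg=0.0, elevation_deg=-0.4, ground_speed_mps=0.0)
print(is_overhead_structure(bridge))       # True  -> no braking needed
print(is_overhead_structure(stopped_car))  # False -> road-level obstacle
```

Without the elevation measurement, both returns above look identical to a 3D radar: an object straight ahead at 270 m.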

Technical Architecture

Specification | Value
MMIC configuration | 4 cascaded Monolithic Microwave Integrated Circuits
Channel count | 192 channels (16x more than traditional 12-channel automotive radar)
Frequency band | 77 GHz
Modulation | Fast Ramp FMCW (Frequency Modulated Continuous Wave)
Detection range | 300+ meters (velocity tracking up to 350m)
Aperture angle | +/- 60 degrees
Weather resilience | ~4mm wavelength passes through rain and fog with minimal power loss

The four cascaded MMICs, each with multiple transmit and receive channels, produce 192 total channels -- vastly more than the 12 channels in traditional automotive radar. This dramatically increases angular resolution in both azimuth and elevation, enabling the radar to produce a high-resolution "image" of the environment rather than a coarse set of detections.

Critical Use Cases for Trucking

The 4D capability resolves scenarios that are uniquely dangerous for autonomous trucks on highways:

Scenario | 3D Radar Response | 4D Radar Response
Overhead bridge | Detects object ahead, may emergency brake | Measures elevation, determines bridge is overhead, continues driving
Road sign spanning highway | Appears as large obstacle blocking road | Elevation data shows it is above the vehicle, no braking needed
Stopped vehicle under a bridge | Cannot distinguish from the bridge itself | Separate elevation measurements for the vehicle (road level) and bridge (overhead)
Disabled vehicle on shoulder under a sign | Ambiguous radar return at 270m | Resolves the vehicle as a separate road-level object from the overhead sign

How 4D Radar Data Is Processed and Fused

The ZF radar data is processed through the same multi-detector, information-theoretic pipeline as other sensor modalities in Kodiak Vision:

  1. Raw radar output: 4D radar produces detection lists with range, azimuth, elevation, and Doppler for each detected target
  2. Detector processing: Multiple software detectors independently analyze the radar data, producing object hypotheses with uncertainty estimates
  3. Velocity information: Radar's native Doppler measurement provides direct velocity data, enabling the system to accurately track vehicle velocities up to 350m -- a capability neither LiDAR nor cameras can match natively
  4. Fusion: Radar detections are expressed in the shared information-theoretic language and fused with LiDAR and camera detections in the Kodiak Vision Tracker

Kodiak explicitly treats radar as a primary sensor, not a secondary complement to LiDAR. The "shared language" between sensor types means radar data carries equal weight in the fusion pipeline, with the Tracker assessing which sensor's evidence best explains the observed reality.

Weather Robustness

Radar's ~4mm wavelength passes through rain, fog, and dust with minimal signal attenuation. This makes radar the most weather-robust sensor in the Kodiak suite:

  • Camera: Degraded by rain, fog, dust, and darkness
  • LiDAR: Degraded by heavy rain, fog, and dust (1550nm less affected than 905nm, but still impacted)
  • Radar: Minimal degradation in most adverse weather; severe rainfall can reduce range by up to 55%, but it remains functional

This weather resilience makes 4D radar an essential safety layer for the Permian Basin (dust storms) and Texas Gulf Coast (thunderstorms) operations.


6. Camera Perception

Camera Configuration (Gen 6)

The Gen 6 truck features 12 cameras providing comprehensive visual coverage:

FOV Type | Purpose | Placement
Narrow FOV | Long-range forward detection; reading signs; identifying distant objects | SensorPods (forward-facing)
Wide FOV | Near-field and peripheral coverage; lane-change awareness | SensorPods
Blind spot cameras | Detecting objects in cab blind spots | Hood-mounted mirrors (added Gen 5)
Supplementary cameras | Additional coverage angles for 360-degree visual perception | Various mounting points

The cameras span multiple spectral bands and provide both long-range (narrow FOV for highway distances) and near-field (wide FOV for adjacent lanes and close maneuvering) visual perception.

Camera Processing Architecture

All camera data processing runs on dedicated hardware:

  • Gen 5: Ambarella CV2 SoC handles all camera data processing, with its on-chip ISP providing improved image quality for longer-range detections and improved dynamic range for nighttime driving
  • Gen 6+: Integration of the Ambarella CV3-AD685, which provides a next-generation CVflow AI engine with a neural vector processor 20x faster than the CV2's, along with premium image signal processing through rain, fog, and darkness

The camera processing pipeline involves:

  1. Image signal processing (ISP): Raw sensor data is processed through the Ambarella ISP for white balance, exposure, HDR tone mapping, and noise reduction -- critical for maintaining image quality across lighting conditions (Texas sun, night, rain, tunnel transitions)
  2. Feature extraction: Deep neural networks extract features from camera images, running on the CVflow AI engine's neural vector processor
  3. Multiple detector inference: Camera images pass through multiple independent detectors for object detection, lane detection, and semantic segmentation
  4. Detection output: Detector outputs are formatted for the Kodiak Vision Tracker

Highway-Specific Camera Challenges

Highway trucking presents unique challenges for camera perception that differ from urban robotaxi scenarios:

  • Extreme range: Objects of interest (vehicles, pedestrians, debris) can be 300-600m ahead, appearing as only a few pixels in camera images. Narrow FOV cameras with high resolution are essential for detecting and classifying these distant objects.
  • Speed-induced motion blur: At 65-80 mph relative speeds, camera exposure times must be carefully managed to avoid motion blur while maintaining adequate light capture
  • Lighting extremes: Texas highways experience intense direct sunlight, deep shadows under overpasses, and rapid transitions between them. HDR processing on the Ambarella ISP handles these extremes.
  • Night driving: Camera effectiveness drops significantly at night; the system relies more heavily on LiDAR and radar while the Ambarella ISP's enhanced dynamic range maximizes remaining camera utility
  • Glare: Oncoming headlights and low-angle sun produce intense glare that can saturate camera sensors

Camera Role in Sensor Fusion

Cameras provide capabilities that LiDAR and radar cannot:

  • Color information: Essential for reading traffic signs, construction zone markings, brake lights, and emergency vehicle identification
  • Texture and classification: Fine-grained object classification (e.g., distinguishing a person from a pole, reading text on signs)
  • Lane marking detection: Camera-based lane detection provides the primary input for localization (Kodiak localizes based on perceived lane markings, like a human driver)
  • Semantic understanding: Cameras feed the VLM pipeline for higher-level scene understanding

7. Multi-Sensor Fusion: Kodiak Vision

Design Philosophy: No Sensor Hierarchy

Most autonomous vehicle companies adopt a "LiDAR-first" approach, treating LiDAR as the primary sensor and using cameras and radar to fill gaps. Kodiak explicitly rejects this hierarchy. Kodiak Vision treats all sensors -- LiDAR, camera, and radar -- as primary, taking into account the unique properties of each.

This design choice is grounded in the observation that each sensor modality has fundamental strengths and weaknesses:

Sensor | Strengths | Weaknesses
LiDAR | Precise 3D geometry, range measurement, works day/night | Degraded by weather, no color, no native velocity
Camera | Color, texture, classification, sign reading, lane markings | No native depth, degraded at night, affected by glare/weather
Radar | Native velocity, weather-robust, long range | Lower spatial resolution (even with 4D), phantom detections

By treating all three as primary and creating a shared mathematical language between them, Kodiak maximizes the information extracted from every sensor reading.

The Shared Language: Information-Theoretic Framework

The Kodiak Vision architecture defines a common interface for all detectors, based on the fundamental mathematics of information theory. This shared language allows every detector to:

  1. Describe what it sees: Object detections, ranges, velocities, accelerations, vehicle boxes, and other attributes
  2. Honestly assess uncertainty: Each detection comes with an explicit confidence/uncertainty estimate that reflects the physical limitations of the sensor and the ambiguity of the measurement
  3. Account for sensor peculiarities: The interface accommodates the unique properties of each sensor type (e.g., radar natively measures velocity; cameras natively provide color)

This is inspired by the work of information theory pioneers -- Shannon, Wiener, Kolmogorov, and Chapman -- and draws from mathematical frameworks originally developed for missile defense and GPS systems.
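
A minimal sketch of what such a shared detector interface could look like is shown below. The field names and structure are assumptions for illustration; Kodiak has not published its actual schema. The essential point is that every detection pairs an estimate with an explicit covariance, so the Tracker can weigh it correctly regardless of which sensor produced it.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Detection:
    """Hypothetical common detection message shared by every detector.

    Each detection carries both an estimate and an honest uncertainty, so
    LiDAR, radar, and camera detectors can all be fused in one probabilistic
    framework. Field names are illustrative, not Kodiak's actual schema.
    """
    sensor: str                      # e.g. "luminar_left", "zf_radar_front"
    detector: str                    # which detector produced the hypothesis
    timestamp: float                 # seconds
    state_mean: np.ndarray           # e.g. [x, y, z, vx, vy] in vehicle frame
    state_cov: np.ndarray            # covariance encoding the uncertainty
    class_probs: dict = field(default_factory=dict)  # {"car": 0.7, ...}

    def information(self) -> np.ndarray:
        """Information (inverse-covariance) form, convenient for fusion:
        detections with large uncertainty contribute little information."""
        return np.linalg.inv(self.state_cov)
```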

Multiple Detectors Per Sensor

A critical architectural decision in Kodiak Vision is that every sensor measurement passes through multiple independent software detectors. Different detectors may process the same raw data using different algorithms, architectures, or methodologies:

  • One detector might use a deep neural network optimized for vehicle detection
  • Another might use a different architecture optimized for pedestrian detection
  • A third might use a classical point cloud processing approach

This diversity provides defensive coverage: "Having multiple sensors and detectors working in parallel can help us determine with high probability that objects will be detected." If one detector misses an object, another is likely to catch it.
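
The value of this redundancy can be illustrated with a simple independence argument: if each detector misses a given object with some probability, the chance that every detector misses it is the product of those probabilities. The miss rates below are invented placeholders, and real detectors share failure modes, so this is an optimistic bound rather than a measured result.

```python
# Hypothetical per-detector miss rates for a single object (placeholders).
miss_rates = {"lidar_dnn": 0.05, "lidar_classical": 0.10, "camera_dnn": 0.08, "radar": 0.15}

p_all_miss = 1.0
for p in miss_rates.values():
    p_all_miss *= p  # assumes independent failures

print(f"P(every detector misses) = {p_all_miss:.6f}")   # 0.000060
print(f"P(at least one detects)  = {1 - p_all_miss:.6f}")
```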

Coordination Between Luminar (Forward) and Hesai (Surround)

The Luminar Iris and Hesai LiDAR sensors serve complementary roles:

  • Luminar Iris (x2): Forward-facing long-range detection up to 600m. Provides the earliest possible detection of hazards in the truck's path -- critical at highway speeds where reaction time and stopping distance are measured in hundreds of meters.
  • Hesai (x2): 360-degree surround coverage. Provides awareness of vehicles in adjacent lanes, behind the truck, and in the near-field blind spots around the cab and trailer.

The Tracker fuses detections from both LiDAR types seamlessly. An object first detected by Luminar at 400m (forward LiDAR only) will be picked up by Hesai as it enters the surround coverage zone. The Tracker maintains a continuous track across this transition, updating its hypotheses with information from both sources.

Cross-Modal Verification

When multiple sensor modalities detect the same object, the Tracker gains increased confidence. A vehicle detected by Luminar LiDAR (3D shape), ZF radar (velocity), and camera (visual classification) has been cross-verified by three independent physics -- light time-of-flight, radio frequency reflection, and optical imaging. This cross-modal verification is a core safety feature, making false negatives (missed detections) and false positives (phantom objects) far less likely.

Perception-to-Planning Communication

Kodiak Vision maintains a continuous dialogue with the planning system. When the perception system is less than fully confident about what it sees, it tells the planner to be cautious, taking into account best- and worst-case scenarios. This uncertainty-aware interface causes the planner to:

  • Maintain a longer following distance
  • Avoid lane changes near uncertain objects
  • Reduce speed when visibility or detection confidence is degraded
  • Flag objects for extra scrutiny

This defensive driving behavior mirrors how a careful human driver responds to ambiguous situations.
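
A minimal sketch of how a planner might translate perception confidence into extra following distance is shown below; the scalar confidence summary and the thresholds are assumptions for illustration, not Kodiak's interface.

```python
def following_gap_s(base_gap_s: float, perception_confidence: float,
                    min_confidence: float = 0.5, max_extra_s: float = 2.0) -> float:
    """Scale time headway up as perception confidence drops.

    base_gap_s: nominal following gap in seconds at full confidence.
    perception_confidence: 0..1 summary of how certain the world model is.
    All thresholds here are invented for illustration.
    """
    confidence = max(min(perception_confidence, 1.0), min_confidence)
    deficit = (1.0 - confidence) / (1.0 - min_confidence)  # 0 at full confidence
    return base_gap_s + deficit * max_extra_s

print(following_gap_s(3.0, 1.0))   # 3.0 s in clear conditions
print(following_gap_s(3.0, 0.7))   # 4.2 s when the world model is less certain
```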


8. 3D Object Detection

Detection Architecture

Kodiak's 3D object detection pipeline is characterized by its multi-detector, multi-sensor approach rather than relying on a single monolithic detection network. The perception system processes every sensor measurement through multiple detectors -- software tools that process sensor data to identify what and where potential objects are.

Each detector independently analyzes sensor data and produces:

  • 3D bounding boxes with position (x, y, z) and dimensions (length, width, height)
  • Object classification (vehicle, pedestrian, cyclist, construction object, debris, etc.)
  • Confidence score and uncertainty estimates
  • Additional attributes such as velocity, acceleration, heading

Trucking-Specific Detection Requirements

Autonomous trucking on highways has detection requirements that differ significantly from urban robotaxi operations:

Requirement | Urban Driving | Highway Trucking
Detection range | 50-150m typical | 200-600m required
Primary objects | Pedestrians, cyclists, cars | Vehicles (especially large trucks), stopped/disabled vehicles, road debris
Speed regime | 0-35 mph | 55-80 mph
Stopping distance | 20-100 feet | 300-500+ feet (loaded Class 8 at highway speed)
Object diversity | High (pedestrians, bikes, scooters, etc.) | Lower diversity but higher stakes per detection
Critical misses | Pedestrian at crosswalk | Stopped vehicle in lane at 600m; tire tread/debris in lane

Long-Range Detection Strategy

At 600m (the maximum range of the Luminar Iris), a typical sedan subtends only a small number of LiDAR points. The detection system must reliably identify objects from sparse point returns at extreme range and then refine its understanding as the truck closes distance:

  • 600m: Initial detection from Luminar point cloud -- few points, high uncertainty about object type and exact dimensions
  • 400m: Increasing point density; classification confidence improves; cameras with narrow FOV begin to resolve the object
  • 300m: Radar velocity data confirms whether the object is moving, stopped, or slow-moving; 4D radar provides elevation data
  • 200m: Full multi-sensor coverage with high-density LiDAR, clear camera imagery, precise radar velocity; object is fully characterized
  • 100m: Near-field; detailed shape, behavior prediction, and trajectory planning with high confidence

This progressive refinement is fundamental to safe highway operation -- the system has multiple opportunities to detect and classify an object before the truck reaches it.

Deep Learning for Detection

Kodiak develops deep neural networks focused on key robotics perception problems:

  • 3D object detection: Identifying and localizing objects in 3D space from point clouds and images
  • 3D scene understanding: Holistic understanding of the driving environment beyond individual objects
  • 3D world reconstruction: Building a coherent 3D model of the environment from multi-sensor data
  • Self-supervised learning: Learning representations from unlabeled data to improve data efficiency
  • End-to-end learning: Exploring learned approaches that directly map sensor inputs to driving-relevant outputs

9. Object Tracking: The Kodiak Vision Tracker

Architecture

The Kodiak Vision Tracker is the central sensor fusion and tracking component of the perception stack. It is responsible for:

  1. Fusing every detection from every sensor into a coherent world model
  2. Combining detections over time to track every object on the road
  3. Maintaining continuous tracks across sensor transitions (e.g., an object moving from Luminar-only range into Hesai coverage)
  4. Estimating object state including position, velocity, acceleration, heading, and dimensions -- with explicit uncertainty

Information-Theoretic Foundation

The Tracker defines a common interface for detectors based on the fundamental mathematics of information theory. This is not a heuristic system -- it is built on "fundamental mathematics instead of clever software tricks," drawing from information theory pioneers including Shannon, Wiener, Kolmogorov, and Chapman.

This mathematical rigor has a practical consequence: the Tracker can accept any detector output expressed in the shared information-theoretic language without requiring code changes. Adding a new sensor type, a new detector, or upgrading an existing detector does not require modifying the Tracker.

How the Tracker Operates

The Tracker runs at 20 Hz (twenty times per second). Each cycle:

  1. Ingest new detections: The Tracker receives detection outputs from all active detectors across all sensors
  2. Hypothesis refinement: Using the new evidence, the Tracker refines its hypotheses about every tracked object. It assesses what conclusion best explains the whole body of evidence "without prejudice, rules, or hard-coding"
  3. Likelihood assessment: The Tracker determines how likely each hypothesis is and identifies likely alternate possibilities
  4. Uncertainty propagation: For each tracked object, the Tracker maintains an explicit representation of what it knows, what it does not know, and how confident it is
  5. World model output: The Tracker submits a detailed world description to the autonomy stack for prediction and planning
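
The cycle above can be read as the classic predict/update pattern of recursive state estimation. The sketch below shows a deliberately simplified, single-object Kalman-filter version of one 50 ms cycle. Kodiak has not published its tracker internals, so this only illustrates the general pattern of fusing uncertain detections over time; the models and noise values are assumptions.

```python
import numpy as np

DT = 0.05  # 20 Hz cycle -> a new world model every 50 ms

# Constant-velocity model for one tracked object: state = [x, y, vx, vy]
F = np.array([[1, 0, DT, 0],
              [0, 1, 0, DT],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
Q = np.eye(4) * 0.1           # process noise (assumed)
H = np.array([[1., 0, 0, 0],  # toy example: detectors report position only
              [0., 1, 0, 0]])

def tracker_cycle(x, P, detections):
    """One cycle: predict forward 50 ms, then fold in each new detection.

    `detections` is a list of (z, R) pairs: a measured position and its
    covariance, i.e. the detector's honest uncertainty. More certain
    detections (smaller R) pull the estimate harder.
    """
    # 1. Predict: propagate the hypothesis forward in time
    x, P = F @ x, F @ P @ F.T + Q
    # 2. Update: refine the hypothesis with every detection, from any sensor
    for z, R in detections:
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return x, P  # mean and covariance: what is known and how certain

# Example: one object 120 m ahead, closing at 5 m/s, seen by LiDAR and radar.
x0, P0 = np.array([120., 0., -5., 0.]), np.eye(4) * 10.0
lidar = (np.array([119.8, 0.1]), np.eye(2) * 0.05)
radar = (np.array([120.3, 0.0]), np.eye(2) * 1.0)
x1, P1 = tracker_cycle(x0, P0, [lidar, radar])
```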

Handling Uncertainty and Occlusion

One of the Tracker's most distinctive capabilities is its honest treatment of uncertainty. When a vehicle is partially occluded (e.g., hidden behind another vehicle), the Tracker does not assume standard vehicle dimensions. Instead, it "makes an honest assessment consistent with the data and calculates their (probably high) level of uncertainty."

For example, if a vehicle is partially visible behind another:

  • The Tracker knows one vehicle is hidden behind another
  • It knows the precise location of the visible vehicle surface closest to the truck
  • Instead of guessing the occluded vehicle's full dimensions, it maintains high uncertainty about the hidden portions
  • This uncertainty is communicated to the planner, which responds with additional caution (e.g., extra following distance, avoiding lane changes toward the uncertain area)

Adaptive Calibration

After every drive, the system analyzes Tracker performance and identifies systematic biases in individual detectors: "You were systematically biased by a few centimeters; please adjust your measurements accordingly." This post-drive calibration allows continuous improvement of individual detector accuracy. Kodiak has indicated that future versions will apply these corrections in real-time.
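
A minimal sketch of such a post-drive bias check, using made-up logged values, could look like the following:

```python
import numpy as np

def estimate_detector_bias(detector_ranges_m, fused_track_ranges_m):
    """Post-drive check: compare one detector's range measurements against the
    fused tracks it contributed to and report any systematic offset.

    The inputs are paired per-detection logs from a completed drive (illustrative).
    """
    residuals = np.asarray(detector_ranges_m) - np.asarray(fused_track_ranges_m)
    return residuals.mean(), residuals.std()

# Toy log: this detector reads ~4 cm long on average.
bias, spread = estimate_detector_bias([50.03, 80.05, 120.04], [50.00, 80.00, 120.00])
print(f"systematic bias: {bias*100:.1f} cm (spread {spread*100:.1f} cm)")
# A subsequent drive would subtract this bias from the detector's measurements.
```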

Highway-Speed Tracking Challenges

Tracking at highway speeds (60-80 mph) introduces specific challenges:

  • High closing speeds: When the truck overtakes slower traffic or encounters stopped vehicles, relative velocities can exceed 80 mph. The Tracker must maintain tracks on objects whose range decreases rapidly.
  • High-speed ego motion: The truck itself is moving at high speed, requiring precise ego-motion estimation to separate true object movement from apparent movement caused by the truck's own velocity.
  • Long prediction horizons: At highway speeds, the planner needs predictions of where other vehicles will be several seconds into the future. The Tracker's state estimates (position, velocity, acceleration) feed directly into trajectory prediction.
  • Lane-change detection: Adjacent vehicles changing lanes must be detected early through subtle lateral velocity changes, requiring high-precision tracking of lateral position over time.

The 20 Hz update rate provides a new world model every 50 milliseconds, enabling the system to respond quickly to sudden changes such as emergency braking or abrupt lane changes by other vehicles.


10. Highway-Specific Perception

How Trucking Perception Differs from Urban Driving

Kodiak's operational design domain (ODD) is highway and freeway driving for long-haul trucking -- explicitly not urban city environments. This domain focus shapes every aspect of the perception system:

Aspect | Urban AV Perception | Kodiak Highway Perception
Primary range | 0-150m | 0-600m
Speed regime | 0-35 mph | 55-80 mph
Key objects | Pedestrians, cyclists, scooters | Vehicles, trucks, road debris, stopped vehicles
Map dependency | Heavy (HD maps of every intersection) | Minimal (sparse maps, perception fills gaps)
Lane structures | Complex (intersections, crosswalks, turn lanes) | Simpler (lane following, merges, exits)
Traffic signals | Critical (lights, signs, hand signals) | Minimal (no traffic lights; speed limits, lane markings)
Pedestrian frequency | Very high | Very low (but critical when present)
Stopping distance | Short (low speed) | Very long (500+ feet loaded at highway speed)

Long-Range Perception Chain

The 600m Luminar Iris range establishes a long-range perception chain that is critical at highway speeds:

At 65 mph, the truck covers approximately 95 feet per second. A fully loaded 80,000-pound Class 8 truck requires approximately 500-700 feet to stop. The 600m (~1,970 feet) detection range provides approximately 17-20 seconds of advance warning -- enough time for the system to:

  1. Detect the hazard at extreme range
  2. Classify and refine understanding as the truck closes distance
  3. Evaluate multiple response options (lane change, gradual deceleration, emergency braking)
  4. Execute the safest maneuver well before the stopping-distance threshold
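
The arithmetic behind these figures is easy to verify. The short script below reproduces the warning-time numbers, assuming a stationary hazard and a nominal stopping distance in the middle of the 500-700 foot range quoted above:

```python
MPH_TO_MPS = 0.44704
FT_TO_M = 0.3048
DETECTION_RANGE_M = 600.0
STOPPING_DISTANCE_FT = 600.0   # assumed mid-point of the 500-700 ft figure

for mph in (65, 80):
    v = mph * MPH_TO_MPS                   # ego speed in m/s
    warning_s = DETECTION_RANGE_M / v      # time until a stationary hazard is reached
    margin_m = DETECTION_RANGE_M - STOPPING_DISTANCE_FT * FT_TO_M
    print(f"{mph} mph: {v:5.1f} m/s, ~{warning_s:4.1f} s of warning, "
          f"{margin_m:5.0f} m of range beyond the stopping distance")
```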

Merge and Lane-Change Detection

Highway merges and lane changes are among the most common interactions on freeways and require specific perception capabilities:

  • Adjacent vehicle tracking: The Hesai 360-degree LiDAR and side-mounted radar (added in Gen 6) continuously track vehicles in adjacent lanes
  • Lateral velocity detection: The Tracker monitors subtle lateral velocity changes that indicate a vehicle is beginning to change lanes, even before the vehicle crosses the lane marking
  • Merge zone awareness: When approaching highway on-ramps, the system monitors merging traffic and adjusts speed and position to facilitate smooth merges
  • Gap assessment: The system evaluates gaps in adjacent lanes for safe lane-change opportunities, considering both current positions and predicted trajectories of nearby vehicles

Following Distance Management

The multi-sensor perception system continuously monitors following distance:

  • Radar provides direct velocity: The ZF 4D radar provides native Doppler velocity measurements of the lead vehicle, enabling precise calculation of time-to-collision and following distance
  • LiDAR provides precise range: The Luminar Iris provides sub-centimeter range accuracy to the lead vehicle
  • Uncertainty-driven spacing: When perception confidence is reduced (e.g., due to weather degradation), the planner automatically increases following distance
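
For example, the textbook time-to-collision calculation from radar's range and range-rate is shown below; it is included only to illustrate why a native velocity measurement is valuable, not as Kodiak's control logic.

```python
def time_to_collision_s(range_m: float, closing_speed_mps: float) -> float:
    """Time to collision with the lead vehicle, using radar's native Doppler.

    closing_speed_mps > 0 means the gap is shrinking. Returns inf if the gap
    is opening. This is the standard range / range-rate definition.
    """
    return float("inf") if closing_speed_mps <= 0 else range_m / closing_speed_mps

# Lead vehicle 120 m ahead, truck closing at 5 m/s (~11 mph faster).
ttc = time_to_collision_s(120.0, 5.0)
print(f"TTC = {ttc:.0f} s")   # 24 s -> plenty of time to ease off smoothly
```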

ODD: 95% of Weather Conditions

Kodiak has stated that their operational design domain includes driving on the freeway -- including construction zones -- in 95% of all weather conditions. This means the perception system is designed to operate in all but the most extreme weather events (e.g., zero-visibility dust storms, blizzard whiteout conditions), with the system designed to pull over and stop when conditions exceed its safe operating envelope.


11. Construction Zone Perception

The Challenge

Construction zones are among the toughest challenges for self-driving vehicles, and they are particularly prevalent on Texas highways where Kodiak operates. Lanes close down, marked by a variety of methods: traffic cones, construction vehicles, heavy concrete barriers, barrels, temporary lane markings, and even construction workers directing traffic. The road geometry can change drastically from what is mapped.

Perception-Driven Navigation

Kodiak's approach to construction zones is a direct application of the "perception over priors" philosophy:

  1. Detection of construction elements: The perception system must detect and classify a wide range of construction-specific objects:

    • Traffic cones (standard orange, various sizes)
    • Construction barrels (channelizing drums)
    • Concrete jersey barriers
    • Construction vehicles (stationary and moving)
    • Construction workers
    • Temporary signage
    • Lane dividers and delineators
  2. Temporary lane marking detection: When construction zones introduce new lane markings (painted or taped) that differ from the mapped configuration, the camera-based lane detection system identifies the temporary markings and trusts them over the map. The system localizes using the lane markings it perceives, regardless of what the map says.

  3. Traffic shift detection: When lanes are shifted laterally due to construction (a common configuration on Texas highways where traffic is moved to the opposite side of a divided highway), the perception system detects the shift through:

    • Cone/barrier lines defining the new lane boundaries
    • Temporary lane markings on the shifted roadway
    • Lead vehicle trajectories (following the path of traffic ahead)
  4. Real-time map updates: When the Kodiak Driver detects a construction zone or traffic shift that conflicts with its stored map, it builds a temporary on-the-fly map that is "good enough for the Kodiak Driver to drive safely until the map matches again." This updated construction zone information is shared with the rest of the fleet over the air, so subsequent trucks benefit from the first truck's perception.

Multi-Sensor Approach to Construction Detection

Each sensor modality contributes to construction zone perception:

  • Cameras: Detect and classify cones, barrels, signs, workers (color and texture are essential for classification); read temporary construction signage; detect temporary lane markings
  • LiDAR: Detect the 3D geometry of cones, barriers, and construction vehicles; provide precise ranging to construction zone boundaries
  • Radar: Track construction vehicles and moving workers; detect vehicles within the construction zone at long range; function through dust kicked up by construction activity

Scale AI Synthetic Data for Construction

Kodiak uses Scale AI's human-in-the-loop synthetic data generation to augment training data for construction zone scenarios. Scale Synthetic specifically addresses construction worker detection scenarios, generating synthetic pedestrians in construction gear with validated placement and poses to ensure realism. This is critical because construction workers on highways are relatively rare in real-world driving data but represent a high-stakes detection requirement.


12. Adverse Weather Perception

Multi-Sensor Weather Resilience

Kodiak's multi-sensor architecture provides inherent weather resilience because different sensor modalities are affected differently by adverse weather:

Condition | Camera Impact | LiDAR Impact | Radar Impact
Heavy rain | Significant (visibility, droplets on lens) | Moderate (scattering, reduced range) | Moderate (up to 55% range reduction in severe rainfall)
Fog | Severe (visibility < 200m) | Severe (scattering) | Minimal (4mm wavelength passes through fog)
Dust | Severe (visibility, lens contamination) | Moderate to severe | Minimal
Bright sun / glare | Severe (saturation, lens flare) | None | None
Darkness | Severe (reduced image quality) | None (active sensor) | None (active sensor)
Snow | Moderate (visibility, road marking occlusion) | Moderate (scattering from flakes) | Low to moderate

Sensor Degradation and Graceful Fallback

The Kodiak Driver is designed to degrade gracefully rather than fail catastrophically:

  1. Sensor health monitoring: The system continuously monitors the health of all 22 sensors at 10 Hz (as part of the 1,000+ safety-critical component checks). This includes detecting sensor degradation before it becomes critical.

  2. Dynamic sensor weighting: When a sensor modality is degraded (e.g., cameras blinded by fog), the Kodiak Vision Tracker naturally down-weights the degraded sensor's contributions. The Tracker's information-theoretic framework handles this automatically -- less certain detections from a degraded sensor carry less weight in the fusion.

  3. Radar as weather anchor: In the worst weather conditions, radar remains the most reliable sensor due to its 4mm wavelength. The ZF 4D radar can continue to detect and track vehicles with velocity information even when cameras and LiDAR are severely degraded.

  4. Conservative planning response: When overall perception confidence drops due to weather, the planner adopts more conservative behavior: increased following distance, lower speed, avoidance of lane changes.

  5. Pullover decision: If sensor degradation exceeds the safe operating envelope (the system's ODD covers 95% of weather conditions), the Kodiak Driver executes a controlled pullover to a safe location. This is a planned, calm maneuver -- not an emergency stop.
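
The automatic down-weighting described in step 2 can be illustrated with simple inverse-variance fusion: a sensor that honestly reports a larger variance contributes proportionally less to the fused estimate, with no weather-specific rules required. The numbers below are purely illustrative; Kodiak's fusion is more general.

```python
import numpy as np

def fuse_range_estimates(measurements):
    """Inverse-variance fusion of scalar range estimates from several sensors.

    measurements: list of (range_m, variance) pairs. A degraded sensor reports
    a larger variance and is automatically down-weighted.
    """
    weights = np.array([1.0 / var for _, var in measurements])
    values = np.array([r for r, _ in measurements])
    fused = np.sum(weights * values) / np.sum(weights)
    fused_var = 1.0 / np.sum(weights)
    return fused, fused_var

# Clear day: camera, LiDAR, and radar all contribute.
print(fuse_range_estimates([(100.2, 1.0), (100.0, 0.04), (99.5, 0.5)]))
# Dense fog: the camera's variance balloons, so the fused answer leans on LiDAR/radar.
print(fuse_range_estimates([(90.0, 50.0), (100.1, 0.2), (99.8, 0.5)]))
```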

SensorPod Self-Cleaning System

The SensorPods include an automated self-cleaning system that helps maintain sensor performance during adverse weather:

  • One of the three cables connecting each SensorPod carries water for lens washing
  • Another cable carries compressed air for drying and clearing debris
  • This en-route self-cleaning system operates automatically during driving, clearing raindrops, dust, mud, and insect strikes from camera lenses and LiDAR windows

This is particularly important for Permian Basin operations where constant dust from oilfield roads can rapidly contaminate sensor optics.

Texas Weather Testing

Kodiak has tested its trucks in real-world adverse weather conditions including Texas thunderstorms and extreme heat. The system operates in conditions ranging from the intense dust of West Texas oilfields to Gulf Coast thunderstorms, providing a broad real-world weather validation dataset.


13. Night Driving Perception

The Night Driving Challenge

Night driving is a critical capability for autonomous trucking because trucks generate revenue by operating around the clock. Kodiak's trucks operate up to 24/7 in the Permian Basin and target near-continuous operation on long-haul routes. However, night driving significantly reduces camera effectiveness -- the very modality that provides color, texture, and classification data.

How LiDAR and Radar Compensate

LiDAR and radar are active sensors that emit their own energy (laser light and radio waves, respectively) and do not depend on ambient illumination:

  • Luminar Iris (1550nm LiDAR): Performance is unaffected by ambient lighting. The sensor emits its own laser pulses and measures reflections. Detection range, point density, and accuracy are identical day and night. This makes LiDAR the primary geometric perception sensor at night.

  • Hesai 360-degree LiDAR: Similarly unaffected by darkness. Full surround 3D coverage operates identically at night.

  • ZF 4D Radar: Completely unaffected by lighting conditions. Continues to provide velocity, range, azimuth, and elevation measurements at full capability. The same robustness radar shows in bad weather carries over to darkness, delivering consistent performance precisely when cameras are at their weakest.

Camera Nighttime Enhancements

While cameras are degraded at night, Kodiak has made specific investments to maximize their nighttime utility:

  • Ambarella CV2 ISP: The Ambarella CV2 SoC (introduced in Gen 5) includes an image signal processor that "improves image quality for longer range detections and unlocks improved dynamic range for nighttime driving." The improved dynamic range is critical for handling the contrast between headlights/taillights and dark road surfaces.

  • Ambarella CV3-AD685 ISP: The next-generation Ambarella SoC provides even more advanced image signal processing capabilities, described as providing "premium image quality processing through rain, fog and darkness," with outstanding imaging in low-light conditions including high dynamic range (HDR) processing.

  • HDR processing: Highway night driving involves extreme contrast ratios -- bright oncoming headlights, red taillights, reflective signs, and near-zero ambient light on the road surface. HDR processing on the Ambarella ISP captures multiple exposures and combines them to reveal detail in both bright and dark regions simultaneously.

Night Driving Perception Strategy

At night, the perception system effectively shifts its sensor priority:

Time | Primary Perception | Secondary Perception | Reduced Role
Day | LiDAR + Camera + Radar (all balanced) | -- | --
Night | LiDAR + Radar (active sensors) | Camera (with HDR ISP enhancement) | Camera long-range classification

The Kodiak Vision Tracker automatically handles this shift through its information-theoretic framework -- camera detections at night naturally carry higher uncertainty, and the Tracker down-weights them accordingly while relying more heavily on LiDAR and radar detections.

Night-Specific Scenarios

  • Unlit highway sections: The truck's LiDAR provides full 3D perception regardless of road lighting. Radar provides velocity data for surrounding traffic.
  • Headlight glare from oncoming traffic: HDR camera processing mitigates glare while maintaining detection of other objects in the scene. LiDAR and radar are unaffected.
  • Animals on the road: Animals (deer, livestock) are detected by LiDAR at night even when camera detection is unreliable. The Luminar Iris's 600m range provides early detection.
  • Unlit pedestrians: Rare on highways but critical when present. LiDAR detects any solid object regardless of clothing color or reflectivity. The reflectance channel of the Iris provides additional classification data.

14. SensorPod Architecture

Patent-Pending Design

Kodiak's SensorPod is a patent-pending modular sensor enclosure that replaces the truck's stock side-view mirrors. Each SensorPod integrates all sensors needed for autonomous driving into a single, field-swappable unit.

Contents of Each SensorPod (Gen 5/6)

Component | Count | Purpose
Luminar Iris LiDAR | 1 | Forward-facing long-range detection (up to 600m)
Hesai 360-degree LiDAR | 1 | Surround coverage (side and rear)
ZF Full Range 4D Radar | 2 | Long-range detection with velocity, 300m+ range
Cameras | 3 | Wide and narrow FOV visual perception

Each truck has two SensorPods (left and right mirror), providing full 360-degree coverage with overlapping fields of view. This means that even if one SensorPod fails entirely, the remaining SensorPod provides forward-facing LiDAR, surround LiDAR, radar, and camera coverage -- degraded but still functional.

Three-Cable Interface

The SensorPod connects to the truck via exactly three cables:

Cable | Function
Cable 1 | Power + Data (single cable carrying both electrical power and data communications)
Cable 2 | Water (for automated lens/window washing during driving)
Cable 3 | Compressed air (for drying and clearing debris from sensor optics)

This three-cable design is deliberately minimal to enable rapid field swaps.

Field Swappability

The SensorPod can be replaced by any truck mechanic without specialized training or equipment:

  1. Remove the access cover
  2. Disconnect three cables
  3. Remove three bolts
  4. Reverse the process with the replacement SensorPod

Total swap time: under 10 minutes. Kodiak describes this as "as easy as changing a tire -- but faster."

Pre-Calibration

Every SensorPod ships pre-built and pre-calibrated. This eliminates the need for specialized sensor calibration equipment in the field -- a significant operational advantage because long-haul trucks can break down or need service in remote locations (e.g., rural West Texas) where specialized autonomous vehicle technicians are unavailable.

The pre-calibration means the replacement SensorPod is immediately ready for operation after installation, without any alignment or calibration procedures.

Enabling OEM-Agnostic Deployment

The SensorPod architecture is central to Kodiak's vehicle-agnostic strategy:

  • Standard mirror mount: SensorPods replace stock mirrors, which are a standardized mounting point across all truck manufacturers (Kenworth, Peterbilt, Freightliner, Volvo, etc.)
  • Self-contained sensing: All perception sensors are inside the pods, so no vehicle-specific sensor mounting is needed
  • Standardized interface: The three-cable interface (power/data, water, air) is simple enough to adapt to any truck platform
  • No structural modifications: No roof-mounted hardware, no drilling into the cab structure, no vehicle-specific brackets

This design allows Kodiak to deploy on different truck platforms by designing only the SensorPod mounting brackets for each truck model, rather than redesigning the entire sensor suite.

Evolution Across Generations

Generation | SensorPod Configuration
Gen 3 (2021) | Mirror pods introduced; center pod still present for Luminar Iris
Gen 4 (2021) | Mirror pods with Hesai LiDAR, ZF radar, cameras; Luminar in center pod
Gen 5 (2023) | Center pod eliminated; Luminar Iris moved into mirror SensorPods; all autonomy sensors now in SensorPods
Gen 6 (2024) | Upgraded automotive-grade sensors; additional side radar; pre-calibrated production-ready units with top-mounted hazard lights

DefensePods for Military

Kodiak developed DefensePods -- a military-adapted version of the SensorPods designed for defense applications. These are described as "the industry's only hardened solution built to be reliable, serviceable, and maintainable on any ground mission," with the ability to be maintained with minimal training even in the field during operations.


15. Dual Compute for Perception

Architecture Overview

Kodiak's onboard compute architecture separates perception/autonomy processing from safety-critical vehicle control:

System | Hardware | Function
Primary Autonomy Compute | NVIDIA DRIVE Orin + Ambarella CV3-AD685 | Full perception, prediction, planning stack
ACE Safety Computer | NXP S32G3, S32K3, VR5510 | Independent fallback; vehicle actuation

NVIDIA DRIVE Orin -- Primary GPU Compute

The NVIDIA DRIVE Orin provides the high-performance GPU compute needed for:

  • Real-time multi-sensor fusion across 22 sensors
  • Deep learning inference for LiDAR-based 3D object detection
  • Deep learning inference for radar processing
  • Motion planning computation
  • Prediction model inference
  • Localization computation

The open and scalable NVIDIA DRIVE platform allows Kodiak to iterate on perception algorithms while maintaining production-grade safety and security. The GPU-heavy workloads of point cloud processing, neural network inference, and sensor fusion are well-suited to Orin's massively parallel GPU architecture.

Ambarella CV3-AD685 -- Camera and AI Processing

The Ambarella CV3-AD685, selected in January 2024, complements the NVIDIA Orin by providing:

Capability | Details
Process node | 5nm, purpose-built for AV workloads
CVflow AI engine | Neural vector processor 20x faster than previous CV2 generation
Camera processing | On-chip ISP for all camera data processing; premium image quality through rain, fog, and darkness
HDR processing | High dynamic range for challenging lighting (night, tunnel transitions, glare)
Multi-sensor fusion | Single-chip processing for camera, radar, and LiDAR fusion
Power efficiency | Ultra-low power consumption in compact form factor

The Ambarella SoC handles camera-specific workloads particularly well:

  1. Image signal processing: Raw camera sensor data goes through the on-chip ISP for white balance, exposure control, HDR tone mapping, and noise reduction
  2. Camera neural network inference: The CVflow AI engine runs camera-based detection networks with high efficiency
  3. Multi-sensor neural network inference: The CV3-AD685 can run fusion networks that combine camera, LiDAR, and radar data

How Workloads Are Split

While Kodiak has not published the exact workload partition, the architecture suggests:

  • Ambarella CV3-AD685: Handles all camera data ingestion and ISP processing; runs camera-specific neural networks; may handle initial stages of multi-modal fusion; optimized for power-efficient, continuous inference on camera streams
  • NVIDIA DRIVE Orin: Handles GPU-intensive workloads including LiDAR point cloud processing, complex multi-sensor fusion, prediction, planning, and any large-model inference (including VLM processing)

The two processors work together as a complementary pair rather than redundant backups. The Ambarella SoC's strength in camera processing and power efficiency complements Orin's strength in GPU-heavy compute.

Processing Power Evolution

From Gen 1 to Gen 6, onboard compute has grown substantially:

Metric | Gen 6 vs Gen 1
GPU processor cores | 2x
Processing speed | 1.6x
Memory | 3x
Bandwidth | 2.75x

From Gen 4 to Gen 5 alone: 130% GPU increase, 60% CPU increase, 50% power system reduction.

ACE and Perception

The ACE safety computer does not run the primary perception stack -- it operates independently with its own sensor inputs to execute fallback maneuvers when the primary system fails. However, it performs its own simplified perception to guide the truck safely to a stop:

  • The ACE has access to sensor data independently of the main compute
  • It can execute a controlled pullover without any input from the Kodiak Driver's main perception stack
  • This is analogous to a brainstem reaction -- fast, reflexive, and independent of higher-level processing

16. Lightweight Mapping Perception

What Perception Must Do When Maps Are Sparse

Because Kodiak uses sparse maps (kilobytes per mile vs. megabytes for HD maps), the perception system must handle many tasks that HD-map-dependent systems offload to the map:

Task | HD Map Approach | Kodiak Perception Approach
Lane boundary location | Read from map | Detect from camera + LiDAR in real-time
Number of lanes | Stored in map | Counted from perceived lane markings
Lane curvature | Pre-computed from map | Estimated from perceived lane geometry
Road edge detection | Map boundary | Perceived from curbs, barriers, road surface edges
Construction changes | Map update required (days/weeks) | Perceived in real-time; temporary map built on-the-fly
Speed limits | Stored in map (also in Kodiak's sparse maps) | Can also read speed limit signs
Overhead clearance | May be in map | 4D radar provides elevation; LiDAR detects structures

Real-Time Map Building

When the perception system detects that reality does not match the sparse map (e.g., a new construction zone has shifted lanes), it builds a temporary on-the-fly map:

  1. Discrepancy detection: Perceived lane markings, road edges, or traffic patterns conflict with stored map data
  2. Perception-primary navigation: The system switches to full reliance on real-time perception for lane-following and navigation
  3. Temporary map construction: A local map is built from perceived features -- enough for safe driving through the changed area
  4. Fleet propagation: The updated information is shared over the air to the fleet, so subsequent trucks benefit immediately

This capability is critical for route expansion. Kodiak can deploy trucks on new routes without requiring a specialized mapping vehicle to first create an HD map. The trucks themselves act as their own mapping fleet, with perception filling gaps and the sparse map being updated incrementally.
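A minimal sketch of the discrepancy check that triggers this behavior, assuming lane boundaries are represented as simple (x, y) polylines in meters; the 0.75 m threshold and the data shapes are illustrative placeholders, not Kodiak's values.

```python
# Hypothetical sketch of the discrepancy check that triggers on-the-fly mapping.
import math

def lateral_deviation(perceived_pts, map_pts):
    """Mean point-to-point distance between a perceived lane boundary and the
    corresponding sparse-map boundary (both lists of (x, y) in meters)."""
    n = min(len(perceived_pts), len(map_pts))
    return sum(math.dist(perceived_pts[i], map_pts[i]) for i in range(n)) / n

def update_map_source(perceived_pts, map_pts, threshold_m=0.75):
    """Decide whether to keep trusting the sparse map or fall back to a
    temporary map built purely from perception."""
    if lateral_deviation(perceived_pts, map_pts) > threshold_m:
        # Reality and the stored map disagree: trust the eyes, build a local map.
        return {"source": "on_the_fly", "lanes": perceived_pts}
    return {"source": "sparse_map", "lanes": map_pts}

if __name__ == "__main__":
    stored   = [(x, 3.5) for x in range(0, 50, 5)]        # map says lane at y = 3.5 m
    observed = [(x, 5.0) for x in range(0, 50, 5)]        # cones shifted the lane
    print(update_map_source(observed, stored)["source"])  # -> on_the_fly
```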

Localization Without HD Maps

Kodiak localizes the truck based on what sensors see relative to lane markings -- mimicking how human drivers localize on the road. This perception-based localization uses:

  • Camera lane detection: Primary source of lateral localization (distance from lane markings)
  • LiDAR feature detection: Road edges, barriers, and structural features for longitudinal and lateral reference
  • GPS: Used for coarse localization and route-level navigation (but not relied upon for centimeter-level precision)
  • Sparse map correlation: The perceived lane markings are correlated with the sparse map's lane boundary data for global localization

When GPS is degraded (tunnels, urban canyons, or GPS-denied military environments), the system can localize entirely from perception-based cues, demonstrating the robustness of the perception-over-priors approach.
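As an illustration of how such perception-based localization can weight its cues, here is a minimal scalar Kalman-style fusion of a sparse-map lateral prior with a camera-measured lane offset. The noise values and the GPS-degradation handling are assumptions made only for this sketch.

```python
# Minimal sketch: fuse a sparse-map lateral prior with a camera lane measurement.

def kalman_update(estimate, variance, measurement, meas_variance):
    """Standard scalar Kalman measurement update."""
    gain = variance / (variance + meas_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

def localize_lateral(map_offset_m, camera_offset_m, gps_ok=True):
    """Fuse map-derived and camera-derived lateral offsets from the lane center.
    When GPS is degraded, the map prior is weakened so perception dominates."""
    prior_var = 0.25 if gps_ok else 4.0   # meters^2; loose prior without GPS
    est, var = map_offset_m, prior_var
    est, var = kalman_update(est, var, camera_offset_m, meas_variance=0.04)
    return est, var

if __name__ == "__main__":
    print(localize_lateral(map_offset_m=0.0, camera_offset_m=0.4, gps_ok=True))
    print(localize_lateral(map_offset_m=0.0, camera_offset_m=0.4, gps_ok=False))
```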


17. Military/Defense Perception

Adapting Perception for Military Environments

Kodiak's military program uses the same Kodiak Driver software that powers commercial trucks, adapted for fundamentally different operating environments. The military perception challenge differs from highway trucking in nearly every dimension:

Aspect | Highway Trucking | Military Off-Road
Terrain | Paved highway, lane markings | Unstructured: dirt, rocks, mud, water, vegetation
GPS availability | Generally available | Often degraded or denied
Road markings | Lane lines, signs | None
Obstacles | Vehicles, debris, construction | Rocks, logs, ditches, water crossings, IEDs
Speed | 55-80 mph | Variable, often lower
Environment | Weather, but benign terrain | Hostile, dust, rough terrain
Maps | Sparse maps available | May have no usable maps

GPS-Denied Navigation

Military operations frequently occur in GPS-challenged or GPS-denied environments (electronic warfare, jamming, or simply remote areas with poor satellite coverage). The Kodiak Driver's perception-over-priors philosophy -- designed to function without relying on map priors -- naturally extends to GPS-denied operation:

  • The system can navigate using purely perception-based cues (terrain features, obstacles, path detection) without GPS
  • LiDAR-based localization using terrain matching replaces GPS-based localization
  • The sparse-map philosophy means the system is already designed to operate with minimal prior information

DefensePod Sensor Suite

Military vehicles use Kodiak's DefensePods -- hardened versions of the commercial SensorPods designed to withstand:

  • Extreme vibration from off-road and tracked vehicle operation
  • Dust and mud contamination in field conditions
  • Temperature extremes from desert to arctic environments
  • Rough handling and field maintenance

The DefensePods are described as "the industry's only hardened solution built to be reliable, serviceable, and maintainable on any ground mission," and they can be maintained with minimal training even in field conditions.

Off-Road Perception Challenges

Off-road military perception must handle challenges that simply do not exist on highways:

  • Traversability estimation: Determining whether terrain is drivable (soft sand, mud depth, water crossing depth, slope angle); a toy version of such a check is sketched after this list
  • Natural obstacle detection: Rocks, fallen trees, ditches, and other obstacles that have no standardized appearance
  • Dust clouds: Military vehicles (especially tracked vehicles like the RIPSAW) generate massive dust clouds that can blind the sensors of following vehicles. The self-cleaning system and the radar's dust-penetrating capability are critical.
  • No lane structure: Without lane markings, the system must navigate using terrain features, path planning, and waypoint following
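A minimal, purely illustrative traversability check over terrain cells: the features (slope, roughness, water depth) come from the challenges above, but the limits, field names, and data structure are hypothetical.

```python
# Illustrative traversability scoring for off-road terrain cells.
from dataclasses import dataclass

@dataclass
class TerrainCell:
    slope_deg: float      # local slope from LiDAR elevation
    roughness_m: float    # height variance within the cell
    water_depth_m: float  # estimated standing-water depth

def traversable(cell: TerrainCell,
                max_slope_deg: float = 25.0,
                max_roughness_m: float = 0.30,
                max_water_m: float = 0.60) -> bool:
    """Hard limits only; a real planner would also use a soft cost, omitted here."""
    return (cell.slope_deg <= max_slope_deg
            and cell.roughness_m <= max_roughness_m
            and cell.water_depth_m <= max_water_m)

if __name__ == "__main__":
    print(traversable(TerrainCell(12.0, 0.10, 0.0)))   # -> True
    print(traversable(TerrainCell(32.0, 0.10, 0.0)))   # -> False (too steep)
```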

Vehicle Platforms

The military perception system has been integrated onto:

  • Ford F-150: Upfitted with the Kodiak Driver for the initial military prototype (began testing at a U.S. military base in November 2023)
  • Textron Systems RIPSAW M3/M5: Tracked unmanned ground vehicle for the Army's Robotic Combat Vehicle program. The RIPSAW presents unique perception challenges due to its tracked drivetrain, combat weight (10.5 tons), and significantly different dynamics from a highway truck.

18. VLM (Vision Language Model) Integration

Overview

In fall 2025, Kodiak deployed Vision Language Models (VLMs) into the Kodiak Driver's production perception stack -- a significant advancement in autonomous vehicle perception. VLMs are a variant of large language models trained to process text, images, and video concurrently, enabling the autonomous system to reason about visual scenes using natural language understanding.

Technical Architecture

Aspect | Details
Inference location | Onboard, real-time, in the autonomous truck's perception stack
Query approach | Natural language prompts describing critical scenarios are queried during each perception cycle
Training methodology | Vision and language features are aligned in a joint embedding space by maximizing image-text pair similarity
Classification approach | Zero-shot classification of scenes through VLM inference
Hardware | Runs on the Ambarella CV3-AD SoC family (designed for efficient embedded real-time VLM inference)
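The image-text similarity mechanism in the table can be illustrated with an open-source CLIP model. This is a generic example of zero-shot scene classification, not Kodiak's VLM; the prompts and image path are placeholders.

```python
# Generic zero-shot scene classification via image-text similarity (CLIP-style).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "an empty highway lane",
    "an ambulance with flashing lights blocking a lane",
    "a construction zone with cones shifting traffic",
    "the camera lens is obstructed by dirt or water",
]

image = Image.open("front_camera_frame.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # similarity -> probabilities

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.2f}  {prompt}")
```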

Key Capabilities

Zero-Shot Prediction: The VLM can identify and make informed decisions about novel potential hazards that have never been seen before in real-world driving. This is a fundamental capability shift -- traditional perception systems can only detect object categories they were explicitly trained on. VLMs can reason about entirely new scenarios through their grounded understanding of the visual and physical world.

Emergency Vehicle Detection: VLMs identify first responders blocking traffic lanes without explicit training, recognizing these rare scenarios through built-in understanding of what emergency vehicles look like and their contextual meaning (e.g., "there is an ambulance blocking the right lane with flashing lights, likely responding to an accident -- vehicles should merge left").

Sensor Occlusion Detection: VLMs detect camera obstruction (e.g., dust, water, a person standing dangerously close to a camera) even without having a single sample of such scenarios in the training dataset.

Novel Object Understanding: By grounding vision with language features, the Kodiak Driver can understand contextual properties of objects. For example, it can understand that tree branches are not rigid objects -- allowing it to navigate through partially obstructing branches in off-road military scenarios without treating them as solid walls.

Edge Case Recognition: The VLM handles unusual situations that would be virtually impossible to train a traditional detector on due to their extreme rarity:

  • A house being towed on the highway
  • A person running across the highway at night wearing dark clothing
  • Unusual cargo configurations on other trucks
  • Debris of unusual shapes

Integration with Traditional Perception

VLMs do not replace the traditional multi-detector perception pipeline. They augment it as an additional layer of understanding:

  1. Traditional detectors (neural networks running on LiDAR, camera, and radar data) provide the primary object detection and tracking at 20 Hz
  2. VLMs provide higher-level scene understanding, edge case recognition, and zero-shot classification
  3. When the VLM detects a scenario that requires special attention, it can flag the situation for the planning system (a minimal sketch of this layering follows)
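As a schematic sketch of that layering, the following shows a VLM-derived flag attached alongside the normal 20 Hz tracker output; the data structures and flagging rule are assumptions, not Kodiak's interfaces.

```python
# Hypothetical sketch of layering a slower VLM scene check on top of the 20 Hz
# detector/tracker output. Names and the flagging rule are illustrative.
from dataclasses import dataclass, field

@dataclass
class PerceptionFrame:
    tracks: list                                      # objects from the 20 Hz detectors/tracker
    scene_flags: list = field(default_factory=list)   # VLM-level annotations

def augment_with_vlm(frame: PerceptionFrame, vlm_answer: str) -> PerceptionFrame:
    """Attach a VLM-derived scene flag so the planner can treat the situation
    specially (e.g., prepare a lane change away from a blocked lane)."""
    if "blocking" in vlm_answer or "obstructed" in vlm_answer:
        frame.scene_flags.append({"type": "vlm_alert", "detail": vlm_answer})
    return frame

if __name__ == "__main__":
    frame = PerceptionFrame(tracks=["car_12", "truck_7"])
    frame = augment_with_vlm(frame, "ambulance blocking the right lane")
    print(frame.scene_flags)  # planner sees the flag alongside normal tracks
```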

Impact

The VLM deployment contributed to "an overall reduction in the need for remote human support and improved their ability to safely scale their solution" -- a direct measure of perception effectiveness, since fewer remote operator interventions mean the system is handling more scenarios autonomously.


19. Auto-Labeling and Training Data Pipeline

Two-Partner Annotation Architecture

Kodiak uses two complementary partners for training data generation:

Partner | Primary Role | Key Capability
Kognic | Multi-sensor annotation platform | Time-series multi-sensor data labeling; AI flywheel
Scale AI | Synthetic data generation | Human-in-the-loop pedestrian and construction worker simulation

Kognic Partnership -- The AI Flywheel

Kognic provides the annotation platform for labeling real-world sensor data from Kodiak's fleet:

Multi-sensor annotation: Kognic's platform optimizes datasets by merging sensor data from radar, LiDAR, and cameras via intuitive interfaces for visualizing complex objects and sequences. This is not single-sensor labeling -- annotators work with synchronized multi-sensor views, labeling objects consistently across all modalities.

Time-series data handling: Kognic has a unique capacity to handle time-series data, enabling annotation of object behavior across consecutive frames. This temporal annotation is critical for training tracking models and behavior prediction networks -- an object is not just a static 3D box but a trajectory over time.

Pre-labeling and AI flywheel: Kodiak integrates its own pre-labeling models into Kognic's platform. The cycle works as follows:

  1. Kodiak's perception models generate initial labels (pre-labels) on new driving data
  2. Human annotators review, correct, and refine the pre-labels on Kognic's platform
  3. The corrected labels are used to train improved perception models
  4. The improved models generate better pre-labels on the next batch of data
  5. Repeat -- each cycle improves both the models and the efficiency of annotation

This creates a self-reinforcing AI flywheel where better models produce better training data, which produces even better models.
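In code form, the flywheel is just a loop in which the current model pre-labels, humans correct, and training consumes the corrections. The functions below are schematic stand-ins for Kodiak's models and Kognic's review workflow, not real APIs.

```python
# Schematic sketch of the pre-labeling flywheel described above.

def prelabel(model, raw_batch):
    """Run the current perception model to produce initial labels."""
    return [model(frame) for frame in raw_batch]

def flywheel_iteration(model, raw_batch, human_review, train):
    pre_labels = prelabel(model, raw_batch)          # 1. auto-label new data
    refined = human_review(raw_batch, pre_labels)    # 2. annotators correct on the platform
    improved_model = train(model, refined)           # 3. retrain on corrected labels
    return improved_model                            # 4. better pre-labels next cycle

if __name__ == "__main__":
    # Toy stand-ins: the "model" guesses a count, humans fix it, training adopts the fix.
    model = lambda frame: {"objects": 1}
    human_review = lambda frames, labels: [{"objects": 2} for _ in labels]
    train = lambda m, refined: (lambda frame: {"objects": refined[0]["objects"]})
    model = flywheel_iteration(model, ["frame_a", "frame_b"], human_review, train)
    print(model("frame_c"))  # -> {'objects': 2}
```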

Quality assurance: The partnership focuses on "high-quality data labeling capabilities" essential for "safe deployment of autonomous trucking technology in real-world environments."

Scale AI Partnership -- Synthetic Data

Scale AI addresses a critical gap in Kodiak's training data: rare but safety-critical scenarios where real-world examples are insufficient.

Problem: Highway driving data contains very few examples of pedestrians, construction workers, and other vulnerable road users. But the perception system must detect these objects with extremely high reliability.

Solution: Scale AI provides human-in-the-loop synthetic data generation:

  1. Scale generates synthetic data -- for example, simulated pedestrians placed in highway scenes
  2. Trained taskers validate the placement and poses of synthetic pedestrians to ensure realism
  3. The validated synthetic data is merged with real-world training data

Integration: Scale delivers data using the same dashboard and APIs as Kodiak's existing annotation pipeline (via Scale's Nucleus platform), making integration seamless. Kodiak consolidates multiple labeling projects and raw unlabeled data into a single dataset.

Specific use cases addressed by Scale Synthetic:

  • Construction worker detection scenarios
  • Vehicle behavior under bridges (complex 4D radar scenarios)
  • Edge cases identified via low IoU (Intersection over Union) scores where existing models underperform

Efficiency: Teams identify underperforming scenes through model metrics (e.g., low IoU scores) and curate targeted data subsets where additional synthetic examples are most beneficial -- a data-efficient approach to improving model performance.
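A minimal illustration of that curation step: axis-aligned IoU between predicted and ground-truth boxes, with scenes below a threshold selected for targeted synthetic data. The boxes and the 0.5 threshold are illustrative placeholders.

```python
# Minimal axis-aligned IoU and a low-IoU curation filter.
# Boxes are (x_min, y_min, x_max, y_max).

def iou(box_a, box_b) -> float:
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def curate_low_iou(scenes, threshold=0.5):
    """Keep scenes where the model's best box overlaps ground truth poorly."""
    return [s for s in scenes if iou(s["pred"], s["gt"]) < threshold]

if __name__ == "__main__":
    scenes = [
        {"id": "bridge_01", "pred": (0, 0, 10, 10), "gt": (20, 20, 30, 30)},  # miss
        {"id": "clear_02",  "pred": (0, 0, 10, 10), "gt": (1, 1, 11, 11)},    # good
    ]
    print([s["id"] for s in curate_low_iou(scenes)])  # -> ['bridge_01']
```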

Combined Pipeline

Real-world driving data
        |
        v
Kodiak pre-labeling models --> Initial auto-labels
        |
        v
Kognic platform --> Human review + correction --> Refined labels
        |
        v
Training pipeline --> Improved models
        |                    |
        v                    v
Better pre-labels     Scale AI synthetic data
                      (pedestrians, workers,
                       edge cases)
                            |
                            v
                     Combined training dataset
                            |
                            v
                    Next-generation models

20. ARM Safety Metric and Perception's Role

What ARM Measures

The Autonomy Readiness Measure (ARM) is Kodiak's proprietary metric: the percentage of claims and evidence in Kodiak's safety case for driverless operations that are materially complete. It is not a feature-development checklist; it is a claims-based testing framework spanning simulation, real-world driving, and track testing.

Milestone | ARM Score
November 2025 | 78%
February 2026 | 84%
Industrial applications (Permian Basin) | 100%
Long-haul target | 100% (H2 2026)

Reaching 100% on ARM indicates readiness to launch long-haul driverless operations on public highways. ARM completion and commercial launch are expected to occur in parallel.
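Because ARM is a completeness ratio, the arithmetic behind a score is simple. The claim count below is an assumption purely for illustration; Kodiak has not published how many claims the safety case contains.

```python
# Toy ARM calculation. The total claim count (500) is hypothetical.

def arm_score(complete_claims: int, total_claims: int) -> float:
    return 100.0 * complete_claims / total_claims

if __name__ == "__main__":
    total = 500                      # assumed, for illustration only
    print(arm_score(420, total))     # 84.0 -> the February 2026 level
    print(total - 420)               # 80 claims still to close to reach 100%
```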

How Perception Contributes to ARM

Perception is foundational to the safety case. Every safety claim about the Kodiak Driver's ability to detect, classify, and respond to objects on the road ultimately rests on the perception system's capabilities. The remaining work on ARM (from 84% to 100%) involves:

  1. Adding scenarios at higher speeds: Validating perception performance at the full range of highway speeds (up to 75-80 mph), where detection range requirements and reaction time constraints are most demanding
  2. Validating performance against expectations: Comparing perception system performance to the safety case claims across all object types, weather conditions, lighting conditions, and road configurations

Breakpoint -- AI-Driven Simulation for Perception Validation

Kodiak's proprietary simulation technology, Breakpoint, uses AI to test the Kodiak Driver against millions of scenario variations and to identify challenging edge cases so that engineering work can be prioritized.

Breakpoint's role in perception validation:

  • Generates millions of scenario variations that systematically stress-test perception capabilities
  • Identifies specific scenarios where the perception system struggles (edge cases)
  • Prioritizes engineering effort toward the highest-impact perception improvements
  • Contributes to the evidence base for ARM safety claims

Testing Infrastructure

Kodiak validates perception through a three-tier testing methodology:

  1. Simulation: Applied Intuition-based simulation generates synthetic sensor data that closely matches real-world sensor behavior. Perception is tested against a vast range of scenarios (including rare edge cases) without requiring real-world driving.

  2. Track testing: In February 2026, Kodiak began structured highway-speed testing at the American Center for Mobility Proving Ground in Southeast Michigan to evaluate perception performance in rare, controlled scenarios.

  3. Real-world driving: 3,000,000+ autonomous miles and 12,600+ loads provide a massive real-world dataset for validating perception performance. Over 10,700 revenue-generating driverless hours demonstrate sustained production-level perception performance.

Perception Safety Monitoring

The Kodiak Driver evaluates the performance of more than 1,000 safety-critical processes and components 10 times per second. This includes monitoring the perception system itself (a schematic health-check loop is sketched after the list below):

  • Sensor health (detecting degradation, failure, or contamination)
  • Detector performance (ensuring all detectors are producing outputs)
  • Tracker consistency (verifying the world model is coherent)
  • Cross-sensor agreement (checking that different sensor modalities are consistent)
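The sketch below is a schematic 10 Hz health aggregation across independent checks; the check names mirror the bullets above, but the pass/fail logic, field names, and rates are placeholders, not Kodiak's monitors.

```python
# Illustrative 10 Hz health aggregation across many independent checks.
import time

CHECKS = {
    "sensor_health":          lambda state: state["lidar_ok"] and state["cameras_ok"],
    "detector_output_fresh":  lambda state: state["detector_age_s"] < 0.1,
    "tracker_consistent":     lambda state: state["track_jumps"] == 0,
    "cross_sensor_agreement": lambda state: state["modal_disagreement"] < 0.2,
}

def evaluate(state: dict) -> list:
    """Return the names of failed checks; an empty list means nominal."""
    return [name for name, check in CHECKS.items() if not check(state)]

if __name__ == "__main__":
    state = {"lidar_ok": True, "cameras_ok": True, "detector_age_s": 0.05,
             "track_jumps": 0, "modal_disagreement": 0.05}
    for _ in range(3):            # stand-in for the 10 Hz monitoring cycle
        failed = evaluate(state)
        print("NOMINAL" if not failed else f"DEGRADED: {failed}")
        time.sleep(0.1)
```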

21. Key Patents

Patent Portfolio Overview

Metric | Value
Total patents globally | 21
Patents granted | 1
Active patents | ~95%
Unique patent families | 2
Primary filing jurisdiction | United States (17 patents)
Other jurisdictions | Europe, Australia, Canada (1 each)

Filing Timeline

Year | Applications Filed | Granted
2019 | 1 | 0
2020 | 5 | 1
2022 | 15 | 0

The significant increase in filing activity in 2022 corresponds to the period when Kodiak was developing the SensorPod architecture and advancing its perception technology from Gen 4 to Gen 5.

Patent Focus Areas

Kodiak's patent portfolio concentrates on sensor assembly technology for autonomous vehicles. Patent titles consistently reference "Sensor Assembly" variations incorporating LiDAR and radar technologies. The focus areas include:

  1. SensorPod design: The patent-pending modular sensor pod architecture, including the mirror-mount integration, pre-calibration system, and field-swappable design
  2. Sensor data processing: Methods for receiving and processing data from multiple sensors to identify objects and obstacles, determining multiple sets of attributes from individual sensors, and controlling the vehicle according to candidate trajectories
  3. Object detection and tracking: Methods and systems for detecting and tracking objects in the autonomous vehicle's environment
  4. Environment perception: Systems and methods for detecting portions of the vehicle's environment using multiple coupled sensors to generate environment data

Key Inventor

Don Burnette (CEO/Co-Founder) holds 11 patents within Kodiak's portfolio, reflecting his deep personal involvement in the company's core sensor technology IP. His patent work draws on over a decade of autonomous vehicle experience from Google's Self-Driving Car Project, Otto, and Uber ATG.

Strategic Significance

The patent portfolio's exclusive focus on sensor technology and sensor assembly reflects Kodiak's strategic view that the SensorPod architecture -- the modular, pre-calibrated, field-swappable sensor enclosure -- is the company's most defensible hardware innovation and the key enabler of its OEM-agnostic, scalable deployment model.


22. Perception Team

Leadership

Andreas Wendel -- Chief Technology Officer

Detail | Value
Role | CTO since February 2022
At Kodiak since | 2018 (founding engineer)
Previous roles at Kodiak | Vice President of Engineering
Education | B.S. and M.S. in Information and Computer Engineering; Ph.D. in Computer Science (Graz University of Technology, Austria)
Academic research | Founded the Aerial Vision Group at Graz University; research in image-based 3D reconstruction, localization, and autonomous navigation for micro aerial vehicles; visiting researcher at Carnegie Mellon's Robotics Institute
Waymo experience | Perception Tech Lead (2017-2018); designed Waymo's camera systems; enabled cars to see and understand the environment with robust ML algorithms
Google SDC | Software Engineer on Google's Self-Driving Car Project (2013-2016); part of the small team that launched the first driverless car on public roads

Wendel's deep background in computer vision, 3D reconstruction, and perception systems directly shaped the Kodiak Vision architecture. His experience designing Waymo's camera systems is reflected in Kodiak's sophisticated multi-camera perception pipeline.

Jamie Hoffacker -- Vice President of Hardware

Detail | Value
Role | VP of Hardware
Education | B.S. in Electrical Engineering (University of Florida)
Previous experience | Director of Sensors and Electronics at Lyft Level 5 (founding engineer); 5 years at Google Maps designing cameras, LiDAR, and electronics for the mapping fleet

Hoffacker's sensor hardware background from Google Maps and Lyft Level 5 is directly relevant to the SensorPod design and sensor integration.

Don Burnette -- CEO / Co-Founder

Detail | Value
Role | CEO since founding (2018)
Education | B.S. in Physics, Mathematics, and Electrical Engineering; M.S. in Physics (University of Florida); M.S. in Robotics (Carnegie Mellon University)
Google SDC | Software Technical Lead (2010-2016)
Otto | Co-founder of Ottomotto (first self-driving truck startup; acquired by Uber 2016)
Uber ATG | Software Technical Lead (2016-2018)
Patents | 11 patents in Kodiak's portfolio

Perception Engineering Team

Derek Phillips -- Head of Perception Engineering

  • Leads Kodiak's perception engineering team
  • Oversees the development of the perception stack including Kodiak Vision, detectors, and the Tracker

Felix Duvallet -- Software Engineer

Detail | Value
Research focus | Machine learning, planning and decision-making, imitation learning
Previous roles | Apple (Special Projects Group), EPFL, CMU Robotics Institute, Google's Self-Driving Car Project, CSIRO (Australia)
Google Scholar citations | 733+
Research areas | Robotics, machine learning, imitation learning, robot planning, human-robot interaction

Duvallet authored Kodiak's technical blog post on sparse maps, indicating his work spans the intersection of perception, mapping, and planning.

Ryan Lee -- ML & CV Engineer, Perception

Detail | Value
Education | M.S. in Robotic Systems (Carnegie Mellon University)
Team | Perception, under Derek Phillips

Cole Miles -- Software Engineer, Perception

Detail | Value
Focus | Deep learning for object detection
Notable work | Developed a variation of the DARTS neural architecture search algorithm specialized to object detection models

Team Composition and Hiring Focus

Based on job postings, Kodiak's perception team works across the following specializations:

  • Sensor fusion and state estimation: Kalman filtering, particle filtering, multi-object tracking, learned scene estimation
  • 3D object detection and classification: Deep neural networks for LiDAR, camera, and radar-based detection
  • Lane detection and road understanding: Camera-based lane boundary and road edge perception
  • Sensor calibration: Multi-sensor extrinsic and intrinsic calibration
  • Multi-modal fusion: Fusing information from camera, radar, LiDAR, thermal, and IMU modalities
  • Deep learning for computer vision: End-to-end learned approaches, VLM integration
  • Behavior prediction: Predicting future trajectories of detected objects

Technical requirements from postings emphasize:

  • Strong C++ implementation skills for real-time perception code
  • Experience with deep neural networks and production ML pipelines
  • Background in sensor fusion, Kalman/particle filtering, and state estimation
  • Experience with dataset collection, labeling, training, and validation workflows
  • Passion for real-world robotics problems with prior work on autonomous systems

Engineering Lineage

Kodiak's perception team draws from a concentrated set of world-class autonomous vehicle programs:

  • Google Self-Driving Car Project (predecessor to Waymo)
  • Waymo
  • Otto (first self-driving truck startup)
  • Uber ATG (Advanced Technologies Group)
  • Lyft Level 5
  • Apple Special Projects Group
  • Carnegie Mellon Robotics Institute
  • Graz University of Technology (computer vision)

This lineage means the perception team has collective experience from three of the most significant autonomous vehicle programs in history (Google SDC, Waymo, and Uber ATG), providing deep institutional knowledge of perception system design, failure modes, and production deployment.


Sources

Public research notes collected from public sources.