Production Perception Systems: What Actually Works in the Real World
Overview
This report analyzes perception systems deployed in production autonomous vehicles -- not research prototypes, but systems carrying real passengers and cargo on public roads, airport aprons, and managed sites. The goal is to extract concrete perception lessons that transfer across AV domains, with airside autonomy treated as one reference ODD rather than the default deployment lens.
Related deep dives: LiDAR Artifact Removal Techniques, Weather Robustness Datasets, and LiDAR Artifact Removal Validation.
1. Waymo Sensor Suite
5th Generation (Deployed 2020-2025)
The 5th-generation Waymo Driver established the template for production L4 perception:
| Sensor Type | Count | Key Details |
|---|---|---|
| Cameras | 29 | 16 on the roof structure plus additional forward-facing units; high dynamic range with thermal stability |
| LiDAR | 5 | 1x 360-degree rooftop (>300m range), 4x perimeter ("Honeycomb" Laser Bear) at front, rear, and over front tires |
| Radar | Multiple | Custom imaging radar with redesigned architecture and signal processing |
The perimeter LiDARs ("Honeycomb" / Laser Bear) are notable: 95-degree vertical FOV, 360-degree horizontal FOV, and zero minimum range -- meaning they can detect objects directly against the sensor housing. This eliminates the classic LiDAR blind spot directly around the vehicle, which is critical for urban driving near pedestrians and cyclists.
The 5th-gen long-range cameras and 360 vision system could identify pedestrians and stop signs at greater than 500 meters.
6th Generation (Deployed February 2026)
The 6th generation represents a major optimization: 42% fewer sensors while improving performance.
| Sensor Type | Count | Key Details |
|---|---|---|
| Cameras | 13 | Down from 29; uses next-gen 17-megapixel imager |
| LiDAR | 4 | Custom-designed chips and optics, built in California |
| Radar | 6 | Imaging radar creating dense temporal maps |
| Audio | Array | External Audio Receivers (EARs) -- microphone array |
Why this configuration works:
- 17MP camera imager: A generation ahead of other automotive cameras in resolution, dynamic range, and low-light sensitivity. The higher resolution let Waymo cut the camera count by more than half (29 to 13) while maintaining coverage.
- Custom LiDAR: Designed and fabricated in-house since 2011. The 6th-gen LiDAR specifically improves weather penetration and eliminates point cloud distortion near highly reflective signs (a known failure mode on freeways).
- Imaging radar: Creates dense temporal maps that track distance, velocity, and size in all lighting and weather conditions. Leverages affordable radar chipsets for cost reduction.
- Redundancy philosophy: Three complementary modalities (camera, LiDAR, radar) with overlapping fields of view up to 500 meters. No single sensor failure can blind the system.
- Modular cleaning: Waymo can swap sensor cleaning components per deployment city (e.g., different cleaning systems for cold climates vs. desert).
Cost: Waymo has significantly reduced per-vehicle sensor cost through custom silicon, strategic placement, and industry-wide LiDAR cost reduction (from $75K per unit in early 2010s to $1K-5K today).
Key Waymo Insight
Waymo has been doing autonomous driving longer than anyone and concluded they need all three sensing modalities. Their trajectory has been toward fewer, better sensors rather than more sensors -- but they have never dropped a modality entirely.
2. Cruise Sensor Suite
Configuration (Chevy Bolt-Based Vehicles)
| Sensor Type | Count | Key Details |
|---|---|---|
| Cameras | 16 | Video cameras distributed around vehicle |
| LiDAR | 5+ | Roof-mounted navigation package; Cruise acquired Strobe (LiDAR company) in 2017 |
| Radar | 21 | Extensive radar coverage |
Cruise used the densest sensor configuration of any production robotaxi -- 40+ sensors total, with 40% of hardware custom for self-driving. They believed LiDAR was essential for urban driving and pedestrian detection, leading to their acquisition of Strobe to bring LiDAR development in-house.
The October 2023 Incident and Lessons
On October 2, 2023, a pedestrian was struck by a human-driven vehicle and thrown into the path of a Cruise robotaxi. What followed exposed critical perception system gaps:
- Sensors worked correctly: The post-incident investigation found no issues with sensor hardware or maintenance. The sensors detected the initial collision event.
- Software misclassification: The perception software classified the impact as a "non-critical side collision" and triggered a "minimal risk condition" -- pulling to the curb rather than emergency stopping.
- Undercarriage blind spot: Once the pedestrian was beneath the vehicle, no sensors could detect their presence. The vehicle dragged the pedestrian 6 meters (20 feet) at 7 mph.
- Wheel speed anomaly as final signal: A wheel speed sensor detected one wheel spinning on the pedestrian's body, which finally halted the vehicle -- an unintended safety mechanism.
Lessons for the Industry
- Sensor placement must account for all scenarios: Roof-mounted sensors cannot see directly beneath the vehicle. The Cruise incident proved that undercarriage perception is not optional.
- Classification confidence thresholds matter enormously: The system's decision to "relocate" rather than "stop" after any unusual contact was a software policy failure, not a sensor failure.
- Organizational transparency: Cruise's delayed disclosure to regulators led to permit revocation and ultimately contributed to GM shutting down Cruise entirely in December 2024, writing off approximately $10 billion in investment.
- Production is a marathon: Regulators must be treated as partners. The industry learned that rushing deployment and hiding problems is existentially dangerous for an AV program.
3. Zoox Sensor Suite
Purpose-Built Vehicle Design
Zoox is unique among robotaxi companies: they designed their vehicle from scratch for autonomous driving, rather than retrofitting an existing car. This allowed optimal sensor placement.
| Sensor Type | Details |
|---|---|
| Cameras | Visual cameras at all four corners |
| LiDAR | Multiple Hesai LiDARs (partnership announced 2024) |
| Radar | Integrated at each corner pod |
| Infrared | Long-wave infrared (LWIR) cameras -- FLIR thermal imaging |
| Audio | Microphone arrays |
Key design features:
- Bidirectional symmetry: The vehicle has no "front" or "back." Two identical sensor configurations at each end, plus four-wheel steering, allow the vehicle to operate equally well in both directions.
- 270-degree FOV per corner: Each of the four corners achieves 270-degree coverage with overlapping camera, LiDAR, and radar. Combined, the vehicle achieves true 360-degree coverage with no blind spots.
- LWIR thermal cameras: Zoox is the only production robotaxi using long-wave infrared sensors. These detect heat signatures and work in complete darkness, fog, and smoke -- conditions where cameras and LiDAR degrade.
- 150-meter all-direction range: The sensor suite can perceive the environment 150 meters in every direction.
- Compute: Multiple NVIDIA GPUs process the full sensor suite in real time.
Why Purpose-Built Matters
By not retrofitting, Zoox avoids the compromises other companies face: sensors don't need to fit around existing body panels, mirror housings, or windshield structures. The symmetric design eliminates the "rear blind spot" problem that plagues every retrofitted vehicle. The inclusion of LWIR is a direct lesson -- thermal imaging catches what cameras and LiDAR miss (pedestrians in dark clothing at night, animals, overheating vehicle components).
4. Tesla Vision-Only Approach
Current Production System (Hardware 3, ~4M+ Vehicles)
| Component | Details |
|---|---|
| Cameras | 8 cameras, 1.2 megapixels each, 360-degree coverage up to 250 meters |
| Radar | Removed from production vehicles starting 2021 |
| Ultrasonic | Removed starting October 2022 |
| Compute | FSD Computer (HW3): custom SoC, 8 GB RAM, 64 GB storage |
Camera placement (HW3):
- 3x forward-facing (main, wide, narrow/telephoto)
- 2x side-facing (B-pillar, forward-looking)
- 2x rear-quarter (above rear wheel wells)
- 1x rear-facing (above license plate)
Hardware 4 (Rolling Out in New Vehicles)
| Component | Details |
|---|---|
| Cameras | Up to 12 camera ports; 5.4 megapixel sensors (2896x1876), 4.5x resolution increase |
| Radar | HD radar ("Phoenix") reintroduced |
| Compute | FSD Computer 2: Samsung 7nm SoC, 16 GB RAM, 256 GB storage |
HW4 adds new camera positions: B-pillar cameras that see sideways and slightly forward, C-pillar cameras that see sideways and slightly backward, and a wider front-facing field of view.
How Camera-Only Perception Works
Tesla's FSD v12 represents a fundamental architectural shift: replacing 300,000 lines of hand-coded rules with end-to-end neural networks.
Processing pipeline:
- Image rectification: Corrects for camera calibration distortions
- Feature extraction: A RegNet convolutional backbone processes images at multiple scales
- Transformer fusion: Multi-scale features are fused via transformer attention into a unified "Vector Space"
- Bird's Eye View (BEV) transformation: 2D camera images are transformed into a 3D BEV representation (the underlying geometry is sketched after this list)
- Occupancy Networks: 3D voxel grid determines which cells in space are occupied -- replacing traditional bounding-box detection with volumetric understanding
- End-to-end output: The neural network directly outputs steering, acceleration, and braking commands
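To make the BEV step concrete, here is a minimal numpy sketch of the camera-to-BEV geometry, assuming a pinhole model with made-up intrinsics (`K`) and a per-pixel depth the network has already estimated. Tesla's production networks learn this mapping end to end; the explicit version below only illustrates the coordinate transform.

```python
import numpy as np

# Minimal sketch: lift pixels with (network-estimated) depth into 3D,
# then rasterize into a bird's-eye-view occupancy grid. Production
# systems learn this mapping; here the geometry is explicit for clarity.

K = np.array([[1000.0, 0.0, 640.0],   # assumed pinhole intrinsics
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

def pixels_to_bev(u, v, depth, grid_size=200, cell_m=0.5):
    """u, v: pixel coordinates (N,); depth: per-pixel depth in meters (N,)."""
    pix = np.stack([u, v, np.ones_like(u)], axis=0)     # (3, N) homogeneous
    rays = np.linalg.inv(K) @ pix                       # unit-depth rays
    pts_cam = rays * depth                              # (3, N) in camera frame
    # Camera frame: x right, y down, z forward -> BEV: rows forward, cols lateral
    x_fwd, y_left = pts_cam[2], -pts_cam[0]
    grid = np.zeros((grid_size, grid_size), dtype=bool)
    i = (x_fwd / cell_m).astype(int)                    # row index: forward
    j = (y_left / cell_m).astype(int) + grid_size // 2  # col index: lateral
    ok = (i >= 0) & (i < grid_size) & (j >= 0) & (j < grid_size)
    grid[i[ok], j[ok]] = True                           # mark occupied cells
    return grid

# Example: one pixel near the principal point with 20 m estimated depth
bev = pixels_to_bev(np.array([640.0]), np.array([400.0]), np.array([20.0]))
print(bev.sum(), "occupied cell(s)")
```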
Key technical innovations:
- 48 neural networks running in concert
- Occupancy Networks running at 100+ FPS, classifying voxels as occupied/free, static/moving
- Self-supervised depth estimation from monocular cameras -- the network learns depth without LiDAR ground truth
- Imitation learning from 10 million driving video clips, trained on 70,000 GPU hours per cycle
Performance vs. LiDAR Approaches
Advantages of vision-only:
- Dramatically lower per-vehicle sensor cost (~$200-300 for all cameras vs. $5K-50K for LiDAR suites)
- Scales to millions of vehicles, creating a massive training-data advantage (4M+ vehicles collecting data)
- No mechanical LiDAR failure modes (motor bearing wear, laser degradation)
- Camera resolution improves rapidly with commodity semiconductor advances
Disadvantages vs. LiDAR:
- No direct distance measurement -- depth must be inferred, introducing uncertainty
- Degraded performance in direct sunlight, glare, and low-light conditions
- Cannot match LiDAR's centimeter-level range accuracy for nearby objects
- Tesla FSD remains SAE Level 2 (supervised) -- no production deployment without driver supervision
- Radar was reintroduced in HW4, suggesting pure vision has practical limits
Critical observation: No company operating fully driverless (L4) vehicles uses a camera-only approach. Tesla's massive data advantage has not yet produced a system that operates without human supervision. This is the strongest evidence that cameras alone may be insufficient for L4 autonomy.
5. comma.ai Sensor Suite
Hardware Configuration
| Generation | Cameras | Other Sensors | Processor | Price |
|---|---|---|---|---|
| comma 3X | 3 (OmniVision OX03C10) | Uses vehicle's existing radar | Snapdragon 845 | $1,250 |
| comma four | 3 (same triple-camera system) | Uses vehicle's existing radar | Snapdragon 845 MAX | ~$1,000 |
Camera layout:
- 2x forward-facing (road-facing stereo pair for depth perception)
- 1x interior-facing (driver monitoring)
The comma four mounts behind the rearview mirror and is one-fifth the size of the comma 3X (which itself fits in a small windshield-mounted unit).
How comma.ai Achieves Performance with Minimal Sensors
Supercombo model: A single neural network trained on 10+ million miles of real-world driving data from the comma.ai user fleet. Unlike traditional AV stacks with separate perception, prediction, and planning modules, openpilot uses a system-level end-to-end design that predicts the car's trajectory directly from camera images.
Radar fusion: When the host vehicle has radar (many newer cars do), openpilot fuses neural network estimates with radar data for lead vehicle detection, improving longitudinal (speed) control.
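A minimal sketch of why this fusion helps, assuming Gaussian range estimates: a vision-derived lead distance (whose uncertainty grows with range) is combined with a radar return via inverse-variance weighting. openpilot's actual fusion logic differs; the function name and numbers below are illustrative.

```python
import numpy as np

# Hedged sketch: inverse-variance fusion of a vision distance estimate
# (uncertain at range) with a radar return (accurate range). Shows the
# idea only; not openpilot's actual implementation.

def fuse_lead_distance(d_vision, sigma_vision, d_radar, sigma_radar):
    w_v = 1.0 / sigma_vision**2
    w_r = 1.0 / sigma_radar**2
    d = (w_v * d_vision + w_r * d_radar) / (w_v + w_r)
    sigma = (w_v + w_r) ** -0.5
    return d, sigma

# Vision says 48 m +/- 5 m; radar says 45.2 m +/- 0.5 m
d, s = fuse_lead_distance(48.0, 5.0, 45.2, 0.5)
print(f"fused: {d:.1f} m +/- {s:.2f} m")  # dominated by radar, as expected
```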
Key insight: comma.ai proves that a well-trained neural network with a few commodity cameras can achieve L2 driving assistance comparable to manufacturer ADAS systems. However, like Tesla, this is supervised L2 -- the driver must remain attentive. The lesson is not that minimal sensors suffice for L4, but that sensor count is less important than software quality for L2.
Supported Vehicles
openpilot supports 300+ vehicle models, leveraging each vehicle's existing CAN bus interface and (when available) factory-installed radar.
6. TractEasy / reference airside AV stack -- Production Airside Vehicles
TractEasy (EasyMile)
TractEasy is the most widely deployed autonomous airside tow tractor, operating at airports including Changi (Singapore), Kansai (Japan), and Frankfurt (Lufthansa Cargo).
| Sensor Type | Details |
|---|---|
| LiDAR | Multiple units (likely Velodyne, based on CES 2022 partnership); 3D mapping |
| Cameras | HD stereo cameras |
| Radar | Integrated |
| GPS/GNSS | Satellite navigation with RTK correction |
| IMU | Inertial measurement unit for dead reckoning |
| Wheel encoders | Odometry |
| V2X | Vehicle-to-infrastructure communication (on-board units) |
| Connectivity | 3G/4G modem, Wi-Fi |
Key specifications:
- Load capacity: 25 tons
- Localization: centimeter-precise at all times (accuracy within 5 centimeters)
- Perception: wide-range obstacle detection with predictive control
- V2X: Communication at intersections and pedestrian crossings
Operational approach: TractEasy operates on pre-mapped routes with centimeter-level localization. The LiDAR and cameras provide obstacle detection, while GPS/IMU/encoders provide localization. V2X communication adds awareness of infrastructure signals (gates, traffic lights, aircraft movement areas).
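A toy sketch of the simplification pre-mapping buys: with a prior occupancy map, live perception reduces to differencing against the prior, and anything occupied now but free in the map becomes an obstacle candidate. The grids and threshold below are hypothetical, not TractEasy's software.

```python
import numpy as np

# Change detection against a prior map: keep only cells occupied in the
# live scan but free in the map. Hypothetical grids and noise threshold.

def detect_deviations(prior_map, live_grid, min_cells=3):
    """prior_map, live_grid: boolean occupancy grids of equal shape."""
    new_occupied = live_grid & ~prior_map   # occupied now, free in the map
    if new_occupied.sum() < min_cells:      # suppress isolated noise cells
        return np.zeros_like(new_occupied)
    return new_occupied

prior = np.zeros((100, 100), dtype=bool)
prior[0, :] = True                          # mapped wall
live = prior.copy()
live[50, 48:52] = True                      # a cart left on the apron
obstacles = detect_deviations(prior, live)
print(obstacles.sum(), "deviating cells")   # -> 4
```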
reference airside AV stack (autonomous baggage/cargo tug, autonomous cargo vehicle)
reference airside AV stack is deploying autonomous baggage and cargo vehicles at airports including Heathrow, Schiphol, and multiple global trial sites.
| Sensor Type | Details |
|---|---|
| LiDAR | Multiple units on A-frame structure above cargo platform |
| Cameras | 360-degree cameras |
| GPS/GNSS | High-precision positioning |
| IMU | Inertial measurement unit |
Key operational features:
- Navigates pre-programmed mapped airport areas
- New rainfall algorithm differentiates raindrops from obstacles (critical for LiDAR in rain)
- Operates in up to 50mm/hour precipitation with specialized sensor housing
- Custom weather-hardened LiDAR enclosures protect sensors in intense conditions
Lessons from Airside Deployments
- Pre-mapping is standard: Both TractEasy and reference airside AV stack operate on pre-mapped routes, not in unknown environments. This dramatically simplifies the perception problem -- the vehicle knows what the world should look like and only needs to detect deviations (obstacles, vehicles, people).
- Weather is the primary challenge: Both companies have invested heavily in wet-weather algorithms. reference airside AV stack specifically developed new software to distinguish rain from obstacles in LiDAR data.
- Lower speeds reduce sensor requirements: Airside vehicles typically operate at 10-25 km/h, requiring much shorter perception ranges than highway vehicles. This means less expensive, shorter-range sensors can suffice.
- V2X is critical for airside: Unlike road vehicles, airside vehicles can leverage infrastructure communication for coordination at intersections, gate areas, and aircraft stands.
- Testing has been extensive: hundreds of hours across varied weather conditions without weather-related failures, though extreme conditions remain challenging.
7. LiDAR in Production: Brands, Deployments, and Reliability
Which LiDAR Brands Are Actually Deployed
| Company | Technology | Key Customers | Status |
|---|---|---|---|
| Hesai | Hybrid solid-state (AT128); mechanical (OT128) | Zoox, Baidu Apollo, EasyMile, SAIC, plus 9 of top 10 L4 companies | Market leader; 380K+ AT128 units shipped by Q2 2024 |
| Waymo (in-house) | Custom (Laser Bear, Honeycomb, 6th-gen) | Waymo only | In-house since 2011; not sold externally since 2021 |
| Luminar | 1550nm Iris | Volvo EX90 (standard), Volvo ES90, Polestar 3 | First automotive-grade LiDAR in mass production passenger car (2024) |
| RoboSense | MEMS solid-state (M1, M1 Plus) | Chinese OEMs, North American automakers | ISO 26262 ASIL-B certified; sub-100ms fault detection |
| Ouster (merged with Velodyne) | Digital spinning (OS0, OS1, OS2) | Robotics, mapping, industrial; some AV programs | IP68/IP69K rated; passed GM automotive reliability standards |
| Innoviz | Solid-state | BMW, Volkswagen | Automotive-grade solid-state |
| Aeva | FMCW (4D: position + velocity) | Daimler Truck/Torc (production 2026-2027), May Mobility | First automotive-grade FMCW LiDAR; 500m range, 4M points/sec |
Cost Evolution
| Period | Cost per Unit | Notes |
|---|---|---|
| Early 2010s | $75,000 - $100,000 | Velodyne HDL-64E era |
| Mid-2010s | $10,000 - $20,000 | Solid-state emergence |
| 2020-2025 | $1,000 - $5,000 | Automotive-grade viability; Luminar targeting $500-1,000 |
| 2026-2030 (projected) | $300 - $500 | With weather hardening improvements |
| Post-2030 (projected) | $200 - $300 | At mass production scale |
Reliability Data
Ouster OS1:
- IP68 and IP69K rated (submersible and high-pressure wash resistant)
- Passed GMW3172 (General Motors automotive reliability standard)
- Extreme shock and vibration tested
- Specific MTBF figures not publicly disclosed
RoboSense M1:
- ISO 26262 compliant (ASIL-B, SIL-2)
- 97% diagnostic coverage with sub-100ms fault self-detection
- Power consumption under 15W
- Physical size: 108mm x 110mm x 45mm (world's thinnest automotive LiDAR at launch)
Hesai AT128:
- 380,000+ units shipped without major reliability issues reported
- "Quality and reliability have been highly praised by automotive OEMs"
- Market-proven in production vehicles
Luminar Iris:
- 1550nm wavelength (eye-safe at higher power = longer range)
- 600-meter max range, 120-degree FOV, 26-degree dynamic vertical FOV
- First truly mass-production passenger car LiDAR (Volvo EX90)
- Vehicles with Luminar outperform camera+radar vehicles by ~40% in safety studies
FMCW LiDAR: The Next Generation
Aeva's FMCW (Frequency Modulated Continuous Wave) technology represents a paradigm shift:
- Measures instant velocity per point (not just position) -- the "4th D" in 4D LiDAR
- Uses a continuous low-power laser beam, measuring the Doppler frequency shift (a worked example follows this list)
- Up to 4 million points per second at 500-meter range
- Detects pedestrians beyond 350 meters, ground hazards beyond 200 meters
- Better rain/fog performance than time-of-flight LiDAR due to continuous wave nature
- Production deployment: Daimler Truck autonomous Freightliner Cascadia (2026-2027), May Mobility Toyota Sienna (through 2028)
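The per-point velocity follows directly from the Doppler relation v = f_d * lambda / 2. A worked example with illustrative numbers (not published Aeva parameters):

```python
# FMCW per-point velocity from the Doppler shift: v = f_d * wavelength / 2.
# Illustrative numbers only; not published Aeva parameters.

wavelength = 1550e-9        # m (1550 nm FMCW laser)
f_doppler = 2.58e6          # Hz, measured frequency shift for one point

v_radial = f_doppler * wavelength / 2.0
print(f"radial velocity: {v_radial:.2f} m/s")   # ~2.0 m/s, e.g. a walking person
```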
Key LiDAR Finding
No known fully driverless fleet operates without LiDAR. This is the single most important data point. Waymo, Zoox, Baidu Apollo, Cruise (when operational) -- all use LiDAR. The technology is proven for L4 autonomy. The question is no longer "does LiDAR work?" but "which LiDAR is most reliable and cost-effective for the specific deployment environment?"
8. 4D Imaging Radar in Production
Continental ARS548 RDI
The most widely referenced 4D imaging radar in production/near-production:
| Specification | Value |
|---|---|
| Frequency | 77 GHz |
| Maximum detection range | Up to 300m (extended configurations to 1500m) |
| Minimum range | 0.2 m |
| Scan rate | 20 Hz (20 scans per second) |
| Output modes | Detection (point cloud, similar to LiDAR) and Object (classified targets) |
| Object classification | 120+ cluster classifications |
| Interface | BroadR-Reach Ethernet, 100 Mbit/s |
| Modulation | Pulse compression with advanced frequency modulation |
| Measurements | Range, azimuth, elevation, and velocity simultaneously |
Continental has produced more than 200 million radar sensors overall; in Q1 2024 alone it secured 4D imaging radar orders worth approximately 1.5 billion euros, with production rollouts scheduled for 2026-2027.
ZF Imaging Radar
| Specification | Value |
|---|---|
| Channels | 192 (16x more resolution than typical automotive radar) |
| Frequency | 77 GHz FMCW |
| Aperture angle | 120 degrees |
| Range | 350 meters |
| Dimensions | 4D: range, velocity, azimuth, elevation |
ZF's imaging radar is in production with SAIC Motor's R-Series vehicles in China -- one of the first series-production deployments of 4D imaging radar.
What 4D Radar Adds
Traditional automotive radar provides range and velocity but poor angular resolution (cannot distinguish between adjacent objects). 4D imaging radar adds:
- Elevation measurement: Can distinguish between an overpass and a vehicle ahead, or between a speed bump and an object to avoid -- a classic failure mode of traditional radar.
- Point cloud output: Generates LiDAR-like point clouds, enabling 3D object detection algorithms to run on radar data.
- All-weather performance: Unlike LiDAR and cameras, radar penetrates rain, fog, snow, and dust with minimal degradation. Detection range reduction in severe rain is ~45%, compared to 56%+ for LiDAR.
- Instant velocity: Every point includes a Doppler velocity measurement, making moving-object detection trivial (see the sketch after this list).
- Low cost: Automotive radar has already achieved mass-production economics at ~$50-200 per unit.
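A short sketch of the instant-velocity point above: with per-point Doppler, moving objects separate from the static background by subtracting the radial component of the ego velocity. The interface below is hypothetical; real radar drivers expose similar per-point fields.

```python
import numpy as np

# Split a 4D-radar point cloud into moving vs. static points by removing
# the ego-motion component of each point's Doppler velocity. Hypothetical
# interface and threshold; the principle is standard radar processing.

def split_moving(points_xyz, doppler, ego_speed, v_thresh=0.5):
    """points_xyz: (N, 3) in ego frame; doppler: (N,) radial velocity (m/s)."""
    rng = np.linalg.norm(points_xyz, axis=1)
    # Expected radial velocity of a static point while ego moves along +x
    expected = -ego_speed * points_xyz[:, 0] / rng
    residual = doppler - expected       # velocity not explained by ego motion
    return np.abs(residual) > v_thresh  # True = genuinely moving object

pts = np.array([[20.0, 0.0, 0.0], [15.0, 5.0, 0.0]])
dop = np.array([-13.9, -10.0])          # measured radial velocities
print(split_moving(pts, dop, ego_speed=13.9))   # [False  True]
```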
4D Radar Market Trajectory
By 2025, 4D radar had reached approximately 11.4% of the automotive radar market, transitioning from niche to mainstream. This is significant for airside vehicles: 4D radar could be the most cost-effective way to add robust all-weather perception as a complement to LiDAR and cameras.
9. Sensor Cleaning Systems
Why Cleaning Matters
A dirty sensor is a blind sensor. In production, sensor cleaning is not optional -- it is a safety-critical system. Every production L4 vehicle has integrated cleaning mechanisms.
Waymo's Approach
Waymo uses a multi-layered cleaning strategy:
- Passive aerodynamic deflection: Small particles (fog droplets, fine dust) are deflected away from sensor apertures through aerodynamic body design. Waymo conducts computational fluid dynamics (CFD) simulations to optimize airflow around sensors before deployment.
- Active wipers: Mechanical wipers for camera lenses and LiDAR windows, similar to miniature windshield wipers.
- Air puffers: Compressed air jets blast debris off sensor surfaces, effective for mud, spray, and insects.
- Bandpass filters: Cameras are placed behind glass with optical bandpass filters that reject certain wavelengths of stray light.
- Heating elements: Keep sensors clear of ice and condensation in cold climates.
- Adaptive cleaning frequency: The Waymo Driver's LiDAR data helps it gauge weather conditions (fog density, precipitation level) and adjust cleaning frequency automatically (a policy sketch follows this list).
- City-specific configurations: Waymo swaps cleaning components based on deployment city -- different systems for Phoenix (dust) vs. San Francisco (fog/rain) vs. cold-weather cities (ice/snow).
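A sketch of what such an adaptive policy can look like: map the LiDAR-estimated precipitation rate to a wash interval. The thresholds below are illustrative assumptions, not Waymo's parameters.

```python
# Adaptive cleaning policy sketch: shorten the wash interval as estimated
# precipitation rises. Thresholds are illustrative, not Waymo's values.

def cleaning_interval_s(precip_mm_h: float) -> float:
    if precip_mm_h < 1.0:
        return 600.0    # dry: wash every 10 minutes
    if precip_mm_h < 10.0:
        return 120.0    # light rain: every 2 minutes
    return 30.0         # heavy rain / spray: every 30 seconds

for rate in (0.0, 5.0, 25.0):
    print(f"{rate} mm/h -> wash every {cleaning_interval_s(rate):.0f} s")
```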
Industry-Wide Cleaning Technologies
The sensor cleaning industry has grown to 230+ companies. Key technologies include:
| Technology | How It Works | Best For |
|---|---|---|
| Telescopic nozzles | Extend, spray fluid, retract | Camera lenses, LiDAR windows |
| Ring nozzles | Surround sensor with spray ring | Cylindrical LiDAR housings |
| Mini wipers | Curvature-adaptive mechanical wipe | Flat and curved camera surfaces |
| Air jets / puffers | Compressed air blast | Dust, light debris, drying after wash |
| Ultra-high-pressure pumps | High-pressure fluid jets; equal cleanliness in half the time with 30% less fluid | Production-scale cleaning efficiency |
| Defrosters / heaters | Resistive heating elements | Ice, frost, condensation |
| Hydrophobic coatings | Self-cleaning nano-coatings | Reducing cleaning frequency |
| Rotating covers | Physical covers rotate away debris | Heavy contamination environments |
Key supplier: Tensor (now a major AV sensor cleaning specialist) equips autonomous driving cameras and LiDARs with integrated mini wipers, retractable nozzles, and defrosters as standard.
Implications for Airside
Airport environments present unique cleaning challenges: jet exhaust residue, de-icing fluid overspray, rubber tire dust, hydraulic fluid mist, and bird droppings. Any airside AV must have robust sensor cleaning -- passive aerodynamic design plus active washing and heating. This is a non-negotiable production requirement.
10. Sensor Calibration in Production
The Three Types of Calibration
| Type | What It Calibrates | Stability |
|---|---|---|
| Intrinsic | Internal sensor parameters (focal length, distortion coefficients, beam patterns) | Mostly stable; auto-calibrated by modern sensors |
| Extrinsic | Spatial relationship between sensors (rotation, translation relative to vehicle body) | Drifts due to vibration, thermal cycling, impacts |
| Temporal | Time synchronization between sensors operating at different rates | Critical for fusion; typically handled by hardware sync signals |
Why Extrinsic Calibration Drifts
Production vehicles experience continuous forces that shift sensor positions:
- Mechanical vibration: Road surface, engine/motor vibration, curb impacts
- Thermal expansion: Metal mounting brackets expand/contract with temperature (can be 20-30 degree C swing over a day)
- Minor impacts: Speed bumps, potholes, minor contact events
- Long-term fatigue: Bracket fasteners gradually loosen over thousands of operating hours
Even a fraction of a degree of rotation error in a camera-to-LiDAR extrinsic calibration can cause meter-level projection errors at 100 meters -- enough to misplace a pedestrian into the wrong lane.
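That claim is plain geometry: a rotation error of Δθ displaces a projected point laterally by roughly range × tan(Δθ). A quick check:

```python
import math

# Lateral projection error caused by a rotation error in the
# camera-to-LiDAR extrinsics: error ~= range * tan(dtheta).

for dtheta_deg in (0.1, 0.3, 0.5, 1.0):
    err = 100.0 * math.tan(math.radians(dtheta_deg))  # at 100 m range
    print(f"{dtheta_deg:.1f} deg -> {err:.2f} m at 100 m")
# 0.5 deg already misplaces a point by ~0.87 m -- a lane-level error.
```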
Production Calibration Strategies
Factory calibration: Initial calibration in a controlled environment using target boards, calibration patterns, and precise fixtures. Establishes the baseline extrinsic parameters.
Service calibration: Periodic re-calibration at service centers using target-based methods. Most OEMs want to minimize this -- customers don't want to bring vehicles in for recalibration.
Online (runtime) calibration: The holy grail. Systems that continuously monitor and correct extrinsic drift during normal operation:
- CalibNet and similar approaches: Neural networks that estimate extrinsic corrections from photometric and geometric consistency between camera and LiDAR data
- Weighted moving average with outlier rejection: Smoothly updates calibration parameters while rejecting transient errors (sketched after this list)
- Motion-based self-calibration: Uses vehicle ego-motion (from IMU/GPS) to cross-validate sensor alignment
- Feature matching: Detects natural features (lane markings, signs, building edges) visible to both cameras and LiDAR; if they don't align, calibration has drifted
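A minimal sketch of the weighted-moving-average strategy, applied to a single extrinsic parameter (yaw): each per-frame estimate is blended into the running value, and measurements that jump more than a gate threshold are rejected as transients. The gain and gate values are illustrative.

```python
import numpy as np

# Exponential moving average with outlier gating for one extrinsic
# parameter. Illustrative gain/gate; production filters track the full
# 6-DOF transform with covariance-based gating.

def update_yaw(current_deg, measured_deg, alpha=0.02, gate=0.5):
    if abs(measured_deg - current_deg) > gate:
        return current_deg                    # reject transient outlier
    return (1 - alpha) * current_deg + alpha * measured_deg

yaw = 0.0                                     # factory baseline
rng = np.random.default_rng(0)
for _ in range(500):                          # true yaw has drifted to 0.20 deg
    measurement = 0.20 + rng.normal(0.0, 0.05)
    yaw = update_yaw(yaw, measurement)
print(f"estimated drift: {yaw:.3f} deg")      # converges near 0.20
```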
Production Reality
- Intrinsic calibration: Largely solved. Modern LiDARs and cameras auto-compensate for day-to-day intrinsic variation.
- Extrinsic calibration: Remains a legitimate industry concern. Online calibration works but is not yet drift-free in all conditions. Production systems use a combination of factory baseline + runtime correction + periodic service checks.
- Design mitigation: Better sensor housings that dampen vibration, thermally stable mounting materials, and rigid mounting structures reduce the drift that calibration must compensate for.
Temporal Synchronization
Camera frames and LiDAR scans must be precisely time-aligned for fusion. In production:
- Hardware PPS (pulse-per-second) signals from GPS provide a common time reference
- LiDAR scan timestamps are matched to the nearest camera frame
- Misalignment of even 10ms at 60 mph means objects have moved ~27cm between the camera and LiDAR views -- enough to cause fusion artifacts
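A short sketch of nearest-frame matching plus the displacement arithmetic above; the timestamps, rates, and speed are illustrative.

```python
import numpy as np

# Match each LiDAR scan timestamp to the nearest camera frame and report
# the residual skew as motion-induced displacement at a given speed.

def match_nearest(lidar_ts, cam_ts, speed_mps=26.8):   # 60 mph ~= 26.8 m/s
    cam_ts = np.asarray(cam_ts)
    matches = []
    for t in lidar_ts:
        i = int(np.argmin(np.abs(cam_ts - t)))
        skew = abs(cam_ts[i] - t)
        matches.append((i, skew, skew * speed_mps))    # frame, skew (s), drift (m)
    return matches

cam = np.arange(0.0, 1.0, 1 / 30)                      # 30 Hz camera frames
lidar = [0.105]                                        # one 10 Hz scan timestamp
for frame, skew, drift in match_nearest(lidar, cam):
    print(f"frame {frame}: skew {skew*1e3:.1f} ms -> {drift*100:.0f} cm at 60 mph")
```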
11. Perception Pipeline in Production
Typical Production Architecture
```
Raw Sensor Data (Camera / LiDAR / Radar)
        |
        v
[Sensor Preprocessing]                (10-50ms)
  - Image rectification, point cloud filtering, radar clustering
        |
        v
[Object Detection & Classification]   (30-100ms)
  - 2D/3D bounding boxes, semantic labels
  - BEV feature extraction
        |
        v
[Sensor Fusion]                       (15-45ms)
  - Multi-modal feature fusion (camera + LiDAR + radar)
  - BEV-space alignment
        |
        v
[Multi-Object Tracking]               (5-15ms)
  - Track-by-detection, data association
  - Track lifecycle management (creation, update, deletion)
        |
        v
[Trajectory Prediction]               (20-60ms)
  - Future trajectory forecasting for all tracked agents
  - Multi-modal predictions (multiple possible futures)
        |
        v
[Planning Interface]                  (5-10ms)
  - Perceived world state handed to motion planner
```

Total perception latency budget: 85-280ms, with production systems targeting the lower end.
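Of these stages, multi-object tracking is the most mechanical to illustrate. Below is a minimal sketch of track-by-detection with greedy IoU association and simple lifecycle handling; production systems use richer costs (appearance, velocity gating) and optimal assignment solvers.

```python
# Greedy IoU data association for track-by-detection. Illustrative only;
# production trackers use Hungarian assignment and multi-term costs.

def iou(a, b):  # boxes as (x1, y1, x2, y2) in BEV meters
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, thresh=0.3):
    unmatched = list(range(len(detections)))
    for tid, tbox in tracks.items():
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(tbox, detections[d]))
        if iou(tbox, detections[best]) >= thresh:
            tracks[tid] = detections[best]          # update matched track
            unmatched.remove(best)
    for d in unmatched:                             # lifecycle: create new tracks
        tracks[max(tracks, default=0) + 1] = detections[d]
    return tracks

tracks = {1: (0.0, 0.0, 2.0, 4.0)}
dets = [(0.2, 0.1, 2.2, 4.1), (10.0, 10.0, 12.0, 14.0)]
print(associate(tracks, dets))                      # track 1 updated, track 2 created
```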
What Actually Runs in Production
Waymo:
- Custom multi-task neural networks; most run in under 10ms inference time
- Published research using PointPillars-like architectures for 3D detection from LiDAR
- Pillar-based methods achieve 62-100 Hz in 3D detection (10-16ms per frame)
- Multi-sensor early and late fusion with camera-LiDAR-radar
- Separate prediction module forecasts actor trajectories, informing the planner
- HD map integration provides prior road knowledge
Tesla (FSD v12):
- End-to-end neural network replacing 300K lines of code
- 48 neural networks running in concert
- Occupancy Networks for volumetric scene understanding (100+ FPS)
- BEV Transformer architecture fusing 8 camera views
- Direct output of control commands (steering, acceleration, braking)
- No explicit separate tracking or prediction modules -- the network learns these implicitly
- Training: 10M video clips, 70,000 GPU hours per training cycle, 1.5 PB of data
Industry trend -- BEV + Transformer: The dominant production architecture is converging on:
- Extract features from each sensor modality
- Project all features into a common BEV (Bird's Eye View) representation
- Use transformer attention to fuse features across modalities and time
- Run detection, tracking, and prediction heads on the unified BEV features
This architecture (exemplified by BEVFormer, BEVFusion, TransFusion) unifies perception into a single representation, eliminating the cascading errors of traditional pipeline approaches.
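A toy illustration of the shared-BEV idea: once every modality is rasterized into the same grid, fusion can be as simple as channel concatenation feeding shared task heads. Real systems use learned encoders and attention; the shapes below are arbitrary.

```python
import numpy as np

# Toy BEVFusion-style step: per-modality BEV feature maps share one grid,
# so they can be fused along the channel axis. Shapes are arbitrary.

H = W = 128                                   # BEV grid cells
cam_bev = np.random.rand(32, H, W)            # camera BEV features
lidar_bev = np.random.rand(64, H, W)          # LiDAR BEV features
radar_bev = np.random.rand(8, H, W)           # radar BEV features

fused = np.concatenate([cam_bev, lidar_bev, radar_bev], axis=0)
print(fused.shape)                            # (104, 128, 128)
# Detection, tracking, and prediction heads all consume this one tensor,
# avoiding the cascading hand-offs of a modular pipeline.
```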
Compute Requirements
Production perception requires a minimum of approximately 100 TOPS (Trillion Operations Per Second) of computing capacity. Emergency braking must execute within 100ms from detection to action at highway speeds.
12. Production Failure Modes
Camera Failure Modes
| Failure Mode | Cause | Impact | Mitigation |
|---|---|---|---|
| Glare / saturation | Direct sunlight, oncoming headlights | Complete loss of scene information in affected regions | HDR imaging, multiple exposure fusion |
| Rain / water droplets | Precipitation on lens | Blurred, distorted images | Wipers, hydrophobic coatings, air jets |
| Fog | Atmospheric scattering | Reduced contrast and range | Image dehazing algorithms, sensor fusion fallback |
| Condensation | Temperature differential (cold exterior, warm vehicle) | Fogged lens | Heaters, defoggers |
| Low light | Night, tunnels, heavy overcast | Noise, reduced detection range | High-sensitivity imagers (17MP Waymo), near-IR illumination |
| Dirty lens | Road spray, dust, insects | Partial or complete occlusion | Active cleaning systems |
LiDAR Failure Modes
| Failure Mode | Cause | Impact | Mitigation |
|---|---|---|---|
| Rain attenuation | Raindrops scatter/absorb laser pulses | 56% point-count (NPC) reduction at 30-40 mm/h; non-detection beyond 30m for some materials | Multi-return filtering, FMCW LiDAR |
| Fog attenuation | Atmospheric scattering | 59% NPC reduction at visibility <= 50m | 1550nm wavelength (less fog scatter than 905nm), FMCW |
| Reflective surfaces | Mirrors, polished metal, glass facades | Complete reflection = no range data; phantom points | DBSCAN clustering to identify mirror reflections |
| Snow accumulation | Snow on sensor window | Gradual occlusion | Heating elements, wipers |
| Road spray | Following vehicles at speed | Dense false returns near vehicle | Waymo 6th-gen specific improvements for freeway spray |
| Solar interference | Direct sunlight at 905nm | False detections | 1550nm wavelength avoidance, bandpass filtering |
| Mechanical wear | Spinning LiDAR motor bearings | Degraded scan quality over time | Solid-state LiDAR (no moving parts) |
Quantitative degradation data (from empirical road testing):
| Condition | Point Cloud Reduction | Intensity Reduction | Notes |
|---|---|---|---|
| Light rain (10-20 mm/h) | Minimal | Minimal | Generally acceptable |
| Intense rain (30-40 mm/h) | Up to 56% | Up to 73% | Significant; detection range drops below 30m for low-reflectivity targets |
| Weak fog (100-150m visibility) | Minimal | Minimal | Generally acceptable |
| Thick fog (visibility <= 50m) | Up to 59% | Up to 73% | Severe; similar impact to intense rain |
| Retroreflective targets | 26% NPC decrease at 50m in fog | Less affected | Best-case scenario in bad weather |
| Non-reflective metal | 81-86% NPC decrease at 20m in fog | Severe | Near-complete loss for metal objects |
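One mitigation named above, filtering rain clutter, can be approximated with a simple range-and-intensity gate: airborne droplets mostly produce sparse, low-intensity returns near the sensor. The thresholds below are illustrative; production filters also use multi-return data and neighborhood density tests (e.g., DROR-style filters).

```python
import numpy as np

# Simple rain-clutter gate: drop low-intensity returns close to the
# sensor, where droplet backscatter concentrates. Illustrative thresholds.

def filter_rain(points, intensity, near_m=10.0, min_intensity=0.15):
    rng = np.linalg.norm(points, axis=1)
    clutter = (rng < near_m) & (intensity < min_intensity)
    return points[~clutter], int(clutter.sum())

pts = np.array([[2.0, 0.0, 0.5], [25.0, 1.0, 0.0], [4.0, -1.0, 1.0]])
inten = np.array([0.05, 0.60, 0.40])
kept, dropped = filter_rain(pts, inten)
print(f"kept {len(kept)} points, dropped {dropped} as rain clutter")
```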
Radar Failure Modes
| Failure Mode | Cause | Impact | Mitigation |
|---|---|---|---|
| Multipath / ghost targets | Reflections off guardrails, bridges, tunnels | False positive detections | Temporal filtering, multi-frame consistency |
| Frequency interference | Multiple FMCW radars operating nearby | Noise floor elevation | Cognitive waveform adaptation |
| Angular ambiguity | Resolution limits | Targets merged or mislocated | 4D imaging radar (192+ channels) |
| Rain attenuation | Severe rainfall | Up to 45% range reduction | More resilient than LiDAR/camera; primary all-weather sensor |
| Static object blindness | Traditional radar filters out stationary clutter | Cannot detect parked vehicles, barriers | 4D imaging radar with point cloud mode |
How Production Systems Handle Degradation
Graceful degradation architecture:
- Confidence-weighted fusion: Each sensor modality contributes to the fused perception in proportion to its current confidence. If cameras are fogged, their weight drops; LiDAR and radar weights increase (see the sketch after this list).
- Cross-validation: Detections must be confirmed by at least two modalities before being acted upon. Single-modality detections are flagged as lower confidence.
- Operational domain restriction: If sensor degradation exceeds thresholds, the system reduces its operational capability -- slowing down, refusing lane changes, or pulling over (Minimum Risk Condition).
- Redundancy by design: Overlapping fields of view from different sensor types ensure no single failure creates a blind spot. Waymo's philosophy: every critical area is covered by at least two sensing modalities.
- Simulation-validated fallbacks: Before deployment, degradation scenarios are tested in simulation (Software-in-the-Loop). The autonomy stack is validated against progressive sensor failure to ensure safe behavior.
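A compact sketch tying the first three points together: weight modalities by live confidence, require at least two healthy modalities for cross-validation, and restrict operation as fused confidence drops. All weights and thresholds are illustrative assumptions, not any vendor's policy.

```python
# Graceful-degradation policy sketch. Weights and thresholds are
# illustrative assumptions only.

def fused_confidence(conf):                   # conf: dict modality -> 0..1
    weights = {"camera": 0.4, "lidar": 0.4, "radar": 0.2}
    return sum(weights[m] * c for m, c in conf.items())

def policy(conf):
    healthy = [m for m, c in conf.items() if c >= 0.3]
    if len(healthy) < 2:                      # cross-validation impossible
        return "STOP (minimum risk condition)"
    f = fused_confidence(conf)
    if f >= 0.7:
        return "NORMAL"
    if f >= 0.4:
        return "DEGRADED (reduce speed, no lane changes)"
    return "STOP (minimum risk condition)"

print(policy({"camera": 0.9, "lidar": 0.9, "radar": 0.9}))  # NORMAL
print(policy({"camera": 0.2, "lidar": 0.8, "radar": 0.9}))  # DEGRADED
print(policy({"camera": 0.1, "lidar": 0.2, "radar": 0.9}))  # STOP
```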
13. Lessons for Airside Autonomous Vehicles
The Proven Sensor Configuration
Based on every production L4 deployment analyzed, the minimum viable sensor suite for a driverless airside vehicle is:
| Sensor | Quantity | Purpose | Proven By |
|---|---|---|---|
| LiDAR | 2-4 units | Primary 3D perception, obstacle detection, localization | Waymo, Zoox, Cruise, TractEasy, reference airside AV stack, Baidu |
| Cameras | 4-8 | Object classification, sign/marking reading, visual context | All production systems |
| Radar | 2-4 (preferably 4D imaging) | All-weather perception, velocity measurement | Waymo, ZF/SAIC, Continental |
| GPS/GNSS + IMU | 1 each | Localization, dead reckoning | All production systems |
| V2X | 1+ | Infrastructure communication (gates, traffic, aircraft) | TractEasy, airside-specific |
What Is Proven vs. What Is Risky
Proven (low risk):
- LiDAR + camera + radar sensor fusion for L4 autonomy
- Hesai and RoboSense LiDAR for production reliability
- Pre-mapped route operation with obstacle detection (TractEasy/reference airside AV stack model)
- Active sensor cleaning systems (wipers, air jets, heaters)
- 4D imaging radar as a weather-robust complement to LiDAR
- GPS/IMU/encoder fusion for centimeter-level localization
Emerging (moderate risk):
- FMCW 4D LiDAR (Aeva) -- better in weather but not yet mass-deployed
- End-to-end neural network perception (Tesla FSD v12 architecture) -- powerful but unproven for L4
- BEV Transformer fusion architectures -- rapidly maturing, industry converging on this
- Online extrinsic calibration -- works but not perfectly drift-free
- LWIR thermal cameras (Zoox approach) -- excellent for pedestrian detection at night but limited supplier ecosystem
Risky (high risk for L4 airside):
- Camera-only perception without LiDAR -- no production L4 system has achieved this
- Minimal sensor configurations (comma.ai model) -- suitable only for L2 with human supervision
- Operation without active sensor cleaning -- will fail in real airport environments
- Single-sensor-modality dependence -- any modality can fail in specific conditions
Airside-Specific Recommendations
Start with the TractEasy/reference airside AV stack model: Pre-mapped routes, LiDAR + camera + radar + GPS/IMU. This is the most conservative, most proven approach for airside operations. Both companies have demonstrated successful airport deployments.
Add 4D imaging radar early: Continental ARS548 or ZF imaging radar at $200-500/unit adds robust all-weather capability. Rain is the most common sensor-degrading condition at airports, and 4D radar is the most weather-resistant perception sensor available.
Invest in sensor cleaning from day one: Airport environments are harsh -- jet exhaust, de-icing chemicals, rubber dust, bird debris. Integrated cleaning (wipers + air jets + heaters) is non-negotiable. Budget for this as a first-class system component, not an afterthought.
Consider LWIR thermal cameras: Airside operations occur at night, in fog, and around hot equipment (engines, APUs). FLIR/LWIR cameras detect people by body heat regardless of lighting or weather -- a significant safety advantage that only Zoox currently deploys in production.
Plan for calibration maintenance: At airside operating speeds (10-25 km/h), calibration drift is less critical than at highway speeds. But vibration from rough apron surfaces and thermal cycling (hot tarmac to cold nights) will still cause drift. Implement online calibration monitoring with periodic factory re-calibration.
Design for graceful degradation: The Cruise incident proved that failure mode design is as important as normal-mode performance. For airside: if any perception subsystem degrades below threshold, the vehicle must stop and request human intervention. Never implement a "relocate" behavior when perception is uncertain.
LiDAR selection: For airside, prioritize reliability and weather performance over maximum range. A 150m solid-state LiDAR (Hesai AT128 or RoboSense M1) is more than sufficient for 25 km/h operations and avoids the mechanical reliability concerns of spinning LiDAR. Consider Aeva FMCW if budget allows -- the velocity-per-point data dramatically simplifies tracking of people and vehicles on the apron.
Perception latency budget: At 25 km/h, the vehicle travels ~7 meters per second. With a 200ms total perception-to-action latency, the vehicle travels 1.4 meters before reacting. This is acceptable for airside but should be validated against stopping distance requirements for the specific vehicle mass and braking capability.
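The arithmetic, extended with a simple stopping-distance bound (the deceleration value is an assumption, to be replaced with the vehicle's measured braking capability):

```python
# Reaction distance from perception latency, plus a kinematic braking
# bound. The 3.0 m/s^2 deceleration is an assumed placeholder.

v = 25 / 3.6                                  # 25 km/h in m/s (~6.9 m/s)
latency = 0.200                               # perception-to-action (s)
decel = 3.0                                   # assumed service braking (m/s^2)

reaction_d = v * latency                      # distance covered before braking
braking_d = v**2 / (2 * decel)                # kinematic stopping distance
print(f"reaction: {reaction_d:.1f} m, braking: {braking_d:.1f} m, "
      f"total: {reaction_d + braking_d:.1f} m")
# ~1.4 m reaction + ~8.0 m braking -> size the detection range accordingly.
```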
The Bottom Line
The autonomous vehicle industry has converged on a clear answer: multi-modal sensor fusion (LiDAR + cameras + radar) is required for fully driverless operation. No shortcuts have worked. The debate is not about which modalities to include but about which specific sensors, how many, and how to fuse them optimally. For airside operations, the lower speeds and pre-mapped environments reduce the difficulty -- but the safety stakes (proximity to aircraft, fuel, and people) demand the same sensor redundancy philosophy that Waymo has proven over millions of miles.