
Aurora Innovation — Perception Stack Supplementary Deep Dive

This document supplements the existing aurora-av-tech-stack.md with additional technical details gathered from Aurora blog posts, published papers, patents, job postings, earnings calls, investor presentations, and conference talks. It focuses exclusively on information NOT already covered in the base document.


1. S2A (Sensor-to-Adjustment) Deep Dive

Architectural Positioning

S2A is not a standalone detector -- it is the tracking and refinement stage that sits downstream of initial detection. While detectors like LaserNet, LaserFlow, and SpotNet produce initial object hypotheses, S2A takes over once an object enters the tracking pipeline and continuously refines its state estimate by going back to raw sensor data on every cycle.

Per-Object Crop-and-Refine Mechanism

S2A operates on localized sensor crops centered on each tracked object. The mechanism works as follows:

  1. Crop extraction: For each tracked object, the system extracts a local region of raw LiDAR points, radar returns, and camera pixels from the ego-centric sensor representation, centered on the object's last known position
  2. Multi-modal input: The crop includes data from all available sensor modalities (LiDAR range view, LiDAR BEV, camera image patches, radar detections), not processed upstream features
  3. Neural refinement: A neural network processes these raw sensor crops to produce refined position, velocity, orientation, and estimated future positions
  4. State update: The refined estimates are fed back into the tracking state, which uses classical Extended Kalman Filter (EKF) state estimation to maintain temporally consistent tracks
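A minimal sketch of one such refinement cycle, assuming a simplified 2D constant-velocity state (each track carries a state vector `x` = [x, y, vx, vy] and covariance `P`) and hypothetical `sensors.crop` / `refine_net` helpers; names and noise values are illustrative, and a plain linear Kalman update stands in for the EKF described above:

```python
import numpy as np

def s2a_update(tracks, sensors, refine_net, dt=0.1):
    """One S2A-style cycle: refine every track against fresh raw sensor crops.

    Hypothetical sketch -- `refine_net` stands in for the neural refinement
    model and `sensors.crop(...)` for multi-modal crop extraction.
    """
    F = np.array([[1, 0, dt, 0],      # constant-velocity motion model
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
    H = np.eye(4)                      # refinement observes the full state here
    Q = np.eye(4) * 0.01               # process noise (illustrative values)
    R = np.eye(4) * 0.1                # measurement noise (illustrative values)

    for trk in tracks:
        # Predict step with the motion model
        trk.x = F @ trk.x
        trk.P = F @ trk.P @ F.T + Q

        # Crop raw LiDAR / camera / radar data around the predicted position
        crop = sensors.crop(center=trk.x[:2], size_m=8.0)

        # Neural refinement produces a fresh state observation from raw data
        z = refine_net(crop)           # [x, y, vx, vy]

        # Kalman update folds the observation back into the track state
        y = z - H @ trk.x
        S = H @ trk.P @ H.T + R
        K = trk.P @ H.T @ np.linalg.inv(S)
        trk.x = trk.x + K @ y
        trk.P = (np.eye(4) - K @ H) @ trk.P
    return tracks
```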

Execution Model

  • Runs thousands of times per second across every tracked object simultaneously
  • Unlike human attention (which can focus on 3-5 objects), S2A maintains independent parallel processing channels for all objects in the scene
  • The "adjustment" in the name refers to the continuous state refinement -- the system does not re-detect objects from scratch each cycle but adjusts existing track states using fresh sensor evidence

Relationship to Classical Tracking

S2A represents a hybrid architecture where:

  • Neural networks handle the perceptual problem of extracting object state from raw sensor data
  • EKF handles the estimation theory problem of fusing observations over time with motion models
  • Job postings for "Senior Perception Software Engineer - Tracking" explicitly require "extensive experience in state estimation, Kalman Filter implementation, and 3D object tracking," confirming the classical-neural hybrid

Training Methodology (Inferred)

Based on Aurora's published data engine workflow and the PnPNet paper (which describes a closely related architecture), S2A is likely trained:

  • End-to-end through the tracking loop rather than with teacher forcing (PnPNet demonstrated this approach, showing that using the model's own tracking outputs during training, rather than ground truth, improves generalization)
  • With multi-task losses combining detection accuracy, tracking consistency, and prediction quality
  • Using Aurora's automated data engine, which deploys new training data within two weeks of collection

Sensor Dropout Training

Aurora's system is explicitly "trained to be robust to hardware issues where individual or multiple sensors may be lost, with perception continuing to work even in such degraded conditions." This strongly suggests S2A training includes random sensor modality dropout -- randomly zeroing out LiDAR, camera, or radar input channels to force the network to learn redundant representations.

Patent US20200234110 describes a related technique: a "dynamic dropout routine" that determines "a dynamic dropout probability distribution associated with neurons of a neural network" to "help neurons learn distinguishable features."
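A hedged sketch of what per-modality dropout during training could look like, assuming a dict of per-sensor tensors; the modality names and drop probability are illustrative, not Aurora's values:

```python
import torch

def modality_dropout(batch, p_drop=0.15, training=True):
    """Randomly zero whole sensor modalities so the network learns
    redundant representations (illustrative probabilities and keys)."""
    if not training:
        return batch
    out = dict(batch)
    modalities = ["lidar_rv", "lidar_bev", "camera", "radar"]
    # Ensure at least one modality survives each sample
    kept = [m for m in modalities if torch.rand(1).item() >= p_drop]
    if not kept:
        kept = [modalities[torch.randint(len(modalities), (1,)).item()]]
    for m in modalities:
        if m not in kept:
            out[m] = torch.zeros_like(batch[m])
    return out
```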

Sources: Seeing with Superhuman Clarity, Senior Perception Engineer - Tracking Job Posting, PnPNet (CVPR 2020)


2. Remainder Explainer Deep Dive

Core Philosophy: Explain Every Measurement

The Remainder Explainer embodies the principle that "all of our sensor measurements have an explanation." After Mainline Perception claims measurements belonging to known object categories (vehicles, pedestrians, cyclists), the Remainder Explainer processes every remaining unexplained sensor return.

How Unknown Objects Get Avoidance Scores

The scoring pipeline works in stages:

  1. Unexplained measurement collection: All sensor returns (LiDAR points, radar detections) not assigned to known-category tracks by Mainline Perception are collected
  2. Instance grouping: These measurements are clustered into coherent physical objects using category-agnostic instance segmentation, likely descended from the OSIS (Open-Set Instance Segmentation) paper from Uber ATG
  3. Generic object tagging: Each cluster is tagged as "one or more generic objects" -- the system does not attempt classification
  4. Motion tracking: Moving generic objects are tracked using state estimation, predicting their trajectories
  5. Avoidance score assignment: An ML-based score is computed based on observable physical properties

Features Driving the Avoidance Score

Based on Aurora's public descriptions and the OSIS heritage, the avoidance score likely incorporates:

  • Physical dimensions: Larger objects receive higher avoidance scores (a full trailer fragment vs. a small debris piece)
  • Material reflectivity / LiDAR return intensity: FMCW LiDAR provides calibrated return intensity; highly reflective or metallic objects may score differently than soft/organic materials
  • Motion patterns: Moving objects (non-zero Doppler velocity) score higher than stationary ones; approach velocity toward the ego vehicle further increases the score
  • Persistence: Objects that maintain consistent measurements over multiple frames score higher than transient noise
  • Height and vertical extent: Objects at road surface level vs. elevated objects (e.g., overhead signs)
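As a toy illustration of how such observable properties might combine into a single score, the sketch below applies a logistic function to hand-picked features; the feature set and weights are assumptions for illustration only (Aurora describes the score simply as "ML-based"):

```python
import numpy as np

def avoidance_score(cluster):
    """Toy logistic score from observable physical properties of a
    generic-object cluster; feature choices and weights are illustrative."""
    feats = np.array([
        cluster["length_m"] * cluster["width_m"] * cluster["height_m"],  # volume
        cluster["mean_intensity"],          # calibrated FMCW return intensity
        abs(cluster["radial_speed_mps"]),   # Doppler-derived motion
        cluster["closing_speed_mps"],       # approach velocity toward ego
        cluster["frames_observed"],         # persistence over time
        cluster["base_height_m"],           # elevation above the road surface
    ])
    weights = np.array([0.8, 0.2, 1.0, 1.2, 0.5, -1.5])  # illustrative
    return 1.0 / (1.0 + np.exp(-(feats @ weights - 1.0)))
```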

Relationship to OSIS Paper

The OSIS paper (arXiv:1910.11296) from Uber ATG is almost certainly the research foundation:

  • OSIS uses a category-agnostic embedding network to project LiDAR points from the same instance close together in embedding space
  • DBSCAN clustering on these embeddings groups points into instances, handling both known and unknown categories
  • The "open-set inference procedure" first associates points with known-class prototypical features, then clusters remaining points into new unknown-class instances
  • BEV resolution: 0.15625m over a 160x160x5m region
  • This two-phase approach (known-class association then unknown-class clustering) maps directly to the Mainline Perception / Remainder Explainer split

Real-World Validation

The Remainder Explainer has demonstrated detection of objects never seen in training, including:

  • A charred trailer fragment on the highway
  • Dropped cargo debris
  • Detached tires
  • Livestock on roadways

The system reports each unknown object's "current and estimated future position to the motion planner, which instructs the vehicle to stop" or take avoidance action.

Sources: Perception: No Measurement Left Behind, OSIS (arXiv:1910.11296)


3. FMCW LiDAR Perception Advantages — Additional Details

Instantaneous Doppler in the Perception Pipeline

Beyond static/dynamic separation (already documented), FMCW Doppler velocity feeds into perception in several additional ways:

LiDAR-Only Ego-Motion Estimation for Point Cloud Preprocessing (Patent US20200400821A1):

  • Because every point carries instantaneous radial velocity, the system can estimate full 3D ego-motion (translational and rotational) from a single LiDAR sweep without IMU
  • Translational velocity: least-squares minimization across all point radial velocities given their azimuth/inclination
  • Rotational velocity: fits velocity-vs-lever-arm relationship across multiple sensor origins
  • Bidirectional scan averaging cancels acceleration artifacts
  • This provides IMU-independent motion compensation, a unique redundancy advantage
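A small worked sketch of the least-squares step for translational velocity, assuming all points are static (in practice, moving points would need robust rejection, e.g. RANSAC); the sign convention and noise levels are assumptions:

```python
import numpy as np

def ego_velocity_from_sweep(directions, radial_velocities):
    """Least-squares translational ego-velocity from one FMCW sweep.

    directions: (N, 3) unit beam vectors (from azimuth/inclination)
    radial_velocities: (N,) per-point Doppler measurements
    Assumes static points and the convention that a static point shows
    radial velocity -d_i . v_ego (sign convention is an assumption).
    """
    A = -directions                      # model: v_r = -d . v_ego
    v_ego, *_ = np.linalg.lstsq(A, radial_velocities, rcond=None)
    return v_ego                         # (3,) translational velocity estimate

# Illustrative usage with synthetic static points
rng = np.random.default_rng(0)
d = rng.normal(size=(500, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
true_v = np.array([25.0, 0.5, 0.0])     # ~highway speed, m/s
v_r = -(d @ true_v) + rng.normal(scale=0.05, size=500)
print(ego_velocity_from_sweep(d, v_r))  # ~[25.0, 0.5, 0.0]
```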

Class-Adaptive Velocity Aggregation (Patent US20190317219A1): The system selects different velocity estimation methods per object class:

  • Pedestrians: Mean aggregation of per-point velocities (simpler motion model)
  • Vehicles: Histogram analysis -- bin per-point velocities into value ranges and select the modal bin (handles articulated vehicles whose parts move at different velocities)
  • Median aggregation: Used as robust fallback when data quality is uncertain
  • Selection depends on object classification, point count, and proximity
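A sketch of this class-adaptive selection logic; the point-count threshold and histogram binning are chosen for illustration rather than taken from the patent:

```python
import numpy as np

def aggregate_object_velocity(point_velocities, obj_class, n_points):
    """Class-adaptive aggregation of per-point Doppler velocities
    (sketch following the patent description; thresholds are illustrative)."""
    v = np.asarray(point_velocities, dtype=float)
    if n_points < 5:                      # sparse / uncertain data: robust fallback
        return float(np.median(v))
    if obj_class == "pedestrian":         # simple motion model: mean
        return float(v.mean())
    if obj_class == "vehicle":            # articulated parts: modal histogram bin
        counts, edges = np.histogram(v, bins=20)
        i = counts.argmax()
        in_bin = v[(v >= edges[i]) & (v < edges[i + 1])]
        return float(in_bin.mean()) if in_bin.size else float(np.median(v))
    return float(np.median(v))
```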

Night Perception Advantage: FMCW's coherent detection is single-photon sensitive -- it operates at the quantum noise limit regardless of ambient illumination. At night, when cameras are severely degraded, FirstLight LiDAR maintains full performance. The system detects pedestrians "over 300 meters away at night, before they would have been visible to the naked eye," providing approximately 9 seconds of reaction time at highway speeds.

1550nm Eye Safety Advantage: The 1550nm wavelength is absorbed by the cornea rather than focused on the retina, allowing approximately 10x higher transmit power than 905nm ToF systems while remaining eye-safe. This directly translates to longer range (currently 450m+, Gen 2 targeting 1,000m).

Sources: US20200400821A1, US20190317219A1, Detecting a Pedestrian at Night


4. SpotNet — Additional Architecture Details

Backbone Architecture

SpotNet uses VoVNetV2 as its feature extractor:

  • Stem: Two fully convolutional layers (32 and 64 dimensions, 7x7 and 3x3 kernels), first layer applies 2x downsampling
  • Body: Three downsampling stages followed by three upsampling stages
  • Feature output at H/2 x W/2 resolution

RGB-D Input Construction

The 5-channel input tensor is constructed by:

  1. Processing the RGB image at native resolution (2MP for training, 8MP for inference)
  2. Projecting sparse LiDAR points into image space using z-buffering (masking occluded points)
  3. Creating a 2-channel sparse depth raster: (a) Euclidean distance from camera, (b) binary sentinel indicating valid LiDAR returns
  4. Concatenating depth channels with RGB to form the 5-channel input
  5. Depth raster is resized and concatenated with feature maps at multiple stages using nearest-neighbor sampling
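A simplified sketch of steps 2-4, assuming a pinhole camera model with intrinsics `K` and an extrinsic transform `T_cam_from_lidar`; the z-buffering and channel layout follow the description above, everything else is an assumption:

```python
import numpy as np

def build_rgbd_input(rgb, lidar_xyz, K, T_cam_from_lidar):
    """Build a SpotNet-style 5-channel input: RGB + sparse depth + validity mask.
    Sketch with simple per-pixel z-buffering; projection details are assumptions."""
    H, W, _ = rgb.shape
    depth = np.zeros((H, W), np.float32)
    valid = np.zeros((H, W), np.float32)

    # Transform LiDAR points into the camera frame and project with intrinsics K
    pts = (T_cam_from_lidar[:3, :3] @ lidar_xyz.T + T_cam_from_lidar[:3, 3:4]).T
    pts = pts[pts[:, 2] > 0.1]                      # keep points in front of camera
    uv = (K @ pts.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    r = np.linalg.norm(pts, axis=1)                 # Euclidean distance channel

    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, ri in zip(u[inside], v[inside], r[inside]):
        if valid[vi, ui] == 0 or ri < depth[vi, ui]:  # z-buffer: keep nearest return
            depth[vi, ui], valid[vi, ui] = ri, 1.0

    return np.dstack([rgb.astype(np.float32), depth, valid])  # (H, W, 5)
```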

LiDAR Anchoring — Why It Matters

Rather than regressing absolute 3D distances (which is extremely difficult at long range), SpotNet anchors detections on LiDAR points:

  • Losses are applied only on pixels corresponding to valid LiDAR point projections (sparse supervision)
  • For each LiDAR point on an object, the network predicts:
    • 2D offsets to 2D box center
    • 3D offsets to projected 3D centroid
    • Distance delta along camera ray: delta_d = dot(ray_unit_vector, displacement_to_centroid)
    • Bearing-relative orientation (cos theta, sin theta)
    • 3D extents (width, length, height)
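A sketch of how such regressions could be decoded back into a 3D box from the anchoring LiDAR point; the coordinate-frame conventions are assumptions, not taken from the paper:

```python
import numpy as np

def decode_anchored_box(anchor_xyz, delta_d, cos_t, sin_t, extents):
    """Decode a 3D box from a LiDAR anchor point plus network regressions
    (sketch of the anchoring idea; assumes a sensor frame with x forward
    and y left, which is an assumption)."""
    ray = anchor_xyz / np.linalg.norm(anchor_xyz)      # ray through the anchor point
    centroid = anchor_xyz + delta_d * ray              # shift along that ray
    bearing = np.arctan2(centroid[1], centroid[0])     # bearing to the object
    yaw = bearing + np.arctan2(sin_t, cos_t)           # bearing-relative orientation
    w, l, h = extents
    return {"centroid": centroid, "yaw": yaw, "size": (w, l, h)}
```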

Critical Finding: Multi-Modal Supervision

The single most impactful design choice is joint 2D and 3D supervision:

| Configuration | VRU AP @ 100-200m | Vehicle AP @ 100-200m |
|---|---|---|
| 3D only supervision | 27.9 | 59.3 |
| 3D + projected 3D labels | 33.4 | 58.4 |
| 3D + true 2D labels | 50.6 | 71.7 |

Human-labeled 2D bounding boxes substantially outperform projected 3D labels for 2D supervision. This is a key insight: at long range, precise 3D annotation is difficult, but 2D annotation is reliable.

2MP-to-8MP Transfer Technique

SpotNet trains on 2MP images but deploys on 8MP without retraining:

  • During training at 2MP: apply 50% point-wise dropout on LiDAR projections
  • During 8MP inference: remove dropout (8MP has 4x more pixels, so LiDAR projections are spread over 4x more pixels; removing dropout restores an effective point density comparable to training)
  • Rescale depth values by 0.5 for consistent depth distribution per unit area
  • Undo range rescaling in post-processing

This produces strong gains: vehicles at 400-500m improve from 38.2% AP (2MP) to 55.5% AP (8MP).
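A minimal sketch of the train/inference asymmetry described above; the array handling is illustrative, and the 0.5 depth rescaling factor is the one quoted in the list:

```python
import numpy as np

def prepare_lidar_depth(depth_raster, training=True, dropout_p=0.5):
    """Sketch of the 2MP->8MP transfer trick: 50% point-wise dropout during
    2MP training, no dropout plus a 0.5 depth rescale at 8MP inference
    (the rescale is later undone in post-processing)."""
    d = np.asarray(depth_raster, dtype=np.float32)
    if training:
        keep = np.random.rand(*d.shape) >= dropout_p   # point-wise dropout at 2MP
        return d * keep
    return d * 0.5                                     # 8MP inference rescaling
```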

Post-Processing Pipeline

  1. Query class heatmap to identify foreground LiDAR points
  2. Decode 2D bounding boxes, apply 2D NMS (IoU 0.5 threshold)
  3. Decode 3D bounding boxes on remaining detections
  4. Apply BEV NMS (IoU 0.2 threshold)

Both NMS stages are essential; ablations show significant performance drops when either is removed.
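A self-contained sketch of the greedy suppression used in both stages (the paper's thresholds are 0.5 for 2D and 0.2 for BEV); note that real BEV NMS operates on oriented boxes, whereas this simplification uses axis-aligned boxes:

```python
import numpy as np

def nms(boxes, scores, iou_thresh):
    """Greedy NMS on axis-aligned boxes [x1, y1, x2, y2]; returns kept indices."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order, keep = scores.argsort()[::-1], []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]
    return keep

# Stage 1: 2D NMS in the image plane (IoU 0.5); stage 2: BEV NMS (IoU 0.2).
```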

Dataset Scale

Aurora's Long Range Dataset for SpotNet:

  • 43,500 five-second training snippets at 10Hz
  • 4,000 validation snippets
  • 30-degree FOV long-range camera at 8MP native resolution
  • FMCW LiDAR with 400m+ range
  • Evaluation conducted from 100-500m range

Computational Scaling

SpotNet achieves O(1) compute complexity with range, compared to O(r^2) for BEV methods. On an NVIDIA A10 GPU, inference time remains constant regardless of detection range, while BEV approaches show runtime increasing with distance.

Source: SpotNet (arXiv:2405.15843)


5. Multi-View Fusion — Evolution and Architecture

WACV 2022 Paper Architecture

The Multi-View Fusion paper (Fadadu et al., WACV 2022) describes the published version of Aurora's sensor fusion:

Authors: Sudeep Fadadu, Shreyash Pandey, Darshan Hegde, Yi Shi, Fang-Chieh Chou, Nemanja Djuric, Carlos Vallespi-Gonzalez

Input Representation:

  • BEV voxels: LiDAR + RADAR + HD Map, with 10 historical sweeps stacked (T=10)
  • BEV voxel resolution: deltaL=0.16m, deltaW=0.16m, deltaV=0.2m
  • Range View: sensor-native spherical discretization of LiDAR
  • Image View: 2D camera features
  • HD Map: rasterized as 7 binary mask channels (driving paths, crosswalks, lane boundaries, road boundaries, intersections, driveways, parking lots)

Architecture:

  • Parallel CNN branches extract features from each view independently
  • Camera and Range View features are fused via feature projection
  • All features converge into the BEV frame for final processing
  • Joint detection and prediction heads output 3D boxes with future trajectory waypoints

Evolution Beyond WACV 2022

Aurora's production system has evolved significantly since this publication:

  1. New vision architecture (2025): Aurora disclosed in Q4 2025 earnings materials that they are "leveraging a new vision architecture within its perception system that enables the Aurora Driver to perceive its surroundings more efficiently, reducing compute requirements. This model advancement delivers superior performance at a lower cost than their previous robust perception model." This suggests a more efficient backbone (potentially transformer-based given Aurora's stated use of "transformer-style models on the road since 2021") replacing or augmenting the original CNN-based multi-view fusion.

  2. Sensor modality robustness: The production system is "trained to be robust to hardware issues where individual or multiple sensors may be lost, with perception continuing to work even in such degraded conditions" -- a capability not discussed in the WACV 2022 paper.

  3. SpotNet integration: The long-range detection system likely runs in parallel with the multi-view fusion pipeline, with SpotNet handling the 100-500m range and the multi-view system handling closer ranges where BEV resolution is adequate.

MVFuseNet and RV-FuseNet Lineage

The WACV 2022 paper builds on two workshop papers from Aurora/Uber ATG:

  • RV-FuseNet (arXiv 2020): Range-view temporal fusion for joint detection and trajectory estimation using panoramic range images
  • MVFuseNet (CVPR 2021 Workshop): Multi-view fusion combining Range View and BEV for joint detection and motion forecasting

The production system likely incorporates ideas from all three papers, with the BEV+RV+Camera fusion architecture serving as the backbone.

Sources: Multi-View Fusion (WACV 2022), Q4 2025 Earnings


6. Prediction Architecture Deep Dive

LaneGCN Heritage

Aurora's motion prediction system descends directly from LaneGCN (ECCV 2020 Oral), developed at Uber ATG before the acquisition. The architecture has four modules:

Module 1 — ActorNet:

  • 1D CNN extracts temporal features from observed actor trajectories
  • Feature Pyramid Network merges multi-scale temporal features
  • Produces a per-actor feature vector encoding motion history

Module 2 — MapNet (the Lane Graph Network):

  • Constructs a lane graph from HD map data with 4 connectivity types: predecessor, successor, left neighbor, right neighbor
  • LaneConv operator: Extends graph convolutions with multiple adjacency matrices and along-lane dilation
  • Stack of 4 multi-scale LaneConv residual blocks, each with dilation factors (1, 2, 4, 8, 16, 32)
  • All layers use 128 feature channels
  • Captures long-range lane topology dependencies (critical for highway scenarios where decisions depend on lane structure hundreds of meters ahead)

Module 3 — Actor-Map Fusion Cycle: A stack of 4 fusion networks with four interaction types:

  • A2L (Actor-to-Lane): Actors attend to nearby lane features
  • L2L (Lane-to-Lane): Lane features propagate through the lane graph
  • L2A (Lane-to-Actor): Lane context flows back to actor representations
  • A2A (Actor-to-Actor): Actors attend to each other (social interactions)

Module 4 — Prediction Header:

  • Outputs multi-modal trajectory predictions with associated probabilities
  • Each mode represents a distinct behavioral intent (e.g., lane-keep, lane-change left, lane-change right)

SpAGNN: Spatially-Aware Interaction Modeling

SpAGNN (also from Uber ATG) adds explicit spatial reasoning:

  • For each agent, positions of all other agents are transformed to that agent's local coordinate frame
  • Message passing between nodes updates states based on spatially-aware neighbor information
  • Output: 2D waypoints with positions and orientations
  • Location uncertainty: Gaussian distribution
  • Orientation uncertainty: Von Mises distribution
  • Uses RoI-aligned features as input, refined by GNN interaction model

Conditional Forecasting Architecture

Aurora's conditional forecasting (the core innovation for their interleaved prediction-planning) works technically as follows:

  1. Candidate ego trajectory generation: The Proposer generates dynamically feasible trajectory candidates
  2. Conditional prediction: For each candidate ego trajectory, the forecaster predicts how surrounding actors would respond -- this produces "conditional" forecasts rather than "marginal" forecasts
  3. Joint evaluation: The Ranker evaluates each (ego action, world response) pair
  4. Causal reasoning: This enables "If I do X, what happens?" reasoning rather than "What will others do regardless of me?" -- a critical distinction for interactive driving scenarios like merges, lane changes, and unprotected turns

The key insight from Aurora's forecasting blog post: traditional cascaded systems produce marginal forecasts biased by the training data's driver behaviors, and cannot reason about how the AV's actions influence other actors. Conditional forecasting eliminates this.

LookOut: Joint Prediction-Planning

LookOut (ICCV 2021, Uber ATG) formalized this approach:

  • Generates diverse multi-future predictions rather than single deterministic trajectories
  • Uses a CVAE (Conditional Variational Autoencoder) framework for latent variable modeling
  • Optimizes the evidence lower bound (ELBO) of the log-likelihood
  • Scores candidate ego trajectories against multiple predicted futures
  • Plans are selected to be robust across diverse future scenarios
  • Joint training combines prediction losses with planning-specific losses
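For reference, a CVAE of this kind maximizes the standard evidence lower bound; the generic form (with X the scene context, Y the future trajectories, and Z the latent variable -- this is the textbook form, not LookOut's exact notation) is:

$$
\log p_\theta(Y \mid X) \;\ge\; \mathbb{E}_{q_\phi(Z \mid X, Y)}\!\left[\log p_\theta(Y \mid X, Z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(Z \mid X, Y)\,\|\,p_\theta(Z \mid X)\right)
$$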

PnPNet: End-to-End Perception-Prediction

PnPNet (CVPR 2020, Uber ATG) demonstrated end-to-end integration:

Tracking Architecture:

  • LSTM-based sequence modeling for each tracked object
  • Two feature types feed the LSTM: (a) observation features from bilinear interpolation of BEV feature maps, (b) motion features including 2D velocities and ego angular velocity
  • Combined via MLP before LSTM processing

Data Association:

  • Bipartite matching using a learned affinity matrix with three components:
    • MLPpair: Compares detection-track feature pairs
    • MLPunary: Scores detections as potential new objects
    • Hungarian algorithm: Solves optimal assignment
  • For occluded objects: searches within a local neighborhood centered at predicted positions
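A sketch of the association step using SciPy's Hungarian solver, with the learned affinity and new-object scores treated as given inputs; thresholds are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(affinity, new_object_scores, min_affinity=0.2):
    """Bipartite detection-to-track association via the Hungarian algorithm
    (sketch of the PnPNet-style step; thresholds are illustrative).

    affinity: (num_dets, num_tracks) learned pairwise scores (MLP_pair output)
    new_object_scores: (num_dets,) scores for spawning a new track (MLP_unary)
    """
    det_idx, trk_idx = linear_sum_assignment(-affinity)   # maximize total affinity
    matches, new_tracks, matched_dets = [], [], set()
    for d, t in zip(det_idx, trk_idx):
        if affinity[d, t] >= min_affinity:
            matches.append((d, t))
            matched_dets.add(d)
    for d in range(affinity.shape[0]):
        if d not in matched_dets and new_object_scores[d] > 0.5:
            new_tracks.append(d)                            # spawn a new object track
    return matches, new_tracks
```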

Motion Forecasting:

  • MLPpredict generates future trajectories directly from trajectory-level LSTM representations
  • Track length: T=16 frames (1.6 seconds)
  • Prediction horizon: 3 seconds
  • Maximum 50 tracks/detections per class

Training: Crucially avoids teacher forcing -- uses the model's own tracking outputs during training rather than ground truth, improving generalization.

Sources: LaneGCN (ECCV 2020), LookOut (ICCV 2021), PnPNet (CVPR 2020), Forecasting Part 1


7. Motion Planning Integration — Perception Interface

Perception Output Format to Planner

Based on Aurora's published materials, the perception system outputs to the motion planner:

From Mainline Perception (per tracked object):

  • Object class (vehicle, pedestrian, cyclist, motorcycle, construction equipment, etc.)
  • 3D oriented bounding box (position, dimensions, orientation)
  • Velocity vector
  • Uncertainty estimates (per PnPNet/LaserNet's probabilistic outputs)
  • Predicted future trajectory (6 waypoints over 3 seconds per LaserFlow, or longer horizon from the prediction module)
  • Track ID and track age

From Remainder Explainer (per unknown object):

  • Generic bounding box (no class label)
  • Current position and estimated future position
  • ML-based avoidance score
  • Dimensional extent

From Construction Perception:

  • Individual cone/barrel detections
  • Aggregated blockage regions (treated as solid walls by the planner)
  • Temporary lane marking overrides

From Static Perception:

  • HD Map rasterized layers (7 binary mask channels)
  • Real-time lane detection (when map override is active)
  • Traffic signal states (from camera-based detection)

Planner Consumption

The Proposer-Ranker planner operates at ~10 Hz, consuming perception outputs to:

  1. Proposer: Generates dynamically feasible trajectory candidates using perception-derived scene understanding
  2. Conditional Forecaster: For each candidate trajectory, predicts how other actors would respond
  3. Ranker: Scores each (trajectory, world response) pair using learned cost functions from Maximum Entropy IRL
  4. Invariant enforcement: Hard safety constraints (traffic rules, collision avoidance) filter proposals
  5. Selection: Highest-scoring trajectory that passes all invariants is executed
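A compact sketch of this consume-and-select loop, with `proposer`, `forecaster`, `ranker`, and the invariant checks treated as opaque components (their names follow Aurora's public descriptions; the internals here are assumptions):

```python
def select_ego_plan(proposer, forecaster, ranker, scene, invariants):
    """Sketch of one planning cycle over perception-derived scene state."""
    best_plan, best_score = None, float("-inf")
    for ego_traj in proposer.candidates(scene):
        # Conditional forecast: how would other actors respond to THIS ego action?
        world_response = forecaster.predict(scene, conditioned_on=ego_traj)
        # Hard safety invariants filter proposals before scoring
        if not all(check(ego_traj, world_response) for check in invariants):
            continue
        score = ranker.score(ego_traj, world_response)
        if score > best_score:
            best_plan, best_score = ego_traj, score
    return best_plan
```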

The Offline Executor simulation module can feed "the planner the set of inputs it otherwise would get from the perception module" -- confirming that the perception-planning interface is a well-defined data boundary enabling modular testing.

Validation of Perception-to-Planning

The validation module evaluates planned trajectories by asking:

  • Did the trajectory obey traffic law?
  • Did motion planning meet its objective?
  • Is the trajectory comfortable for vehicle occupants?

These questions implicitly validate perception quality: if perception provides incorrect inputs (wrong object positions, missed detections), the resulting trajectories will fail validation.

Sources: AI Alignment, The Offline Executor, Perception: No Measurement Left Behind


8. Construction Zone Perception — Technical Details

Detection Pipeline

The construction zone perception system operates in multiple stages:

Individual Element Detection:

  • SpotNet detects traffic cones and construction equipment as explicit output classes at long range
  • Cameras detect construction signage (speed limit changes, lane endings, merge warnings)
  • LiDAR and radar detect physical barriers, barrels, and delineators
  • Detection begins at sufficient range for highway-speed deceleration

Blockage Region Aggregation: Individual cone/barrel detections are aggregated into continuous blockage regions:

  • Adjacent or closely-spaced construction elements are merged into a single barrier representation
  • The blockage region is treated as equivalent to a solid wall by the motion planner
  • Visualized as yellow walls in Aurora's Lightbox visualization system
  • This aggregation converts point detections (individual cones at known locations) into area constraints (impassable zones)
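A sketch of one way such aggregation could work: single-linkage grouping of cone/barrel positions within a merge distance, with each group summarized as an impassable extent (the merge distance and region representation are assumptions):

```python
import numpy as np

def aggregate_blockage_regions(cone_positions, merge_dist_m=4.0):
    """Merge nearby cone/barrel detections into continuous blockage regions
    by single-linkage grouping (distance threshold is an illustrative value)."""
    pts = np.asarray(cone_positions, dtype=float)
    unassigned, regions = set(range(len(pts))), []
    while unassigned:
        seed = unassigned.pop()
        region, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in list(unassigned)
                    if np.linalg.norm(pts[i] - pts[j]) <= merge_dist_m]
            for j in near:
                unassigned.remove(j)
                region.append(j)
                frontier.append(j)
        member_pts = pts[region]
        # Summarize each region by its bounding extent; the planner treats it
        # as an impassable "wall" rather than individual point obstacles.
        regions.append({"min": member_pts.min(axis=0),
                        "max": member_pts.max(axis=0),
                        "num_elements": len(region)})
    return regions
```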

Lane Override System: When temporary lane markings are detected by cameras:

  • The system compares perceived lane geometry against Atlas HD map lanes
  • If mismatch is detected, perception overrides the map
  • The vehicle follows perceived temporary lanes instead of mapped permanent lanes
  • This is critical in Texas construction zones where lane geometry shifts frequently

Nudging Capability

When construction elements encroach into adjacent lanes:

  • The system calculates how much encroachment exists
  • The motion planner can generate trajectories that "nudge" outside normal lane boundaries
  • This extends to driving partially on the shoulder to avoid construction obstacles
  • The nudge is bounded by safety constraints ensuring the vehicle does not enter oncoming traffic or hazardous areas

Scale of Testing

  • 3,500+ miles of Texas construction zone driving
  • Approximately 40 construction sites per week encountered on the Dallas-El Paso route
  • Construction scenarios are converted into simulation test cases for regression testing

Sources: Tackling Construction


9. Emergency Vehicle Detection — Technical Details

Visual Detection System

The detection system uses camera-based rapid brightness and color change analysis:

  • Emergency vehicles are characterized by distinctive patterns of rapidly alternating bright colors (red, blue, white) from light bars
  • The system is trained on labeled camera data to recognize these temporal patterns
  • Training data was collected at Aurora's Almono test track in Pittsburgh, PA and at the Transportation Research Center's test track in East Liberty, OH
  • Operators manually drove Aurora Driver-equipped vehicles while others operated law enforcement-issue emergency vehicles
  • Scenarios reenacted: traffic stops, high-speed pursuits, and accident scenes
  • Captured data was converted into simulation scenarios with parametric variations

Audio Siren Detection (In Development)

Aurora has disclosed active development of audio-based siren detection:

  • Training on "different types of sirens at varying levels of volume and proximity"
  • Using labeled audio files as training data
  • This capability is described in aspirational terms, suggesting it may not yet be in production

Behavioral Response Protocol

Once an emergency vehicle is detected:

  1. Determine if the ego vehicle's path is affected
  2. Implement immediate corrective action (lane change or speed reduction)
  3. Comply with Texas law: maintain at least 500 feet from active emergency vehicles
  4. Follow "Move Over or Slow Down" regulations (lane change or 20 mph speed reduction)
  5. Alert remote assistant via Aurora Beacon for additional support

Discrimination Challenges

The system must distinguish emergency vehicles from:

  • Tow trucks with amber warning lights
  • Construction vehicles with flashing beacons
  • Other vehicles with aftermarket lighting
The rapid brightness and color change detection specifically targets the high-frequency alternation pattern unique to emergency light bars.
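A toy sketch of flagging that alternation pattern from a per-frame brightness signal over a candidate light region; the frequency band and threshold are illustrative assumptions, not Aurora's detector:

```python
import numpy as np

def has_emergency_flash_pattern(brightness_series, fps=30.0,
                                band_hz=(1.0, 4.0), power_ratio_thresh=0.4):
    """Check for concentrated spectral power in a flash-frequency band,
    a crude proxy for rapidly alternating light bars (values illustrative)."""
    x = np.asarray(brightness_series, dtype=float)
    x = x - x.mean()                                  # remove the DC component
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    ratio = spectrum[band].sum() / (spectrum[1:].sum() + 1e-9)
    return ratio > power_ratio_thresh
```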

Source: Emergency Vehicles


10. Night Driving Perception — Technical Details

FMCW Advantage at Night

FirstLight FMCW LiDAR provides a fundamental advantage in darkness:

  • Coherent detection is sensitive to single photons -- operates at the quantum noise limit
  • Performance is independent of ambient illumination (unlike cameras, which require external light sources)
  • Maintains full 450m+ range at night (Gen 2: 1,000m)
  • Detects pedestrians "over 300 meters away at night, before they would have been visible to the naked eye"

Camera Degradation Compensation

At night, cameras provide limited utility:

  • Aurora's demo videos show the camera view as nearly black at night, with the pedestrian invisible to cameras
  • The system relies primarily on LiDAR for detection, with radar providing supplementary confirmation
  • FMCW's instantaneous Doppler velocity is especially valuable at night, enabling immediate classification of moving vs. stationary objects without requiring multi-frame visual tracking (which fails without adequate illumination)

Safety Margin

  • 37% of fatal crashes involving large trucks occur at night
  • The Aurora Driver detects objects in darkness more than 450 meters away
  • This provides approximately 9-11 seconds of reaction time at highway speeds before encountering road obstacles
  • The detection range exceeds what a human driver could perceive, even with headlights

Pre-Launch Validation

Before launching driverless night operations (July 2025):

  • Aurora examined "multiple collisions involving human-driven Class 8 trucks" on the Dallas-Houston route
  • Recreated these scenarios in simulation with variations in "time of day, vehicle size, distance, speed"
  • Human operators accumulated 3,000 nighttime loads over 700,000 commercial miles since 2023 before driverless night operations began
  • Night operations more than double potential truck utilization

Sources: The Road Never Sleeps, Detecting a Pedestrian at Night


11. Adverse Weather Perception — Technical Details

Rain, Fog, and Dust Effects on Sensors

LiDAR degradation in precipitation:

  • Light from LiDAR sensors bounces off rain droplets or snowflakes, "creating what can appear to be a cloud hovering around the truck"
  • FMCW's instantaneous Doppler can help distinguish rain returns (near-zero velocity relative to air, falling) from solid objects (stationary or moving along the road surface)
  • The FMCW advantage over ToF: rain/snow produce random phase noise in ToF systems, while FMCW can filter precipitation by velocity signature

Camera degradation:

  • Rain on camera lenses causes "blurry and desaturated" images with "reduced contrast and definition"
  • The sensor cleaning system blasts "a combination of high-pressure air and washer fluid" across each lens, "completely cleaning each lens in milliseconds"

Imaging radar advantage:

  • In dust storms and fog, "the perception system shifts weight to imaging radar"
  • Aurora's custom imaging radar penetrates rain, dense fog, and snow
  • Provides "true and precise 3D images" even when optical sensors are degraded
  • This dynamic modality weighting is a key graceful degradation mechanism

Three-Tier Graceful Fallback

  1. Normal conditions: Full speed, all sensors operational
  2. Degraded visibility: "Slow and proceed with caution" -- applies to rain, snow, fog, dust, smoke, and even insects on sensor lenses
  3. Severe conditions: "Begin searching for a safe place to stop and alert the Command Center"

The system's perception quality monitoring runs continuously: the "perception system is constantly assessing the range and quality of the data its sensors record."

Weather Validation Timeline

  • Inclement weather constrained Texas operations ~40% of the time in 2025
  • Rain and heavy wind validation completed in a January 2026 software release (a few weeks later than initially planned)
  • This was described as "a step-change in potential uptime" for fleet utilization
  • As of February 2026, the Aurora Driver can "driverlessly navigate highways and surface streets through multiple forms of inclement weather, including rain, fog, and heavy wind"

Wind Effects on Trucks

While not explicitly detailed, Aurora's mention of "heavy wind" validation is significant for Class 8 trucks:

  • Unladen trailers act as large sail areas susceptible to crosswinds
  • The perception system must account for lateral displacement due to wind gusts
  • Control and planning must adapt to wind-induced trajectory perturbations

Sources: Stormy Weather, Aurora Triples Network


12. New Papers and Publications (2024-2025)

Published Papers

SpotNet (arXiv, May 2024):

  • "An Image Centric, Lidar Anchored Approach To Long Range Perception"
  • arXiv:2405.15843
  • First Aurora paper to describe the long-range detection system in detail
  • Introduces the LiDAR-anchored image detection paradigm with O(1) compute scaling

Citations in Survey Papers

Aurora's prediction approach is discussed in a 2025 trajectory prediction survey (arXiv:2503.03262), which notes that Aurora's "prediction module performs joint object detection and trajectory forecasting, combining learning-based methods with rule-based constraints to align predictions with driving rules."

Note on Publication Cadence

Aurora has significantly reduced its academic publication rate compared to the Uber ATG era. This likely reflects:

  • A shift from research exploration to production deployment
  • The #CarryOurWater engineering tenet: "no separate research team" -- engineers build production systems rather than publishing papers
  • Competitive sensitivity as the company operates commercially

The Uber ATG heritage papers (LaserNet, LaserFlow, LaneGCN, PnPNet, SpAGNN, PIXOR, ContFuse, IntentNet, Fast and Furious, HDNET, LiDARsim, LookOut, MP3, MMF, OSIS, Multi-View Fusion) remain the most detailed public descriptions of Aurora's underlying perception algorithms.


13. Recent Patents (2024-2025)

Patent Portfolio Growth

Aurora's patent portfolio grew from ~1,400 assets to over 1,800 patents and pending applications by December 31, 2024.

Key Recent Grants

US11933901 (March 2024): Light detection and ranging (LiDAR) sensor system including bistatic transceiver -- covers the optical architecture of multi-receiver FMCW LiDAR configurations for improved angular resolution.

Multi-modal Sensor Fusion Patent (granted March 2025): Covers an autonomous vehicle system that obtains sensor data from multiple sensors with different modalities (RADAR, LiDAR, camera), fuses them to create a fused sensor sample, and provides this to a machine learning model for object detection and/or motion prediction. Notable for explicitly claiming "improved generalization performance over multiple sensor modalities."

US10579063B2: Machine learning for predicting locations of objects perceived by autonomous vehicles -- covers the prediction system including a machine-learned static object classifier, goal scoring model, trajectory development model, and ballistic quality classifier. Notably describes object representation as "bounding polygon or other shape."

US11099569B2: Systems and methods for prioritizing object prediction -- covers determining priority classifications for perceived objects and predicting future states in priority order (i.e., allocating more compute to high-priority objects).

Patent Portfolio Composition

  • ~58% hardware (predominantly FMCW LiDAR from Blackmore acquisition): 19 patent families covering beam steering, chirp ranging, transceiver design
  • ~42% software: ML perception, planning, simulation
  • Filings are concentrated in the US Patent Office (~100% of filings)
  • Uber ATG patents (acquired 2020) are cataloged separately and significantly expand the total portfolio

Sources: Aurora 10-K Filing, GreyB Analysis


14. Perception Team Structure

Leadership

  • Drew Bagnell — Co-Founder & Chief Scientist. Former Head of Perception and Autonomy Architect at Uber ATG (2015-2016), Consulting Professor at CMU Robotics Institute and Machine Learning Department. 43,000+ citations. Invented MaxEnt IRL, DAgger, LEARCH. Won the 2024 AAAI Classic Paper Award. Over 25 years at the intersection of ML, robotics, and autonomous vehicles.

Organizational Philosophy

Aurora explicitly rejects the model of separated research and engineering teams:

  • The #CarryOurWater tenet means "all teams share tool-building responsibility; no separate research team"
  • Engineers work in "pods that are effective, because they have clear targets and the freedom to choose the AI techniques that most expediently improve those targets"
  • This means perception researchers write production C++ code, and production engineers participate in ML research

Team Scale

Based on publicly available data, Aurora employs approximately 463 total employees with 215 dedicated engineering professionals. The perception team is one of several engineering organizations, alongside planning, simulation, hardware, infrastructure, and fleet operations.

Key Perception Sub-Teams (from Job Postings)

  • Perception - Detection: Builds ML models for object detection (LaserNet, SpotNet descendants)
  • Perception - Tracking: State estimation, EKF implementation, 3D object tracking (S2A)
  • Perception - Evaluation: Measures and validates perception quality against ground truth
  • Perception - Data Curation: Manages training data selection, mining, and quality
  • Perception - Sensor Fusion: Multi-modal fusion architectures
  • Perception - Calibration: Online and offline sensor calibration systems

Compensation Range

Technical Lead Manager, Perception positions command $189K - $302K base salary. Staff Software Engineer - Data Curation roles command $189K - $303K.

Sources: Leadership Team, Aurora Careers, Engineering Tenets


15. Data Engine — Detailed Architecture

End-to-End Pipeline

Aurora's data engine operates as a cyclical ML lifecycle:

Identify data requirements
    |
    v
Mine on-road sensor data (petabytes in S3)
    |
    v
Label data (manual + auto-labeling)
    |
    v
Generate datasets (S2T pipeline)
    |
    v
Train ML models (SageMaker + PyTorch)
    |
    v
Evaluate at subsystem + system level
    |
    v
Deploy to fleet (within 2 weeks of new data)
    |
    v
Collect new driving data --> feedback loop

Three-Layer Infrastructure

  1. Build Layer: Docker images and artifacts constructed via Buildkite CI/CD
  2. ML Orchestration Layer: Kubeflow Pipelines on EKS (installed via Terraform)
  3. Compute Layer: Amazon SageMaker (distributed training) + Aurora's internal Batch API (dataset generation, evaluation)

S2T (Sensor-to-Tensor) Pipeline

The S2T pipeline is Aurora's core perception preprocessing pipeline and was the first ML pipeline built at Aurora:

  • Converts raw sensor data into tensor representations suitable for neural network consumption
  • Includes voxelization, range view projection, camera preprocessing, and map rasterization
  • Before Kubeflow automation: took weeks of ML engineer time to run
  • After automation: takes one week with zero manual intervention
  • Updated S2T models land at minimum bi-monthly

Training Infrastructure

  • Framework: PyTorch (primary), with TensorFlow and JAX also supported
  • Distributed training: Amazon SageMaker with 50+ training configuration parameters passed via SageMaker hyperparameters to an internal "Training Main Wrapper"
  • Training visualization: Custom TensorBoard component in Kubeflow
  • GPU: NVIDIA A10 GPUs used for SpotNet inference; training likely uses P4d instances (A100 GPUs)

Custom Kubeflow Components

Aurora built domain-specific Kubeflow components:

  • SageMaker wrapper: Launches distributed training jobs
  • Batch API integration: Runs dataset generation and metrics evaluation on Aurora's internal compute
  • GitHub component: Creates PRs and posts comments for model integration
  • Slack component: Sends notifications (exit handler for all pipelines)
  • TensorBoard component: Spins up training visualization instances
  • Components use a factory method pattern allowing compile-time parameter overrides

Pipeline Launch Methods

  • CLI: Local experimentation
  • PR Commands: /kubeflow train --model [name] --training_type [core, deploy, integration]
  • CI: Automated health monitoring
  • CD: Automated end-to-end production deployment workflows

Continuous Deployment

New datasets and models can be deployed within two weeks of new data availability. The CD system runs automated end-to-end deployment workflows including dataset generation, training, and evaluation without manual intervention.

Sources: Data Engine, Kubeflow Summit 2022 Notes


16. Labeling Infrastructure

Manual Labeling

Aurora employs human specialists who:

  • Review driving log footage
  • Label objects with properties: object category, velocity, position, heading, dimensions
  • Create ground truth annotations for perception evaluation
  • Annotate complex scenes where "each actor (vehicles, pedestrians, bicyclists) has been carefully annotated by human experts"

Auto-Labeling and Map Generation

The Atlas mapping system demonstrates Aurora's auto-labeling capabilities:

  • After a single manual drive, cloud-based algorithms "generate semantic components" with "little to no human assistance"
  • ML models automatically extract lane boundaries, signs, signals from raw sensor data
  • Human annotators review and correct auto-generated annotations

Data Mining for Gap Identification

The data engine starts by "identifying the type of data required to support or improve an autonomous vehicle capability." For example:

  • To detect emergency vehicles: "the perception system needs lots of sensor data of emergency vehicles in different situations"
  • Data mining algorithms search the petabyte-scale sensor data archive for relevant scenarios
  • This enables targeted data collection rather than random sampling

Online-to-Offline (O2O) Pipeline

Real-world events are systematically converted to training data:

  1. Copilot annotations: Vehicle operators flag "interesting, uncommon, or novel" experiences
  2. Disengagements: Instances where operators retook control
  3. Triage: Specialist team reviews and diagnoses significance
  4. Test creation: Single events generate multiple test types (perception tests, trajectory evaluations, simulations)
  5. Amplification: A single real-world event generates 50+ simulation variations by modifying parameters (speed, distance, object size)

Example: From one nudging event around a stopped vehicle, Aurora created "50 new nudging simulations" and has "practiced nudging more than 20 million times in simulation."
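A sketch of this style of parameter-sweep amplification; the swept parameters and values are illustrative (the source only states that speed, distance, and object size are varied):

```python
import itertools

def amplify_event(base_event):
    """Turn one real-world event into many simulation variations by sweeping
    parameters (parameter grids below are illustrative assumptions)."""
    speeds_mps = [20, 25, 30, 33]                 # ego speed variations
    gaps_m = [10, 20, 40, 60]                     # distance to the obstacle
    widths_m = [1.8, 2.2, 2.6]                    # object size variations
    variations = []
    for speed, gap, width in itertools.product(speeds_mps, gaps_m, widths_m):
        scenario = dict(base_event)
        scenario.update(ego_speed_mps=speed, gap_m=gap, object_width_m=width)
        variations.append(scenario)
    return variations                             # 48 variations from one event

# e.g. amplify_event({"type": "nudge_around_stopped_vehicle"})
```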

Sources: Virtual Testing, O2O


17. Perception Metrics

Autonomous Perception Indicator (API)

Aurora's primary fleet-level perception metric is the Autonomous Perception Indicator (API):

  • Tracks the percentage of total commercially-representative miles where the truck does not require human assistance
  • "Support" means assistance via a local vehicle (not remote assistance)
  • Aurora does not expect API to ever reach 100% at aggregate level (flat tires and similar events always require on-site support)
  • The percentage of 100% API loads is the key progress indicator
  • Q1 2025: 95% of loads running production release software had 100% API -- exceeding the Commercial Launch target of ~90%

Perception Testing Metrics

At the subsystem level, perception is evaluated through two approaches:

Broad performance metrics:

  • "How many pedestrians does it correctly identify?" (recall)
  • "How many false alarms does the system generate?" (precision)
  • These are evaluated across the full labeled dataset

Specific capability tests:

  • "Does the perception system see this bicyclist right next to our vehicle?" (per-scenario recall)
  • Targeted evaluation of individual detection scenarios

Safety Case Integration

Perception metrics feed into the Safety Case Framework:

  • ~10,000 requirements include perception-specific requirements
  • 4.5 million tests are run before each software release, including perception-specific tests
  • Perception capabilities must be "appropriate for the operating environment" (Operational Design Domain)
  • The Autonomy Readiness Measure (ARM) was 84% complete by Q3 2024, reaching completion for commercial launch in April 2025

Recall vs Precision Tradeoffs

Aurora explicitly discusses the safety implications:

  • Low recall = "might not see something important, which could be a serious safety concern" (missed detections)
  • Low precision = "passengers might be subjected to unnecessary maneuvers that make the ride uncomfortable and also present safety concerns" (false positives causing unnecessary braking/swerving)

Sources: Q1 2025 Shareholder Letter, Virtual Testing, Building for Scale


18. Hardware-Software Co-Design

First Principles Design Process

Aurora's hardware-software co-design follows a structured process:

  1. Boundary scenario identification: The perception team identifies edge cases the vehicle must handle safely
  2. Sensing requirement derivation: Boundary scenarios determine required detection range, resolution, field of view, and update rate
  3. First principles physics: Stopping distances at highway speeds (65-80 mph for Class 8 trucks at 80,000 lbs), road curvature visibility requirements, object minimum detectable size at maximum range
  4. Sensor configuration simulation: ~20,000 hardware design scenarios evaluated in simulation before manufacturing prototypes (using Colrspace/Protocolr for sensor-realistic rendering)
  5. Perception expert evaluation: Hardware configurations evaluated by perception engineers against detection requirements
  6. Iterative refinement: Hardware and software teams jointly optimize

The Key Question

Aurora's hardware team asks: "Does a single pixel 300 meters away provide enough data?" This drives camera resolution requirements, lens specifications, and sensor placement decisions.
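As a rough worked example of the stopping-distance arithmetic behind such requirements (reaction time, deceleration, and safety margin below are illustrative assumptions, not Aurora's figures):

```python
def required_detection_range(speed_mph, reaction_s=1.5, decel_mps2=2.5,
                             margin_m=50.0):
    """Rough first-principles detection range for a loaded Class 8 truck.
    Reaction time, deceleration, and margin are illustrative assumptions."""
    v = speed_mph * 0.44704                   # mph -> m/s
    reaction_dist = v * reaction_s            # distance covered before braking
    braking_dist = v ** 2 / (2 * decel_mps2)  # kinematics: v^2 / (2a)
    return reaction_dist + braking_dist + margin_m

for mph in (65, 70, 80):
    print(mph, "mph ->", round(required_detection_range(mph)), "m")
# ~262 m at 65 mph, ~293 m at 70 mph, ~359 m at 80 mph under these assumptions
```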

Hardware Generation Evolution

| Aspect | Gen 1 (Current) | Gen 2 (Mid-2026) | Gen 3 (2027) |
|---|---|---|---|
| Manufacturer | Aurora (internal) | Fabrinet | AUMOVIO (formerly Continental) |
| Compute | 5,400 TOPS | Enhanced | Dual NVIDIA DRIVE Thor (~2,000 TFLOPS, Blackwell architecture) |
| LiDAR range | 450m+ | 1,000m (2x improvement) | 1,000m+ |
| Hardware cost | Baseline | 50%+ reduction | Further reduction |
| Durability | ~300,000 miles | 1 million miles (3x improvement) | Tens of thousands of trucks |
| Production scale | Small fleet | Up to ~1,500 units | Tens of thousands |
| Safety standard | Internal | Enhanced | ISO 26262 ASIL-D |
| Backup system | Dual-computer | Dual-computer | Continental/AUMOVIO specialized independent secondary system |

Compute Architecture for Perception

The Aurora Driver computer uses:

  • Enterprise-class server architecture (not automotive-grade SoCs in Gen 1)
  • Processors designed for ML acceleration and camera signal processing
  • Custom TSN networking switch with an advanced networking chip synchronizing all sensors to microsecond precision
  • Self-sufficient hub design: conditions its own power, coordinates its own sensors, communicates with the vehicle over "a simple umbilical"
  • Vehicle-agnostic: Same compute platform operates across sedans (Toyota Sienna) and Class 8 trucks (Peterbilt, Volvo, International)

NVIDIA DRIVE Thor Integration (Gen 3)

  • Dual DRIVE Thor SoC configuration running DriveOS
  • Built on NVIDIA Blackwell architecture
  • 1,000 TFLOPS per SoC (~2,000 TFLOPS total)
  • Designed to "accelerate inference tasks critical for autonomous vehicles to understand and navigate the world around them"
  • Production samples delivered in first half of 2025
  • AUMOVIO will mass-manufacture starting 2027

Perception Compute Efficiency

In Q4 2025 earnings materials, Aurora disclosed that a "new vision architecture" within its perception system "enables the Aurora Driver to perceive its surroundings more efficiently, reducing compute requirements." This model advancement "delivers superior performance at a lower cost than their previous robust perception model." This suggests that as compute budgets shrink with Gen 2/3 hardware cost targets, Aurora is simultaneously making perception models more efficient -- a critical co-design constraint where hardware cost reduction must not sacrifice perception quality.

Sensor Simulation for Hardware Design

Aurora's Colrspace acquisition (Pixar veterans) provides sensor-realistic simulation for hardware co-design:

  • Protocolr technology uses neural networks for inverse rendering: given an image, it infers texture, specularity, roughness, and sheen of surfaces
  • Combined with a differentiable image renderer to produce sensor-realistic synthetic data
  • Simulates camera, conventional LiDAR, and FMCW LiDAR
  • Evaluated ~20,000 hardware design scenarios before manufacturing prototypes
  • Enables perception teams to test detection algorithms against proposed sensor configurations before physical hardware exists

Sources: Product Design from First Principles, Meet Fusion, NVIDIA Partnership, Colrspace Acquisition, Q4 2025 Earnings


19. Uber ATG Perception Heritage — Complete Research Lineage

Aurora inherited a world-class perception research portfolio from the Uber ATG acquisition (January 2021). This section documents the complete lineage and how each paper contributed to Aurora's production perception stack.

Detection Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| PIXOR | CVPR 2018 | Single-stage BEV oriented box detector | Established BEV detection paradigm |
| LaserNet | CVPR 2019 | Range-view per-point multimodal detection with Laplace uncertainty | Core range-view detection architecture |
| LaserNet++ | CVPRW 2019 | Added 6-class semantic segmentation + camera fusion | Road surface estimation |
| HDNET | CoRL 2018 | HD map as detection prior + online map prediction | Map-less fallback capability |
| SpotNet | arXiv 2024 | Image-centric LiDAR-anchored long-range detection | Long-range perception system |

Sensor Fusion Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| ContFuse | ECCV 2018 | Continuous convolution for LiDAR-camera fusion | Point-wise cross-modal feature alignment |
| MMF | CVPR 2019 | Ground estimation + depth completion as auxiliary tasks | Ground plane estimation |
| Multi-View Fusion | WACV 2022 | BEV+RV+Camera fusion with 7-channel map | Core fusion architecture |
| RV-FuseNet | arXiv 2020 | Range-view temporal fusion | Temporal feature integration |
| MVFuseNet | CVPR 2021 WS | Multi-view (RV+BEV) joint detection and forecasting | Combined view architecture |

Tracking and Prediction Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| LaserFlow | ICRA 2022 (Best Paper) | Joint detection + motion forecasting in range view | Unified detection-prediction in RV |
| IntentNet | CoRL 2018 | Joint detection + intention + trajectory prediction | Intent classification |
| Fast and Furious | CVPR 2018 | Joint detection + tracking + forecasting with temporal 3D conv | End-to-end temporal reasoning |
| PnPNet | CVPR 2020 | End-to-end perception + prediction with tracking-in-the-loop | Tracking architecture (LSTM + Hungarian) |
| SpAGNN | Uber ATG | Spatially-aware GNN for multi-agent interaction | Interaction modeling |
| LaneGCN | ECCV 2020 (Oral) | Lane graph convolutions for motion forecasting | Lane-aware prediction |
| LookOut | ICCV 2021 | Diverse multi-future prediction + planning (CVAE) | Conditional forecasting |

Simulation and Data Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| LiDARsim | CVPR 2020 | Realistic LiDAR simulation from real-world 3D meshes | Perception simulation |
| MP3 | CVPR 2021 | Map-less driving with online BEV mapping + occupancy flow | Online mapping capability |
| OSIS | arXiv 2019 | Open-set instance segmentation for unknown objects | Remainder Explainer foundation |

Key Researchers from Uber ATG (Now at Aurora or Moved On)

Many of the key researchers from the Uber ATG papers have continued in autonomous driving:

  • Raquel Urtasun (Uber ATG Chief Scientist) -- founded Waabi
  • Ming Liang, Bin Yang, Shenlong Wang -- co-authors on ContFuse, PIXOR, IntentNet, PnPNet, and other foundational papers
  • Carlos Vallespi-Gonzalez -- co-author on Multi-View Fusion (WACV 2022), one of the few post-acquisition Aurora publications
  • Nemanja Djuric -- co-author on Multi-View Fusion, continued at Aurora

The Aurora Multi-Sensor Dataset itself was captured between January 2017 and February 2018 in Pittsburgh, PA by Uber ATG before the acquisition.


20. Aurora Multi-Sensor Dataset (MSDS)

Dataset Specifications

| Property | Value |
|---|---|
| LiDAR | 64-beam Velodyne HDL-64E |
| Cameras | 7 cameras at 1920x1200 pixels (forward stereo pair + 5 wide-angle, 360-degree coverage) |
| Geographic area | Pittsburgh, PA metropolitan area |
| Collection period | January 2017 -- February 2018 |
| Conditions | All four seasons, rain, snow, overcast, sunny, different times of day, varying traffic |
| Annotations | Weather labels, semantic segmentation, highly accurate localization ground truth |
| License | CC BY-NC-SA 4.0 (non-commercial academic use only) |
| Hosting | AWS Registry of Open Data |

Intended Research Applications

  • Autonomous vehicle localization approaches
  • 3D reconstruction
  • Virtual tourism
  • HD map construction
  • Map compression algorithms

Limitations

The MSDS uses a Velodyne HDL-64E (ToF LiDAR), not Aurora's FirstLight FMCW LiDAR. This means the dataset lacks instantaneous Doppler velocity data. The dataset predates Aurora's FMCW integration and represents Uber ATG-era sensor configurations.

Source: Aurora MSDS on AWS


21. Verifiable AI and Perception Debugging

Compound AI System Architecture

Aurora's "Verifiable AI" approach structures the perception-to-planning pipeline as a compound system (not end-to-end):

  • Each module (perception, prediction, planning) produces semantically meaningful intermediate representations
  • These intermediates are inspectable: "What was the state of the light? What were other actors likely to do?"
  • When failures occur, engineers can "introspect what went wrong and dig many steps deeper to identify the root of the problem"
  • Incorrect predictions can be identified and the specific failure mode covered, rather than retraining entire end-to-end models

Invariant-Based Safety Layer

Hard safety constraints are encoded as invariants rather than learned behaviors:

  • "Don't depart the roadway" -- requires perception to output road boundary estimates
  • "Stop at red lights" -- requires perception to output traffic signal states
  • "Yield for emergency vehicles" -- requires emergency vehicle detection
  • These invariants constrain planning, but perception must provide the semantic inputs they depend on

Graph Neural Networks and Transformers in the Stack

Aurora confirms using:

  • Graph Neural Networks: For modeling multi-agent interactions in prediction (LaneGCN/SpAGNN heritage)
  • Transformer-style architectures: "On the road since 2021" -- likely used in both perception (attention-based feature fusion) and prediction (attention-based interaction modeling)
  • These are described as "generative AI approaches to self-driving that we pioneered"

Perception Debugging Workflow

The Lightbox visualization system enables perception debugging:

  • Web-based 3D scene visualization
  • Customizable views for Perception, Planning, Triage, and Data Science teams
  • Built on XVIZ (the open-source autonomy data visualization protocol originally developed at Uber ATG, now maintained by Aurora)
  • Shows sensor data, perception outputs, mapping, simulation data, and planning decisions
  • Tracks objects "over time, with varying levels of granularity"

Sources: AI Transparency, Verifiable AI, Lightbox


22. Sensor Simulation for Perception Development

Colrspace/Protocolr Technology

Aurora's sensor simulation (from the 2022 Colrspace acquisition) enables perception development without physical hardware:

Inverse Rendering Pipeline:

  1. Input: Real-world photograph of a surface
  2. Neural network infers material properties: overall color, specularity, roughness, sheen
  3. Differentiable image renderer produces synthetic camera images that are nearly indistinguishable from real photos
  4. The same material models drive LiDAR simulation (return intensity depends on material properties) and FMCW simulation (Doppler signatures from surface motion)

LiDARsim (CVPR 2020) provides the LiDAR simulation foundation:

  • Builds a 3D mesh catalog from fleet driving data (static maps + dynamic objects)
  • Ray casting over 3D scenes produces physics-based LiDAR returns
  • A neural network produces deviations from physics-based simulation, adding realistic effects:
    • Dropped points on transparent surfaces (windshields)
    • High intensity returns on reflective materials (license plates)
    • Beam divergence effects at long range
  • Enables testing on long-tail events and safety-critical scenarios without real-world occurrence

Scale of Use:

  • ~20,000 hardware design scenarios evaluated before manufacturing prototypes
  • Enables perception teams to validate detection algorithms against proposed sensor configurations before physical hardware exists

Sources: Colrspace Acquisition, LiDARsim (CVPR 2020)


23. Operational Perception Performance (as of March 2026)

Fleet-Level Statistics

  • 250,000+ driverless miles with zero Aurora Driver-attributed collisions
  • 10 routes across Texas, New Mexico, and Arizona
  • 100% on-time performance
  • 95% of loads achieved 100% API (Autonomous Perception Indicator) on production software
  • Over 4 million tests run before each software release (latest validation)

Perception Capability Milestones

| Capability | Validation Date | Key Perception Requirement |
|---|---|---|
| Daytime highway driving | April 2025 (commercial launch) | Full sensor suite operational |
| Night driving | July 2025 | FMCW LiDAR primary; cameras degraded |
| Rain and fog driving | January 2026 | Imaging radar primary; LiDAR/camera degraded |
| Heavy wind driving | January 2026 | Control adaptation for wind; perception unchanged |
| 10-route expansion | February 2026 | Atlas map coverage + perception generalization |

Impact of Weather Validation

Before rain validation: inclement weather constrained operations ~40% of the time in Texas. After January 2026 software release: "step-change in potential uptime," enabling near-continuous driverless operations across the Sun Belt.

Sources: Aurora Triples Network, Q4 2025 Earnings


Appendix A: Complete Published Paper Bibliography

Aurora Innovation (Post-2021)

  1. SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception. arXiv:2405.15843, May 2024.
  2. Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving. WACV 2022. (Fadadu, Pandey, Hegde, Shi, Chou, Djuric, Vallespi-Gonzalez)

Uber ATG Heritage (Pre-2021, Now Aurora IP)

  1. PIXOR: Real-time 3D Object Detection from Point Clouds. CVPR 2018.
  2. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting. CVPR 2018.
  3. Deep Continuous Fusion for Multi-Sensor 3D Object Detection (ContFuse). ECCV 2018.
  4. HDNET: Exploiting HD Maps for 3D Object Detection. CoRL 2018.
  5. IntentNet: Learning to Predict Intention from Raw Sensor Data. CoRL 2018.
  6. LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving. CVPR 2019.
  7. LaserNet++: Multi-task Learning for Joint Semantic Segmentation and 3D Object Detection. CVPRW 2019.
  8. Multi-Task Multi-Sensor Fusion for 3D Object Detection (MMF). CVPR 2019.
  9. Identifying Unknown Instances for Autonomous Driving (OSIS). arXiv:1910.11296, 2019.
  10. LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World. CVPR 2020.
  11. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop. CVPR 2020.
  12. Learning Lane Graph Representations for Motion Forecasting (LaneGCN). ECCV 2020 (Oral).
  13. RV-FuseNet: Range View Fusion for Joint Detection and Motion Forecasting. arXiv 2020.
  14. LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting. RA-L 2021 / ICRA 2022 (Best Paper).
  15. MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion. CVPR 2021 Workshop.
  16. MP3: A Unified Model to Map, Perceive, Predict and Plan. CVPR 2021.
  17. LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving. ICCV 2021.
  18. SpAGNN: Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data. Uber ATG.

Appendix B: Patent Reference Table

| Patent Number | Title | Key Innovation |
|---|---|---|
| US10310087B2 | Range-view LIDAR-based object detection | Per-cell class + instance center + bounding box in range view |
| US10579063B2 | ML for predicting locations of objects | Bounding polygon footprint; static object classifier |
| US10598791B2 | Object detection based on LiDAR intensity | Intensity-based classification |
| US10967862B2 | Road anomaly detection | IMU-based pothole/crack detection |
| US11029395B1 | Systems for pulsed-wave LIDAR | Advanced LIDAR pulse systems |
| US11099569B2 | Prioritizing object prediction | Priority-ordered prediction compute allocation |
| US11217012B2 | Travel way feature identification | 2D/3D fusion with spatio-temporal memory |
| US11262437B1 | Mirror Doppler compensation | Template-matching for scanning mirror Doppler |
| US11366200 | FMCW scanning corrections | Scanning artifact correction |
| US11521396B1 | Probabilistic prediction of dynamic objects | Probabilistic occupancy distributions |
| US11531346B2 | Goal-directed occupancy prediction | Lane-based 1D occupancy cells with GCN |
| US11550061B2 | Phase coherent LIDAR classification | FMCW per-point velocity for static/dynamic separation |
| US11561548B2 | Basis path generation | Motion planning trajectory generation |
| US11933901 | Bistatic transceiver LIDAR | Multi-receiver FMCW for angular resolution |
| US12051001B2 | Multi-task multi-sensor fusion | 3D boxes + ground geometry estimation |
| US20190317219A1 | Phase-coherent perception | Class-adaptive velocity aggregation |
| US20200234110 | Dynamic dropout for neural network robustness | Adversarial robustness training |
| US20200400821A1 | Doppler LIDAR odometry and mapping | Ego-motion from single FMCW sweep |
| US20210096253A1 | Complementary simultaneous chirp | Dual-laser FMCW for faster acquisition |
| US20230057509A1 | Vision-based ML with adjustable virtual camera | Camera-based perception with synthetic viewpoints |
Document compiled March 2026. All information sourced from publicly available Aurora blog posts, published academic papers, granted patents, SEC filings, earnings call transcripts, investor presentations, job postings, and third-party analysis.

Public research notes collected from public sources.