
Aurora Innovation — Perception Stack Supplementary Deep Dive

This document supplements the existing aurora-av-tech-stack.md with additional technical details gathered from Aurora blog posts, published papers, patents, job postings, earnings calls, investor presentations, and conference talks. It focuses exclusively on information NOT already covered in the base document.


1. S2A (Sensor-to-Adjustment) Deep Dive

Architectural Positioning

S2A is not a standalone detector -- it is the tracking and refinement stage that sits downstream of initial detection. While detectors like LaserNet, LaserFlow, and SpotNet produce initial object hypotheses, S2A takes over once an object enters the tracking pipeline and continuously refines its state estimate by going back to raw sensor data on every cycle.

Per-Object Crop-and-Refine Mechanism

S2A operates on localized sensor crops centered on each tracked object. The mechanism works as follows:

  1. Crop extraction: For each tracked object, the system extracts a local region of raw LiDAR points, radar returns, and camera pixels from the ego-centric sensor representation, centered on the object's last known position
  2. Multi-modal input: The crop includes data from all available sensor modalities (LiDAR range view, LiDAR BEV, camera image patches, radar detections), not processed upstream features
  3. Neural refinement: A neural network processes these raw sensor crops to produce refined position, velocity, orientation, and estimated future positions
  4. State update: The refined estimates are fed back into the tracking state, which uses classical Extended Kalman Filter (EKF) state estimation to maintain temporally consistent tracks
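A minimal sketch of one such refinement cycle, assuming a simplified 2D constant-velocity state (each track carries a state vector `x` = [x, y, vx, vy] and covariance `P`) and hypothetical `sensors.crop` / `refine_net` helpers; names and noise values are illustrative, and a plain linear Kalman update stands in for the EKF described above:

```python
import numpy as np

def s2a_update(tracks, sensors, refine_net, dt=0.1):
    """One S2A-style cycle: refine every track against fresh raw sensor crops.

    Hypothetical sketch -- `refine_net` stands in for the neural refinement
    model and `sensors.crop(...)` for multi-modal crop extraction.
    """
    F = np.array([[1, 0, dt, 0],      # constant-velocity motion model
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
    H = np.eye(4)                      # refinement observes the full state here
    Q = np.eye(4) * 0.01               # process noise (illustrative values)
    R = np.eye(4) * 0.1                # measurement noise (illustrative values)

    for trk in tracks:
        # Predict step with the motion model
        trk.x = F @ trk.x
        trk.P = F @ trk.P @ F.T + Q

        # Crop raw LiDAR / camera / radar data around the predicted position
        crop = sensors.crop(center=trk.x[:2], size_m=8.0)

        # Neural refinement produces a fresh state observation from raw data
        z = refine_net(crop)           # [x, y, vx, vy]

        # Kalman update folds the observation back into the track state
        y = z - H @ trk.x
        S = H @ trk.P @ H.T + R
        K = trk.P @ H.T @ np.linalg.inv(S)
        trk.x = trk.x + K @ y
        trk.P = (np.eye(4) - K @ H) @ trk.P
    return tracks
```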

Execution Model

  • Runs thousands of times per second across every tracked object simultaneously
  • Unlike human attention (which can focus on 3-5 objects), S2A maintains independent parallel processing channels for all objects in the scene
  • The "adjustment" in the name refers to the continuous state refinement -- the system does not re-detect objects from scratch each cycle but adjusts existing track states using fresh sensor evidence

Relationship to Classical Tracking

S2A represents a hybrid architecture where:

  • Neural networks handle the perceptual problem of extracting object state from raw sensor data
  • EKF handles the estimation theory problem of fusing observations over time with motion models
  • Job postings for "Senior Perception Software Engineer - Tracking" explicitly require "extensive experience in state estimation, Kalman Filter implementation, and 3D object tracking," confirming the classical-neural hybrid

Training Methodology (Inferred)

Based on Aurora's published data engine workflow and the PnPNet paper (which describes a closely related architecture), S2A is likely trained:

  • End-to-end through the tracking loop rather than with teacher forcing (PnPNet demonstrated this approach, showing that using the model's own tracking outputs during training, rather than ground truth, improves generalization)
  • With multi-task losses combining detection accuracy, tracking consistency, and prediction quality
  • Using Aurora's automated data engine, which deploys new training data within two weeks of collection

Sensor Dropout Training

Aurora's system is explicitly "trained to be robust to hardware issues where individual or multiple sensors may be lost, with perception continuing to work even in such degraded conditions." This strongly suggests S2A training includes random sensor modality dropout -- randomly zeroing out LiDAR, camera, or radar input channels to force the network to learn redundant representations.

Patent US20200234110 describes a related technique: a "dynamic dropout routine" that determines "a dynamic dropout probability distribution associated with neurons of a neural network" to "help neurons learn distinguishable features."
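A hedged sketch of what per-modality dropout during training could look like, assuming a dict of per-sensor tensors; the modality names and drop probability are illustrative, not Aurora's values:

```python
import torch

def modality_dropout(batch, p_drop=0.15, training=True):
    """Randomly zero whole sensor modalities so the network learns
    redundant representations (illustrative probabilities and keys)."""
    if not training:
        return batch
    out = dict(batch)
    modalities = ["lidar_rv", "lidar_bev", "camera", "radar"]
    # Ensure at least one modality survives each sample
    kept = [m for m in modalities if torch.rand(1).item() >= p_drop]
    if not kept:
        kept = [modalities[torch.randint(len(modalities), (1,)).item()]]
    for m in modalities:
        if m not in kept:
            out[m] = torch.zeros_like(batch[m])
    return out
```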

Sources: Seeing with Superhuman Clarity, Senior Perception Engineer - Tracking Job Posting, PnPNet (CVPR 2020)


2. Remainder Explainer Deep Dive

Core Philosophy: Explain Every Measurement

The Remainder Explainer embodies the principle that "all of our sensor measurements have an explanation." After Mainline Perception claims measurements belonging to known object categories (vehicles, pedestrians, cyclists), the Remainder Explainer processes every remaining unexplained sensor return.

How Unknown Objects Get Avoidance Scores

The scoring pipeline works in stages:

  1. Unexplained measurement collection: All sensor returns (LiDAR points, radar detections) not assigned to known-category tracks by Mainline Perception are collected
  2. Instance grouping: These measurements are clustered into coherent physical objects using category-agnostic instance segmentation, likely descended from the OSIS (Open-Set Instance Segmentation) paper from Uber ATG
  3. Generic object tagging: Each cluster is tagged as "one or more generic objects" -- the system does not attempt classification
  4. Motion tracking: Moving generic objects are tracked using state estimation, predicting their trajectories
  5. Avoidance score assignment: An ML-based score is computed based on observable physical properties

Features Driving the Avoidance Score

Based on Aurora's public descriptions and the OSIS heritage, the avoidance score likely incorporates:

  • Physical dimensions: Larger objects receive higher avoidance scores (a full trailer fragment vs. a small debris piece)
  • Material reflectivity / LiDAR return intensity: FMCW LiDAR provides calibrated return intensity; highly reflective or metallic objects may score differently than soft/organic materials
  • Motion patterns: Moving objects (non-zero Doppler velocity) score higher than stationary ones; approach velocity toward the ego vehicle further increases the score
  • Persistence: Objects that maintain consistent measurements over multiple frames score higher than transient noise
  • Height and vertical extent: Objects at road surface level vs. elevated objects (e.g., overhead signs)
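As a toy illustration of how such observable properties might combine into a single score, the sketch below applies a logistic function to hand-picked features; the feature set and weights are assumptions for illustration only (Aurora describes the score simply as "ML-based"):

```python
import numpy as np

def avoidance_score(cluster):
    """Toy logistic score from observable physical properties of a
    generic-object cluster; feature choices and weights are illustrative."""
    feats = np.array([
        cluster["length_m"] * cluster["width_m"] * cluster["height_m"],  # volume
        cluster["mean_intensity"],          # calibrated FMCW return intensity
        abs(cluster["radial_speed_mps"]),   # Doppler-derived motion
        cluster["closing_speed_mps"],       # approach velocity toward ego
        cluster["frames_observed"],         # persistence over time
        cluster["base_height_m"],           # elevation above the road surface
    ])
    weights = np.array([0.8, 0.2, 1.0, 1.2, 0.5, -1.5])  # illustrative
    return 1.0 / (1.0 + np.exp(-(feats @ weights - 1.0)))
```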

Relationship to OSIS Paper

The OSIS paper (arXiv:1910.11296) from Uber ATG is almost certainly the research foundation:

  • OSIS uses a category-agnostic embedding network to project LiDAR points from the same instance close together in embedding space
  • DBSCAN clustering on these embeddings groups points into instances, handling both known and unknown categories
  • The "open-set inference procedure" first associates points with known-class prototypical features, then clusters remaining points into new unknown-class instances
  • BEV resolution: 0.15625m over a 160x160x5m region
  • This two-phase approach (known-class association then unknown-class clustering) maps directly to the Mainline Perception / Remainder Explainer split

Real-World Validation

The Remainder Explainer has demonstrated detection of objects never seen in training, including:

  • A charred trailer fragment on the highway
  • Dropped cargo debris
  • Detached tires
  • Livestock on roadways

The system reports each unknown object's "current and estimated future position to the motion planner, which instructs the vehicle to stop" or take avoidance action.

Sources: Perception: No Measurement Left Behind, OSIS (arXiv:1910.11296)


3. FMCW LiDAR Perception Advantages — Additional Details

Instantaneous Doppler in the Perception Pipeline

Beyond static/dynamic separation (already documented), FMCW Doppler velocity feeds into perception in several additional ways:

LiDAR-Only Ego-Motion Estimation for Point Cloud Preprocessing (Patent US20200400821A1):

  • Because every point carries instantaneous radial velocity, the system can estimate full 3D ego-motion (translational and rotational) from a single LiDAR sweep without IMU
  • Translational velocity: least-squares minimization across all point radial velocities given their azimuth/inclination
  • Rotational velocity: fits velocity-vs-lever-arm relationship across multiple sensor origins
  • Bidirectional scan averaging cancels acceleration artifacts
  • This provides IMU-independent motion compensation, a unique redundancy advantage
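A small worked sketch of the least-squares step for translational velocity, assuming all points are static (in practice, moving points would need robust rejection, e.g. RANSAC); the sign convention and noise levels are assumptions:

```python
import numpy as np

def ego_velocity_from_sweep(directions, radial_velocities):
    """Least-squares translational ego-velocity from one FMCW sweep.

    directions: (N, 3) unit beam vectors (from azimuth/inclination)
    radial_velocities: (N,) per-point Doppler measurements
    Assumes static points and the convention that a static point shows
    radial velocity -d_i . v_ego (sign convention is an assumption).
    """
    A = -directions                      # model: v_r = -d . v_ego
    v_ego, *_ = np.linalg.lstsq(A, radial_velocities, rcond=None)
    return v_ego                         # (3,) translational velocity estimate

# Illustrative usage with synthetic static points
rng = np.random.default_rng(0)
d = rng.normal(size=(500, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
true_v = np.array([25.0, 0.5, 0.0])     # ~highway speed, m/s
v_r = -(d @ true_v) + rng.normal(scale=0.05, size=500)
print(ego_velocity_from_sweep(d, v_r))  # ~[25.0, 0.5, 0.0]
```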

Class-Adaptive Velocity Aggregation (Patent US20190317219A1): The system selects different velocity estimation methods per object class:

  • Pedestrians: Mean aggregation of per-point velocities (simpler motion model)
  • Vehicles: Histogram analysis -- bin per-point velocities into value ranges and select the modal bin (handles articulated vehicles whose parts move at different velocities)
  • Median aggregation: Used as robust fallback when data quality is uncertain
  • Selection depends on object classification, point count, and proximity
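A sketch of this class-adaptive selection logic; the point-count threshold and histogram binning are chosen for illustration rather than taken from the patent:

```python
import numpy as np

def aggregate_object_velocity(point_velocities, obj_class, n_points):
    """Class-adaptive aggregation of per-point Doppler velocities
    (sketch following the patent description; thresholds are illustrative)."""
    v = np.asarray(point_velocities, dtype=float)
    if n_points < 5:                      # sparse / uncertain data: robust fallback
        return float(np.median(v))
    if obj_class == "pedestrian":         # simple motion model: mean
        return float(v.mean())
    if obj_class == "vehicle":            # articulated parts: modal histogram bin
        counts, edges = np.histogram(v, bins=20)
        i = counts.argmax()
        in_bin = v[(v >= edges[i]) & (v < edges[i + 1])]
        return float(in_bin.mean()) if in_bin.size else float(np.median(v))
    return float(np.median(v))
```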

Night Perception Advantage: FMCW's coherent detection is single-photon sensitive -- it operates at the quantum noise limit regardless of ambient illumination. At night, when cameras are severely degraded, FirstLight LiDAR maintains full performance. The system detects pedestrians "over 300 meters away at night, before they would have been visible to the naked eye," providing approximately 9 seconds of reaction time at highway speeds.

1550nm Eye Safety Advantage: The 1550nm wavelength is absorbed by the cornea rather than focused on the retina, allowing approximately 10x higher transmit power than 905nm ToF systems while remaining eye-safe. This directly translates to longer range (currently 450m+, Gen 2 targeting 1,000m).

Sources: US20200400821A1, US20190317219A1, Detecting a Pedestrian at Night


4. SpotNet — Additional Architecture Details

Backbone Architecture

SpotNet uses VoVNetV2 as its feature extractor:

  • Stem: Two fully convolutional layers (32 and 64 dimensions, 7x7 and 3x3 kernels), first layer applies 2x downsampling
  • Body: Three downsampling stages followed by three upsampling stages
  • Feature output at H/2 x W/2 resolution

RGB-D Input Construction

The 5-channel input tensor is constructed by:

  1. Processing the RGB image at native resolution (2MP for training, 8MP for inference)
  2. Projecting sparse LiDAR points into image space using z-buffering (masking occluded points)
  3. Creating a 2-channel sparse depth raster: (a) Euclidean distance from camera, (b) binary sentinel indicating valid LiDAR returns
  4. Concatenating depth channels with RGB to form the 5-channel input
  5. Depth raster is resized and concatenated with feature maps at multiple stages using nearest-neighbor sampling
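A simplified sketch of steps 2-4, assuming a pinhole camera model with intrinsics `K` and an extrinsic transform `T_cam_from_lidar`; the z-buffering and channel layout follow the description above, everything else is an assumption:

```python
import numpy as np

def build_rgbd_input(rgb, lidar_xyz, K, T_cam_from_lidar):
    """Build a SpotNet-style 5-channel input: RGB + sparse depth + validity mask.
    Sketch with simple per-pixel z-buffering; projection details are assumptions."""
    H, W, _ = rgb.shape
    depth = np.zeros((H, W), np.float32)
    valid = np.zeros((H, W), np.float32)

    # Transform LiDAR points into the camera frame and project with intrinsics K
    pts = (T_cam_from_lidar[:3, :3] @ lidar_xyz.T + T_cam_from_lidar[:3, 3:4]).T
    pts = pts[pts[:, 2] > 0.1]                      # keep points in front of camera
    uv = (K @ pts.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    r = np.linalg.norm(pts, axis=1)                 # Euclidean distance channel

    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, ri in zip(u[inside], v[inside], r[inside]):
        if valid[vi, ui] == 0 or ri < depth[vi, ui]:  # z-buffer: keep nearest return
            depth[vi, ui], valid[vi, ui] = ri, 1.0

    return np.dstack([rgb.astype(np.float32), depth, valid])  # (H, W, 5)
```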

LiDAR Anchoring — Why It Matters

Rather than regressing absolute 3D distances (which is extremely difficult at long range), SpotNet anchors detections on LiDAR points:

  • Losses are applied only on pixels corresponding to valid LiDAR point projections (sparse supervision)
  • For each LiDAR point on an object, the network predicts:
    • 2D offsets to 2D box center
    • 3D offsets to projected 3D centroid
    • Distance delta along camera ray: delta_d = dot(ray_unit_vector, displacement_to_centroid)
    • Bearing-relative orientation (cos theta, sin theta)
    • 3D extents (width, length, height)
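A sketch of how such regressions could be decoded back into a 3D box from the anchoring LiDAR point; the coordinate-frame conventions are assumptions, not taken from the paper:

```python
import numpy as np

def decode_anchored_box(anchor_xyz, delta_d, cos_t, sin_t, extents):
    """Decode a 3D box from a LiDAR anchor point plus network regressions
    (sketch of the anchoring idea; assumes a sensor frame with x forward
    and y left, which is an assumption)."""
    ray = anchor_xyz / np.linalg.norm(anchor_xyz)      # ray through the anchor point
    centroid = anchor_xyz + delta_d * ray              # shift along that ray
    bearing = np.arctan2(centroid[1], centroid[0])     # bearing to the object
    yaw = bearing + np.arctan2(sin_t, cos_t)           # bearing-relative orientation
    w, l, h = extents
    return {"centroid": centroid, "yaw": yaw, "size": (w, l, h)}
```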

Critical Finding: Multi-Modal Supervision

The single most impactful design choice is joint 2D and 3D supervision:

| Configuration | VRU AP @ 100-200m | Vehicle AP @ 100-200m |
|---|---|---|
| 3D only supervision | 27.9 | 59.3 |
| 3D + projected 3D labels | 33.4 | 58.4 |
| 3D + true 2D labels | 50.6 | 71.7 |

Human-labeled 2D bounding boxes substantially outperform projected 3D labels for 2D supervision. This is a key insight: at long range, precise 3D annotation is difficult, but 2D annotation is reliable.

2MP-to-8MP Transfer Technique

SpotNet trains on 2MP images but deploys on 8MP without retraining:

  • During training at 2MP: apply 50% point-wise dropout on LiDAR projections
  • During 8MP inference: remove dropout (8MP has 4x more pixels, so LiDAR projections are spread over 4x more pixels; removing dropout restores an effective point density comparable to training)
  • Rescale depth values by 0.5 for consistent depth distribution per unit area
  • Undo range rescaling in post-processing

This produces strong gains: vehicles at 400-500m improve from 38.2% AP (2MP) to 55.5% AP (8MP).
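A minimal sketch of the train/inference asymmetry described above; the array handling is illustrative, and the 0.5 depth rescaling factor is the one quoted in the list:

```python
import numpy as np

def prepare_lidar_depth(depth_raster, training=True, dropout_p=0.5):
    """Sketch of the 2MP->8MP transfer trick: 50% point-wise dropout during
    2MP training, no dropout plus a 0.5 depth rescale at 8MP inference
    (the rescale is later undone in post-processing)."""
    d = np.asarray(depth_raster, dtype=np.float32)
    if training:
        keep = np.random.rand(*d.shape) >= dropout_p   # point-wise dropout at 2MP
        return d * keep
    return d * 0.5                                     # 8MP inference rescaling
```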

Post-Processing Pipeline

  1. Query class heatmap to identify foreground LiDAR points
  2. Decode 2D bounding boxes, apply 2D NMS (IoU 0.5 threshold)
  3. Decode 3D bounding boxes on remaining detections
  4. Apply BEV NMS (IoU 0.2 threshold)

Both NMS stages are essential; ablations show significant performance drops when either is removed.
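A self-contained sketch of the greedy suppression used in both stages (the paper's thresholds are 0.5 for 2D and 0.2 for BEV); note that real BEV NMS operates on oriented boxes, whereas this simplification uses axis-aligned boxes:

```python
import numpy as np

def nms(boxes, scores, iou_thresh):
    """Greedy NMS on axis-aligned boxes [x1, y1, x2, y2]; returns kept indices."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order, keep = scores.argsort()[::-1], []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]
    return keep

# Stage 1: 2D NMS in the image plane (IoU 0.5); stage 2: BEV NMS (IoU 0.2).
```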

Dataset Scale

Aurora's Long Range Dataset for SpotNet:

  • 43,500 five-second training snippets at 10Hz
  • 4,000 validation snippets
  • 30-degree FOV long-range camera at 8MP native resolution
  • FMCW LiDAR with 400m+ range
  • Evaluation conducted from 100-500m range

Computational Scaling

SpotNet achieves O(1) compute complexity with range, compared to O(r^2) for BEV methods. On an NVIDIA A10 GPU, inference time remains constant regardless of detection range, while BEV approaches show runtime increasing with distance.

Source: SpotNet (arXiv:2405.15843)


5. Multi-View Fusion — Evolution and Architecture

WACV 2022 Paper Architecture

The Multi-View Fusion paper (Fadadu et al., WACV 2022) describes the published version of Aurora's sensor fusion:

Authors: Sudeep Fadadu, Shreyash Pandey, Darshan Hegde, Yi Shi, Fang-Chieh Chou, Nemanja Djuric, Carlos Vallespi-Gonzalez

Input Representation:

  • BEV voxels: LiDAR + RADAR + HD Map, with 10 historical sweeps stacked (T=10)
  • BEV voxel resolution: deltaL=0.16m, deltaW=0.16m, deltaV=0.2m
  • Range View: sensor-native spherical discretization of LiDAR
  • Image View: 2D camera features
  • HD Map: rasterized as 7 binary mask channels (driving paths, crosswalks, lane boundaries, road boundaries, intersections, driveways, parking lots)

Architecture:

  • Parallel CNN branches extract features from each view independently
  • Camera and Range View features are fused via feature projection
  • All features converge into the BEV frame for final processing
  • Joint detection and prediction heads output 3D boxes with future trajectory waypoints

Evolution Beyond WACV 2022

Aurora's production system has evolved significantly since this publication:

  1. New vision architecture (2025): Aurora disclosed in Q4 2025 earnings materials that they are "leveraging a new vision architecture within its perception system that enables the Aurora Driver to perceive its surroundings more efficiently, reducing compute requirements. This model advancement delivers superior performance at a lower cost than their previous robust perception model." This suggests a more efficient backbone (potentially transformer-based given Aurora's stated use of "transformer-style models on the road since 2021") replacing or augmenting the original CNN-based multi-view fusion.

  2. Sensor modality robustness: The production system is "trained to be robust to hardware issues where individual or multiple sensors may be lost, with perception continuing to work even in such degraded conditions" -- a capability not discussed in the WACV 2022 paper.

  3. SpotNet integration: The long-range detection system likely runs in parallel with the multi-view fusion pipeline, with SpotNet handling the 100-500m range and the multi-view system handling closer ranges where BEV resolution is adequate.

MVFuseNet and RV-FuseNet Lineage

The WACV 2022 paper builds on two workshop papers from Aurora/Uber ATG:

  • RV-FuseNet (arXiv 2020): Range-view temporal fusion for joint detection and trajectory estimation using panoramic range images
  • MVFuseNet (CVPR 2021 Workshop): Multi-view fusion combining Range View and BEV for joint detection and motion forecasting

The production system likely incorporates ideas from all three papers, with the BEV+RV+Camera fusion architecture serving as the backbone.

Sources: Multi-View Fusion (WACV 2022), Q4 2025 Earnings


6. Prediction Architecture Deep Dive

LaneGCN Heritage

Aurora's motion prediction system descends directly from LaneGCN (ECCV 2020 Oral), developed at Uber ATG before the acquisition. The architecture has four modules:

Module 1 — ActorNet:

  • 1D CNN extracts temporal features from observed actor trajectories
  • Feature Pyramid Network merges multi-scale temporal features
  • Produces a per-actor feature vector encoding motion history

Module 2 — MapNet (the Lane Graph Network):

  • Constructs a lane graph from HD map data with 4 connectivity types: predecessor, successor, left neighbor, right neighbor
  • LaneConv operator: Extends graph convolutions with multiple adjacency matrices and along-lane dilation
  • Stack of 4 multi-scale LaneConv residual blocks, each with dilation factors (1, 2, 4, 8, 16, 32)
  • All layers use 128 feature channels
  • Captures long-range lane topology dependencies (critical for highway scenarios where decisions depend on lane structure hundreds of meters ahead)

Module 3 — Actor-Map Fusion Cycle: A stack of 4 fusion networks with four interaction types:

  • A2L (Actor-to-Lane): Actors attend to nearby lane features
  • L2L (Lane-to-Lane): Lane features propagate through the lane graph
  • L2A (Lane-to-Actor): Lane context flows back to actor representations
  • A2A (Actor-to-Actor): Actors attend to each other (social interactions)

Module 4 — Prediction Header:

  • Outputs multi-modal trajectory predictions with associated probabilities
  • Each mode represents a distinct behavioral intent (e.g., lane-keep, lane-change left, lane-change right)

SpAGNN: Spatially-Aware Interaction Modeling

SpAGNN (also from Uber ATG) adds explicit spatial reasoning:

  • For each agent, positions of all other agents are transformed to that agent's local coordinate frame
  • Message passing between nodes updates states based on spatially-aware neighbor information
  • Output: 2D waypoints with positions and orientations
  • Location uncertainty: Gaussian distribution
  • Orientation uncertainty: Von Mises distribution
  • Uses RoI-aligned features as input, refined by GNN interaction model

Conditional Forecasting Architecture

Aurora's conditional forecasting (the core innovation for their interleaved prediction-planning) works technically as follows:

  1. Candidate ego trajectory generation: The Proposer generates dynamically feasible trajectory candidates
  2. Conditional prediction: For each candidate ego trajectory, the forecaster predicts how surrounding actors would respond -- this produces "conditional" forecasts rather than "marginal" forecasts
  3. Joint evaluation: The Ranker evaluates each (ego action, world response) pair
  4. Causal reasoning: This enables "If I do X, what happens?" reasoning rather than "What will others do regardless of me?" -- a critical distinction for interactive driving scenarios like merges, lane changes, and unprotected turns

The key insight from Aurora's forecasting blog post: traditional cascaded systems produce marginal forecasts biased by the training data's driver behaviors, and cannot reason about how the AV's actions influence other actors. Conditional forecasting eliminates this.

LookOut: Joint Prediction-Planning

LookOut (ICCV 2021, Uber ATG) formalized this approach:

  • Generates diverse multi-future predictions rather than single deterministic trajectories
  • Uses a CVAE (Conditional Variational Autoencoder) framework for latent variable modeling
  • Optimizes the evidence lower bound (ELBO) of the log-likelihood
  • Scores candidate ego trajectories against multiple predicted futures
  • Plans are selected to be robust across diverse future scenarios
  • Joint training combines prediction losses with planning-specific losses
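For reference, a CVAE of this kind maximizes the standard evidence lower bound; the generic form (with X the scene context, Y the future trajectories, and Z the latent variable -- this is the textbook form, not LookOut's exact notation) is:

$$
\log p_\theta(Y \mid X) \;\ge\; \mathbb{E}_{q_\phi(Z \mid X, Y)}\!\left[\log p_\theta(Y \mid X, Z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(Z \mid X, Y)\,\|\,p_\theta(Z \mid X)\right)
$$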

PnPNet: End-to-End Perception-Prediction

PnPNet (CVPR 2020, Uber ATG) demonstrated end-to-end integration:

Tracking Architecture:

  • LSTM-based sequence modeling for each tracked object
  • Two feature types feed the LSTM: (a) observation features from bilinear interpolation of BEV feature maps, (b) motion features including 2D velocities and ego angular velocity
  • Combined via MLP before LSTM processing

Data Association:

  • Bipartite matching using a learned affinity matrix with three components:
    • MLPpair: Compares detection-track feature pairs
    • MLPunary: Scores detections as potential new objects
    • Hungarian algorithm: Solves optimal assignment
  • For occluded objects: searches within a local neighborhood centered at predicted positions
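A sketch of the association step using SciPy's Hungarian solver, with the learned affinity and new-object scores treated as given inputs; thresholds are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(affinity, new_object_scores, min_affinity=0.2):
    """Bipartite detection-to-track association via the Hungarian algorithm
    (sketch of the PnPNet-style step; thresholds are illustrative).

    affinity: (num_dets, num_tracks) learned pairwise scores (MLP_pair output)
    new_object_scores: (num_dets,) scores for spawning a new track (MLP_unary)
    """
    det_idx, trk_idx = linear_sum_assignment(-affinity)   # maximize total affinity
    matches, new_tracks, matched_dets = [], [], set()
    for d, t in zip(det_idx, trk_idx):
        if affinity[d, t] >= min_affinity:
            matches.append((d, t))
            matched_dets.add(d)
    for d in range(affinity.shape[0]):
        if d not in matched_dets and new_object_scores[d] > 0.5:
            new_tracks.append(d)                            # spawn a new object track
    return matches, new_tracks
```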

Motion Forecasting:

  • MLPpredict generates future trajectories directly from trajectory-level LSTM representations
  • Track length: T=16 frames (1.6 seconds)
  • Prediction horizon: 3 seconds
  • Maximum 50 tracks/detections per class

Training: Crucially avoids teacher forcing -- uses the model's own tracking outputs during training rather than ground truth, improving generalization.

Sources: LaneGCN (ECCV 2020), LookOut (ICCV 2021), PnPNet (CVPR 2020), Forecasting Part 1


7. Motion Planning Integration — Perception Interface

Perception Output Format to Planner

Based on Aurora's published materials, the perception system outputs to the motion planner:

From Mainline Perception (per tracked object):

  • Object class (vehicle, pedestrian, cyclist, motorcycle, construction equipment, etc.)
  • 3D oriented bounding box (position, dimensions, orientation)
  • Velocity vector
  • Uncertainty estimates (per PnPNet/LaserNet's probabilistic outputs)
  • Predicted future trajectory (6 waypoints over 3 seconds per LaserFlow, or longer horizon from the prediction module)
  • Track ID and track age

From Remainder Explainer (per unknown object):

  • Generic bounding box (no class label)
  • Current position and estimated future position
  • ML-based avoidance score
  • Dimensional extent

From Construction Perception:

  • Individual cone/barrel detections
  • Aggregated blockage regions (treated as solid walls by the planner)
  • Temporary lane marking overrides

From Static Perception:

  • HD Map rasterized layers (7 binary mask channels)
  • Real-time lane detection (when map override is active)
  • Traffic signal states (from camera-based detection)

Planner Consumption

The Proposer-Ranker planner operates at ~10 Hz, consuming perception outputs to:

  1. Proposer: Generates dynamically feasible trajectory candidates using perception-derived scene understanding
  2. Conditional Forecaster: For each candidate trajectory, predicts how other actors would respond
  3. Ranker: Scores each (trajectory, world response) pair using learned cost functions from Maximum Entropy IRL
  4. Invariant enforcement: Hard safety constraints (traffic rules, collision avoidance) filter proposals
  5. Selection: Highest-scoring trajectory that passes all invariants is executed
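A compact sketch of this consume-and-select loop, with `proposer`, `forecaster`, `ranker`, and the invariant checks treated as opaque components (their names follow Aurora's public descriptions; the internals here are assumptions):

```python
def select_ego_plan(proposer, forecaster, ranker, scene, invariants):
    """Sketch of one planning cycle over perception-derived scene state."""
    best_plan, best_score = None, float("-inf")
    for ego_traj in proposer.candidates(scene):
        # Conditional forecast: how would other actors respond to THIS ego action?
        world_response = forecaster.predict(scene, conditioned_on=ego_traj)
        # Hard safety invariants filter proposals before scoring
        if not all(check(ego_traj, world_response) for check in invariants):
            continue
        score = ranker.score(ego_traj, world_response)
        if score > best_score:
            best_plan, best_score = ego_traj, score
    return best_plan
```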

The Offline Executor simulation module can feed "the planner the set of inputs it otherwise would get from the perception module" -- confirming that the perception-planning interface is a well-defined data boundary enabling modular testing.

Validation of Perception-to-Planning

The validation module evaluates planned trajectories by asking:

  • Did the trajectory obey traffic law?
  • Did motion planning meet its objective?
  • Is the trajectory comfortable for vehicle occupants?

These questions implicitly validate perception quality: if perception provides incorrect inputs (wrong object positions, missed detections), the resulting trajectories will fail validation.

Sources: AI Alignment, The Offline Executor, Perception: No Measurement Left Behind


8. Construction Zone Perception — Technical Details

Detection Pipeline

The construction zone perception system operates in multiple stages:

Individual Element Detection:

  • SpotNet detects traffic cones and construction equipment as explicit output classes at long range
  • Cameras detect construction signage (speed limit changes, lane endings, merge warnings)
  • LiDAR and radar detect physical barriers, barrels, and delineators
  • Detection begins at sufficient range for highway-speed deceleration

Blockage Region Aggregation: Individual cone/barrel detections are aggregated into continuous blockage regions:

  • Adjacent or closely-spaced construction elements are merged into a single barrier representation
  • The blockage region is treated as equivalent to a solid wall by the motion planner
  • Visualized as yellow walls in Aurora's Lightbox visualization system
  • This aggregation converts point detections (individual cones at known locations) into area constraints (impassable zones)
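A sketch of one way such aggregation could work: single-linkage grouping of cone/barrel positions within a merge distance, with each group summarized as an impassable extent (the merge distance and region representation are assumptions):

```python
import numpy as np

def aggregate_blockage_regions(cone_positions, merge_dist_m=4.0):
    """Merge nearby cone/barrel detections into continuous blockage regions
    by single-linkage grouping (distance threshold is an illustrative value)."""
    pts = np.asarray(cone_positions, dtype=float)
    unassigned, regions = set(range(len(pts))), []
    while unassigned:
        seed = unassigned.pop()
        region, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in list(unassigned)
                    if np.linalg.norm(pts[i] - pts[j]) <= merge_dist_m]
            for j in near:
                unassigned.remove(j)
                region.append(j)
                frontier.append(j)
        member_pts = pts[region]
        # Summarize each region by its bounding extent; the planner treats it
        # as an impassable "wall" rather than individual point obstacles.
        regions.append({"min": member_pts.min(axis=0),
                        "max": member_pts.max(axis=0),
                        "num_elements": len(region)})
    return regions
```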

Lane Override System: When temporary lane markings are detected by cameras:

  • The system compares perceived lane geometry against Atlas HD map lanes
  • If mismatch is detected, perception overrides the map
  • The vehicle follows perceived temporary lanes instead of mapped permanent lanes
  • This is critical in Texas construction zones where lane geometry shifts frequently

Nudging Capability

When construction elements encroach into adjacent lanes:

  • The system calculates how much encroachment exists
  • The motion planner can generate trajectories that "nudge" outside normal lane boundaries
  • This extends to driving partially on the shoulder to avoid construction obstacles
  • The nudge is bounded by safety constraints ensuring the vehicle does not enter oncoming traffic or hazardous areas

Scale of Testing

  • 3,500+ miles of Texas construction zone driving
  • Approximately 40 construction sites per week encountered on the Dallas-El Paso route
  • Construction scenarios are converted into simulation test cases for regression testing

Sources: Tackling Construction


9. Emergency Vehicle Detection — Technical Details

Visual Detection System

The detection system uses camera-based rapid brightness and color change analysis:

  • Emergency vehicles are characterized by distinctive patterns of rapidly alternating bright colors (red, blue, white) from light bars
  • The system is trained on labeled camera data to recognize these temporal patterns
  • Training data was collected at Aurora's Almono test track in Pittsburgh, PA and at the Transportation Research Center's test track in East Liberty, OH
  • Operators manually drove Aurora Driver-equipped vehicles while others operated law enforcement-issue emergency vehicles
  • Scenarios reenacted: traffic stops, high-speed pursuits, and accident scenes
  • Captured data was converted into simulation scenarios with parametric variations

Audio Siren Detection (In Development)

Aurora has disclosed active development of audio-based siren detection:

  • Training on "different types of sirens at varying levels of volume and proximity"
  • Using labeled audio files as training data
  • This capability is described in aspirational terms, suggesting it may not yet be in production

Behavioral Response Protocol

Once an emergency vehicle is detected:

  1. Determine if the ego vehicle's path is affected
  2. Implement immediate corrective action (lane change or speed reduction)
  3. Comply with Texas law: maintain at least 500 feet from active emergency vehicles
  4. Follow "Move Over or Slow Down" regulations (lane change or 20 mph speed reduction)
  5. Alert remote assistant via Aurora Beacon for additional support

Discrimination Challenges

The system must distinguish emergency vehicles from:

  • Tow trucks with amber warning lights
  • Construction vehicles with flashing beacons
  • Other vehicles with aftermarket lighting
The rapid brightness and color change detection specifically targets the high-frequency alternation pattern unique to emergency light bars.
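A toy sketch of flagging that alternation pattern from a per-frame brightness signal over a candidate light region; the frequency band and threshold are illustrative assumptions, not Aurora's detector:

```python
import numpy as np

def has_emergency_flash_pattern(brightness_series, fps=30.0,
                                band_hz=(1.0, 4.0), power_ratio_thresh=0.4):
    """Check for concentrated spectral power in a flash-frequency band,
    a crude proxy for rapidly alternating light bars (values illustrative)."""
    x = np.asarray(brightness_series, dtype=float)
    x = x - x.mean()                                  # remove the DC component
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    ratio = spectrum[band].sum() / (spectrum[1:].sum() + 1e-9)
    return ratio > power_ratio_thresh
```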

Source: Emergency Vehicles


10. Night Driving Perception — Technical Details

FMCW Advantage at Night

FirstLight FMCW LiDAR provides a fundamental advantage in darkness:

  • Coherent detection is sensitive to single photons -- operates at the quantum noise limit
  • Performance is independent of ambient illumination (unlike cameras, which require external light sources)
  • Maintains full 450m+ range at night (Gen 2: 1,000m)
  • Detects pedestrians "over 300 meters away at night, before they would have been visible to the naked eye"

Camera Degradation Compensation

At night, cameras provide limited utility:

  • Aurora's demo videos show the camera view as nearly black at night, with the pedestrian invisible to cameras
  • The system relies primarily on LiDAR for detection, with radar providing supplementary confirmation
  • FMCW's instantaneous Doppler velocity is especially valuable at night, enabling immediate classification of moving vs. stationary objects without requiring multi-frame visual tracking (which fails without adequate illumination)

Safety Margin

  • 37% of fatal crashes involving large trucks occur at night
  • The Aurora Driver detects objects in darkness more than 450 meters away
  • This provides approximately 9-11 seconds of reaction time at highway speeds before encountering road obstacles
  • The detection range exceeds what a human driver could perceive, even with headlights

Pre-Launch Validation

Before launching driverless night operations (July 2025):

  • Aurora examined "multiple collisions involving human-driven Class 8 trucks" on the Dallas-Houston route
  • Recreated these scenarios in simulation with variations in "time of day, vehicle size, distance, speed"
  • Human operators accumulated 3,000 nighttime loads over 700,000 commercial miles since 2023 before driverless night operations began
  • Night operations more than double potential truck utilization

Sources: The Road Never Sleeps, Detecting a Pedestrian at Night


11. Adverse Weather Perception — Technical Details

Rain, Fog, and Dust Effects on Sensors

LiDAR degradation in precipitation:

  • Light from LiDAR sensors bounces off rain droplets or snowflakes, "creating what can appear to be a cloud hovering around the truck"
  • FMCW's instantaneous Doppler can help distinguish rain returns (near-zero velocity relative to air, falling) from solid objects (stationary or moving along the road surface)
  • The FMCW advantage over ToF: rain/snow produce random phase noise in ToF systems, while FMCW can filter precipitation by velocity signature

Camera degradation:

  • Rain on camera lenses causes "blurry and desaturated" images with "reduced contrast and definition"
  • The sensor cleaning system blasts "a combination of high-pressure air and washer fluid" across each lens, "completely cleaning each lens in milliseconds"

Imaging radar advantage:

  • In dust storms and fog, "the perception system shifts weight to imaging radar"
  • Aurora's custom imaging radar penetrates rain, dense fog, and snow
  • Provides "true and precise 3D images" even when optical sensors are degraded
  • This dynamic modality weighting is a key graceful degradation mechanism

Three-Tier Graceful Fallback

  1. Normal conditions: Full speed, all sensors operational
  2. Degraded visibility: "Slow and proceed with caution" -- applies to rain, snow, fog, dust, smoke, and even insects on sensor lenses
  3. Severe conditions: "Begin searching for a safe place to stop and alert the Command Center"

The system's perception quality monitoring runs continuously: the "perception system is constantly assessing the range and quality of the data its sensors record."

Weather Validation Timeline

  • Inclement weather constrained Texas operations ~40% of the time in 2025
  • Rain and heavy wind validation completed in a January 2026 software release (a few weeks later than initially planned)
  • This was described as "a step-change in potential uptime" for fleet utilization
  • As of February 2026, the Aurora Driver can "driverlessly navigate highways and surface streets through multiple forms of inclement weather, including rain, fog, and heavy wind"

Wind Effects on Trucks

While not explicitly detailed, Aurora's mention of "heavy wind" validation is significant for Class 8 trucks:

  • Unladen trailers act as large sail areas susceptible to crosswinds
  • The perception system must account for lateral displacement due to wind gusts
  • Control and planning must adapt to wind-induced trajectory perturbations

Sources: Stormy Weather, Aurora Triples Network


12. New Papers and Publications (2024-2025)

Published Papers

SpotNet (arXiv, May 2024):

  • "An Image Centric, Lidar Anchored Approach To Long Range Perception"
  • arXiv:2405.15843
  • First Aurora paper to describe the long-range detection system in detail
  • Introduces the LiDAR-anchored image detection paradigm with O(1) compute scaling

Citations in Survey Papers

Aurora's prediction approach is discussed in a 2025 trajectory prediction survey (arXiv:2503.03262), which notes that Aurora's "prediction module performs joint object detection and trajectory forecasting, combining learning-based methods with rule-based constraints to align predictions with driving rules."

Note on Publication Cadence

Aurora has significantly reduced its academic publication rate compared to the Uber ATG era. This likely reflects:

  • A shift from research exploration to production deployment
  • The #CarryOurWater engineering tenet: "no separate research team" -- engineers build production systems rather than publishing papers
  • Competitive sensitivity as the company operates commercially

The Uber ATG heritage papers (LaserNet, LaserFlow, LaneGCN, PnPNet, SpAGNN, PIXOR, ContFuse, IntentNet, Fast and Furious, HDNET, LiDARsim, LookOut, MP3, MMF, OSIS, Multi-View Fusion) remain the most detailed public descriptions of Aurora's underlying perception algorithms.


13. Recent Patents (2024-2025)

Patent Portfolio Growth

Aurora's patent portfolio grew from ~1,400 assets to over 1,800 patents and pending applications by December 31, 2024.

Key Recent Grants

US11933901 (March 2024): Light detection and ranging (LiDAR) sensor system including bistatic transceiver -- covers the optical architecture of multi-receiver FMCW LiDAR configurations for improved angular resolution.

Multi-modal Sensor Fusion Patent (granted March 2025): Covers an autonomous vehicle system that obtains sensor data from multiple sensors with different modalities (RADAR, LiDAR, camera), fuses them to create a fused sensor sample, and provides this to a machine learning model for object detection and/or motion prediction. Notable for explicitly claiming "improved generalization performance over multiple sensor modalities."

US10579063B2: Machine learning for predicting locations of objects perceived by autonomous vehicles -- covers the prediction system including a machine-learned static object classifier, goal scoring model, trajectory development model, and ballistic quality classifier. Notably describes object representation as "bounding polygon or other shape."

US11099569B2: Systems and methods for prioritizing object prediction -- covers determining priority classifications for perceived objects and predicting future states in priority order (i.e., allocating more compute to high-priority objects).

Patent Portfolio Composition

  • ~58% hardware (predominantly FMCW LiDAR from Blackmore acquisition): 19 patent families covering beam steering, chirp ranging, transceiver design
  • ~42% software: ML perception, planning, simulation
  • Filings are concentrated in the US Patent Office (~100% of filings)
  • Uber ATG patents (acquired 2020) are cataloged separately and significantly expand the total portfolio

Sources: Aurora 10-K Filing, GreyB Analysis


14. Perception Team Structure

Leadership

  • Drew Bagnell — Co-Founder & Chief Scientist. Former Head of Perception and Autonomy Architect at Uber ATG (2015-2016), Consulting Professor at CMU Robotics Institute and Machine Learning Department. 43,000+ citations. Invented MaxEnt IRL, DAgger, LEARCH. Won the 2024 AAAI Classic Paper Award. Over 25 years at the intersection of ML, robotics, and autonomous vehicles.

Organizational Philosophy

Aurora explicitly rejects the model of separated research and engineering teams:

  • The #CarryOurWater tenet means "all teams share tool-building responsibility; no separate research team"
  • Engineers work in "pods that are effective, because they have clear targets and the freedom to choose the AI techniques that most expediently improve those targets"
  • This means perception researchers write production C++ code, and production engineers participate in ML research

Team Scale

Based on publicly available data, Aurora employs approximately 463 total employees with 215 dedicated engineering professionals. The perception team is one of several engineering organizations, alongside planning, simulation, hardware, infrastructure, and fleet operations.

Key Perception Sub-Teams (from Job Postings)

  • Perception - Detection: Builds ML models for object detection (LaserNet, SpotNet descendants)
  • Perception - Tracking: State estimation, EKF implementation, 3D object tracking (S2A)
  • Perception - Evaluation: Measures and validates perception quality against ground truth
  • Perception - Data Curation: Manages training data selection, mining, and quality
  • Perception - Sensor Fusion: Multi-modal fusion architectures
  • Perception - Calibration: Online and offline sensor calibration systems

Compensation Range

Technical Lead Manager, Perception positions command $189K - $302K base salary. Staff Software Engineer - Data Curation roles command $189K - $303K.

Sources: Leadership Team, Aurora Careers, Engineering Tenets


15. Data Engine — Detailed Architecture

End-to-End Pipeline

Aurora's data engine operates as a cyclical ML lifecycle:

Identify data requirements
    |
    v
Mine on-road sensor data (petabytes in S3)
    |
    v
Label data (manual + auto-labeling)
    |
    v
Generate datasets (S2T pipeline)
    |
    v
Train ML models (SageMaker + PyTorch)
    |
    v
Evaluate at subsystem + system level
    |
    v
Deploy to fleet (within 2 weeks of new data)
    |
    v
Collect new driving data --> feedback loop

Three-Layer Infrastructure

  1. Build Layer: Docker images and artifacts constructed via Buildkite CI/CD
  2. ML Orchestration Layer: Kubeflow Pipelines on EKS (installed via Terraform)
  3. Compute Layer: Amazon SageMaker (distributed training) + Aurora's internal Batch API (dataset generation, evaluation)

S2T (Sensor-to-Tensor) Pipeline

The S2T pipeline is Aurora's core perception preprocessing pipeline and was the first ML pipeline built at Aurora:

  • Converts raw sensor data into tensor representations suitable for neural network consumption
  • Includes voxelization, range view projection, camera preprocessing, and map rasterization
  • Before Kubeflow automation: took weeks of ML engineer time to run
  • After automation: takes one week with zero manual intervention
  • Updated S2T models land at minimum bi-monthly

Training Infrastructure

  • Framework: PyTorch (primary), with TensorFlow and JAX also supported
  • Distributed training: Amazon SageMaker with 50+ training configuration parameters passed via SageMaker hyperparameters to an internal "Training Main Wrapper"
  • Training visualization: Custom TensorBoard component in Kubeflow
  • GPU: NVIDIA A10 GPUs used for SpotNet inference; training likely uses P4d instances (A100 GPUs)

Custom Kubeflow Components

Aurora built domain-specific Kubeflow components:

  • SageMaker wrapper: Launches distributed training jobs
  • Batch API integration: Runs dataset generation and metrics evaluation on Aurora's internal compute
  • GitHub component: Creates PRs and posts comments for model integration
  • Slack component: Sends notifications (exit handler for all pipelines)
  • TensorBoard component: Spins up training visualization instances
  • Components use a factory method pattern allowing compile-time parameter overrides

Pipeline Launch Methods

  • CLI: Local experimentation
  • PR Commands: /kubeflow train --model [name] --training_type [core, deploy, integration]
  • CI: Automated health monitoring
  • CD: Automated end-to-end production deployment workflows

Continuous Deployment

New datasets and models can be deployed within two weeks of new data availability. The CD system runs automated end-to-end deployment workflows including dataset generation, training, and evaluation without manual intervention.

Sources: Data Engine, Kubeflow Summit 2022 Notes


16. Labeling Infrastructure

Manual Labeling

Aurora employs human specialists who:

  • Review driving log footage
  • Label objects with properties: object category, velocity, position, heading, dimensions
  • Create ground truth annotations for perception evaluation
  • Annotate complex scenes where "each actor (vehicles, pedestrians, bicyclists) has been carefully annotated by human experts"

Auto-Labeling and Map Generation

The Atlas mapping system demonstrates Aurora's auto-labeling capabilities:

  • After a single manual drive, cloud-based algorithms "generate semantic components" with "little to no human assistance"
  • ML models automatically extract lane boundaries, signs, signals from raw sensor data
  • Human annotators review and correct auto-generated annotations

Data Mining for Gap Identification

The data engine starts by "identifying the type of data required to support or improve an autonomous vehicle capability." For example:

  • To detect emergency vehicles: "the perception system needs lots of sensor data of emergency vehicles in different situations"
  • Data mining algorithms search the petabyte-scale sensor data archive for relevant scenarios
  • This enables targeted data collection rather than random sampling

Online-to-Offline (O2O) Pipeline

Real-world events are systematically converted to training data:

  1. Copilot annotations: Vehicle operators flag "interesting, uncommon, or novel" experiences
  2. Disengagements: Instances where operators retook control
  3. Triage: Specialist team reviews and diagnoses significance
  4. Test creation: Single events generate multiple test types (perception tests, trajectory evaluations, simulations)
  5. Amplification: A single real-world event generates 50+ simulation variations by modifying parameters (speed, distance, object size)

Example: From one nudging event around a stopped vehicle, Aurora created "50 new nudging simulations" and has "practiced nudging more than 20 million times in simulation."
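A sketch of this style of parameter-sweep amplification; the swept parameters and values are illustrative (the source only states that speed, distance, and object size are varied):

```python
import itertools

def amplify_event(base_event):
    """Turn one real-world event into many simulation variations by sweeping
    parameters (parameter grids below are illustrative assumptions)."""
    speeds_mps = [20, 25, 30, 33]                 # ego speed variations
    gaps_m = [10, 20, 40, 60]                     # distance to the obstacle
    widths_m = [1.8, 2.2, 2.6]                    # object size variations
    variations = []
    for speed, gap, width in itertools.product(speeds_mps, gaps_m, widths_m):
        scenario = dict(base_event)
        scenario.update(ego_speed_mps=speed, gap_m=gap, object_width_m=width)
        variations.append(scenario)
    return variations                             # 48 variations from one event

# e.g. amplify_event({"type": "nudge_around_stopped_vehicle"})
```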

Sources: Virtual Testing, O2O


17. Perception Metrics

Autonomous Perception Indicator (API)

Aurora's primary fleet-level perception metric is the Autonomous Perception Indicator (API):

  • Tracks the percentage of total commercially-representative miles where the truck does not require human assistance
  • "Support" means assistance via a local vehicle (not remote assistance)
  • Aurora does not expect API to ever reach 100% at aggregate level (flat tires and similar events always require on-site support)
  • The percentage of 100% API loads is the key progress indicator
  • Q1 2025: 95% of loads running production release software had 100% API -- exceeding the Commercial Launch target of ~90%

Perception Testing Metrics

At the subsystem level, perception is evaluated through two approaches:

Broad performance metrics:

  • "How many pedestrians does it correctly identify?" (recall)
  • "How many false alarms does the system generate?" (precision)
  • These are evaluated across the full labeled dataset

Specific capability tests:

  • "Does the perception system see this bicyclist right next to our vehicle?" (per-scenario recall)
  • Targeted evaluation of individual detection scenarios

Safety Case Integration

Perception metrics feed into the Safety Case Framework:

  • ~10,000 requirements include perception-specific requirements
  • 4.5 million tests are run before each software release, including perception-specific tests
  • Perception capabilities must be "appropriate for the operating environment" (Operational Design Domain)
  • The Autonomy Readiness Measure (ARM) was 84% complete by Q3 2024, reaching completion for commercial launch in April 2025

Recall vs Precision Tradeoffs

Aurora explicitly discusses the safety implications:

  • Low recall = "might not see something important, which could be a serious safety concern" (missed detections)
  • Low precision = "passengers might be subjected to unnecessary maneuvers that make the ride uncomfortable and also present safety concerns" (false positives causing unnecessary braking/swerving)

Sources: Q1 2025 Shareholder Letter, Virtual Testing, Building for Scale


18. Hardware-Software Co-Design

First Principles Design Process

Aurora's hardware-software co-design follows a structured process:

  1. Boundary scenario identification: The perception team identifies edge cases the vehicle must handle safely
  2. Sensing requirement derivation: Boundary scenarios determine required detection range, resolution, field of view, and update rate
  3. First principles physics: Stopping distances at highway speeds (65-80 mph for Class 8 trucks at 80,000 lbs), road curvature visibility requirements, object minimum detectable size at maximum range
  4. Sensor configuration simulation: ~20,000 hardware design scenarios evaluated in simulation before manufacturing prototypes (using Colrspace/Protocolr for sensor-realistic rendering)
  5. Perception expert evaluation: Hardware configurations evaluated by perception engineers against detection requirements
  6. Iterative refinement: Hardware and software teams jointly optimize

The Key Question

Aurora's hardware team asks: "Does a single pixel 300 meters away provide enough data?" This drives camera resolution requirements, lens specifications, and sensor placement decisions.
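As a rough worked example of the stopping-distance arithmetic behind such requirements (reaction time, deceleration, and safety margin below are illustrative assumptions, not Aurora's figures):

```python
def required_detection_range(speed_mph, reaction_s=1.5, decel_mps2=2.5,
                             margin_m=50.0):
    """Rough first-principles detection range for a loaded Class 8 truck.
    Reaction time, deceleration, and margin are illustrative assumptions."""
    v = speed_mph * 0.44704                   # mph -> m/s
    reaction_dist = v * reaction_s            # distance covered before braking
    braking_dist = v ** 2 / (2 * decel_mps2)  # kinematics: v^2 / (2a)
    return reaction_dist + braking_dist + margin_m

for mph in (65, 70, 80):
    print(mph, "mph ->", round(required_detection_range(mph)), "m")
# ~262 m at 65 mph, ~293 m at 70 mph, ~359 m at 80 mph under these assumptions
```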

Hardware Generation Evolution

| Aspect | Gen 1 (Current) | Gen 2 (Mid-2026) | Gen 3 (2027) |
|---|---|---|---|
| Manufacturer | Aurora (internal) | Fabrinet | AUMOVIO (formerly Continental) |
| Compute | 5,400 TOPS | Enhanced | Dual NVIDIA DRIVE Thor (~2,000 TFLOPS, Blackwell architecture) |
| LiDAR range | 450m+ | 1,000m (2x improvement) | 1,000m+ |
| Hardware cost | Baseline | 50%+ reduction | Further reduction |
| Durability | ~300,000 miles | 1 million miles (3x improvement) | Tens of thousands of trucks |
| Production scale | Small fleet | Up to ~1,500 units | Tens of thousands |
| Safety standard | Internal | Enhanced | ISO 26262 ASIL-D |
| Backup system | Dual-computer | Dual-computer | Continental/AUMOVIO specialized independent secondary system |

Compute Architecture for Perception

The Aurora Driver computer uses:

  • Enterprise-class server architecture (not automotive-grade SoCs in Gen 1)
  • Processors designed for ML acceleration and camera signal processing
  • Custom TSN networking switch with an advanced networking chip synchronizing all sensors to microsecond precision
  • Self-sufficient hub design: conditions its own power, coordinates its own sensors, communicates with the vehicle over "a simple umbilical"
  • Vehicle-agnostic: Same compute platform operates across sedans (Toyota Sienna) and Class 8 trucks (Peterbilt, Volvo, International)

NVIDIA DRIVE Thor Integration (Gen 3)

  • Dual DRIVE Thor SoC configuration running DriveOS
  • Built on NVIDIA Blackwell architecture
  • 1,000 TFLOPS per SoC (~2,000 TFLOPS total)
  • Designed to "accelerate inference tasks critical for autonomous vehicles to understand and navigate the world around them"
  • Production samples delivered in first half of 2025
  • AUMOVIO will mass-manufacture starting 2027

Perception Compute Efficiency

In Q4 2025 earnings materials, Aurora disclosed that a "new vision architecture" within its perception system "enables the Aurora Driver to perceive its surroundings more efficiently, reducing compute requirements." This model advancement "delivers superior performance at a lower cost than their previous robust perception model." This suggests that as compute budgets shrink with Gen 2/3 hardware cost targets, Aurora is simultaneously making perception models more efficient -- a critical co-design constraint where hardware cost reduction must not sacrifice perception quality.

Sensor Simulation for Hardware Design

Aurora's Colrspace acquisition (Pixar veterans) provides sensor-realistic simulation for hardware co-design:

  • Protocolr technology uses neural networks for inverse rendering: given an image, it infers texture, specularity, roughness, and sheen of surfaces
  • Combined with a differentiable image renderer to produce sensor-realistic synthetic data
  • Simulates camera, conventional LiDAR, and FMCW LiDAR
  • Evaluated ~20,000 hardware design scenarios before manufacturing prototypes
  • Enables perception teams to test detection algorithms against proposed sensor configurations before physical hardware exists

Sources: Product Design from First Principles, Meet Fusion, NVIDIA Partnership, Colrspace Acquisition, Q4 2025 Earnings


19. Uber ATG Perception Heritage — Complete Research Lineage

Aurora inherited a world-class perception research portfolio from the Uber ATG acquisition (January 2021). This section documents the complete lineage and how each paper contributed to Aurora's production perception stack.

Detection Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| PIXOR | CVPR 2018 | Single-stage BEV oriented box detector | Established BEV detection paradigm |
| LaserNet | CVPR 2019 | Range-view per-point multimodal detection with Laplace uncertainty | Core range-view detection architecture |
| LaserNet++ | CVPRW 2019 | Added 6-class semantic segmentation + camera fusion | Road surface estimation |
| HDNET | CoRL 2018 | HD map as detection prior + online map prediction | Map-less fallback capability |
| SpotNet | arXiv 2024 | Image-centric LiDAR-anchored long-range detection | Long-range perception system |

Sensor Fusion Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| ContFuse | ECCV 2018 | Continuous convolution for LiDAR-camera fusion | Point-wise cross-modal feature alignment |
| MMF | CVPR 2019 | Ground estimation + depth completion as auxiliary tasks | Ground plane estimation |
| Multi-View Fusion | WACV 2022 | BEV+RV+Camera fusion with 7-channel map | Core fusion architecture |
| RV-FuseNet | arXiv 2020 | Range-view temporal fusion | Temporal feature integration |
| MVFuseNet | CVPR 2021 WS | Multi-view (RV+BEV) joint detection and forecasting | Combined view architecture |

Tracking and Prediction Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| LaserFlow | ICRA 2022 (Best Paper) | Joint detection + motion forecasting in range view | Unified detection-prediction in RV |
| IntentNet | CoRL 2018 | Joint detection + intention + trajectory prediction | Intent classification |
| Fast and Furious | CVPR 2018 | Joint detection + tracking + forecasting with temporal 3D conv | End-to-end temporal reasoning |
| PnPNet | CVPR 2020 | End-to-end perception + prediction with tracking-in-the-loop | Tracking architecture (LSTM + Hungarian) |
| SpAGNN | Uber ATG | Spatially-aware GNN for multi-agent interaction | Interaction modeling |
| LaneGCN | ECCV 2020 (Oral) | Lane graph convolutions for motion forecasting | Lane-aware prediction |
| LookOut | ICCV 2021 | Diverse multi-future prediction + planning (CVAE) | Conditional forecasting |

Simulation and Data Lineage

| Paper | Venue | Key Innovation | Production Legacy |
|---|---|---|---|
| LiDARsim | CVPR 2020 | Realistic LiDAR simulation from real-world 3D meshes | Perception simulation |
| MP3 | CVPR 2021 | Map-less driving with online BEV mapping + occupancy flow | Online mapping capability |
| OSIS | arXiv 2019 | Open-set instance segmentation for unknown objects | Remainder Explainer foundation |

Key Researchers from Uber ATG (Now at Aurora or Moved On)

Many of the key researchers from the Uber ATG papers have continued in autonomous driving:

  • Raquel Urtasun (Uber ATG Chief Scientist) -- founded Waabi
  • Ming Liang, Bin Yang, Shenlong Wang -- co-authors on ContFuse, PIXOR, IntentNet, PnPNet, and other foundational papers
  • Carlos Vallespi-Gonzalez -- co-author on Multi-View Fusion (WACV 2022), one of the few post-acquisition Aurora publications
  • Nemanja Djuric -- co-author on Multi-View Fusion, continued at Aurora

The Aurora Multi-Sensor Dataset itself was captured between January 2017 and February 2018 in Pittsburgh, PA by Uber ATG before the acquisition.


20. Aurora Multi-Sensor Dataset (MSDS)

Dataset Specifications

| Property | Value |
|---|---|
| LiDAR | 64-beam Velodyne HDL-64E |
| Cameras | 7 cameras at 1920x1200 pixels (forward stereo pair + 5 wide-angle, 360-degree coverage) |
| Geographic area | Pittsburgh, PA metropolitan area |
| Collection period | January 2017 -- February 2018 |
| Conditions | All four seasons, rain, snow, overcast, sunny, different times of day, varying traffic |
| Annotations | Weather labels, semantic segmentation, highly accurate localization ground truth |
| License | CC BY-NC-SA 4.0 (non-commercial academic use only) |
| Hosting | AWS Registry of Open Data |

Intended Research Applications

  • Autonomous vehicle localization approaches
  • 3D reconstruction
  • Virtual tourism
  • HD map construction
  • Map compression algorithms

Limitations

The MSDS uses a Velodyne HDL-64E (ToF LiDAR), not Aurora's FirstLight FMCW LiDAR. This means the dataset lacks instantaneous Doppler velocity data. The dataset predates Aurora's FMCW integration and represents Uber ATG-era sensor configurations.

Source: Aurora MSDS on AWS


21. Verifiable AI and Perception Debugging

Compound AI System Architecture

Aurora's "Verifiable AI" approach structures the perception-to-planning pipeline as a compound system (not end-to-end):

  • Each module (perception, prediction, planning) produces semantically meaningful intermediate representations
  • These intermediates are inspectable: "What was the state of the light? What were other actors likely to do?"
  • When failures occur, engineers can "introspect what went wrong and dig many steps deeper to identify the root of the problem"
  • Incorrect predictions can be identified and the specific failure mode covered, rather than retraining entire end-to-end models

Invariant-Based Safety Layer

Hard safety constraints are encoded as invariants rather than learned behaviors:

  • "Don't depart the roadway" -- requires perception to output road boundary estimates
  • "Stop at red lights" -- requires perception to output traffic signal states
  • "Yield for emergency vehicles" -- requires emergency vehicle detection
  • These invariants constrain planning, but perception must provide the semantic inputs they depend on

Graph Neural Networks and Transformers in the Stack

Aurora confirms using:

  • Graph Neural Networks: For modeling multi-agent interactions in prediction (LaneGCN/SpAGNN heritage)
  • Transformer-style architectures: "On the road since 2021" -- likely used in both perception (attention-based feature fusion) and prediction (attention-based interaction modeling)
  • These are described as "generative AI approaches to self-driving that we pioneered"

Perception Debugging Workflow

The Lightbox visualization system enables perception debugging:

  • Web-based 3D scene visualization
  • Customizable views for Perception, Planning, Triage, and Data Science teams
  • Built on XVIZ (the open-source autonomy data visualization protocol originally developed at Uber ATG, now maintained by Aurora)
  • Shows sensor data, perception outputs, mapping, simulation data, and planning decisions
  • Tracks objects "over time, with varying levels of granularity"

Sources: AI Transparency, Verifiable AI, Lightbox


22. Sensor Simulation for Perception Development

Colrspace/Protocolr Technology

Aurora's sensor simulation (from the 2022 Colrspace acquisition) enables perception development without physical hardware:

Inverse Rendering Pipeline:

  1. Input: Real-world photograph of a surface
  2. Neural network infers material properties: overall color, specularity, roughness, sheen
  3. Differentiable image renderer produces synthetic camera images that are nearly indistinguishable from real photos
  4. The same material models drive LiDAR simulation (return intensity depends on material properties) and FMCW simulation (Doppler signatures from surface motion)

LiDARsim (CVPR 2020) provides the LiDAR simulation foundation:

  • Builds a 3D mesh catalog from fleet driving data (static maps + dynamic objects)
  • Ray casting over 3D scenes produces physics-based LiDAR returns
  • A neural network produces deviations from physics-based simulation, adding realistic effects:
    • Dropped points on transparent surfaces (windshields)
    • High intensity returns on reflective materials (license plates)
    • Beam divergence effects at long range
  • Enables testing on long-tail events and safety-critical scenarios without real-world occurrence

Scale of Use:

  • ~20,000 hardware design scenarios evaluated before manufacturing prototypes
  • Enables perception teams to validate detection algorithms against proposed sensor configurations before physical hardware exists

Sources: Colrspace Acquisition, LiDARsim (CVPR 2020)


23. Operational Perception Performance (as of March 2026)

Fleet-Level Statistics

  • 250,000+ driverless miles with zero Aurora Driver-attributed collisions
  • 10 routes across Texas, New Mexico, and Arizona
  • 100% on-time performance
  • 95% of loads achieved 100% API (Autonomous Perception Indicator) on production software
  • Over 4 million tests run before each software release (latest validation)

Perception Capability Milestones

| Capability | Validation Date | Key Perception Requirement |
|---|---|---|
| Daytime highway driving | April 2025 (commercial launch) | Full sensor suite operational |
| Night driving | July 2025 | FMCW LiDAR primary; cameras degraded |
| Rain and fog driving | January 2026 | Imaging radar primary; LiDAR/camera degraded |
| Heavy wind driving | January 2026 | Control adaptation for wind; perception unchanged |
| 10-route expansion | February 2026 | Atlas map coverage + perception generalization |

Impact of Weather Validation

Before rain validation: inclement weather constrained operations ~40% of the time in Texas. After January 2026 software release: "step-change in potential uptime," enabling near-continuous driverless operations across the Sun Belt.

Sources: Aurora Triples Network, Q4 2025 Earnings


Appendix A: Complete Published Paper Bibliography

Aurora Innovation (Post-2021)

  1. SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception. arXiv:2405.15843, May 2024.
  2. Multi-View Fusion of Sensor Data for Improved Perception and Prediction in Autonomous Driving. WACV 2022. (Fadadu, Pandey, Hegde, Shi, Chou, Djuric, Vallespi-Gonzalez)

Uber ATG Heritage (Pre-2021, Now Aurora IP)

  1. PIXOR: Real-time 3D Object Detection from Point Clouds. CVPR 2018.
  2. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting. CVPR 2018.
  3. Deep Continuous Fusion for Multi-Sensor 3D Object Detection (ContFuse). ECCV 2018.
  4. HDNET: Exploiting HD Maps for 3D Object Detection. CoRL 2018.
  5. IntentNet: Learning to Predict Intention from Raw Sensor Data. CoRL 2018.
  6. LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving. CVPR 2019.
  7. LaserNet++: Multi-task Learning for Joint Semantic Segmentation and 3D Object Detection. CVPRW 2019.
  8. Multi-Task Multi-Sensor Fusion for 3D Object Detection (MMF). CVPR 2019.
  9. Identifying Unknown Instances for Autonomous Driving (OSIS). arXiv:1910.11296, 2019.
  10. LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World. CVPR 2020.
  11. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop. CVPR 2020.
  12. Learning Lane Graph Representations for Motion Forecasting (LaneGCN). ECCV 2020 (Oral).
  13. RV-FuseNet: Range View Fusion for Joint Detection and Motion Forecasting. arXiv 2020.
  14. LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting. RA-L 2021 / ICRA 2022 (Best Paper).
  15. MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion. CVPR 2021 Workshop.
  16. MP3: A Unified Model to Map, Perceive, Predict and Plan. CVPR 2021.
  17. LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving. ICCV 2021.
  18. SpAGNN: Spatially-Aware Graph Neural Networks for Relational Behavior Forecasting from Sensor Data. Uber ATG.

Appendix B: Patent Reference Table

| Patent Number | Title | Key Innovation |
|---|---|---|
| US10310087B2 | Range-view LIDAR-based object detection | Per-cell class + instance center + bounding box in range view |
| US10579063B2 | ML for predicting locations of objects | Bounding polygon footprint; static object classifier |
| US10598791B2 | Object detection based on LiDAR intensity | Intensity-based classification |
| US10967862B2 | Road anomaly detection | IMU-based pothole/crack detection |
| US11029395B1 | Systems for pulsed-wave LIDAR | Advanced LIDAR pulse systems |
| US11099569B2 | Prioritizing object prediction | Priority-ordered prediction compute allocation |
| US11217012B2 | Travel way feature identification | 2D/3D fusion with spatio-temporal memory |
| US11262437B1 | Mirror Doppler compensation | Template-matching for scanning mirror Doppler |
| US11366200 | FMCW scanning corrections | Scanning artifact correction |
| US11521396B1 | Probabilistic prediction of dynamic objects | Probabilistic occupancy distributions |
| US11531346B2 | Goal-directed occupancy prediction | Lane-based 1D occupancy cells with GCN |
| US11550061B2 | Phase coherent LIDAR classification | FMCW per-point velocity for static/dynamic separation |
| US11561548B2 | Basis path generation | Motion planning trajectory generation |
| US11933901 | Bistatic transceiver LIDAR | Multi-receiver FMCW for angular resolution |
| US12051001B2 | Multi-task multi-sensor fusion | 3D boxes + ground geometry estimation |
| US20190317219A1 | Phase-coherent perception | Class-adaptive velocity aggregation |
| US20200234110 | Dynamic dropout for neural network robustness | Adversarial robustness training |
| US20200400821A1 | Doppler LIDAR odometry and mapping | Ego-motion from single FMCW sweep |
| US20210096253A1 | Complementary simultaneous chirp | Dual-laser FMCW for faster acquisition |
| US20230057509A1 | Vision-based ML with adjustable virtual camera | Camera-based perception with synthetic viewpoints |
Document compiled March 2026. All information sourced from publicly available Aurora blog posts, published academic papers, granted patents, SEC filings, earnings call transcripts, investor presentations, job postings, and third-party analysis.

Public research notes collected from public sources.