Aurora Innovation — Full AV Tech Stack Deep Dive
1. Perception System
Aurora's perception uses a three-layer "Swiss Cheese" architecture where overlapping layers compensate for each other's gaps:
| Layer | Function | Key Detail |
|---|---|---|
| Mainline Perception | Fuses lidar+camera+radar+HD map; detects/tracks known actors (vehicles, pedestrians, cyclists) | Powered by the S2A (Sensor-to-Adjustment) neural network operating on raw sensor data thousands of times per second per tracked object |
| Remainder Explainer | Accounts for every sensor measurement not explained by Mainline Perception; tracks unknown/novel objects | ML-based avoidance score from dimensions, material type, motion. "No Measurement Left Behind" philosophy — detected a charred trailer fragment never seen in training |
| Fault Management System | Monitors hardware/software health; triggers safe pullover if degraded | Redundant backup computer; dozens of fault injection tests run daily |
Sensor Fusion Pipeline — Early fusion with three stages:
- Sensor-to-Tensor: LiDAR+HD Map+RADAR → Euclidean View (3D); cameras → Image View (2D); LiDAR → Range View (2D spherical)
- Feature Extraction: Separate CNN branches per view
- Final Fusion: 2D features projected back into 3D, concatenated, processed through additional conv layers
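A minimal sketch of the Final Fusion step above: camera-view features are sampled into an ego-centric BEV grid by projecting each cell center (assumed to lie on flat ground) through the camera intrinsics/extrinsics, then concatenated with the LiDAR/map BEV features. The grid size, feature dimensions, extrinsics, and flat-ground assumption are illustrative, not Aurora's values.

```python
import numpy as np

def project_image_features_to_bev(img_feats, K, R, t, bev_shape=(100, 100),
                                  cell_m=0.5, ground_z=0.0):
    """img_feats: (H, W, C) camera-view features; K: 3x3 intrinsics;
    R, t: extrinsics mapping ego-frame points into the camera frame."""
    H, W, C = img_feats.shape
    rows, cols = bev_shape
    bev = np.zeros((rows, cols, C), dtype=img_feats.dtype)
    # BEV cell centers in the ego frame (x forward, y left), assumed at ground height.
    xs = (np.arange(rows) - rows / 2) * cell_m
    ys = (np.arange(cols) - cols / 2) * cell_m
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    pts_ego = np.stack([gx, gy, np.full_like(gx, ground_z)], axis=-1).reshape(-1, 3)
    pts_cam = pts_ego @ R.T + t                     # ego -> camera frame
    idx = np.flatnonzero(pts_cam[:, 2] > 0.1)       # keep cells in front of the camera
    uvw = pts_cam[idx] @ K.T                        # pinhole projection
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    flat = bev.reshape(-1, C)
    flat[idx[ok]] = img_feats[v[ok].astype(int), u[ok].astype(int)]
    return flat.reshape(rows, cols, C)

# Toy usage: an 8-channel camera feature map sampled into a 100x100 BEV grid.
feats = np.random.rand(600, 960, 8).astype(np.float32)
K = np.array([[800.0, 0.0, 480.0], [0.0, 800.0, 300.0], [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])  # forward camera
t = np.array([0.0, 1.5, 0.0])                                        # ~1.5 m above ground
print(project_image_features_to_bev(feats, K, R, t).shape)           # (100, 100, 8)
# Concatenating with the LiDAR/map BEV features would follow along the channel axis.
```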
Neural Architectures:
- CNNs for multi-view feature extraction
- Graph Convolutional Networks (GCNs) for motion prediction (agents as nodes, interactions as edges)
- Transformers/attention — Aurora states they've been "using transformer-style models on the road since 2021"
Key Published Research:
| Paper | Venue | Innovation |
|---|---|---|
| LaserNet | CVPR 2019 | Fully convolutional range-view detector with multimodal bounding box distributions |
| LaserFlow | ICRA 2022 (Best Paper RA-L) | Joint detection + motion forecasting in range view; first range-view method to beat BEV SOTA |
| SpotNet | arXiv 2024 | Image-centric, lidar-anchored long-range 3D detection with compute cost that is O(1) in range |
Inherited from Uber ATG acquisition: PIXOR, ContFuse, SpAGNN, PnPNet, LIDARsim, LaneGCN, LookOut.
Online Sensor Calibration — auxiliary regression head on existing detection model; corrects camera-lidar misalignment in <100 ms with <5% error. Critical because at 400 m+, a few milliradians of angular offset translate to a full highway lane of lateral error.
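Back-of-the-envelope check of the lane-width claim, using an assumed 9 mrad miscalibration (about half a degree):

```python
import math

range_m = 400.0
offset_mrad = 9.0                                   # illustrative miscalibration, ~0.5 deg
lateral_error_m = range_m * offset_mrad * 1e-3      # small-angle approximation
print(f"{lateral_error_m:.1f} m")                   # ~3.6 m, roughly one US lane width
print(f"{math.degrees(offset_mrad * 1e-3):.2f} deg")
```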
Sources: Seeing with Superhuman Clarity, Perception: No Measurement Left Behind, Continuous Real-Time Sensor Recalibration, SpotNet (arXiv:2405.15843)
2. Motion Planning & Prediction
Architecture: Proposer-Ranker (not end-to-end, not purely rule-based)
- Proposer: Generates dynamically feasible trajectory candidates ("correctness by construction")
- Ranker: Scores each proposal using learned cost functions; runs at ~10 Hz
- Safety invariants (traffic rules, collision avoidance) enforced as hard constraints in ranking
Interleaved Forecasting & Planning — Aurora's key architectural differentiator:
- Rejects traditional cascaded pipeline (perception → prediction → planning)
- Instead: for each candidate AV action, the system predicts how other actors would respond (conditional forecasting)
- Evaluates joint (AV action, world response) outcomes
- Enables causal reasoning: "If I do X, what happens?" vs. "What will others do regardless of me?"
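A toy sketch (not Aurora's code) of how the Proposer-Ranker loop with interleaved forecasting fits together; propose_candidates, conditional_forecast, learned_cost, and the cost weights are placeholder stand-ins for the real modules:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    positions: List[tuple]          # (x, y) waypoints at fixed time steps

def propose_candidates() -> List[Trajectory]:
    # "Correctness by construction": only dynamically feasible candidates are generated.
    return [Trajectory([(i * 1.0, lane) for i in range(10)]) for lane in (0.0, 3.6)]

def violates_invariant(traj: Trajectory, world) -> bool:
    # Hard constraints (collision avoidance, traffic rules) would be checked here.
    return False

def conditional_forecast(traj: Trajectory, world) -> dict:
    # Predict how each other actor responds *given* this candidate AV action.
    return {actor: actor for actor in world["actors"]}

def learned_cost(traj: Trajectory, joint_outcome: dict) -> float:
    # Stand-in for the learned ranking cost (e.g. trained with MaxEnt IRL).
    # joint_outcome would enter here, e.g. via predicted proximity to other actors.
    progress = traj.positions[-1][0]
    return -progress

def plan(world) -> Trajectory:
    candidates = [t for t in propose_candidates() if not violates_invariant(t, world)]
    # Rank by evaluating the joint (AV action, world response) outcome per candidate.
    return min(candidates, key=lambda t: learned_cost(t, conditional_forecast(t, world)))

print(plan({"actors": ["lead_truck"]}).positions[-1])
```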
Learning Methods:
| Method | Role | Origin |
|---|---|---|
| Maximum Entropy IRL | Learns reward functions from human demonstrations | Drew Bagnell (co-founder), AAAI 2008 — won 2024 Classic Paper Award |
| LEARCH | Functional gradient imitation learning | Bagnell, Autonomous Robots 2009 |
| DAgger | Interactive imitation learning | Ross, Gordon, Bagnell (AISTATS 2011) |
| RLHF | Preference learning from virtual scenario annotations | Applied when real driving data unavailable |
"Verifiable AI" — hybrid approach: learned behavior from expert demonstrations + encoded invariants (stop at red lights, yield for emergency vehicles). Aurora explicitly rejects pure end-to-end as "an unmaintainable quagmire."
Complex Scenario Handling:
- Construction zones: 3,500+ miles of Texas construction navigated; cones/barrels treated as solid barriers; can override Atlas map with real-time perception of temporary lane lines
- Emergency vehicles: Detects rapid brightness/color changes; partnered with Frisco PD for testing
- Merges: Core use case for interleaved forecasting
- Night driving: Driverless since 2025; FirstLight LiDAR detects pedestrians >300m at night
Sources: AI Alignment blog, Verifiable AI blog, Forecasting Part 1, Construction Zones
3. Hardware & Sensor Suite
FirstLight LiDAR (FMCW)
| Spec | Value |
|---|---|
| Technology | Frequency-Modulated Continuous Wave (FMCW) — coherent detection |
| Wavelength | 1550 nm (eye-safe at higher power than 905nm ToF) |
| Range | Current: 450m+; Gen 2 (2026): 1,000m |
| Velocity | Instantaneous per-point Doppler (no frame-to-frame estimation needed) |
| Sensitivity | Single-photon sensitive |
| Interference | Immune — only detects its own signal timing/frequency/wavelength |
| Form factor | Evolving to silicon photonics chip (10x size reduction from OURS Technology acquisition) |
| Manufacturing | 78,000 sq ft facility in Bozeman, MT |
Acquisitions: Blackmore Sensors (2019, FMCW architecture) + OURS Technology (2021, silicon photonics) → combined into FirstLight.
Sources: FMCW Lidar: Game-Changer, FirstLight on a Chip
Other Sensors
| Component | Details |
|---|---|
| Cameras | 7 cameras (forward stereo pair + 5 wide-angle), 1920x1200 px each, custom lenses, HDR capable |
| Imaging Radar | Custom with Continental; true 3D imaging; penetrates rain/fog/snow |
| Total sensors | 25+ providing overlapping 360° coverage |
| Sensor cleaning | Concentrated air + washer fluid nozzles, millisecond response |
Aurora Driver Computer
| Spec | Gen 1 (Current) | Gen 2 (Mid-2026) | Gen 3 (2027) |
|---|---|---|---|
| Compute | 5,400 TOPS | Enhanced | Dual NVIDIA DRIVE Thor (Blackwell) ~2,000+ TFLOPS |
| Manufacturer | Aurora (internal) | Fabrinet | AUMOVIO (formerly Continental) |
| Cost | Baseline | 50% reduction | Further 50% reduction |
| LiDAR range | 450m | 1,000m | 1,000m+ |
| Durability | ISO 16750-3 | 1M-mile life | Tens of thousands of trucks |
| Safety standard | — | — | ISO 26262 ASIL-D |
Redundancy: Dual-computer architecture (primary + specialized fallback), redundancy across 8 domains: steering, braking, communication, computation, power, energy storage, vehicle motion, cooling.
Networking: Custom TSN (Time-Sensitive Networking) switch synchronizing all sensors to microsecond precision.
Sources: Aurora Driver, Meet Fusion, NVIDIA Partnership
4. Mapping & Localization
Aurora Atlas — proprietary HD map with two layers:
- World Geometry: Sparse 3D point cloud from fleet sensor data (lightweight for OTA)
- Semantic Annotations: Lanes, signs, signals — some auto-generated by ML, rest manually annotated
Key Design Choices:
- Locally-consistent, sharded coordinate frames (not globally-consistent) — each shard ~ city-block-sized
- Cloud Atlas: Git-like versioned map database with concurrent editing, independent layer updates
- Map building requires only a single manual drive + cloud auto-annotation
- Updates pushed OTA in hours, not days
Localization: 6-DOF pose estimation by matching stored sparse geometry against live LiDAR scans. GPS-independent. Accuracy "much more precise than GPS."
Map Override: When construction creates mismatches, the system detects temporary lane lines in real-time and deviates from the Atlas.
Sources: The Atlas, Aurora Multi-Sensor Dataset
5. Simulation & Virtual Testing
Offline Executor — custom-built (not a game engine):
- Deterministic lock-step execution with autonomy software
- Same framework and libraries as production vehicle code
- Simulates latencies and compute delays
- Module-level testing: feeds synthesized inputs indistinguishable from real sensor data
Scale:
- 5-12 million tasks/day via the Batch API
- 22+ million equivalent miles/day in simulation
- Hundreds of thousands of concurrent vCPUs + thousands of GPUs on AWS
Batch API ("Aurora's Supercomputer") — written in Go:
| Component | Role |
|---|---|
| Gateway API | Stateless, HA, Protobuf-defined |
| Runner | Sharded stateful service managing execution across clusters |
| TaskHost | Lightweight binary deployed at runtime; zero-downtime updates |
DAGs up to 1 million tasks. ~8x cheaper per compute unit than Spark. Target: scale from 10M → 1 billion daily tasks.
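Illustrative only: Aurora's Batch API is a sharded Go service, but a toy dependency-ordered executor makes the task-DAG idea concrete (the task names and in-process execution are invented for the sketch):

```python
from collections import defaultdict, deque

def run_dag(tasks, deps):
    """tasks: {name: callable}; deps: {name: [upstream names]}."""
    indegree = {t: len(deps.get(t, [])) for t in tasks}
    children = defaultdict(list)
    for t, ups in deps.items():
        for u in ups:
            children[u].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    while ready:
        t = ready.popleft()
        tasks[t]()                       # in production this would fan out to TaskHosts
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

run_dag({"extract": lambda: print("extract"),
         "simulate": lambda: print("simulate"),
         "report": lambda: print("report")},
        {"simulate": ["extract"], "report": ["simulate"]})
```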
Scenario Generation:
- Curated: Hand-crafted with nuanced validation criteria
- Procedural: Automated recipes generating 100,000+ scenarios in hours (team includes former Pixar VFX artists)
- Completed 2.27 million unprotected left turns in simulation before attempting one in reality
Sensor Simulation: Acquired Colrspace (Pixar veterans, 2022) — Protocolr uses neural networks for inverse rendering + differentiable image rendering. Simulates camera, conventional lidar, and FMCW lidar. Evaluated ~20,000 hardware design scenarios before manufacturing prototypes.
Four-Tier Testing Pyramid:
- Codebase Tests — Unit + integration tests; must pass before merge
- Perception Tests — Labeled real-world log data with ground-truth; broad metrics + specific capability tests
- Manual Driving Evaluations — Expert trajectories compared against Aurora Driver planned movements
- Simulations — Curated + procedural scenarios; disengagement events converted to virtual tests within days
Safety Validation Scale:
- ~10,000 requirements
- 4.5 million tests before each software release
- ~20,000 FMCW hardware design simulations before prototype manufacturing
Sources: The Offline Executor, Scaling Simulation, Batch API Part 1, Batch API Part 2, Batch API Part 3, Colrspace Acquisition, Virtual Testing
6. Software Architecture & Engineering
Custom Middleware (not ROS) — deterministic framework enabling lock-step sim execution. No ROS references anywhere in Aurora's public materials.
Languages:
| Language | Use |
|---|---|
| C++ | Core autonomy (perception, planning, control); C++ standards committee participation funded |
| Go | Backend services, Batch API, fleet management |
| Python | ML pipelines, data analysis, Kubeflow orchestration |
| JavaScript | Visualization (XVIZ, streetscape.gl) |
Build & CI/CD:
| Tool | Role |
|---|---|
| Bazel | Primary build system (monorepo) |
| Buildkite | CI/CD orchestration |
| GitHub | Source control, PR-triggered integration tests |
| Docker | Containerization |
| Terraform | Infrastructure-as-code on EKS |
ML Infrastructure:
| Tool | Role |
|---|---|
| PyTorch (primary), TensorFlow, JAX | ML training |
| CUDA / cuDNN / TensorRT | GPU compute, inference optimization |
| Kubeflow Pipelines | ML workflow orchestration on EKS |
| Amazon SageMaker | Distributed training |
| Amazon S3 / EMR | Data storage (petabytes) and processing |
Data Engine — cyclical ML lifecycle:
- Identify data requirements for a capability
- Mine and label on-road sensor data iteratively
- Train ML models on prepared datasets
- Evaluate at subsystem and system level
- Feed results back for next iteration
Continuous deployment of datasets and models within two weeks of new data availability.
Observability: Grafana Cloud across 30+ Kubernetes clusters (migrated from Chronosphere + Honeycomb).
Databases & Services:
| Technology | Purpose |
|---|---|
| PostgreSQL | Primary relational DB |
| Redis | Caching layer |
| gRPC | Service-to-service communication |
| Nginx | Web server / reverse proxy |
| Protobuf | API serialization |
5 Engineering Tenets:
- #LowFriction — Remove barriers to experimentation; faster iteration wins
- #AdviceNotGatekeeping — Collaborative code review; reviewers advise rather than block on changes outside their expertise
- #CarryOurWater — All teams share tool-building responsibility; no separate research team
- #EndToEnd — Accept imperfect tools; build the system end-to-end before transforming the underlying frameworks
- #FirstClassLanguages — Company-wide support for multiple languages with proper tooling
Sources: Engineering Tenets, Data Engine, Grafana ObservabilityCON
7. Cloud Infrastructure (AWS — Exclusive Partner)
| AWS Service | Purpose |
|---|---|
| Amazon EKS | Kubernetes orchestration for sim + ML workloads |
| Amazon EC2 (P4d GPUs) | Driving simulations (100k+ concurrent vCPUs, 1000s GPUs) |
| Amazon SageMaker | Distributed ML model training |
| Amazon EMR | Petabyte-scale data processing |
| Amazon S3 | Sensor data storage (petabyte scale) |
| AWS Global Accelerator | Website network CDN |
| AWS Route 53 | DNS |
| AWS Direct Connect | High-bandwidth vehicle-to-cloud data transfer |
| Amazon DynamoDB | Metadata and operational data |
| AWS Athena + Glue | Analytics on S3 Parquet data |
Custom Batch Autoscaler replaced Kubernetes Cluster Autoscaler for heterogeneous workloads. Tracks EC2 spot/on-demand pricing, enforces $/hr spend-rate caps, uses direct EC2 Fleet API for faster scaling.
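A hedged sketch of a spend-rate-capped scaling decision like the one described above; the prices, cap, and tasks-per-instance figure are invented, and the real autoscaler calls the EC2 Fleet API rather than returning a number:

```python
def instances_to_add(pending_tasks, tasks_per_instance, current_spend_per_hr,
                     price_per_instance_hr, spend_cap_per_hr):
    demand = -(-pending_tasks // tasks_per_instance)           # ceil division
    budget_headroom = max(0.0, spend_cap_per_hr - current_spend_per_hr)
    affordable = int(budget_headroom // price_per_instance_hr)
    return min(demand, affordable)                             # cheaper of demand and budget

# 120k queued tasks, 400 tasks per instance, $4.10/hr spot price,
# $2,000/hr already committed against a $2,500/hr cap.
print(instances_to_add(120_000, 400, 2_000.0, 4.10, 2_500.0))
```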
Placement API distributes team-level resource quotas across multiple clusters and AWS regions.
Source: Aurora Accelerates Development with AWS
8. Safety & Security
Safety Case Framework
First-ever for autonomous vehicles, using Goal Structuring Notation (GSN):
5 Principles:
- Proficient (G1) — Acceptably safe during nominal operation
- Fail-Safe (G2) — Safety maintained in presence of faults/failures
- Continuously Improving (G3) — All safety issues evaluated and resolved
- Resilient (G4) — Safe during foreseeable misuse and unavoidable events
- Trustworthy (G5) — Enterprise is trustworthy
- ~10,000 requirements, 4.5 million tests
- Publicly viewable at safetycaseframework.aurora.tech/gsn
- Independently audited by TUV SUD
Cybersecurity
- Zero Trust Architecture — continuous verification
- Cryptographic attestation via DICE protocols (libnat20 open-sourced)
- Custom intrusion detection integrated into FMS
- No remote vehicle control by design — remote assistants can only suggest
Sources: Safety Case 101, Cybersecurity blog
9. Fleet Operations & Cloud
| System | Function |
|---|---|
| Aurora Beacon | Cloud-based fleet mission control (dispatch, monitoring, remote support, OTA updates) |
| Aurora Cloud | Customer-facing API for autonomous vehicle integration |
| McLeod TMS Integration | Plugs into existing carrier fleet management |
OEM Partners: PACCAR (Peterbilt/Kenworth), Volvo (VNL Autonomous), International (LT Series), Toyota (Sienna ride-hail).
Operational Status (March 2026):
- 250,000+ driverless miles, zero Aurora Driver-attributed collisions
- 10 routes across Texas, New Mexico, Arizona
- 100% on-time performance
- Customers: Uber Freight, FedEx, Werner Enterprises, Hirschbach
- Target: 200+ trucks, $80M run rate by end of 2026
Sources: Aurora Triples Network, 250K Miles
10. Patent Portfolio
1,400+ patent assets (~50% hardware, ~50% software). Key areas:
| Area | Examples |
|---|---|
| FMCW LiDAR | 19 families covering beam steering, chirp ranging, transceiver design |
| Perception | Sensor dropout training, point cloud generation, deep learning for image analysis |
| Planning | Basis path generation (US11561548B2), joint trajectory distribution modeling |
| Mapping | Fleet learning + map encoding (US20200400821A1), training data from sensors |
| Security | DICE attestation (libnat20) |
Sources: GreyB Patent Analysis, Justia Patents
11. Open Source
| Repo | Lang | Stars | Description |
|---|---|---|---|
| xviz | JS | ~1,068 | Protocol for real-time autonomy data transfer/visualization |
| streetscape.gl | JS | ~977 | React + WebGL visualization for robotics data |
| au | C++ | ~413 | Zero-dependency C++14 physical units library |
| libnat20 | C++ | 5 | DICE attestation (hardware root-of-trust) |
| Kubeflow forks (3) | Python/Jsonnet | — | Customized ML orchestration |
12. Key Acquisitions
| Company | Year | Technology Gained |
|---|---|---|
| Blackmore Sensors | 2019 | FMCW LiDAR architecture, digital signal processing |
| Uber ATG | 2020 | ~1,200 engineers, perception/prediction research, road test data, patents |
| OURS Technology | 2021 | Silicon photonics, lidar-on-chip (10x size reduction) |
| Colrspace | 2022 | Sensor simulation, inverse rendering (Pixar veterans) |
Founding Team:
- Chris Urmson — ex-Google/Waymo CTO
- Sterling Anderson — ex-Tesla Autopilot head
- Drew Bagnell — ex-Uber ATG autonomy/perception head; 43,000+ citations; MaxEnt IRL, DAgger, LEARCH inventor
13. Comparison: Aurora vs Waymo
| Dimension | Aurora | Waymo |
|---|---|---|
| Primary Market | Long-haul trucking (L4) | Robotaxi / urban (L4) |
| LiDAR Type | FMCW (FirstLight, 1550nm) | ToF (6th-gen in-house) |
| LiDAR Range | >450m (next-gen: 1,000m) | Not disclosed at same detail |
| Instantaneous Velocity | Yes (Doppler from FMCW) | No (estimated from frames) |
| Architecture | Modular with interleaved forecasting | Modular (separate perception/planning) |
| Planning | Proposer-Ranker with MaxEnt IRL | End-to-end foundation model (2024+) |
| Simulation Engine | Custom Offline Executor (deterministic lock-step) | SimulationCity + generative World Model |
| Sim Scale | 5-12M tasks/day; 9B+ equivalent miles | 15B+ virtual miles; generative scenarios |
| Cloud | AWS-native (EKS, EC2, SageMaker, S3) | Google Cloud / TPU-native |
| Compute | 5,400 TOPS; moving to NVIDIA DRIVE Thor | Custom TPU-based |
| Key Advantage | Highway-speed long-range perception (400m+) | Largest operational fleet, urban expertise |
| Commercial Status | Launched driverless trucking May 2025 (Texas) | Operating robotaxis in multiple cities |
14. Key Engineering Blog Posts
| Post | Topic | URL |
|---|---|---|
| Software Engineering Tenets | 5 engineering principles | Link |
| The Offline Executor | Deterministic simulation architecture | Link |
| Virtual Testing | Testing pyramid | Link |
| Scaling Simulation | Procedural scenario generation | Link |
| Data Engine | ML infrastructure with Kubeflow/Bazel | Link |
| Batch API Part 1 | Supercomputer infrastructure | Link |
| Batch API Part 2 | Resource management | Link |
| Batch API Part 3 | Supercomputing at scale | Link |
| Seeing with Superhuman Clarity | Perception architecture | Link |
| Perception: No Measurement Left Behind | Multi-layer perception | Link |
| Continuous Real-Time Sensor Recalibration | Online calibration | Link |
| The Atlas | HD mapping system | Link |
| FMCW Lidar: Game-Changer | FirstLight technology | Link |
| FirstLight on a Chip | Lidar miniaturization | Link |
| AI Alignment | Safe and human-like driving | Link |
| Verifiable AI | Invariants + learned driving | Link |
| Forecasting Part 1 | Interleaved forecasting | Link |
| AI Transparency | GNNs, transformers, debugging | Link |
| Fault Management | FMS architecture | Link |
| Cybersecurity | Zero Trust, DICE attestation | Link |
| Safety Case 101 | GSN-based safety methodology | Link |
| Introducing Au | C++ units library | Link |
| Product Design from First Principles | Hardware-software co-design | Link |
| Meet Fusion | Next-gen hardware | Link |
| Construction Zones | Construction zone handling | Link |
| Emergency Vehicles | Emergency vehicle response | Link |
| Stormy Weather | Adverse weather driving | Link |
15. Corporate SaaS Tools (from DNS TXT Records)
| Tool | Category |
|---|---|
| Atlassian (Jira/Confluence) | Project management / documentation |
| Slack | Communication |
| 1Password | Secrets management |
| Monday.com | Project management |
| JetBrains | IDEs (likely CLion for C++) |
| Autodesk | CAD/engineering design |
| DocuSign | Contract signing |
| Zoom | Video conferencing |
| Cisco | Networking |
| Google Workspace | Primary email/productivity |
| Microsoft 365 | Secondary productivity suite |
| Mixpanel | Product analytics |
| OneTrust | Privacy/cookie compliance |
16. Website Tech Stack
| Layer | Technology |
|---|---|
| Framework | Nuxt 3 (Vue.js 3.5.13) |
| Build Tool | Vite (Rollup) |
| State | Pinia |
| CMS | Contentful (headless, GraphQL API) |
| Search | Algolia |
| 3D/WebGL | Three.js + custom GLSL shaders |
| Animation | GSAP v3.12.7 (ScrollTrigger, ScrollSmoother) |
| Video | Mux |
| Hosting | Netlify |
| CDN | AWS Global Accelerator |
| DNS | AWS Route 53 |
| Analytics | Google Tag Manager, HubSpot, Sentry, Mixpanel |
| Fonts | Google Fonts (Inter + IBM Plex Mono) |
| Built by | Dogstudio agency |
17. Coordinate Frame Architecture — Point Cloud Processing
Primary Processing Frame: Ego-Centric BEV
Aurora does not bind point clouds to the map frame. All perception processing happens in the ego-centric vehicle frame. This is confirmed in their Multi-View Fusion paper (WACV 2022): "each LiDAR sweep S_t at time t comprises LiDAR points represented by their (x,y,z) locations. Then, sweep S_t is voxelized in a BEV image centered on the SDV."
Aurora's "Euclidean View" is their internal name for ego-centric BEV. It is where LiDAR, RADAR, and HD Map data converge after transformation.
How Each Data Source Enters the Ego Frame
| Data Source | Native Frame | Transformation |
|---|---|---|
| LiDAR | 3D sensor frame (spherical) | Voxelized directly into ego-centric BEV grid (deltaL=0.16m, deltaW=0.16m, deltaV=0.2m); also projected to sensor-native Range View |
| Cameras | 2D image frame | Features extracted in image space, projected into 3D ego-centric BEV via camera intrinsics/extrinsics (K, R, t) |
| RADAR | 3D radar frame | Transformed into ego-centric 3D space via calibrated extrinsics |
| HD Map (Atlas) | Local map anchor frame(s) | Transformed using 6-DOF localization pose → rasterized as 7 binary mask channels into the same ego-centric BEV grid |
The map data comes to the ego frame, not the other way around. The Atlas is rasterized into the BEV grid as channel layers alongside sensor data.
Source: Multi-View Fusion (arXiv:2008.11901)
Why Not Map Frame?
Aurora's Atlas uses locally-consistent, sharded coordinate frames (each shard ~ city-block-sized, with its own anchor point) — there isn't even a single unified global map frame to project into. Aurora explicitly states: "globally accurate maps simply aren't necessary for self-driving since our localization system doesn't require global consistency."
Processing in ego frame is preferred because:
- Motion planning is native to Cartesian/BEV space — outputs feed directly to the planner
- Avoids map error propagation — localization inaccuracies affect map overlay, not fundamental LiDAR/camera processing
- Metric consistency — object sizes are constant regardless of range (unlike perspective view)
- Generalization — learned features don't depend on global position/orientation
Source: The Atlas
Temporal Multi-Sweep Fusion (LaserFlow)
When fusing multiple LiDAR sweeps over time, Aurora does not transform raw point clouds into a common frame. Instead, LaserFlow performs fusion in feature space in the most recent sweep's coordinate frame:
- Each sweep's features are extracted in its native Range View frame (sensor-centric spherical coordinates) using shared CNN weights
- Ego-motion encoding: the vehicle pose delta between sweeps is computed as Delta_s = P_0 * P_s^(-1) and rotated to local azimuth alignment
- Feature warping: 3D points are transformed to the current sweep's frame (x' = P_0 * P_s^(-1) * x), then reprojected to Range View coordinates
- A Feature Transformer Network receives ego-motion features concatenated with per-sweep features and learns to "undo the effect of ego-motion on the extracted features" — isolating object motion from vehicle motion
The key insight: naively transforming point clouds between Range View frames destroys information because spherical projections cause perspective changes and point occlusion. Feature-space fusion avoids this.
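A minimal sketch of the warping step above, using the quoted pose-delta formula; the poses and points are dummy values, and the real system warps learned features rather than raw coordinates alone:

```python
import numpy as np

def warp_to_current_range_view(points_s, P_0, P_s):
    """points_s: (N, 3) points in sweep s's frame; P_0, P_s: 4x4 ego poses."""
    delta = P_0 @ np.linalg.inv(P_s)                  # pose delta between sweeps
    homog = np.hstack([points_s, np.ones((len(points_s), 1))])
    x_cur = (homog @ delta.T)[:, :3]                  # x' = P_0 * P_s^-1 * x
    r = np.linalg.norm(x_cur, axis=1)
    azimuth = np.arctan2(x_cur[:, 1], x_cur[:, 0])    # spherical Range View coordinates
    elevation = np.arcsin(x_cur[:, 2] / np.maximum(r, 1e-6))
    return np.stack([azimuth, elevation, r], axis=1)

pts = np.array([[20.0, 1.0, 0.5], [55.0, -3.0, 1.2]])
P_0 = np.eye(4)
P_s = np.eye(4); P_s[0, 3] = -2.0                     # dummy 2 m translation between poses
print(warp_to_current_range_view(pts, P_0, P_s))
```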
Source: LaserFlow (arXiv:2003.05982) — ICRA 2022 Best Paper
Three-Stage Pipeline by Frame
Stage 1: Sensor-to-Tensor
├── LiDAR + RADAR + HD Map ──→ Ego-centric BEV ("Euclidean View")
│ └── Map rasterized into BEV via 6-DOF localization pose
├── Cameras ──→ Image View (camera-native 2D)
└── LiDAR ──→ Range View (sensor-native spherical)
Stage 2: Feature Extraction (parallel CNNs)
├── Left path: BEV features (ego frame)
└── Right path: Range View + Image View features (sensor-native frames)
Stage 3: Final Fusion
└── Right path features projected INTO ego-centric BEV ──→ concatenated with left path ──→ final conv layers
└── Output: detections in ego-centric BEV frame
Everything converges to ego-centric BEV at the end. The Range View and Image View are only used for feature extraction in their native frames — the final representation is always ego-centric.
S2A Tracker and Remainder Explainer Frames
- S2A (Sensor-to-Adjustment): Operates on local ego-centric crops centered on each tracked object, extracted from the ego-centric sensor representation. Runs thousands of iterations/second per object.
- Remainder Explainer: Operates on raw sensor returns in the same ego-centric frame as Mainline Perception — analyzes unexplained measurements and provides avoidance scores to motion planning.
Uber ATG Heritage — All Ego-Centric
| Paper | Frame |
|---|---|
| PIXOR | Ego-centric BEV (70m forward, 40m lateral) |
| PointPillars | Ego-vehicle coordinate frame (pillar-based BEV pseudo-image) |
| ContFuse | Projects LiDAR to ego-centric BEV, retrieves camera features via KNN + unprojection |
| LaserNet | Sensor-native Range View, outputs in ego-centric BEV |
No Aurora paper, blog post, or patent describes transforming point clouds into a global/map frame before perception processing.
SpotNet — Camera-Centric for Long Range
SpotNet (arXiv:2405.15843) operates primarily in the camera frame for long-range detection:
- 3D positions expressed relative to camera position/orientation
- LiDAR points projected into image space using z-buffering
- Object heading parameterized as bearing-relative: theta_k = phi_k - alpha_k
- 3D distance parameterized along camera ray (range invariant): delta_d = r * v
- Input: 5-channel RGB-D tensor (3 RGB + sparse LiDAR depth + binary sentinel)
- Computational cost scales O(1) with range (vs O(r^2) for BEV methods)
- Final NMS uses both 2D image-space NMS (IoU 0.5) and BEV NMS (IoU 0.2)
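An illustrative decode of the camera-centric parameterization listed above: position is recovered along the back-projected camera ray, and the global heading from phi = theta + alpha. The intrinsics and input values are made up; SpotNet's actual heads regress these quantities jointly with learned uncertainty:

```python
import numpy as np

def decode_spotnet_like(u, v, distance_m, theta_rel, K):
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])        # back-project pixel to a ray
    ray /= np.linalg.norm(ray)
    position_cam = distance_m * ray                        # point along the camera ray
    alpha = np.arctan2(position_cam[0], position_cam[2])   # bearing of the object
    phi = theta_rel + alpha                                # absolute heading (camera frame)
    return position_cam, phi

K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 600.0],
              [0.0, 0.0, 1.0]])
pos, heading = decode_spotnet_like(u=1100.0, v=620.0, distance_m=350.0,
                                   theta_rel=0.05, K=K)
print(pos.round(1), round(float(heading), 3))
```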
Source: SpotNet (arXiv:2405.15843)
18. Perception Stack Deep Dive — Segmentation, Object Representation, and Classification
Object Representation Format
Aurora's primary output is oriented 3D bounding boxes (cuboids), not polygon meshes or per-point segmentation masks. However, patent US10579063B2 explicitly mentions "bounding polygon or other shape" as an object footprint representation, suggesting polygonal representations may exist internally for certain use cases.
Detection Head Output Formats by Network
LaserNet (Range View, per-point):
- Class probabilities for C classes
- K=3 mixture modes per vehicle (K=1 for pedestrians/bikes) — each mode outputs: center offset (dx, dy), orientation (omega_x, omega_y), length, width
- Laplace distribution scales (uncertainty per mode)
- Mixture weights
- Fused via mean-shift clustering with inverse-variance weighting
- Height is fixed — all objects assumed on same ground plane
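A simplified sketch of the per-point box fusion above: mean-shift groups the per-point center predictions, and each cluster is combined with inverse-variance weights from the predicted Laplace scales. The bandwidth, iteration count, and 2D-centers-only treatment are simplifications of the paper's procedure:

```python
import numpy as np

def mean_shift_fuse(centers, scales, bandwidth=1.0, iters=10):
    """centers: (N, 2) predicted box centers; scales: (N,) predicted Laplace scales."""
    modes = centers.copy()
    for _ in range(iters):                        # mean-shift: move each point toward the
        for i in range(len(modes)):               # weighted mean of its neighbourhood
            d = np.linalg.norm(centers - modes[i], axis=1)
            w = np.exp(-(d / bandwidth) ** 2)
            modes[i] = (w[:, None] * centers).sum(0) / w.sum()
    # Group points whose modes converged to roughly the same location.
    fused, used = [], np.zeros(len(modes), dtype=bool)
    for i in range(len(modes)):
        if used[i]:
            continue
        members = np.linalg.norm(modes - modes[i], axis=1) < bandwidth / 2
        used |= members
        inv_var = 1.0 / (scales[members] ** 2)    # inverse-variance weighting
        fused.append((inv_var[:, None] * centers[members]).sum(0) / inv_var.sum())
    return np.array(fused)

centers = np.array([[10.0, 2.0], [10.2, 1.9], [30.0, -4.0]])
scales = np.array([0.2, 0.4, 0.3])
print(mean_shift_fuse(centers, scales))           # two fused detections
```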
Source: LaserNet (CVPR 2019)
LaserFlow (Range View, per-point, temporal):
- Class, length, width
- 6 future waypoints over 3 seconds (0.5s intervals) as sequential displacement vectors
- Per-timestep heading as doubled-angle encoding: (cos 2omega, sin 2omega)
- Per-timestep Laplace uncertainty decomposed into along-track and cross-track
Source: LaserFlow (arXiv:2003.05982)
SpotNet (Image-centric, long-range):
- Three heads at H/2 x W/2: class, 2D box, 3D box
- 3D box: projected centroid offsets, distance along camera ray, extents (w, l, h), bearing-relative heading
- Detected classes: vehicles, VRU (pedestrians/cyclists), construction equipment, signs, traffic cones, traffic lights
Source: SpotNet (arXiv:2405.15843)
S2A Tracker (production system):
- Operates on local ego-centric sensor crops around each tracked object
- Outputs: refined position, velocity, orientation + estimated future positions
- Runs directly on raw lidar/radar/camera data, not just upstream detections
Source: Seeing with Superhuman Clarity
Per-Point Semantic Segmentation (LaserNet++)
Aurora does perform per-point LiDAR segmentation — confirmed in LaserNet++ (CVPRW 2019). Six classes:
| Class | IoU |
|---|---|
| Background | — |
| Road | 98.23% |
| Vehicle | — |
| Pedestrian | — |
| Bicycle | — |
| Motorcycle | — |
Runs jointly with detection on a shared backbone — single 1x1 conv output layer. The road segmentation effectively provides drivable surface estimation.
Source: LaserNet++ (arXiv:1904.11466)
Per-Point Instance Segmentation (OSIS — Remainder Explainer Predecessor)
OSIS (arXiv:1910.11296, Uber ATG) performs open-set instance segmentation:
- Category-agnostic embedding network learns per-point embeddings in BEV
- DBSCAN clustering groups points into instances — both known AND unknown categories
- BEV resolution: 0.15625m over 160x160x5m region
- Known: vehicle, pedestrian, motorbike
- Unknown: anything else — groups novel physical objects into separate instances without classification
This is almost certainly the research foundation for the Remainder Explainer.
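A minimal sketch of the embed-then-cluster idea: category-agnostic per-point embeddings are grouped into instances with DBSCAN, so novel objects get their own instance IDs without ever being classified. The embeddings and DBSCAN parameters below are toy values (scikit-learn is assumed for the example):

```python
import numpy as np
from sklearn.cluster import DBSCAN

embeddings = np.array([
    [0.10, 0.20], [0.12, 0.21], [0.11, 0.19],   # points from one (possibly unknown) object
    [5.00, 5.10], [5.05, 5.00],                  # points from another object
    [9.90, 0.10],                                # isolated return, labelled as noise (-1)
])
instance_ids = DBSCAN(eps=0.3, min_samples=2).fit_predict(embeddings)
print(instance_ids)                              # e.g. [0 0 0 1 1 -1]
```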
Source: OSIS (arXiv:1910.11296)
Static vs Dynamic Object Separation
What Comes from the HD Map (Static — No Real-Time Detection Needed)
Aurora's Atlas provides 7 rasterized binary mask channels in BEV:
- Driving paths
- Crosswalks
- Lane boundaries
- Road boundaries
- Intersections
- Driveways
- Parking lots
Plus: 3D geometry of static scenery (buildings, vegetation, infrastructure), traffic signal positions, stop signs, speed limits.
Source: Multi-View Fusion (arXiv:2008.11901)
What Comes from Real-Time Perception (Dynamic + Novel Static)
| Category | Representation | Source Module |
|---|---|---|
| Vehicles, pedestrians, cyclists, motorcycles | Oriented 3D bounding boxes with velocity/trajectory | Mainline Perception (LaserNet/LaserFlow/SpotNet -> S2A tracker) |
| Unknown physical objects | Generic cuboids with avoidance score (gray in Lightbox viz) | Remainder Explainer |
| Construction cones/barrels | Individual detections -> aggregated into "blockage regions" (yellow wall in Lightbox) | Construction perception module |
| Temporary lane markings | Real-time camera-based lane detection | Overrides Atlas when detected |
| Road surface (drivable area) | Per-point semantic label ("Road" class) | LaserNet++ segmentation head |
FMCW LiDAR Static/Dynamic Separation (Patent US11550061B2)
Aurora's FMCW lidar gives instantaneous per-point Doppler velocity, enabling hardware-level static/dynamic classification:
- Points with zero radial velocity -> immobile (trees, signs, guardrails) -> can be excluded from trajectory planning
- Points with non-zero velocity -> mobile (vehicles, pedestrians, animals) -> full tracking pipeline
- Patent explicitly states: "can exclude object(s) having certain classification(s) indicative of non-moving objects, such as a tree classification"
This is a massive architectural advantage — ToF lidars can't do this without multi-frame tracking.
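A hedged sketch of that per-point split: after removing the radial velocity the AV's own motion would induce on a stationary point, anything with a significant residual Doppler is flagged as mobile. The threshold and ego velocity are illustrative values, not figures from the patent:

```python
import numpy as np

def split_static_dynamic(points, radial_vel, ego_vel, thresh=0.5):
    """points: (N, 3) in ego frame; radial_vel: (N,) measured Doppler velocities in m/s
    (negative = closing); ego_vel: (3,) ego velocity in the same frame."""
    directions = points / np.linalg.norm(points, axis=1, keepdims=True)
    expected_static = -directions @ ego_vel      # what a stationary point would measure
    residual = radial_vel - expected_static
    dynamic = np.abs(residual) > thresh
    return ~dynamic, dynamic

pts = np.array([[30.0, 0.0, 0.0], [50.0, 5.0, 0.0]])
measured = np.array([-25.0, -10.0])              # m/s along each beam
static, dynamic = split_static_dynamic(pts, measured, ego_vel=np.array([25.0, 0.0, 0.0]))
print(static, dynamic)                           # first point static, second dynamic
```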
Source: US11550061B2
Construction Zone Pipeline
Sensor data
|
+-- Detect individual construction elements (cones, barrels, barriers, equipment)
| +-- SpotNet classifies: "traffic cones", "construction equipment" as distinct classes
|
+-- Aggregate into BLOCKAGE REGIONS
| +-- Individual detections merged into continuous barrier representation
| +-- Treated as equivalent to a SOLID WALL by the motion planner
| +-- Rendered as yellow wall in Lightbox visualization
|
+-- Detect temporary lane markings (camera-based)
| +-- When detected, OVERRIDES Atlas HD map lane geometry
| +-- Vehicle follows perceived lanes instead of mapped lanes
|
+-- Extended nudging capability
    +-- Can drive partially on shoulder to avoid construction encroachment
3,500+ miles of Texas construction zones driven through.
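Illustrative only: one plausible way to aggregate individual cone detections into a blockage region the planner can treat as a wall; the gap threshold and sort-and-chain logic are assumptions, not Aurora's published method:

```python
import numpy as np

def build_blockage_regions(cone_xy, max_gap_m=8.0):
    """cone_xy: (N, 2) detected cone positions. Returns a list of polylines (walls)."""
    order = np.argsort(cone_xy[:, 0])               # chain cones along the road direction
    chained = cone_xy[order]
    regions, current = [], [chained[0]]
    for p in chained[1:]:
        if np.linalg.norm(p - current[-1]) <= max_gap_m:
            current.append(p)                       # close enough: same barrier
        else:
            regions.append(np.array(current))       # large gap: start a new region
            current = [p]
    regions.append(np.array(current))
    return regions

cones = np.array([[0, 3.5], [6, 3.4], [12, 3.6], [60, 3.5], [66, 3.5]], dtype=float)
for wall in build_blockage_regions(cones):
    print(wall.tolist())                            # two separate blockage regions
```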
Sources: Tackling Construction, Eyes on the Roadmap
Ground Plane Estimation
MMF (Multi-Task Multi-Sensor Fusion, CVPR 2019):
- Ground height estimation as an auxiliary training task
- Estimated ground height subtracted from regression targets to improve 3D localization
- Used during training only — not required at inference
Patent US12051001B2 (Multi-Task Multi-Sensor Fusion):
- Includes a dedicated machine-learned mapping model that estimates ground geometry (road surface elevation) as a multi-task output alongside 3D/2D detection and depth completion
LaserNet limitation: Assumes flat ground plane. LaserNet++ reports misclassifying road points as background on steep roads — suggesting Aurora has since improved ground estimation for production.
Patent US10967862B2 (Road Anomaly Detection):
- Detects potholes, rail tracks, road cracks, depressions using IMU data signatures (accelerometer, gyroscope, magnetometer)
- Matches against a library of known road anomaly patterns
- Updates localization maps with labeled anomaly locations
Sources: MMF (CVPR 2019), US12051001B2, US10967862B2
Occupancy Representations
| Method | Type | Details |
|---|---|---|
| PIXOR BEV input | 3D voxel grid | 800x700x38, binary occupancy + intensity, 0.1m resolution |
| OSIS | BEV rasterization | 0.15625m resolution, 160x160x5m, 3D occupancy voxels |
| MP3 | Semantic occupancy flow | Per-BEV-cell motion vectors + probabilities — represents dynamic objects as flowing occupancy, not instance bounding boxes |
| Patent US11531346B2 | Lane-based occupancy cells | 1D spatial cells per lane, probability of object occupying each cell over prediction horizon, uses Graph CNNs |
| Patent US11521396B1 | Probabilistic occupancy distributions | Scene rasterization with probabilistic prediction |
| Multi-view fusion input | BEV voxels | LxWxV dimensions, T=10 historical sweeps stacked |
MP3's occupancy flow is particularly notable — it represents dynamic objects as per-cell probability distributions with motion vectors rather than tracked instances, which avoids the hard data association problem entirely.
Sources: PIXOR (CVPR 2018), MP3 (CVPR 2021), US11531346B2, US11521396B1
Full Object Class Taxonomy (Reconstructed)
Mainline Perception (confirmed classes):
- Vehicle (car, truck — likely sub-categorized)
- Motorcycle
- Pedestrian
- Cyclist / Bicycle
SpotNet extended classes:
- Construction equipment
- Signs
- Traffic cones
- Traffic lights
Semantic segmentation (LaserNet++):
- Background
- Road
- Vehicle
- Pedestrian
- Bicycle
- Motorcycle
Remainder Explainer (no classification — avoidance only):
- Generic unknown obstacle -> assigned avoidance score based on dimensions, material, motion
- Examples handled: charred trailer, dropped debris, detached tires, livestock
FMCW static classification (Patent US11550061B2):
- Trees
- Signs
- Other immobile objects (guardrails, barriers, buildings)
- Animals (classified as mobile)
The production taxonomy is almost certainly 10+ classes given SpotNet's granularity.
Lane Detection: Map vs Perception
From HD Map (default): The Atlas contains lane boundaries, lane segments (described relationally to predecessor/successor), and road markings. Cameras contribute "semantic details such as signage, lane markers, and critical actor attributes."
Real-time override for construction: When Aurora detects temporary lane markings that conflict with the Atlas, the perception system overrides the map. The vehicle deviates from Atlas-guided paths and follows perceived lane lines instead.
HDNET fallback (CoRL 2018): When HD maps are unavailable, a map prediction module estimates road layout on-the-fly from a single LiDAR sweep, predicting BEV road segmentation.
Sources: The Atlas, HDNET (CoRL 2018)
Spatio-Temporal Memory for Feature Detection (Patent US11217012B2)
Aurora's travel way feature identification system uses a non-parametric memory database:
- Receives 2D image data and 3D LiDAR data
- DNN produces per-pixel class probability estimates from camera images
- LiDAR points projected onto camera imagery assign classifications to 3D locations
- Memory database captures local space and time — the system remembers past observations, reinforces repeated detections, and forgets stale ones
- Includes occlusion reasoning via depth comparison across timesteps
- Used for lane markings, traffic cones, signs, and other travel way features
Source: US11217012B2
Key Perception Patents
| Patent | Title | Key Detail |
|---|---|---|
| US10310087B2 | Range-view LIDAR-based object detection | Per-cell class + instance center + bounding box in range view |
| US10579063B2 | ML for predicting locations of objects | "Bounding polygon or other shape" as object footprint; static object classifier |
| US11550061B2 | Phase coherent LIDAR classification | FMCW per-point velocity -> static/dynamic separation |
| US12051001B2 | Multi-task multi-sensor fusion | 3D bounding boxes + ground geometry estimation |
| US11531346B2 | Goal-directed occupancy prediction | Lane-based 1D occupancy cells with GCN |
| US11521396B1 | Probabilistic prediction of dynamic objects | Probabilistic occupancy distributions |
| US11217012B2 | Travel way feature identification | 2D/3D segmentation fusion with spatio-temporal memory |
| US10967862B2 | Road anomaly detection | IMU-based pothole/crack detection |
| US10489686B2 | Object detection for an autonomous vehicle | Pixel-level detection via background subtraction |
Key Published Papers on Object Representation
| Paper | Venue | Contribution |
|---|---|---|
| LaserNet | CVPR 2019 | Per-point multimodal mixture of oriented 3D boxes in range view |
| LaserNet++ | CVPRW 2019 | Per-point semantic segmentation (6 classes) + detection with camera fusion |
| LaserFlow | RA-L 2021 / ICRA 2022 | Joint detection + temporal displacement trajectories in range view |
| SpotNet | arXiv 2024 | Image-centric, LiDAR-anchored long-range 3D boxes with learned uncertainty |
| PIXOR | CVPR 2018 | BEV single-stage oriented bounding box detector |
| Fast and Furious | CVPR 2018 | Joint detection + tracking + forecasting with BEV temporal 3D convolutions |
| HDNET | CoRL 2018 | HD map road segmentation in BEV as detection prior + online map prediction |
| ContFuse | ECCV 2018 | Continuous convolution for LiDAR-camera fusion in BEV |
| MMF | CVPR 2019 | Ground estimation + depth completion as auxiliary tasks for 3D detection |
| PnPNet | CVPR 2020 | End-to-end perception + prediction with tracking in the loop |
| IntentNet | CoRL 2018 | Joint detection + intention + trajectory prediction from BEV LiDAR |
| MP3 | CVPR 2021 | Mapless driving with online BEV semantic mapping + occupancy flow |
| OSIS | arXiv 2019 | Open-set per-point instance segmentation for unknown objects |
| LiDARsim | CVPR 2020 | 3D mesh reconstruction of objects + environments for realistic simulation |
| Multi-View Fusion | WACV 2022 | BEV+RV+Camera fusion with 7-channel rasterized map input |
| RV-FuseNet | arXiv 2020 | Range-view temporal fusion for joint detection and trajectory estimation |
| MVFuseNet | CVPR 2021 Workshop | Multi-view fusion (RV + BEV) for joint detection and motion forecasting |
19. Non-ML Perception Technology Deep Dive
FMCW LiDAR Signal Processing (FirstLight)
Aurora's FMCW LiDAR originates from two acquisitions: Blackmore Sensors & Analytics (May 2019, Bozeman, MT) and OURS Technology (February 2021, Berkeley, CA).
Chirp Pattern and Waveform
Two principal chirp approaches from Aurora's patents:
Triangle chirp (sequential up/down): The laser frequency is linearly swept upward (up-chirp) then downward (down-chirp). The beat frequency during the up-chirp is f_beat_up = k*tau - f_Doppler and during the down-chirp is f_beat_down = k*tau + f_Doppler, where k is the chirp rate, tau is the round-trip delay, and f_Doppler is the Doppler shift. Solving simultaneously:
- Range: R = c(f_beat_up + f_beat_down) / (4k)
- Velocity: v = lambda(f_beat_down - f_beat_up) / 4
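A worked example of those relations, with an assumed chirp rate of 100 MHz/us chosen so both beat frequencies stay positive (the patent's example rate is 2.5 MHz/us):

```python
c = 3.0e8                 # m/s
wavelength = 1.55e-6      # m (1550 nm)
k = 1.0e14                # chirp rate in Hz/s (100 MHz/us, assumed for the example)

# A target at 300 m closing at 20 m/s:
tau = 2 * 300.0 / c                        # round-trip delay
f_doppler = 2 * 20.0 / wavelength          # Doppler shift
f_up = k * tau - f_doppler                 # up-chirp beat frequency
f_down = k * tau + f_doppler               # down-chirp beat frequency

R = c * (f_up + f_down) / (4 * k)          # -> 300.0 m
v = wavelength * (f_down - f_up) / 4       # -> 20.0 m/s
print(R, v)
```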
Complementary simultaneous chirp (OURS Technology, Patent US20210096253A1): Two lasers paired — one with increasing frequency, one with decreasing frequency — firing simultaneously. A 90-degree optical hybrid separates the resulting beat frequencies by sign in the I/Q domain. This eliminates sequential up/down sweeps, reducing acquisition time by ~50%. Range is extracted as (f_up_beat - f_down_beat)/2 and velocity as (f_up_beat + f_down_beat)/2.
Beat Frequency Extraction and IF Signal
Per Blackmore patent US7742152B2, coherent detection uses a self-homodyne scheme: both transmitted signal and local oscillator (LO) are modulated by the identical chirp. The backscattered delayed signal u(t-Delta) beats against the current LO chirp u(t) at the photodetector, producing an intermediate frequency (IF) signal. The de-chirped signal yields a constant beat frequency: f_d = (f2-f1)*Delta/tau, directly mapping transit delay to frequency.
Tested parameters from the patent: frequency chirped from 100 MHz to 200 MHz within each 40 us pulse, producing a 2.5 MHz/us chirping rate, with pulse repetition rate of 8.7 kHz at ~35% duty cycle.
DSP Algorithm Chain
- Balanced photodetection: Balanced photodiode pairs (80 kHz to 650 MHz bandwidth per US10948600) mix the returned signal with the LO, producing I and Q electrical channels
- Transimpedance amplification (TIA): Converts photocurrent to voltage
- ADC digitization: Converts analog IF signal to digital samples. De-chirping in the optical domain compresses bandwidth so lower-rate ADCs suffice
- Digital Down Conversion (DDC): Shifts frequency content to baseband using LPF/HPF with cutoff at half the ADC sampling rate, with power comparison selecting the band with maximum energy
- Windowing: Applied to reduce spectral leakage (the first sidelobe of a rectangular window FFT sits at -13 dB)
- FFT for range: Transforms the IF signal from time domain to frequency domain; the peak bin position identifies beat frequency, hence range
- PSD computation for I+jQ: Complex-valued signal from I and Q channels undergoes power spectral density analysis to identify both positive and negative frequency peaks (per US20210096253A1)
- ST-CFAR (Simple Threshold Constant False Alarm Rate): Post-FFT interpolation between bins to achieve sub-bin resolution, improving range accuracy beyond theoretical FFT bin spacing
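A compact sketch of the windowing, FFT, and sub-bin refinement steps above, on synthetic data: window the digitized I/Q beat signal, take an FFT, find the peak bin, and refine it with parabolic interpolation. The sample rate, beat frequency, and Hann window are assumptions; the patents do not disclose Aurora's actual parameters:

```python
import numpy as np

fs = 100e6                                   # ADC sample rate (assumed)
n = 4096
t = np.arange(n) / fs
f_beat_true = 12.34e6                        # synthetic beat frequency
iq = np.exp(2j * np.pi * f_beat_true * t)    # ideal complex (I + jQ) beat signal

windowed = iq * np.hanning(n)                # windowing reduces spectral leakage
spectrum = np.abs(np.fft.fft(windowed))
k = int(np.argmax(spectrum))                 # coarse peak bin

# Parabolic (sub-bin) interpolation around the peak for finer frequency resolution.
a, b, c = spectrum[k - 1], spectrum[k], spectrum[k + 1]
delta = 0.5 * (a - c) / (a - 2 * b + c)
f_beat_est = (k + delta) * fs / n
print(f"{f_beat_est / 1e6:.4f} MHz vs true {f_beat_true / 1e6:.4f} MHz")
```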
Coherent Detection vs Direct Detection: SNR Advantage
From patent US7742152B2, coherent detection SNR scales linearly with received signal power (1 dB/dB slope), while direct detection SNR scales quadratically (2 dB/dB slope). This yields an advantage of roughly 10 dB at lower power levels. Coherent detection achieves shot-noise-limited performance, sensitive to single photons. The narrower receiver bandwidth (hundreds of MHz vs. >2 GHz for ToF) further reduces noise.
Phase Noise Mitigation
Patent US7742152B2 describes two methods:
- 3x3 fiber coupler method: Three outputs separated by 120-degree phase differences feed three photodetectors. DSP combines these to remove phase noise
- 90-degree I/Q method: Quadrature outputs allow signal extraction by squaring and adding photocurrents
Mirror Doppler Compensation (Patents US11262437B1, US11366200)
Scanning mirrors at high angular speeds (>5 kdeg/s) cause Doppler broadening. The patented solution:
- Sample L points around frequency peaks in both primary and reference signals (which undergo identical mirror Doppler broadening)
- Convolve the two sample sets
- The convolution peak at index Delta reveals the spectral shift; refine both peaks by Delta/2
This is a template-matching approach achieving blind equalization without explicit Doppler rate estimation.
Interference Rejection
FMCW inherently rejects interference because the coherent receiver only responds to light matching its own timing, frequency, and wavelength. Non-matching light is filtered out during heterodyne mixing.
Sources: US7742152B2, US20210096253A1, US20190317219A1, US20200400821A1, US11262437B1, US11366200, US10948600, Blackmore - LIDAR Magazine
Silicon Photonics Integration (OURS Technology)
The on-chip implementation integrates:
- Optical distribution network: Binary tree structure distributing frequency-modulated signals to parallel transceiver slices
- Coherent receiver (CR) units per transceiver slice
- On-chip optical antennas (surface grating couplers or edge couplers)
- 2x2 bidirectional splitters for light distribution
- Balanced 2x2 mixers or optical hybrids for coherent mixing
- Integrated photodiodes for optical-to-electrical conversion
- Mode field converters to shape beam divergence
A lens system at the focal plane creates collimated beams at different angles. A mechanical scanner deflects all beams together in coordinated raster patterns, with denser scanning at the center of the FOV for higher resolution.
Sources: US20210096253A1, FirstLight on a Chip
Point Cloud Preprocessing (Pre-ML)
Ego-Motion Compensation / Deskewing (Patent US20200400821A1)
FMCW-unique: because each point carries instantaneous radial velocity via Doppler, ego-motion can be estimated from a single sweep without IMU:
Translational velocity estimation:
- Collect raw points, each carrying (azimuth, inclination, range, radial_velocity)
- Solve for 3D translational velocity (vx, vy, vz) via least-squares minimization of residuals across all points
- Apply bidirectional scan averaging to cancel acceleration artifacts
- Detect transverse velocity discontinuities when beam wraps around FOV
Rotational velocity estimation:
- For N sensor origins at different lever arms from the rotation center: v_y(x) = v_y0 + omega_z * x
- Least-squares fit extracts angular velocity (omega_x, omega_y, omega_z)
- PCA on parameterized velocity-vs-position identifies rotation vector and center of rotation
Point cloud correction:
- Remove sensor motion effects to project points to earth-fixed frame
- Distinguish stationary features via threshold filtering on corrected velocities
- Apply deskewing to align temporal sampling artifacts
This is entirely classical linear algebra and least-squares estimation — no ML involved.
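A minimal version of the translational-velocity estimate: each point's radial velocity is (minus) the projection of the ego velocity onto its unit direction, so stacking all points gives an overdetermined linear system solved by least squares. Synthetic data; the real pipeline adds the rotational terms and bidirectional averaging described above:

```python
import numpy as np

rng = np.random.default_rng(0)
true_ego_vel = np.array([24.0, -0.8, 0.1])                  # m/s, to be recovered

directions = rng.normal(size=(500, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
# Radial velocity a stationary point would report, plus measurement noise.
radial_vel = -directions @ true_ego_vel + rng.normal(scale=0.05, size=500)

# Least-squares solve of  -D v = r  for the ego velocity v.
est_ego_vel, *_ = np.linalg.lstsq(-directions, radial_vel, rcond=None)
print(est_ego_vel.round(2))                                 # ~[24.0, -0.8, 0.1]
```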
Source: US20200400821A1
Range View Discretization
Aurora uses range view (spherical coordinate) discretization rather than BEV cartesian grids for primary LiDAR representation. The range view discretizes points in spherical coordinates, producing a dense rasterized grid native to the sensor's viewpoint. This avoids the sparsity problem BEV suffers at long range.
Source: Harnessing the Range View
Sensor Calibration
Pre-Mission Calibration (Classical)
Aurora uses checkerboard fiducials placed around the vehicle at terminals. The calibration system captures these from all cameras and LiDARs, then aligns and optimizes sensor data to find the relative pose of each fiducial for all camera-LiDAR pairs. Standard techniques: PnP (Perspective-n-Point) for camera-to-world and point-based registration for LiDAR-to-world.
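A minimal sketch of that classical step using OpenCV's solvePnP (the solver choice is an assumption for illustration; Aurora does not name its implementation). Known 3D checkerboard corners plus their detected pixel positions yield the camera pose relative to the fiducial; the geometry below is synthetic:

```python
import numpy as np
import cv2

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
# Checkerboard corners in the board frame (z = 0 plane), 0.1 m squares.
obj_pts = np.array([[x * 0.1, y * 0.1, 0.0] for y in range(3) for x in range(4)])
# Synthetic "detections": the board sits 2 m in front of the camera.
cam_pts = obj_pts + np.array([0.0, 0.0, 2.0])
proj = (K @ cam_pts.T).T
img_pts = proj[:, :2] / proj[:, 2:3]

ok, rvec, tvec = cv2.solvePnP(obj_pts.astype(np.float64),
                              img_pts.astype(np.float64), K, None)
print(ok, tvec.ravel().round(3))        # recovers the 2 m board-to-camera offset
```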
Online Recalibration
Regresses pitch and yaw miscalibration as an auxiliary output of the existing long-range detection model:
- Operates in real-time (<100 ms)
- Average absolute error under 5% of injected noise values
- Extensible to full 6-DOF
- Uses aleatoric heteroscedastic uncertainty estimation to flag low-confidence calibration situations (motion blur, featureless scenes)
Post-Mission Validation
Generates calibration status reports that feed into a data validation system, filtering miscalibrated data before ingestion by the labeling engine.
Source: Continuous Real-Time Sensor Recalibration
TSN (Time-Sensitive Networking) and Sensor Synchronization
Aurora's custom networking switch is the backbone of the Aurora Driver computer. It uses an advanced networking chip with next-generation, high-bandwidth automotive physical layers to:
- Move data between nodes efficiently
- Duplicate data packets for redundancy
- Provide redundant pathways
- Synchronize sensors to microsecond precision
TSN is an IEEE 802.1 standard suite providing deterministic, low-latency communication. Aurora's implementation stitches all sensors and peripheral devices into one common hub with a common time reference, enabling tight temporal alignment of LiDAR sweeps, camera frames, radar returns, and IMU measurements.
The Aurora computer uses enterprise-class server architecture with processors designed for ML acceleration and camera signal processing. It conditions and distributes its own power, coordinates and synchronizes its own sensors, communicates with the vehicle over a simple umbilical, and communicates with transportation networks over a common network.
Source: Product Design from First Principles
Localization Against the Atlas
D-LOAM (Doppler LIDAR Odometry and Mapping — Patent US20200400821A1)
The Blackmore/Aurora vehicle odometry patent describes D-LOAM that uses the ego-motion-corrected point cloud for:
- Feature extraction from stationary objects (identified via Doppler velocity thresholding)
- Correlation with previously mapped features
- 6-DOF pose estimation (3 translational + 3 rotational)
- Continuous map updates and vehicle localization
This is a classical SLAM-like approach enhanced by FMCW's direct velocity measurements, which allow immediate separation of moving vs. stationary features — something ToF LiDAR cannot do without multi-frame tracking.
Atlas Matching
Aurora states localization "determines the vehicle's position in all 6 degrees of freedom relative to the Atlas by matching up stored geometry data with what our sensors are 'seeing' in real time." The stored world geometry is a sparse 3D representation, lightweight for OTA updates.
While Aurora does not publicly name the specific algorithm, the description (6-DOF pose from matching live sensor data against stored sparse 3D geometry) is consistent with point-to-plane ICP or NDT (Normal Distributions Transform) scan matching.
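Since the algorithm is not disclosed, here is a hedged illustration of one plausible family: a single Gauss-Newton step of point-to-plane ICP under a small-angle linearization, with correspondences assumed already given:

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """src: (N,3) live points; dst: (N,3) matched map points; normals: (N,3) map normals.
    Returns [wx, wy, wz, tx, ty, tz] minimising sum(n.((I + [w]x) p + t - q))^2."""
    A = np.hstack([np.cross(src, normals), normals])      # (N, 6) Jacobian rows
    b = np.einsum("ij,ij->i", normals, dst - src)         # residuals along the normals
    delta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return delta

# Synthetic check: the "map" is the live scan shifted 0.3 m in x on a few planes.
src = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)
normals = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]], dtype=float)
dst = src + np.array([0.3, 0.0, 0.0])
print(point_to_plane_step(src, dst, normals).round(3))    # ~[0, 0, 0, 0.3, 0, 0]
```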
Sources: US20200400821A1, The Atlas
Tracking and State Estimation (Classical Components)
Extended Kalman Filter (EKF)
Aurora's job postings for Senior Perception Software Engineer - Tracking explicitly require: "design, implement, evaluate, and improve state estimation algorithms (such as EKF) for tracking objects around the AV" and demand "extensive experience in state estimation, Kalman Filter implementation, and 3D object tracking."
This confirms EKF is a core component of Aurora's tracking pipeline, operating alongside/within the S2A neural network tracker.
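The posting confirms only EKF-style state estimation; as a hedged illustration, a bare constant-velocity Kalman filter over (x, y, vx, vy) with position-only measurements looks like this (the "extended" part would come from linearizing nonlinear motion and measurement models; all noise values are assumptions):

```python
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)        # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)        # observe position only
Q = np.eye(4) * 0.01                             # process noise (assumed)
R = np.eye(2) * 0.25                             # measurement noise (assumed)

x = np.array([0.0, 0.0, 10.0, 0.0])              # initial state guess
P = np.eye(4)

for z in [np.array([1.05, 0.02]), np.array([2.02, -0.01]), np.array([2.96, 0.03])]:
    x, P = F @ x, F @ P @ F.T + Q                # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (z - H @ x)                      # update with the new detection
    P = (np.eye(4) - K @ H) @ P
    print(x.round(2))
```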
Velocity Estimation (Patent US20190317219A1)
The phase-coherent LiDAR perception patent describes classical velocity aggregation methods selected adaptively based on object class and data quality:
- Mean aggregation: Simple average of per-point velocities (used for pedestrians)
- Median aggregation: Robust central tendency
- Histogram analysis: Bin velocities by range, select modal bin (used for vehicles)
- Technique selection depends on object classification, quantity of data points, and object proximity
Sources: Senior Perception Engineer - Tracking Job Posting, US20190317219A1
Radar Signal Processing
Aurora develops custom imaging radar with "far greater range and resolution than traditional automotive radar," producing "true and precise 3D images" through rain, dense fog, and snow.
While Aurora has not published detailed radar DSP specs, their imaging radar necessarily uses standard FMCW radar DSP:
- MIMO virtual array for angular resolution enhancement
- Range FFT (fast-time) and Doppler FFT (slow-time) to produce range-Doppler maps
- Beamforming (digital or analog) for angle estimation
- CFAR detection on the range-Doppler-angle cube
The perception system dynamically shifts reliance toward imaging radar during dust storms or fog where optical sensors are degraded.
Source: Product Design from First Principles
Camera ISP and Preprocessing
Aurora uses custom-designed camera modules with Aurora-designed lenses and rolling shutter imagers selected for improved resolution and SNR. The cameras are "high-dynamic-range" capable of seeing through sun glare.
Standard ISP operations that necessarily occur before any ML model:
- Debayering (demosaicing the Bayer-pattern raw data)
- HDR tone mapping (for their high-dynamic-range cameras)
- Lens distortion correction (given custom lens designs)
- Auto-exposure and auto-white-balance
- Rolling shutter correction (they specifically note rolling shutter imagers)
Camera signal processing is handled by the Aurora computer's dedicated processors.
Aurora has an onboard sensor-cleaning system that preserves optical quality — concentrated air + washer fluid nozzles with millisecond response.
Sources: Product Design from First Principles, Superhuman Clarity
Post-Processing: NMS and Clustering
Per patent US20190317219A1, after ML detection, Aurora uses classical post-processing:
- Nearest neighbor and/or other clustering techniques to determine portions of data that likely correspond to a single classification
- Spatial region aggregation where class probability exceeds threshold to form cohesive object subgroups
- Standard NMS variants for final bounding box selection
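For reference, a standard greedy NMS sketch (the generic technique the patent points to; the specific variant Aurora uses is undisclosed), on axis-aligned boxes:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter)
        order = order[1:][iou <= iou_thresh]     # drop boxes overlapping the kept one
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))     # -> [0, 2]
```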
Source: US20190317219A1
Monopulse Super-Resolution (Classical)
Patent US20190317219A1 describes a monopulse FMCW LiDAR enhancement using dual or triple receivers at different positions. Classical super-resolution techniques improve angular accuracy beyond beam width:
- Frequency domain methods: Fourier-based shift/aliasing properties
- Spatial domain methods: Maximum Likelihood, Maximum a Posteriori (MAP), Projection Onto Convex Sets (POCS), Bayesian treatments
These are all classical signal processing / computational imaging techniques, not ML-based.
Source: US20190317219A1
Fault Detection (Non-ML)
Aurora's Fault Management System (FMS) uses a three-part architecture:
- Detection: Each component "constantly reports diagnostic health checks to the other components." The FMS "actively monitors the health of the vehicle, including the self-driving software, sensors, and onboard computer"
- Diagnosis: Evaluates fault severity and determines impact on safe driving capability
- Response: Triggers mitigation strategies — reduced speed, hazard light activation, safe pullover
Demonstrated on public highways with fault injection testing, including simulated sensor and compute loss. A backup compute pathway handles complete primary system failure.
Sensor data consistency checking occurs implicitly through the Remainder Explainer, which flags sensor returns that cannot be explained by any known object — anomalous readings could indicate sensor degradation.
Sources: Fault Management, Fault Management Part 2
Sensor Fusion at the Data Level (Pre-ML)
The pre-ML temporal and spatial alignment of asynchronous sensors relies on:
- TSN microsecond synchronization for temporal alignment
- Extrinsic calibration transforms for spatial registration (SE(3) rigid-body)
- Pose interpolation using IMU data to align sensors firing at different times within a sweep
- Coordinate transforms between sensor frames
Source: ThinkAutonomous: Aurora Deep Learning Sensor Fusion
Key Non-ML Specifications
| Parameter | Value | Source |
|---|---|---|
| FirstLight Range | >450m (current), >1000m (next-gen) | Aurora blog |
| Wavelength | ~1550 nm (eye-safe) | Blackmore, Aurora blog |
| Beam Divergence | <0.01 degrees | Blackmore/LIDAR Magazine |
| Velocity Resolution | +/-0.25 m/s across +/-100 m/s spread | Patent US20200400821A1 |
| Online Calibration Latency | <100 ms | Aurora blog |
| Online Calibration Error | <5% of injected noise | Aurora blog |
| TSN Sync Precision | Microsecond | Aurora blog |
| FMCW Patent Families | ~36 (109 worldwide patents) | GreyB analysis |
| Blackmore Patents | ~58% of Aurora portfolio | GreyB analysis |
| Silicon Photonics Production Target | 2027 | Aurora blog |
What Remains Proprietary / Undisclosed
Aurora explicitly guards several critical implementation details:
- Specific localization algorithm (ICP variant, NDT, or hybrid) for Atlas matching
- Specific data association algorithm for multi-object tracking (Hungarian, JPDA, GNN)
- Specific Kalman filter variant beyond "EKF" (UKF? factor graph? iSAM2?)
- Specific CFAR variant used in LiDAR and radar detection
- Specific windowing function (Hanning, Hamming, Kaiser) used in FFT processing
- Specific IMU-LiDAR tight coupling method for localization
- Camera ISP pipeline details (debayering algorithm, tone mapping curve)
- Occupancy grid implementation (if used — not confirmed for production)
- Radar beamforming architecture and MIMO configuration
- NMS variant used in post-processing