Mobileye Perception Stack: Exhaustive Deep Dive
Last updated: March 2026
Table of Contents
- True Redundancy Architecture
- EyeQ Chip Perception Accelerators
- Camera-Only Perception Subsystem
- ViDAR (Vision as Virtual LiDAR)
- Road Experience Management (REM)
- LiDAR/Radar Subsystem
- Free Space Detection
- Lane Detection
- Traffic Light and Sign Detection
- Pedestrian/VRU Detection
- Object Detection Architecture
- Object Tracking
- Semantic Segmentation
- Occupancy Prediction
- Depth Estimation
- Temporal Processing
- RSS Integration with Perception
- VLSA (Vision Language Safety Agent)
- Model Efficiency
- Calibration
- Key Patents
- Key Publications
- Perception Evolution: EyeQ1 through EyeQ Ultra
1. True Redundancy Architecture
Philosophical Foundation
Mobileye's True Redundancy is a fundamental architectural divergence from the industry-standard sensor fusion paradigm. Rather than fusing multiple sensor modalities into a single unified world model -- the approach used by Waymo, Cruise, and most other AV developers -- Mobileye maintains two fully independent perception subsystems, each of which must independently achieve safety-level perception.
The rationale is rooted in validation complexity. A fused perception system creates a single pipeline whose failure modes are combinatorially complex: sensor interactions, calibration drift, and cross-modal ambiguities compound into a validation burden that Mobileye estimates requires hundreds of millions of hours of real-world driving data. With True Redundancy, each independent channel requires only tens of thousands of hours of validation. The probability of both independent subsystems failing simultaneously on the same hazard is the product of their individual failure rates -- a dramatically smaller number than the single-system failure rate.
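As a back-of-the-envelope illustration of this argument (the failure rates below are arbitrary placeholders, not Mobileye figures), the joint failure probability of two truly independent channels is the product of their individual probabilities:

```python
# Illustrative only: hypothetical per-hour failure probabilities, not Mobileye figures.
p_camera_only = 1e-4   # probability the camera subsystem misses a hazard in a given hour
p_radar_lidar = 1e-4   # probability the radar/LiDAR subsystem misses the same hazard

# If the two channels fail independently, the joint failure probability is the product.
p_both_fail = p_camera_only * p_radar_lidar
print(f"Single channel: {p_camera_only:.0e} / hour")
print(f"Both channels simultaneously: {p_both_fail:.0e} / hour")  # 1e-08: four orders of magnitude rarer
```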
The Two Subsystems
Camera-Only Subsystem
- Processes data from up to 13 cameras (seven 8MP, four to six 2MP depending on platform)
- Must independently detect all road users (vehicles, pedestrians, cyclists, animals, debris), lanes, drivable paths, traffic signs, traffic signals, and road geometry
- Runs entirely on EyeQ silicon using deep learning perception
- Produces a complete, standalone world model including object lists with position, velocity, classification, and predicted trajectories
- The camera subsystem serves as the "backbone" of the production AV
Radar/LiDAR Subsystem
- Processes data from imaging radar (proprietary 4D imaging radar SoC) and, on the Drive platform, LiDAR sensors
- Must also independently detect all critical road users and road geometry with zero camera input
- Builds its own complete world model: object lists, drivable space, hazard map
- Serves as a "diversified safety backup" that provides a significantly higher mean time between failures (MTBF)
Algorithmic Redundancy Within Subsystems
Mobileye implements a further layer of redundancy within each subsystem. Within the camera channel, critical perception functions are developed using multiple independent algorithmic principles:
- Vehicle detection: Performed both via pattern recognition (monocular deep learning) and via triangulation (multi-camera stereo geometry). These are fundamentally different mathematical approaches -- one is appearance-based, the other is geometry-based.
- Depth estimation: Computed both from monocular depth networks and from multi-view stereo/structure-from-motion across temporally adjacent frames.
- Free space: Estimated both from semantic segmentation (per-pixel classification) and from geometric methods (ground plane estimation, structure from motion).
This means the camera subsystem itself has internal redundancy: even within a single sensor modality, Mobileye develops multiple diverse algorithms for each safety-critical function.
Cross-Check Mechanism
The two subsystems do not fuse at the perception level. Instead, each independently filters its world model through Mobileye's RSS (Responsibility-Sensitive Safety) framework. The planning layer receives two independent RSS-filtered world models. The system follows the more conservative response when the two models disagree -- if either channel detects a hazard, the vehicle acts on it.
Mobileye maintains this independence rigorously in development: the company operates two separate developmental AV fleets -- one driving camera-only (no radar or LiDAR installed), and another driving radar/LiDAR-only (cameras physically covered or absent). Each fleet must demonstrate safe autonomous operation independently.
Primary-Guardian-Fallback (PGF) Architecture
Building on True Redundancy, Mobileye's safety architecture uses a layered decision-making model called Primary-Guardian-Fallback:
- Primary: The main self-driving system generates a trajectory. It has full access to the camera subsystem's world model and REM maps.
- Guardian: An independent monitoring layer evaluates the Primary's planned trajectory against RSS safety constraints and perception data from both subsystems. The Guardian can veto any trajectory it deems unsafe.
- Fallback: A separate self-driving system capable of generating its own trajectory. If the Guardian rejects the Primary's plan, the system switches to the Fallback trajectory.
PGF ensures that a failure requires at least two independent subsystems to fail simultaneously. With three different sensor types (camera, radar, LiDAR) covering the same field of view, the probability of undetected failure drops to negligible levels.
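A minimal control-flow sketch of the PGF idea, with hypothetical data structures and a deliberately simplified stand-in for the RSS check (this is not Mobileye's interface, only an illustration of the veto-and-switch structure):

```python
from dataclasses import dataclass

@dataclass
class Plan:
    label: str
    closest_approach_m: dict  # hypothetical: per-channel predicted closest approach to any hazard

def guardian_ok(plan: Plan, d_min: float = 20.0) -> bool:
    """Toy Guardian: veto the plan if *either* independent channel predicts that executing it
    would bring the vehicle closer than the (highly simplified) RSS safe distance d_min."""
    return all(gap >= d_min for gap in plan.closest_approach_m.values())

def choose(primary: Plan, fallback: Plan) -> Plan:
    # If the Guardian rejects the Primary plan, switch to the independently generated Fallback.
    return primary if guardian_ok(primary) else fallback

primary  = Plan("primary",  {"camera": 40.0, "radar_lidar": 12.0})  # radar channel sees a near hazard
fallback = Plan("fallback", {"camera": 60.0, "radar_lidar": 55.0})
print(choose(primary, fallback).label)   # -> fallback
```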
2. EyeQ Chip Perception Accelerators
Heterogeneous Architecture Philosophy
Mobileye's EyeQ chips achieve their extreme power efficiency (5--40W for full autonomous perception) through a heterogeneous computing architecture specifically co-designed with Mobileye's perception software. Rather than using general-purpose GPUs (which waste power on capabilities unnecessary for automotive perception), each EyeQ contains four classes of proprietary accelerators, each optimized for a specific family of algorithms.
The Four Accelerator Classes
XNN -- Deep Learning Accelerator
The XNN is Mobileye's dedicated CNN/deep learning inference engine. It is purpose-built to execute convolutional neural networks, transformer attention computations, and dense matrix operations at maximum throughput per watt.
- Architecture: Systolic-array-like dataflow architecture optimized for matrix multiply-accumulate operations
- Precision: Int8 inference with support for quantization-aware training (QAT) and post-training quantization (PTQ) to maintain accuracy
- EyeQ5: Contains dedicated XNN cores capable of 24 TOPS peak deep learning performance
- EyeQ Ultra: Contains 16 XNN CNN accelerator cores, providing the bulk of the 176 TOPS rating
- Workload: All CNN inference -- object detection, semantic segmentation, depth estimation, lane detection, traffic sign classification. The XNN handles the heaviest computational load in the perception pipeline.
PMA -- Programmable Macro Array (CGRA)
The PMA is a coarse-grained reconfigurable architecture (CGRA) -- a dataflow machine with a unique architecture that delivers outstanding performance for dense, regular computational patterns.
- Architecture: An array of processing elements connected by a configurable interconnect, programmed at the macro-operation level rather than individual instructions
- EyeQ Ultra: Contains 8 PMA cores
- Workload: Dense computer vision algorithms (feature extraction, image preprocessing, filtering), deep learning operations that map well onto dataflow architectures. PMA cores can also run deep neural networks "extremely efficiently," providing low-level support for any dense-resolution sensor (cameras, next-generation LiDARs, and radars)
VMP -- Vector Microcode Processor (SIMD/VLIW)
The VMP is a vector processor combining single-instruction multiple-data (SIMD) parallelism with very-long-instruction-word (VLIW) scheduling for maximum throughput on vectorizable workloads.
- Architecture: SIMD VLIW machine -- multiple operations packed into a single wide instruction word, each operating on vectors of data simultaneously
- Workload: FFT (Fast Fourier Transform) processing required for radar and ultrasonic sensor data; signal processing chains; image filtering operations where the computation pattern is regular but requires precise programmatic control. The VMP is particularly important for the radar/LiDAR subsystem.
MPC -- Multi-Thread Processor Cluster
The MPC is a cluster of barrel-threaded CPU cores designed for high-throughput, latency-tolerant workloads.
- Architecture: Barrel-threaded (hardware multi-threaded) cores that can switch between threads on every cycle, hiding memory latency and keeping execution units busy
- Workload: Control flow-heavy algorithms, tracking, state management, data marshaling between accelerators, and perception tasks that require complex conditional logic rather than pure data parallelism
Workload Distribution Across Accelerators
The perception pipeline is decomposed across accelerators based on computational characteristics:
Raw Camera Frames
|
v
[ISP] -- Image Signal Processing (dedicated hardware block)
|
v
[PMA] -- Preprocessing: image normalization, feature pyramid construction
|
v
[XNN] -- CNN inference: object detection, segmentation, depth estimation
|
v
[MPC] -- Post-processing: NMS, tracking state updates, constellation formation
|
v
[VMP] -- Radar/LiDAR signal processing (FFT, beamforming)
|
v
[CPU] -- Planning integration, RSS computation, trajectory generation

This heterogeneous design means EyeQ achieves its performance targets with far fewer raw TOPS than competing solutions: the EyeQ Ultra's 176 TOPS delivers the equivalent perception capability of systems rated at 500+ TOPS on general-purpose architectures, because every transistor is tuned for its assigned workload.
EyeQ Ultra Silicon Specifications
| Attribute | Specification |
|---|---|
| Process node | 5 nm (TSMC) |
| CPU cores | 12 dual-threaded RISC-V cores (first EyeQ to abandon MIPS ISA) |
| Accelerator cores | 64 total: 16 XNN, 8 PMA (CGRA), additional VMP and MPC cores |
| GPU | Arm GPU, up to 256 GFLOPS |
| VPU | Vision Processing Unit for traditional CV operations |
| ISP | Integrated Image Signal Processor |
| Video encoding | H.264/H.265 encoding cores |
| DL performance | 176 TOPS (int8) |
| Power envelope | < 100 W |
| Target | Single-chip L4 AV-on-chip: processes both camera and radar/LiDAR subsystems, central computing, HD map, and driving policy |
EyeQ6H Silicon Specifications
| Attribute | Specification |
|---|---|
| Process node | 7 nm |
| CPU/threads | 8 cores / 32 threads |
| DL performance | 34 DL TOPS (int8) |
| Power | ~6.25 W (25% more than EyeQ5H for 3x the compute) |
| Built-in blocks | Dedicated ISP, GPU, video encoder |
| Benchmark | > 1,000 FPS on pixel-labeling neural networks (vs. 91 FPS on EyeQ5) |
| Efficiency | ResNet-50 single-stream latency: 0.5 ms (vs. 1.64 ms for Jetson AGX Orin at 275 TOPS) |
EyeQ5 Silicon Specifications
| Attribute | Specification |
|---|---|
| Process node | 7 nm FinFET (TSMC, 15 metal layers) |
| CPU cores | 8 multithreaded |
| Vision processor cores | 18 next-generation |
| Accelerator classes | XNN, PMA, VMP, MPC |
| DL performance | Up to 24 TOPS (peak); 12 TOPS typical; EyeQ5M variant: 4.6 DL TOPS int8 |
| Power | < 5 W (typical operating) |
| Sensors | Fuses up to 20 sensors |
| Co-design partner | STMicroelectronics (design); TSMC (fabrication) |
3. Camera-Only Perception Subsystem
Overview
The camera-only perception subsystem is the heart of Mobileye's autonomy stack -- it is the one subsystem that is present across every product tier from basic L0 ADAS (one camera, one EyeQ4) to full L4 Drive (13 cameras, 4x EyeQ6H). The entire subsystem runs exclusively on camera data processed by EyeQ chips, with no radar or LiDAR input.
Camera Array Configuration
SuperVision (L2++):
- 7 x 8MP cameras: front narrow, front wide, 2x side-front, 2x side-rear, rear
- 4 x 2MP cameras: short-range surround (parking/close proximity)
- Total: 11 cameras, 360-degree coverage
Drive (L4, ID.Buzz AD):
- Up to 13 cameras with overlapping fields of view for stereo triangulation
All cameras are high-dynamic-range CMOS sensors operational in low light, high glare, direct sun, and tunnel transitions.
Perception Pipeline Architecture
The camera perception pipeline runs at approximately 10 Hz (10 full perception cycles per second) and produces a comprehensive scene understanding through multiple specialized neural networks running in parallel on EyeQ accelerators.
Raw Camera Images (11-13 streams)
|
v
+------------------+
| Image Signal | -- Debayering, HDR tone mapping, noise reduction,
| Processing (ISP) | lens distortion correction
+------------------+
|
v
+------------------+
| Feature | -- Multi-scale backbone (shared across tasks)
| Extraction | Produces feature pyramid at multiple resolutions
+------------------+
|
+--------+--------+--------+--------+--------+--------+
| | | | | | |
v v v v v v v
[Object] [Lane] [Free] [Depth] [Seg] [TSR] [TFL]
[Detect] [Det.] [Space] [Est.] [ment] [Sign] [Light]
| | | | | | |
v v v v v v v
+-----------------------------------------------------------+
| Post-Processing & Association |
| -- NMS, multi-camera projection, temporal fusion, |
| track management, 3D lifting |
+-----------------------------------------------------------+
|
v
+-----------------------------------------------------------+
| World Model Construction |
| -- Object list (position, velocity, class, uncertainty) |
| -- Road model (lanes, boundaries, drivable area) |
| -- Infrastructure (signs, signals, construction zones) |
+-----------------------------------------------------------+
|
v
[RSS Safety Layer] --> [Planning/Policy]

Multi-Scale Feature Extraction
Mobileye uses a shared backbone network that produces a feature pyramid at multiple spatial resolutions. This enables:
- High-resolution features (1/4 or 1/8 of input): Capture fine-grained details needed for small/distant object detection, lane marking localization, and sign text recognition
- Mid-resolution features (1/16): Balanced detail and context for standard vehicle/pedestrian detection
- Low-resolution features (1/32 or 1/64): Large receptive field for scene-level understanding, road type classification, and weather/lighting estimation
The shared backbone amortizes compute cost -- feature extraction is performed once, and multiple task-specific heads consume features from the pyramid at appropriate scales.
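A minimal sketch of this shared-backbone-plus-heads pattern (PyTorch-style, with illustrative layer sizes and an arbitrary set of task heads; not Mobileye's actual network):

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Toy multi-scale backbone: one pass over the image yields a small feature pyramid."""
    def __init__(self, c: int = 32):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, c, 3, stride=4, padding=1), nn.ReLU())          # 1/4 res
        self.stage2 = nn.Sequential(nn.Conv2d(c, 2 * c, 3, stride=4, padding=1), nn.ReLU())      # 1/16 res
        self.stage3 = nn.Sequential(nn.Conv2d(2 * c, 4 * c, 3, stride=2, padding=1), nn.ReLU())  # 1/32 res

    def forward(self, img):
        p4 = self.stage1(img)
        p16 = self.stage2(p4)
        p32 = self.stage3(p16)
        return {"1/4": p4, "1/16": p16, "1/32": p32}

# Each task head reads the pyramid level appropriate to its scale; feature extraction
# is paid for once and amortized across all heads.
backbone = SharedBackbone()
heads = nn.ModuleDict({
    "lanes":      nn.Conv2d(32, 8, 1),    # fine detail -> 1/4 features
    "detection":  nn.Conv2d(64, 16, 1),   # vehicles/pedestrians -> 1/16 features
    "scene_type": nn.Conv2d(128, 4, 1),   # coarse context -> 1/32 features
})

pyramid = backbone(torch.randn(1, 3, 256, 512))
outputs = {
    "lanes": heads["lanes"](pyramid["1/4"]),
    "detection": heads["detection"](pyramid["1/16"]),
    "scene_type": heads["scene_type"](pyramid["1/32"]),
}
print({k: tuple(v.shape) for k, v in outputs.items()})
```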
2D Detection to 3D Lifting
Camera perception inherently operates in 2D image space. Mobileye employs multiple techniques to lift detections into 3D world coordinates:
- Monocular 3D estimation: Neural networks trained to predict 3D bounding box parameters (depth, height, width, orientation) directly from 2D image features, using learned priors about object sizes and road geometry
- Multi-camera triangulation: Overlapping camera fields of view allow stereo-like triangulation to compute metric depth for objects visible in multiple cameras
- Temporal structure from motion: Camera motion between frames (known from ego-motion estimation) creates parallax that enables depth computation through multi-view geometry
- ViDAR point cloud generation: Deep networks predict dense 3D point clouds from camera images (see Section 4)
- Ground plane projection: For objects assumed to rest on the road surface, 2D bounding box bottom edges combined with camera calibration yield metric distance estimates
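The ground-plane projection in the last bullet reduces to a simple pinhole relation on a flat road; a minimal sketch with assumed calibration values (not Mobileye's calibration or code):

```python
def distance_from_bbox_bottom(v_bottom_px: float, v_horizon_px: float,
                              focal_px: float, camera_height_m: float) -> float:
    """Flat-road pinhole approximation: an object resting on the ground whose bounding-box
    bottom edge lies (v_bottom - v_horizon) pixels below the horizon line is at roughly
    Z = f * h / (v_bottom - v_horizon). Illustrative numbers only."""
    dv = v_bottom_px - v_horizon_px
    if dv <= 0:
        raise ValueError("bottom edge must be below the horizon for a grounded object")
    return focal_px * camera_height_m / dv

# Example: 2000 px focal length, camera mounted 1.4 m above the road,
# bounding-box bottom edge 50 px below the horizon  ->  ~56 m away.
print(round(distance_from_bbox_bottom(590.0, 540.0, 2000.0, 1.4), 1))
```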
Road Model Generation
The camera subsystem constructs a comprehensive road model:
- Lane geometry: Represented as polynomial curves (3rd or 4th degree) in bird's-eye-view coordinates, with lane type classification (solid, dashed, double, Botts' dots)
- Lane topology: Graph structure capturing merges, splits, and intersection connectivity
- Road edges: Curbs, barriers, guardrails, and soft road boundaries (grass, gravel)
- Drivable path: The full free-space region where the vehicle can safely travel
- Road surface: Estimated curvature, slope, bank angle, and surface condition
Outputs
The camera-only subsystem produces a complete, self-contained world model sufficient for autonomous driving:
| Output Category | Specifics |
|---|---|
| Object list | 3D position, velocity, acceleration, heading, classification (vehicle, pedestrian, cyclist, motorcycle, animal, debris), dimensions, uncertainty bounds |
| Lane model | Per-lane polynomial geometry, lane type, lane graph topology, merge/split points |
| Free space | Per-direction free-space boundary with boundary classification (curb, vehicle, barrier, unknown) |
| Traffic infrastructure | Sign detections with classification, signal detections with state (red/yellow/green/arrow), construction zone boundaries |
| Road model | Road geometry (curvature, slope, bank), road type classification, speed limit inference |
| Depth map | Dense per-pixel depth estimates (ViDAR) |
| Semantic map | Per-pixel semantic labels for entire scene |
4. ViDAR (Vision as Virtual LiDAR)
Concept
ViDAR (Vision Detection and Ranging, or Visual LiDAR) is Mobileye's technique for generating LiDAR-like 3D point clouds from camera data alone. The fundamental idea: if a deep neural network can learn the mapping from 2D camera images to 3D point cloud representations, then the camera subsystem can reason about 3D geometry with the same data format as a LiDAR -- enabling a single set of downstream algorithms (path planning, obstacle avoidance, free-space estimation) to operate on either real LiDAR data or camera-derived virtual LiDAR data.
Architecture
ViDAR uses multiple views of the environment (both spatial -- from multiple cameras -- and temporal -- from vehicle motion) to produce dense depth estimates that are converted into a 3D point cloud.
The architecture broadly follows a three-stage pipeline:
History Encoder: A visual backbone encodes multiple camera images (potentially from multiple timesteps) into a bird's-eye-view (BEV) feature representation. This step lifts 2D features into a 3D-aware latent space.
Latent Rendering Operator: A novel module that models the 3D geometric latent space and bridges the encoder to the decoder. This operator converts the BEV features into a form suitable for 3D point prediction by reasoning about geometric relationships in latent space.
Point Cloud Decoder: Predicts dense 3D point clouds at target locations, generating the virtual LiDAR output. In the forecasting variant, an auto-regressive transformer iteratively predicts future point clouds for arbitrary timestamps.
Training Methodology
ViDAR networks are trained on paired camera + LiDAR data:
- Data: Driving sequences where camera images and simultaneous real LiDAR point clouds are captured
- Supervision: The real LiDAR point cloud serves as ground truth; the network learns to predict 3D structure from camera images alone
- Loss: Geometric losses (chamfer distance, depth error) ensure the predicted point cloud matches the real LiDAR spatially
- Augmentation: The training framework can also leverage "visual point cloud forecasting" -- predicting future point clouds from historical visual inputs -- which provides simultaneous supervision of semantics, 3D structure, and temporal dynamics
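As a toy illustration of the geometric supervision above, a symmetric chamfer distance between a predicted and a real point cloud can be computed as follows (minimal NumPy sketch, not Mobileye's training code):

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric chamfer distance between two point clouds of shape (N, 3) and (M, 3):
    average nearest-neighbour distance from pred to gt plus from gt to pred. Shown only to
    illustrate the kind of geometric loss that supervises a camera-to-point-cloud network
    against real LiDAR ground truth."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.random.rand(128, 3) * 50.0                          # predicted "virtual LiDAR" points (toy)
gt   = pred + np.random.normal(scale=0.2, size=pred.shape)    # real LiDAR points (toy)
print(round(chamfer_distance(pred, gt), 3))
```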
Accuracy vs. Real LiDAR
ViDAR does not fully match real LiDAR in raw point density or range precision, but it achieves sufficient accuracy for safe driving decisions:
- At close range (0--30 m): Depth accuracy is high due to strong stereo/motion cues
- At medium range (30--100 m): Accuracy degrades gracefully; the system compensates through learned priors about object size and road geometry
- At long range (100+ m): ViDAR produces sparser, less certain estimates, but imaging radar fills this gap in the True Redundancy architecture
ViDAR has been validated on four downstream perception tasks: 3D object detection, semantic occupancy prediction, map segmentation, and multi-object tracking. Notably, ViDAR pre-trained solely on paired image-LiDAR sequences outperforms supervised 3D-detection pre-training on these benchmarks.
Role in Production
In Mobileye's SuperVision system (camera-only), ViDAR provides the 3D perception backbone that allows the camera subsystem to maintain spatial awareness comparable to vehicles equipped with active sensors. The ViDAR output feeds directly into free-space estimation, path planning, and RSS computations.
For the Chauffeur and Drive platforms (which also have radar/LiDAR), ViDAR serves as the camera subsystem's 3D backbone, while real LiDAR/radar point clouds feed the independent active-sensor subsystem. The two 3D representations are never fused -- they are compared only at the planning layer, maintaining True Redundancy.
5. Road Experience Management (REM)
Architecture: Three-Layer System
REM is built on three layers that form a closed loop between millions of production vehicles and Mobileye's cloud infrastructure:
Layer 1: Harvesting Agents
Every Mobileye-equipped ADAS vehicle on the road (150+ million as of 2025) is a potential harvesting agent. The onboard EyeQ chip performs real-time analysis of camera feeds and extracts:
- Driving path geometry: The trajectory the vehicle follows, encoded as a 3D spline curve
- Stationary landmarks: Traffic signs (including text, colors, shape), lane markings, directional arrows, stop lines, traffic signals, guardrails, barriers, poles, road edges
- Road surface features: Line representations of lane markings and road edges
- Road signature profiles: Elevation and terrain characteristics
- Traffic patterns: Dynamic observations of typical speeds, congestion, behavior
The critical innovation is on-device compression: all of this information is compressed into Road Segment Data (RSD) packets at approximately 10 KB per kilometer -- roughly the size of a plain text email. This is achieved through Mobileye's real-time geometrical and semantic analysis algorithms, which extract only the essential map-relevant information and discard raw imagery. The data is fully anonymized before transmission.
Layer 2: Map Aggregating Server (Cloud)
Road Segment Data from millions of vehicles is uploaded to Mobileye's cloud infrastructure (running on AWS with 500,000 peak CPU cores, Apache Spark on Amazon EKS). The cloud pipeline:
- Aggregates multiple drives over the same road segment from different vehicles and time periods
- Aligns overlapping data using landmark detection and matching
- Reconciles conflicting observations through statistical fusion
- Detects changes (new construction, removed signs, changed lane markings) through sophisticated change-detection algorithms
- Builds/updates the Mobileye Roadbook -- the output HD map
Map generation is fully automated -- Mobileye demonstrated mapping all of Japan (25,000 km of roads) in 24 hours at the push of a button, with the entire map occupying only 400 MB (~16 KB/km).
Layer 3: Map-Consuming Agents
Autonomous and semi-autonomous vehicles receive the Roadbook via OTA (over-the-air) updates and use it for:
- Centimeter-accurate localization: The vehicle detects landmarks stored in the Roadbook using its cameras, then computes its precise position relative to those landmarks. This achieves 5--10 cm localization accuracy -- far superior to GPS alone (see the sketch after this list)
- Path planning: Lane-level geometry and topology provide the reference trajectory
- Anticipation: The map tells the vehicle what to expect ahead (upcoming curves, intersections, speed changes), enabling smoother driving
- Change detection feedback: If the vehicle observes a discrepancy between the Roadbook and reality (e.g., a sign has been removed), it reports this back to the cloud, closing the loop
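The centimeter-accurate localization step can be illustrated with a toy landmark-alignment computation -- standard least-squares rigid alignment over made-up landmarks, not Mobileye's localization code:

```python
import numpy as np

def align_2d(map_pts: np.ndarray, obs_pts: np.ndarray):
    """Toy landmark-based localization: given N landmarks with known 2D map positions (map_pts)
    and the same landmarks observed in the vehicle frame (obs_pts), solve for the rigid transform
    (rotation + translation) that best maps vehicle frame -> map frame via least-squares
    (Kabsch) alignment. Shown only to illustrate the idea."""
    mc, oc = map_pts.mean(axis=0), obs_pts.mean(axis=0)
    H = (obs_pts - oc).T @ (map_pts - mc)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mc - R @ oc
    return R, t                          # vehicle pose: position t, heading from R

# Example with made-up landmarks (poles/signs) at known map positions.
map_pts = np.array([[105.0, 20.0], [112.0, 24.0], [98.0, 31.0]])
# The same landmarks as measured in the vehicle frame (vehicle actually at map position (100, 22)).
obs_pts = map_pts - np.array([100.0, 22.0])
R, t = align_2d(map_pts, obs_pts)
print(np.round(t, 2))                    # -> [100.  22.]  (estimated ego position in the map)
```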
Roadbook Contents
| Category | Data Stored |
|---|---|
| Lane geometry | 3D polynomial representations of each lane boundary and center |
| Lane topology | Graph structure: merge/split points, connectivity through intersections |
| Road signs | Position, type, text content, colors, relevancy to each lane |
| Traffic signals | Position, type, number of lights, applicable lanes |
| Road boundaries | Curbs, barriers, guardrails with 3D positions |
| Landmarks | Distinctive features used for localization (poles, sign posts, specific markings) |
| Driving culture | Typical speeds, common driving behaviors, traffic patterns per time of day |
| Road surface | Curvature, slope, bank angle, elevation profile |
| Stop/yield lines | Position and type |
| Directional arrows | Lane-level directional information |
Scale and Update Cadence
- Harvesting fleet: 150+ million vehicles equipped with Mobileye technology
- Data harvested in 2024: 29.6 billion miles
- Update frequency: Near real-time for high-traffic areas; change-detection algorithms trigger updates when discrepancies are detected
- Coverage: Global, with particularly dense coverage in Europe, North America, and East Asia
- Map freshness: Measured in hours/days for heavily traveled roads, not months or years
Compression: 10 KB/km
The 10 KB/km figure is achieved because REM transmits semantic abstractions rather than raw sensor data:
- Instead of transmitting images or point clouds (megabytes per frame), the harvesting agent extracts the geometric and semantic essence: "there is a speed limit sign reading 50 at position (x, y, z) relative to the lane center"
- Landmarks are stored as compact geometric descriptors (position, type, orientation) rather than image patches
- Driving paths are stored as polynomial curves (a few coefficients) rather than dense point sequences
- Traffic patterns are statistical summaries (average speed, variance) rather than raw trajectory logs
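A back-of-the-envelope sketch of why semantic abstractions are so compact (the record layout and landmark counts below are invented for illustration and are not the actual RSD format; the real Roadbook carries more content, hence ~10 KB/km rather than the few hundred bytes shown here):

```python
from dataclasses import dataclass
import struct

@dataclass
class Landmark:
    kind: int         # e.g. 3 = speed-limit sign (toy encoding)
    x: float          # position relative to the lane centre, metres
    y: float
    z: float
    value: int        # e.g. 50 for "speed limit 50"

def pack(lm: Landmark) -> bytes:
    # One landmark as a fixed 17-byte record: type (1B) + 3 floats (12B) + value (4B).
    return struct.pack("<BfffI", lm.kind, lm.x, lm.y, lm.z, lm.value)

# Toy kilometre of road: 40 landmarks, two lane boundaries as cubic polynomials (4 floats each),
# and one ego driving path as a cubic polynomial.
landmark_bytes = 40 * len(pack(Landmark(3, 1.5, 0.0, 2.1, 50)))
spline_bytes   = (2 + 1) * 4 * 4
total = landmark_bytes + spline_bytes
print(f"{total} bytes/km")   # a few hundred bytes -- orders of magnitude below raw imagery
```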
6. LiDAR/Radar Subsystem
Imaging Radar
Mobileye has developed a proprietary software-defined imaging radar from the ground up -- silicon, algorithms, and signal processing -- designed to serve as an independent perception channel that operates without any camera input.
Hardware Architecture
| Specification | Value |
|---|---|
| Processing SoC | Proprietary radar processor, 11 TOPS compute |
| RFIC | Custom Mobileye-designed radar RF integrated circuits |
| BSR (forward-facing) virtual channels | > 1,500 at 20 FPS |
| BSRC (corner-mounted) virtual channels | > 300 |
| Angular resolution | < 0.5 degrees |
| Sidelobe suppression | -40 dBc (vs. typical -20 to -30 dBc in automotive radars) |
| Dynamic range | 100 dB (vs. ~60 dB in conventional automotive radars) |
| Frame rate | 20 Hz |
| Manufacturing partner | WNC (collaboration for production) |
The massive antenna array combined with Mobileye-designed RFICs allows exceptional flexibility in signal transmission and the ability to receive and sample the entire radar signal in a wide bandwidth while keeping noise at a low level. Mobileye achieved a 12x resolution increase without a proportional compute increase through algorithmic innovation in signal processing.
Detection Capabilities
| Detection Target | Range |
|---|---|
| Road users (pedestrians, motorcycles, cyclists) | Up to 315 m |
| Potential hazards (debris, tire on road) | Up to 230 m |
| Vehicles | Up to 300+ m |
| Operating speed | Reliable up to 130 km/h |
The imaging radar's key differentiator over conventional automotive radar is its ability to provide height (elevation) information as an additional dimension beyond traditional 2D angular and range data, producing a "rich point cloud" that enables:
- Exact lane assignment of detected objects
- Distinction between overhead signs/bridges and road-level obstacles
- Detection of stationary vehicles under bridges (a notorious failure mode of conventional radar)
- Detection of a child at 150 m when a bus is only 10 m away (demonstrating exceptional dynamic range)
Independent Perception
The imaging radar is designed to function as a complete, independent perception system aligned with True Redundancy principles. It:
- Detects both dynamic and static objects independently of cameras or LiDAR
- Generates point clouds dense enough for object classification and tracking
- Operates reliably in adverse conditions (rain, fog, dust, direct sunlight, darkness) where cameras degrade
- Has been selected by a global automaker for SAE Level 3 eyes-off driving applications (announced May 2025)
LiDAR Strategy
Former FMCW LiDAR Program (Ended September 2024)
Mobileye had been developing a proprietary FMCW (Frequency-Modulated Continuous-Wave) LiDAR-on-a-chip:
- 4D measurements: position (x, y, z) + instantaneous velocity
- Range: up to 300 m
- Point density: 600 points per degree
- Pulse rate: 2 million laser pulses per second
- Advantage over ToF LiDAR: Direct velocity measurement and operation at lower, safer power levels
The ~100-person LiDAR division was shut down in September 2024 after advances in camera perception (EyeQ6-based) and imaging radar performance made proprietary FMCW LiDAR "less essential."
Current LiDAR Approach
- Drive (L4): Uses third-party LiDARs from Innoviz Technologies
- Chauffeur (L3): Front-facing LiDAR from Luminar (e.g., Polestar 4 integration announced)
- ID.Buzz AD configuration: 9 LiDARs + 5 radars + 13 cameras = 27 total sensors
Radar/LiDAR Subsystem Perception Pipeline
The active-sensor subsystem runs its own independent perception pipeline:
- Radar signal processing: Raw antenna data --> FFT (on VMP accelerators) --> beamforming --> detection --> point cloud generation (a range-Doppler FFT sketch follows this list)
- LiDAR point cloud processing: Raw returns --> filtering --> ground plane removal --> clustering
- Object detection: 3D object detection on radar/LiDAR point clouds, independent of any camera data
- Tracking: Multi-object tracking using radar/LiDAR detections only
- Free space estimation: Drivable area determined from point cloud density and ground plane analysis
- World model: Complete object list + drivable space map, fed independently to RSS
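The FFT stage at the top of this pipeline can be illustrated with generic FMCW range-Doppler processing on synthetic data (textbook two-stage FFT, not Mobileye's radar code):

```python
import numpy as np

# Toy FMCW radar cube: (chirps, samples per chirp) for one receive channel, synthetic data.
n_chirps, n_samples = 64, 256
rng = np.random.default_rng(0)
cube = rng.normal(size=(n_chirps, n_samples)) * 0.1
# Inject one synthetic target: beat frequency ~ range, phase slope across chirps ~ velocity.
t = np.arange(n_samples) / n_samples
for c in range(n_chirps):
    cube[c] += np.cos(2 * np.pi * 40 * t + 0.3 * c)

# Classic two-stage FFT: a range FFT along each chirp, then a Doppler FFT across chirps.
range_fft = np.fft.fft(cube, axis=1)
range_doppler = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
power = np.abs(range_doppler[:, :n_samples // 2]) ** 2

# The peak of the range-Doppler map gives the target's range bin and Doppler (velocity) bin.
doppler_bin, range_bin = np.unravel_index(np.argmax(power), power.shape)
print(f"target at range bin {range_bin}, Doppler bin {doppler_bin - n_chirps // 2}")
```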
7. Free Space Detection
Definition
Free space detection determines the drivable area surrounding the vehicle -- the region of road surface where the vehicle can safely travel without collision. It is one of the most safety-critical perception outputs: errors in free space estimation directly translate to either unnecessary stopping (false obstacles) or collisions (missed boundaries).
Camera-Based Free Space Methods
Mobileye employs multiple complementary algorithms for free space detection from cameras:
Semantic Segmentation Approach
A fully convolutional neural network classifies every pixel in the camera image into semantic categories. The "road surface" class defines the drivable area. Pixels classified as non-road (vehicles, pedestrians, curbs, barriers, vegetation) define the free-space boundary.
- Advantages: Dense, per-pixel output; can handle arbitrary road shapes and complex intersections
- Challenges: Requires robust classification at boundaries; errors at road edges are most safety-critical
Top-View Free Space Method
A dedicated algorithm projects camera features into a bird's-eye-view (BEV) representation and estimates the free-space boundary as a polar function around the ego vehicle -- for each angular direction, the algorithm determines the distance to the nearest obstacle or road boundary.
This method operates similarly to how a LiDAR system would determine free space: for each radial direction, find the first non-traversable point.
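A toy version of this polar formulation, run on a synthetic BEV occupancy grid (grid size and angular resolution are arbitrary, and the real system operates on learned BEV features rather than a binary grid):

```python
import numpy as np

def polar_free_space(occupancy: np.ndarray, ego_rc: tuple, n_angles: int = 72, max_r: int = 60):
    """For each angular direction around the ego cell, step outward through a BEV occupancy grid
    (1 = obstacle/boundary, 0 = free) and record the distance to the first occupied cell --
    a toy version of the 'distance-to-nearest-obstacle per direction' representation."""
    er, ec = ego_rc
    boundary = np.full(n_angles, float(max_r))
    for i, theta in enumerate(np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)):
        for r in range(1, max_r):
            rr = int(round(er + r * np.sin(theta)))
            cc = int(round(ec + r * np.cos(theta)))
            if not (0 <= rr < occupancy.shape[0] and 0 <= cc < occupancy.shape[1]) or occupancy[rr, cc]:
                boundary[i] = r
                break
    return boundary  # free-space radius (in cells) per direction

grid = np.zeros((100, 100), dtype=np.uint8)
grid[50, 70:] = 1                                     # a wall of obstacles to one side of the ego cell
print(polar_free_space(grid, ego_rc=(50, 50))[0])     # direction theta = 0 -> boundary at 20 cells
```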
Column-Wise Boundary Regression
Rather than classifying every pixel, this approach treats each image column as a 1D classification problem: for each column, predict the row (height) at which the free-space boundary lies. This is computationally cheaper than full semantic segmentation and produces smooth, continuous boundary curves.
The boundary is further classified by type:
- Traversable boundaries: Lane markings (can be crossed if needed)
- Non-traversable boundaries: Curbs, barriers, walls, vehicles
- Specific classes: Pedestrian, vehicle, curb, barrier, unknown
Geometric Methods
Structure-from-motion and ground plane estimation provide geometric free-space cues:
- Points that are consistent with the estimated ground plane model are classified as road surface
- Points that deviate from the ground plane (elevated or depressed) indicate obstacles or boundaries
Boundary Classification
Mobileye's free space system doesn't just detect the boundary -- it classifies what forms the boundary. This classification is safety-critical: the proper response to a curb boundary differs from the proper response to a pedestrian boundary.
| Boundary Type | Example | Response Implication |
|---|---|---|
| Vehicle | Parked car, moving vehicle | Full stop or lane change |
| Pedestrian | Person standing at curb | Maximum caution, full stop |
| Curb | Raised curb edge | Traversable in emergency; normally a hard boundary |
| Barrier | Jersey barrier, guardrail | Non-traversable |
| Vegetation | Grass, bushes | Soft boundary; traversable in emergency |
| Construction | Cones, temporary barriers | Non-traversable |
| Unknown | Unclassified obstacle | Treated conservatively |
Fusion of Free Space Methods
The multiple free-space algorithms produce overlapping estimates. Post-processing fuses these into a single free-space map using conservative logic: if any algorithm detects a boundary, it is included in the final free-space map. This "union of hazards" approach ensures maximum safety.
Snow and Edge Cases
Mobileye holds patents covering free-space detection in challenging conditions, including:
- Identifying probable road edge locations when lane markings are obscured by snow
- Using road geometry history from the REM Roadbook to infer boundaries when visual cues are absent
- Leveraging tire tracks in snow as implicit lane marking substitutes
8. Lane Detection
Mobileye's Lane Detection Heritage
Lane detection was one of Mobileye's founding capabilities -- the EyeQ1 (2008) shipped with lane departure warning (LDW) as a core function. Over nearly two decades, Mobileye has evolved from simple edge-detection-based lane finding to neural-network-based 3D lane detection with full topological understanding.
Classical Lane Model
In earlier EyeQ generations and the Mobileye 560/630 aftermarket systems, lanes were represented as polynomial curves:
- Model: Third-degree polynomial in the vehicle's coordinate system: y = a0 + a1*x + a2*x^2 + a3*x^3
- Parameters detected: Lane width, curvature, curvature derivative, heading angle, lateral offset
- Output: Coefficients for left and right lane boundaries, lane width, confidence level
This polynomial representation is compact and computationally efficient, suitable for highway driving where lanes follow smooth, predictable curves.
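For concreteness, evaluating the cubic model above and its curvature at a look-ahead distance is straightforward (the coefficient values below are arbitrary examples):

```python
def lane_lateral_offset(x: float, a0: float, a1: float, a2: float, a3: float) -> float:
    """Lateral offset y of the lane boundary at longitudinal distance x (metres),
    for the cubic model y = a0 + a1*x + a2*x^2 + a3*x^3."""
    return a0 + a1 * x + a2 * x**2 + a3 * x**3

def lane_curvature(x: float, a2: float, a3: float) -> float:
    """Approximate curvature (1/m) from the second derivative, valid for small heading angles:
    y'' = 2*a2 + 6*a3*x."""
    return 2 * a2 + 6 * a3 * x

# Example: boundary 1.8 m to the left, small heading offset, gentle curve.
a0, a1, a2, a3 = 1.8, 0.01, -2e-4, 1e-6
print(round(lane_lateral_offset(50.0, a0, a1, a2, a3), 2))    # offset at 50 m ahead
print(lane_curvature(0.0, a2, a3))                             # curvature at the vehicle
```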
3D-LaneNet: Neural 3D Lane Detection
In 2019, Mobileye researchers Noa Garnett, Rafi Cohen, Tomer Pe'er, Roee Lahav, and Dan Levi published "3D-LaneNet" (ICCV 2019), introducing the first end-to-end neural network for 3D multiple lane detection from a single image.
Key Innovations
Intra-Network IPM (Inverse Perspective Mapping): The network includes an internal projection layer that converts image-view features into top-view (BEV) features, enabling dual-representation information flow. The network processes features in both views simultaneously, allowing image-view features to capture appearance details while top-view features capture spatial relationships.
Anchor-Based Lane Representation: Rather than detecting lane pixels and then fitting curves, the network uses an "anchor-per-column" output representation that directly predicts lane positions. This replaces heuristic post-processing (clustering, outlier rejection) with a learned end-to-end pipeline, casting lane estimation as an object detection problem.
3D Prediction: The network directly predicts lanes in 3D -- not just 2D image coordinates, but 3D road coordinates including elevation variations (uphills, overpasses, road undulations). This was the first approach to predict 3D lane layouts from monocular images without assuming constant lane width or flat road surfaces.
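The intra-network IPM learns this image-to-top-view projection, but the underlying geometry can be sketched with a plain flat-ground back-projection using assumed calibration values (this is the classical geometric IPM, not the learned projection layer itself):

```python
import numpy as np

def pixel_to_ground(u: float, v: float, K: np.ndarray, cam_height: float) -> np.ndarray:
    """Back-project pixel (u, v) onto the flat ground plane located cam_height metres below the
    camera, assuming a forward-looking camera with zero pitch/roll (camera frame: x right,
    y down, z forward). Returns the 3D point in the camera frame."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction of the viewing ray
    if ray[1] <= 0:
        raise ValueError("pixel is at or above the horizon; it does not hit the ground plane")
    scale = cam_height / ray[1]                      # stretch the ray until it reaches the ground
    return ray * scale                               # (lateral, down, forward) in metres

K = np.array([[2000.0, 0.0, 960.0],   # toy intrinsics: fx = fy = 2000 px, principal point (960, 540)
              [0.0, 2000.0, 540.0],
              [0.0, 0.0, 1.0]])
pt = pixel_to_ground(u=1100.0, v=590.0, K=K, cam_height=1.4)
print(np.round(pt, 2))   # -> roughly [ 3.92  1.4  56. ]: 56 m ahead, ~4 m to the right
```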
Follow-up: 3D-LaneNet+
Mobileye researchers subsequently published "3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation" (2020), which improved on the original by removing the need for predefined anchors and using a semi-local representation that better handles complex lane topologies.
Lane Graph Topology
Modern Mobileye systems go beyond individual lane boundary detection to construct a full lane graph:
- Lane connectivity: Which lanes connect to which at intersections, merges, and splits
- Lane type: Driving lane, turn lane, bus lane, bike lane, shoulder
- Lane direction: Direction of travel for each lane
- Lane boundaries: Type (solid, dashed, double solid, Botts' dots) and color (white, yellow, blue)
- Lane changes: Where lane changes are permitted or prohibited
Intersection Handling
Intersections present unique challenges because lane markings often disappear or become ambiguous. Mobileye addresses this through:
- REM Roadbook: The HD map provides pre-mapped intersection topology, including the connection routes between entrance and exit lanes
- Implicit lane inference: When markings are absent, the system infers lane structure from road geometry, curb positions, and vehicle trajectories observed in the Roadbook
- Traffic signal association: Determining which signal controls which lane at complex intersections
Lane Detection Without Lane Lines
At CES 2025, Mobileye indicated that its advanced systems no longer rely solely on painted lane lines for autonomous driving. The REM Roadbook provides lane-level geometry independent of paint markings, and the perception system can determine lane structure from:
- Road edges and curbs
- Vehicle behavior patterns from the Roadbook
- Road geometry (width, curvature)
- Traffic infrastructure (signals, signs, islands)
9. Traffic Light and Sign Detection
Traffic Sign Recognition (TSR)
Traffic sign recognition was one of Mobileye's original ADAS capabilities, shipping on the EyeQ1 in 2008. It has evolved into a comprehensive, multi-layered recognition system.
Detection and Classification Pipeline
Detection: A CNN-based detector identifies candidate sign regions in the camera image, handling varying sizes (from distant highway signs to nearby street signs), partial occlusion, and diverse environmental conditions (rain, glare, night)
Classification: Detected sign regions are classified into specific sign types through a multi-network pipeline:
- Shape-based classification: Geometric shape (circle, triangle, octagon, diamond, rectangle) provides initial categorization
- Appearance-based classification: CNN classifiers determine the specific sign meaning (speed limit value, warning type, regulatory instruction)
- OCR (Optical Character Recognition): For signs containing text (speed limit numbers, street names, variable message signs), OCR networks read the text content
- Signature-based classification: A novel Mobileye approach where signs are matched against a database of visual "signatures." This allows the system to be updated post-deployment when new sign types are introduced -- the vehicle receives OTA updates with new sign signatures without requiring full model retraining
Relevancy determination: A critical innovation -- the system determines which lane each sign applies to. A speed limit sign on a highway offramp should not affect the main lanes. The relevancy algorithm considers:
- Sign position relative to lane boundaries
- Sign orientation and facing direction
- Sign mounting height and type (overhead, post-mounted, pole-mounted)
- Road geometry context
Intelligent Speed Assist (ISA)
Mobileye launched the world's first camera-only Intelligent Speed Assist system certified across all 27 EU countries plus Norway, Switzerland, and Turkey, meeting the EU General Safety Regulation (GSR) mandate effective July 2022 for new models and July 2024 for existing production lines.
The ISA system integrates five technologies:
| Technology | Function |
|---|---|
| Traffic Sign Recognition | Identifies explicit speed limit signs |
| Implicit Sign Recognition | Detects signs indicating speed changes (school zones, construction, highway entrances/exits) |
| Traffic Sign Relevancy | Determines which lane a sign applies to |
| Signature-Based Classification | Future-proof recognition of new sign types via OTA updates |
| Road-Type Classification | Deep neural network determines road type (highway, urban, rural) to infer applicable speed limit |
The system was validated by six independent labs across five European countries, achieving >90% correct speed limit determination across total distance and >80% per road type (exceeding EU minimums).
Training Data
The TSR system is trained on "tens of millions of video clips, encompassing an enormous variety of parameters and features of roadways from around the world," covering diverse sign designs across different countries and continents.
Traffic Light Detection
Traffic light detection follows a similar pipeline with additional challenges:
- Detection: Locate traffic signal heads in the image, distinguishing them from other light sources (tail lights, streetlights, reflections)
- State classification: Determine the active state: red, yellow, green, flashing, arrow (and arrow direction)
- Association: Determine which signal controls which approach lane -- a particularly difficult problem at complex intersections with multiple signal heads
- Relevancy: Confirm the signal applies to the ego vehicle's current path
The REM Roadbook pre-maps traffic signal locations, types, and lane associations, providing a strong prior that helps resolve ambiguities in real-time detection.
10. Pedestrian/VRU Detection
ADAS Heritage
Pedestrian detection is one of Mobileye's deepest areas of expertise. The EyeQ2 (2010) introduced pedestrian detection as a key capability, enabling automatic emergency braking (AEB) for pedestrian collision avoidance. Today, approximately 125-150 million vehicles worldwide use Mobileye's pedestrian detection technology.
The Insurance Institute for Highway Safety (IIHS) found that AEB can reduce the rate of pedestrian collisions by over a quarter. Mobileye estimates that if vehicles involved in 2020 pedestrian collisions had been equipped with AEB, approximately 1,700 lives could have been saved in the United States alone.
Multi-Method Detection Architecture
Mobileye employs multiple parallel detection algorithms operating simultaneously on camera feeds, with each method designed to catch cases the others might miss:
Classic Pattern Recognition
The foundational method: a trained classifier (CNN-based) identifies and classifies objects and road users based on learned visual patterns. The network has been trained on millions of annotated pedestrian examples across diverse conditions (day, night, rain, partial occlusion, varying clothing, various poses).
Full Image Detection
For larger objects in close proximity to the vehicle, a wide-area detection method covers the full image frame rather than scanning local regions. This catches pedestrians who are very close (and therefore very large in the image) or who are at unusual positions (e.g., crossing from behind a parked vehicle).
Segmentation Method
A semantic segmentation network labels individual pixels and groups of pixels to identify smaller elements in the driving environment, including pedestrians and cyclists. This method excels at detecting partially occluded pedestrians (only legs visible under a vehicle, only upper body visible behind a bollard).
Top-View Free Space Method
Distinguishes road users from the road surface itself in the bird's-eye-view representation. A pedestrian standing on the road creates a distinct free-space boundary that this method detects even when the person is at an unusual angle or partially blended with the background.
Wheel Detection
A specialized algorithm that classifies vehicles by identifying their wheels. While primarily for vehicle detection, it also contributes to distinguishing between pedestrians and vehicles in ambiguous situations.
ViDAR-Based 3D Detection
The ViDAR system creates a 3D point cloud from camera data. Pedestrians appear as distinctive 3D clusters in this representation, providing an additional detection channel that is fundamentally different from 2D image-based methods.
Specialized VRU Detection
Beyond standard pedestrian detection, Mobileye has developed dedicated algorithms for:
- Baby strollers: Detecting strollers even when the pushing adult is not visible
- Wheelchairs: Recognizing wheelchair users with their distinctive profile
- Open car doors: Detecting when a parked car's door is opening into the path of travel
- Cyclists: Dedicated bicycle and e-scooter detection
- Motorcyclists: Detection at extended ranges
Behavioral Prediction
Mobileye's pedestrian detection goes beyond mere detection to behavioral analysis:
- Orientation estimation: Determining which direction a pedestrian is facing
- Posture analysis: Standing, walking, running, crouching, bending
- Gesture recognition: Detecting hand signals, waving, pointing
- Intent prediction: Using orientation, posture, and trajectory history to predict whether a pedestrian is likely to step into the road
Euro NCAP Compliance
Mobileye's pedestrian detection supports OEM compliance with Euro NCAP VRU (Vulnerable Road User) testing protocols:
- AEB Pedestrian scenarios test detection and braking from speeds of 10 km/h upward
- Day and night scenarios are required
- The system must detect pedestrians crossing from both sides, including partially occluded scenarios
- Recent protocols require detection of cyclists and other VRUs
11. Object Detection Architecture
Design Philosophy
Mobileye uses multiple specialized neural network architectures rather than a single monolithic model. Each network is optimized for its specific task and the computational profile of EyeQ accelerators.
Multi-Scale Detection
Object detection operates across the feature pyramid produced by the shared backbone:
- Small/distant objects (pedestrians at 100+ m, small debris): Detected at high-resolution feature levels where spatial detail is preserved
- Medium objects (vehicles at 20-80 m): Detected at mid-resolution levels balancing detail and context
- Large/close objects (adjacent vehicles, trucks): Detected at low-resolution levels where large receptive fields capture the full object
Detection Head Architecture
Mobileye's detection heads follow a region-based or anchor-based paradigm (similar in spirit to CenterPoint-style detection), with outputs including:
- 2D bounding box: Location in image coordinates
- 3D bounding box parameters: Depth, 3D dimensions (height, width, length), orientation angle
- Classification: Multi-class output (car, truck, bus, motorcycle, bicycle, pedestrian, animal, debris, cone, barrier)
- Confidence score: Per-class probability
- Attributes: Brake lights, turn signals, hazard lights (for vehicles); orientation, posture (for pedestrians)
Teacher-Student Training Framework
Mobileye employs a teacher-student knowledge distillation framework:
- Teacher network: A large, computationally expensive model is trained on the full dataset using high-precision (FP32) computation, potentially running offline on cloud GPUs
- Automatic Ground Truth (Auto GT): Advanced sensors and offline processing generate accurate representations of driving scenarios automatically, reducing manual labeling dependency
- Student network: A smaller, efficient model is trained to mimic the teacher's outputs while being tailored for specific EyeQ hardware configurations (EyeQ4 vs. EyeQ6H vs. EyeQ Ultra)
This approach enables Mobileye to deploy different-complexity models across different product tiers while maintaining accuracy: the EyeQ4 student model is simpler but benefits from knowledge distilled from a much more capable teacher.
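A minimal sketch of the distillation objective (generic soft-target distillation in PyTorch; Mobileye's actual training recipe is not public):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                      labels: torch.Tensor, T: float = 4.0, alpha: float = 0.5) -> torch.Tensor:
    """Generic soft-target distillation: the student matches the teacher's temperature-softened
    class distribution in addition to the (e.g. auto-generated) ground-truth labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy example: 8 detections, 10 object classes.
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)                 # produced offline by the large teacher model
labels  = torch.randint(0, 10, (8,))         # e.g. Auto GT labels
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```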
Quantization
All inference models are quantized for Int8 execution on XNN accelerators:
- Quantization-Aware Training (QAT): During training, fake quantization nodes simulate the precision loss of Int8, allowing the network to adapt its weights to maintain accuracy under quantization
- Post-Training Quantization (PTQ): For faster deployment, models can be quantized after training with calibration on representative data
- EyeQ6H is specifically "designed for Int8 computation" -- the hardware is optimized for this precision level
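A minimal illustration of the fake-quantization trick behind QAT -- values are rounded to Int8 levels in the forward pass while gradients flow through unchanged (toy code, not Mobileye's toolchain):

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate symmetric Int8 quantization in the forward pass while keeping gradients flowing
    (straight-through estimator): scale, round to integer levels, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for Int8
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    x_q = q * scale
    return x + (x_q - x).detach()                       # forward: quantized value; backward: identity

w = torch.randn(16, requires_grad=True)
w_q = fake_quantize(w)
w_q.sum().backward()                                    # gradients pass straight through
print(torch.allclose(w.grad, torch.ones_like(w)))       # -> True
print(float((w - w_q).abs().max()))                     # quantization error bounded by scale/2
```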
12. Object Tracking
Camera-Only Tracking
In the camera subsystem, object tracking maintains consistent identity and trajectory estimation for detected objects across frames:
Association
Each detection in a new frame is matched to an existing track using:
- Position prediction: Kalman filter or similar state estimator predicts where each tracked object should appear in the new frame
- Appearance matching: Feature vectors extracted by the detection network are compared between tracks and new detections
- 3D motion consistency: The predicted 3D trajectory must be physically plausible
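A minimal association step -- constant-velocity prediction plus greedy, gated nearest-neighbour matching -- conveys the structure, though the production tracker is far more sophisticated (illustrative code, hypothetical data layout):

```python
import numpy as np

def associate(tracks: dict, detections: np.ndarray, dt: float = 0.1, gate: float = 2.0):
    """tracks: {track_id: (position (2,), velocity (2,))}; detections: (M, 2) positions.
    Predict each track forward with a constant-velocity model, then greedily match each
    prediction to the nearest unassigned detection within the gating distance."""
    matches, used = {}, set()
    for tid, (pos, vel) in tracks.items():
        pred = pos + vel * dt                                   # constant-velocity prediction
        dists = np.linalg.norm(detections - pred, axis=1)
        for j in np.argsort(dists):
            if j not in used and dists[j] < gate:
                matches[tid] = int(j)
                used.add(j)
                break
    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched       # unmatched detections can initiate new tracks

tracks = {7: (np.array([10.0, 2.0]), np.array([5.0, 0.0]))}    # one track moving forward at 5 m/s
dets = np.array([[10.6, 2.1], [30.0, -3.0]])
print(associate(tracks, dets))   # -> ({7: 0}, [1]): detection 0 continues track 7, detection 1 is new
```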
State Estimation
For each tracked object, the tracker maintains:
- 3D position (x, y, z) in ego-vehicle coordinates
- 3D velocity (vx, vy, vz)
- 3D acceleration estimates
- Heading angle and yaw rate
- Object dimensions
- Classification history
- Confidence/track quality
Multi-Camera Tracking
With 11-13 cameras providing 360-degree coverage, objects transition between camera views as they move around the vehicle. Mobileye's tracker must:
- Re-identify objects when they move from one camera's field of view to another
- Maintain continuous tracks through camera transitions
- Handle temporary occlusion (object briefly hidden behind another vehicle, pole, etc.)
- Manage track initiation (new objects entering the scene) and termination (objects leaving)
Radar/LiDAR-Augmented Tracking
In the radar/LiDAR subsystem (Chauffeur and Drive platforms), tracking operates independently on active-sensor detections:
- Radar provides direct radial velocity measurements for each detection, greatly simplifying velocity estimation
- LiDAR provides precise 3D point clouds that enable robust association even when objects are closely spaced
- The active-sensor tracker produces its own independent track list, which is compared to (but not fused with) the camera tracker's output at the planning layer
Temporal Consistency
The tracker enforces temporal consistency constraints:
- Objects cannot teleport -- position changes must be physically plausible given estimated velocity
- Classifications should be stable -- a vehicle should not suddenly become a pedestrian (classification history is used to smooth transient misclassifications)
- Dimensions should be consistent -- an object's estimated size should not vary wildly between frames
13. Semantic Segmentation
Per-Pixel Classification
Mobileye's semantic segmentation network assigns a class label to every pixel in each camera image. On the EyeQ6H, this runs at over 1,000 FPS -- enabling multiple segmentation passes per perception cycle, or the use of higher-resolution inputs.
Segmentation Classes
Mobileye's segmentation network covers a comprehensive set of urban and highway driving classes:
| Category | Classes |
|---|---|
| Road surface | Drivable road, non-drivable road, crosswalk, speed bump |
| Lane markings | Solid line, dashed line, double line, Botts' dots, stop line, directional arrow |
| Vehicles | Car, truck, bus, motorcycle, bicycle |
| VRUs | Pedestrian, cyclist, rider |
| Infrastructure | Curb, sidewalk, barrier, guardrail, fence, wall |
| Vegetation | Tree, bush, grass |
| Traffic elements | Traffic sign, traffic light, pole, utility structure |
| Construction | Cone, temporary barrier, construction zone |
| Sky | Sky region |
| Terrain | Unpaved ground, gravel, dirt |
Architecture
The segmentation network uses a fully convolutional architecture (no fully connected layers) that:
- Consumes multi-scale features from the shared backbone
- Produces a dense prediction at the target resolution
- Can operate at multiple resolutions depending on the task (full-resolution for lane markings, lower resolution for coarse scene understanding)
The EyeQ6H's achievement of >1,000 FPS on pixel-labeling networks (compared to 91 FPS on EyeQ5) represents more than a 10x improvement, enabling denser and more frequent segmentation than previous generations.
Applications
Semantic segmentation feeds into multiple downstream tasks:
- Free space estimation: The road surface class defines drivable area
- Lane detection: Lane marking classes provide input to lane geometry estimation
- Object detection validation: Segmentation provides a complementary signal to bounding-box detection
- Scene understanding: Overall scene composition (urban, highway, rural, parking lot) is inferred from the distribution of semantic classes
Point Density Improvement
The EyeQ6H improves pixel segmentation capabilities through a dynamic neural network architecture that achieves more than double the point density compared to the EyeQ4M era. This means finer-grained boundary detection, more precise lane marking localization, and better separation of closely spaced objects.
14. Occupancy Prediction
Mobileye's Approach to 3D Scene Representation
While Tesla popularized the "occupancy network" concept at AI Day 2022, Mobileye's approach to 3D scene representation predates this and takes a different architectural path aligned with Compound AI principles.
Camera-Based 3D Occupancy
Mobileye's perception stack constructs a 3D understanding of the scene through multiple complementary representations:
- ViDAR Point Clouds: Dense 3D point cloud predictions from camera images (see Section 4) provide explicit 3D occupancy information -- regions with predicted points are occupied, regions without are free
- BEV Feature Maps: Multi-camera features are lifted into bird's-eye-view space, where occupancy can be inferred from feature density and classification
- Semantic Voxels: The combination of depth estimation and semantic segmentation produces a semantically labeled 3D voxel representation of the scene
Free Space as Inverse Occupancy
Mobileye's free space detection (Section 7) is effectively the complement of occupancy prediction -- it determines where the vehicle can drive, which is equivalent to determining where obstacles are not. The multiple free-space algorithms together produce a comprehensive occupancy map.
RSS-Compatible Occupancy
A critical distinction in Mobileye's approach: the occupancy representation must be compatible with RSS computations. RSS requires:
- Discrete object representations (position, velocity, dimensions) for each road user
- Well-defined geometric relationships between the ego vehicle and each object
- Conservative uncertainty handling (absence of detection does not mean the space is clear)
This means Mobileye's occupancy representation goes beyond generic voxel grids to maintain object-level semantics necessary for formal safety reasoning.
Integration with REM
The REM Roadbook provides a static occupancy prior -- the map tells the system where permanent structures (buildings, barriers, poles) are located. The real-time perception system then detects dynamic occupancy (vehicles, pedestrians, temporary obstacles) on top of this static layer, enabling efficient change detection and reducing the compute burden of full 3D reconstruction.
15. Depth Estimation
Multi-Method Depth Estimation
Mobileye employs multiple complementary methods for estimating depth (distance to objects and surfaces) from cameras:
Monocular Depth Estimation
A deep neural network predicts per-pixel depth from a single camera image. The network learns depth cues from:
- Perspective: Objects appear smaller with distance
- Occlusion: Closer objects occlude farther ones
- Texture gradients: Road texture density increases with distance
- Known object sizes: Learned priors about typical sizes of cars, people, signs
- Atmospheric perspective: Distant objects have reduced contrast
Mobileye holds patents on joint depth estimation and semantic segmentation from a single image (see Section 21), where the system simultaneously predicts both depth and semantic labels for each pixel, with the two tasks mutually reinforcing each other.
Multi-Camera Stereo
With 11-13 cameras on SuperVision/Drive, overlapping fields of view between adjacent cameras create stereo pairs:
- Front narrow + front wide cameras provide stereo depth for the forward direction
- Side-front + side-rear cameras provide stereo depth for lateral regions
- The geometric baseline between cameras is known from vehicle design, enabling triangulated depth computation
This approach provides metric depth without learned priors -- it is purely geometric, making it a valuable redundant signal alongside monocular estimation.
Structure from Motion (SfM)
As the vehicle moves, each camera captures the scene from slightly different viewpoints across successive frames. This temporal parallax enables depth estimation through structure from motion:
- Ego-motion is estimated from visual odometry, IMU, and wheel speed sensors
- Feature correspondences between frames are triangulated using the known ego-motion
- This produces a sparse 3D point cloud that is densified through interpolation
SfM is particularly effective for static scene elements (road surface, buildings, infrastructure) and provides an independent depth estimate that does not rely on learned networks.
ViDAR Dense Depth
The ViDAR system (Section 4) produces the densest depth representation, predicting a full 3D point cloud from camera images. This combines the advantages of monocular depth learning (dense predictions, works for dynamic objects) with geometric reasoning (multi-view consistency).
Depth Fusion
The multiple depth estimates are fused to produce a robust, comprehensive depth map (a fusion sketch follows this list):
- Monocular depth provides dense initialization everywhere in the image
- Stereo depth provides precise metric calibration where overlapping cameras exist
- SfM provides temporal validation for static scene elements
- ViDAR integrates all cues into a unified 3D representation
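One common way to combine such estimates is inverse-variance weighting. The sketch below assumes per-pixel depth and uncertainty maps are available for each method; it illustrates the principle and is not Mobileye's actual fusion scheme.

```python
import numpy as np

def fuse_depth(estimates, variances):
    """Inverse-variance fusion of per-pixel depth maps (e.g. mono, stereo, SfM).

    estimates : list of (H, W) depth maps in meters (np.nan where unavailable).
    variances : list of matching (H, W) variance maps (larger = less trusted).
    """
    num = np.zeros_like(estimates[0])
    den = np.zeros_like(estimates[0])
    for d, v in zip(estimates, variances):
        w = np.where(np.isnan(d), 0.0, 1.0 / v)   # ignore missing pixels
        num += w * np.nan_to_num(d)
        den += w
    return num / np.maximum(den, 1e-6)            # fused depth map
```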
Patent: Estimating Distance Using Monocular Camera Sequences (US8164628B2)
Filed by Mobileye Technologies Ltd in 2006, this foundational patent covers estimating distance to an object using a sequence of images recorded by a monocular camera. The method uses the changing apparent size and position of objects across frames, combined with known camera motion, to compute metric distances.
16. Temporal Processing
Multi-Frame Aggregation
Single-frame perception is inherently noisy and incomplete. Mobileye's perception stack aggregates information across multiple frames to improve accuracy, consistency, and completeness.
Feature-Level Temporal Fusion
Multi-frame features are aggregated in BEV space (a warping sketch follows this list):
- Features from consecutive frames are warped to compensate for ego-motion (using estimated vehicle odometry)
- Warped features are aggregated through learned temporal fusion modules (e.g., deformable convolution, attention mechanisms)
- The aggregated features contain temporal context that individual frames lack: motion information, occluded region filling, and noise reduction
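A minimal numpy sketch of the ego-motion warping step, using nearest-neighbor resampling of a BEV grid. The planar-motion model, cell resolution, and the simple averaging at the end are simplifying assumptions; production systems use learned fusion modules as noted above.

```python
import numpy as np

def warp_bev_features(prev_feat, dx, dy, dyaw, resolution=0.5):
    """Re-express a previous-frame BEV feature map in the current ego frame.

    prev_feat : (C, H, W) features in the ego frame at time t-1.
    dx, dy    : ego translation from t-1 to t, expressed in the t-1 frame [m].
    dyaw      : ego heading change from t-1 to t [rad].
    resolution: BEV cell size [m/cell]; ego assumed at the grid center.
    """
    C, H, W = prev_feat.shape
    # Metric coordinates of every cell center in the *current* (t) ego frame.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x_t = (xs - W / 2.0) * resolution   # lateral
    y_t = (ys - H / 2.0) * resolution   # longitudinal
    # Map current-frame coordinates back into the t-1 ego frame:
    # p_{t-1} = R(dyaw) @ p_t + [dx, dy]
    c, s = np.cos(dyaw), np.sin(dyaw)
    x_p = c * x_t - s * y_t + dx
    y_p = s * x_t + c * y_t + dy
    # Convert to source cell indices (nearest neighbor), masking out-of-range.
    src_x = np.round(x_p / resolution + W / 2.0).astype(int)
    src_y = np.round(y_p / resolution + H / 2.0).astype(int)
    valid = (src_x >= 0) & (src_x < W) & (src_y >= 0) & (src_y < H)
    warped = np.zeros_like(prev_feat)
    warped[:, ys[valid], xs[valid]] = prev_feat[:, src_y[valid], src_x[valid]]
    return warped, valid

# Simplest possible aggregation: blend the warped history with the new frame.
# aggregated = 0.5 * current_feat + 0.5 * warped
```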
Detection-Level Temporal Smoothing
Object detections are smoothed across frames through the tracking system (see the sketch after this list):
- Transient false detections are suppressed (an object must be detected consistently across multiple frames to be confirmed)
- Transient missed detections are bridged (tracking maintains an object's predicted state through brief occlusion)
- Classification is stabilized (the most frequent classification over recent frames is used)
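A toy sketch of these three mechanisms: M-of-N confirmation, occlusion bridging via a miss budget, and majority-vote class stabilization. The thresholds are illustrative, not Mobileye's tuning.

```python
from collections import Counter, deque

class TrackSmoother:
    """Per-track confirmation, coasting, and class stabilization."""

    def __init__(self, confirm_hits=3, max_misses=5, class_window=10):
        self.hits, self.misses = 0, 0
        self.confirm_hits, self.max_misses = confirm_hits, max_misses
        self.classes = deque(maxlen=class_window)

    def update(self, detected, class_label=None):
        if detected:
            self.hits += 1
            self.misses = 0
            if class_label is not None:
                self.classes.append(class_label)
        else:
            self.misses += 1          # bridge brief occlusions before dropping
        confirmed = self.hits >= self.confirm_hits
        alive = self.misses <= self.max_misses
        label = Counter(self.classes).most_common(1)[0][0] if self.classes else None
        return confirmed, alive, label
```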
Map-Level Temporal Accumulation
The REM system provides the ultimate temporal accumulation: information from millions of vehicles over months and years is aggregated into the Roadbook. This provides a robust prior that supplements real-time single-vehicle perception.
Ego-Motion Estimation
Accurate ego-motion estimation is essential for temporal processing -- without knowing precisely how the vehicle moved between frames, multi-frame aggregation degrades. Mobileye estimates ego-motion from:
- Visual odometry (feature matching across camera frames)
- Vehicle dynamics sensors (wheel speed, steering angle, IMU)
- REM map-based localization (matching observed landmarks to the Roadbook)
STAT Transformer for Temporal Efficiency
Mobileye's STAT (Sparse Typed Attention) transformer architecture (described in Section 19) is used for temporal processing. By organizing tokens into structured groups with "manager tokens" enabling controlled communication, STAT achieves 100x efficiency improvements over standard transformer attention, making temporal multi-frame processing practical on EyeQ hardware.
17. RSS Integration with Perception
Perception as RSS Input
The RSS (Responsibility-Sensitive Safety) model operates on the output of perception. It does not process raw sensor data -- it consumes a structured representation of the driving scene.
Required Perception Outputs for RSS
| Perception Output | RSS Usage |
|---|---|
| Object position (x, y, z) | Input to safe distance calculation |
| Object velocity (vx, vy) | Used in longitudinal and lateral safe distance formulas |
| Object classification | Determines applicable RSS rules (vehicle vs. pedestrian vs. cyclist) |
| Object dimensions | Defines the spatial extent for collision geometry |
| Lane model | Determines lateral relationships (same lane, adjacent lane, oncoming) |
| Free space | Constrains available escape maneuvers |
| Traffic signals/signs | Determines right-of-way assignments |
| Ego position and velocity | One side of every constellation pair |
Constellation Formation
For each detected object, RSS creates a "constellation" -- a pairwise geometric relationship between the ego vehicle and that object. The constellation captures:
- Longitudinal distance and relative velocity
- Lateral distance and relative velocity
- Lane relationship (same lane, adjacent, oncoming, crossing)
- Right-of-way status (who has priority)
- Visibility status (is the object in a blind spot or occluded area?)
Safe Distance Computation
For each constellation, RSS computes the minimum safe distance using the mathematical formulas:
Longitudinal safe distance (d_min_longitudinal): Ensures that even if the leading vehicle brakes at maximum deceleration instantaneously, the ego vehicle -- after a bounded response time rho -- can brake to a stop without collision.
Parameters:
- v_r: rear (ego) vehicle velocity
- v_f: front vehicle velocity
- rho: response time (perception + actuation latency)
- a_max_accel: maximum acceleration during response time
- a_min_brake: minimum braking deceleration of ego vehicle
- a_max_brake: maximum braking deceleration of lead vehicle
Lateral safe distance (d_min_lateral): An analogous formula accounting for lateral velocities, lateral response time, and lateral braking capabilities.
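For reference, the longitudinal formula as published in the RSS paper (arXiv:1708.06374) can be sketched directly from the parameters above; the numeric values in the example are illustrative, not calibrated RSS parameters.

```python
def rss_longitudinal_min_distance(v_rear, v_front, rho,
                                  a_max_accel, a_min_brake, a_max_brake):
    """Minimum longitudinal safe distance per the RSS paper (arXiv:1708.06374),
    using the parameter names from the list above; clamped at zero."""
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + (v_rear + rho * a_max_accel) ** 2 / (2.0 * a_min_brake)
         - v_front ** 2 / (2.0 * a_max_brake))
    return max(d, 0.0)

# Illustrative example: ego at 20 m/s behind a lead vehicle at 15 m/s,
# rho = 0.5 s, a_max_accel = 2, a_min_brake = 4, a_max_brake = 8 (m/s^2)
# -> roughly 51 m of required headway.
```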
Proper Response
When the measured distance drops below d_min, the situation is classified as "dangerous" and RSS computes a "proper response" -- acceleration limits [a_min, a_max] for both longitudinal and lateral dimensions that will restore safety. The proper response guarantees that the ego vehicle will not cause a collision.
Aggregation
All per-object proper responses are combined into a single actuation constraint. The planner must generate trajectories within this constraint envelope. Any trajectory satisfying all constraints is provably safe with respect to RSS.
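A minimal sketch of this aggregation as an intersection of per-constellation acceleration intervals (longitudinal only, for brevity); the handling of an empty intersection is a simplifying assumption.

```python
def aggregate_proper_responses(responses):
    """Intersect per-constellation acceleration envelopes into one constraint.

    responses: list of (a_min, a_max) longitudinal acceleration bounds [m/s^2],
               one per dangerous constellation.
    Returns the envelope the planner must respect, or None if the intersection
    is empty (in which case the most restrictive braking bound applies).
    """
    a_lo = max((r[0] for r in responses), default=float("-inf"))
    a_hi = min((r[1] for r in responses), default=float("inf"))
    return (a_lo, a_hi) if a_lo <= a_hi else None
```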
RSS Dependency on Perception Quality
RSS's safety guarantees are only as good as the perception that feeds it. This creates specific requirements:
- Recall: Perception must not miss safety-critical objects. A missed pedestrian means RSS cannot create a constellation for them, invalidating the safety guarantee. True Redundancy addresses this: if either subsystem detects the object, RSS will account for it.
- Position accuracy: RSS safe distance formulas are sensitive to position error. Overestimating distance to a hazard can lead to insufficient braking. The 5-10 cm localization accuracy from REM helps ground the perception in metric coordinates.
- Latency: The response time rho in RSS includes perception latency. Lower perception latency allows shorter rho values, which translate to smaller required safe distances and more agile driving behavior.
RSS Rule 4: Occluded Regions
RSS Rule 4 ("Be cautious with limited visibility") specifically addresses perception limitations. When perception cannot see into an occluded region (behind a parked truck, around a blind corner), RSS assumes a hazard may exist there and computes a conservative speed limit that would allow the vehicle to stop if a hazard emerges.
This is a formal way of encoding "absence of evidence is not evidence of absence" into the driving policy.
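A simplified sketch of the resulting speed cap, assuming a static hazard that may appear at the edge of the visible region and the same response-time and braking parameters as above; real Rule 4 reasoning also accounts for hazards that emerge with their own motion.

```python
import math

def max_speed_for_visibility(d_visible, rho, a_min_brake):
    """Largest speed from which the ego vehicle can still stop within the
    visible distance, i.e. v*rho + v^2 / (2*a_min_brake) <= d_visible."""
    b = a_min_brake
    v = -b * rho + math.sqrt((b * rho) ** 2 + 2.0 * b * d_visible)
    return max(v, 0.0)

# Illustrative example: 25 m of visible road past a parked truck,
# rho = 0.5 s, braking 4 m/s^2 -> roughly 12.3 m/s (~44 km/h) as the cap.
```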
18. VLSA (Vision Language Safety Agent)
Compound AI Architecture: Fast-Think / Slow-Think
Mobileye's autonomy software architecture, unveiled across CES 2025 and 2026, replaces monolithic end-to-end approaches with a modular Compound AI System (CAIS). The core principle is the separation of driving intelligence into two processing frequencies:
Fast-Think System (~10 Hz)
- Responsibility: All safety-critical, reflexive decisions
- Content: RSS enforcement, immediate obstacle avoidance, emergency braking, lane keeping
- Compute location: Entirely on-vehicle (EyeQ chips)
- Characteristics: Deterministic, formally verifiable, low latency
- Models: Purpose-built perception networks (CNNs, transformers) + formal RSS rules
Slow-Think System (~1 Hz)
- Responsibility: Scene-level reasoning, complex semantic understanding, edge-case resolution
- Content: Interpreting ambiguous situations, understanding construction zones, navigating unusual intersections, predicting complex multi-agent interactions
- Compute location: On-vehicle + cloud (leveraging powerful VLMs at lower frequency)
- Characteristics: Stochastic, probabilistic, higher latency acceptable
- Models: Vision-Language Models (VLMs) and VLSA
VLSA: Vision Language Safety Agent
VLSA is the slow-think component. It is a vision-language model that processes deep scene semantics, described by Mobileye as functioning "like an adult accompanying a young driver in complex driving situations."
What VLSA Does
- Processes scene semantics: Understands not just what objects are present, but what they mean in context (e.g., a police officer directing traffic overrides traffic lights; a school bus with flashing lights requires a full stop)
- Provides structured semantic guidance: Outputs structured information to the planning system -- not direct vehicle control commands, but semantic annotations like "this is a construction zone requiring lane shift left" or "the pedestrian group at the crosswalk is likely to cross"
- Resolves edge cases: Handles situations that fast-think cannot resolve through reflexive rules alone: unusual road configurations, ambiguous signage, complex social driving situations
What VLSA Does NOT Do
- Does NOT sit in the safety loop: Safety remains in the fast-think system governed by RSS. If VLSA produces incorrect guidance, the fast-think system will still maintain safety.
- Does NOT control the vehicle directly: VLSA provides semantic context; the fast-think system converts this into safe trajectories.
- Does NOT need to run in real-time: Operating at ~1 Hz (or even lower for some decisions), VLSA can use larger, more capable models than would be possible at 10 Hz.
Cloud Connectivity
VLSA can partially run in the cloud, calling powerful vision-language models at low frequency. This enables:
- Use of much larger models than could fit on-vehicle
- Continuous model improvement via cloud updates without vehicle hardware changes
- In many cases, VLSA can replace human remote operators for edge-case resolution
ACI: Artificial Community Intelligence
Complementing VLSA, Mobileye developed ACI -- a self-play-based framework for training driving policy:
- Approach: Uses sensing-state simulation (abstract state representations) rather than photorealistic imagery
- Map integration: Places simulated agents (cars, pedestrians, buses) onto real road layouts from the REM Roadbook
- Behavioral diversity: Each agent can exhibit millions of possible driving behaviors, including rare and reckless ones
- Scale: Generates billions of simulated driving hours overnight
- Purpose: Trains driving policy to handle complex multi-agent interactions, understanding how one decision affects the behavior of others and how that chain of reactions feeds back to the ego vehicle
ACI is built on multi-agent reinforcement learning, where the key insight is that driving is a social activity -- each participant's actions influence others' responses.
19. Model Efficiency
The Efficiency Challenge
Mobileye faces a unique constraint in the autonomous driving industry: its perception stack must run on custom silicon consuming from a few watts to under 100 W (EyeQ6L at ~4 W, scaling up to EyeQ Ultra at <100 W), while competitors like Waymo run on data-center-class hardware consuming hundreds of watts. This constraint drives extreme optimization at every level.
Hardware-Software Co-Design
Mobileye's fundamental efficiency advantage comes from designing the EyeQ hardware and the perception software together:
- XNN accelerators are designed for the specific matrix sizes, sparsity patterns, and data flow requirements of Mobileye's neural networks
- PMA CGRA cores map exactly to the computational patterns of Mobileye's vision processing algorithms
- Memory hierarchy is tuned for the access patterns of automotive perception (sequential frame processing, feature pyramid construction, multi-scale detection)
This co-design means EyeQ chips deliver more useful perception computation per watt than general-purpose alternatives, even though their raw TOPS rating is lower.
Quantization Strategy
All inference runs at Int8 precision on XNN accelerators (a fake-quantization sketch follows this list):
- Quantization-Aware Training (QAT): Networks are trained with simulated Int8 quantization in the forward pass, allowing weights to adapt to precision loss during training
- Post-Training Quantization (PTQ): For rapid deployment, pre-trained models are calibrated on representative automotive datasets to determine optimal quantization parameters
- The EyeQ6H achieves a 0.5 ms latency on ResNet-50 at Int8, compared to 1.64 ms on the NVIDIA Jetson AGX Orin at FP16 -- despite the Orin having ~8x the raw TOPS rating
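A minimal sketch of the fake-quantization step that QAT inserts into the forward pass, using symmetric per-tensor scaling; real deployments typically use per-channel scales and a straight-through estimator for gradients, and this is not Mobileye's actual quantization tooling.

```python
import numpy as np

def fake_quantize_int8(x, scale=None):
    """Simulated symmetric Int8 quantization: values are rounded to the Int8
    grid and immediately dequantized, so a network trained with this in the
    forward pass adapts to the precision loss it will see at inference."""
    if scale is None:
        scale = np.max(np.abs(x)) / 127.0 + 1e-12   # per-tensor calibration
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale, scale

# PTQ differs mainly in where the scale comes from: it is calibrated offline
# on representative data instead of being adapted to during training.
weights = np.random.randn(64, 64).astype(np.float32)
w_q, s = fake_quantize_int8(weights)
```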
STAT: Sparse Typed Attention Transformer
Mobileye's most significant efficiency innovation for transformer-based models is STAT (Sparse Typed Attention), whose attention pattern is sketched after the list below:
- Problem: Standard transformer self-attention has O(n^2) complexity, making it prohibitively expensive for automotive perception on power-constrained hardware
- Solution: STAT organizes tokens into structured groups with defined types (e.g., "regular tokens" and "manager tokens"). Rather than every token attending to every other token, communication flows through the manager tokens, which serve as information hubs
- Example: 300 regular tokens + 32 manager (link) tokens, where regular tokens only attend to their local group and the manager tokens, dramatically reducing attention computation
- Result: 100x efficiency improvement over standard transformers without compromising accuracy
- Impact: Makes transformer-based perception practical on EyeQ6H hardware, enabling attention-based temporal fusion, cross-camera feature aggregation, and global scene reasoning at automotive frame rates
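The attention pattern can be sketched as a boolean mask over token pairs. The grouping scheme below is illustrative; with the small 300+32 example the pair-count reduction is only a few times, and the quadratic-versus-near-linear gap behind the 100x figure shows up at the much larger token counts of multi-camera, multi-frame perception.

```python
import numpy as np

def stat_attention_mask(n_regular=300, n_managers=32, group_size=None):
    """Boolean attention mask for a STAT-style pattern: regular tokens attend
    only to their local group and to manager tokens; managers attend to all.
    The grouping details here are illustrative assumptions."""
    if group_size is None:
        group_size = int(np.ceil(n_regular / n_managers))
    n = n_regular + n_managers
    mask = np.zeros((n, n), dtype=bool)
    groups = np.arange(n_regular) // group_size
    mask[:n_regular, :n_regular] = groups[:, None] == groups[None, :]  # local
    mask[:n_regular, n_regular:] = True                                # -> managers
    mask[n_regular:, :] = True                                         # managers -> all
    return mask

mask = stat_attention_mask()
dense_pairs = mask.size            # full self-attention over all tokens
sparse_pairs = mask.sum()          # pairs actually attended under STAT
print(f"attended pairs: {sparse_pairs} vs {dense_pairs} "
      f"({dense_pairs / sparse_pairs:.1f}x fewer)")
```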
Knowledge Distillation
Mobileye uses teacher-student distillation to deploy efficient models:
- A large "teacher" network is trained on cloud GPUs with full-precision computation
- A smaller "student" network, sized for specific EyeQ hardware, is trained to match the teacher's outputs
- The student benefits from the teacher's superior feature representations without requiring the teacher's compute budget at inference time
This enables Mobileye to deploy different complexity models across product tiers: a simpler student for EyeQ4-based ADAS, a more capable student for EyeQ6H-based SuperVision, without retraining from scratch.
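A standard formulation of the distillation objective is sketched below: a KL term between temperature-softened teacher and student distributions plus cross-entropy on the hard labels. The temperature and weighting are illustrative hyperparameters, not Mobileye's.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Teacher-student loss: softened KL divergence plus hard-label CE.
    student_logits, teacher_logits: (N, C) arrays; labels: (N,) int array."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1.0 - alpha) * ce)
```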
Automatic Ground Truth (Auto GT)
Mobileye uses "advanced sensors and offline processing" to automatically generate accurate ground truth labels for training data, reducing dependence on manual annotation. This enables:
- Faster iteration on model training
- More consistent labeling quality
- Scaling to the massive 200+ PB dataset Mobileye has collected
Multi-Task Network Sharing
Multiple perception tasks share a common backbone feature extractor, as the sketch after this list illustrates:
- Object detection, lane detection, free space estimation, semantic segmentation, and depth estimation all consume features from the same backbone
- The backbone computation (the most expensive part) is amortized across all tasks
- Task-specific heads are lightweight and add minimal overhead
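A structural sketch of the shared-backbone layout in PyTorch; the layer sizes, head definitions, and task set are placeholders chosen for brevity, not Mobileye's network design.

```python
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    """Illustrative shared-backbone layout: one feature extractor amortized
    across several lightweight task heads."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(              # the expensive, shared part
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.detection_head = nn.Conv2d(64, 6, 1)        # box params per cell
        self.segmentation_head = nn.Conv2d(64, n_classes, 1)
        self.depth_head = nn.Conv2d(64, 1, 1)
        self.freespace_head = nn.Conv2d(64, 1, 1)

    def forward(self, image):
        feats = self.backbone(image)                 # computed once per frame
        return {
            "detection": self.detection_head(feats),
            "segmentation": self.segmentation_head(feats),
            "depth": self.depth_head(feats),
            "free_space": self.freespace_head(feats),
        }
```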
Benchmark Results: EyeQ6H vs. NVIDIA Jetson AGX Orin
| Benchmark | EyeQ6H (34 TOPS, Int8) | Jetson AGX Orin (275 TOPS, FP16) |
|---|---|---|
| ResNet-50 single-stream latency | 0.5 ms | 1.64 ms |
| EfficientViT-B1 (224x224, 9.1M params) | 0.564 ms | 1.48 ms |
| EfficientViT-B2 (224x224, 24M params) | 0.932 ms | 2.63 ms |
| Pixel-labeling NN throughput | >1,000 FPS | Not published |
Despite having roughly one-eighth the raw TOPS, EyeQ6H consistently achieves 2-3x lower latency on representative perception workloads.
20. Calibration
The Calibration Challenge
With 11-13 cameras mounted on a vehicle, precise calibration is essential: even a 0.1-degree error in camera pitch can translate into multi-meter range errors at 100 m when distance is inferred from the road plane. Mobileye addresses calibration at multiple levels.
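To make that sensitivity concrete, the sketch below works through the range error introduced by a 0.1-degree pitch miscalibration when distance is read off the ground plane; the 1.3 m camera height is an assumed, illustrative value.

```python
import math

def ground_plane_range(camera_height_m, angle_below_horizon_rad):
    """Range to a point on a flat road, inferred from the angle below the
    horizon at which it appears (the classic monocular range cue)."""
    return camera_height_m / math.tan(angle_below_horizon_rad)

# A point 100 m ahead seen from a 1.3 m camera sits ~0.74 deg below the horizon.
h = 1.3
true_angle = math.atan(h / 100.0)
for err_deg in (0.0, 0.1):
    est = ground_plane_range(h, true_angle - math.radians(err_deg))
    print(f"pitch error {err_deg:.1f} deg -> estimated range {est:.1f} m")
# 0.0 deg -> 100.0 m ; 0.1 deg -> roughly 115 m, i.e. a multi-meter range error.
```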
Factory Calibration
Initial camera calibration is performed during vehicle manufacture or ADAS installation:
- Intrinsic parameters: Focal length, principal point, distortion coefficients -- characterize the camera optics
- Extrinsic parameters: Position and orientation of each camera relative to the vehicle body -- characterize the mounting
Automatic Self-Calibration
Mobileye's systems include automatic calibration that runs during normal driving, eliminating the need for calibration targets or special procedures; a vanishing-point sketch follows the list below:
- Procedure: Drive the vehicle normally for 5-15 minutes
- Method: The system uses vanishing points (from road markings, building edges, and other structured elements in the scene) and road plane estimation to determine camera orientation relative to the road surface
- Parameters recovered: Camera pitch (tilt), yaw (rotation), and roll relative to the road plane; camera height above road surface
- Online refinement: Calibration is continuously refined during driving, compensating for:
- Suspension changes due to load
- Tire pressure variations
- Temperature-induced mechanical drift
- Windshield camera re-mounting after service
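A simplified sketch of the vanishing-point step, assuming straight, level lane markings, zero camera roll, and known intrinsics; the pixel values in the example are illustrative.

```python
import math

def pitch_yaw_from_vanishing_point(vp_x, vp_y, cx, cy, fx, fy):
    """Camera pitch and yaw relative to a straight, level road, from the
    vanishing point of the lane markings. Intrinsics (fx, fy, cx, cy) come
    from factory calibration; this is a simplified pinhole-model estimate."""
    yaw = math.atan((vp_x - cx) / fx)     # positive = rotated toward +x
    pitch = math.atan((vp_y - cy) / fy)   # positive = camera pointing down
    return pitch, yaw

# Illustrative example: a 1920x1080 image, principal point (960, 540),
# fx = fy = 2000 px. If straight lane markings converge at pixel (980, 560):
# yaw  = atan(20/2000) ~ 0.57 deg, pitch = atan(20/2000) ~ 0.57 deg
```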
Fleet-Scale Calibration
With 150+ million Mobileye-equipped vehicles, calibration operates at fleet scale:
- REM data quality depends on accurate calibration across millions of harvesting vehicles
- The cloud aggregation pipeline statistically detects and compensates for poorly calibrated vehicles
- Consistent landmark observations from many vehicles create a calibration cross-check: if one vehicle's observations are systematically offset, its calibration is flagged
Multi-Camera Extrinsic Calibration
For multi-camera systems (SuperVision, Drive), cross-camera calibration is critical:
- Overlapping fields of view between adjacent cameras provide mutual calibration constraints
- Feature correspondences in overlapping regions constrain relative camera poses
- The vehicle's motion (known from odometry) provides additional constraints through temporal structure-from-motion
Calibration for REM
The REM harvesting pipeline includes calibration-aware processing:
- Landmark positions are corrected for estimated calibration errors before transmission
- The cloud pipeline further aligns observations from multiple vehicles, effectively averaging out individual calibration errors
- This fleet-scale statistical averaging produces map accuracy (5-10 cm) that exceeds any individual vehicle's calibration precision
21. Key Patents
Mobileye Vision Technologies Ltd. and Mobileye Technologies Ltd. collectively hold thousands of patents spanning the full perception stack. Below are representative patents across key perception areas.
Object Detection and Obstacle Avoidance
| Patent | Title | Year | Key Innovation |
|---|---|---|---|
| US7113867B1 | System and method for detecting obstacles to vehicle motion and determining time to contact therewith using sequences of images | 2006 | Foundational monocular obstacle detection using image sequences; time-to-contact computation |
| US8164628B2 | Estimating distance to an object using a sequence of images recorded by a monocular camera | 2006 | Monocular distance estimation from temporal image sequences |
| US9205776B2 | Vehicle vision system using kinematic model of vehicle motion | — | Kinematic model integration with vision for motion prediction |
Free Space and Drivable Area
| Patent | Title | Key Innovation |
|---|---|---|
| Multiple patents | Systems for mapping road segment free spaces for autonomous vehicle navigation | Determine lateral free space regions adjacent to road segments; update navigation models across fleet |
| Multiple patents | Determining drivable free-space for autonomous vehicles | Camera-based boundary detection with classification (curb, vehicle, barrier); traversable vs. non-traversable boundary types |
| Multiple patents | Identifying probable road edge locations under snow cover | Free space estimation when lane markings are obscured |
Lane Detection and Mapping
| Patent | Title | Year | Key Innovation |
|---|---|---|---|
| US9665100B2 | Sparse map for autonomous vehicle navigation | 2016 | Sparse map structure with lane-level geometry; polynomial representations of road features |
| US9946260 | Sparse map autonomous vehicle navigation | 2016 | Navigation using sparse map with data density < 1 MB/km |
| US11086334 | Crowdsourcing a sparse map for autonomous vehicle navigation | 2017 | Fleet-scale crowdsourced map data collection, alignment, road surface information, and vehicle localization using lane measurements |
| US20180024568A1 | Crowdsourcing a sparse map for autonomous vehicle navigation | 2017 | Landmark-based localization, 3D spline curve trajectories, road signature profiles |
Depth Estimation and Semantic Segmentation
| Patent | Title | Key Innovation |
|---|---|---|
| US10019657B2 | Joint depth estimation and semantic segmentation from a single image | Simultaneously estimates depth and semantic labels using coarse-to-fine pipeline with global templates and local segment analysis |
| US20160350930A1 | Joint depth estimation and semantic segmentation from a single image | Machine learning approach merging global and local depth/semantic layouts |
Traffic Sign Recognition
| Patent | Title | Year |
|---|---|---|
| US20080137908A1 | Detecting and recognizing traffic signs | 2008 |
Navigation and Mapping
| Patent | Title | Key Innovation |
|---|---|---|
| CN112384760A | System and method for autonomous vehicle navigation | Lane marking mapping, directional arrow mapping, selective road information harvesting based on data quality, free space mapping, traffic light mapping |
Key Patent Statistics
- Amnon Shashua: 94+ patents in computer vision, ADAS, and autonomous driving
- Mobileye Vision Technologies Ltd.: Hundreds of assigned patents
- Core areas: Monocular and multi-camera depth estimation, crowdsourced HD map construction, formal safety envelope computation, low-power CNN accelerator architectures, camera-based free-space estimation, real-time semantic segmentation, traffic sign recognition, pedestrian detection
- European Inventor Award: Amnon Shashua was a finalist (2019, European Patent Office) for his patent portfolio
22. Key Publications
Foundational Papers by Amnon Shashua
Multi-View Geometry and Tensor Theory
| Paper | Venue | Year | Significance |
|---|---|---|---|
| Projective Structure from Uncalibrated Images: Structure from Motion and Recognition | IEEE TPAMI, Vol. 16(8), pp. 778-790 | 1994 | Foundational work on recovering 3D structure from uncalibrated cameras |
| Trilinearity of Three Perspective Views and Its Associated Tensor | ICCV, Boston | 1995 | Introduced the trifocal tensor -- a fundamental construct relating three views of a scene (with M. Werman) |
| Relative Affine Structure: Canonical Model for 3D from 2D Geometry and Applications | IEEE TPAMI, 18(9):873-883 | 1996 | Canonical representations for multi-view reconstruction (with N. Navab) |
| Duality of Multi-Point and Multi-Frame Geometry: Fundamental Shape Matrices and Tensors | ECCV | 1996 | Dual theory of multi-view geometry (with D. Weinshall and M. Werman) |
| Trilinear Tensor: The Fundamental Construct of Multiple-view Geometry and its Applications | AFPAC Workshop, Kiel | 1997 | Comprehensive treatment of the trilinear tensor |
| Tensor Embedding of the Fundamental Matrix | SMILE Workshop | 1998 | Embedding the fundamental matrix in the trifocal tensor (with S. Avidan) |
Photometry and Visual Recognition
| Paper | Venue | Year | Significance |
|---|---|---|---|
| Geometry and Photometry in 3D Visual Recognition | MIT PhD Dissertation | 1992 | Foundational thesis on separating geometric and photometric factors in visual recognition |
| On Photometric Issues in 3D Visual Recognition from a Single 2D Image | IJCV, Vol. 21, pp. 99-122 | 1997 | Analysis of illumination effects on 3D object recognition |
These early papers established Shashua as a leading figure in multi-view geometry -- the mathematical framework for understanding how 3D scenes relate to their 2D projections. This theoretical foundation directly underlies Mobileye's camera-based 3D perception: structure from motion, stereo depth estimation, and multi-camera calibration are all practical applications of Shashua's multi-view geometry research.
RSS and Safety Papers
| Paper | Authors | Year | Venue | Significance |
|---|---|---|---|---|
| On a Formal Model of Safe and Scalable Self-driving Cars | Shalev-Shwartz, Shammah, Shashua | 2017 | arXiv:1708.06374 | Introduces RSS; foundational formal safety model |
| Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving | Shalev-Shwartz, Shammah, Shashua | 2016 | arXiv:1610.03295 | Multi-agent RL framework; deep RL for driving strategy |
| Responsibility-Sensitive Safety (extended) | Mobileye | 2022 | arXiv:2206.03418 | Comprehensive technical specification of RSS |
| Implementing the RSS Model on NHTSA Pre-Crash Scenarios | Mobileye | — | Technical report | RSS validated across all 37 NHTSA scenario types |
| A Safety Architecture for Self-Driving Systems | Mobileye | — | Whitepaper | Full SDoV safety architecture with PGF framework |
Perception Papers by Mobileye Researchers
| Paper | Authors | Year | Venue | Significance |
|---|---|---|---|---|
| 3D-LaneNet: End-to-End 3D Multiple Lane Detection | Garnett, Cohen, Pe'er, Lahav, Levi | 2019 | ICCV | First end-to-end 3D lane detection from single image; intra-network IPM; anchor-based lane representation |
| 3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation | Efrat, Bluvstein, Oron, Levi, Garnett, Shlomo | 2020 | ML4AD Workshop | Improved 3D lane detection without anchor dependency |
Machine Learning Theory
| Paper | Authors | Year | Significance |
|---|---|---|---|
| Understanding Machine Learning: From Theory to Algorithms | Shalev-Shwartz, Ben-David | 2014 | Authoritative textbook on ML theory; foundational reference for ML practitioners |
Researcher Profiles
- Prof. Amnon Shashua (CEO): 160+ publications, 94+ patents, and more than 30,000 citations. Research spans multi-view geometry, photometric stereo, illumination cone geometry, visual recognition, and deep learning for autonomous driving.
- Prof. Shai Shalev-Shwartz (CTO): Machine learning theorist; co-authored foundational textbook on ML theory. Leads ADAS/AV/RSS/REM development. Research includes online learning, reinforcement learning for autonomous driving, and formal safety models.
- Noa Garnett: Lead researcher in 3D lane detection and perception.
- Dan Levi: Senior researcher in depth estimation and perception architectures.
Open-Source Projects
- intel/ad-rss-lib: C++ library implementing the RSS model for autonomous vehicles; includes situation analysis, response computation, and integration examples
23. Perception Evolution: EyeQ1 through EyeQ Ultra
EyeQ1 (2008) -- Birth of Camera Perception
- Process: 180 nm
- Silicon: Sampled in 2004; commercial launch 2008
- Perception capabilities:
- Lane Departure Warning (LDW)
- Forward Collision Warning (FCW)
- Traffic Sign Recognition (TSR)
- Adaptive Headlight Control (AHC)
- Architecture: Single camera processing; hand-crafted vision algorithms (Haar cascades, HOG features, edge detection); minimal ML
- Significance: Proved that a single camera + custom silicon could deliver production-grade ADAS. This was radical in an era when ADAS was synonymous with expensive radar systems.
EyeQ2 (2010) -- Pedestrian Detection
- Performance: 6x EyeQ1
- Key perception advance: Pedestrian detection -- the first EyeQ to reliably detect people in camera images
- Enabled features: Automatic Emergency Braking (AEB) for pedestrian scenarios; urban operation
- Architecture: Early pattern recognition classifiers; beginning of ML-based approaches
- Significance: Opened the door to Euro NCAP pedestrian AEB testing and scoring, driving massive OEM adoption
EyeQ3 (2014) -- Surround Perception
- Performance: 6x EyeQ2 (~0.25 TOPS)
- Key perception advance: Multi-camera input -- the first EyeQ to accept inputs from surround-view camera systems
- Enabled features: Safety "cocoon" around the vehicle; L2 capability
- Architecture: Multiple camera streams processed; more sophisticated classifiers
- Significance: Transition from single-camera ADAS to surround perception; enabled early highway assist features
EyeQ4 (2018) -- Deep Learning at Scale
- Process: 28 nm FD-SOI (STMicro)
- Performance: 2.5 TOPS at ~4.5 W
- Key perception advance: Deep learning inference -- first EyeQ with dedicated CNN acceleration
- Architecture: 4 MIPS CPU cores + proprietary vision processing accelerators; fuses up to 8 sensors
- Enabled features: Multi-sensor fusion; higher-resolution processing; improved detection accuracy through CNNs
- Significance: The mass-market workhorse -- tens of millions shipped. Brought deep learning from research to production ADAS at automotive scale and power budget.
EyeQ5 (2021) -- Autonomous-Grade Perception
- Process: 7 nm FinFET (TSMC)
- Performance: 24 TOPS peak; <5 W typical
- Key perception advance: Full autonomous perception pipeline -- sufficient compute for L2++ (SuperVision) with two chips
- Architecture: 8 CPU cores, 18 vision processor cores, 4 accelerator classes (XNN, PMA, VMP, MPC); fuses up to 20 sensors
- Enabled features: SuperVision hands-free highway driving; ViDAR; full semantic segmentation; 3D lane detection; dense depth estimation
- Perception benchmark: 91 FPS on pixel-labeling neural networks
- Significance: Enabled SuperVision deployment at production scale (Zeekr). Proved that camera-only hands-free driving was achievable on <10W silicon.
EyeQ6L (2024) -- Cost-Optimized Next-Gen ADAS
- Process: 7 nm
- Performance: 5 DL TOPS (int8); 4.5x EyeQ4M compute at half the die area
- Key perception advance: Higher-resolution front camera processing (8 MP with 120-degree lateral FOV -- 20-degree improvement over EyeQ4M); improved pixel segmentation with more than double the point density
- Enabled features: Enhanced L1-L2 ADAS with better detection range and accuracy
- Significance: Cost-effective replacement for EyeQ4 in new designs; nominated volumes grew 3.5x in 2025 vs. 2024
EyeQ6H (2025) -- High-Performance Perception
- Process: 7 nm
- Performance: 34 DL TOPS (int8); 3x EyeQ5H compute at 25% more power
- Key perception advance: 1,000+ FPS on pixel-labeling neural networks (10x improvement over EyeQ5); transformer model support; integrated ISP, GPU, video encoder
- Architecture: 8 cores / 32 threads; built-in ISP eliminates external ISP chip; GPU enables bird's-eye-view display and parking visualization
- Enabled features: Surround ADAS on single chip; SuperVision with 2x EyeQ6H; Chauffeur with 3x; Drive with 4x; driver monitoring; video recording; automated parking
- Significance: The universal building block -- a single chip type scales from Surround ADAS to L4 through chip count. EyeQ Kit SDK opens development to third parties.
EyeQ Ultra (2025+) -- Full L4 on a Single Chip
- Process: 5 nm (TSMC)
- Performance: 176 TOPS; <100 W
- Key perception advance: Single-chip processing of both True Redundancy subsystems -- camera-only + radar/LiDAR perception, HD map processing, and driving policy all on one chip
- Architecture: 12 RISC-V cores (dual-threaded), 64 accelerator cores (16 XNN CNN accelerators, 8 PMA CGRA, VMP, MPC), Arm GPU (256 GFLOPS), VPU, ISP, H.264/H.265 encoding
- Enabled features: Full L4 autonomous driving for consumer vehicles (not just robotaxis); complete perception-to-actuation pipeline on a single SoC
- Significance: Brings L4 autonomy cost down to consumer vehicle economics. The RISC-V ISA transition signals Mobileye's long-term architectural direction away from MIPS.
Perception Capability Progression Summary
| Generation | Year | TOPS | Key Perception Milestone |
|---|---|---|---|
| EyeQ1 | 2008 | — | Lane detection, FCW from single camera |
| EyeQ2 | 2010 | — | Pedestrian detection and AEB |
| EyeQ3 | 2014 | ~0.25 | Multi-camera surround perception |
| EyeQ4 | 2018 | 2.5 | Deep learning CNN inference at scale |
| EyeQ5 | 2021 | 24 | Full autonomous perception pipeline; ViDAR; SuperVision |
| EyeQ6L | 2024 | 5 | 2x segmentation density; 8MP wide-FOV camera |
| EyeQ6H | 2025 | 34 | 1,000+ FPS segmentation; transformer support; Surround ADAS to L4 |
| EyeQ Ultra | 2025+ | 176 | Single-chip L4; both True Redundancy subsystems |
The roughly two-decade progression from EyeQ1 to EyeQ Ultra represents an increase of about 700x in deep learning compute relative to EyeQ3 (the first generation with a quoted TOPS figure), moving from hand-crafted vision algorithms detecting lane markings at 180 nm to transformer-based 3D scene understanding at 5 nm -- all while maintaining the automotive power and cost constraints that define Mobileye's approach.
Appendix: Architecture Diagrams
True Redundancy Data Flow
CAMERA ARRAY (11-13 cameras)
|
+--------------------+
| Camera Perception |
| (EyeQ XNN/PMA) |
| - Object detection |
| - Lane detection |
| - Free space |
| - Depth/ViDAR |
| - Segmentation |
| - TSR/TFL |
+--------------------+
|
Camera World Model
|
+--------------------+
| RSS Safety Check |
| (Camera channel) |
+--------------------+
|
v
+--------------------+
| PLANNING LAYER |<---- REM Roadbook
| - Trajectory gen. |<---- VLSA guidance
| - Comfort opt. |
| - RSS envelope |
+--------------------+
^
|
+--------------------+
| RSS Safety Check |
| (Radar/LiDAR ch.) |
+--------------------+
|
Radar/LiDAR World Model
|
+--------------------+
| Radar/LiDAR |
| Perception |
| (EyeQ VMP/XNN) |
| - Point cloud det. |
| - Object detect. |
| - Free space |
| - Tracking |
+--------------------+
|
IMAGING RADAR (BSR/BSRC) + LiDAR

EyeQ Accelerator Workload Map
+------------------------------------------------------------------+
| EyeQ SoC |
| |
| +----------+ +----------+ +----------+ +----------+ |
| | XNN | | PMA | | VMP | | MPC | |
| | CNN/DL | | CGRA | | SIMD/VLIW| | Barrel- | |
| | Accel. | | Dataflow | | Vector | | threaded | |
| +----------+ +----------+ +----------+ +----------+ |
| | | | | |
| Object det. Feature Radar FFT Tracking |
| Segmentation extraction Beamforming State mgmt |
| Depth est. Image proc. Signal proc. Control flow |
| Lane det. DNN layers Ultrasonic Data marshal |
| Sign class. Sensor proc. processing Post-proc. |
| |
| +----------+ +----------+ +----------+ +----------+ |
| | CPU | | ISP | | GPU | | Video | |
| | RISC-V | | Image | | Arm GPU | | H.264/5 | |
| | General | | Signal | | Render | | Encode | |
| +----------+ +----------+ +----------+ +----------+ |
| | | | | |
| RSS compute Debayer BEV display Recording |
| Planning HDR tone map Parking vis. DMS video |
| Scheduling Distortion Driver mon. OTA upload |
| correction |
+------------------------------------------------------------------+

Sources: Mobileye corporate technology pages, CES 2025/2026 presentations by Prof. Amnon Shashua, Mobileye AI Day presentations, EyeQ chip specification pages and benchmark data, Mobileye blog posts on True Redundancy/REM/imaging radar/ISA/pedestrian detection, IEEE/CVF publications (3D-LaneNet ICCV 2019), arXiv papers (RSS 1708.06374, 2206.03418; Safe Multi-Agent RL 1610.03295), Google Patents (US7113867B1, US8164628B2, US9665100B2, US9946260, US10019657B2, US11086334, US20150206015A1, US20160350930A1, US20180024568A1, US20190286153A1, CN112384760A), WikiChip EyeQ5 specifications, EE Times/Electronic Design technical articles, AWS Mobileye case studies, Amnon Shashua academic publications database (Hebrew University/Stanford).