Event-Camera VIO and SLAM

Related docs: OpenVINS, VINS-Mono / VINS-Fusion, SVO, LSD-SLAM / DSO, factor graphs and iSAM2, event and thermal camera models, visible cameras, and robust multi-sensor localization.

Executive Summary

Event-camera VIO/SLAM uses asynchronous brightness-change events rather than ordinary frame images as the main visual signal. The attraction is clear: event cameras have high temporal resolution, high dynamic range, low latency, and resistance to motion blur. Those properties directly target failure cases of frame-based VIO: aggressive motion, glare, night lighting transitions, and high-speed robotics.

Representative systems include early EVIO, ESVIO, PL-EVIO, EVI-SAM, and ESVO2. The field has two broad families:

  • Feature/factor graph systems: track event corners, line features, and optional frame-camera features, then optimize IMU preintegration plus visual residuals.
  • Direct event systems: align event representations such as time surfaces, adaptive accumulations, and event images, often with stereo depth and IMU priors.

For AVs, event-camera SLAM is promising as a high-dynamic-range, low-latency supplement. It is not yet a drop-in replacement for camera/LiDAR/radar localization because event data is motion-dependent, sparse when the scene is static, harder to calibrate, and less supported by production perception tooling.

Problem Fit

Event-camera VIO/SLAM fits:

  • High-speed motion where frame cameras blur.
  • High dynamic range scenes with headlights, sun glare, shadows, tunnel exits, or floodlights.
  • Low-latency stabilization for drones, small robots, and aggressive maneuvers.
  • Scenarios where edge structure is strong and static.
  • Stereo event rigs where depth can be recovered without ordinary frames.

It is weaker for:

  • Slow or stopped vehicles, because events vanish without brightness change.
  • Textureless scenes with weak edges.
  • Scenes dominated by independently moving objects.
  • Standard automotive localization stacks that require mature tooling, maps, and certification evidence.

Sensor Model

An event camera emits events:

text
e_i = (u_i, v_i, t_i, p_i)

where (u_i, v_i) is the pixel location, t_i the timestamp, and p_i ∈ {+1, -1} the polarity. An event fires when the log intensity at a pixel changes by the contrast threshold C since the previous event at that pixel (Δt_i earlier):

text
L(u, v, t_i) - L(u, v, t_i - Δt_i) = p_i C
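
A minimal sketch of this trigger rule, assuming an idealized sensor with a single symmetric contrast threshold C, at most one event per threshold crossing, and a per-pixel memory of the last log intensity (function and variable names are illustrative, not a driver API):

python
import numpy as np

def generate_events(log_frames, timestamps, C=0.2):
    """Idealized event generation from a stack of log-intensity frames.

    log_frames: (N, H, W) log intensities L(u, v, t_k)
    timestamps: (N,) frame times t_k
    C:          contrast threshold
    Returns a list of events (u, v, t, p) with p in {+1, -1}.
    """
    events = []
    last_L = log_frames[0].copy()              # per-pixel reference log intensity
    for L, t in zip(log_frames[1:], timestamps[1:]):
        diff = L - last_L
        fired = np.abs(diff) >= C              # pixels whose change crossed the threshold
        vs, us = np.nonzero(fired)
        for u, v in zip(us, vs):
            events.append((u, v, t, 1 if diff[v, u] > 0 else -1))
            last_L[v, u] = L[v, u]             # reset the reference only at fired pixels
    return events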

VIO systems may use:

  • Monocular event camera + IMU.
  • Stereo event cameras + IMU.
  • Event camera + standard frame camera + IMU.
  • Optional depth maps, TSDF fusion, or dense event mapping.

The estimator state usually resembles visual-inertial SLAM:

text
x = [R, p, v, b_g, b_a, T_event_imu, T_frame_event, camera_intrinsics]

with landmarks, line features, depth maps, or event-map states depending on the method.
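
As an illustration only, such a state might be held in a simple container like the one below; field names and defaults are hypothetical, and real systems keep additional clone states, feature states, and covariance or information blocks:

python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class EventVioState:
    R_wb: np.ndarray = field(default_factory=lambda: np.eye(3))            # body orientation in the world frame
    p_wb: np.ndarray = field(default_factory=lambda: np.zeros(3))          # position
    v_wb: np.ndarray = field(default_factory=lambda: np.zeros(3))          # velocity
    b_g: np.ndarray = field(default_factory=lambda: np.zeros(3))           # gyroscope bias
    b_a: np.ndarray = field(default_factory=lambda: np.zeros(3))           # accelerometer bias
    T_event_imu: np.ndarray = field(default_factory=lambda: np.eye(4))     # event-camera-to-IMU extrinsics
    T_frame_event: np.ndarray = field(default_factory=lambda: np.eye(4))   # frame-to-event-camera extrinsics
    intrinsics: np.ndarray = field(default_factory=lambda: np.zeros(4))    # fx, fy, cx, cy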

Pipeline

  1. Event preprocessing

    • Remove hot pixels and filter background-activity/refractory noise.
    • Build time surfaces, event frames, adaptive accumulation maps, or feature tracks (a minimal time-surface sketch follows this list).
    • Optionally synchronize event packets with frame images.
  2. IMU propagation

    • Preintegrate IMU between keyframes or event windows.
    • Provide motion priors for fast tracking and weakly observable axes.
  3. Front-end tracking

    • Feature methods detect event corners/lines and track point/line measurements.
    • Direct methods align event representations or time surfaces.
    • Stereo methods estimate depth through temporal/static stereo matching.
  4. Back-end optimization/filtering

    • Fuse IMU residuals with event reprojection, line, direct alignment, or photometric-like residuals.
    • Maintain a sliding window, keyframes, local map, or dense TSDF.
  5. Mapping

    • Sparse systems keep landmarks or line features.
    • ESVO-style systems build semi-dense depth maps.
    • EVI-SAM reconstructs dense colored maps using image-guided event mapping and TSDF fusion.
  6. Diagnostics

    • Track event rate, feature distribution, line support, IMU residuals, bias stability, and event/frame timing.
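
To make step 1 concrete, here is a minimal sketch of an exponentially decaying time surface built from one event packet; the decay constant, packet format, and resolution are illustrative assumptions rather than any particular system's defaults:

python
import numpy as np

def time_surface(events, t_now, shape, tau=0.03):
    """Exponentially decaying time surface from a packet of events.

    events: iterable of (u, v, t, p) with t in seconds
    t_now:  reference time at which the surface is evaluated
    shape:  (H, W) sensor resolution
    tau:    decay constant in seconds (tuning assumption)
    """
    last_t = np.full(shape, -np.inf)           # most recent event time per pixel
    for u, v, t, p in events:
        if t <= t_now:
            last_t[v, u] = max(last_t[v, u], t)
    S = np.exp((last_t - t_now) / tau)         # recent events -> values near 1, old -> near 0
    S[np.isinf(last_t)] = 0.0                  # pixels that never fired
    return S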

Mathematical Mechanics

Feature-based event VIO uses point and line residuals:

text
X* = arg min_X
    sum_imu    || r_imu(x_i, x_j) ||^2
  + sum_point  rho(|| z_uv - pi(T_CW p_W) ||^2)
  + sum_line   rho(|| r_line(T_CW, L_W, z_line) ||^2)
  + sum_prior  || r_prior ||^2

PL-EVIO is representative: event point and line residuals, frame-image point residuals, and IMU preintegration are tightly coupled in a keyframe graph.
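
As a hedged illustration of the point term above, one reprojection residual and its Huber weight (as used in iteratively reweighted least squares) could look like the following; the pinhole model and flat parameterization are simplifications for the sketch, not the PL-EVIO implementation:

python
import numpy as np

def point_residual(z_uv, p_W, T_CW, fx, fy, cx, cy):
    """r = z_uv - pi(T_CW * p_W) for one tracked event-corner landmark."""
    p_C = T_CW[:3, :3] @ p_W + T_CW[:3, 3]     # landmark expressed in the camera frame
    u = fx * p_C[0] / p_C[2] + cx              # pinhole projection
    v = fy * p_C[1] / p_C[2] + cy
    return z_uv - np.array([u, v])

def huber_weight(r, delta=1.0):
    """IRLS weight realizing the robust kernel rho on the residual norm."""
    n = np.linalg.norm(r)
    return 1.0 if n <= delta else delta / n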

Direct event methods avoid explicit feature matching, which becomes brittle under large appearance or baseline changes. They instead align event representations:

text
r_direct = E_ref(u) - E_cur( warp(u, T, depth) )

where E may be a time surface or adaptive accumulation image. ESVO2 improves earlier ESVO-style direct stereo event odometry by using adaptive accumulation, contour-point sampling, static/temporal stereo fusion, IMU preintegration, and a tightly coupled back end for velocity and IMU bias.
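
An illustrative per-pixel version of r_direct, assuming a pinhole model, a known depth for the reference pixel, and nearest-neighbour sampling for brevity (a real implementation would typically interpolate); names are placeholders, not the ESVO2 code:

python
import numpy as np

def direct_residual(E_ref, E_cur, u, v, depth, T, K):
    """r = E_ref(u) - E_cur(warp(u, T, depth)) for one reference pixel (u, v)."""
    K_inv = np.linalg.inv(K)
    p_ref = depth * (K_inv @ np.array([u, v, 1.0]))   # back-project to 3D in the reference frame
    p_cur = T[:3, :3] @ p_ref + T[:3, 3]              # transform into the current frame
    uv_cur = K @ (p_cur / p_cur[2])                   # project into the current representation
    ui, vi = int(round(uv_cur[0])), int(round(uv_cur[1]))
    if not (0 <= ui < E_cur.shape[1] and 0 <= vi < E_cur.shape[0]):
        return None                                   # warped point leaves the image
    return E_ref[v, u] - E_cur[vi, ui]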

EVI-SAM combines feature matching and direct alignment:

text
X* = arg min_X
    sum_imu       || r_imu ||^2
  + sum_reproj    rho(|| r_event_reproj ||^2)
  + sum_align     rho(|| r_event_2d2d_align ||^2)
  + sum_mapping   rho(|| r_depth_tsdf ||^2)

The observability challenge is distinct: event data depends on motion and scene contrast. IMU priors are not just helpful for scale; they stabilize rotation and velocity when event alignment is underconstrained.
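
For the r_imu terms, the preintegrated motion deltas between keyframes i and j can be accumulated once from raw IMU samples and reused as the state changes. A simplified Euler-integration sketch that ignores bias Jacobians, noise propagation, and gravity handling (all essential in a real estimator) is shown below:

python
import numpy as np

def so3_exp(w):
    """Rodrigues formula: rotation matrix from a rotation vector."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, accel, dt, b_g, b_a):
    """Accumulate delta rotation/velocity/position between two keyframes.

    gyro, accel: (N, 3) IMU samples; dt: (N,) sample intervals
    b_g, b_a:    current bias estimates
    """
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a, h in zip(gyro, accel, dt):
        a_corr = a - b_a
        dp = dp + dv * h + 0.5 * (dR @ a_corr) * h**2
        dv = dv + (dR @ a_corr) * h
        dR = dR @ so3_exp((w - b_g) * h)
    return dR, dv, dp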

Assumptions

  • Event camera timestamps are precise and synchronized with IMU.
  • Contrast thresholds and biases are stable or calibrated.
  • There is sufficient relative motion to generate useful events.
  • Static edges dominate the event stream.
  • IMU noise and bias models are realistic.
  • Stereo event rigs have stable calibration and enough event correspondence.
  • Frame/event fusion methods correctly model the time and modality difference.

Strengths

  • High dynamic range for glare, headlights, tunnel exits, and night floodlights.
  • Low latency and high temporal resolution.
  • Strong resistance to motion blur.
  • Efficient sparse data under motion.
  • Line features are useful in human-made scenes.
  • Stereo event systems can recover metric depth without ordinary frames.
  • Good complement to conventional cameras and IMUs for aggressive motion.

Limitations

  • Static scenes produce little data.
  • Motion-dependent measurements complicate data association and benchmarking.
  • Event streams include sensor noise, hot pixels, and flicker artifacts.
  • Textureless or visually uniform scenes with weak edges remain hard.
  • Calibration is more complex than ordinary frame-camera VIO.
  • Learned or hand-tuned event representations may not transfer across sensors.
  • Automotive-grade event SLAM tooling and datasets are less mature than LiDAR/VIO.
  • Rain, snow, rotating beacons, LED flicker, and dynamic objects can create high event clutter.

Datasets and Benchmarks

Common datasets:

  • MVSEC: stereo event, frame, LiDAR, IMU, and ground truth in driving and indoor scenes.
  • DSEC: stereo event driving dataset with high-resolution event cameras.
  • TUM-VIE: visual-inertial event dataset with motion-capture/ground-truth segments.
  • EVIMO2: event-camera dataset with object motion and ground truth.
  • RPG event datasets: classic event-camera VO/VIO sequences.
  • ESVIO / PL-EVIO / EVI-SAM author data: useful for reproducing method claims.

Metrics:

  • ATE/RPE with consistent trajectory alignment (see the alignment sketch after this list).
  • Tracking success rate under fast motion and HDR.
  • Event-rate sensitivity and latency.
  • IMU bias drift.
  • Depth-map completeness for stereo/direct systems.
  • Runtime on CPU and embedded targets.
  • Failure rate during low-motion intervals.
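
For ATE, a minimal RMSE computation after a closed-form SE(3) alignment (Kabsch/Umeyama style, without scale) might look like this; the time association of poses and the choice of SE(3) versus Sim(3) alignment are assumptions that should match the benchmark protocol being reproduced:

python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """Absolute trajectory error (RMSE) after closed-form SE(3) alignment.

    est_xyz, gt_xyz: (N, 3) arrays of time-associated positions.
    """
    mu_e, mu_g = est_xyz.mean(axis=0), gt_xyz.mean(axis=0)
    E, G = est_xyz - mu_e, gt_xyz - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)                   # 3x3 cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T             # rotation taking est onto gt
    t = mu_g - R @ mu_e
    aligned = est_xyz @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))

Monocular pipelines are often evaluated with an additional scale factor (Sim(3) alignment); the sketch above assumes metric scale is already observable from the IMU or stereo rig.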

AV Relevance

Event cameras can help AV localization where conventional cameras struggle:

  • Headlight glare and night ramp lighting.
  • Rapid illumination transitions under terminals or tunnels.
  • Motion blur during vibration or fast maneuvers.
  • Low-latency wheel-slip or yaw-rate correction in close-quarters movement.

They are not a primary AV localization sensor today. A practical architecture would use event VIO as an aiding factor:

  • Frame camera/LiDAR/radar map localization remains the main pose source.
  • Event-camera residuals provide high-rate relative motion and HDR edge constraints.
  • A supervisor gates event input using event rate, spatial distribution, and dynamic-object indicators.
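
A minimal sketch of such a supervisor gate; the thresholds and the three statistics are illustrative tuning assumptions, not values from any deployed system:

python
def accept_event_update(event_rate_hz, coverage_ratio, dynamic_ratio,
                        rate_min=5e4, rate_max=5e7,
                        min_coverage=0.3, max_dynamic=0.5):
    """Decide whether the current event packet should contribute residuals.

    event_rate_hz:  events per second in the packet
    coverage_ratio: fraction of image grid cells containing tracked features
    dynamic_ratio:  fraction of events attributed to independently moving objects
    """
    if not (rate_min <= event_rate_hz <= rate_max):
        return False        # too few events (near-static) or a saturating burst
    if coverage_ratio < min_coverage:
        return False        # features clustered in one region: weak geometry
    if dynamic_ratio > max_dynamic:
        return False        # scene dominated by moving objects
    return True

In practice such a gate might down-weight or inflate the covariance of event factors rather than reject them outright.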

Indoor/Outdoor Relevance

Indoor: Strong in high-contrast corridors, warehouses, labs, and drone spaces. Weak when lighting flicker or low motion dominates.

Outdoor: Promising for HDR and fast motion. Needs robust dynamic-object rejection and weather validation.

Mixed indoor/outdoor: Strong use case because dynamic range changes are exactly where frame cameras can degrade.

Airside Deployment Notes

Airside has several event-camera opportunities:

  • Floodlight glare at night.
  • Sudden shadow/sun transitions near terminal structures.
  • High-contrast aircraft, poles, markings, and service-road edges.
  • Low-latency motion feedback for small autonomous tow tractors or inspection robots.

But risks are significant:

  • Flashing beacons and LED signage can create event clutter.
  • Rain, spray, and rotating lights can dominate events.
  • Slow pushback and creeping maneuvers may not generate enough events.
  • Dynamic GSE and crew can violate static-scene assumptions.

Use event VIO as a supplemental relative-motion channel, not a standalone global localization method.

Validation Checklist

  • Calibrate event-camera intrinsics, event-IMU extrinsics, and time offset (a coarse offset check is sketched after this list).
  • Measure event rate and feature coverage by scenario.
  • Test stopped/slow motion, fast vibration, and aggressive turns.
  • Include HDR, night, glare, LED flicker, rain, and spray.
  • Compare against frame VIO under matched motion.
  • Verify estimator covariance during low-event intervals.
  • Confirm dynamic-object gating under moving vehicles and people.
  • Test runtime on target compute with realistic event rates.
  • Validate recovery after event burst saturation.
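
For the time-offset item, a coarse sanity check is to cross-correlate the event rate with the IMU angular-speed magnitude; this is a rough heuristic that assumes rotation-dominated excitation and is no substitute for proper calibration tooling (all names and defaults below are illustrative):

python
import numpy as np

def estimate_time_offset(event_times, imu_times, gyro, bin_dt=0.005, max_offset=0.1):
    """Coarse event/IMU time offset from cross-correlating event rate with |omega|.

    event_times: (N,) event timestamps [s]
    imu_times:   (M,) IMU timestamps [s]
    gyro:        (M, 3) angular rates [rad/s]
    Returns the shift [s] that maximizes the correlation; the sign convention
    should be verified against the target logging setup.
    """
    t0 = min(event_times.min(), imu_times.min())
    t1 = max(event_times.max(), imu_times.max())
    bins = np.arange(t0, t1 + bin_dt, bin_dt)
    rate, _ = np.histogram(event_times, bins=bins)                   # events per bin
    speed = np.interp(bins[:-1], imu_times, np.linalg.norm(gyro, axis=1))
    rate = (rate - rate.mean()) / (rate.std() + 1e-9)                # normalize both signals
    speed = (speed - speed.mean()) / (speed.std() + 1e-9)
    max_shift = int(max_offset / bin_dt)
    shifts = np.arange(-max_shift, max_shift + 1)
    scores = [np.dot(np.roll(rate, s), speed) for s in shifts]       # roll wraps around; acceptable for a coarse check
    return float(shifts[int(np.argmax(scores))] * bin_dt)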

Sources

Notes compiled from publicly available research publications and project documentation.