Event-Camera VIO and SLAM
Related docs: OpenVINS, VINS-Mono / VINS-Fusion, SVO, LSD-SLAM / DSO, factor graphs and iSAM2, event and thermal camera models, visible cameras, and robust multi-sensor localization.
Executive Summary
Event-camera VIO/SLAM uses asynchronous brightness-change events rather than ordinary frame images as the main visual signal. The attraction is clear: event cameras have high temporal resolution, high dynamic range, low latency, and resistance to motion blur. Those properties directly target failure cases of frame-based VIO: aggressive motion, flicker, glare, night lighting transitions, and high-speed robotics.
Representative systems include early EVIO, ESVIO, PL-EVIO, EVI-SAM, and ESVO2. The field has two broad families:
- Feature/factor graph systems: track event corners, line features, and optional frame-camera features, then optimize IMU preintegration plus visual residuals.
- Direct event systems: align event representations such as time surfaces, adaptive accumulations, and event images, often with stereo depth and IMU priors.
For AVs, event-camera SLAM is promising as a high-dynamic-range, low-latency supplement. It is not yet a drop-in replacement for camera/LiDAR/radar localization because event data is motion-dependent, sparse when the scene is static, harder to calibrate, and less supported by production perception tooling.
Problem Fit
Event-camera VIO/SLAM fits:
- High-speed motion where frame cameras blur.
- High dynamic range scenes with headlights, sun glare, shadows, tunnel exits, or floodlights.
- Low-latency stabilization for drones, small robots, and aggressive maneuvers.
- Scenarios where edge structure is strong and static.
- Stereo event rigs where depth can be recovered without ordinary frames.
It is weaker for:
- Slow or stopped vehicles, because events vanish without brightness change.
- Textureless scenes with weak edges.
- Scenes dominated by independently moving objects.
- Standard automotive localization stacks that require mature tooling, maps, and certification evidence.
Sensor Model
An event camera emits events:
e_i = (u_i, v_i, t_i, p_i)

where (u, v) is pixel location, t is timestamp, and p is polarity. An event is triggered when the log intensity changes by the contrast threshold C:

L(u_i, v_i, t_i) - L(u_i, v_i, t_i - dt) = p_i C

where dt is the time since the last event at that pixel (a minimal sketch of this trigger model follows the configuration list below). VIO systems may use:
- Monocular event camera + IMU.
- Stereo event cameras + IMU.
- Event camera + standard frame camera + IMU.
- Optional depth maps, TSDF fusion, or dense event mapping.
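As a concrete illustration of the trigger model above, here is a minimal per-pixel sketch in Python. It assumes an idealized sensor with a fixed contrast threshold C and no noise, latency, or refractory period; the function and variable names are illustrative, not any vendor's API.

```python
import numpy as np

def generate_events(log_I_curr, ref_level, C=0.2, t=0.0):
    """Emit (u, v, t, polarity) events wherever log intensity has moved by at
    least C since the last event at that pixel (tracked in ref_level)."""
    events = []
    delta = log_I_curr - ref_level            # change since last event, per pixel
    for v, u in zip(*np.nonzero(np.abs(delta) >= C)):
        polarity = 1 if delta[v, u] > 0 else -1
        events.append((u, v, t, polarity))
        ref_level[v, u] += polarity * C       # step reference by one threshold
    return events
```

Note that the per-pixel reference level is stepped by ±C rather than reset to the current intensity, which mirrors how real pixels retain residual change after firing.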
The estimator state usually resembles visual-inertial SLAM:

x = [R, p, v, b_g, b_a, T_event_imu, T_frame_event, camera_intrinsics]

with landmarks, line features, depth maps, or event-map states added depending on the method.
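For orientation, a minimal container for this state might look like the following; the field names and 4x4 homogeneous extrinsics are assumptions for illustration, not any particular system's layout.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class VioState:
    R_wb: np.ndarray = field(default_factory=lambda: np.eye(3))    # body orientation in world
    p_wb: np.ndarray = field(default_factory=lambda: np.zeros(3))  # body position in world
    v_wb: np.ndarray = field(default_factory=lambda: np.zeros(3))  # body velocity in world
    b_g: np.ndarray = field(default_factory=lambda: np.zeros(3))   # gyro bias
    b_a: np.ndarray = field(default_factory=lambda: np.zeros(3))   # accel bias
    T_event_imu: np.ndarray = field(default_factory=lambda: np.eye(4))   # event camera to IMU extrinsic
    T_frame_event: np.ndarray = field(default_factory=lambda: np.eye(4)) # frame to event camera extrinsic
    intrinsics: np.ndarray = field(default_factory=lambda: np.zeros(4))  # fx, fy, cx, cy
```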
Pipeline
Event preprocessing
- Denoise hot pixels and background-activity noise.
- Build time surfaces, event frames, adaptive accumulation maps, or feature tracks (a time-surface sketch follows this list).
- Optionally synchronize event packets with frame images.
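As referenced above, a minimal exponentially decaying time surface can be built from an event packet as sketched below; the decay constant tau and the (u, v, t, polarity) tuple layout are assumptions.

```python
import numpy as np

def time_surface(events, height, width, t_ref, tau=0.03):
    """events: iterable of (u, v, t, polarity). Returns a [0, 1] image where
    recently active pixels are bright and stale pixels decay toward zero."""
    last_t = np.full((height, width), -np.inf)
    for u, v, t, _ in events:
        last_t[v, u] = max(last_t[v, u], t)      # most recent timestamp per pixel
    dt = t_ref - last_t                          # age of the latest event per pixel
    return np.exp(-np.maximum(dt, 0.0) / tau)    # never-fired pixels give exp(-inf) = 0
```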
IMU propagation
- Preintegrate IMU between keyframes or event windows (see the sketch after this list).
- Provide motion priors for fast tracking and weakly observable axes.
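The sketch below shows simplified preintegration of a gyro/accel window into relative rotation, velocity, and position deltas. It assumes bias-corrected Euler integration and omits the covariance and bias-Jacobian propagation that production systems (e.g., Forster-style preintegration) require.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def preintegrate(gyro, accel, dts, b_g, b_a):
    """gyro/accel: (N, 3) measurements; dts: (N,) sample intervals.
    Returns relative rotation dR, velocity dv, and position dp expressed in
    the first keyframe's body frame (gravity is added back at fusion time)."""
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a, dt in zip(gyro, accel, dts):
        a_corr = a - b_a                                   # bias-corrected specific force
        dp += dv * dt + 0.5 * (dR @ a_corr) * dt**2        # position before velocity update
        dv += (dR @ a_corr) * dt
        dR = dR @ Rotation.from_rotvec((w - b_g) * dt).as_matrix()
    return dR, dv, dp
```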
Front-end tracking
- Feature methods detect event corners/lines and track point/line measurements.
- Direct methods align event representations or time surfaces.
- Stereo methods estimate depth through temporal/static stereo matching.
Back-end optimization/filtering
- Fuse IMU residuals with event reprojection, line, direct alignment, or photometric-like residuals.
- Maintain a sliding window, keyframes, local map, or dense TSDF.
Mapping
- Sparse systems keep landmarks or line features.
- ESVO-style systems build semi-dense depth maps.
- EVI-SAM reconstructs dense colored maps using image-guided event mapping and TSDF fusion.
Diagnostics
- Track event rate, feature distribution, line support, IMU residuals, bias stability, and event/frame timing.
Mathematical Mechanics
Feature-based event VIO uses point and line residuals:
X* = arg min_X  sum_imu || r_imu(x_i, x_j) ||^2
              + sum_point rho(|| z_uv - pi(T_CW p_W) ||^2)
              + sum_line rho(|| r_line(T_CW, L_W, z_line) ||^2)
              + sum_prior || r_prior ||^2

PL-EVIO is representative: event point and line residuals, frame-image point residuals, and IMU preintegration are tightly coupled in a keyframe graph.
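To make the point term concrete, here is a sketch of a single reprojection residual and a Huber-style rho, assuming a pinhole model with intrinsics (fx, fy, cx, cy); the names are illustrative.

```python
import numpy as np

def point_residual(z_uv, T_cw, p_w, fx, fy, cx, cy):
    """z_uv: measured pixel; T_cw: 4x4 world-to-camera; p_w: 3D landmark."""
    p_c = T_cw[:3, :3] @ p_w + T_cw[:3, 3]     # landmark in the camera frame
    u = fx * p_c[0] / p_c[2] + cx              # pinhole projection pi(.)
    v = fy * p_c[1] / p_c[2] + cy
    return z_uv - np.array([u, v])

def huber(sq_norm, delta=1.345):
    """rho(||r||^2): quadratic near zero, linear in the tails."""
    if sq_norm <= delta**2:
        return sq_norm
    return 2.0 * delta * np.sqrt(sq_norm) - delta**2
```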
Direct event methods avoid explicit matching under large baseline changes. They align event representations:
r_direct = E_ref(u) - E_cur( warp(u, T, depth) )

where E may be a time surface or adaptive accumulation image. ESVO2 improves earlier ESVO-style direct stereo event odometry by using adaptive accumulation, contour-point sampling, static/temporal stereo fusion, IMU preintegration, and a tightly coupled back end for velocity and IMU bias.
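A sketch of this residual for a single pixel is below, assuming a pinhole model, a known depth, and bilinear sampling of the current event representation; bounds checks are omitted for brevity.

```python
import numpy as np

def warp(u_ref, depth, T_cur_ref, fx, fy, cx, cy):
    """Back-project a reference pixel at known depth, transform by the
    relative pose T_cur_ref (4x4), and reproject into the current view."""
    x = (u_ref[0] - cx) / fx * depth
    y = (u_ref[1] - cy) / fy * depth
    p_cur = T_cur_ref[:3, :3] @ np.array([x, y, depth]) + T_cur_ref[:3, 3]
    return np.array([fx * p_cur[0] / p_cur[2] + cx,
                     fy * p_cur[1] / p_cur[2] + cy])

def r_direct(E_ref, E_cur, u_ref, depth, T_cur_ref, K):
    fx, fy, cx, cy = K
    u_cur = warp(u_ref, depth, T_cur_ref, fx, fy, cx, cy)
    u0, v0 = int(u_cur[0]), int(u_cur[1])
    a, b = u_cur[0] - u0, u_cur[1] - v0
    # Bilinear sample of the current representation at the warped location.
    sample = ((1 - a) * (1 - b) * E_cur[v0, u0] + a * (1 - b) * E_cur[v0, u0 + 1] +
              (1 - a) * b * E_cur[v0 + 1, u0] + a * b * E_cur[v0 + 1, u0 + 1])
    return E_ref[int(u_ref[1]), int(u_ref[0])] - sample
```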
EVI-SAM combines feature matching and direct alignment:
X* = arg min_X  sum_imu || r_imu ||^2
              + sum_reproj rho(|| r_event_reproj ||^2)
              + sum_align rho(|| r_event_2d2d_align ||^2)
              + sum_mapping rho(|| r_depth_tsdf ||^2)

The observability challenge is distinct: event data depends on motion and scene contrast. IMU priors are not just helpful for scale; they stabilize rotation and velocity when event alignment is underconstrained.
Assumptions
- Event camera timestamps are precise and synchronized with IMU.
- Contrast thresholds and biases are stable or calibrated.
- There is sufficient relative motion to generate useful events.
- Static edges dominate the event stream.
- IMU noise and bias models are realistic.
- Stereo event rigs have stable calibration and enough event correspondence.
- Frame/event fusion methods correctly model the time and modality difference.
Strengths
- High dynamic range for glare, headlights, tunnel exits, and night floodlights.
- Low latency and high temporal resolution.
- Strong resistance to motion blur.
- Efficient sparse data under motion.
- Line features are useful in human-made scenes.
- Stereo event systems can recover metric depth without ordinary frames.
- Good complement to conventional cameras and IMUs for aggressive motion.
Limitations
- Static scenes produce little data.
- Motion-dependent measurements complicate data association and benchmarking.
- Event streams include sensor noise, hot pixels, and flicker artifacts.
- Textureless or visually uniform scenes with little edge contrast remain hard.
- Calibration is more complex than ordinary frame-camera VIO.
- Learned or hand-tuned event representations may not transfer across sensors.
- Automotive-grade event SLAM tooling and datasets are less mature than LiDAR/VIO.
- Rain, snow, rotating beacons, LED flicker, and dynamic objects can create high event clutter.
Datasets and Benchmarks
Common datasets:
- MVSEC: stereo event, frame, LiDAR, IMU, and ground truth in driving and indoor scenes.
- DSEC: stereo event driving dataset with high-resolution event cameras.
- TUM-VIE: visual-inertial event dataset with motion-capture/ground-truth segments.
- EVIMO2: event-camera dataset with object motion and ground truth.
- RPG event datasets: classic event-camera VO/VIO sequences.
- ESVIO / PL-EVIO / EVI-SAM author data: useful for reproducing method claims.
Metrics:
- ATE/RPE with consistent alignment (see the alignment sketch after this list).
- Tracking success rate under fast motion and HDR.
- Event-rate sensitivity and latency.
- IMU bias drift.
- Depth-map completeness for stereo/direct systems.
- Runtime on CPU and embedded targets.
- Failure rate during low-motion intervals.
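For the ATE entry above, here is a minimal sketch of RMSE after closed-form rigid alignment (the Horn/Umeyama solution, rotation and translation only, no scale), assuming time-associated (N, 3) position arrays.

```python
import numpy as np

def ate_rmse(est, gt):
    """Align estimated positions to ground truth, then report RMSE."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                                # best-fit rotation (det = +1)
    t = mu_g - R @ mu_e
    err = gt - (est @ R.T + t)                        # residual after alignment
    return np.sqrt((err**2).sum(axis=1).mean())
```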
AV Relevance
Event cameras can help AV localization where conventional cameras struggle:
- Headlight glare and night ramp lighting.
- Rapid illumination transitions under terminals or tunnels.
- Motion blur during vibration or fast maneuvers.
- Low-latency wheel-slip or yaw-rate correction in close-quarters movement.
They are not a primary AV localization sensor today. A practical architecture would use event VIO as an aiding factor:
- Frame camera/LiDAR/radar map localization remains the main pose source.
- Event-camera residuals provide high-rate relative motion and HDR edge constraints.
- A supervisor gates event input using event rate, spatial distribution, and dynamic-object indicators.
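A minimal version of such a supervisor gate might look like the sketch below; every threshold is a placeholder assumption that would need per-sensor and per-scenario tuning, and the dynamic-object fraction is assumed to come from an external detector.

```python
import numpy as np

def accept_event_factor(events, height, width,
                        min_rate=5e4, max_rate=5e6, min_coverage=0.25,
                        window_s=0.01, dynamic_fraction=None, max_dynamic=0.5):
    """Return True if an event window looks usable for pose constraints."""
    rate = len(events) / window_s
    if not (min_rate <= rate <= max_rate):       # too few events (static) or burst saturation
        return False
    occ = np.zeros(((height + 15) // 16, (width + 15) // 16), dtype=bool)
    for u, v, _, _ in events:
        occ[v // 16, u // 16] = True             # coarse spatial-coverage grid
    if occ.mean() < min_coverage:                # events clustered in one region
        return False
    if dynamic_fraction is not None and dynamic_fraction > max_dynamic:
        return False                             # scene dominated by moving objects
    return True
```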
Indoor/Outdoor Relevance
Indoor: Strong in high-contrast corridors, warehouses, labs, and drone spaces. Weak when lighting flicker or low motion dominates.
Outdoor: Promising for HDR and fast motion. Needs robust dynamic-object rejection and weather validation.
Mixed indoor/outdoor: Strong use case because dynamic range changes are exactly where frame cameras can degrade.
Airside Deployment Notes
Airside has several event-camera opportunities:
- Floodlight glare at night.
- Sudden shadow/sun transitions near terminal structures.
- High-contrast aircraft, poles, markings, and service-road edges.
- Low-latency motion feedback for small autonomous tow tractors or inspection robots.
But risks are significant:
- Flashing beacons and LED signage can create event clutter.
- Rain, spray, and rotating lights can dominate events.
- Slow pushback and creeping maneuvers may not generate enough events.
- Dynamic GSE and crew can violate static-scene assumptions.
Use event VIO as a supplemental relative-motion channel, not a standalone global localization method.
Validation Checklist
- Calibrate event-camera intrinsics, event-IMU extrinsics, and time offset.
- Measure event rate and feature coverage by scenario.
- Test stopped/slow motion, fast vibration, and aggressive turns.
- Include HDR, night, glare, LED flicker, rain, and spray.
- Compare against frame VIO under matched motion.
- Verify estimator covariance during low-event intervals.
- Confirm dynamic-object gating under moving vehicles and people.
- Test runtime on target compute with realistic event rates.
- Validate recovery after event burst saturation.
Sources
- Niu et al., "ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras," arXiv/T-RO, 2024/2025: https://arxiv.org/abs/2410.09374
- ESVO2 official implementation: https://github.com/NAIL-HNU/ESVO2
- Guan et al., "EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping," arXiv, 2023/2024: https://arxiv.org/abs/2312.11911
- EVI-SAM project page: https://kwanwaipang.github.io/EVI-SAM/
- Guan et al., "PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with Point and Line Features," arXiv, 2022: https://arxiv.org/abs/2209.12160
- PL-EVIO official implementation: https://github.com/arclab-hku/PL-EVIO_open
- Chen et al., "ESVIO: Event-based Stereo Visual Inertial Odometry," arXiv, 2022: https://arxiv.org/abs/2212.13184
- Zhu et al., "Event-based Visual Inertial Odometry," CVPR, 2017: https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_Event-Based_Visual_Inertial_CVPR_2017_paper.pdf