HoloVIC

What It Is

HoloVIC is a CVPR 2024 dataset and benchmark for large-scale, holistic vehicle-infrastructure cooperative perception.

It is built around intelligent traffic intersections instrumented with many synchronized roadside sensors.

The benchmark emphasizes a global, multi-sensor view rather than a single ego vehicle.

The dataset includes monocular cameras, fisheye cameras, and LiDAR across multiple intersections.

It supports detection, tracking, and vehicle-infrastructure cooperative perception tasks.

Core Technical Idea

HoloVIC treats an intersection as a sensor network.

Each roadside node observes only a partial view, but the full intersection can be reconstructed by combining synchronized sensors.

The central idea is "holographic" perception: produce a consistent global scene representation from heterogeneous infrastructure sensors.

The benchmark stresses:

  • Wide-area coverage.
  • Multi-sensor calibration.
  • Cross-camera and cross-LiDAR consistency.
  • Global object identity across devices and time.
  • Cooperative perception between infrastructure and vehicle viewpoints.

This makes HoloVIC closer to a deployed traffic infrastructure system than a small paired-agent V2X dataset.
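The core fusion idea above can be sketched in a few lines: transform each sensor's detections into a shared world frame, then merge detections that fall close together so one physical object gets one global record. This is a minimal illustration, not HoloVIC's actual pipeline; the detection format, transform convention, and distance threshold are all assumptions.

```python
import numpy as np

def to_world(points_sensor, R, t):
    """Transform Nx3 sensor-frame points into the shared world frame
    using a per-sensor extrinsic rotation R (3x3) and translation t (3,)."""
    return points_sensor @ R.T + t

def merge_detections(per_sensor_dets, dist_thresh=1.0):
    """Greedy merge of world-frame detection centers from several sensors.

    per_sensor_dets: list of (N_i, 3) arrays, one per roadside sensor,
    already expressed in the world frame. Centers closer than dist_thresh
    metres are treated as the same physical object."""
    merged = []  # list of (center, support_count)
    for dets in per_sensor_dets:
        for c in dets:
            for i, (m, n) in enumerate(merged):
                if np.linalg.norm(c - m) < dist_thresh:
                    # Running average keeps the merged center stable.
                    merged[i] = ((m * n + c) / (n + 1), n + 1)
                    break
            else:
                merged.append((c.copy(), 1))
    return np.array([m for m, _ in merged])
```

A real system would merge full 3D boxes with class-aware association rather than raw centers, but the structure — per-sensor partial views combined into one global scene — is the same.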

Inputs and Outputs

Inputs:

  • Roadside RGB camera streams.
  • Roadside fisheye camera streams.
  • Roadside LiDAR point clouds.
  • Calibration between sensors and global coordinates.
  • Time-synchronized frames.
  • Optional vehicle-side observations for VIC settings.

Outputs:

  • Monocular 3D detections.
  • LiDAR 3D detections.
  • Multi-sensor 3D detections.
  • Multi-object tracks with global identities.
  • Vehicle-infrastructure cooperative detections.

The annotation space is road-intersection-centric, not airport-centric.
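The inputs above naturally group into one synchronized, calibrated record per timestamp. A minimal sketch of such a record follows; the field names and layout are illustrative assumptions, not HoloVIC's actual schema.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SensorCalibration:
    """Extrinsics mapping sensor coordinates to the intersection's
    global frame. Fields here are illustrative, not HoloVIC's format."""
    rotation: np.ndarray     # 3x3 rotation, sensor -> world
    translation: np.ndarray  # (3,) translation, sensor -> world

@dataclass
class SyncedFrame:
    """One time-synchronized multi-sensor observation of an intersection."""
    timestamp: float
    rgb_images: dict       # camera_id -> HxWx3 image array
    fisheye_images: dict   # fisheye_id -> HxWx3 image array
    lidar_points: dict     # lidar_id -> Nx3 point array
    calib: dict            # sensor_id -> SensorCalibration
    vehicle_obs: dict = field(default_factory=dict)  # optional VIC inputs
```

Keeping calibration attached to every frame (rather than loading it once globally) makes held-out-intersection evaluation and per-sensor ablations easier later.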

Architecture or Benchmark Protocol

HoloVIC is a dataset and benchmark suite, not a single model.

The protocol separates tasks so methods can be evaluated by sensing mode and cooperation level.

Representative task families:

  • Monocular 3D object detection.
  • LiDAR 3D object detection.
  • Multi-sensor detection and tracking.
  • Vehicle-infrastructure cooperative perception.

The benchmark uses globally consistent annotations so predictions from different sensors can be compared in one frame.

This is important for measuring whether a cooperative model improves global intersection perception rather than only local agent perception.
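Comparing predictions from different sensors in one frame requires expressing every box in the global intersection frame. A minimal sketch, assuming a 4x4 homogeneous extrinsic per sensor and yaw-only heading (both simplifying assumptions):

```python
import numpy as np

def box_to_world(center, yaw, T_world_sensor):
    """Express a 3D box (center + yaw heading) given in a sensor frame
    in the global intersection frame, via a 4x4 extrinsic T_world_sensor."""
    R, t = T_world_sensor[:3, :3], T_world_sensor[:3, 3]
    center_w = R @ np.asarray(center) + t
    # Heading shifts by the yaw component of the extrinsic rotation
    # (valid when the rotation is about the vertical axis).
    yaw_w = yaw + np.arctan2(R[1, 0], R[0, 0])
    return center_w, yaw_w
```

Once all predictions live in this shared frame, a cooperative method and a single-sensor baseline can be scored against the same globally consistent annotations.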

Training and Evaluation

Training uses synchronized multi-sensor sequences with calibration and 3D annotations.

Evaluation reports detection and tracking metrics appropriate to each task.

Typical signals include:

  • 3D detection AP.
  • Orientation-aware detection metrics for camera tasks.
  • MOTA, IDF1, or related identity-continuity metrics for tracking.
  • Performance differences between single-sensor, multi-sensor, and cooperative modes.

HoloVIC also supports held-out intersection evaluation, which is useful for measuring deployment transfer across sensor layouts.
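Of the tracking metrics listed above, MOTA has the simplest closed form: one minus the total error count over total ground truth. A minimal sketch from per-frame counts (the input format is an assumption for illustration; real evaluation toolkits also handle the matching step that produces these counts):

```python
def mota(frames):
    """Minimal MOTA over per-frame error counts.

    frames: iterable of dicts with keys
      'gt'   - number of ground-truth objects in the frame,
      'fn'   - missed objects,
      'fp'   - false positives,
      'idsw' - identity switches.
    MOTA = 1 - (sum FN + FP + IDSW) / (sum GT)."""
    gt = sum(f['gt'] for f in frames)
    err = sum(f['fn'] + f['fp'] + f['idsw'] for f in frames)
    return 1.0 - err / gt
```

Note that MOTA can go negative when errors exceed ground-truth objects, which is why identity-focused metrics like IDF1 are usually reported alongside it.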

Strengths

  • Large-scale real-world infrastructure dataset.
  • Multiple intersections with different sensor layouts.
  • Heterogeneous sensor coverage, including fisheye cameras.
  • Global tracking annotations make it more useful for operations than frame-only detection.
  • Supports both infrastructure-only and vehicle-infrastructure cooperative evaluation.
  • Better match to fixed-site perception than vehicle-only datasets such as nuScenes or the Waymo Open Dataset.

Failure Modes

  • Global performance can hide local blind spots near conflict points.
  • Sensor synchronization quality constrains tracking and fusion accuracy.
  • Fisheye and camera calibration errors can produce systematic 3D localization bias.
  • Road traffic classes and behaviors do not cover aircraft or ramp-service operations.
  • Intersections have clearer lane structure than apron stands.
  • Privacy processing or visual degradation may affect camera-only baselines.

Airside AV Fit

HoloVIC is highly relevant for instrumented airport stands and service-road intersections.

The key transferable idea is a fixed multi-sensor site model with global identities.

Airside use cases:

  • Holographic stand monitoring from terminal, mast, and jet-bridge sensors.
  • Long-lived tracking of GSE around an aircraft turnaround.
  • Cooperative perception between an autonomous baggage tractor and stand infrastructure.
  • Global occupancy and object state for ramp-control dashboards.

For safety use, HoloVIC would need an airport version with surveyed stand frames, aircraft geometry, service-equipment classes, worker pose or visibility tags, and night/weather splits.

Implementation Notes

  • Treat HoloVIC as a benchmark pattern for fixed-site perception, not as a drop-in airport model.
  • Preserve per-sensor calibration metadata in any derived data loader.
  • Do not collapse all camera views before checking which sensor actually contributes to each object.
  • For airside, add per-zone metrics: under-wing area, equipment staging area, stand lead-in line, pushback lane, and service-road crossing.
  • Use tracking IDs to evaluate handoff between infrastructure and onboard perception.
  • Add latency and dropped-frame fields if adapting HoloVIC to real-time airport monitoring.
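The per-zone metrics note above can be made concrete with a small bucketing sketch: assign each ground-truth object to a stand zone, then report recall per zone so misses under the wing are not hidden by easy detections on the open apron. Zone names, axis-aligned extents, and the match format are all illustrative assumptions.

```python
def zone_of(x, y, zones):
    """Return the first zone whose axis-aligned box contains (x, y).

    zones: dict name -> (xmin, ymin, xmax, ymax) in stand coordinates.
    Names and extents here are illustrative, not from HoloVIC."""
    for name, (x0, y0, x1, y1) in zones.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return "other"

def per_zone_recall(matches, zones):
    """matches: list of (x, y, detected: bool) ground-truth objects.
    Returns recall per zone."""
    hits, totals = {}, {}
    for x, y, detected in matches:
        z = zone_of(x, y, zones)
        totals[z] = totals.get(z, 0) + 1
        hits[z] = hits.get(z, 0) + int(detected)
    return {z: hits[z] / totals[z] for z in totals}
```

The same bucketing applies to precision, ID switches, or latency, and zones can be refined into the under-wing, staging, lead-in, pushback, and service-road regions listed above.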

Sources

Research notes compiled from public sources.