HoloVIC

What It Is

HoloVIC is a CVPR 2024 dataset and benchmark for large-scale, holistic vehicle-infrastructure cooperative perception.

It is built around intelligent traffic intersections instrumented with many synchronized roadside sensors.

The benchmark emphasizes a global, multi-sensor view rather than a single ego vehicle.

The dataset includes monocular cameras, fisheye cameras, and LiDAR across multiple intersections.

It supports detection, tracking, and vehicle-infrastructure cooperative perception tasks.

Core Technical Idea

HoloVIC treats an intersection as a sensor network.

Each roadside node observes only a partial view, but the full intersection can be reconstructed by combining synchronized sensors.

The central idea is "holographic" perception: produce a consistent global scene representation from heterogeneous infrastructure sensors.

The benchmark stresses:

  • Wide-area coverage.
  • Multi-sensor calibration.
  • Cross-camera and cross-LiDAR consistency.
  • Global object identity across devices and time.
  • Cooperative perception between infrastructure and vehicle viewpoints.

This makes HoloVIC closer to a deployed traffic infrastructure system than a small paired-agent V2X dataset.
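The core fusion idea above can be sketched in a few lines: transform each sensor's detections into a shared world frame, then merge detections that fall close together so one physical object gets one global record. This is a minimal illustration, not HoloVIC's actual pipeline; the detection format, transform convention, and distance threshold are all assumptions.

```python
import numpy as np

def to_world(points_sensor, R, t):
    """Transform Nx3 sensor-frame points into the shared world frame
    using a per-sensor extrinsic rotation R (3x3) and translation t (3,)."""
    return points_sensor @ R.T + t

def merge_detections(per_sensor_dets, dist_thresh=1.0):
    """Greedy merge of world-frame detection centers from several sensors.

    per_sensor_dets: list of (N_i, 3) arrays, one per roadside sensor,
    already expressed in the world frame. Centers closer than dist_thresh
    metres are treated as the same physical object."""
    merged = []  # list of (center, support_count)
    for dets in per_sensor_dets:
        for c in dets:
            for i, (m, n) in enumerate(merged):
                if np.linalg.norm(c - m) < dist_thresh:
                    # Running average keeps the merged center stable.
                    merged[i] = ((m * n + c) / (n + 1), n + 1)
                    break
            else:
                merged.append((c.copy(), 1))
    return np.array([m for m, _ in merged])
```

A real system would merge full 3D boxes with class-aware association rather than raw centers, but the structure — per-sensor partial views combined into one global scene — is the same.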

Inputs and Outputs

Inputs:

  • Roadside RGB camera streams.
  • Roadside fisheye camera streams.
  • Roadside LiDAR point clouds.
  • Calibration between sensors and global coordinates.
  • Time-synchronized frames.
  • Optional vehicle-side observations for VIC settings.

Outputs:

  • Monocular 3D detections.
  • LiDAR 3D detections.
  • Multi-sensor 3D detections.
  • Multi-object tracks with global identities.
  • Vehicle-infrastructure cooperative detections.

The annotation space is road-intersection-centric, not airport-centric.
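The inputs above naturally group into one synchronized, calibrated record per timestamp. A minimal sketch of such a record follows; the field names and layout are illustrative assumptions, not HoloVIC's actual schema.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SensorCalibration:
    """Extrinsics mapping sensor coordinates to the intersection's
    global frame. Fields here are illustrative, not HoloVIC's format."""
    rotation: np.ndarray     # 3x3 rotation, sensor -> world
    translation: np.ndarray  # (3,) translation, sensor -> world

@dataclass
class SyncedFrame:
    """One time-synchronized multi-sensor observation of an intersection."""
    timestamp: float
    rgb_images: dict       # camera_id -> HxWx3 image array
    fisheye_images: dict   # fisheye_id -> HxWx3 image array
    lidar_points: dict     # lidar_id -> Nx3 point array
    calib: dict            # sensor_id -> SensorCalibration
    vehicle_obs: dict = field(default_factory=dict)  # optional VIC inputs
```

Keeping calibration attached to every frame (rather than loading it once globally) makes held-out-intersection evaluation and per-sensor ablations easier later.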

Architecture or Benchmark Protocol

HoloVIC is a dataset and benchmark suite, not a single model.

The protocol separates tasks so methods can be evaluated by sensing mode and cooperation level.

Representative task families:

  • Monocular 3D object detection.
  • LiDAR 3D object detection.
  • Multi-sensor detection and tracking.
  • Vehicle-infrastructure cooperative perception.

The benchmark uses globally consistent annotations so predictions from different sensors can be compared in one frame.

This is important for measuring whether a cooperative model improves global intersection perception rather than only local agent perception.
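Comparing predictions from different sensors in one frame requires expressing every box in the global intersection frame. A minimal sketch, assuming a 4x4 homogeneous extrinsic per sensor and yaw-only heading (both simplifying assumptions):

```python
import numpy as np

def box_to_world(center, yaw, T_world_sensor):
    """Express a 3D box (center + yaw heading) given in a sensor frame
    in the global intersection frame, via a 4x4 extrinsic T_world_sensor."""
    R, t = T_world_sensor[:3, :3], T_world_sensor[:3, 3]
    center_w = R @ np.asarray(center) + t
    # Heading shifts by the yaw component of the extrinsic rotation
    # (valid when the rotation is about the vertical axis).
    yaw_w = yaw + np.arctan2(R[1, 0], R[0, 0])
    return center_w, yaw_w
```

Once all predictions live in this shared frame, a cooperative method and a single-sensor baseline can be scored against the same globally consistent annotations.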

Training and Evaluation

Training uses synchronized multi-sensor sequences with calibration and 3D annotations.

Evaluation reports detection and tracking metrics appropriate to each task.

Typical signals include:

  • 3D detection AP.
  • Orientation-aware detection metrics for camera tasks.
  • MOTA, IDF1, or related identity-continuity metrics for tracking.
  • Performance differences between single-sensor, multi-sensor, and cooperative modes.

HoloVIC also supports held-out intersection evaluation, which is useful for measuring deployment transfer across sensor layouts.
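Of the tracking metrics listed above, MOTA has the simplest closed form: one minus the total error count over total ground truth. A minimal sketch from per-frame counts (the input format is an assumption for illustration; real evaluation toolkits also handle the matching step that produces these counts):

```python
def mota(frames):
    """Minimal MOTA over per-frame error counts.

    frames: iterable of dicts with keys
      'gt'   - number of ground-truth objects in the frame,
      'fn'   - missed objects,
      'fp'   - false positives,
      'idsw' - identity switches.
    MOTA = 1 - (sum FN + FP + IDSW) / (sum GT)."""
    gt = sum(f['gt'] for f in frames)
    err = sum(f['fn'] + f['fp'] + f['idsw'] for f in frames)
    return 1.0 - err / gt
```

Note that MOTA can go negative when errors exceed ground-truth objects, which is why identity-focused metrics like IDF1 are usually reported alongside it.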

Strengths

  • Large-scale real-world infrastructure dataset.
  • Multiple intersections with different sensor layouts.
  • Heterogeneous sensor coverage, including fisheye cameras.
  • Global tracking annotations make it more useful for operations than frame-only detection.
  • Supports both infrastructure-only and vehicle-infrastructure cooperative evaluation.
  • Better match to fixed-site perception than vehicle-only datasets such as nuScenes or the Waymo Open Dataset.

Failure Modes

  • Global performance can hide local blind spots near conflict points.
  • Sensor synchronization quality constrains tracking and fusion accuracy.
  • Fisheye and camera calibration errors can produce systematic 3D localization bias.
  • Road traffic classes and behaviors do not cover aircraft or ramp-service operations.
  • Intersections have clearer lane structure than apron stands.
  • Privacy processing or visual degradation may affect camera-only baselines.

Airside AV Fit

HoloVIC is highly relevant for instrumented airport stands and service-road intersections.

The key transferable idea is a fixed multi-sensor site model with global identities.

Airside use cases:

  • Holographic stand monitoring from terminal, mast, and jet-bridge sensors.
  • Long-lived tracking of GSE around an aircraft turnaround.
  • Cooperative perception between an autonomous baggage tractor and stand infrastructure.
  • Global occupancy and object state for ramp-control dashboards.

For safety use, HoloVIC would need an airport version with surveyed stand frames, aircraft geometry, service-equipment classes, worker pose or visibility tags, and night/weather splits.

Implementation Notes

  • Treat HoloVIC as a benchmark pattern for fixed-site perception, not as a drop-in airport model.
  • Preserve per-sensor calibration metadata in any derived data loader.
  • Do not collapse all camera views before checking which sensor actually contributes to each object.
  • For airside, add per-zone metrics: under-wing area, equipment staging area, stand lead-in line, pushback lane, and service-road crossing.
  • Use tracking IDs to evaluate handoff between infrastructure and onboard perception.
  • Add latency and dropped-frame fields if adapting HoloVIC to real-time airport monitoring.
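The per-zone metrics note above can be made concrete with a small bucketing sketch: assign each ground-truth object to a stand zone, then report recall per zone so misses under the wing are not hidden by easy detections on the open apron. Zone names, axis-aligned extents, and the match format are all illustrative assumptions.

```python
def zone_of(x, y, zones):
    """Return the first zone whose axis-aligned box contains (x, y).

    zones: dict name -> (xmin, ymin, xmax, ymax) in stand coordinates.
    Names and extents here are illustrative, not from HoloVIC."""
    for name, (x0, y0, x1, y1) in zones.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return "other"

def per_zone_recall(matches, zones):
    """matches: list of (x, y, detected: bool) ground-truth objects.
    Returns recall per zone."""
    hits, totals = {}, {}
    for x, y, detected in matches:
        z = zone_of(x, y, zones)
        totals[z] = totals.get(z, 0) + 1
        hits[z] = hits.get(z, 0) + int(detected)
    return {z: hits[z] / totals[z] for z in totals}
```

The same bucketing applies to precision, ID switches, or latency, and zones can be refined into the under-wing, staging, lead-in, pushback, and service-road regions listed above.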

Sources

Research notes compiled from public sources.