MultiCorrupt
What It Is
- MultiCorrupt is a multi-modal robustness dataset and benchmark for LiDAR-camera 3D object detection.
- The full paper title is "MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection."
- It was presented at IEEE Intelligent Vehicles Symposium 2024.
- The benchmark is built around corrupted nuScenes data generation.
- It tests how fusion detectors respond to weather, sensor loss, sampling issues, and alignment errors.
- The official repository includes generation code, benchmark material, and dataset instructions.
Core Technical Idea
- Start from clean nuScenes samples.
- Generate repeatable camera and LiDAR corruptions at three severity levels.
- Evaluate strong camera-LiDAR fusion detectors under each corruption.
- Analyze which fusion strategies remain robust to which perturbation types.
- Cover both sensor content corruption and sensor relationship corruption.
- Make the generation pipeline open so teams can compile corrupted datasets locally.
- Use the benchmark to expose brittle dependence on dense LiDAR, complete camera coverage, and precise calibration.
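The corruption grid implied above (ten types, three severities) can be enumerated as a small sketch. The directory naming scheme and the `output_dirs` helper are illustrative assumptions, not the repository's actual output layout:

```python
from itertools import product
from pathlib import Path

# The ten corruption types listed in the MultiCorrupt repository.
CORRUPTIONS = [
    "missing_camera", "motion_blur", "points_reducing", "snow",
    "temporal_misalignment", "spatial_misalignment", "beams_reducing",
    "brightness", "dark", "fog",
]
SEVERITIES = [1, 2, 3]

def output_dirs(root: str) -> list[Path]:
    """Enumerate one output directory per (corruption, severity) pair.

    Directory naming is a hypothetical convention for this sketch.
    """
    return [Path(root) / f"{c}_sev{s}" for c, s in product(CORRUPTIONS, SEVERITIES)]

dirs = output_dirs("multicorrupt")
# 10 corruption types x 3 severity levels = 30 corrupted dataset variants
```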
Inputs and Outputs
- Inputs are clean nuScenes data, selected corruption type, severity level, and random seed where applicable.
- Camera corruptions modify image data.
- LiDAR corruptions modify point cloud data and sweeps.
- Outputs are corrupted nuScenes-format datasets.
- Benchmark outputs are detection scores such as mAP and the nuScenes Detection Score (NDS).
- The dataset supports model-by-model robustness comparisons.
- It does not output a new perception architecture; it is an evaluation and data-generation resource.
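The input interface above (corruption type, severity level, seed) can be sketched with a minimal seeded corruption. The `reduce_points` function and its keep ratios are illustrative assumptions; the real corruptions operate on nuScenes images and point cloud files in the repository's converter scripts:

```python
import random

def reduce_points(points, severity, rng):
    """Randomly drop a severity-dependent fraction of LiDAR points.

    The keep ratios per severity level are illustrative, not the
    benchmark's actual parameters.
    """
    keep_frac = {1: 0.75, 2: 0.50, 3: 0.25}[severity]
    return [p for p in points if rng.random() < keep_frac]

def generate_corrupted(points, corruption, severity, seed=0):
    """Apply a corruption repeatably: same seed -> same corrupted output."""
    rng = random.Random(seed)
    return corruption(points, severity, rng)

clean = [(float(i), 0.0, 0.0) for i in range(1000)]
corrupted = generate_corrupted(clean, reduce_points, severity=3, seed=42)
```

Seeding the RNG per sample is what makes the corrupted datasets reproducible across machines and runs.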
Architecture or Benchmark Protocol
- MultiCorrupt defines ten distinct corruption types.
- The repository lists missing camera, motion blur, points reducing, snow, temporal misalignment, spatial misalignment, beams reducing, brightness, dark, and fog.
- Each corruption is provided at severity levels 1, 2, and 3.
- Image corruptions can be generated with img_converter.py.
- LiDAR corruptions can be generated with the corresponding point cloud conversion scripts.
- Snow simulation uses LiDAR snow simulation resources.
- The repository supports Hugging Face dataset download and local compilation from nuScenes.
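As one concrete example of the LiDAR-side corruptions listed above, beams reducing drops entire scan rings rather than individual points. This sketch assumes a simple (x, y, z, ring) point layout and hypothetical beams-kept counts per severity; the real implementation works on nuScenes point cloud binaries:

```python
import random

def reduce_beams(points, severity, rng):
    """Keep only a severity-dependent subset of LiDAR beams (ring indices).

    `points` is a list of (x, y, z, ring) tuples. The number of beams kept
    per severity is an assumed mapping for a 32-beam sensor, not the
    benchmark's exact configuration.
    """
    beams_kept = {1: 24, 2: 16, 3: 8}[severity]
    rings = sorted({p[3] for p in points})
    kept_rings = set(rng.sample(rings, min(beams_kept, len(rings))))
    return [p for p in points if p[3] in kept_rings]

# 32 rings with 10 points each, as a stand-in for a real sweep.
clean = [(0.0, 0.0, 0.0, r) for r in range(32) for _ in range(10)]
sparse = reduce_beams(clean, severity=3, rng=random.Random(7))
```

Dropping whole beams produces the structured vertical sparsity of a lower-resolution sensor, which stresses detectors differently than uniform random point dropout.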
Training and Evaluation
- The paper evaluates five state-of-the-art multi-modal 3D detectors.
- Models are tested on corrupted validation data to measure robustness by corruption type.
- The benchmark is intended for evaluation, but the generated corruptions can also be used for robustness training.
- The analysis compares resistance ability, i.e. how much of the clean-data performance each detector retains under corruption, across fusion strategies.
- The protocol highlights that robustness is corruption-specific; a model can be strong for one failure and weak for another.
- nuScenes format compatibility reduces integration friction for existing mmdetection3d and OpenPCDet workflows.
- Evaluation should also report clean nuScenes scores, so that robustness gains are not rewarded when they come at the cost of unacceptable clean accuracy.
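A resistance-ability-style metric can be sketched as the corrupted score averaged over severity levels and normalized by the clean score. This is a sketch of the idea; consult the paper for the exact definition and any baseline-relative variants. The NDS values below are made up for illustration:

```python
def resistance_ability(corrupted_scores, clean_score):
    """Mean corrupted metric over severity levels, divided by the clean
    metric. A value near 1.0 means performance is largely retained under
    the corruption; lower values indicate brittleness.
    """
    return sum(corrupted_scores) / (len(corrupted_scores) * clean_score)

# Illustrative (invented) NDS values for one detector under fog, severities 1-3.
clean_nds = 0.70
fog_nds = [0.63, 0.56, 0.42]
ra = resistance_ability(fog_nds, clean_nds)
```

Reporting this per corruption type (rather than one aggregate number) is what surfaces the corruption-specific robustness the protocol highlights.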
Strengths
- The corruption list is practical and close to common sensor degradation modes.
- Three severity levels make it useful for plotting performance-degradation curves as corruption intensity increases.
- Open generation code is valuable for reproducibility and local adaptation.
- It covers both spatial and temporal misalignment, which are common silent fusion failures.
- It is narrower than MSC-Bench but deeper for LiDAR-camera 3D detection.
- It can be plugged into many existing nuScenes training and evaluation stacks.
Failure Modes
- It inherits nuScenes road-scene bias and object taxonomy.
- Simulated weather and sensor faults may not match specific hardware physics.
- It does not include radar, thermal, or infrastructure sensors.
- Severity levels are benchmark-defined and may not correspond to physical measurements such as mm/hour rain or lens occlusion percentage.
- It can underrepresent correlated real failures, for example rain plus glare plus dirty lens plus LiDAR spray.
- A detector that performs well on MultiCorrupt can still fail on real apron sensor anomalies.
- Download and dataset compilation require substantial storage and compute.
Airside AV Fit
- MultiCorrupt is directly useful for first-pass robustness testing of camera-LiDAR airside perception.
- Missing camera, dark, fog, snow, beam reduction, and temporal misalignment map to apron operations.
- Points reducing and beams reducing are relevant to spray, dust, de-icing fluid, and partial LiDAR obstruction.
- Airside adaptation should add aircraft occlusion, reflective metal surfaces, ground markings, cones, chocks, tow bars, and personnel.
- It should be extended with radar and real airport weather captures for production safety evidence.
- Severity levels should be calibrated against actual airport sensor logs where possible.
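Calibrating severity levels against real airport sensor logs could start with a simple lookup from a measured quantity to a severity level. The visibility thresholds below are hypothetical placeholders to be fitted against actual apron weather data, not values from MultiCorrupt or any aviation standard:

```python
def visibility_to_severity(visibility_m: float) -> int:
    """Map measured fog visibility (meters, from airport weather logs) to a
    MultiCorrupt-style severity level. 0 means treat the sample as clean.

    All thresholds are placeholder assumptions pending calibration.
    """
    if visibility_m >= 1000:
        return 0
    if visibility_m >= 400:
        return 1
    if visibility_m >= 150:
        return 2
    return 3
```

A fitted version of this mapping lets corrupted-benchmark results be read back in operational terms ("severity 2 fog" becomes "visibility roughly 150-400 m").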
Implementation Notes
- Use the official repository scripts rather than reimplementing corruptions by hand.
- Pin the nuScenes version, corruption severity, and random seeds for reproducible regression tests.
- Store corrupted datasets separately from clean nuScenes to avoid accidental training contamination.
- Track per-sensor availability and corruption metadata alongside each evaluation sample.
- Add airside-specific corruption modules only after baseline MultiCorrupt results are reproducible.
- Use the benchmark as a gate for fusion changes that claim improved robustness.
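The per-sample metadata tracking suggested above can be a small frozen record stored next to each evaluation result. The field names are suggestions for a local workflow, not part of MultiCorrupt itself:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CorruptionRecord:
    """Corruption provenance to log alongside each evaluation sample.

    Field names and structure are a local convention (assumption), chosen so
    regression runs can be reproduced exactly from pinned inputs.
    """
    sample_token: str      # nuScenes sample token
    corruption: str        # e.g. "fog", "beams_reducing", or "clean"
    severity: int          # 0 for clean, 1-3 for corrupted variants
    seed: int              # RNG seed used during generation
    nuscenes_version: str  # pinned dataset version, e.g. "v1.0-trainval"

rec = CorruptionRecord("abc123", "fog", 3, 42, "v1.0-trainval")
row = asdict(rec)  # ready to write to JSON lines or a results table
```

Freezing the dataclass keeps records hashable and prevents accidental mutation after they have been attached to a result.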
Sources
- arXiv paper: https://arxiv.org/abs/2402.11677
- Official implementation: https://github.com/ika-rwth-aachen/MultiCorrupt
- RWTH Aachen project page: https://www.ika.rwth-aachen.de/en/competences/publications/presentations-and-articles/vehicle-intelligence-automated-driving/multicorrupt-dataset.html