MSC-Bench

What It Is

  • MSC-Bench is a multi-sensor corruption benchmark for autonomous driving perception.
  • The full paper title is "MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception."
  • It evaluates robustness for both 3D object detection and HD map construction.
  • The benchmark focuses on camera-LiDAR fusion systems.
  • It was released as a 2025 arXiv work with a public project page.
  • The benchmark exists because clean nuScenes validation scores hide severe degradation under sensor faults.

Core Technical Idea

  • Define a controlled set of camera and LiDAR corruptions.
  • Apply corruptions individually and concurrently to multi-sensor inputs (a minimal sketch follows this list).
  • Evaluate the same models on clean data and under increasing corruption severity.
  • Compare not only accuracy but also resilience across corruption families.
  • Cover weather, interior sensor effects, and sensor failure scenarios.
  • Expose which fusion designs rely too strongly on one sensor or on perfect synchronization.
  • Extend robustness evaluation beyond a single task by including HD map construction.
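
A minimal sketch of the apply-and-evaluate idea, using toy stand-ins for two corruptions. The real MSC-Bench transforms are more sophisticated, and the function names here (add_fog, drop_lidar_points) are illustrative, not from the benchmark:

```python
import numpy as np

def add_fog(image: np.ndarray, severity: int) -> np.ndarray:
    """Toy fog: blend the image toward a gray veil; density grows with severity (1-5)."""
    t = 0.15 * severity
    veil = np.full_like(image, 200.0)
    return (1.0 - t) * image + t * veil

def drop_lidar_points(points: np.ndarray, severity: int, rng=None) -> np.ndarray:
    """Toy incomplete echo: randomly discard returns, more aggressively at higher severity."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(len(points)) > 0.1 * severity
    return points[keep]

# One nuScenes-style sample: six camera images plus a LiDAR point cloud.
sample = {
    "images": [np.random.rand(64, 64, 3) * 255 for _ in range(6)],
    "points": np.random.rand(1000, 4),  # x, y, z, intensity
}

# Apply camera and LiDAR corruptions concurrently at one severity level.
severity = 3
corrupted = {
    "images": [add_fog(img, severity) for img in sample["images"]],
    "points": drop_lidar_points(sample["points"], severity),
}
```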

Inputs and Outputs

  • Inputs are nuScenes-style multi-camera and LiDAR samples.
  • Each corruption is specified by its type, its severity level, and whether it applies to one sensor or several (see the configuration sketch after this list).
  • The benchmark produces corrupted perception datasets and evaluation configs.
  • For 3D object detection, outputs are NDS, mAP, and related detection metrics.
  • For HD map construction, outputs are AP metrics over map element classes.
  • The project also reports resilience scores and relative resilience scores.
  • The benchmark assumes models already support camera, LiDAR, or camera-LiDAR inputs.
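
One way to express a corruption condition as a config object. CorruptionSpec and its fields are hypothetical names for illustration, not the benchmark's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorruptionSpec:
    """One evaluation condition: what to corrupt, how hard, and on which sensors."""
    name: str               # e.g. "fog", "incomplete_echo"
    severity: int           # discrete level, e.g. 0 (clean) to 5 (severe)
    modalities: tuple = ()  # sensors affected: "camera", "lidar", or both
    cameras: tuple = ()     # optional subset of camera channels

# A clean baseline plus single- and multi-sensor conditions.
conditions = [
    CorruptionSpec("clean", severity=0),
    CorruptionSpec("fog", severity=3, modalities=("camera", "lidar")),
    CorruptionSpec("camera_crash", severity=1, modalities=("camera",),
                   cameras=("CAM_FRONT",)),
]
```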

Architecture or Benchmark Protocol

  • MSC-Bench defines 16 corruption conditions: the ten individual types listed below plus six combined scenarios.
  • The individual corruptions are fog, snow, motion blur, spatial misalignment, temporal misalignment, camera crash, frame lost, cross-sensor, crosstalk, and incomplete echo.
  • Combined scenarios pair camera crash or frame lost with cross-sensor, crosstalk, or incomplete echo, giving the six two-sensor failure cases (see the sketch after this list).
  • Corruptions are grouped into weather, interior, and sensor failure categories.
  • The 3D object detection benchmark reports NDS over all corruption types and severities.
  • The HD map benchmark reports mAP and class-specific map AP variants.
  • Relative robustness visualizations compare methods to BEVFusion for detection and MapTR for maps.
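
The taxonomy above can be written down as data. The category assignments below (especially where motion blur and the misalignments land) are assumptions; the official benchmark configs are authoritative:

```python
# Assumed grouping of the corruption names listed above.
TAXONOMY = {
    "weather": ["fog", "snow"],
    "interior": ["motion_blur", "spatial_misalignment", "temporal_misalignment"],
    "sensor_failure": ["camera_crash", "frame_lost",
                       "cross_sensor", "crosstalk", "incomplete_echo"],
}

# Combined scenarios pair a camera failure with a LiDAR failure.
COMBINED = [(cam, lidar)
            for cam in ("camera_crash", "frame_lost")
            for lidar in ("cross_sensor", "crosstalk", "incomplete_echo")]

# 10 individual types + 6 combinations = the 16 conditions cited above.
assert sum(len(v) for v in TAXONOMY.values()) + len(COMBINED) == 16
```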

Training and Evaluation

  • The paper evaluates six 3D object detection models.
  • It also evaluates four HD map construction models.
  • Baselines are grouped by input modality, backbone, BEV encoder, and image resolution.
  • Evaluation is performed on the official nuScenes validation set with corruptions applied.
  • Clean-set performance is reported alongside corrupted performance.
  • A Resilience Score is computed from metric degradation across corruption types and severity levels (a hedged formulation is sketched after this list).
  • The key finding is substantial degradation under adverse weather and sensor failure, even for strong fusion models.
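
The paper's exact Resilience Score definition should be taken from the source; the formulation below is an assumed common variant (mean corrupted metric over all type-severity conditions, normalized by clean performance), useful only as a placeholder:

```python
from statistics import mean

def resilience_score(clean: float, corrupted: dict) -> float:
    """Assumed formulation: average the corrupted metric over every
    (corruption type, severity) condition, then normalize by the clean
    metric. 1.0 means no degradation; lower means less resilient."""
    per_type = [mean(by_severity.values()) for by_severity in corrupted.values()]
    return mean(per_type) / clean

# Toy NDS numbers for one detector (illustrative, not from the paper).
clean_nds = 0.70
corrupted_nds = {
    "fog":  {1: 0.62, 3: 0.55, 5: 0.41},
    "snow": {1: 0.58, 3: 0.49, 5: 0.35},
}
print(f"RS = {resilience_score(clean_nds, corrupted_nds):.3f}")

# A relative variant divides a model's score by a baseline's, mirroring
# the BEVFusion/MapTR comparisons noted above.
baseline_rs = 0.65  # placeholder value
print(f"relative RS = {resilience_score(clean_nds, corrupted_nds) / baseline_rs:.3f}")
```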

Strengths

  • Covers both object detection and map construction, two core AV perception outputs.
  • Includes concurrent corruptions, not just one sensor fault at a time.
  • The project page makes the benchmark definition and corruption examples easy to inspect.
  • Resilience scores help compare methods with different clean-set accuracy.
  • It directly tests camera-LiDAR assumptions such as synchronization and sensor completeness.
  • It is useful for regression testing because corruption types and severities are repeatable.

Failure Modes

  • It is still a corruption benchmark built on an existing road dataset.
  • Synthetic corruptions can miss real sensor physics, maintenance issues, and weather artifacts.
  • nuScenes geography and object classes do not cover airside operations.
  • HD map construction metrics do not directly measure apron lane markings, stand boundaries, or aircraft service zones.
  • The benchmark may overemphasize modeled corruption families and underemphasize rare real failures.
  • A high resilience score does not guarantee safe planning behavior.
  • It does not replace field logging of actual sensor faults.

Airside AV Fit

  • MSC-Bench is useful as a template for airside perception robustness test matrices.
  • Its combined failure cases map to realistic apron events such as fog plus camera dropout or vibration plus frame loss.
  • The HD map portion is relevant to stand markings, stop lines, service roads, and no-go zones.
  • Airside adaptation should add apron-specific classes, reflective aircraft, low-profile tow bars, and ground crew PPE.
  • Weather corruptions should include de-icing spray, jet exhaust shimmer, wet pavement glare, and nighttime floodlights.
  • Use MSC-Bench-style scoring to decide when autonomy should degrade to reduced speed or teleoperation, as in the policy sketch below.
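
An illustrative degradation policy, not from the paper; the thresholds are placeholders that would have to come from a safety analysis, not from this sketch:

```python
def autonomy_mode(resilience: float, condition_active: bool) -> str:
    """Map the fielded model's benchmark score for the currently detected
    condition (e.g. fog) to an operating mode. Thresholds are placeholders."""
    if not condition_active or resilience >= 0.85:
        return "nominal"
    if resilience >= 0.60:
        return "reduced_speed"
    return "teleoperation"

# Example: fog is detected, and the model scored RS = 0.55 under fog offline.
print(autonomy_mode(0.55, condition_active=True))  # -> "teleoperation"
```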

Implementation Notes

  • Treat MSC-Bench as an offline validation suite, not a training-only augmentation set.
  • Report clean accuracy and the resilience score together; either alone is misleading.
  • Run per-class and per-range breakdowns, because small objects are often safety-critical.
  • Add planner-facing metrics such as false negatives inside braking distance (a toy version is sketched after this list).
  • If reproducing, pin the benchmark version because corruption definitions may evolve.
  • For airside transfer, clone the protocol rather than assuming nuScenes corruptions are sufficient.
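
A toy version of such a planner-facing metric: count missed ground-truth objects inside a kinematic braking distance. The center-distance matching rule and the deceleration value are assumptions for illustration only:

```python
import math

def braking_distance(speed_mps: float, decel_mps2: float = 3.0) -> float:
    """Simple kinematic stopping distance; the deceleration is a placeholder."""
    return speed_mps ** 2 / (2.0 * decel_mps2)

def fn_inside_braking_distance(gt_centers, det_centers, speed_mps,
                               match_radius=2.0):
    """Count ground-truth objects within braking distance that no detection
    center falls within match_radius of (crude BEV center matching)."""
    horizon = braking_distance(speed_mps)
    misses = 0
    for gx, gy in gt_centers:
        if math.hypot(gx, gy) > horizon:
            continue  # outside the safety-relevant range
        matched = any(math.hypot(gx - dx, gy - dy) <= match_radius
                      for dx, dy in det_centers)
        misses += not matched
    return misses

# Toy example: one nearby pedestrian, no detections, ego at 10 m/s.
print(fn_inside_braking_distance([(8.0, 0.0)], [], speed_mps=10.0))  # -> 1
```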

Sources

Notes compiled from the MSC-Bench arXiv paper (2025) and its public project page.