Perception Coverage Audit and Backlog

The perception library is broad, but the May 2026 agent sweeps found several method families that are still hidden inside larger documents or missing as first-class pages. Treat this file as the backlog for turning perception research into a method-level library similar to the SLAM coverage audit.

Current Status

| Item | Status |
| --- | --- |
| Dedicated perception files | 27 top-level perception synthesis/audit files, 20 dataset/benchmark pages, and 93 atomic method files in Perception Method Library. |
| Strongest existing coverage | Production sensor suites, CenterPoint/OpenPCDet, BEV basics, method-level camera BEV/occupancy, LiDAR-camera occupancy fusion, dynamic occupancy/free-space, spatiotemporal occupancy flow, Gaussian/3DGS/4DGS perception, LiDAR MOS, scene flow, LiDAR artifact removal, learned denoising, adverse-weather datasets, FOD synthetic benchmarks, motion/static separation benchmarks, radar-camera/radar-LiDAR fusion, 4D radar-camera occupancy, radar/event/FMCW, open-world/OOD, open-vocabulary attributes, panoptic/open-vocabulary occupancy, robust fusion, V2X, latency, data engines, model compression, uncertainty, and thermal fusion. |
| Most severe structural gap | Remaining P0/P1 items still need atomic pages: DR-REMOVER/ExelMap-style map-change methods, newer cooperative compression, radar/occupancy follow-ons such as DepthOcc and LinkOcc, EvOcc/evidential occupancy, SAM4D-style multimodal stream segmentation, and public-data gaps for dust, de-icing mist, steam, glycol film, and wet-apron multipath. |
| How to use this audit | Use the method library for promoted atomic pages. Use the P0/P1/P2 rows below as discovery clusters and backlog for remaining splits. Update this audit whenever a missing perception method becomes a dedicated file. |

Multi-Agent Discovery Sweep (2026-05-08)

Six parallel research agents audited camera/BEV perception, LiDAR/radar/thermal perception, foundation/open-world perception, temporal tracking, cooperative/V2X perception, and deployment robustness. A second targeted sweep then checked 2025-2026 latest-method and benchmark gaps, followed by lightweight dataset scouts for adverse-weather, event, radar, FMCW LiDAR, V2X, OOD, and latency benchmarks.

Method-Level Promotion Wave (2026-05-08)

Five parallel writing agents converted the highest-value discovery clusters into atomic method pages under methods/. Existing broad perception files should now link into this library rather than expanding into larger catch-all documents.

| Cluster | Promoted method files |
| --- | --- |
| Camera BEV and occupancy | BEVDet, BEVDepth, BEVStereo, SOLOFusion, TPVFormer, SurroundOcc, SparseOcc, FlashOcc, SelfOcc, RenderOcc |
| LiDAR motion, radar, event, and FMCW | LiDAR-MOS, 4DMOS, InsMOS, StreamMOS, 4DSegStreamer, SegNet4D, Mask4D, Instantaneous Motion Perception, RadarPillars, K-Radar, V2X-Radar, Ev-3DOD, AevaScenes |
| Open-world and open-vocabulary | OpenAD, OP3Det, WildDet3D, DetAny3D, OW-OVD, Clipomaly, S2M, SAM 3, 3D-AVS, Mosaic3D, OpenVox |
| Robust fusion, calibration, and validation | MoME, GraphBEV, SOAC, RC-AutoCalib, ASF, MSC-Bench, MultiCorrupt, S2R-Bench, Occluded nuScenes, Conformal Boxes |
| Cooperative, latency, closed-loop, and data engines | RCooper, HoloVIC, CoInfra, V2X-ReaLO, CoHFF, CoSDH, CoopTrack, LASP, Fail2Drive, AIDE |

Deep-Dive Promotion Wave (2026-05-09)

Fresh parallel discovery and writing agents promoted the first Gaussian/4DGS and latest radar-camera/sparse-query gaps into atomic pages. The rows below are now the primary pages for those methods; older backlog rows remain as provenance for related follow-ons.

| Cluster | Promoted method files |
| --- | --- |
| Gaussian and 3DGS perception | SplatAD, GaussianFormer, GaussianOcc, Streaming Gaussian Occupancy |
| 4D occupancy and sparse-query perception | Cam4DOcc, StreamingFlow, Sparse4D |
| Radar-camera fusion | TacoDepth, RaCFormer |

LIORNet and Adverse-Weather Removal Wave (2026-05-09)

The next loop broadened the LIORNet question from one snow-removal method into a full removal stack: learned LiDAR denoising, classical outlier filters, weather artifact removal, ghost/multipath failure modes, dynamic map cleaning, validation evidence, and weather robustness datasets.

| Cluster | Promoted files |
| --- | --- |
| Learned LiDAR denoising/removal | LIORNet, LiSnowNet, SLiDE, TripleMixer, 3D-KNN Blind-Spot Desnowing, 3D-OutDet, AdverseNet, DenoiseCP-Net |
| Classical and broad artifact removal | Classical LiDAR Outlier Removal, LiDAR Weather Artifact Removal, LiDAR Artifact Removal Techniques, LiDAR Ghost and Multipath Artifacts |
| Weather robustness datasets | Weather Robustness Datasets, WADS, CADC/CADC+, SemanticSTF, REHEARSE-3D, RainSense, SemanticSpray, RADIATE, Seeing Through Fog/DENSE |
| Safety and map-cleaning bridge | LiDAR Artifact Removal Validation, ERASOR, Removert, LiDAR Map Cleaning and Dynamic Removal |
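
The classical outlier filters referenced in this wave share one pattern regardless of the specific promoted page. A minimal statistical outlier removal (SOR) sketch in NumPy, with illustrative values for `k` and `std_ratio` (this is the generic textbook filter, not any single method documented here):

```python
import numpy as np

def statistical_outlier_removal(points: np.ndarray, k: int = 8, std_ratio: float = 2.0) -> np.ndarray:
    """Drop points whose mean distance to their k nearest neighbors exceeds the
    global mean by more than std_ratio standard deviations (classic SOR filter).
    Brute-force O(N^2) distances for clarity; use a KD-tree for real scans."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                        # ignore self-distances
    knn_mean = np.sort(dists, axis=1)[:, :k].mean(axis=1)  # mean distance to k nearest neighbors
    threshold = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= threshold]

# Toy cloud: one dense structure plus a few isolated snow-like returns.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0.0, 0.2, size=(500, 3)),
                   rng.uniform(-5.0, 5.0, size=(10, 3))])
filtered = statistical_outlier_removal(cloud)
print(f"removed {cloud.shape[0] - filtered.shape[0]} points")
```

The learned removers listed above replace this fixed threshold with trained models; the validation and dataset rows exist to check that they do not delete real obstacles along with the noise.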

Dynamic/Static Removal and Flow Benchmark Wave (2026-05-09)

The latest loop broadened removal from "delete noisy LiDAR points" into map hygiene: dynamic objects, static objects that do not belong in the persistent map, moved objects, scene flow, moving/static separation, and validation against false deletion.

| Cluster | Promoted files |
| --- | --- |
| MOS and scene-flow methods | MotionSeg3D, MambaMOS, Neural Scene Flow Priors, Scene Flow for Dynamic Object Removal |
| Motion/static and 4D occupancy benchmarks | Moving/Static Separation MOS Datasets, Occupancy-Flow and 4D Occupancy Benchmarks, Scene-Flow Datasets and Benchmarks |
| Map-cleaning validation bridge | Dynamic Map Cleaning Benchmarks, Moved-Object and Map-Change Datasets, Airside Dynamic Map-Cleaning Benchmark |

Web Gap Expansion Wave (2026-05-09)

Five web-search scouts and six writing agents promoted a new batch of radar, neural-field, dataset, and validation gaps into first-class pages.

| Cluster | Promoted files |
| --- | --- |
| Radar, 4D radar, and FMCW perception | CVFusion, 4D Radar-Camera Occupancy, POD FMCW LiDAR Predictive Detection |
| Dynamic Gaussian, neural-field, and temporal occupancy | DrivingGaussian, HUGS, SplatFlow, DistillNeRF, TrackOcc, Cross-Domain LiDAR Scene Flow |
| World-model bridge | Self-Supervised Occupancy Flow, UniScene Occupancy-Centric Generation |
| Datasets and validation | MUSES, Sensor Corruption Robustness Benchmarks, Open-World OOD and Anomaly Segmentation Benchmarks, FOD and Airport Apron Detection Datasets, FOD Perception Validation, Knowledge-Base Evaluation Protocol |

Reliability and Open-World Gap Wave (2026-05-09)

Six fresh web-search scouts and six writing agents promoted another set of non-duplicate perception gaps into first-class files.

| Cluster | Promoted files |
| --- | --- |
| Occupancy, freespace, and adverse-weather fusion | LiDAR-Camera Occupancy Fusion, Dynamic Occupancy and Freespace, Spatiotemporal Memory Occupancy Flow, Adverse-Weather Radar-LiDAR 3D Detection, RobuRCDet, SAMFusion |
| Open-world and open-vocabulary coverage | OVAD/OVODA Open-Vocabulary 3D Attributes, Open-Vocabulary Panoptic Occupancy |
| Dataset and benchmark coverage | STU 3D LiDAR Anomaly Segmentation, Airside FOD Synthetic Multimodal Benchmarks, RCP-Bench Cooperative Corruption Robustness, V2X Large-Range Sequential Datasets |

Next perception promotion queue: EvOcc, DepthOcc, LinkOcc, missing-view resilient occupancy, Gaussian-rendered occupancy, 4D radar road-boundary/freespace detection, SparseBEV, DETR4D, ForeSight, DriveBench, SAM4D, Open3DTrack, indoor open-vocabulary 3D instance segmentation, embodied robotics 3D perception benchmarks, Airport-FOD3S data-engine coverage, and airside-specific dust/de-icing/steam validation datasets.

| Suggested file | Method or technique | Category | Why it matters | Primary sources |
| --- | --- | --- | --- | --- |
| camera-bev-depth-aware-encoders.md | BEVDet, BEVDepth, BEVStereo, BEVPoolv2, BEVDet4D, SOLOFusion | Depth-aware camera BEV | The current BEV doc jumps from LSS to BEVFormer; depth-supervised LSS-family encoders are more practical for LiDAR-supervised airside camera fallback. | https://arxiv.org/abs/2206.10092, https://arxiv.org/abs/2209.10248, https://arxiv.org/abs/2203.17054, https://openreview.net/forum?id=H3HcEJA2Um |
| camera-only-3d-occupancy.md | TPVFormer, SurroundOcc, Occ3D/OpenOcc, FB-OCC, FlashOcc, SparseOcc, SelfOcc, RenderOcc | Camera semantic occupancy | Camera occupancy is now a core planning-facing alternative to boxes for arbitrary shapes, FOD, personnel, and unknown obstacles. | https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Tri-Perspective_View_for_Vision-Based_3D_Semantic_Occupancy_Prediction_CVPR_2023_paper.html, https://openaccess.thecvf.com/content/ICCV2023/html/Wei_SurroundOcc_Multi-camera_3D_Occupancy_Prediction_for_Autonomous_Driving_ICCV_2023_paper.html, https://arxiv.org/abs/2304.14365, https://openaccess.thecvf.com/content/CVPR2024/html/Tang_SparseOcc_Rethinking_Sparse_Latent_Representation_for_Vision-Based_Semantic_Occupancy_Prediction_CVPR_2024_paper.html |
| lidar-moving-object-segmentation.md | LiDAR-MOS, 4DMOS, InsMOS, MF-MOS, MV-MOS/CV-MOS, SegNet4D, Mask4D, HeLiMOS | Moving-object segmentation and 4D panoptic LiDAR | Airside needs point-level static/dynamic separation for parked-vs-moving GSE, map cleanup, track birth, and prediction gating. | https://arxiv.org/abs/2105.08971, https://arxiv.org/abs/2206.04129, https://arxiv.org/abs/2401.17023, https://arxiv.org/abs/2408.10602, https://arxiv.org/abs/2406.16279, https://github.com/url-kaist/HeLiMOS-PointCloud-Toolbox |
| streaming-mos-4d-panoptic.md | StreamMOS and 4DSegStreamer | Streaming temporal segmentation | Adds memory across inferences, not only within one scan window; useful for high-FPS streams and occlusion consistency. | https://arxiv.org/abs/2407.17905, https://llada60.github.io/4DSegStreamer/ |
| instantaneous-motion-perception.md | Instantaneous perception of moving objects in 3D | Subtle motion estimation | Airside has creeping tugs and quasi-static objects; this fills the gap between single-frame detection and track-derived velocity. | https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Instantaneous_Perception_of_Moving_Objects_in_3D_CVPR_2024_paper.html, https://arxiv.org/abs/2405.02781 |
| 4d-occupancy-forecasting.md | Cam4DOcc, StreamingFlow, UnO, DFIT-OccWorld, Drive-OccWorld | Perception-to-prediction handoff | Tracks alone cannot represent unknown, partially observed, or occluded movers; 4D occupancy/flow is a stronger planner interface. | https://openaccess.thecvf.com/content/CVPR2024/html/Ma_Cam4DOcc_Benchmark_for_Camera-Only_4D_Occupancy_Forecasting_in_Autonomous_Driving_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2024/html/Shi_StreamingFlow_Streaming_Occupancy_Forecasting_with_Asynchronous_Multi-modal_Data_Streams_via_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2024/html/Agro_UnO_Unsupervised_Occupancy_Fields_for_Perception_and_Forecasting_CVPR_2024_paper.html, https://arxiv.org/abs/2412.13772 |
| 4d-radar-perception-methods.md | K-Radar, View of Delft, RadarPillars, V2X-Radar | 4D radar perception | Radar is covered mainly as hardware/fusion; the perception stack needs tensor-vs-point representations, Doppler/RCS encoding, and benchmark limits. | https://github.com/kaist-avelab/K-Radar, https://intelligent-vehicles.org/datasets/view-of-delft/, https://arxiv.org/abs/2408.05020, https://arxiv.org/abs/2411.10962 |
| multi-sensor-corruption-benchmarks.md | MSC-Bench and MultiCorrupt | Robustness and corruption evaluation | Adds systematic camera+LiDAR corruptions, missing inputs, misalignment, temporal mismatch, adverse weather, and sensor failures. | https://arxiv.org/abs/2501.01037, https://arxiv.org/abs/2402.11677, https://github.com/ika-rwth-aachen/MultiCorrupt |
| real-sensor-anomaly-robustness.md | S2R-Bench and Occluded nuScenes | Real sensor anomaly and occlusion validation | Adds real adverse-weather/sensor-anomaly data and parameterized camera/radar/LiDAR occlusion tests beyond synthetic corruption overlays. | https://www.nature.com/articles/s41597-025-06255-3, https://arxiv.org/abs/2505.18631, https://arxiv.org/abs/2510.18552 |
| open-world-ood-perception-benchmarks.md | OpenAD and S2M OOD object segmentation | Open-world/OOD evaluation | Current docs cover generic OOD scores; these add benchmarkable 3D corner cases and mask-level unknown-object localization. | https://arxiv.org/abs/2411.17761, https://github.com/VDIGPKU/OpenAD, https://openaccess.thecvf.com/content/CVPR2024/html/Zhao_Segment_Every_Out-of-Distribution_Object_CVPR_2024_paper.html |
| open-world-open-vocab-detection.md | OW-OVD | Open-world plus open-vocabulary 2D detection | Existing open-vocab coverage emphasizes prompted known classes; OW-OVD adds unknown-object recall and incremental learning. | https://openaccess.thecvf.com/content/CVPR2025/html/Xi_OW-OVD_Unified_Open_World_and_Open_Vocabulary_Object_Detection_CVPR_2025_paper.html |
| open-world-3d-objectness.md | OpenAD, OP3Det, WildDet3D, DetAny3D | Prompt-free and promptable 3D objectness | Fills the gap between closed-set 3D detection, text-prompted 3D detection, and true unknown-object discovery in 3D. | https://openreview.net/forum?id=T9UDyN5Tw6, https://arxiv.org/abs/2411.17761, https://openreview.net/forum?id=wEOmS8Aw1W, https://arxiv.org/abs/2510.17686, https://arxiv.org/abs/2604.08626, https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Detect_Anything_3D_in_the_Wild_ICCV_2025_paper.pdf |
| open-world-anomaly-segmentation-driving.md | Clipomaly, S2M, SegmentMeIfYouCan | Unknown-region and anomaly segmentation | Dedicated anomaly stack for OOD masks, RoadObstacle/RoadAnomaly metrics, and airside FOD adaptation. | https://arxiv.org/abs/2512.01427, https://arxiv.org/abs/2311.16516, https://segmentmeifyoucan.com/ |
| sam3-concept-segmentation-airside.md | SAM 3 / SAM 3.1 concept segmentation and tracking | Foundation-model auto-labeling | Current docs stop at SAM/SAM2/Grounded-SAM2; SAM 3 adds open-vocabulary concept prompts, exemplars, and video tracking for GSE/FOD labeling. | https://arxiv.org/abs/2511.16719, https://github.com/facebookresearch/sam3 |
| resilient-fusion-sensor-failure.md | MoME / nuScenes-R failure routing | Robust fusion under sensor failure | Directly handles LiDAR drop, camera drop, limited FOV, beam reduction, and occlusion as perception architecture, not only robustness theory. | https://openaccess.thecvf.com/content/CVPR2025/html/Park_Resilient_Sensor_Fusion_Under_Adverse_Sensor_Failures_via_Multi-Modal_Expert_CVPR_2025_paper.html, https://github.com/konyul/MoME |
| robust-fusion-asynchrony-calibration.md | Staleness-aware fusion, GraphBEV, SOAC, RC-AutoCalib | Temporal sync and calibration drift | Covers silent fusion failure from stale sensors, extrinsic drift, LiDAR-camera misalignment, radar-camera calibration, and asynchronous inputs. | https://arxiv.org/abs/2506.05780, https://arxiv.org/abs/2403.11848, https://github.com/adept-thu/GraphBEV, https://openaccess.thecvf.com/content/CVPR2024/html/Herau_SOAC_Spatio-Temporal_Overlap-Aware_Multi-Sensor_Calibration_using_Neural_Radiance_Fields_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Luu_RC-AutoCalib_An_End-to-End_Radar-Camera_Automatic_Calibration_Network_CVPR_2025_paper.html |
| roadside-infrastructure-cooperative-datasets.md | RCooper and HoloVIC | Roadside/infrastructure cooperative perception datasets | Adds real multi-RSPU and multi-layout infrastructure perception, closer to airport fixed-sensor coverage than V2V-only datasets. | https://openaccess.thecvf.com/content/CVPR2024/html/Hao_RCooper_A_Real-world_Large-scale_Dataset_for_Roadside_Cooperative_Perception_CVPR_2024_paper.html, https://github.com/AIR-THU/DAIR-RCooper, https://openaccess.thecvf.com/content/CVPR2024/html/Ma_HoloVIC_Large-scale_Dataset_and_Benchmark_for_Multi-Sensor_Holographic_Intersection_and_CVPR_2024_paper.html, https://arxiv.org/abs/2403.02640 |
| infrastructure-adverse-weather-cooperative-perception.md | CoInfra | Multi-node infrastructure plus adverse weather | Strong airport analogue: synchronized infrastructure nodes, real weather, 5G delay-aware fusion, OTA, monitoring, and system docs. | https://arxiv.org/abs/2507.02245, https://github.com/NingMingHao/CoInfra |
| v2x-online-realtime-benchmarks.md | V2X-ReaLO | Online V2X framework and dataset | Existing docs are mostly offline-benchmark oriented; this targets synchronized ROS bags, latency, and online intermediate fusion. | https://arxiv.org/abs/2503.10034 |
| cooperative-occupancy-perception.md | CoHFF collaborative semantic occupancy | Cooperative occupancy | Fills a major gap: dense 3D semantic occupancy, not only boxes, for irregular cargo, unknown obstacles, and FOD. | https://openaccess.thecvf.com/content/CVPR2024/html/Song_Collaborative_Semantic_Occupancy_Prediction_with_Hybrid_Feature_Fusion_in_Connected_CVPR_2024_paper.html, https://github.com/rruisong/CoHFF |
| communication-efficient-cooperative-fusion.md | CoSDH | Bandwidth-efficient hybrid fusion | Supply-demand region selection plus intermediate/late hybridization is more deployment-oriented than pure intermediate fusion. | https://openaccess.thecvf.com/content/CVPR2025/html/Xu_CoSDH_Communication-Efficient_Collaborative_Perception_via_Supply-Demand_Awareness_and_Intermediate-Late_Hybridization_CVPR_2025_paper.html |
| cooperative-sequential-tracking.md | CoopTrack | Cooperative MOT and sequential perception | Existing cooperative docs emphasize detection; airport planners need stable IDs and motion continuity across agents. | https://openaccess.thecvf.com/content/ICCV2025/html/Zhong_CoopTrack_Exploring_End-to-End_Learning_for_Efficient_Cooperative_Sequential_Perception_ICCV_2025_paper.html, https://arxiv.org/abs/2507.19239 |
| uncertainty-detection-conformal-abstention.md | Conformal bounding boxes and conformal abstention | Detection-specific uncertainty | Upgrades conformal coverage from class sets to localization intervals and explicit abstain/degrade decisions (see the sketch after this table). | https://arxiv.org/abs/2403.07263, https://arxiv.org/abs/2502.07255 |
| 30-autonomy-stack/perception/methods/aide.md | AIDE automatic data engine | Long-tail detection data engine | Concrete loop for issue mining, auto-labeling, scenario generation, and open-world detection evaluation. | https://openaccess.thecvf.com/content/CVPR2024/html/Liang_AIDE_An_Automatic_Data_Engine_for_Object_Detection_in_Autonomous_CVPR_2024_paper.html, https://arxiv.org/abs/2403.17373 |
| 30-autonomy-stack/perception/methods/vespa-open-world-pointcloud-labeling.md | VESPA and VLM-assisted 3D auto-labeling | Open-world point-cloud labeling | Multimodal pseudo-labeling without GT annotations or HD maps directly addresses airside 3D labeling bottlenecks. | https://arxiv.org/abs/2507.20397, https://www.sciencedirect.com/science/article/pii/S0968090X25004334 |
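
For the conformal bounding-box and abstention row above, a minimal split-conformal sketch in Python; the max-coordinate score, the alpha value, and the abstention threshold are illustrative assumptions, not the cited papers' exact formulation:

```python
import numpy as np

def calibrate_box_margin(pred: np.ndarray, gt: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal calibration for 2D box localization.
    Boxes are (N, 4) [x1, y1, x2, y2]; the score is the worst per-coordinate error,
    so inflating every edge by the returned margin covers the true box with
    probability >= 1 - alpha on exchangeable data."""
    scores = np.abs(pred - gt).max(axis=1)
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample corrected quantile level
    return float(np.quantile(scores, level, method="higher"))

def box_with_abstention(pred_box: np.ndarray, margin: float, abstain_margin: float = 2.0):
    """Inflate the predicted box by the calibrated margin; return None (abstain and
    hand off to a degraded mode) when the interval is too wide to be useful."""
    if margin > abstain_margin:
        return None
    x1, y1, x2, y2 = pred_box
    return np.array([x1 - margin, y1 - margin, x2 + margin, y2 + margin])

# Toy calibration set: predictions with roughly 0.5 px coordinate noise.
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 100.0, size=(200, 4))
pred = gt + rng.normal(0.0, 0.5, size=gt.shape)
margin = calibrate_box_margin(pred, gt, alpha=0.1)
print(f"calibrated margin: {margin:.2f} px -> {box_with_abstention(pred[0], margin)}")
```

The same calibrate-then-inflate pattern extends to 3D box corners or per-face margins; the abstain branch is where degrade policies such as slowing down or requesting another modality would hook in.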

P1 Remaining Atomic Pages or Major Sections

| Suggested file | Method or technique | Category | Why it matters | Primary sources |
| --- | --- | --- | --- | --- |
| camera-panoptic-open-vocab-occupancy.md | PanoOcc, VEON, LangOcc, OpenOcc | Camera panoptic/open-vocabulary occupancy | Adds instance-level and open-vocabulary 3D scene understanding for rare GSE, barriers, FOD, and long-tail objects. | https://openaccess.thecvf.com/content/CVPR2024/html/Wang_PanoOcc_Unified_Occupancy_Representation_for_Camera-based_3D_Panoptic_Segmentation_CVPR_2024_paper.html, https://arxiv.org/abs/2407.12294, https://openreview.net/forum?id=KhjlXNbYea, https://github.com/boschresearch/LangOcc, https://arxiv.org/abs/2403.11796 |
| camera-4d-occupancy-forecasting.md | Cam4DOcc, OccProphet, ALOcc | Temporal camera occupancy | Current temporal perception coverage is stronger on detection/tracking than camera 4D occupancy. | https://openaccess.thecvf.com/content/CVPR2024/html/Ma_Cam4DOcc_Benchmark_for_Camera-Only_4D_Occupancy_Forecasting_in_Autonomous_Driving_CVPR_2024_paper.html, https://openreview.net/forum?id=vC7AlY1ytz, https://huggingface.co/papers/2411.07725 |
| streaming-gaussian-occupancy.md | GaussianWorld and S2GO | Streaming Gaussian occupancy | Moves from per-frame occupancy toward persistent sparse Gaussian/query world state for real-time temporal occupancy. | https://openaccess.thecvf.com/content/CVPR2025/html/Zuo_GaussianWorld_Gaussian_World_Model_for_Streaming_3D_Occupancy_Prediction_CVPR_2025_paper.html, https://openreview.net/forum?id=z8ggdMlSco, https://arxiv.org/abs/2506.05473 |
| generalized-uncalibrated-occupancy.md | OccAny and ViGT | Generalized / calibration-free occupancy | Targets uncalibrated, out-of-domain, monocular, sequential, and surround-view occupancy for heterogeneous camera rigs. | https://arxiv.org/abs/2603.23502, https://github.com/valeoai/OccAny, https://arxiv.org/abs/2602.05573 |
| missing-view-resilient-occupancy.md | M2-Occ | Incomplete multi-camera robustness | Directly tests missing camera/view dropout for semantic occupancy. | https://arxiv.org/abs/2603.09737 |
| occupancy-uncertainty-ood-evidence.md | EvOcc and ProOOD | Occupancy uncertainty / OOD | Handles unknown objects, occluded voxels, contradictory evidence, and long-tail classes at occupancy level. | https://openaccess.thecvf.com/content/CVPR2025/html/Kalble_EvOcc_Accurate_Semantic_Occupancy_for_Automated_Driving_Using_Evidence_Theory_CVPR_2025_paper.html, https://arxiv.org/abs/2604.01081 |
| satellite-assisted-occupancy.md | SA-Occ | Satellite / overhead-map assisted occupancy | Especially relevant to airside because overhead imagery and GPS/IMU can reduce occlusion and distant-region weakness. | https://openaccess.thecvf.com/content/ICCV2025/html/Chen_SA-Occ_Satellite-Assisted_3D_Occupancy_Prediction_in_Real_World_ICCV_2025_paper.html |
| spatiotemporal-memory-occupancy-flow.md | ST-Occ and STCOcc | Temporal occupancy and scene flow | Adds memory, uncertainty-aware aggregation, sparse state refinement, and scene-flow prediction as a planner-facing interface. | https://openaccess.thecvf.com/content/ICCV2025/html/Leng_Occupancy_Learning_with_Spatiotemporal_Memory_ICCV_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Liao_STCOcc_Sparse_Spatial-Temporal_Cascade_Renovation_for_3D_Occupancy_and_Scene_CVPR_2025_paper.html |
| sparse-query-camera-3d-detection.md | Sparse4D, SparseBEV, DETR4D, object-query temporal memory | Sparse multi-view camera detection | Focused taxonomy of sparse query methods versus dense BEV methods for edge deployment. | https://arxiv.org/abs/2211.10581, https://arxiv.org/abs/2308.09244, https://arxiv.org/abs/2212.07849, https://arxiv.org/abs/2602.00450 |
| camera-depth-stereo-foundation-models.md | Depth Pro, Video Depth Anything, FoundationStereo | Camera depth and stereo foundation models | Updates fallback perception beyond Depth Anything v2, Metric3D, and UniDepth with metric, temporal, and zero-shot stereo options. | https://arxiv.org/abs/2410.02073, https://arxiv.org/abs/2501.12375, https://github.com/DepthAnything/Video-Depth-Anything, https://arxiv.org/abs/2501.09898, https://research.nvidia.com/labs/lpr/publication/stereoanything2025/ |
| camera-lidar-fusion-interfaces.md | FUTR3D, CMT, DeepInteraction, AutoAlignV2, DAL, SAMFusion, MS-Occ | Camera-LiDAR fusion interfaces | Existing docs cover BEVFusion but underrepresent query fusion, calibration-tolerant alignment, dynamic modality dropout, and occupancy fusion. | https://arxiv.org/abs/2203.10642, https://arxiv.org/abs/2301.01283, https://proceedings.neurips.cc/paper_files/paper/2022/hash/0d18ab3b5fabfa6fe47c62e711af02f0-Abstract-Conference.html, https://arxiv.org/abs/2207.10316, https://arxiv.org/abs/2311.07152, https://arxiv.org/abs/2508.16408, https://arxiv.org/abs/2504.15888 |
| camera-bev-robustness-validation.md | RoboBEV, BEV corruptions, viewpoint robustness, missing camera inputs, occupancy uncertainty | Camera BEV validation | Airside camera rigs, soiling, glare, missing cameras, and calibration changes need benchmark coverage. | https://arxiv.org/abs/2304.06719, https://arxiv.org/abs/2405.17426, https://arxiv.org/abs/2309.05192, https://arxiv.org/abs/2406.11021, https://arxiv.org/abs/2603.09737 |
| adverse-weather-radar-lidar-3d-detection.md | LiDAR-4D radar fusion, L4DR, V2X-R | Radar-LiDAR adverse-weather detection | Directly addresses when radar should rescue LiDAR detection under fog/rain. | https://openaccess.thecvf.com/content/CVPR2024/papers/Chae_Towards_Robust_3D_Object_Detection_with_LiDAR_and_4D_Radar_CVPR_2024_paper.pdf, https://arxiv.org/abs/2408.03677, https://openaccess.thecvf.com/content/CVPR2025/papers/Huang_V2X-R_Cooperative_LiDAR-4D_Radar_Fusion_with_Denoising_Diffusion_for_3D_CVPR_2025_paper.pdf |
| Radar-camera follow-ons | DepthOcc, LinkOcc, and related temporal radar-camera occupancy variants | Radar-camera depth, BEV, and temporal fusion | TacoDepth, RaCFormer, CVFusion, 4D radar-camera occupancy, RobuRCDet, and SAMFusion now have atomic files; remaining radar-camera work should focus on temporal association, occupancy follow-ons, and repeatable multi-dataset validation. | https://openaccess.thecvf.com/content/CVPR2025/html/Wang_TacoDepth_Towards_Efficient_Radar-Camera_Depth_Estimation_with_One-stage_Fusion_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Chu_RaCFormer_Towards_High-Quality_3D_Object_Detection_via_Query-based_Radar-Camera_Fusion_CVPR_2025_paper.html, https://arxiv.org/abs/2502.13071 |
| Promoted: 4D Radar-Camera Occupancy | 4DRC-OCC and RadarOcc | Radar-camera semantic occupancy | Dense occupancy using 4D radar plus camera is distinct from LiDAR-camera occupancy and generic dynamic occupancy; keep this row as provenance and watch radar-occupancy follow-ons. | https://arxiv.org/abs/2603.07794 |
| availability-aware-sensor-fusion.md | Availability-aware Sensor Fusion via Unified Canonical Space | Sensor-failure perception | Directly targets missing/degraded camera, LiDAR, and 4D radar availability instead of assuming all modalities are valid. | https://arxiv.org/abs/2503.07029 |
| event-camera-3d-detection.md | Ev-3DOD / DSEC-3DOD and eAP | Event-camera 3D perception | Event cameras now support 3D detection and time-to-collision, not just 2D detection or VIO. | https://arxiv.org/abs/2502.19630, https://openaccess.thecvf.com/content/CVPR2025/papers/Cho_Ev-3DOD_Pushing_the_Temporal_Boundaries_of_3D_Object_Detection_with_CVPR_2025_paper.pdf, https://arxiv.org/abs/2603.16303 |
| fmcw-lidar-perception.md | AevaScenes and POD predictive object detection | FMCW / 4D LiDAR perception | Per-point velocity changes how detection, scene flow, and predictive detection are designed compared with pulsed LiDAR. | https://www.aeva.com/press/aeva-introduces-aevascenes-the-first-open-access-fmcw-4d-lidar-and-camera-dataset-for-autonomous-vehicle-research/, https://arxiv.org/abs/2504.05649 |
| 4d-radar-road-boundaries-freespace.md | 4DRadarRBD | Radar freespace and road-boundary detection | Radar-based boundary/freespace perception is separate from object detection and occupancy, especially in poor visibility. | https://arxiv.org/abs/2503.01930, https://www.frontiersin.org/articles/10.3389/frsip.2025.1667789/ |
| fully-sparse-lidar-detectors.md | VoxelNeXt, DSVT, fully sparse detector upgrade path | LiDAR 3D detection | openpcdet-centerpoint.md is CenterPoint-heavy; modern candidates should include fully sparse detectors with better cost profiles than dense-BEV heads. | https://github.com/open-mmlab/OpenPCDet, https://arxiv.org/abs/2303.11301, https://arxiv.org/abs/2301.06051 |
| thermal-lidar-dense-perception.md | Thermal-LiDAR depth completion, RGB-T-LiDAR segmentation, dense night freespace | Thermal/night dense perception | Current thermal doc is detection/safety-use-case oriented; dense depth and segmentation matter for freespace and occupancy. | https://arxiv.org/abs/2504.02356, https://doi.org/10.1016/j.isprsjprs.2026.01.008, https://oem.flir.com/en-gb/solutions/automotive/adas-dataset-form/, https://soonminhwang.github.io/rgbt-ped-detection/, https://github.com/bupt-ai-cz/LLVIP |
| dynamic-occupancy-freespace.md | DIO, MS-Occ, UniOcc | Current-state occupancy and freespace | Perception docs need the current-state safety view that planners consume, distinct from predictive world models. | https://openaccess.thecvf.com/content/CVPR2025/papers/Diehl_DIO_Decomposable_Implicit_4D_Occupancy-Flow_World_Model_CVPR_2025_paper.pdf, https://arxiv.org/abs/2504.15888, https://arxiv.org/abs/2503.24381 |
| lidar-camera-occupancy-fusion.md | SDG-OCC and MS-Occ | LiDAR-camera occupancy fusion | Occupancy-specific fusion uses LiDAR geometry plus camera semantics through depth guidance, distillation, and multi-stage fusion. | https://openaccess.thecvf.com/content/CVPR2025/html/Duan_SDGOCC_Semantic_and_Depth-Guided_Birds-Eye_View_Transformation_for_3D_Multimodal_CVPR_2025_paper.html, https://arxiv.org/abs/2504.15888 |
| uncertainty-aware-bev-fusion.md | GaussianLSS and HyperDUM | Fusion-specific uncertainty | Adds BEV depth uncertainty and low-cost feature-level multimodal uncertainty closer to deployed fusion stacks. | https://openaccess.thecvf.com/content/CVPR2025/html/Lu_Toward_Real-world_BEV_Perception_Depth_Uncertainty_Estimation_via_Gaussian_Splatting_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Chen_Hyperdimensional_Uncertainty_Quantification_for_Multimodal_Uncertainty_Fusion_in_Autonomous_Vehicles_CVPR_2025_paper.html |
| night-thermal-event-radar-benchmarks.md | NSAVP and V2X-Radar | Night, thermal, event, radar evaluation | Updates night perception beyond FLIR/KAIST-style references with stereo thermal, event cameras, 4D radar, rain/night/dusk cooperative data. | https://umautobots.github.io/nsavp, https://arxiv.org/abs/2401.13853, https://arxiv.org/abs/2411.10962 |
| edge-real-time-bev-scheduling.md | RT-BEV and Jigsaw | Deadline-aware BEV deployment | Compression doc is TensorRT-heavy; add P99/WCET scheduling and dual-SoC partitioning. | https://rtcl.eecs.umich.edu/rtclweb/assets/publications/2024/rtss24-liu.pdf, https://2024.rtss.org/accepted-papers/index.html, https://dblp.org/rec/conf/rtss/SunLHHX0BSRG24.html |
| 40-runtime-systems/ros-autoware/sensor-in-memory-ros2-av-middleware.md | SIM shared-memory transport for ROS 2 / Autoware | Deployment latency and middleware | Targets perception-to-decision tail latency, not only model inference time. | https://arxiv.org/abs/2510.11448 |
| ood-data-engine.md | DriveGEN and active OOD generation | Active data and evaluation | Complements uncertainty-triggered logging with controllable OOD augmentation and scenario generation. | https://openaccess.thecvf.com/content/CVPR2025/html/Lin_DriveGEN_Generalized_and_Robust_3D_Detection_in_Driving_via_Controllable_CVPR_2025_paper.html |
| uncertainty-aware-end-to-end-3d-mot.md | S2-Track | Tracking uncertainty | Query initialization, probabilistic decoder, and uncertainty as downstream prediction signal are currently underdocumented. | https://proceedings.mlr.press/v267/tang25p.html, https://openreview.net/forum?id=vHr9cdeFfu, https://arxiv.org/abs/2406.02147 |
| relation-aware-3d-mot.md | GRAE-3DMOT | Relation-aware tracking | Complements MCTrack/3DMOTFormer with relation-aware association for dense scenes. | https://openaccess.thecvf.com/content/CVPR2025/html/Kim_GRAE-3DMOT_Geometry_Relation-Aware_Encoder_for_Online_3D_Multi-Object_Tracking_CVPR_2025_paper.html |
| latency-aware-streaming-perception.md | LASP online benchmark, Transtreaming, multi-modal streaming 3D detection | Latency compensation and online evaluation | Separate acquisition latency, inference latency, output timestamp compensation, and time-stale quality on edge hardware (see the latency sketch after this table). | https://arxiv.org/abs/2504.19115, https://arxiv.org/abs/2409.06584, https://arxiv.org/abs/2209.04966 |
| 4d-world-model-pretraining.md | DriveWorld | Temporal representation pretraining | Concrete 4D pretraining bridge across detection, tracking, forecasting, occupancy, and planning. | https://arxiv.org/abs/2405.04390, https://openaccess.thecvf.com/content/CVPR2024/html/Min_DriveWorld_4D_Pre-trained_Scene_Understanding_via_World_Models_for_Autonomous_CVPR_2024_paper.html |
| object-query-cooperative-perception.md | CoopDETR | Sparse cooperative communication | Object-query sharing is interpretable and extremely bandwidth-light versus region/BEV maps. | https://arxiv.org/abs/2502.19313 |
| collaboration-robust-cooperative-fusion.md | mmCooper | Robust multi-stage fusion | Targets bandwidth limits plus calibration/misalignment errors in cooperative perception. | https://openaccess.thecvf.com/content/ICCV2025/html/Liu_mmCooper_A_Multi-agent_Multi-stage_Communication-efficient_and_Collaboration-robust_Cooperative_Perception_Framework_ICCV_2025_paper.html |
| spatiotemporal-cooperative-perception.md | CoST | Space-time cooperative fusion | Unifies multi-agent and temporal fusion; avoids retransmitting static-object features repeatedly. | https://openaccess.thecvf.com/content/ICCV2025/html/Tang_CoST_Efficient_Collaborative_Perception_From_Unified_Spatiotemporal_Perspective_ICCV_2025_paper.html |
| end-to-end-v2x-cooperative-driving.md | V2Xverse / CoDriving, UniV2X, Select2Drive | Closed-loop V2X perception-to-planning | Moves evaluation from mAP-only to driving score, collision rate, latency, and communication-aware planning. | https://collaborativeperception.github.io/V2Xverse/, https://github.com/CollaborativePerception/V2Xverse, https://arxiv.org/abs/2404.00717 |
| privacy-adaptation-v2x-collaboration.md | Unknown-collaborator privacy and CoPEFT | Privacy/domain adaptation | Covers separately trained agents, privacy-preserving fusion, and low-cost per-site adaptation for multi-vendor fleets. | https://mlanthology.org/aaai/2025/lu2025aaai-privacy/, https://ojs.aaai.org/index.php/AAAI/article/view/34502 |
| auto-vocabulary-lidar-segmentation.md | 3D-AVS | LiDAR open-vocabulary segmentation | Distinguishes LiDAR foundation pretraining from runtime vocabulary discovery for point clouds. | https://openaccess.thecvf.com/content/CVPR2025/html/Wei_3D-AVS_LiDAR-based_3D_Auto-Vocabulary_Segmentation_CVPR_2025_paper.html, https://arxiv.org/abs/2406.09126 |
| mosaic3d-open-vocab-3d-segmentation.md | Mosaic3D | Open-vocabulary 3D segmentation | Foundation dataset/model for open-vocabulary 3D semantic and instance segmentation; fills a gap beyond 2D open-vocabulary segmentation. | https://research.nvidia.com/labs/lpr/publication/choy2025mosaic/, https://openaccess.thecvf.com/content/CVPR2025/papers/Lee_Mosaic3D_Foundation_Dataset_and_Model_for_Open-Vocabulary_3D_Segmentation_CVPR_2025_paper.pdf |
| openvox-open-vocab-voxel-mapping.md | OpenVox | Open-vocabulary voxel mapping | Real-time instance-level probabilistic voxel memory is relevant for persistent apron maps and language-queryable scene memory. | https://open-vox.github.io/ |
| ago-open-world-3d-occupancy.md | AGO adaptive grounding | Open-world 3D occupancy | Known/unknown semantic occupancy using VLM grounding is a stronger safety-layer fit than box-only open-world detection. | https://arxiv.org/abs/2504.10117, https://openaccess.thecvf.com/content/ICCV2025/papers/Li_AGO_Adaptive_Grounding_for_Open_World_3D_Occupancy_Prediction_ICCV_2025_paper.pdf |
| open3dtrack-open-vocab-3d-tracking.md | Open3DTrack | Open-vocabulary 3D tracking | First-pass gaps covered 3D detection/objectness but not persistent open-vocabulary 3D tracks. | https://arxiv.org/abs/2410.01678 |
| open-vocab-monocular-3d-detection.md | OVM3D-Det | Open-vocab monocular 3D detection | Useful as offline/assistive camera fallback, with safety caveats. | https://ovm3d-det.github.io/, https://arxiv.org/abs/2411.15657 |
| dense-open-vocab-segmentation.md | Florence-2, CAT-Seg, ProxyCLIP, AutoSeg/AVS | Dense open-vocabulary labels | Grounded-SAM is not the whole dense-labeling story; pixel-level OV segmentation helps mask QA and rare-class discovery. | https://arxiv.org/abs/2311.06242, https://huggingface.co/microsoft/Florence-2-large, https://openaccess.thecvf.com/content/CVPR2024/html/Cho_CAT-Seg_Cost_Aggregation_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2024_paper.html, https://arxiv.org/abs/2408.04883, https://openaccess.thecvf.com/content/ICCV2025/html/Ulger_Auto-Vocabulary_Semantic_Segmentation_ICCV_2025_paper.html |
| ovad-open-vocab-3d-attributes.md | OVAD / OVODA | Open-vocabulary 3D attributes | Adds attribute-level open-vocabulary 3D detection, which matters for airside state labels such as cones, tugs, carts, lights, covers, and unusual vehicle configurations. | https://arxiv.org/abs/2508.16812 |
| drivebench-vlm-reliability.md | DriveBench | VLM driving reliability under corruptions | Tests clean, corrupted, and text-only driving QA to catch visually ungrounded VLM perception or monitoring outputs. | https://drive-bench.github.io/, https://github.com/worldbench/DriveBench |
| airside-fod-open-world-benchmarks.md | FOD-A, FAA FOD framing, Airport-FOD3S, RDD5000, DualFOD/FOD-UAS | Airside FOD and runway benchmarks | Existing docs say zero-shot FOD is weak but lack a benchmark plan for tiny-object recall, RGB/thermal runway visibility, synthetic FOD generation, and OOD-vs-detector comparison. | https://arxiv.org/abs/2110.03072, https://github.com/FOD-UNOmaha/FOD-data, https://www.faa.gov/airports/airport_safety/fod, https://www.mdpi.com/1424-8220/25/15/4565, https://www.mdpi.com/2072-4292/17/4/669, https://www.mdpi.com/2504-446X/10/3/225 |
| 50-cloud-fleet/data-platform/airport-fod3s-synthetic-data.md | Airport-FOD3S | Synthetic FOD data engine | Synthetic FOD generation and blending directly addresses rare small-object data scarcity beyond FOD-A. | https://www.mdpi.com/1424-8220/25/15/4565 |
| 30-autonomy-stack/perception/methods/fail2drive.md | Fail2Drive | Closed-loop OOD validation | Paired-route distribution-shift scenarios expose perception-linked failures under closed-loop drift. | https://arxiv.org/abs/2604.08535, https://github.com/autonomousvision/fail2drive |
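
For the latency-aware streaming row above, a minimal sketch of the decomposition it argues for; the stage names and example timestamps are illustrative assumptions, not a benchmark definition:

```python
from dataclasses import dataclass

@dataclass
class DetectionLatency:
    """Timestamps in seconds on a shared clock for one detection output."""
    t_capture: float         # sensor exposure / end of LiDAR sweep
    t_arrival: float         # data available to the perception process
    t_inference_done: float  # detections ready (including pre/post-processing)
    t_consumed: float        # planner actually reads the detections

    @property
    def acquisition(self) -> float:
        return self.t_arrival - self.t_capture          # driver + transport delay

    @property
    def inference(self) -> float:
        return self.t_inference_done - self.t_arrival   # compute time only

    @property
    def staleness_at_use(self) -> float:
        """Age of the underlying measurement when it is consumed; this, not
        inference time alone, is what output timestamp compensation must cover."""
        return self.t_consumed - self.t_capture

sample = DetectionLatency(t_capture=10.000, t_arrival=10.035,
                          t_inference_done=10.095, t_consumed=10.110)
print(f"acquisition={sample.acquisition * 1e3:.0f} ms, "
      f"inference={sample.inference * 1e3:.0f} ms, "
      f"staleness at use={sample.staleness_at_use * 1e3:.0f} ms")
```

Reporting these numbers separately, per percentile and on the target edge hardware, is the kind of evidence the streaming benchmarks in that row ask for, as opposed to plain offline mAP.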

P2 or Watchlist

| Method or technique | Why watch | Current concern | Sources |
| --- | --- | --- | --- |
| VLMFusionOcc3D, OccVLA, O3N | VLM-assisted and open-vocabulary occupancy are moving quickly. | Promising but not yet safety perception primitives; validate hallucination, latency, and reproducibility. | https://arxiv.org/abs/2603.02609, https://huggingface.co/papers/2509.05578, https://arxiv.org/abs/2603.12144 |
| RayD3D, Dr.Occ, SuperOcc, Gau-Occ, GaussTR, GaussRender | Useful 2025-2026 camera/multimodal/Gaussian occupancy signals. | Mostly preprint-stage or narrow-benchmark; track code and Occ3D/OpenOccupancy reproducibility. | https://huggingface.co/papers/2603.22852, https://arxiv.org/abs/2603.01007, https://openaccess.thecvf.com/content/CVPR2025/html/Oh_3D_Occupancy_Prediction_with_Low-Resolution_Queries_via_Prototype-aware_View_Transformation_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Jiang_GaussTR_Foundation_Model-Aligned_Gaussian_Transformer_for_Self-Supervised_3D_Spatial_Understanding_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/ICCV2025/html/Chambon_GaussRender_Learning_3D_Occupancy_with_Gaussian_Rendering_ICCV_2025_paper.html |
| EventFly, EvDET200K, TUMTraf EMOT, PRE-Mamba | Event perception is expanding into dense segmentation, high-definition detection, MOT, and rain removal. | Useful follow-ons for event-camera pages; promote after dataset/code adoption and hardware fit are clearer. | https://openaccess.thecvf.com/content/CVPR2025/html/Kong_EventFly_Event_Camera_Perception_from_Ground_to_the_Sky_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Wang_Object_Detection_using_Event_Camera_A_MoE_Heat_Conduction_based_CVPR_2025_paper.html, https://arxiv.org/abs/2512.14595, https://arxiv.org/abs/2505.05307 |
| DSERT-RoLL | Stereo event/RGB/thermal, 4D radar, and dual-LiDAR dataset for diverse driving conditions. | Newly posted in 2026; wait for data/code availability before promotion. | https://arxiv.org/abs/2604.03685 |
| HyperDet, DRIFT, SD4R, R4Det | 2026 radar-only/radar-camera perception line. | Needs code, repeatable benchmarks, and multi-dataset validation. | https://arxiv.org/abs/2602.11554, https://arxiv.org/abs/2603.09695, https://arxiv.org/abs/2602.20653, https://arxiv.org/abs/2603.11566 |
| SiMO and single-modality-operable cooperative perception | Directly relevant to robust collaborative perception under sensor failures. | Too fresh; compare against MoME, HEAL, mmCooper, and CoSDH before promotion. | https://gist.science/paper/2603.08240 |
| SparseCoop, CoDS, JigsawComm, QuantV2X, TruckV2X | 2025-2026 cooperative perception wave around sparse queries, semantic communication, compression, quantization, and truck-centered V2X data. | Useful but not yet mature enough for core docs; compare against CoSDH, mmCooper, CoST, and V2X-ReaLO first. | https://arxiv.org/abs/2512.06838, https://arxiv.org/abs/2512.22513, https://arxiv.org/abs/2511.17843, https://huggingface.co/papers/2509.03704, https://huggingface.co/papers/2507.09505 |
| LightDiff and JarvisIR | Restoration can help low-light/weather preprocessing. | Treat as monitored advisory preprocessing, not safety authority, because restoration can hallucinate. | https://openaccess.thecvf.com/content/CVPR2024/html/Li_Light_the_Night_A_Multi-Condition_Diffusion_Framework_for_Unpaired_Low-Light_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Lin_JarvisIR_Elevating_Autonomous_Driving_Perception_with_Intelligent_Image_Restoration_CVPR_2025_paper.html |
| T-Rex2 and OWLv2 visual-prompt detection | Visual exemplars help airport-specific GSE variants where text prompts are brittle. | Better for onboarding and annotation acceleration than runtime safety perception. | https://arxiv.org/abs/2403.14610, https://github.com/IDEA-Research/T-Rex, https://arxiv.org/abs/2306.09683 |
| DySS, ForeSight, GaussianDet3D, OffsetOcc, SAM4D, Diffusion-FS | Useful follow-ons for streaming camera detection, Gaussian camera detection, panoptic completion, promptable LiDAR-camera segmentation, and corridor prediction. | Useful research directions, but merge into existing sparse-query, occupancy, SAM/data-engine, or freespace pages unless adoption grows. | https://arxiv.org/abs/2506.10242, https://openaccess.thecvf.com/content/ICCV2025/html/Papais_ForeSight_Multi-View_Streaming_Joint_Object_Detection_and_Trajectory_Forecasting_ICCV_2025_paper.html, https://openreview.net/forum?id=qByiOX1j9C, https://cvpr.thecvf.com/virtual/2025/35771, https://openaccess.thecvf.com/content/ICCV2025/html/Xu_SAM4D_Segment_Anything_in_Camera_and_LiDAR_Streams_ICCV_2025_paper.html, https://arxiv.org/abs/2507.18763 |

Benchmark and Dataset Gaps

| Dataset or benchmark | Why it should be added | Sources |
| --- | --- | --- |
| OpenAD | Real open-world autonomous-driving benchmark for 3D object detection across corner cases and unseen categories. | https://openreview.net/forum?id=T9UDyN5Tw6, https://arxiv.org/abs/2411.17761, https://github.com/VDIGPKU/OpenAD |
| OVAD / OVODA | Open-vocabulary multimodal 3D detection plus attribute labels for long-tail object state reasoning. | https://arxiv.org/abs/2508.16812 |
| DriveBench | Driving VLM reliability benchmark under clean, corrupted, and text-only inputs. | https://drive-bench.github.io/, https://github.com/worldbench/DriveBench |
| FOD-A and FAA FOD guidance | Promoted into FOD and Airport Apron Detection Datasets and FOD Perception Validation; keep extending with wet/dark splits, OOD handling, and operational hazard severity. | https://arxiv.org/abs/2110.03072, https://github.com/FOD-UNOmaha/FOD-data, https://www.faa.gov/airports/airport_safety/fod |
| Airport-FOD3S, RDD5000, DualFOD / FOD-UAS | Synthetic FOD generation, RGB/IR runway detection, and synchronized RGB/thermal FOD imagery are now listed in the FOD dataset page; still need a separate synthetic data-engine page if FOD generation becomes a training workstream. | https://www.mdpi.com/1424-8220/25/15/4565, https://www.mdpi.com/2072-4292/17/4/669, https://www.mdpi.com/2504-446X/10/3/225 |
| RCooper and HoloVIC | Fixed infrastructure, multi-RSPU, and holographic intersection perception datasets useful for airport stand/terminal layouts. | https://openaccess.thecvf.com/content/CVPR2024/html/Hao_RCooper_A_Real-world_Large-scale_Dataset_for_Roadside_Cooperative_Perception_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2024/html/Ma_HoloVIC_Large-scale_Dataset_and_Benchmark_for_Multi-Sensor_Holographic_Intersection_and_CVPR_2024_paper.html |
| V2X-Real, TUMTraf-V2X, V2X-ReaLO, V2XScenes, UrbanIng-V2X, CoInfra | Real V2X, online realism, detection/tracking, large-range V2I, multi-intersection generalization, weather, 5G delay, and infrastructure fusion. | https://github.com/ucla-mobility/V2X-Real, https://arxiv.org/abs/2403.16034, https://tum-traffic-dataset.github.io/tumtraf-v2x/, https://arxiv.org/abs/2403.01316, https://arxiv.org/abs/2503.10034, https://openaccess.thecvf.com/content/ICCV2025/html/Wang_V2XScenes_A_Multiple_Challenging_Traffic_Conditions_Dataset_for_Large-Range_Vehicle-Infrastructure_ICCV_2025_paper.html, https://arxiv.org/abs/2510.23478, https://arxiv.org/abs/2507.02245 |
| V2X-Radar | Cooperative 4D radar plus camera/LiDAR for rainy, night, and dusk scenarios. | https://openreview.net/forum?id=sTKsFIVqik, https://arxiv.org/abs/2411.10962, https://github.com/yanglei18/V2X-Radar, https://huggingface.co/datasets/yanglei18/V2X-Radar |
| NSAVP | Night stereo-thermal/event-style adverse-visibility benchmark support. | https://umautobots.github.io/nsavp, https://arxiv.org/abs/2401.13853 |
| DSEC-3DOD, CoSEC, eTraM, EvTTC, SEVD, SPECTRA | Event-camera 3D detection, event-frame alignment, traffic detection, TTC, synthetic event data, and dense real driving labels. | https://arxiv.org/abs/2502.19630, https://arxiv.org/abs/2408.08500, https://eventbasedvision.github.io/eTraM/, https://arxiv.org/abs/2403.19976, https://nail-hnu.github.io/EvTTC/, https://github.com/eventbasedvision/SEVD, https://openreview.net/forum?id=mqCUxEsfsI |
| AevaScenes | FMCW 4D LiDAR plus camera dataset; per-point velocity changes how detection, scene flow, and predictive perception are evaluated. | https://scenes.aeva.com, https://www.aeva.com/press/aeva-introduces-aevascenes-the-first-open-access-fmcw-4d-lidar-and-camera-dataset-for-autonomous-vehicle-research/, https://github.com/aevainc |
| L-RadSet | Long-range camera, LiDAR, and 4D-radar dataset with annotated long-range radar targets. | https://ieeexplore.ieee.org/document/10591452/, https://github.com/crrasjtu/L-RadSet |
| RainSense and REHEARSE-3D | Natural rain intensity labels plus point-cloud deraining support for LiDAR/radar adverse-weather perception. | https://saemobilus.sae.org/papers/rainsense-autonomous-driving-environmental-perception-dataset-rain-intensity-annotations-2025-01-7311, https://arxiv.org/abs/2504.21699 |
| CMHT | Radar, infrared, LiDAR, and camera ROS 2 data for rain/night multimodal perception. | https://www.sciencedirect.com/science/article/pii/S2352340925002847 |
| DSERT-RoLL | Broad multimodal driving dataset with stereo event, RGB, thermal, 4D radar, and dual LiDAR. | https://jeongyh98.github.io/dsert-roll/, https://huggingface.co/datasets/jeongyh98/DSERT-RoLL, https://arxiv.org/abs/2604.03685 |
| K-Radar and View of Delft | Core 4D radar perception benchmarks for radar-native detection and fusion. | https://github.com/kaist-avelab/K-Radar, https://intelligent-vehicles.org/datasets/view-of-delft/ |
| Dual Radar, Boreas-RT, HeRCULES, SNAIL Radar | Radar/FMCW LiDAR bridge datasets for detection, tracking, odometry, place recognition, and heavy-rain/night stress testing. | https://www.nature.com/articles/s41597-025-04698-2, https://github.com/adept-thu/Dual-Radar, https://www.boreas.utias.utoronto.ca, https://arxiv.org/abs/2602.16870, https://sites.google.com/view/herculesdataset, https://arxiv.org/abs/2502.01946, https://snail-radar.github.io/ |
| IDD-AW, RASMD, TWICE | RGB/NIR/SWIR and digital-twin adverse-weather datasets for rain, fog, snow, glare, low light, and controlled HIL testing. | https://iddaw.github.io/, https://arxiv.org/abs/2504.07603, https://twicedataset.github.io/site/ |
| R-LiViT, RTPSeg, SemanticTHAB, Pi3DET | Thermal/LiDAR roadside VRU, RGB-thermal LiDAR segmentation, high-resolution LiDAR segmentation, and cross-platform 3D detection generalization. | https://openaccess.thecvf.com/content/ICCV2025/html/Mirlach_R-LiViT_A_LiDAR-Visual-Thermal_Dataset_Enabling_Vulnerable_Road_User_Focused_Roadside_ICCV_2025_paper.html, https://arxiv.org/abs/2503.17122, https://github.com/sssssyf/RTPSeg, https://zenodo.org/records/14677779, https://github.com/kav-institute/SemanticLiDAR, https://pi3det.github.io/ |
| LiDARWeather, SemanticSTF, Robo3D, RADIATE | Weather and corruption benchmarks for LiDAR segmentation/detection and radar-heavy adverse-weather perception. | https://arxiv.org/abs/2407.02286, https://github.com/engineerJPark/LiDARWeather, https://github.com/xiaoaoran/SemanticSTF, https://arxiv.org/abs/2303.17597, https://pro.hw.ac.uk/radiate/ |
| MSC-Bench and MultiCorrupt | Multi-sensor corruptions, missing modalities, misalignment, temporal mismatch, and sensor failures. | https://arxiv.org/abs/2501.01037, https://arxiv.org/abs/2402.11677 |
| MapBench | Camera/LiDAR corruption benchmark for HD map construction robustness, complementary to object-detection corruption suites. | https://mapbench.github.io/ |
| S2R-Bench and Occluded nuScenes | Real sensor anomaly, adverse weather, and parameterized sensor occlusion tests for camera/radar/LiDAR fusion. | https://www.nature.com/articles/s41597-025-06255-3, https://arxiv.org/abs/2505.18631, https://arxiv.org/abs/2510.18552 |
| RCP-Bench | Cooperative-perception corruption benchmark covering ego/CAV/infrastructure degradation and temporal misalignment. | https://openaccess.thecvf.com/content/CVPR2025/papers/Du_RCP-Bench_Benchmarking_Robustness_for_Collaborative_Perception_Under_Diverse_Corruptions_CVPR_2025_paper.pdf, https://github.com/LuckyDush/RCP-Bench |
| LASP online benchmark and PLM-Net | Latency-aware streaming perception and control-delay evaluation for time-stale 3D detection quality and vision-based AV control under delay. | https://arxiv.org/abs/2504.19115, https://awskhalil.github.io/plm-net/, https://arxiv.org/abs/2407.16740 |
| Fail2Drive | Closed-loop paired-route benchmark for distribution-shift failures that can expose perception-linked compounding errors. | https://arxiv.org/abs/2604.08535, https://github.com/autonomousvision/fail2drive |
| SegmentMeIfYouCan / RoadObstacle / RoadAnomaly | OOD segmentation and unknown obstacle evaluation for camera-side anomaly detection. | https://segmentmeifyoucan.com/ |
| HeLiMOS | Heterogeneous LiDAR moving-object segmentation benchmark/tooling. | https://github.com/url-kaist/HeLiMOS-PointCloud-Toolbox |
| Occ3D, OpenOccupancy, OpenScene, Cam4DOcc, UniOcc, nuCraft, CarlaOcc / ADMesh | Occupancy prediction, high-resolution occupancy, panoptic synthetic occupancy, and forecasting benchmarks that should be cross-linked from perception, not only world models. | https://arxiv.org/abs/2304.14365, https://github.com/OpenDriveLab/OpenScene, https://openaccess.thecvf.com/content/CVPR2024/html/Ma_Cam4DOcc_Benchmark_for_Camera-Only_4D_Occupancy_Forecasting_in_Autonomous_Driving_CVPR_2024_paper.html, https://uniocc.github.io/, https://arxiv.org/abs/2503.24381, https://benjin.me/publication/eccv2024_nucraft/, https://arxiv.org/abs/2603.27238, https://mias.group/CarlaOcc |

Already Covered, But Needs Better Discoverability

| Existing file | Add aliases or cross-links for |
| --- | --- |
| BEV Encoding Architectures | BEVDet, BEVDepth, BEVStereo, BEVPoolv2, BEVDet4D, SOLOFusion, DuoSpaceNet, DenseBEV, SEPatch3D. |
| Camera-Only Degraded Perception | Depth Pro, Video Depth Anything, FoundationStereo, OVM3D-Det, camera occupancy fallback, MultiCorrupt/MSC-Bench degradation tests. |
| Open-Vocabulary and Zero-Shot Detection | Split open-vocabulary known-class detection from open-world unknown-object discovery; add OW-OVD, SAM 3, OP3Det, 3D-AVS, Clipomaly, S2M, T-Rex2. |
| Vision Foundation Models | Florence-2, CAT-Seg, ProxyCLIP, AutoSeg/AVS, SAM 3, data-engine use cases, and OOD segmentation. |
| LiDAR Foundation Models | 3D-AVS, OpenAD, OP3Det, open-world objectness, and auto-vocabulary point-cloud segmentation. |
| LiDAR Semantic Segmentation | LiDAR-MOS, 4DMOS, MotionSeg3D, MambaMOS, SegNet4D, Mask4D, ALPINE, HeLiMOS, moving/static labels, scene-flow signals, and 4D panoptic outputs. |
| OpenPCDet and CenterPoint | VoxelNeXt, DSVT, fully sparse detectors, RadarPillars, and camera-LiDAR/radar fusion interfaces. |
| Streaming Temporal Perception | StreamMOS, 4DSegStreamer, MotionSeg3D, MambaMOS, Neural Scene Flow Priors, Cam4DOcc, UnO, DFIT-OccWorld, Drive-OccWorld, LASP, Transtreaming, acquisition-vs-inference latency. |
| Multi-Object Tracking | S2-Track, GRAE-3DMOT, CoopTrack, uncertainty/existence output contracts, occlusion age, source modality, track history quality. |
| Infrastructure Cooperative Perception | RCooper, HoloVIC, CoInfra, V2X-ReaLO, V2XScenes, UrbanIng-V2X, V2X-Radar, online realism and latency traces. |
| Collaborative Fleet Perception | CoSDH, mmCooper, CoST, CoopDETR, CoopTrack, CoHFF, V2Xverse, privacy/adaptation for unknown collaborators. |
| Uncertainty Quantification | Conformal boxes, conformal abstention, GaussianLSS, HyperDUM, segmentation/occupancy/freespace uncertainty, modality health. |
| Model Compression and Edge Deployment | RT-BEV, Jigsaw, P99/WCET gates, multi-model contention, compression-vs-robustness regression checks. |
| Night Operations and Thermal Fusion | NSAVP, V2X-Radar, thermal-LiDAR dense perception, event cameras, thermal calibration drift, hot-engine clutter, rain-on-lens. |
| Production Perception Systems | Production validation matrix: missing modality, stale modality, miscalibration, sensor occlusion, darkness, fog/rain, degraded compute. |
| Data Flywheel and Data Engines | AIDE, SAM 3, Florence-2, OOD triggers, cooperative metadata, FOD-specific QA gates. |

Guardrail Process

When adding or revising perception research:

  1. Check whether the method is already a dedicated file, an existing section, or a backlog item in this audit (see the sketch after this list).
  2. If it is P0, create one atomic file under 30-autonomy-stack/perception/methods/ before expanding lower-priority coverage.
  3. If it is P1, either create one atomic method file or add aliases and cross-links from the closest existing method page.
  4. If it is P2/watchlist, add aliases only after primary sources, code, or dataset availability are confirmed.
  5. Keep family synthesis in top-level perception docs and detailed method evidence in methods/.
  6. After every perception expansion, update this audit, Research Index, README, and any related safety/data-engine docs.
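
Step 1 can be partially automated. A minimal sketch, assuming the methods directory named above and an assumed filename for this audit (both paths are illustrative, not a defined tooling contract):

```python
from pathlib import Path

METHODS_DIR = Path("30-autonomy-stack/perception/methods")
AUDIT_FILE = Path("perception-coverage-audit.md")  # assumed name for this audit file

def coverage_status(method: str) -> str:
    """Report whether a candidate method already has a dedicated file, is listed
    in this audit's backlog, or is not covered at all."""
    slug = method.lower().replace(" ", "-")
    if any(slug in page.stem.lower() for page in METHODS_DIR.glob("*.md")):
        return "dedicated method file exists"
    if AUDIT_FILE.exists() and method.lower() in AUDIT_FILE.read_text(encoding="utf-8").lower():
        return "already in this audit (P0/P1/P2 or dataset gap)"
    return "not covered: add it to this audit before writing a page"

for candidate in ["EvOcc", "DepthOcc", "SAM4D"]:
    print(candidate, "->", coverage_status(candidate))
```

This only catches exact-ish name matches; aliases and sub-sections inside larger documents still need the manual checks in steps 1-5.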

Public research notes collected from public sources.