Perception Coverage Audit and Backlog
The perception library is broad, but the May 2026 agent sweeps found several method families that are still hidden inside larger documents or missing as first-class pages. Treat this file as the backlog for turning perception research into a method-level library similar to the SLAM coverage audit.
Current Status
| Item | Status |
|---|---|
| Dedicated perception files | 27 top-level perception synthesis/audit files, 20 dataset/benchmark pages, and 93 atomic method files in Perception Method Library. |
| Strongest existing coverage | Production sensor suites, CenterPoint/OpenPCDet, BEV basics, method-level camera BEV/occupancy, LiDAR-camera occupancy fusion, dynamic occupancy/free-space, spatiotemporal occupancy flow, Gaussian/3DGS/4DGS perception, LiDAR MOS, scene flow, LiDAR artifact removal, learned denoising, adverse-weather datasets, FOD synthetic benchmarks, motion/static separation benchmarks, radar-camera/radar-LiDAR fusion, 4D radar-camera occupancy, radar/event/FMCW, open-world/OOD, open-vocabulary attributes, panoptic/open-vocabulary occupancy, robust fusion, V2X, latency, data engines, model compression, uncertainty, and thermal fusion. |
| Most severe structural gap | Remaining P0/P1 items still need atomic pages: DR-REMOVER/ExelMap-style map-change methods, newer cooperative compression, radar/occupancy follow-ons such as DepthOcc and LinkOcc, EvOcc/evidential occupancy, SAM4D-style multimodal stream segmentation, and public-data gaps for dust, de-icing mist, steam, glycol film, and wet-apron multipath. |
| How to use this audit | Use the method library for promoted atomic pages. Use the P0/P1/P2 rows below as discovery clusters and backlog for remaining splits. Update this audit whenever a missing perception method becomes a dedicated file. |
Multi-Agent Discovery Sweep (2026-05-08)
Six parallel research agents audited camera/BEV perception, LiDAR/radar/thermal perception, foundation/open-world perception, temporal tracking, cooperative/V2X perception, and deployment robustness. A second targeted sweep then checked for gaps in 2025-2026 methods and benchmarks, followed by lightweight dataset scouts for adverse-weather, event, radar, FMCW LiDAR, V2X, OOD, and latency benchmarks.
Method-Level Promotion Wave (2026-05-08)
Five parallel writing agents converted the highest-value discovery clusters into atomic method pages under methods/. Existing broad perception files should now link into this library rather than expanding into larger catch-all documents.
| Cluster | Promoted method files |
|---|---|
| Camera BEV and occupancy | BEVDet, BEVDepth, BEVStereo, SOLOFusion, TPVFormer, SurroundOcc, SparseOcc, FlashOcc, SelfOcc, RenderOcc |
| LiDAR motion, radar, event, and FMCW | LiDAR-MOS, 4DMOS, InsMOS, StreamMOS, 4DSegStreamer, SegNet4D, Mask4D, Instantaneous Motion Perception, RadarPillars, K-Radar, V2X-Radar, Ev-3DOD, AevaScenes |
| Open-world and open-vocabulary | OpenAD, OP3Det, WildDet3D, DetAny3D, OW-OVD, Clipomaly, S2M, SAM 3, 3D-AVS, Mosaic3D, OpenVox |
| Robust fusion, calibration, and validation | MoME, GraphBEV, SOAC, RC-AutoCalib, ASF, MSC-Bench, MultiCorrupt, S2R-Bench, Occluded nuScenes, Conformal Boxes |
| Cooperative, latency, closed-loop, and data engines | RCooper, HoloVIC, CoInfra, V2X-ReaLO, CoHFF, CoSDH, CoopTrack, LASP, Fail2Drive, AIDE |
Deep-Dive Promotion Wave (2026-05-09)
Fresh parallel discovery and writing agents promoted the first batch of Gaussian/4DGS gaps, plus the latest radar-camera and sparse-query gaps, into atomic pages. The files below are now the primary pages for those methods; older backlog rows remain as provenance for related follow-ons.
| Cluster | Promoted method files |
|---|---|
| Gaussian and 3DGS perception | SplatAD, GaussianFormer, GaussianOcc, Streaming Gaussian Occupancy |
| 4D occupancy and sparse-query perception | Cam4DOcc, StreamingFlow, Sparse4D |
| Radar-camera fusion | TacoDepth, RaCFormer |
LIORNet and Adverse-Weather Removal Wave (2026-05-09)
The next loop broadened the LIORNet question from one snow-removal method into a full removal stack: learned LiDAR denoising, classical outlier filters, weather artifact removal, ghost/multipath failure modes, dynamic map cleaning, validation evidence, and weather robustness datasets.
| Cluster | Promoted files |
|---|---|
| Learned LiDAR denoising/removal | LIORNet, LiSnowNet, SLiDE, TripleMixer, 3D-KNN Blind-Spot Desnowing, 3D-OutDet, AdverseNet, DenoiseCP-Net |
| Classical and broad artifact removal | Classical LiDAR Outlier Removal, LiDAR Weather Artifact Removal, LiDAR Artifact Removal Techniques, LiDAR Ghost and Multipath Artifacts |
| Weather robustness datasets | Weather Robustness Datasets, WADS, CADC/CADC+, SemanticSTF, REHEARSE-3D, RainSense, SemanticSpray, RADIATE, Seeing Through Fog/DENSE |
| Safety and map-cleaning bridge | LiDAR Artifact Removal Validation, ERASOR, Removert, LiDAR Map Cleaning and Dynamic Removal |
Dynamic/Static Removal and Flow Benchmark Wave (2026-05-09)
The latest loop broadened removal from "delete noisy LiDAR points" into map hygiene: dynamic objects, static objects that do not belong in the persistent map, moved objects, scene flow, moving/static separation, and validation against false deletion.
| Cluster | Promoted files |
|---|---|
| MOS and scene-flow methods | MotionSeg3D, MambaMOS, Neural Scene Flow Priors, Scene Flow for Dynamic Object Removal |
| Motion/static and 4D occupancy benchmarks | Moving/Static Separation MOS Datasets, Occupancy-Flow and 4D Occupancy Benchmarks, Scene-Flow Datasets and Benchmarks |
| Map-cleaning validation bridge | Dynamic Map Cleaning Benchmarks, Moved-Object and Map-Change Datasets, Airside Dynamic Map-Cleaning Benchmark |
Web Gap Expansion Wave (2026-05-09)
Five web-search scouts and six writing agents promoted a new batch of radar, neural-field, dataset, and validation gaps into first-class pages.
| Cluster | Promoted files |
|---|---|
| Radar, 4D radar, and FMCW perception | CVFusion, 4D Radar-Camera Occupancy, POD FMCW LiDAR Predictive Detection |
| Dynamic Gaussian, neural-field, and temporal occupancy | DrivingGaussian, HUGS, SplatFlow, DistillNeRF, TrackOcc, Cross-Domain LiDAR Scene Flow |
| World-model bridge | Self-Supervised Occupancy Flow, UniScene Occupancy-Centric Generation |
| Datasets and validation | MUSES, Sensor Corruption Robustness Benchmarks, Open-World OOD and Anomaly Segmentation Benchmarks, FOD and Airport Apron Detection Datasets, FOD Perception Validation, Knowledge-Base Evaluation Protocol |
Reliability and Open-World Gap Wave (2026-05-09)
Six fresh web-search scouts and six writing agents promoted another set of non-duplicate perception gaps into first-class files.
| Cluster | Promoted files |
|---|---|
| Occupancy, freespace, and adverse-weather fusion | LiDAR-Camera Occupancy Fusion, Dynamic Occupancy and Freespace, Spatiotemporal Memory Occupancy Flow, Adverse-Weather Radar-LiDAR 3D Detection, RobuRCDet, SAMFusion |
| Open-world and open-vocabulary coverage | OVAD/OVODA Open-Vocabulary 3D Attributes, Open-Vocabulary Panoptic Occupancy |
| Dataset and benchmark coverage | STU 3D LiDAR Anomaly Segmentation, Airside FOD Synthetic Multimodal Benchmarks, RCP-Bench Cooperative Corruption Robustness, V2X Large-Range Sequential Datasets |
Next perception promotion queue: EvOcc, DepthOcc, LinkOcc, missing-view resilient occupancy, Gaussian-rendered occupancy, 4D radar road-boundary/freespace detection, SparseBEV, DETR4D, ForeSight, DriveBench, SAM4D, Open3DTrack, indoor open-vocabulary 3D instance segmentation, embodied robotics 3D perception benchmarks, Airport-FOD3S data-engine coverage, and airside-specific dust/de-icing/steam validation datasets.
P0 Discovery Clusters To Split Or Link
| Suggested file | Method or technique | Category | Why it matters | Primary sources |
|---|---|---|---|---|
camera-bev-depth-aware-encoders.md | BEVDet, BEVDepth, BEVStereo, BEVPoolv2, BEVDet4D, SOLOFusion | Depth-aware camera BEV | The current BEV doc jumps from LSS to BEVFormer; depth-supervised LSS-family encoders are more practical for LiDAR-supervised airside camera fallback. | https://arxiv.org/abs/2206.10092, https://arxiv.org/abs/2209.10248, https://arxiv.org/abs/2203.17054, https://openreview.net/forum?id=H3HcEJA2Um |
camera-only-3d-occupancy.md | TPVFormer, SurroundOcc, Occ3D/OpenOcc, FB-OCC, FlashOcc, SparseOcc, SelfOcc, RenderOcc | Camera semantic occupancy | Camera occupancy is now a core planning-facing alternative to boxes for arbitrary shapes, FOD, personnel, and unknown obstacles. | https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Tri-Perspective_View_for_Vision-Based_3D_Semantic_Occupancy_Prediction_CVPR_2023_paper.html, https://openaccess.thecvf.com/content/ICCV2023/html/Wei_SurroundOcc_Multi-camera_3D_Occupancy_Prediction_for_Autonomous_Driving_ICCV_2023_paper.html, https://arxiv.org/abs/2304.14365, https://openaccess.thecvf.com/content/CVPR2024/html/Tang_SparseOcc_Rethinking_Sparse_Latent_Representation_for_Vision-Based_Semantic_Occupancy_Prediction_CVPR_2024_paper.html |
lidar-moving-object-segmentation.md | LiDAR-MOS, 4DMOS, InsMOS, MF-MOS, MV-MOS/CV-MOS, SegNet4D, Mask4D, HeLiMOS | Moving-object segmentation and 4D panoptic LiDAR | Airside needs point-level static/dynamic separation for parked-vs-moving GSE, map cleanup, track birth, and prediction gating. | https://arxiv.org/abs/2105.08971, https://arxiv.org/abs/2206.04129, https://arxiv.org/abs/2401.17023, https://arxiv.org/abs/2408.10602, https://arxiv.org/abs/2406.16279, https://github.com/url-kaist/HeLiMOS-PointCloud-Toolbox |
streaming-mos-4d-panoptic.md | StreamMOS and 4DSegStreamer | Streaming temporal segmentation | Adds memory across inferences, not only within one scan window; useful for high-FPS streams and occlusion consistency. | https://arxiv.org/abs/2407.17905, https://llada60.github.io/4DSegStreamer/ |
instantaneous-motion-perception.md | Instantaneous perception of moving objects in 3D | Subtle motion estimation | Airside has creeping tugs and quasi-static objects; this fills the gap between single-frame detection and track-derived velocity. | https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Instantaneous_Perception_of_Moving_Objects_in_3D_CVPR_2024_paper.html, https://arxiv.org/abs/2405.02781 |
4d-occupancy-forecasting.md | Cam4DOcc, StreamingFlow, UnO, DFIT-OccWorld, Drive-OccWorld | Perception-to-prediction handoff | Tracks alone cannot represent unknown, partially observed, or occluded movers; 4D occupancy/flow is a stronger planner interface. | https://openaccess.thecvf.com/content/CVPR2024/html/Ma_Cam4DOcc_Benchmark_for_Camera-Only_4D_Occupancy_Forecasting_in_Autonomous_Driving_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2024/html/Shi_StreamingFlow_Streaming_Occupancy_Forecasting_with_Asynchronous_Multi-modal_Data_Streams_via_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2024/html/Agro_UnO_Unsupervised_Occupancy_Fields_for_Perception_and_Forecasting_CVPR_2024_paper.html, https://arxiv.org/abs/2412.13772 |
4d-radar-perception-methods.md | K-Radar, View of Delft, RadarPillars, V2X-Radar | 4D radar perception | Radar is covered mainly as hardware/fusion; the perception stack needs tensor-vs-point representations, Doppler/RCS encoding, and benchmark limits. | https://github.com/kaist-avelab/K-Radar, https://intelligent-vehicles.org/datasets/view-of-delft/, https://arxiv.org/abs/2408.05020, https://arxiv.org/abs/2411.10962 |
multi-sensor-corruption-benchmarks.md | MSC-Bench and MultiCorrupt | Robustness and corruption evaluation | Adds systematic camera+LiDAR corruptions, missing inputs, misalignment, temporal mismatch, adverse weather, and sensor failures. | https://arxiv.org/abs/2501.01037, https://arxiv.org/abs/2402.11677, https://github.com/ika-rwth-aachen/MultiCorrupt |
real-sensor-anomaly-robustness.md | S2R-Bench and Occluded nuScenes | Real sensor anomaly and occlusion validation | Adds real adverse-weather/sensor-anomaly data and parameterized camera/radar/LiDAR occlusion tests beyond synthetic corruption overlays. | https://www.nature.com/articles/s41597-025-06255-3, https://arxiv.org/abs/2505.18631, https://arxiv.org/abs/2510.18552 |
open-world-ood-perception-benchmarks.md | OpenAD and S2M OOD object segmentation | Open-world/OOD evaluation | Current docs cover generic OOD scores; these add benchmarkable 3D corner cases and mask-level unknown-object localization. | https://arxiv.org/abs/2411.17761, https://github.com/VDIGPKU/OpenAD, https://openaccess.thecvf.com/content/CVPR2024/html/Zhao_Segment_Every_Out-of-Distribution_Object_CVPR_2024_paper.html |
open-world-open-vocab-detection.md | OW-OVD | Open-world plus open-vocabulary 2D detection | Existing open-vocab coverage emphasizes prompted known classes; OW-OVD adds unknown-object recall and incremental learning. | https://openaccess.thecvf.com/content/CVPR2025/html/Xi_OW-OVD_Unified_Open_World_and_Open_Vocabulary_Object_Detection_CVPR_2025_paper.html |
open-world-3d-objectness.md | OpenAD, OP3Det, WildDet3D, DetAny3D | Prompt-free and promptable 3D objectness | Fills the gap between closed-set 3D detection, text-prompted 3D detection, and true unknown-object discovery in 3D. | https://openreview.net/forum?id=T9UDyN5Tw6, https://arxiv.org/abs/2411.17761, https://openreview.net/forum?id=wEOmS8Aw1W, https://arxiv.org/abs/2510.17686, https://arxiv.org/abs/2604.08626, https://openaccess.thecvf.com/content/ICCV2025/papers/Zhang_Detect_Anything_3D_in_the_Wild_ICCV_2025_paper.pdf |
open-world-anomaly-segmentation-driving.md | Clipomaly, S2M, SegmentMeIfYouCan | Unknown-region and anomaly segmentation | Dedicated anomaly stack for OOD masks, RoadObstacle/RoadAnomaly metrics, and airside FOD adaptation. | https://arxiv.org/abs/2512.01427, https://arxiv.org/abs/2311.16516, https://segmentmeifyoucan.com/ |
sam3-concept-segmentation-airside.md | SAM 3 / SAM 3.1 concept segmentation and tracking | Foundation-model auto-labeling | Current docs stop at SAM/SAM2/Grounded-SAM2; SAM 3 adds open-vocabulary concept prompts, exemplars, and video tracking for GSE/FOD labeling. | https://arxiv.org/abs/2511.16719, https://github.com/facebookresearch/sam3 |
resilient-fusion-sensor-failure.md | MoME / nuScenes-R failure routing | Robust fusion under sensor failure | Directly handles LiDAR drop, camera drop, limited FOV, beam reduction, and occlusion as perception architecture, not only robustness theory. | https://openaccess.thecvf.com/content/CVPR2025/html/Park_Resilient_Sensor_Fusion_Under_Adverse_Sensor_Failures_via_Multi-Modal_Expert_CVPR_2025_paper.html, https://github.com/konyul/MoME |
robust-fusion-asynchrony-calibration.md | Staleness-aware fusion, GraphBEV, SOAC, RC-AutoCalib | Temporal sync and calibration drift | Covers silent fusion failure from stale sensors, extrinsic drift, LiDAR-camera misalignment, radar-camera calibration, and asynchronous inputs. | https://arxiv.org/abs/2506.05780, https://arxiv.org/abs/2403.11848, https://github.com/adept-thu/GraphBEV, https://openaccess.thecvf.com/content/CVPR2024/html/Herau_SOAC_Spatio-Temporal_Overlap-Aware_Multi-Sensor_Calibration_using_Neural_Radiance_Fields_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Luu_RC-AutoCalib_An_End-to-End_Radar-Camera_Automatic_Calibration_Network_CVPR_2025_paper.html |
roadside-infrastructure-cooperative-datasets.md | RCooper and HoloVIC | Roadside/infrastructure cooperative perception datasets | Adds real multi-RSPU and multi-layout infrastructure perception, closer to airport fixed-sensor coverage than V2V-only datasets. | https://openaccess.thecvf.com/content/CVPR2024/html/Hao_RCooper_A_Real-world_Large-scale_Dataset_for_Roadside_Cooperative_Perception_CVPR_2024_paper.html, https://github.com/AIR-THU/DAIR-RCooper, https://openaccess.thecvf.com/content/CVPR2024/html/Ma_HoloVIC_Large-scale_Dataset_and_Benchmark_for_Multi-Sensor_Holographic_Intersection_and_CVPR_2024_paper.html, https://arxiv.org/abs/2403.02640 |
infrastructure-adverse-weather-cooperative-perception.md | CoInfra | Multi-node infrastructure plus adverse weather | Strong airport analogue: synchronized infrastructure nodes, real weather, 5G delay-aware fusion, OTA, monitoring, and system docs. | https://arxiv.org/abs/2507.02245, https://github.com/NingMingHao/CoInfra |
v2x-online-realtime-benchmarks.md | V2X-ReaLO | Online V2X framework and dataset | Existing docs are mostly offline-benchmark oriented; this targets synchronized ROS bags, latency, and online intermediate fusion. | https://arxiv.org/abs/2503.10034 |
cooperative-occupancy-perception.md | CoHFF collaborative semantic occupancy | Cooperative occupancy | Fills a major gap: dense 3D semantic occupancy, not only boxes, for irregular cargo, unknown obstacles, and FOD. | https://openaccess.thecvf.com/content/CVPR2024/html/Song_Collaborative_Semantic_Occupancy_Prediction_with_Hybrid_Feature_Fusion_in_Connected_CVPR_2024_paper.html, https://github.com/rruisong/CoHFF |
communication-efficient-cooperative-fusion.md | CoSDH | Bandwidth-efficient hybrid fusion | Supply-demand region selection plus intermediate/late hybridization is more deployment-oriented than pure intermediate fusion. | https://openaccess.thecvf.com/content/CVPR2025/html/Xu_CoSDH_Communication-Efficient_Collaborative_Perception_via_Supply-Demand_Awareness_and_Intermediate-Late_Hybridization_CVPR_2025_paper.html |
cooperative-sequential-tracking.md | CoopTrack | Cooperative MOT and sequential perception | Existing cooperative docs emphasize detection; airport planners need stable IDs and motion continuity across agents. | https://openaccess.thecvf.com/content/ICCV2025/html/Zhong_CoopTrack_Exploring_End-to-End_Learning_for_Efficient_Cooperative_Sequential_Perception_ICCV_2025_paper.html, https://arxiv.org/abs/2507.19239 |
uncertainty-detection-conformal-abstention.md | Conformal bounding boxes and conformal abstention | Detection-specific uncertainty | Upgrades conformal coverage from class sets to localization intervals and explicit abstain/degrade decisions (a minimal margin sketch follows this table). | https://arxiv.org/abs/2403.07263, https://arxiv.org/abs/2502.07255
30-autonomy-stack/perception/methods/aide.md | AIDE automatic data engine | Long-tail detection data engine | Concrete loop for issue mining, auto-labeling, scenario generation, and open-world detection evaluation. | https://openaccess.thecvf.com/content/CVPR2024/html/Liang_AIDE_An_Automatic_Data_Engine_for_Object_Detection_in_Autonomous_CVPR_2024_paper.html, https://arxiv.org/abs/2403.17373 |
30-autonomy-stack/perception/methods/vespa-open-world-pointcloud-labeling.md | VESPA and VLM-assisted 3D auto-labeling | Open-world point-cloud labeling | Multimodal pseudo-labeling without GT annotations or HD maps directly addresses airside 3D labeling bottlenecks. | https://arxiv.org/abs/2507.20397, https://www.sciencedirect.com/science/article/pii/S0968090X25004334 |
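The conformal bounding-box row above names a concrete mechanism, so a worked sketch helps. The Python below is a minimal illustration of the split-conformal idea, not the method from the cited papers: it assumes axis-aligned (x1, y1, x2, y2) boxes and a held-out calibration set of matched predictions and ground truth, and all function names are illustrative.

```python
import numpy as np

def conformal_box_margin(pred_boxes, true_boxes, alpha=0.1):
    """Split-conformal margin for axis-aligned boxes (x1, y1, x2, y2).

    Score per calibration box: the largest one-sided growth needed for
    the prediction to contain the ground truth. Inflating every test
    box by the returned margin then contains the true box with
    probability >= 1 - alpha, assuming exchangeable data.
    """
    pred = np.asarray(pred_boxes, dtype=float)
    true = np.asarray(true_boxes, dtype=float)
    scores = np.maximum.reduce([
        pred[:, 0] - true[:, 0],  # growth needed on the left edge
        pred[:, 1] - true[:, 1],  # growth needed on the top edge
        true[:, 2] - pred[:, 2],  # growth needed on the right edge
        true[:, 3] - pred[:, 3],  # growth needed on the bottom edge
    ])
    n = len(scores)
    # Finite-sample corrected quantile level: ceil((n+1)(1-alpha)) / n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, level, method="higher"))

def inflate(boxes, margin):
    """Apply the calibrated margin to test-time detections."""
    return np.asarray(boxes, dtype=float) + np.array([-margin, -margin, margin, margin])
```

The same margin drives the abstain/degrade decision named in the row: when the calibrated margin grows past what downstream consumers can tolerate, the stack should degrade or abstain rather than report tight boxes.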
P1 Remaining Atomic Pages or Major Sections
| Suggested file | Method or technique | Category | Why it matters | Primary sources |
|---|---|---|---|---|
camera-panoptic-open-vocab-occupancy.md | PanoOcc, VEON, LangOcc, OpenOcc | Camera panoptic/open-vocabulary occupancy | Adds instance-level and open-vocabulary 3D scene understanding for rare GSE, barriers, FOD, and long-tail objects. | https://openaccess.thecvf.com/content/CVPR2024/html/Wang_PanoOcc_Unified_Occupancy_Representation_for_Camera-based_3D_Panoptic_Segmentation_CVPR_2024_paper.html, https://arxiv.org/abs/2407.12294, https://openreview.net/forum?id=KhjlXNbYea, https://github.com/boschresearch/LangOcc, https://arxiv.org/abs/2403.11796 |
camera-4d-occupancy-forecasting.md | Cam4DOcc, OccProphet, ALOcc | Temporal camera occupancy | Current temporal perception coverage is stronger on detection/tracking than camera 4D occupancy. | https://openaccess.thecvf.com/content/CVPR2024/html/Ma_Cam4DOcc_Benchmark_for_Camera-Only_4D_Occupancy_Forecasting_in_Autonomous_Driving_CVPR_2024_paper.html, https://openreview.net/forum?id=vC7AlY1ytz, https://huggingface.co/papers/2411.07725 |
streaming-gaussian-occupancy.md | GaussianWorld and S2GO | Streaming Gaussian occupancy | Moves from per-frame occupancy toward persistent sparse Gaussian/query world state for real-time temporal occupancy. | https://openaccess.thecvf.com/content/CVPR2025/html/Zuo_GaussianWorld_Gaussian_World_Model_for_Streaming_3D_Occupancy_Prediction_CVPR_2025_paper.html, https://openreview.net/forum?id=z8ggdMlSco, https://arxiv.org/abs/2506.05473 |
generalized-uncalibrated-occupancy.md | OccAny and ViGT | Generalized / calibration-free occupancy | Targets uncalibrated, out-of-domain, monocular, sequential, and surround-view occupancy for heterogeneous camera rigs. | https://arxiv.org/abs/2603.23502, https://github.com/valeoai/OccAny, https://arxiv.org/abs/2602.05573 |
missing-view-resilient-occupancy.md | M2-Occ | Incomplete multi-camera robustness | Directly tests missing camera/view dropout for semantic occupancy. | https://arxiv.org/abs/2603.09737 |
occupancy-uncertainty-ood-evidence.md | EvOcc and ProOOD | Occupancy uncertainty / OOD | Handles unknown objects, occluded voxels, contradictory evidence, and long-tail classes at occupancy level. | https://openaccess.thecvf.com/content/CVPR2025/html/Kalble_EvOcc_Accurate_Semantic_Occupancy_for_Automated_Driving_Using_Evidence_Theory_CVPR_2025_paper.html, https://arxiv.org/abs/2604.01081 |
satellite-assisted-occupancy.md | SA-Occ | Satellite / overhead-map assisted occupancy | Especially relevant to airside because overhead imagery and GPS/IMU can reduce occlusion and distant-region weakness. | https://openaccess.thecvf.com/content/ICCV2025/html/Chen_SA-Occ_Satellite-Assisted_3D_Occupancy_Prediction_in_Real_World_ICCV_2025_paper.html |
spatiotemporal-memory-occupancy-flow.md | ST-Occ and STCOcc | Temporal occupancy and scene flow | Adds memory, uncertainty-aware aggregation, sparse state refinement, and scene-flow prediction as a planner-facing interface. | https://openaccess.thecvf.com/content/ICCV2025/html/Leng_Occupancy_Learning_with_Spatiotemporal_Memory_ICCV_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Liao_STCOcc_Sparse_Spatial-Temporal_Cascade_Renovation_for_3D_Occupancy_and_Scene_CVPR_2025_paper.html |
sparse-query-camera-3d-detection.md | Sparse4D, SparseBEV, DETR4D, object-query temporal memory | Sparse multi-view camera detection | Focused taxonomy of sparse query methods versus dense BEV methods for edge deployment. | https://arxiv.org/abs/2211.10581, https://arxiv.org/abs/2308.09244, https://arxiv.org/abs/2212.07849, https://arxiv.org/abs/2602.00450 |
camera-depth-stereo-foundation-models.md | Depth Pro, Video Depth Anything, FoundationStereo | Camera depth and stereo foundation models | Updates fallback perception beyond Depth Anything v2, Metric3D, and UniDepth with metric, temporal, and zero-shot stereo options. | https://arxiv.org/abs/2410.02073, https://arxiv.org/abs/2501.12375, https://github.com/DepthAnything/Video-Depth-Anything, https://arxiv.org/abs/2501.09898, https://research.nvidia.com/labs/lpr/publication/stereoanything2025/ |
camera-lidar-fusion-interfaces.md | FUTR3D, CMT, DeepInteraction, AutoAlignV2, DAL, SAMFusion, MS-Occ | Camera-LiDAR fusion interfaces | Existing docs cover BEVFusion but underrepresent query fusion, calibration-tolerant alignment, dynamic modality dropout, and occupancy fusion. | https://arxiv.org/abs/2203.10642, https://arxiv.org/abs/2301.01283, https://proceedings.neurips.cc/paper_files/paper/2022/hash/0d18ab3b5fabfa6fe47c62e711af02f0-Abstract-Conference.html, https://arxiv.org/abs/2207.10316, https://arxiv.org/abs/2311.07152, https://arxiv.org/abs/2508.16408, https://arxiv.org/abs/2504.15888 |
camera-bev-robustness-validation.md | RoboBEV, BEV corruptions, viewpoint robustness, missing camera inputs, occupancy uncertainty | Camera BEV validation | Airside camera rigs, soiling, glare, missing cameras, and calibration changes need benchmark coverage. | https://arxiv.org/abs/2304.06719, https://arxiv.org/abs/2405.17426, https://arxiv.org/abs/2309.05192, https://arxiv.org/abs/2406.11021, https://arxiv.org/abs/2603.09737 |
adverse-weather-radar-lidar-3d-detection.md | LiDAR-4D radar fusion, L4DR, V2X-R | Radar-LiDAR adverse-weather detection | Directly addresses when radar should rescue LiDAR detection under fog/rain. | https://openaccess.thecvf.com/content/CVPR2024/papers/Chae_Towards_Robust_3D_Object_Detection_with_LiDAR_and_4D_Radar_CVPR_2024_paper.pdf, https://arxiv.org/abs/2408.03677, https://openaccess.thecvf.com/content/CVPR2025/papers/Huang_V2X-R_Cooperative_LiDAR-4D_Radar_Fusion_with_Denoising_Diffusion_for_3D_CVPR_2025_paper.pdf |
Radar-camera follow-ons | DepthOcc, LinkOcc, and related temporal radar-camera occupancy variants | Radar-camera depth, BEV, and temporal fusion | TacoDepth, RaCFormer, CVFusion, 4D radar-camera occupancy, RobuRCDet, and SAMFusion now have atomic files; remaining radar-camera work should focus on temporal association, occupancy follow-ons, and repeatable multi-dataset validation. | https://openaccess.thecvf.com/content/CVPR2025/html/Wang_TacoDepth_Towards_Efficient_Radar-Camera_Depth_Estimation_with_One-stage_Fusion_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Chu_RaCFormer_Towards_High-Quality_3D_Object_Detection_via_Query-based_Radar-Camera_Fusion_CVPR_2025_paper.html, https://arxiv.org/abs/2502.13071
Promoted: 4D Radar-Camera Occupancy | 4DRC-OCC and RadarOcc | Radar-camera semantic occupancy | Dense occupancy using 4D radar plus camera is distinct from LiDAR-camera occupancy and generic dynamic occupancy; keep this row as provenance and watch radar-occupancy follow-ons. | https://arxiv.org/abs/2603.07794
availability-aware-sensor-fusion.md | Availability-aware Sensor Fusion via Unified Canonical Space | Sensor-failure perception | Directly targets missing/degraded camera, LiDAR, and 4D radar availability instead of assuming all modalities are valid. | https://arxiv.org/abs/2503.07029 |
event-camera-3d-detection.md | Ev-3DOD / DSEC-3DOD and eAP | Event-camera 3D perception | Event cameras now support 3D detection and time-to-collision, not just 2D detection or VIO. | https://arxiv.org/abs/2502.19630, https://openaccess.thecvf.com/content/CVPR2025/papers/Cho_Ev-3DOD_Pushing_the_Temporal_Boundaries_of_3D_Object_Detection_with_CVPR_2025_paper.pdf, https://arxiv.org/abs/2603.16303 |
fmcw-lidar-perception.md | AevaScenes and POD predictive object detection | FMCW / 4D LiDAR perception | Per-point velocity changes detection, scene-flow, and predictive-detection design compared with pulsed LiDAR. | https://www.aeva.com/press/aeva-introduces-aevascenes-the-first-open-access-fmcw-4d-lidar-and-camera-dataset-for-autonomous-vehicle-research/, https://arxiv.org/abs/2504.05649 |
4d-radar-road-boundaries-freespace.md | 4DRadarRBD | Radar freespace and road-boundary detection | Radar-based boundary/freespace perception is separate from object detection and occupancy, especially in poor visibility. | https://arxiv.org/abs/2503.01930, https://www.frontiersin.org/articles/10.3389/frsip.2025.1667789/ |
fully-sparse-lidar-detectors.md | VoxelNeXt, DSVT, fully sparse detector upgrade path | LiDAR 3D detection | openpcdet-centerpoint.md is CenterPoint-heavy; modern candidates should include fully sparse detectors, whose cost profiles beat dense-BEV pipelines. | https://github.com/open-mmlab/OpenPCDet, https://arxiv.org/abs/2303.11301, https://arxiv.org/abs/2301.06051
thermal-lidar-dense-perception.md | Thermal-LiDAR depth completion, RGB-T-LiDAR segmentation, dense night freespace | Thermal/night dense perception | Current thermal doc is detection/safety-use-case oriented; dense depth and segmentation matter for freespace and occupancy. | https://arxiv.org/abs/2504.02356, https://doi.org/10.1016/j.isprsjprs.2026.01.008, https://oem.flir.com/en-gb/solutions/automotive/adas-dataset-form/, https://soonminhwang.github.io/rgbt-ped-detection/, https://github.com/bupt-ai-cz/LLVIP |
dynamic-occupancy-freespace.md | DIO, MS-Occ, UniOcc | Current-state occupancy and freespace | Perception docs need the current-state safety view that planners consume, distinct from predictive world models. | https://openaccess.thecvf.com/content/CVPR2025/papers/Diehl_DIO_Decomposable_Implicit_4D_Occupancy-Flow_World_Model_CVPR_2025_paper.pdf, https://arxiv.org/abs/2504.15888, https://arxiv.org/abs/2503.24381 |
lidar-camera-occupancy-fusion.md | SDG-OCC and MS-Occ | LiDAR-camera occupancy fusion | Occupancy-specific fusion uses LiDAR geometry plus camera semantics through depth guidance, distillation, and multi-stage fusion. | https://openaccess.thecvf.com/content/CVPR2025/html/Duan_SDGOCC_Semantic_and_Depth-Guided_Birds-Eye_View_Transformation_for_3D_Multimodal_CVPR_2025_paper.html, https://arxiv.org/abs/2504.15888 |
uncertainty-aware-bev-fusion.md | GaussianLSS and HyperDUM | Fusion-specific uncertainty | Adds BEV depth uncertainty and low-cost feature-level multimodal uncertainty closer to deployed fusion stacks. | https://openaccess.thecvf.com/content/CVPR2025/html/Lu_Toward_Real-world_BEV_Perception_Depth_Uncertainty_Estimation_via_Gaussian_Splatting_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Chen_Hyperdimensional_Uncertainty_Quantification_for_Multimodal_Uncertainty_Fusion_in_Autonomous_Vehicles_CVPR_2025_paper.html |
night-thermal-event-radar-benchmarks.md | NSAVP and V2X-Radar | Night, thermal, event, radar evaluation | Updates night perception beyond FLIR/KAIST-style references with stereo thermal, event cameras, 4D radar, rain/night/dusk cooperative data. | https://umautobots.github.io/nsavp, https://arxiv.org/abs/2401.13853, https://arxiv.org/abs/2411.10962 |
edge-real-time-bev-scheduling.md | RT-BEV and Jigsaw | Deadline-aware BEV deployment | Compression doc is TensorRT-heavy; add P99/WCET scheduling and dual-SoC partitioning. | https://rtcl.eecs.umich.edu/rtclweb/assets/publications/2024/rtss24-liu.pdf, https://2024.rtss.org/accepted-papers/index.html, https://dblp.org/rec/conf/rtss/SunLHHX0BSRG24.html |
40-runtime-systems/ros-autoware/sensor-in-memory-ros2-av-middleware.md | SIM shared-memory transport for ROS 2 / Autoware | Deployment latency and middleware | Targets perception-to-decision tail latency, not only model inference time. | https://arxiv.org/abs/2510.11448 |
ood-data-engine.md | DriveGEN and active OOD generation | Active data and evaluation | Complements uncertainty-triggered logging with controllable OOD augmentation and scenario generation. | https://openaccess.thecvf.com/content/CVPR2025/html/Lin_DriveGEN_Generalized_and_Robust_3D_Detection_in_Driving_via_Controllable_CVPR_2025_paper.html |
uncertainty-aware-end-to-end-3d-mot.md | S2-Track | Tracking uncertainty | Query initialization, probabilistic decoder, and uncertainty as downstream prediction signal are currently underdocumented. | https://proceedings.mlr.press/v267/tang25p.html, https://openreview.net/forum?id=vHr9cdeFfu, https://arxiv.org/abs/2406.02147 |
relation-aware-3d-mot.md | GRAE-3DMOT | Relation-aware tracking | Complements MCTrack/3DMOTFormer with relation-aware association for dense scenes. | https://openaccess.thecvf.com/content/CVPR2025/html/Kim_GRAE-3DMOT_Geometry_Relation-Aware_Encoder_for_Online_3D_Multi-Object_Tracking_CVPR_2025_paper.html |
latency-aware-streaming-perception.md | LASP online benchmark, Transtreaming, multi-modal streaming 3D detection | Latency compensation and online evaluation | Separates acquisition latency, inference latency, output timestamp compensation, and quality loss from stale outputs on edge hardware (a staleness-gate sketch follows this table). | https://arxiv.org/abs/2504.19115, https://arxiv.org/abs/2409.06584, https://arxiv.org/abs/2209.04966
4d-world-model-pretraining.md | DriveWorld | Temporal representation pretraining | Concrete 4D pretraining bridge across detection, tracking, forecasting, occupancy, and planning. | https://arxiv.org/abs/2405.04390, https://openaccess.thecvf.com/content/CVPR2024/html/Min_DriveWorld_4D_Pre-trained_Scene_Understanding_via_World_Models_for_Autonomous_CVPR_2024_paper.html |
object-query-cooperative-perception.md | CoopDETR | Sparse cooperative communication | Object-query sharing is interpretable and extremely bandwidth-light versus region/BEV maps. | https://arxiv.org/abs/2502.19313 |
collaboration-robust-cooperative-fusion.md | mmCooper | Robust multi-stage fusion | Targets bandwidth limits plus calibration/misalignment errors in cooperative perception. | https://openaccess.thecvf.com/content/ICCV2025/html/Liu_mmCooper_A_Multi-agent_Multi-stage_Communication-efficient_and_Collaboration-robust_Cooperative_Perception_Framework_ICCV_2025_paper.html |
spatiotemporal-cooperative-perception.md | CoST | Space-time cooperative fusion | Unifies multi-agent and temporal fusion; avoids retransmitting static-object features repeatedly. | https://openaccess.thecvf.com/content/ICCV2025/html/Tang_CoST_Efficient_Collaborative_Perception_From_Unified_Spatiotemporal_Perspective_ICCV_2025_paper.html |
end-to-end-v2x-cooperative-driving.md | V2Xverse / CoDriving, UniV2X, Select2Drive | Closed-loop V2X perception-to-planning | Moves evaluation from mAP-only to driving score, collision rate, latency, and communication-aware planning. | https://collaborativeperception.github.io/V2Xverse/, https://github.com/CollaborativePerception/V2Xverse, https://arxiv.org/abs/2404.00717 |
privacy-adaptation-v2x-collaboration.md | Unknown-collaborator privacy and CoPEFT | Privacy/domain adaptation | Covers separately trained agents, privacy-preserving fusion, and low-cost per-site adaptation for multi-vendor fleets. | https://mlanthology.org/aaai/2025/lu2025aaai-privacy/, https://ojs.aaai.org/index.php/AAAI/article/view/34502 |
auto-vocabulary-lidar-segmentation.md | 3D-AVS | LiDAR open-vocabulary segmentation | Distinguishes LiDAR foundation pretraining from runtime vocabulary discovery for point clouds. | https://openaccess.thecvf.com/content/CVPR2025/html/Wei_3D-AVS_LiDAR-based_3D_Auto-Vocabulary_Segmentation_CVPR_2025_paper.html, https://arxiv.org/abs/2406.09126 |
mosaic3d-open-vocab-3d-segmentation.md | Mosaic3D | Open-vocabulary 3D segmentation | Foundation dataset/model for open-vocabulary 3D semantic and instance segmentation; fills a gap beyond 2D open-vocabulary segmentation. | https://research.nvidia.com/labs/lpr/publication/choy2025mosaic/, https://openaccess.thecvf.com/content/CVPR2025/papers/Lee_Mosaic3D_Foundation_Dataset_and_Model_for_Open-Vocabulary_3D_Segmentation_CVPR_2025_paper.pdf |
openvox-open-vocab-voxel-mapping.md | OpenVox | Open-vocabulary voxel mapping | Real-time instance-level probabilistic voxel memory is relevant for persistent apron maps and language-queryable scene memory. | https://open-vox.github.io/ |
ago-open-world-3d-occupancy.md | AGO adaptive grounding | Open-world 3D occupancy | Known/unknown semantic occupancy using VLM grounding is a stronger safety-layer fit than box-only open-world detection. | https://arxiv.org/abs/2504.10117, https://openaccess.thecvf.com/content/ICCV2025/papers/Li_AGO_Adaptive_Grounding_for_Open_World_3D_Occupancy_Prediction_ICCV_2025_paper.pdf |
open3dtrack-open-vocab-3d-tracking.md | Open3DTrack | Open-vocabulary 3D tracking | First-pass gaps covered 3D detection/objectness but not persistent open-vocabulary 3D tracks. | https://arxiv.org/abs/2410.01678 |
open-vocab-monocular-3d-detection.md | OVM3D-Det | Open-vocab monocular 3D detection | Useful as offline/assistive camera fallback, with safety caveats. | https://ovm3d-det.github.io/, https://arxiv.org/abs/2411.15657 |
dense-open-vocab-segmentation.md | Florence-2, CAT-Seg, ProxyCLIP, AutoSeg/AVS | Dense open-vocabulary labels | Grounded-SAM is not the whole dense-labeling story; pixel-level OV segmentation helps mask QA and rare-class discovery. | https://arxiv.org/abs/2311.06242, https://huggingface.co/microsoft/Florence-2-large, https://openaccess.thecvf.com/content/CVPR2024/html/Cho_CAT-Seg_Cost_Aggregation_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2024_paper.html, https://arxiv.org/abs/2408.04883, https://openaccess.thecvf.com/content/ICCV2025/html/Ulger_Auto-Vocabulary_Semantic_Segmentation_ICCV_2025_paper.html |
ovad-open-vocab-3d-attributes.md | OVAD / OVODA | Open-vocabulary 3D attributes | Adds attribute-level open-vocabulary 3D detection, which matters for airside state labels such as cones, tugs, carts, lights, covers, and unusual vehicle configurations. | https://arxiv.org/abs/2508.16812 |
drivebench-vlm-reliability.md | DriveBench | VLM driving reliability under corruptions | Tests clean, corrupted, and text-only driving QA to catch visually ungrounded VLM perception or monitoring outputs. | https://drive-bench.github.io/, https://github.com/worldbench/DriveBench |
airside-fod-open-world-benchmarks.md | FOD-A, FAA FOD framing, Airport-FOD3S, RDD5000, DualFOD/FOD-UAS | Airside FOD and runway benchmarks | Existing docs say zero-shot FOD is weak but lack a benchmark plan for tiny-object recall, RGB/thermal runway visibility, synthetic FOD generation, and OOD-vs-detector comparison. | https://arxiv.org/abs/2110.03072, https://github.com/FOD-UNOmaha/FOD-data, https://www.faa.gov/airports/airport_safety/fod, https://www.mdpi.com/1424-8220/25/15/4565, https://www.mdpi.com/2072-4292/17/4/669, https://www.mdpi.com/2504-446X/10/3/225 |
50-cloud-fleet/data-platform/airport-fod3s-synthetic-data.md | Airport-FOD3S | Synthetic FOD data engine | Synthetic FOD generation and blending directly addresses rare small-object data scarcity beyond FOD-A. | https://www.mdpi.com/1424-8220/25/15/4565 |
30-autonomy-stack/perception/methods/fail2drive.md | Fail2Drive | Closed-loop OOD validation | Paired-route distribution-shift scenarios expose perception-linked failures under closed-loop drift. | https://arxiv.org/abs/2604.08535, https://github.com/autonomousvision/fail2drive |
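The staleness-aware fusion row in the P0 table and the latency-aware streaming row above share one deployment primitive: a modality older than its staleness budget must not be fused silently. The sketch below is a minimal illustration under assumed per-modality acquisition timestamps on a shared clock; the modality names and budget values are made up for the example and are not taken from the cited papers.

```python
import time
from dataclasses import dataclass

@dataclass
class ModalityFrame:
    name: str      # e.g. "lidar", "cam_front", "radar" (illustrative)
    stamp: float   # acquisition time in seconds on a shared clock
    data: object   # decoded measurement or feature tensor

def select_fresh_inputs(frames, now, budgets, default_budget=0.1):
    """Partition modality inputs into fresh and stale before fusion.

    Fusing silently over stale features is the failure mode these
    backlog rows target; dropping and reporting stale modalities turns
    it into an explicit degradation decision.
    """
    fresh, dropped = [], []
    for frame in frames:
        if now - frame.stamp <= budgets.get(frame.name, default_budget):
            fresh.append(frame)
        else:
            dropped.append(frame.name)
    return fresh, dropped

# Illustrative budgets: a 10 Hz LiDAR tolerates one missed sweep,
# 30 Hz cameras tolerate roughly two missed frames.
BUDGETS = {"lidar": 0.2, "cam_front": 0.1, "radar": 0.15}

frames = [
    ModalityFrame("lidar", time.time() - 0.05, None),
    ModalityFrame("cam_front", time.time() - 0.4, None),
]
fresh, dropped = select_fresh_inputs(frames, time.time(), BUDGETS)
# "dropped" feeds the degradation path: expert routing, widened
# uncertainty, or an explicit reduced-capability output contract.
```

The dropped list is exactly what MoME-style failure routing or an availability-aware fusion head would consume, and it is what the validation rows (MSC-Bench, MultiCorrupt, S2R-Bench) exercise from the outside.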
P2 or Watchlist
| Method or technique | Why watch | Current concern | Sources |
|---|---|---|---|
| VLMFusionOcc3D, OccVLA, O3N | VLM-assisted and open-vocabulary occupancy are moving quickly. | Promising but not yet safety perception primitives; validate hallucination, latency, and reproducibility. | https://arxiv.org/abs/2603.02609, https://huggingface.co/papers/2509.05578, https://arxiv.org/abs/2603.12144 |
| RayD3D, Dr.Occ, SuperOcc, Gau-Occ, GaussTR, GaussRender | Useful 2025-2026 camera/multimodal/Gaussian occupancy signals. | Mostly preprint-stage or narrow-benchmark; track code and Occ3D/OpenOccupancy reproducibility. | https://huggingface.co/papers/2603.22852, https://arxiv.org/abs/2603.01007, https://openaccess.thecvf.com/content/CVPR2025/html/Oh_3D_Occupancy_Prediction_with_Low-Resolution_Queries_via_Prototype-aware_View_Transformation_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Jiang_GaussTR_Foundation_Model-Aligned_Gaussian_Transformer_for_Self-Supervised_3D_Spatial_Understanding_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/ICCV2025/html/Chambon_GaussRender_Learning_3D_Occupancy_with_Gaussian_Rendering_ICCV_2025_paper.html |
| EventFly, EvDET200K, TUMTraf EMOT, PRE-Mamba | Event perception is expanding into dense segmentation, high-definition detection, MOT, and rain removal. | Useful follow-ons for event-camera pages; promote after dataset/code adoption and hardware fit are clearer. | https://openaccess.thecvf.com/content/CVPR2025/html/Kong_EventFly_Event_Camera_Perception_from_Ground_to_the_Sky_CVPR_2025_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Wang_Object_Detection_using_Event_Camera_A_MoE_Heat_Conduction_based_CVPR_2025_paper.html, https://arxiv.org/abs/2512.14595, https://arxiv.org/abs/2505.05307 |
| DSERT-RoLL | Stereo event/RGB/thermal, 4D radar, and dual-LiDAR dataset for diverse driving conditions. | Newly posted in 2026; wait for data/code availability before promotion. | https://arxiv.org/abs/2604.03685 |
| HyperDet, DRIFT, SD4R, R4Det | 2026 radar-only/radar-camera perception line. | Needs code, repeatable benchmarks, and multi-dataset validation. | https://arxiv.org/abs/2602.11554, https://arxiv.org/abs/2603.09695, https://arxiv.org/abs/2602.20653, https://arxiv.org/abs/2603.11566 |
| SiMO and single-modality-operable cooperative perception | Directly relevant to robust collaborative perception under sensor failures. | Too fresh; compare against MoME, HEAL, mmCooper, and CoSDH before promotion. | https://gist.science/paper/2603.08240 |
| SparseCoop, CoDS, JigsawComm, QuantV2X, TruckV2X | 2025-2026 cooperative perception wave around sparse queries, semantic communication, compression, quantization, and truck-centered V2X data. | Useful but not yet mature enough for core docs; compare against CoSDH, mmCooper, CoST, and V2X-ReaLO first. | https://arxiv.org/abs/2512.06838, https://arxiv.org/abs/2512.22513, https://arxiv.org/abs/2511.17843, https://huggingface.co/papers/2509.03704, https://huggingface.co/papers/2507.09505 |
| LightDiff and JarvisIR | Restoration can help low-light/weather preprocessing. | Treat as monitored advisory preprocessing, not safety authority, because restoration can hallucinate. | https://openaccess.thecvf.com/content/CVPR2024/html/Li_Light_the_Night_A_Multi-Condition_Diffusion_Framework_for_Unpaired_Low-Light_CVPR_2024_paper.html, https://openaccess.thecvf.com/content/CVPR2025/html/Lin_JarvisIR_Elevating_Autonomous_Driving_Perception_with_Intelligent_Image_Restoration_CVPR_2025_paper.html |
| T-Rex2 and OWLv2 visual-prompt detection | Visual exemplars help airport-specific GSE variants where text prompts are brittle. | Better for onboarding and annotation acceleration than runtime safety perception. | https://arxiv.org/abs/2403.14610, https://github.com/IDEA-Research/T-Rex, https://arxiv.org/abs/2306.09683 |
| DySS, ForeSight, GaussianDet3D, OffsetOcc, SAM4D, Diffusion-FS | Useful follow-ons for streaming camera detection, Gaussian camera detection, panoptic completion, promptable LiDAR-camera segmentation, and corridor prediction. | Useful research directions, but merge into existing sparse-query, occupancy, SAM/data-engine, or freespace pages unless adoption grows. | https://arxiv.org/abs/2506.10242, https://openaccess.thecvf.com/content/ICCV2025/html/Papais_ForeSight_Multi-View_Streaming_Joint_Object_Detection_and_Trajectory_Forecasting_ICCV_2025_paper.html, https://openreview.net/forum?id=qByiOX1j9C, https://cvpr.thecvf.com/virtual/2025/35771, https://openaccess.thecvf.com/content/ICCV2025/html/Xu_SAM4D_Segment_Anything_in_Camera_and_LiDAR_Streams_ICCV_2025_paper.html, https://arxiv.org/abs/2507.18763 |
Benchmark and Dataset Gaps
Already Covered, But Needs Better Discoverability
| Existing file | Add aliases or cross-links for |
|---|---|
| BEV Encoding Architectures | BEVDet, BEVDepth, BEVStereo, BEVPoolv2, BEVDet4D, SOLOFusion, DuoSpaceNet, DenseBEV, SEPatch3D. |
| Camera-Only Degraded Perception | Depth Pro, Video Depth Anything, FoundationStereo, OVM3D-Det, camera occupancy fallback, MultiCorrupt/MSC-Bench degradation tests. |
| Open-Vocabulary and Zero-Shot Detection | Split open-vocabulary known-class detection from open-world unknown-object discovery; add OW-OVD, SAM 3, OP3Det, 3D-AVS, Clipomaly, S2M, T-Rex2. |
| Vision Foundation Models | Florence-2, CAT-Seg, ProxyCLIP, AutoSeg/AVS, SAM 3, data-engine use cases, and OOD segmentation. |
| LiDAR Foundation Models | 3D-AVS, OpenAD, OP3Det, open-world objectness, and auto-vocabulary point-cloud segmentation. |
| LiDAR Semantic Segmentation | LiDAR-MOS, 4DMOS, MotionSeg3D, MambaMOS, SegNet4D, Mask4D, ALPINE, HeLiMOS, moving/static labels, scene-flow signals, and 4D panoptic outputs. |
| OpenPCDet and CenterPoint | VoxelNeXt, DSVT, fully sparse detectors, RadarPillars, and camera-LiDAR/radar fusion interfaces. |
| Streaming Temporal Perception | StreamMOS, 4DSegStreamer, MotionSeg3D, MambaMOS, Neural Scene Flow Priors, Cam4DOcc, UnO, DFIT-OccWorld, Drive-OccWorld, LASP, Transtreaming, acquisition-vs-inference latency. |
| Multi-Object Tracking | S2-Track, GRAE-3DMOT, CoopTrack, uncertainty/existence output contracts, occlusion age, source modality, track history quality. |
| Infrastructure Cooperative Perception | RCooper, HoloVIC, CoInfra, V2X-ReaLO, V2XScenes, UrbanIng-V2X, V2X-Radar, online realism and latency traces. |
| Collaborative Fleet Perception | CoSDH, mmCooper, CoST, CoopDETR, CoopTrack, CoHFF, V2Xverse, privacy/adaptation for unknown collaborators. |
| Uncertainty Quantification | Conformal boxes, conformal abstention, GaussianLSS, HyperDUM, segmentation/occupancy/freespace uncertainty, modality health. |
| Model Compression and Edge Deployment | RT-BEV, Jigsaw, P99/WCET gates, multi-model contention, compression-vs-robustness regression checks. |
| Night Operations and Thermal Fusion | NSAVP, V2X-Radar, thermal-LiDAR dense perception, event cameras, thermal calibration drift, hot-engine clutter, rain-on-lens. |
| Production Perception Systems | Production validation matrix: missing modality, stale modality, miscalibration, sensor occlusion, darkness, fog/rain, degraded compute. |
| Data Flywheel and Data Engines | AIDE, SAM 3, Florence-2, OOD triggers, cooperative metadata, FOD-specific QA gates. |
Guardrail Process
When adding or revising perception research:
- Check whether the method is already a dedicated file, an existing section, or a backlog item in this audit.
- If it is P0, create one atomic file under 30-autonomy-stack/perception/methods/ before expanding lower-priority coverage.
- If it is P1, either create one atomic method file or add aliases and cross-links from the closest existing method page.
- If it is P2/watchlist, add aliases only after primary sources, code, or dataset availability are confirmed.
- Keep family synthesis in top-level perception docs and detailed method evidence in methods/.
- After every perception expansion, update this audit, Research Index, README, and any related safety/data-engine docs (a small consistency-check sketch follows below).
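The final step above is mechanical enough to script. The sketch below flags backlog rows whose suggested file already exists as a promoted page; the audit filename, the methods/ location, and the row format are assumptions about this repository's layout rather than confirmed paths.

```python
import re
from pathlib import Path

# Assumed repository layout; adjust both paths to the real ones.
METHODS_DIR = Path("30-autonomy-stack/perception/methods")
AUDIT_FILE = Path("perception-coverage-audit.md")

def backlog_rows_already_promoted():
    """Flag backlog rows whose suggested file already exists under
    methods/, i.e. rows that should be marked as promoted in this audit."""
    existing = {p.name for p in METHODS_DIR.glob("*.md")}
    promoted = []
    for line in AUDIT_FILE.read_text(encoding="utf-8").splitlines():
        # Match table rows whose first cell is a suggested .md file.
        match = re.match(r"\|?\s*([\w\-/]+\.md)\s*\|", line)
        if match and Path(match.group(1)).name in existing:
            promoted.append(match.group(1))
    return promoted

if __name__ == "__main__":
    for suggested in backlog_rows_already_promoted():
        print(f"promoted but still listed as backlog: {suggested}")
```

Running a check like this after each promotion wave keeps the P0/P1 tables from drifting away from the method library.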