Neural Online Mapping: State of the Art (2024-2026)
Beyond MapTR/MapTRv2 -- The Next Generation of Online HD Map Construction
Table of Contents
- Field Overview and Evolution
- Temporal Fusion Methods
- Geometry and Representation Innovations
- Neural Map Priors and SD Map Integration
- Topological and Scene Graph Approaches
- Map-Free End-to-End Methods
- Foundation Model Approaches
- Mamba/SSM Approaches
- Benchmarks and Metrics
- Comprehensive Performance Comparison
- Airside Applicability Analysis
- Research Roadmap
1. Field Overview and Evolution
1.1 The Paradigm After MapTR
MapTR (ICLR 2023) and MapTRv2 (IJCV 2024) established the dominant paradigm: DETR-style transformer decoders with permutation-equivalent point queries for single-stage, parallel vectorized map construction. Every major method since builds on or responds to this foundation.
The field has since branched into several distinct research directions:
MapTR/v2 (2023-2024): Foundation paradigm
|
+---> Temporal fusion: StreamMapNet, MapTracker, MemFusionMap, MambaMap
|
+---> Better geometry: GeMap, HIMap, MGMap, BeMapNet, PivotNet
|
+---> Map priors: NeuralMapPrior, PriorDrive, P-MapNet, MapEX
|
+---> Topology reasoning: TopoNet, TopoMLP, LaneSegNet, T2SG, TopoLogic, TLSD
|
+---> Training/scaling: MapNeXt, MapExpert, SQD-MapNet, ADMap
|
+---> State-space models: MambaMap
|
+---> Map-free E2E: VAD/VADv2, UniAD, Tesla FSD v12+, Wayve
|
+---> Future prediction: PredMapNet, AMap

1.2 Key Shifts Since MapTR
- Temporal modeling is now mandatory. Single-frame methods cannot handle occlusion or maintain consistency. Every top-performing method in 2024-2025 uses multi-frame temporal fusion.
- Consistency metrics emerged. MapTracker (ECCV 2024) exposed that high mAP does not imply temporal consistency, proposing C-mAP (Consistency-aware mAP) as a new evaluation dimension.
- Map priors complement perception. Pure sensor-only approaches struggle at road far-ends and under heavy occlusion. SD map priors (from OpenStreetMap or equivalent) provide cheap but effective global context.
- Topology reasoning is a distinct capability. Detecting map elements is insufficient; understanding how lanes connect is critical for planning. OpenLane-V2 benchmark formalized this.
- mAP has reached diminishing returns on nuScenes. MapNeXt-Huge achieved 78.5 mAP; further gains require new representations, longer temporal context, or prior integration.
2. Temporal Fusion Methods
2.1 StreamMapNet
| Paper | StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction |
| Venue | WACV 2024 |
| Authors | Yuan et al. |
| Code | https://github.com/yuantianyuan01/StreamMapNet |
Architecture:
StreamMapNet introduces streaming temporal fusion into MapTR-style online mapping. The core innovation is Multi-Point Attention -- rather than standard deformable attention at a single reference point per query, it distributes attention across multiple reference points along predicted map elements, capturing long-range dependencies critical for elongated elements (lane dividers, road boundaries).
Multi-view images (t, t-1, t-2, ...)
|
v
Image Backbone (ResNet-50)
|
v
BEV Encoder (GKT/BEVFormer)
|
+---> Temporal BEV Fusion: warp + concatenate historical BEV features
|
v
DETR-like Decoder with Multi-Point Attention
|
+---> Query Propagation: high-confidence queries retained across frames
|
v
Vectorized map elements (polylines with class labels)

Temporal fusion strategies:
- BEV feature fusion: Historical BEV features are ego-motion warped and concatenated/fused with current features
- Query propagation: Element queries with confidence above threshold are propagated to the next frame, maintaining identity
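The query-propagation half of this scheme can be sketched as follows. `propagate_queries`, the zero-initialized fresh queries, and the 0.5 threshold are illustrative assumptions, not StreamMapNet's exact implementation:

```python
import numpy as np

def propagate_queries(queries, scores, num_queries, conf_thresh=0.5):
    """Keep high-confidence element queries for the next frame and top up
    with freshly initialized queries (zero-init and 0.5 threshold are
    illustrative choices, not StreamMapNet's exact ones)."""
    keep = scores >= conf_thresh
    kept = queries[keep]                       # propagated, identity preserved
    fresh = np.zeros((num_queries - len(kept), queries.shape[1]),
                     dtype=queries.dtype)      # remaining slots re-detect
    return np.concatenate([kept, fresh], axis=0), keep

queries = np.random.randn(6, 8).astype(np.float32)
scores = np.array([0.9, 0.2, 0.7, 0.1, 0.6, 0.3])
next_queries, keep = propagate_queries(queries, scores, num_queries=6)
```

The propagated queries carry element identity across frames; the fresh queries keep the total query budget constant so the decoder can still discover newly visible elements.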
Performance (nuScenes):
| Setting | Result | Notes |
|---|---|---|
| 60x30m range | +13.0 mAP over prior SOTA | Significant gain |
| 100x50m range | +10.2 mAP over prior SOTA | Extended range |
| Inference speed | 14.2 FPS | Real-time capable |
Benchmark contribution: StreamMapNet exposed significant geographic bias in existing nuScenes and Argoverse 2 evaluation splits (train and test sets share overlapping geographic regions), and proposed new geographically non-overlapping splits. On these harder splits, all methods show substantially lower performance, revealing that prior results were inflated by geographic memorization.
Airside relevance: HIGH. Temporal fusion is essential for airside operations where aircraft and GSE frequently occlude taxiway markings. The 100x50m perception range is particularly valuable for airport aprons where stand layouts extend over large areas.
2.2 MapTracker
| Paper | MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping |
| Venue | ECCV 2024 (Oral) |
| Authors | Chen, Wu, Tan, Ma, Furukawa (Simon Fraser University / Wayve) |
| Code | https://github.com/woodfrog/maptracker |
Core insight: Online mapping should be formulated as a tracking problem, not just a detection problem. Map elements persist across frames and should be explicitly tracked with identity preservation.
Architecture:
MapTracker maintains two memory buffers, one for each type of latent representation:
Sensor stream (multi-view cameras)
|
v
Image Backbone (ResNet-50, ImageNet pretrained)
|
v
BEV Feature Extraction
|
+---> Raster (BEV) Memory Buffer
| - 50x100x256 dimensional latent image in BEV space
| - Accumulates spatial context across frames
| - Indexed by world coordinates
|
+---> Vector Memory Buffer
| - 512-dim latent vectors per tracked road element
| - Maintains element identities across frames
| - Types: lane dividers, road boundaries, pedestrian crossings
|
v
Strided Memory Fusion
|
+---> Select 4 memory entries at ~1m, 5m, 10m, 15m distance strides
| (distance from current vehicle position, not temporal distance)
|
v
VEC Module (Vector Element Construction)
|
+---> PropMLP: transforms previous-frame vectors using rotation/translation
| quaternions encoded as positional embeddings
|
+---> Queries = propagated latents (tracked) + new candidate latents (fresh)
|
v
Transformer Decoder
|
v
Vectorized map elements with tracking IDs

Strided Memory Fusion:
Rather than using all historical frames (computationally expensive and redundant for nearby frames), MapTracker selects exactly 4 memory entries positioned at approximately 1m, 5m, 10m, and 15m from the current vehicle location. This provides multi-scale temporal context: recent high-detail memory plus older coarse context, while keeping computation bounded.
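The selection step can be sketched as a nearest-to-stride lookup over the memory buffer. Choosing the entry closest to each target distance is an assumed selection rule for illustration; MapTracker's exact criterion may differ:

```python
import numpy as np

def select_strided_memory(entry_positions, current_pos,
                          strides=(1.0, 5.0, 10.0, 15.0)):
    """Pick one buffered memory entry per target distance stride, measured
    from the current vehicle position (nearest-entry rule is an assumption)."""
    dists = np.linalg.norm(entry_positions - current_pos, axis=1)
    return [int(np.argmin(np.abs(dists - s))) for s in strides]

# Buffer entries recorded every metre along a straight trajectory:
positions = np.array([[float(i), 0.0] for i in range(21)])
chosen = select_strided_memory(positions, np.zeros(2))
```

Because the strides are spatial rather than temporal, a stationary vehicle keeps reusing the same entries instead of flooding the memory with near-duplicates.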
Consistency-aware Evaluation (C-mAP):
MapTracker's major contribution beyond architecture is a new evaluation methodology:
- Standard mAP evaluates each frame independently -- a method can produce completely different map elements per frame and still score well
- C-mAP penalizes temporal inconsistency by checking whether matched elements in consecutive frames maintain ancestral correspondence
- The authors also re-processed nuScenes and Argoverse 2 ground truth to ensure temporal alignment of annotations
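The consistency idea behind C-mAP can be illustrated with a toy track audit. `consistent_matches` and the per-frame match dictionaries are a hypothetical simplification of the paper's ancestral-correspondence test, not the full metric:

```python
def consistent_matches(frame_matches):
    """Audit prediction tracks against GT instances across frames: a track
    is temporally consistent only if it matches the same GT instance in
    every frame where it appears (toy stand-in for C-mAP's check)."""
    history, broken = {}, set()
    for matches in frame_matches:          # one {pred_track: gt_id} per frame
        for pred, gt in matches.items():
            if history.get(pred, gt) != gt:
                broken.add(pred)           # track switched GT instance
            history[pred] = gt
    consistent = set(history) - broken
    return consistent, broken

frames = [{1: "div_a", 2: "bnd_b"},        # frame t
          {1: "div_a", 2: "bnd_c"}]        # frame t+1: track 2 switched GT
ok, bad = consistent_matches(frames)
```

Under plain mAP both frames could score perfectly; the audit exposes track 2's identity switch, which is exactly what C-mAP penalizes.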
Performance (consistent ground truth splits):
| Dataset | Metric | MapTracker | StreamMapNet | Improvement |
|---|---|---|---|---|
| nuScenes | mAP | 76.1 | 70.4 | +5.7 |
| nuScenes | C-mAP | 69.1 | 56.4 | +12.7 |
| Argoverse 2 | mAP | 76.9 | 70.3 | +6.6 |
| Argoverse 2 | C-mAP | 68.3 | 57.5 | +10.8 |
The massive gap between mAP and C-mAP improvements (+12.7 vs +5.7 on nuScenes) demonstrates that MapTracker's tracking formulation specifically improves temporal consistency, not just per-frame accuracy.
Airside relevance: VERY HIGH. Consistent map reconstruction is critical for airside operations:
- Aircraft slowly entering/leaving stands create prolonged occlusions; tracking-based mapping maintains knowledge of markings behind the aircraft
- Baggage tractors traversing long apron routes need globally consistent maps, not flickering per-frame detections
- The strided memory at distance intervals maps well to the structured repetitive layouts of airport stands
2.3 MemFusionMap
| Paper | MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction |
| Venue | WACV 2025 |
| Authors | Song et al. |
| Code | https://github.com/Song-Jingyu/MemFusionMap |
Key innovation: A simpler, more practical alternative to MapTracker's tracking-based approach. Uses only 4 past frames with a lightweight working memory fusion module, avoiding the need for ground-truth tracking supervision or complex post-processing.
Architecture components:
Working Memory Fusion Module: Maintains a fixed-lag buffer of 4 recent BEV features. Historical features are ego-motion warped to current frame and fused via dilated convolutions (2x2 dilation) to capture elongated map elements like lane markings.
Temporal Overlap Heatmap: A novel single-channel BEV map tracking how many times each grid cell has appeared in the field-of-view across frames. Recursively propagated using warping, incremented by 1 for newly visible regions. This implicitly encodes vehicle trajectory patterns and speed, helping distinguish high-confidence (frequently observed) from uncertain (newly seen) regions.
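The recursive warp-and-increment update can be sketched on a grid. The integer shift used here as `shift2d` is a crude stand-in for the paper's ego-motion warping, kept simple for illustration:

```python
import numpy as np

def shift2d(grid, dy, dx):
    """Integer grid shift with zero fill -- a crude stand-in for ego-motion
    warping of the BEV heatmap."""
    out = np.zeros_like(grid)
    h, w = grid.shape
    out[max(dy, 0):min(h + dy, h), max(dx, 0):min(w + dx, w)] = \
        grid[max(-dy, 0):min(h - dy, h), max(-dx, 0):min(w - dx, w)]
    return out

def update_overlap_heatmap(prev_heat, dy, dx, fov_mask):
    """Warp the previous heatmap by the ego motion, then add 1 wherever the
    current field of view covers a cell."""
    return shift2d(prev_heat, dy, dx) + fov_mask

prev = np.ones((4, 4))                        # every cell observed once so far
heat = update_overlap_heatmap(prev, 1, 0, np.ones((4, 4)))
```

Cells that scroll in from outside the previous map start at a count of 1 (newly seen), while cells observed in both frames accumulate to 2, giving the confidence signal the paper describes.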
Performance (StreamMapNet geographic splits):
| Dataset | Range | MemFusionMap | StreamMapNet | MapTracker |
|---|---|---|---|---|
| nuScenes | 60x30m | 38.0 mAP | 34.1 | 40.3* |
| nuScenes | 100x50m | 27.6 mAP | 22.2 | -- |
| Argoverse 2 | 60x30m | 60.6 mAP | 58.3 | -- |
| Argoverse 2 | 100x50m | 54.3 mAP | 51.5 | -- |
*MapTracker requires additional GT tracking supervision and 72 training epochs vs 24 for MemFusionMap
Technical specifications:
- Backbone: ResNet-50
- BEV encoder: BEVFormer
- Feature dimension: 256 channels
- Memory capacity: 4 frames
- Inference: ~15 FPS on NVIDIA A6000
- Training: 8x V100 GPUs, 24 epochs
Airside relevance: HIGH. The practical advantages -- no tracking supervision needed, short training schedule, simple architecture -- make this suitable for airside deployment where training data is scarce. The 100x50m range is particularly relevant for wide apron areas. The temporal overlap heatmap naturally handles the case where markings are briefly occluded by passing vehicles.
2.4 SQD-MapNet (Stream Query Denoising)
| Paper | Stream Query Denoising for Vectorized HD-Map Construction |
| Venue | ECCV 2024 |
| Authors | Wang et al. |
| Code | https://github.com/shuowang666/SQD-MapNet |
Core idea: Applies the denoising training paradigm (from DN-DETR/DINO) to temporal map construction. During training, noised ground truth from the previous frame is used to generate denoising queries, and the network learns to reconstruct the current frame's ground truth from these noisy temporal inputs. This explicitly teaches temporal consistency without requiring tracking labels.
Key advantage: Can be applied as a plug-in to both static (single-frame) and temporal methods, improving any base architecture. Extensive experiments on nuScenes and Argoverse 2 show consistent improvements.
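A minimal sketch of the query-noising step, assuming a plain Gaussian jitter and a 2x3 affine ego-motion transform (the paper's actual noise schedule and query format are not reproduced here):

```python
import numpy as np

def make_denoising_queries(prev_gt_pts, ego_T, noise_scale=0.5, seed=0):
    """Warp previous-frame GT polyline points into the current frame, then
    jitter them to form denoising queries the decoder must clean up.
    prev_gt_pts: (N elements, P points, 2); ego_T: 2x3 affine prev->current."""
    rng = np.random.default_rng(seed)
    ones = np.ones(prev_gt_pts.shape[:-1] + (1,))
    warped = np.concatenate([prev_gt_pts, ones], axis=-1) @ ego_T.T
    return warped + rng.normal(0.0, noise_scale, warped.shape)

prev_gt = np.zeros((2, 5, 2))                          # two elements, 5 pts each
ego_T = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]])   # 0.5 m forward motion
dn_queries = make_denoising_queries(prev_gt, ego_T)
```

During training these noised queries are supervised against the current frame's ground truth, so the decoder learns to correct temporally propagated estimates rather than detect from scratch.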
Airside relevance: MEDIUM. Provides a training technique rather than a deployment architecture, but could improve any online mapping model's temporal consistency.
2.5 IC-Mapper
| Paper | IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction |
| Venue | ACM Multimedia 2024 |
| Code | Not yet released |
Innovation: Combines instance-centric temporal association (matching detection queries across frames in both feature and geometric space) with spatial fusion for end-to-end detection, tracking, and fusion of map instances. This enables online reconstruction of a globally consistent vectorized map.
Performance: +1.1 mAP over StreamMapNet at 60x30m range, +1.6 mAP at 100x50m range.
Airside relevance: MEDIUM. The instance-centric tracking and global map reconstruction capability is relevant for building coherent airside maps, but limited performance gains over simpler methods reduce urgency.
2.6 PredMapNet
| Paper | PredMapNet: Future and Historical Reasoning for Consistent Online HD Vectorized Map Construction |
| Venue | arXiv 2025 |
| Code | Not yet released |
Innovation: First to introduce short-term future prediction into online HD map construction. Three key components:
- Semantic-Aware Query Generator: Initializes queries with spatially aligned semantic masks for scene-level context
- History-Map Guidance Module (HMG): Incorporates historical rasterized map information for smoother, more continuous detection
- Short-Term Future Guidance Module (STFG): Predicts near-future positions of map instances, providing anticipatory context
Achieves new SOTA on two representative benchmarks.
Airside relevance: MEDIUM-HIGH. Future prediction could anticipate taxiway markings around corners or behind slow-moving aircraft before they are directly visible. The historical map guidance concept aligns well with AIXM-based coarse priors.
3. Geometry and Representation Innovations
3.1 GeMap (Geometry-Aware Mapping)
| Paper | Online Vectorized HD Map Construction using Geometry |
| Venue | ECCV 2024 |
| Authors | Zhang et al. |
| Code | https://github.com/cnzzx/GeMap |
Core innovation: Explicitly learns Euclidean geometric properties of map elements -- angles between segments, distances between parallel lines, perpendicularity of intersecting elements -- going beyond pure coordinate regression.
Key technical contributions:
Geometric Loss: Based on angle and magnitude decomposition of predicted vectors, robust to rigid transformations (rotation/translation) of driving scenarios. Forces the network to learn structural relationships rather than memorizing absolute coordinates.
Decoupled Self-Attention: Separates self-attention into two mechanisms:
- Shape attention: Learns intra-element geometry (how points within a single polyline relate)
- Relation attention: Learns inter-element geometry (how different map elements relate to each other -- e.g., parallel lane boundaries)
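The rigid-transform invariance of the geometric loss follows from comparing lengths and angles rather than raw coordinates. A simplified reading of the angle/magnitude decomposition, with an invariance check:

```python
import numpy as np

def geometric_terms(polyline):
    """Decompose a polyline into rigid-transform-invariant terms: segment
    lengths (magnitude) and cosines of turning angles (angle). A simplified
    reading of GeMap's decomposition, not the paper's exact loss."""
    seg = np.diff(polyline, axis=0)
    lengths = np.linalg.norm(seg, axis=1)
    unit = seg / np.clip(lengths[:, None], 1e-9, None)
    cos_turn = (unit[:-1] * unit[1:]).sum(axis=1)
    return lengths, cos_turn

pl = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
l1, c1 = geometric_terms(pl)
l2, c2 = geometric_terms(pl @ R.T + np.array([3.0, -2.0]))  # rotate + translate
```

Both calls return identical terms, so a loss built on them supervises shape rather than absolute pose, exactly the property the paper exploits.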
Performance:
| Dataset | Backbone | mAP | vs MapTRv2 |
|---|---|---|---|
| nuScenes | ResNet-50 (camera) | 69.4 | +0.7 |
| Argoverse 2 | ResNet-50 (camera) | 71.8 | +4.4 |
| nuScenes | Updated model | 76.0 | +7.3 |
Airside relevance: HIGH. Airport surface markings follow strict geometric rules (ICAO Annex 14): taxiway centerlines are parallel to edges, stand markings follow specific angular patterns, holding position markings are perpendicular to centerlines. A geometry-aware model can exploit these structural constraints even when individual markings are partially occluded or faded.
3.2 HIMap (Hybrid Representation)
| Paper | HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction |
| Venue | CVPR 2024 |
| Authors | Zhou et al. (Samsung Research) |
| Code | https://github.com/BritaryZhou/HIMap |
Core insight: MapTR and its descendants use point-level representation -- each query represents individual polyline points. This misses element-level information (overall shape, topology). HIMap learns both simultaneously.
Architecture:
- HIQuery: Hybrid queries encoding both point-level positions and element-level shapes
- Point-Element Interactor: Cross-attention mechanism that bidirectionally exchanges information between point-level and element-level representations within each decoder layer
- Multiple decoder layers, each with point-element interactor + self-attention + FFN + prediction heads
Performance (nuScenes):
- HIMap: 68.5+ mAP (camera-only, ResNet-50)
- Consistently exceeds prior SOTA under both easy and hard settings
Airside relevance: MEDIUM. The hybrid point/element representation could better capture airport-specific elements like curved stand lead-in lines (which have both local point detail and global curvature properties). However, the gains over pure point-level methods are modest.
3.3 MGMap (Mask-Guided)
| Paper | MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction |
| Venue | CVPR 2024 |
| Authors | Liu et al. |
| Code | https://github.com/xiaolul2/MGMap |
Core innovation: Uses predicted instance masks to guide map element query initialization and point refinement, providing global structural context that pure query-based methods lack.
Architecture (three components):
- Enhanced Multi-Level (EML) Neck: Hybrid channel and spatial attention for multi-scale BEV features
- Mask-Activated Instance (MAI) Decoder: Dynamically initializes element queries using learned instance masks, incorporating global instance and structural information
- Position-Guided Mask Patch Refinement (PG-MPR): Uses binary masks to extract patch-based features around predicted points for fine-grained localization
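The patch-extraction idea behind PG-MPR can be sketched as cropping a local feature window around each predicted point. The function name, zero padding, and window size here are assumptions for illustration:

```python
import numpy as np

def extract_point_patches(feat, points, half=2):
    """Crop a (2*half+1)^2 feature window around each predicted point -- a
    simplified stand-in for PG-MPR's mask-guided patch refinement.
    feat: (C, H, W); points: integer (x, y) grid coordinates."""
    padded = np.pad(feat, ((0, 0), (half, half), (half, half)))
    return np.stack([padded[:, y:y + 2 * half + 1, x:x + 2 * half + 1]
                     for x, y in points])

feat = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
patches = extract_point_patches(feat, [(0, 0), (7, 7)])
```

Each patch gives the refinement head dense local context around a coarse point estimate, which is where the fine-grained localization gain comes from.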
Performance (nuScenes, camera-only, ResNet-50):
| Setting | MGMap | MapTR | BeMapNet | Gain over MapTR |
|---|---|---|---|---|
| 30 epochs | 61.4 mAP | 51.1 | 59.8 | +10.3 |
| LiDAR | 67.9 mAP | 55.6 | -- | +12.3 |
| Camera+LiDAR | 71.7 mAP | 62.5 | -- | +9.2 |
Per-category (camera, 30 epochs):
| Category | MGMap | MapTR | Gain |
|---|---|---|---|
| Pedestrian Crossing | 57.4 | 45.2 | +12.2 |
| Lane Divider | 63.5 | 53.8 | +9.7 |
| Road Boundary | 63.3 | 54.3 | +9.0 |
Argoverse 2: 62.8 mAP (+5.4 over MapTR)
Airside relevance: MEDIUM-HIGH. The mask-guided approach could be particularly effective for airport markings that have distinctive shapes (T-bar stand markings, ladder-style ILS markings) where instance masks provide strong structural priors.
3.4 BeMapNet (Bezier Curves)
| Paper | End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve |
| Venue | CVPR 2023 |
| Authors | Qiao et al. |
| Code | https://github.com/er-muyue/BeMapNet |
Innovation: Represents map elements as piecewise Bezier curves rather than fixed-point polylines. Key components:
- IPM-PE Align module: Injects inverse perspective mapping geometric priors into BEV features
- Piecewise Bezier Head: Classification branch determines curve segment count, regression branch predicts control points
- Point-Curve-Region Loss: Progressive supervision from point-level to curve-level to region-level
Advantage: Bezier curves naturally model smooth, continuous map elements with fewer control points (64% reduction). This produces cleaner outputs without post-processing.
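Decoding control points back into a dense polyline is a standard Bezier evaluation. The list-of-4-control-point convention below is illustrative, not BeMapNet's exact head output format:

```python
import numpy as np

def bezier_point(ctrl, t):
    """Evaluate one Bezier segment at parameter t via de Casteljau's algorithm."""
    pts = np.asarray(ctrl, dtype=float)
    while len(pts) > 1:
        pts = (1.0 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def sample_piecewise(segments, n=20):
    """Densify a piecewise Bezier element into a polyline; `segments` is a
    list of 4-control-point arrays sharing endpoints (assumed convention)."""
    ts = np.linspace(0.0, 1.0, n)
    return np.concatenate([[bezier_point(seg, t) for t in ts]
                           for seg in segments])

straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
poly = sample_piecewise([straight], n=5)
```

Four control points per cubic segment can stand in for dozens of polyline points on a smooth curve, which is where the quoted 64% reduction comes from.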
Performance (nuScenes): 57.7-62.6 mAP depending on training epochs, surpassing prior SOTA by 18.0+ mAP at time of publication.
Airside relevance: MEDIUM. Taxiway centerlines and curved taxi routes are naturally modeled by Bezier curves. Stand lead-in lines with their characteristic curves would benefit from smooth curve representation over piecewise linear approximations.
3.5 PivotNet (Dynamic Pivot Points)
| Paper | PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction |
| Venue | ICCV 2023 |
| Authors | Ding et al. |
| Code | https://github.com/wenjie710/PivotNet |
Innovation: Uses dynamic pivot points instead of fixed-count point sets. Key components:
- Point-to-Line Mask Module: Encodes subordinate and geometrical point-line priors
- Pivot Dynamic Matching Module: Models topology in dynamic point sequences via sequence matching
- Dynamic Vectorized Sequence Loss: Supervises both position and topology of vectorized predictions
Performance (nuScenes): 53.8 mAP with ResNet-50, surpassing prior SOTA by 5.9+ mAP at time of publication.
Airside relevance: LOW-MEDIUM. The dynamic point count is useful for elements of variable complexity, but superseded by later methods.
3.6 MapNeXt (Training and Scaling)
| Paper | MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction |
| Venue | arXiv 2024 |
| Authors | Li et al. |
Key contribution: Not a new architecture but a systematic study of training recipes and scaling laws for MapTR-family models.
Onboard model improvements (no architecture changes):
- Augmented map element decoder queries with richer initialization
- Dedicated image encoder pre-training
- MapNeXt-Tiny: 54.8 mAP (up from MapTR-Tiny's 49.0), zero inference cost increase
Offboard model scaling rules:
- Increased query count requires proportionally larger decoder networks
- Larger backbones steadily improve accuracy without diminishing returns (up to tested scale)
MapNeXt-Huge: 78.5 mAP on nuScenes -- first to exceed 78% for vision-only, single-model online map construction. This represents +16% over previous best.
Airside relevance: HIGH (practically). These training and scaling insights directly apply to training any MapTR-family model on airside data. The finding that proper training alone yields +5.8 mAP (MapTR-Tiny to MapNeXt-Tiny) without architectural changes is immediately actionable.
3.7 MapExpert (Sparse Element Experts)
| Paper | MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert |
| Venue | AAAI 2025 |
Innovation: Different map element types (lane boundaries, pedestrian crossings, road edges) have fundamentally different geometric characteristics. MapExpert assigns distinct expert transformer layers to different element types, rather than treating all elements uniformly.
- Learnable Weighted Moving Descent (LWMD): Strengthens BEV features while filtering noise for the decoder
- Sparse Expert Decoder: Distributes element types to specialized experts
- Auxiliary Balance Loss: Ensures even load distribution across experts
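Expert routing with a load-balance penalty can be sketched in the standard mixture-of-experts style. The squared-load form below is a common MoE auxiliary term used here as an assumption; MapExpert's exact Auxiliary Balance Loss may differ:

```python
import numpy as np

def route_and_balance(routing_logits):
    """Hard-route each element query to its top-scoring expert and compute a
    squared-load balance penalty (generic MoE-style term, assumed form).
    routing_logits: (num_queries, num_experts)."""
    assign = routing_logits.argmax(axis=1)
    num_experts = routing_logits.shape[1]
    load = np.bincount(assign, minlength=num_experts) / len(assign)
    balance_loss = num_experts * float((load ** 2).sum())  # 1.0 when uniform
    return assign, balance_loss

assign, loss = route_and_balance(np.eye(4))   # one query per expert: balanced
```

The penalty is minimized at uniform expert load, discouraging the degenerate case where one expert absorbs all element types.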
SOTA performance on both nuScenes and Argoverse 2.
Airside relevance: HIGH. Airport surfaces have highly distinct element types -- taxiway centerlines, stand markings, safety zone boundaries, holding positions -- each with very different geometric properties. A mixture-of-experts approach that learns type-specific decoders could significantly improve accuracy for airport-specific elements.
3.8 ADMap (Anti-Disturbance)
| Paper | ADMap: Anti-disturbance Framework for Vectorized HD Map Construction |
| Venue | ECCV 2024 |
| Code | https://github.com/hht1996ok/ADMap |
Innovation: Addresses robustness to input disturbances through:
- Multi-scale Perception Neck (MPN): Multi-scale BEV feature processing
- Instance Interactive Attention (IIA): Interaction between map element instances for mutual refinement
- Vector Direction Difference Loss (VDDL): Monitors point sequence prediction quality
SOTA performance on both nuScenes and Argoverse 2.
Airside relevance: MEDIUM-HIGH. Robustness to disturbances (de-icing spray, jet exhaust heat shimmer, night operations, rain on cameras) is directly relevant for airside operations where environmental conditions can be harsh.
3.9 MapUnveiler (Occlusion Handling)
| Paper | Unveiling the Hidden: Online Vectorized HD Map Construction with Clip-Level Token Interaction and Propagation |
| Venue | NeurIPS 2024 |
| Project | https://mapunveiler.github.io/ |
Core innovation: Explicitly handles occluded map elements -- the critical failure mode where vehicles, pedestrians, or obstacles block the view of road markings. Processes video clips (not individual frames) and generates compact clip tokens that capture temporal map information.
Key mechanism:
- Clip-level pipeline: Processes multiple frames as a clip, avoiding redundant per-frame computation via temporal stride
- Clip token interaction: Dense BEV features interact with compact clip tokens containing temporal map information
- Inter-clip token propagation: Associates information between clips for long-term temporal context
- Global map relationship: Builds coherent map understanding across the full driving sequence
Performance:
- SOTA on both nuScenes and Argoverse 2
- +10.7% mAP improvement in heavily occluded driving scenes (the key metric)
Airside relevance: VERY HIGH. Occlusion is the dominant challenge for airside online mapping -- aircraft fuselages, GSE equipment, and baggage carts frequently block taxiway markings. A method specifically designed to unveil occluded map elements would be directly valuable. The +10.7% improvement in occluded scenes is the most relevant number in this entire survey for airside applications.
3.10 MapQR (Enhanced Queries)
| Paper | Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction |
| Venue | ECCV 2024 |
Innovation: Redesigns the query representation for vectorized map construction, achieving best mAP and good efficiency on both nuScenes and Argoverse 2. The enhanced query design can be integrated as a plug-in to boost other models.
Airside relevance: LOW-MEDIUM. Primarily an engineering improvement rather than a conceptual advance relevant to airside challenges.
4. Neural Map Priors and SD Map Integration
4.1 Neural Map Prior (NMP)
| Paper | Neural Map Prior for Autonomous Driving |
| Venue | CVPR 2023 |
| Authors | Xiong et al. (Tsinghua MARS Lab) |
| Code | https://github.com/Tsinghua-MARS-Lab/neural_map_prior |
Architecture:
NMP maintains a global neural map stored as sparse map tiles, each corresponding to a real-world location. When the vehicle revisits an area, the stored neural features are retrieved and fused with current observations:
Global Neural Map (stored in memory / disk)
|
+---> Map tiles indexed by geographic coordinates
| Each tile: learned feature tensor (not raw sensor data)
|
v
Current BEV Features (from live perception)
|
v
C2P (Current-to-Prior) Cross-Attention
| - Dynamically weights current vs. prior features
| - High-quality current frame -> more weight on current
| - Poor conditions (rain, night) -> more weight on prior
|
v
Enhanced BEV Features
|
v
GRU (Gated Recurrent Unit) Update
| - Updates global neural map with new observations
| - Incrementally improves the stored prior
|
v
Updated Neural Map Tile (stored back to global map)

Key properties:
- Architecture-agnostic: works with HDMapNet, VectorMapNet, and others
- Particularly effective in adverse conditions (rain, night)
- Improves with extended perception range (where current frame has less information)
- The prior gets better over time as the vehicle repeatedly traverses the same area
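The incremental GRU update can be sketched at the level of a single tile feature. The dense per-channel weights are a toy assumption; the paper operates with a conv-GRU over spatial tiles:

```python
import numpy as np

def gru_tile_update(prior, obs, Wz, Wr, Wh):
    """One GRU step fusing a stored tile feature (hidden state) with the
    current BEV observation (input) -- the incremental-update idea behind
    NMP, with toy dense weights instead of the paper's conv-GRU."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    cat = np.concatenate([prior, obs])
    z = sig(Wz @ cat)                          # update gate
    r = sig(Wr @ cat)                          # reset gate
    h_new = np.tanh(Wh @ np.concatenate([r * prior, obs]))
    return (1.0 - z) * prior + z * h_new

d = 4
W = np.full((d, 2 * d), 0.1)
tile = gru_tile_update(np.zeros(d), np.full(d, 0.5), W, W, W)
```

The update gate z plays the same role as the C2P attention weighting above: a confident observation pushes the tile toward the new evidence, a weak one leaves the stored prior largely intact.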
Performance gains:
- HDMapNet: +5.4 mAP (segmentation), +5.6 mAP (detection)
- Largest gains at night and in rain
Airside relevance: VERY HIGH. Airport vehicles traverse the same routes repeatedly (terminal-to-stand, stand-to-cargo). A neural map prior would rapidly build up rich representations of each route, compensating for temporary occlusions and adverse conditions. The GRU-based incremental update mechanism naturally handles the evolving nature of airport environments (different GSE parked at different times).
Airside deployment strategy:
- Build NMP during initial survey passes (vehicle drives all routes)
- Store NMP on airport edge server (~10MB per 100m tile, full airport ~1-5GB)
- Load relevant tiles at runtime (vehicle downloads tiles for current route segment)
- Continuous improvement: each operational pass updates tiles
- Night/weather resilience: NMP provides strong prior when cameras are degraded
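Tile lookup in such a deployment reduces to indexing by quantized world coordinates. A minimal sketch, assuming an airport-local metric frame and the ~100 m tile size from the storage estimate above:

```python
def tile_key(x_m, y_m, tile_size_m=100.0):
    """Map an airport-local world coordinate (metres) to the key of the
    neural-map tile containing it (100 m tiles assumed as above)."""
    return (int(x_m // tile_size_m), int(y_m // tile_size_m))

# A vehicle at apron position (250 m, -30 m) loads tile (2, -1):
key = tile_key(250.0, -30.0)
```

Floor division keeps the keys consistent across the origin, so tiles south or west of the reference point get negative indices rather than colliding with tile (0, 0).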
4.2 PriorDrive (Unified Vector Priors)
| Paper | PriorDrive: Enhancing Online HD Mapping with Unified Vector Priors |
| Venue | arXiv 2024 (with 2025 updates) |
Innovation: Integrates multiple types of prior maps into a unified framework:
- OpenStreetMap SD Maps: Coarse road geometry (free, globally available)
- Outdated HD Maps: Previous-generation vendor maps (may be stale but still informative)
- Locally Constructed Maps: From historical vehicle traversals
Key components:
- Hybrid Prior Representation (HPQuery): Standardizes diverse map element representations
- Unified Vector Encoder (UVE): Fused prior embedding with dual encoding mechanism
- Segment-level and point-level pre-training: Learns prior distribution of vector data
Performance improvements from SD map priors (applied to various base methods):
| Base Method | Without Prior | With SD Map Prior | Gain |
|---|---|---|---|
| HDMapNet | baseline | +3.0% mAP | +3.0 |
| VectorMapNet | baseline | +3.9% mAP | +3.9 |
| StreamMapNet | baseline | +5.9% mAP | +5.9 |
| MapTRv2 | baseline | +5.7% mAP | +5.7 |
Tested on nuScenes, Argoverse 2, and OpenLane-V2.
Airside relevance: VERY HIGH. This is one of the most directly applicable methods for airport deployment:
- AIXM/AMXM data (available for most airports) can serve as the SD map prior
- Historical traversal data builds local prior maps automatically
- The unified encoding means all prior types can be combined seamlessly
- Airport layouts are more stable than road networks, making priors more reliable
4.3 P-MapNet (Far-seeing with Map Priors)
| Paper | P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors |
| Venue | arXiv 2024 |
Uses SD map priors as conditional branches processed with masked autoencoders, specifically targeting long-range (far-seeing) map prediction where sensor data is sparse.
Performance: +6.7 mAP over MapTRv2 with SD map prior. +11.2 mAP with HD map prior.
Airside relevance: MEDIUM. Long-range perception is less critical for low-speed airside operations (typical 15-25 km/h) but useful for anticipating stand layouts ahead.
4.4 MapEX (Map Prior Refinement)
| Paper | MapEX |
| Venue | 2023 |
Approach: Categorizes existing map priors into three types and adapts the matching algorithm of a query-based estimation model to handle each prior type.
Limitation: Simulates outdated maps by introducing artificial offsets and erasing map elements. This risks leaking ground truth data during training and fails to accurately represent real-world map staleness.
Airside relevance: LOW. The artificial simulation of map degradation does not reflect airport-specific scenarios well.
5. Topological and Scene Graph Approaches
5.1 Why Topology Matters (Especially for Airside)
Standard online mapping produces a set of independent polylines without explicit connectivity information. For planning, the vehicle needs to know:
- Which lanes connect to which (can I go from taxiway A to taxiway B?)
- What traffic signals control which lanes
- Where merge/diverge points are
- What the legal routing options are at each junction
This is critical for airside operations: taxiway junctions must be navigated correctly to avoid runway incursions, and ATC routing instructions reference specific taxiway segments connected in a graph.
5.2 TopoNet (Graph-Based Topology)
| Paper | TopoNet: Graph-based Topology Reasoning for Driving Scenes |
| Venue | 2023 |
| Authors | OpenDriveLab |
| Code | https://github.com/OpenDriveLab/TopoNet |
Innovation: First end-to-end framework for abstracting traffic knowledge beyond perception, specifically reasoning about connections between centerlines and traffic elements from sensor inputs. Uses Graph Neural Networks (GNNs) to model lane connectivity and traffic signal associations.
Performance on OpenLane-V2: 20.0 OLS
Airside relevance: HIGH. The graph representation directly maps to airport taxi routing: taxiway segments as nodes, junctions as connection edges, and airport signage as traffic elements with lane associations.
5.3 TopoMLP (Simple Topology Reasoning)
| Paper | TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning |
| Venue | ICLR 2024 |
| Authors | Wu et al. |
| Code | https://github.com/wudongming97/topomlp |
Philosophy: "First detect, then reason." Rather than jointly learning detection and topology, TopoMLP separates them into sequential stages.
Architecture:
- Lane Detector: Inspired by PETR (3D position embedding in query-based DETR framework)
- Traffic Element Detector: YOLOX-based detection head
- Topology MLP: Two separate MLP heads (3 linear layers + ReLU each) with position embedding
- Lane-lane topology: predicts connectivity between detected lanes
- Lane-traffic topology: predicts which traffic elements control which lanes
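The lane-lane head reduces to scoring every ordered pair of detected lane embeddings with a small MLP. Layer sizes and the single hidden layer below are illustrative; TopoMLP uses 3 linear layers per head:

```python
import numpy as np

def lane_lane_topology(lane_feats, W1, W2):
    """Score connectivity for every ordered (source, target) lane pair by
    concatenating their embeddings and running a small MLP -- the
    detect-then-reason recipe of TopoMLP (sizes illustrative).
    lane_feats: (N, D)."""
    n = lane_feats.shape[0]
    src = np.repeat(lane_feats, n, axis=0)     # each source repeated n times
    dst = np.tile(lane_feats, (n, 1))          # targets cycled
    hidden = np.maximum(np.concatenate([src, dst], axis=1) @ W1, 0.0)  # ReLU
    logits = hidden @ W2
    return (1.0 / (1.0 + np.exp(-logits))).reshape(n, n)  # adjacency probs

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
adj = lane_lane_topology(feats, rng.normal(size=(8, 16)), rng.normal(size=16))
```

The output is a soft adjacency matrix over detected lanes; thresholding it yields the connectivity graph a planner consumes.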
Performance on OpenLane-V2:
- 41.2% OLS with ResNet-50 backbone
- Winner of the 1st OpenLane Topology Challenge
Airside relevance: HIGH. The decoupled detect-then-reason approach is practical for airport deployment:
- Standard detectors can be retrained for airport-specific elements (taxiway signs, holding position markings)
- Topology MLPs can learn airport-specific connectivity patterns
- Simple architecture is easier to debug and validate for safety-critical applications
5.4 LaneSegNet (Unified Lane Segments)
| Paper | LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving |
| Venue | ICLR 2024 |
| Authors | OpenDriveLab |
| Code | https://github.com/OpenDriveLab/LaneSegNet |
Core innovation: Introduces the lane segment as a unified map representation that simultaneously encodes geometry AND topology. Each lane segment is defined by:
Lane Segment = {
centerline: [ordered 3D points defining the center path],
left_boundary: [3D points, derived as offset from centerline],
right_boundary: [3D points, derived as offset from centerline],
lane_type: {regular_lane | pedestrian_crossing | ...},
left_line_type: {dashed | solid | non_visible},
right_line_type: {dashed | solid | non_visible},
topology_edges: {predecessor_segments, successor_segments}
}
Architecture:
Multi-view Images
|
v
ResNet-50 + FPN (3-stage multi-scale features)
|
v
BEV Feature Map
|
v
Lane Segment Decoder with Lane Attention
|
+---> Heads-to-Regions mechanism:
| - Multiple reference points distributed along predicted boundaries
| - Multi-head attention: each head attends to specific local region
| - 32 sampling locations per reference point, 8 directions
| - Identical initialization for all heads in first layer
|
v
Lane Segments (geometry + type + topology)
|
+---> Topology Branch:
- MLP processes predecessor/successor features
- Concatenation + sigmoid -> connection probability
- Outputs weighted adjacency matrix
Performance on OpenLane-V2:
| Metric | LaneSegNet | TopoNet | Gain over TopoNet |
|---|---|---|---|
| Lane Segment mAP | 32.6 | 23.0 | +9.6 |
| Centerline DET_l | 31.8 | 24.9 | +6.9 |
| Topology TOP_ll | 7.6 | 2.3 | +5.3 |
| OLS | 29.7 | 20.0 | +9.7 |
| FPS | 14.7 | 10.5 | +4.2 FPS |
Also: 37.7% fewer parameters and 31.4% less decoder latency than TopoNet.
Airside relevance: VERY HIGH. The lane segment representation is arguably the most natural fit for airport taxiways:
- Taxiway segments naturally have centerlines (painted yellow) with left/right edges (painted or implicit)
- Connectivity between taxiway segments is exactly the predecessor/successor graph
- Lane type classification maps to taxiway types (taxiway, apron, lead-in)
- Boundary types map to edge marking types (continuous, dashed at holding positions)
- The directed graph output can directly feed into airport routing algorithms
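The lane segment record above translates naturally into code. The sketch below mirrors the fields listed earlier as a Python dataclass; the enum values and the `seg_17` successor id are illustrative assumptions, not LaneSegNet's actual data format.

```python
from dataclasses import dataclass, field
from enum import Enum

class LineType(Enum):
    DASHED = "dashed"
    SOLID = "solid"
    NON_VISIBLE = "non_visible"

@dataclass
class LaneSegment:
    """Mirror of the lane-segment record above; ids and values illustrative."""
    centerline: list            # ordered (x, y, z) points along the centre path
    left_boundary: list         # offset polyline left of the centerline
    right_boundary: list        # offset polyline right of the centerline
    lane_type: str = "regular_lane"
    left_line_type: LineType = LineType.SOLID
    right_line_type: LineType = LineType.SOLID
    predecessors: list = field(default_factory=list)  # upstream segment ids
    successors: list = field(default_factory=list)    # downstream segment ids

seg = LaneSegment(
    centerline=[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    left_boundary=[(0.0, 1.75, 0.0), (10.0, 1.75, 0.0)],
    right_boundary=[(0.0, -1.75, 0.0), (10.0, -1.75, 0.0)],
    successors=["seg_17"],
)
assert seg.lane_type == "regular_lane" and seg.successors == ["seg_17"]
```

For airside use, the same record would carry taxiway-specific `lane_type` values (taxiway, apron, lead-in) while keeping the predecessor/successor graph unchanged.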
5.5 T2SG (Traffic Topology Scene Graph)
| Paper | T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving |
| Venue | CVPR 2025 |
| Authors | Lv et al. |
Innovation: Defines a novel unified scene graph that explicitly models lanes, the road signals that control/guide them, and topology relationships among all elements.
Architecture (TopoFormer):
Lane Aggregation Layer (LAL): Uses geometric distance among lane centerlines to guide global information aggregation. Lanes that are physically close interact more strongly.
Counterfactual Intervention Layer (CIL): Models reasonable road structures (intersections, straights, merges) through counterfactual reasoning. This layer asks "what would the topology look like if this lane were removed?" to learn robust structural understanding.
Performance on OpenLane-V2: 46.3 OLS (SOTA as of early 2025)
Airside relevance: HIGH. Airport taxiway junctions are exactly the type of complex topology that T2SG handles. The scene graph representation could encode the full airport movement area structure including taxiway guidance signs, stop bars, and their lane associations. The counterfactual reasoning could help understand taxiway closures (NOTAMs).
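The distance-guided aggregation idea behind the LAL can be sketched with a simple kernel: pairwise distances between lane centers become attention-like weights, so physically close lanes exchange more information. The Gaussian-style kernel, temperature, and sizes below are illustrative assumptions, not T2SG's actual layer.

```python
import numpy as np

def distance_guided_aggregation(feats, centers, tau=5.0):
    """Distance-guided feature aggregation in the spirit of T2SG's Lane
    Aggregation Layer: physically close lanes interact more strongly."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    w = np.exp(-d / tau)                      # closer lanes -> larger weights
    w /= w.sum(axis=1, keepdims=True)         # row-normalise the weights
    return w @ feats                          # aggregated lane features

feats = np.eye(3)                             # 3 lanes, one-hot features
centers = np.array([[0.0, 0.0], [1.0, 0.0], [50.0, 0.0]])
out = distance_guided_aggregation(feats, centers)
# Lane 0 mixes mostly with nearby lane 1, barely with distant lane 2.
assert out[0, 1] > out[0, 2]
```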
5.6 TopoLogic (Interpretable Topology)
| Paper | TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes |
| Venue | NeurIPS 2024 |
| Code | https://github.com/Franpin/TopoLogic |
Innovation: Addresses a fundamental weakness of prior topology methods: they use vanilla MLPs to predict connectivity from lane queries, ignoring geometric features intrinsic to lanes and being vulnerable to endpoint detection shifts.
Two-pronged approach:
- Geometric distance topology reasoning: Uses lane centerline geometry (proximity, orientation alignment) to predict connectivity. Robust to endpoint detection noise.
- Lane query similarity: Computes semantic similarity between lane queries as a complementary signal.
Key advantage: Can be incorporated into well-trained models without re-training, acting as a post-processing topology enhancement.
Performance on OpenLane-V2:
- TOP_ll: 23.9 (vs 10.9 for previous SOTA) -- 2.2x improvement in lane-lane topology
- OLS: 44.1 (vs 39.8 for previous SOTA)
Airside relevance: MEDIUM-HIGH. The ability to improve topology reasoning without retraining is valuable for rapid airport-specific deployment. The geometric distance-based reasoning is particularly suitable for airport layouts where taxiway connectivity follows clear geometric patterns.
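The geometric branch can be illustrated with a minimal scoring function: connectivity between two lanes decays with the gap between one lane's endpoint and the next lane's start point, gated by heading agreement. The Gaussian decay, the `sigma` value, and the 2D simplification are assumptions for this sketch, not TopoLogic's published formulation.

```python
import math

def geometric_connectivity(lane_a, lane_b, sigma=2.0):
    """Score that lane_b succeeds lane_a, from endpoint proximity and
    heading alignment (a simplified stand-in for geometric topology reasoning)."""
    ax, ay = lane_a[-1]                      # end of lane_a
    bx, by = lane_b[0]                       # start of lane_b
    dist = math.hypot(bx - ax, by - ay)

    def heading(p, q):
        return math.atan2(q[1] - p[1], q[0] - p[0])

    da = heading(lane_a[-2], lane_a[-1])     # exit heading of lane_a
    db = heading(lane_b[0], lane_b[1])       # entry heading of lane_b
    align = math.cos(da - db)                # 1.0 when headings agree

    # Gaussian decay in endpoint distance, gated by orientation agreement.
    return math.exp(-(dist ** 2) / (2 * sigma ** 2)) * max(align, 0.0)

straight = [(0.0, 0.0), (5.0, 0.0)]
follower = [(5.0, 0.0), (10.0, 0.0)]         # touches and continues straight
sideways = [(5.0, 0.0), (5.0, 5.0)]          # touches but turns 90 degrees

assert geometric_connectivity(straight, follower) > geometric_connectivity(straight, sideways)
```

Because the score degrades smoothly with endpoint distance, a small detection shift changes it only slightly, which is the robustness property the paper targets.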
5.7 TLSD (Breaking Topology Limits)
| Paper | TLSD: Breaking the Limit of Topological Lane Mapping with Graph Knowledge and Distance Awareness |
| Venue | ACML 2025 |
Innovations:
- Iterative refinement scheme within the decoder
- Group-wise one-to-many assignment for faster training convergence
- GNN module that integrates lane segment coordinates for spatial reasoning
- Distance-aware topological post-processing
Performance on OpenLane-V2: 47.7 OLS (latest SOTA)
Efficient variant (eTLSD) with ResNet-18 outperforms ResNet-50-based competitors (+3.3% mAP, +6.6% AP_ped).
5.8 Topology Methods Summary (OpenLane-V2)
| Method | Venue | OLS | TOP_ll | Key Innovation |
|---|---|---|---|---|
| TopoNet | 2023 | 20.0 | 2.3 | First GNN-based |
| LaneSegNet | ICLR 2024 | 29.7 | 7.6 | Unified lane segments |
| TopoMLP | ICLR 2024 | 41.2 | -- | Detect-then-reason |
| TopoLogic | NeurIPS 2024 | 44.1 | 23.9 | Geometric reasoning |
| T2SG | CVPR 2025 | 46.3 | -- | Scene graph + counterfactual |
| TLSD | ACML 2025 | 47.7 | -- | GNN + distance-aware |
The field has more than doubled OLS scores in two years (20.0 to 47.7), with the steepest gains from incorporating geometric reasoning and graph neural networks.
6. Map-Free End-to-End Methods
6.1 Tesla FSD v12+ (Production Implicit Maps)
Architecture evolution:
- FSD v11: ~300,000 lines of C++ control code + 48 neural networks for perception
- FSD v12 (March 2024): End-to-end neural network, ~2,000-3,000 lines of management code
- FSD v13-v14 (2025): Refined end-to-end with neural network vision encoder upgrades
How Tesla eliminates explicit maps:
- 8 cameras provide 360-degree coverage
- 2D camera images are transformed into 3D spatial understanding via BEV transformations and occupancy networks
- Navigation and routing are integrated directly into the vision-based neural network
- Real-time responses to blocked roads and detours without map-based instructions
- Fleet-learned driving patterns replace pre-built route data
FSD v14.2.1 (2025): Major neural network vision encoder upgrade. Navigation routing is now part of the neural pipeline, not a separate module.
Scale: 3+ billion miles driven on FSD (Supervised) by January 2025. Robotaxi service launched in Austin, TX in June 2025.
Airside relevance: The Tesla approach proves that implicit maps work at massive scale on public roads, but its camera-only philosophy (no LiDAR) and the sheer data requirements (billions of miles) make direct adoption impractical for niche airside operations. The architectural principles are relevant: BEV occupancy for navigable space understanding is directly transferable.
6.2 Wayve (GAIA World Models)
GAIA evolution:
- GAIA-1 (2023): 9B parameter generative world model, 4,700 hours UK driving data
- GAIA-2 (2025): Multi-camera, spatio-temporally coherent scene generation, broader geographic coverage
- GAIA-3 (December 2025): 15B parameters (2x GAIA-2), trained on 10x more data spanning 9 countries across 3 continents
Implicit mapping approach: Rather than building explicit maps, Wayve's GAIA models learn to generate realistic driving scenes from their training data. The model encodes road structure, lane topology, and navigable space implicitly in its parameters. GAIA-3 can re-drive authentic driving sequences with parameterized variations while maintaining scene coherence.
Wayve AV2.0: Drives in 500+ cities with a single model, no HD maps. Demonstrated a 40x performance improvement over zero-shot transfer after adding 500 hours of domain-specific incremental data.
Airside relevance: MEDIUM. The demonstrated geographic transfer learning (zero-shot + incremental) is extremely relevant: an airside world model could be pre-trained on road data and incrementally adapted with limited airport-specific data. However, GAIA models are primarily for simulation/evaluation, not direct deployment.
6.3 VAD / VADv2 (Vectorized Autonomous Driving)
| Paper | VAD: Vectorized Scene Representation for Efficient Autonomous Driving |
| Venue | ICCV 2023 (VAD) / ICLR 2026 (VADv2) |
| Authors | Jiang et al. (Huazhong University / Horizon Robotics) |
| Code | https://github.com/hustvl/VAD |
VAD (original): Models the entire driving scene as a fully vectorized representation -- agents, map elements, and ego trajectory are all vectors, not rasterized grids. This eliminates expensive rasterized BEV computation.
Architecture flow:
Multi-view Images
|
v
Image Backbone + BEV Encoder
|
v
Vectorized Scene Learning:
+---> Agent Queries -> Agent Motion Vectors
+---> Map Queries -> Map Element Vectors
|
v
Planning Module:
+---> Agent motion vectors as dynamic constraints
+---> Map element vectors as static constraints
|
v
Planned Trajectory (vectorized output)
Performance:
- 48.4% reduction in collision rate over previous best E2E method
- Up to 9.3x faster inference than rasterized methods
- VAD-Tiny: 0.78m average displacement error, 0.38% collision rate
VADv2 (2024): Extends to probabilistic planning -- outputs a distribution over actions rather than a single trajectory. SOTA closed-loop performance on CARLA Town05.
Online mapping role: VAD's map queries serve as an implicit online mapping component: they detect and vectorize map elements not as an end in themselves but as planning constraints. This demonstrates that online mapping can be a learned intermediate representation rather than an explicit output.
Airside relevance: MEDIUM-HIGH. The vectorized planning constraint approach is elegant for airside operations: taxiway boundaries become explicit velocity constraints, stand markings become positioning constraints, and these can be learned end-to-end rather than hand-coded.
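One way to see map vectors acting as planning constraints is a point-to-polyline clearance check: the planner keeps each trajectory point a minimum distance from a vectorized boundary element. The taxiway-edge coordinates below are hypothetical, and this is a sketch of the general idea rather than VAD's actual constraint formulation.

```python
import math

def min_distance_to_polyline(point, polyline):
    """Distance from a planned trajectory point to a vectorized map element,
    computed as the minimum point-to-segment distance along the polyline."""
    px, py = point
    best = float("inf")
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        dx, dy = x2 - x1, y2 - y1
        t = 0.0
        denom = dx * dx + dy * dy
        if denom > 0:
            # Project the point onto the segment, clamped to its endpoints.
            t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / denom))
        cx, cy = x1 + t * dx, y1 + t * dy
        best = min(best, math.hypot(px - cx, py - cy))
    return best

# Hypothetical taxiway edge used as a static clearance constraint:
edge = [(0.0, 3.0), (20.0, 3.0)]
assert min_distance_to_polyline((5.0, 0.0), edge) == 3.0
```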
6.4 UniAD (Unified Autonomous Driving)
| Paper | Planning-oriented Autonomous Driving |
| Venue | CVPR 2023 (Best Paper Award) |
| Authors | Hu et al. (OpenDriveLab / SenseTime) |
| Code | https://github.com/OpenDriveLab/UniAD |
Online mapping component (MapFormer): UniAD includes a dedicated MapFormer module that constructs online maps in BEV space:
- Segments lanes and road semantics
- Online mapping accuracy improved 30% over prior SOTA at time of publication
- Map features serve as inputs to MotionFormer (motion prediction) for agent-road interaction modeling
Full pipeline: MapFormer -> TrackFormer -> MotionFormer -> OccFormer -> Planner (all transformer decoder-based with query interfaces)
Airside relevance: MEDIUM. UniAD demonstrates the principle that online mapping should be integrated with planning rather than treated as a standalone perception module. For airside, the map-to-motion interaction (how do map elements constrain vehicle behavior?) is more important than raw map accuracy.
7. Foundation Model Approaches
7.1 Current State of Foundation Models for Mapping
Foundation model integration into online mapping is still nascent (as of early 2026). Unlike object detection and segmentation where SAM/DINOv2 have been rapidly adopted, online HD map construction poses unique challenges:
- BEV transformation bottleneck: Foundation models produce 2D perspective features; online mapping requires BEV features. The view transformation step limits direct feature transfer.
- Structured output requirement: Map elements are polylines/curves with class labels and topology, not bounding boxes or masks. Foundation model outputs need significant adaptation.
- Domain gap: SAM/DINOv2 are trained on natural images, not BEV representations.
7.2 SAM for Road Segmentation
Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving (2024) evaluates SAM's zero-shot capabilities:
- Acceptable adversarial robustness under black-box corruptions and white-box attacks
- Promising for road object segmentation without explicit prompts
- Limitation: SAM segments 2D perspective images; converting to vectorized BEV maps requires additional geometric reasoning
SAM for Road Object Segmentation (2025): Comprehensive evaluation showing SAM handles diverse road objects under challenging conditions but lacks the structured output (polylines, topology) needed for HD map construction.
7.3 DINOv2 for Map Feature Extraction
DINOv2's self-supervised visual features provide strong backbones for perception, but direct integration into online mapping faces the same BEV bottleneck. The most promising approach:
- Use DINOv2 as the image backbone (replacing ResNet-50/Swin)
- Apply adapter-mediated feature transfer (LoRA rank 32 optimal, per existing research in this repository)
- Maintain the standard view transform + decoder pipeline
Current gap: No published work specifically uses DINOv2 as a backbone for MapTR-family online mapping with reported metrics. This remains an open research opportunity.
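The adapter recipe mentioned above follows the standard LoRA formulation: the backbone weight stays frozen and a low-rank update is learned on top. The numpy sketch below shows that formulation with rank 32 (per the note above); the layer sizes are illustrative, and this is not code from any published mapping model.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank, alpha = 64, 64, 32, 32   # rank 32 per the note above

W = rng.normal(size=(d_in, d_out))          # frozen backbone weight
A = rng.normal(size=(d_in, rank)) * 0.01    # trainable low-rank factor
B = np.zeros((rank, d_out))                 # zero init: adapter starts as a no-op

def lora_forward(x):
    """y = x W + (alpha / rank) * x A B -- frozen base plus low-rank update."""
    return x @ W + (alpha / rank) * (x @ A @ B)

x = rng.normal(size=(2, d_in))
# With B = 0 the adapted layer exactly matches the frozen backbone.
assert np.allclose(lora_forward(x), x @ W)
```

Only A and B would be trained per airport, which keeps the cross-airport adaptation footprint small.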
7.4 MapVision (CVPR 2024 Challenge Winner)
| Paper | MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report |
| Venue | CVPR 2024 Workshop |
Winner of the CVPR 2024 Mapless Driving Challenge. Key technical approach:
- Multi-perspective camera images + standard-definition maps as input
- Map encoder pre-training to enhance geometric encoding
- YOLOX for traffic element detection
- Auxiliary tasks from MapTRv2
Key insight: Purely map-free approaches struggle at road far-ends and under occlusion. Lightweight SD map priors (globally available from OpenStreetMap) provide a valuable complement.
7.5 Vision-Language Integration (Emerging)
Early 2025-2026 work explores VLMs for mapping:
- LLM-based approaches for HD map updates (using language descriptions of changes)
- Vision-language agents for driving scenario understanding
- Multimodal reasoning about map structure
Current maturity: Low. No published method demonstrates competitive mAP on standard benchmarks using foundation model / VLM approaches for map construction. The structured geometric output requirement of mapping is poorly served by current VLM architectures.
Airside relevance of foundation models overall: MEDIUM, on a longer-term horizon. The most promising near-term application is using DINOv2 features as a robust backbone for airport-specific online mapping. Foundation models' true value may be in zero-shot transfer to new airport environments without fine-tuning.
8. Mamba/SSM Approaches
8.1 MambaMap
| Paper | MambaMap: Online Vectorized HD Map Construction using State Space Model |
| Venue | arXiv 2025 |
Innovation: Applies Mamba (S6) state space models to temporal fusion in online map construction, replacing transformer-based attention for processing historical frame sequences.
Architecture:
Current Multi-view Images
|
v
BEV Feature Extraction
|
+---> Memory Bank: stores historical BEV features and instance queries
|
v
BEV Mamba Fusion: enhances current BEV with historical BEV via SSM
|
v
Instance Mamba Fusion: temporal fusion of instance queries via SSM
| - Gating mechanism selectively integrates dependencies
| - Linear complexity O(n) vs transformer's O(n^2)
|
v
Map Element Decoder
|
v
Vectorized HD Map
Key advantage: SSM-based temporal fusion is linear in sequence length, enabling efficient processing of long historical sequences. The gating mechanism selectively integrates relevant temporal information while suppressing noise.
Performance: SOTA on nuScenes and Argoverse 2 with strong robustness across diverse scenarios.
Airside relevance: HIGH. The computational efficiency of SSM-based temporal fusion is critical for deployment on edge devices (NVIDIA Orin at 275 TOPS). If MambaMap achieves similar accuracy to transformer-based temporal methods at lower compute cost, it could enable longer temporal memory on hardware-constrained airside vehicles. This aligns with the DriveMamba finding elsewhere in this repository (42% L2 reduction, 3.2x faster, 68.8% less GPU memory).
9. Benchmarks and Metrics
9.1 nuScenes Map Construction Benchmark
Dataset: 1,000 scenes, 6 cameras, LiDAR, radar, IMU, GPS. 28,130 samples.
Standard evaluation (60x30m range):
| Method | Venue | Ped. Cross. | Lane Div. | Road Bound. | Mean mAP |
|---|---|---|---|---|---|
| MapTR | ICLR 2023 | 45.2 | 53.8 | 54.3 | 51.1 |
| MapTRv2 | IJCV 2024 | -- | -- | -- | 68.7 |
| BeMapNet | CVPR 2023 | -- | -- | -- | 57.7-62.6 |
| PivotNet | ICCV 2023 | -- | -- | -- | 53.8 |
| MGMap | CVPR 2024 | 57.4 | 63.5 | 63.3 | 61.4 |
| HIMap | CVPR 2024 | -- | -- | -- | 68.5+ |
| GeMap | ECCV 2024 | -- | -- | -- | 69.4-76.0 |
| MapNeXt-Tiny | arXiv 2024 | -- | -- | -- | 54.8 |
| MapNeXt-Huge | arXiv 2024 | -- | -- | -- | 78.5 |
Temporal methods (StreamMapNet geographic splits, harder):
| Method | 60x30m mAP | 100x50m mAP | FPS |
|---|---|---|---|
| StreamMapNet | 34.1 | 22.2 | 14.2 |
| MemFusionMap | 38.0 | 27.6 | 15.1 |
| MapTracker | 40.3* | -- | -- |
*Requires GT tracking supervision
Consistency-aware (MapTracker splits):
| Method | mAP | C-mAP | Consistency Gap |
|---|---|---|---|
| StreamMapNet | 70.4 | 56.4 | 14.0 |
| MapTracker | 76.1 | 69.1 | 7.0 |
9.2 Argoverse 2 Map Benchmark
Dataset: ~250,000 vector maps, richer annotation than nuScenes. Larger geographic diversity.
| Method | Venue | mAP | Notes |
|---|---|---|---|
| MapTRv2 | IJCV 2024 | 67.4 | Baseline |
| GeMap | ECCV 2024 | 71.8-76.0 | +4.4 over MapTRv2 |
| MapTracker | ECCV 2024 | 76.9 | Consistency-focused |
| MemFusionMap | WACV 2025 | 54.3-60.6 | On harder geographic split |
9.3 OpenLane-V2 Topology Benchmark
Dataset: 2,000 annotated road scenes, 2.1M instance-level annotations, 1.9M positive topology relationships.
Metrics:
- DET_l: Centerline detection score
- TOP_ll: Lane-lane topology accuracy
- TOP_lt: Lane-traffic topology accuracy
- OLS (OpenLane-V2 Score): Combined metric over lane detection, traffic-element detection (DET_t), and both topology scores
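As a hedged sketch of how these components combine: following the OpenLane-V2 devkit, OLS averages the detection scores with square-rooted topology scores (the full benchmark also scores traffic-element detection, DET_t). The input values below are illustrative, not taken from any results table.

```python
import math

def ols(det_l, det_t, top_ll, top_lt):
    """OpenLane-V2 Score as an average of detection scores and
    square-rooted topology scores (per the OpenLane-V2 devkit;
    all scores on a 0-1 scale)."""
    return 0.25 * (det_l + det_t + math.sqrt(top_ll) + math.sqrt(top_lt))

# Illustrative values only.
score = ols(det_l=0.30, det_t=0.45, top_ll=0.09, top_lt=0.16)
assert abs(score - 0.3625) < 1e-9
```

The square root on the topology terms keeps low-but-nonzero topology scores from dominating the average downward, which matters given how much harder TOP_ll is than detection.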
| Method | Venue | OLS | Key Innovation |
|---|---|---|---|
| TopoNet | 2023 | 20.0 | First GNN-based |
| LaneSegNet | ICLR 2024 | 29.7 | Lane segments |
| TopoMLP | ICLR 2024 | 41.2 | Detect-then-reason |
| TopoLogic | NeurIPS 2024 | 44.1 | Geometric reasoning |
| T2SG | CVPR 2025 | 46.3 | Scene graph |
| TLSD | ACML 2025 | 47.7 | GNN + distance |
9.4 NAVSIM Benchmark
| Paper | NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking |
| Venue | NeurIPS 2024 |
| Code | https://github.com/autonomousvision/navsim |
NAVSIM evaluates planning (not mapping directly) but is relevant because it measures how well map-free end-to-end approaches perform in realistic driving scenarios.
Key finding (CVPR 2024 Challenge, 143 teams, 463 entries): Simple methods with moderate compute (TransFuser) can match large-scale E2E architectures (UniAD), suggesting that raw model complexity is less important than training data quality and evaluation methodology.
NAVSIM v2 (2025): Pseudo-simulation achieves strong correlation with traditional closed-loop simulation at 6x less compute.
9.5 CVPR 2024 Mapless Driving Challenge
Part of the Autonomous Grand Challenge, this track specifically evaluates autonomous driving without HD maps:
- Input: multi-perspective cameras + standard-definition maps
- Winner: MapVision (SD map encoder + YOLOX + MapTRv2 auxiliaries)
- Key insight: SD maps are not HD maps but provide essential coarse structure
10. Comprehensive Performance Comparison
10.1 Taxonomy of Methods (2023-2026)
| Category | Methods | Best mAP (nuScenes) | Key Trade-off |
|---|---|---|---|
| Single-frame | MapTRv2, GeMap, HIMap, MGMap | 78.5 (MapNeXt-Huge) | High accuracy, no consistency |
| Temporal | StreamMapNet, MapTracker, MemFusionMap, MambaMap | 76.1 (MapTracker) | Consistency + occlusion handling |
| Prior-augmented | NMP, PriorDrive, P-MapNet | +3-6% over base | Requires prior availability |
| Topology | LaneSegNet, TopoMLP, T2SG, TLSD | 47.7 OLS | Connectivity reasoning |
| Map-free E2E | VAD, UniAD | N/A (planning metric) | Integrated perception-planning |
| Expert/specialized | MapExpert, BeMapNet | SOTA on both datasets | Element-type awareness |
10.2 Open-Source Availability Summary
| Method | GitHub | Venue | Available |
|---|---|---|---|
| MapTracker | woodfrog/maptracker | ECCV 2024 | Yes |
| MemFusionMap | Song-Jingyu/MemFusionMap | WACV 2025 | Yes |
| SQD-MapNet | shuowang666/SQD-MapNet | ECCV 2024 | Yes |
| GeMap | cnzzx/GeMap | ECCV 2024 | Yes |
| HIMap | BritaryZhou/HIMap | CVPR 2024 | Yes |
| MGMap | xiaolul2/MGMap | CVPR 2024 | Yes |
| BeMapNet | er-muyue/BeMapNet | CVPR 2023 | Yes |
| PivotNet | wenjie710/PivotNet | ICCV 2023 | Yes |
| ADMap | hht1996ok/ADMap | ECCV 2024 | Yes |
| TopoNet | OpenDriveLab/TopoNet | 2023 | Yes |
| TopoMLP | wudongming97/topomlp | ICLR 2024 | Yes |
| LaneSegNet | OpenDriveLab/LaneSegNet | ICLR 2024 | Yes |
| TopoLogic | Franpin/TopoLogic | NeurIPS 2024 | Yes |
| NeuralMapPrior | Tsinghua-MARS-Lab/neural_map_prior | CVPR 2023 | Yes |
| UniAD | OpenDriveLab/UniAD | CVPR 2023 | Yes |
| VAD | hustvl/VAD | ICCV 2023 | Yes |
| NAVSIM | autonomousvision/navsim | NeurIPS 2024 | Yes |
| StreamMapNet | yuantianyuan01/StreamMapNet | WACV 2024 | Yes |
11. Airside Applicability Analysis
11.1 Airport-Specific Challenges vs Road Driving
| Challenge | Road Driving | Airport Airside | Impact on Online Mapping |
|---|---|---|---|
| Lane markings | Abundant, standardized | Sparser, aviation-specific | Need custom element classes |
| Occlusion sources | Vehicles, pedestrians | Aircraft (massive), GSE, containers | Temporal fusion critical |
| Speed | 30-120 km/h | 5-25 km/h | More frames per area, richer temporal data |
| Topology complexity | Complex intersections | Simpler junctions, more repetitive | Topology methods should transfer well |
| Environmental variation | Season, weather, lighting | Same + jet blast heat haze, de-icing fluid | Need robustness to aviation-specific disturbances |
| Map element types | 3-5 types (standard) | 8-10+ types (aviation-specific) | Expert/specialized decoders needed |
| Training data | nuScenes/Argoverse (millions of frames) | None public, must collect | Transfer learning mandatory |
11.2 Recommended Methods for Airside Deployment
Tier 1: Highest priority for evaluation
| Method | Why | Difficulty |
|---|---|---|
| MapTracker | Tracking formulation handles persistent occlusions from aircraft; consistency metrics directly relevant for safety | Medium (code available, well-documented) |
| MapUnveiler | +10.7% in occluded scenes; directly addresses the #1 airside challenge | Medium (NeurIPS 2024) |
| LaneSegNet | Lane segment representation naturally maps to taxiway segments; topology output feeds routing | Medium (code available, OpenDriveLab quality) |
| Neural Map Prior | Repeated traversals of same routes build strong priors; handles weather degradation | Medium (code available) |
Tier 2: Strong candidates with specific advantages
| Method | Why | Difficulty |
|---|---|---|
| MapExpert | Expert decoders for different element types; critical for aviation-specific markings | Medium (AAAI 2025, no code yet) |
| PriorDrive | AIXM/AMXM as SD map prior; exactly the airport use case | Medium-High (no code yet) |
| GeMap | Geometry-aware learning exploits ICAO marking standards | Low (code available) |
| MemFusionMap | Practical: no tracking supervision, short training, real-time | Low (code available) |
| MambaMap | Efficient temporal fusion for Orin deployment | Medium-High (no code yet) |
Tier 3: Architectural insights to adopt
| Method | Insight to Extract |
|---|---|
| MapNeXt | Training recipes: +5.8 mAP from training alone |
| TopoLogic | Post-processing topology enhancement (no retraining) |
| SQD-MapNet | Denoising as plug-in temporal training technique |
| T2SG | Scene graph representation for full airport movement area |
11.3 Proposed Airside Online Mapping Architecture
Combining insights from the surveyed methods:
Input Layer:
Multi-view cameras (existing reference airside AV stack cameras or new)
4-8 RoboSense LiDAR (existing stack)
AIXM/AMXM prior (loaded per airport)
|
v
Feature Extraction:
Image backbone: DINOv2 with LoRA adapters (frozen foundation, airport-specific adapters)
LiDAR backbone: PointPillars (6.84ms on Orin)
BEV fusion: BEVFormer or LSS
|
v
Temporal Fusion (choose one):
Option A: MapTracker-style strided memory (best consistency)
Option B: MemFusionMap working memory (simplest, most practical)
Option C: MambaMap SSM fusion (most efficient for Orin)
|
v
Prior Integration:
NeuralMapPrior: GPS-indexed neural tiles from repeated traversals
PriorDrive-style: AIXM geometry as vector prior encoding
Cross-attention fusion: current BEV features <-> prior features
|
v
Map Element Decoder:
MapExpert-style: specialized expert layers per element type
Airport-specific classes:
- Taxiway centerline
- Taxiway edge
- Holding position marking
- Stand lead-in line
- Safety zone boundary
- ILS critical area marking
- Stand number (requires camera OCR)
|
v
Topology Head:
LaneSegNet-style: lane segments with connectivity graph
Airport extensions:
- Taxiway-to-taxiway connections
- Taxiway guidance sign associations
- NOTAM-aware dynamic topology (closed taxiways removed from graph)
|
v
Output:
Vectorized map elements (polylines/Bezier curves)
Topology graph (adjacency matrix)
Confidence per element
Temporal consistency score
|
v
Integration:
Feed into Frenet planner as drivable space constraints
Feed into global router as navigation graph
Feed into safety monitor as boundary constraints
11.4 Data Strategy
- Pre-training: Use nuScenes + Argoverse 2 to train base model on road mapping (abundant data)
- Domain adaptation: Fine-tune with ~500-1,000 annotated frames from target airport (per LoRA transfer learning findings)
- Self-improvement: Deploy Neural Map Prior to accumulate airport-specific knowledge from repeated operations
- Cross-airport transfer: Freeze backbone, retrain adapter layers + prior maps per airport
12. Research Roadmap
12.1 Immediate (0-6 months)
- Reproduce MapTracker on nuScenes; evaluate consistency metrics
- Reproduce MemFusionMap as simpler alternative
- Create airside annotation classes extending nuScenes format for airport markings
- Collect pilot dataset at one airport (100-200 traversals)
- Evaluate PriorDrive with AIXM data as SD map prior
12.2 Near-term (6-18 months)
- Train airport-specific MapTracker or MemFusionMap with annotated data
- Add LaneSegNet topology for taxiway connectivity graph
- Deploy Neural Map Prior for route-level map accumulation
- Benchmark on airport-specific metrics (marking detection, routing accuracy)
- Evaluate MapExpert with airport-specific element experts
12.3 Longer-term (18-36 months)
- Release airport surface mapping benchmark (no public dataset exists -- opportunity)
- Foundation model backbone (DINOv2 or successor) for cross-airport zero-shot transfer
- Integrate with world model for joint mapping + prediction
- Multi-airport transfer learning via PriorDrive + AIXM priors
- Topology-aware planning using T2SG-style scene graphs for ATC instruction compliance
Sources
Temporal Fusion
- Yuan et al. "StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction." WACV 2024
- Chen et al. "MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping." ECCV 2024 (Oral)
- Song et al. "MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction." WACV 2025
- Wang et al. "Stream Query Denoising for Vectorized HD-Map Construction." ECCV 2024
- "IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction." ACM MM 2024
- "PredMapNet: Future and Historical Reasoning for Consistent Online HD Vectorized Map Construction." arXiv 2025
Geometry and Representation
- Zhang et al. "Online Vectorized HD Map Construction using Geometry." ECCV 2024
- Zhou et al. "HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction." CVPR 2024
- Liu et al. "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction." CVPR 2024
- Qiao et al. "End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve." CVPR 2023
- Ding et al. "PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction." ICCV 2023
- Li et al. "MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction." arXiv 2024
- "MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert." AAAI 2025
- "ADMap: Anti-disturbance Framework for Vectorized HD Map Construction." ECCV 2024
- "MapUnveiler: Online Vectorized HD Map Construction with Clip-Level Token Interaction and Propagation." NeurIPS 2024
- "MapQR: Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction." ECCV 2024
Prior Integration
- Xiong et al. "Neural Map Prior for Autonomous Driving." CVPR 2023
- "PriorDrive: Enhancing Online HD Mapping with Unified Vector Priors." arXiv 2024
- "P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors." arXiv 2024
Topology
- OpenDriveLab. "TopoNet: Graph-based Topology Reasoning for Driving Scenes." 2023
- Wu et al. "TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning." ICLR 2024
- OpenDriveLab. "LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving." ICLR 2024
- Lv et al. "T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving." CVPR 2025
- "TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes." NeurIPS 2024
- "TLSD: Breaking the Limit of Topological Lane Mapping with Graph Knowledge and Distance Awareness." ACML 2025
Map-Free E2E
- Jiang et al. "VAD: Vectorized Scene Representation for Efficient Autonomous Driving." ICCV 2023 / ICLR 2026
- Hu et al. "Planning-oriented Autonomous Driving (UniAD)." CVPR 2023 (Best Paper)
- Tesla FSD v12-v14 technical analysis. 2024-2025
- Wayve GAIA-1/2/3 technical reports. 2023-2025
SSM/Mamba
- "MambaMap: Online Vectorized HD Map Construction using State Space Model." arXiv 2025
Benchmarks and Surveys
- Wang et al. "OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping." NeurIPS 2023
- "NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking." NeurIPS 2024
- Lyu et al. "Online High-Definition Map Construction for Autonomous Vehicles: A Comprehensive Survey." J. Sensor and Actuator Networks, 2025
- "M3TR: A Generalist Model for Real-World HD Map Completion." arXiv 2025
- CVPR 2024 Autonomous Grand Challenge -- Mapless Driving Track