Mapping, Localization, and Map-Free Autonomous Driving: Technical Report with Airport Airside Applicability
Table of Contents
- Cross-Section Reading Path
- Map-Free / Map-Lite Autonomous Driving
- Online Mapping and Scene Understanding
- Localization Approaches
- Spatial Memory and Place Recognition
- Airport-Specific Mapping Challenges
- Synthesis: Implications for Airport Airside Autonomy
Cross-Section Reading Path
For photoreal 4D reconstruction, Gaussian maps, dynamic neural scene assets, and feed-forward splatting, start with Photoreal City-Scale 4D Reconstruction. That page separates SLAM pose sources from reconstruction assets and links into the relevant knowledge-base, simulation, perception, and world-model pages.
1. Map-Free / Map-Lite Autonomous Driving
1.1 The Problem with HD Maps
High-Definition maps have been the backbone of autonomous driving for a decade, providing centimeter-level lane geometry, traffic signal positions, road boundaries, and semantic landmarks. However, they carry fundamental limitations that make them increasingly untenable at scale:
- Production cost: HD maps cost up to $1,000 per kilometer to produce, requiring specialized survey vehicles with surveying-grade LiDAR, cameras, and GNSS equipment, followed by extensive post-processing in data centers.
- Update latency: Construction, lane reconfigurations, and infrastructure changes can render maps stale within days, yet update cycles span months or years.
- Geographic coverage: Despite a projected market of $2.19 billion by 2032, HD map coverage remains limited to major urban corridors. Rural roads, private facilities, and airport airside environments are largely unmapped.
- Brittleness: As Elon Musk articulated at Tesla Autonomy Day 2019, HD map dependency creates systems that are "extremely brittle" -- too dependent on the map and unable to adapt to environmental changes the way humans do.
These factors have driven a research trajectory from full HD map dependence toward map-lite and fully map-free paradigms.
1.2 Taxonomy of Map Approaches
A comprehensive 2025 survey identifies three evolutionary stages:
| Stage | Era | Characteristics | Examples |
|---|---|---|---|
| HD Maps | 2015--2021 | Lane-level precision, centimeter accuracy, manual annotation | TomTom RoadDNA, Waymo, HERE |
| Lite Maps | 2021--present | Crowdsourced, automated generation, daily updates | Tesla FSD auto-labeling, Mobileye REM, Huawei RoadCode |
| Implicit Maps | 2023--present | Knowledge encoded in neural network parameters | Tesla FSD v12+, Wayve GAIA, NVIDIA end-to-end |
Lite Maps represent a pragmatic middle ground. Rather than deploying specialized survey fleets, they leverage production vehicles with consumer-grade sensors. Raw sensor data is discarded after on-board vectorization (preserving user privacy), and only vectorized elements are uploaded for crowd-sourced map aggregation. This enables urban coverage with significantly reduced production costs and update cycles measured in hours rather than months.
Implicit Maps represent the most radical departure: environmental knowledge is encoded within neural network parameters rather than through traditional explicit map formats. This supports differentiable processing and backpropagation in joint learning systems, enabling end-to-end optimization from perception through planning.
1.3 Key Online Map Construction Methods
MapTR / MapTRv2
MapTR (ICLR 2023 Spotlight) introduced a structured modeling and learning framework for online vectorized HD map construction. Its key innovation is permutation-equivalent modeling -- representing map elements as point sets with groups of equivalent permutations. This accurately describes map element geometry while stabilizing the learning process, solving the fundamental problem of ambiguous point orderings in polyline representations.
MapTRv2 (IJCV 2024) extends MapTR with:
- Hierarchical query embedding: Flexibly encodes structured map information at multiple levels
- Hierarchical bipartite matching: Stabilizes learning through structured assignment
- Auxiliary one-to-many matching: Accelerates convergence with dense supervision
- Performance: 4x shorter training schedule, +2.8 mAP over MapTRv1, real-time inference speed
- State-of-the-art on both nuScenes and Argoverse2 benchmarks
The MapTR family popularized efficient single-stage, parallel decoding with permutation-equivalent queries, establishing the dominant paradigm for online map construction.
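The permutation-equivalence idea can be illustrated as a matching cost: the predicted polyline is compared against every equivalent ordering of the ground-truth point set, and the cheapest one is kept, so the loss never penalizes an arbitrary but geometrically identical ordering. The sketch below is illustrative only (plain L2 point cost; MapTR embeds a similar cost inside hierarchical bipartite matching):

```python
import numpy as np

def permutation_equivalent_cost(pred, gt, closed=False):
    """Minimum point-to-point L2 cost between a predicted polyline and a
    ground-truth polyline over its group of equivalent orderings.

    Open polylines have 2 equivalent orderings (forward/reverse); closed
    ones have 2N (N cyclic shifts x 2 directions), as in MapTR's framing.
    pred, gt: (N, 2) arrays of ordered 2D points.
    """
    n = len(gt)
    if closed:
        variants = [np.roll(gt, s, axis=0) for s in range(n)]
        variants += [v[::-1] for v in variants]
    else:
        variants = [gt, gt[::-1]]
    # total point-wise distance for each equivalent ordering; keep the best
    costs = [np.linalg.norm(pred - v, axis=1).sum() for v in variants]
    return min(costs)
```

With this cost, a prediction that traces a lane divider end-to-start instead of start-to-end incurs zero extra penalty, which is exactly the ambiguity the permutation-equivalent formulation removes.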
HDMapNet
HDMapNet (ICRA 2022) was a foundational work in online HD semantic map learning. Its architecture:
- Input Encoding: Images processed through EfficientNet-B0 with a neural view transformer (MLP) for perspective-to-BEV projection; point clouds encoded via PointPillars
- BEV Decoder: Produces three simultaneous outputs via MLP heads:
- Semantic segmentation (drivable area, lane markings, crossings)
- Instance embedding (discriminative loss for clustering)
- Direction estimation (lane direction bins)
- Vectorization: Instance polylines constructed by clustering embedding maps, followed by greedy polyline tracing based on predicted direction bins
HDMapNet's limitation is its reliance on heuristic post-processing for vectorization, which restricts scalability. This motivated the development of end-to-end approaches like MapTR.
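The heuristic post-processing step can be sketched as follows: pixels with similar instance embeddings are greedily grouped into clusters, each of which becomes one map-element instance before polyline tracing. This is a deliberately simplified 1D-embedding sketch, not HDMapNet's actual implementation (which uses higher-dimensional embeddings trained with a discriminative loss):

```python
def cluster_embeddings(pixels, embeddings, margin=0.5):
    """Greedy clustering of per-pixel instance embeddings, in the spirit
    of HDMapNet's vectorization post-processing (simplified sketch).

    pixels: list of (row, col); embeddings: parallel list of floats.
    A pixel joins the first cluster whose running mean embedding is
    within `margin`; otherwise it seeds a new cluster.
    Returns a list of clusters, each a list of (row, col) pixels.
    """
    clusters = []  # list of (mean_embedding, [member pixels])
    for px, e in zip(pixels, embeddings):
        for i, (mean, members) in enumerate(clusters):
            if abs(e - mean) < margin:
                members.append(px)
                # incremental mean update for the cluster embedding
                clusters[i] = (mean + (e - mean) / len(members), members)
                break
        else:
            clusters.append((e, [px]))
    return [members for _, members in clusters]
```

The fragility of exactly this kind of threshold-and-trace logic (a bad `margin` merges two lane dividers into one) is what motivated end-to-end vectorized decoders.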
VectorMapNet
VectorMapNet (ICML 2023) introduced end-to-end vectorized HD map learning with a coarse-to-fine, two-stage architecture:
- BEV feature extraction from multi-modal sensor data
- DETR-like map element detector for coarse element proposals
- Polyline generator for fine-grained vectorized output
Map elements are represented as sparse polyline sets that are directly compatible with downstream tasks (motion forecasting, planning), avoiding the segmentation-then-vectorize pipeline of HDMapNet.
Neural Map Prior (NMP)
Neural Map Prior (CVPR 2023) introduces a complementary concept: a learned neural representation of global maps that aids online map prediction. Rather than constructing maps from scratch each frame, NMP maintains incrementally updated global map tiles:
- Architecture: Global NMP stored as sparse map tiles, each corresponding to a real-world location
- Fusion mechanism: Current BEV features are refined using corresponding NMP prior features through:
- C2P (Current-to-Prior) attention: Cross-attention that selectively weights current vs. prior features
- GRU (Gated Recurrent Unit): Dynamically updates global neural map with enhanced BEV features
- Adaptive weighting: When current frame quality is high, the network assigns more weight to current features; when quality is poor (rain, night), it relies more heavily on the prior
- Key result: NMP is "particularly useful on rainy days and at night," and substantially improves results as perception range increases
NMP is architecture-agnostic and can be applied to various map segmentation and detection methods, including HDMapNet and VectorMapNet.
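The GRU-based tile update can be sketched in a few lines: the update gate plays the adaptive-weighting role, blending the stored prior with the current BEV feature. This is a minimal single-vector sketch with made-up weight shapes; the real NMP applies C2P cross-attention before the GRU and operates on full feature maps:

```python
import numpy as np

def gru_tile_update(prior, current, Wz, Wr, Wh):
    """GRU-style refresh of one neural-map-prior tile with the current
    BEV feature (simplified sketch of NMP's fusion step).

    prior, current: (D,) feature vectors; Wz, Wr, Wh: (D, 2D) weights.
    The update gate z realizes adaptive weighting: z near 1 trusts the
    current observation, z near 0 keeps the stored prior (e.g., at night).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    x = np.concatenate([prior, current])
    z = sigmoid(Wz @ x)                                   # update gate
    r = sigmoid(Wr @ x)                                   # reset gate
    h = np.tanh(Wh @ np.concatenate([r * prior, current]))
    return (1 - z) * prior + z * h                        # refreshed tile
```

The refreshed tile is written back to the sparse global map, so the next vehicle visiting the same location starts from a better prior.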
StreamMapNet
StreamMapNet (WACV 2024) addresses the critical limitation of single-frame approaches: temporal instability. It introduces a streaming temporal fusion mechanism with two strategies:
- Query propagation: Retains high-confidence element queries from frame to frame
- BEV fusion: Aligns and fuses BEV features from consecutive frames
This enables long-range perception (up to 100x50 meters vs. 60x30 meters for single-frame methods) with temporal consistency. StreamMapNet surpasses other methods by at least 10.2 mAP on the original nuScenes split.
Follow-up work SQD-MapNet adds stream query denoising for further temporal consistency improvements.
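Query propagation can be sketched as a filter-and-reanchor step: high-confidence queries survive to the next frame, with their reference points shifted into the new ego frame. The sketch below is a 2D translation-only toy (rotation and the learned query features are omitted; field names are hypothetical):

```python
def propagate_queries(queries, scores, ego_motion, keep=0.7):
    """Streaming query propagation in the spirit of StreamMapNet: retain
    high-confidence map-element queries across frames and re-anchor their
    reference points under the ego motion (simplified 2D sketch).

    queries: list of dicts with a 'ref' (x, y) reference point;
    scores: per-query confidence in [0, 1];
    ego_motion: (dx, dy) ego translation since the last frame.
    """
    kept = []
    for q, s in zip(queries, scores):
        if s >= keep:
            x, y = q['ref']
            # express the element's position in the new ego frame
            kept.append({**q, 'ref': (x - ego_motion[0], y - ego_motion[1])})
    return kept
```

Low-confidence queries are dropped and replaced by fresh learnable queries, so the decoder's budget shifts toward newly visible map elements.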
MapExpert
MapExpert (AAAI 2025) addresses the observation that different map element types (lane boundaries, pedestrian crossings, road edges) have distinct geometric characteristics. It employs sparse experts distributed by routers to describe various non-cubic map elements accurately, rather than treating all elements with a unified decoder.
Other Notable Methods
- MapNeXt: Revisits training and scaling practices, showing significant gains from proper optimization
- DTCLMapper: Dual Temporal Consistent Learning for vectorized construction
- HeightMapNet (WACV 2025): Explicit height modeling for end-to-end HD map learning
- MambaMap: Applies State Space Models (Mamba) to online map construction
- P-MapNet: Encodes SD map priors as conditional branches with masked autoencoders
1.4 MapEX and Prior Map Integration
MapEX categorizes existing maps into three types and refines query-based estimation models' matching algorithms for handling map priors. However, its approach of simulating outdated maps by introducing artificial offsets and erasing elements risks leaking ground truth data and fails to accurately represent real-world map staleness.
More promising approaches for prior integration include:
- PriorDrive: Unified vector prior encoding combining multiple prior map types
- Compressed Map Priors: 3D perception enhancement through compressed map representations
- Performance gains from SD map integration: HDMapNet +3.0% mAP, VectorMapNet +3.9%, StreamMapNet +5.9%, MapTRv2 +5.7%
1.5 How Map-Free Relates to World Models: The Model IS the Map
The convergence of map-free driving and world models represents a fundamental paradigm shift. World models are generative spatio-temporal neural systems that compress multi-sensor physical observations into a compact latent state and roll it forward under hypothetical actions, letting the vehicle rehearse futures before they occur.
Key World Model Architectures
OccWorld (ECCV 2024) learns a 3D occupancy world model that simultaneously predicts ego movement and surrounding scene evolution:
- Uses a VQVAE scene tokenizer on 3D occupancy to obtain discrete scene tokens
- Employs a GPT-like spatial-temporal generative transformer for future prediction
- Can forecast future map elements (drivable areas) and agent movements without instance or map supervision
- Demonstrates that the spatial structure traditionally encoded in maps can emerge from learned occupancy representations
GAIA Series (Wayve):
- GAIA-1 (2023): 9-billion parameter generative world model treating video, text, and control as one token stream
- GAIA-2 (2025): Controllable multi-view generation with lighting-consistent rendering
- GAIA-3 (December 2025): 15 billion parameters (2x GAIA-2), trained on 10x more data spanning 9 countries across 3 continents
- These models learn road rules and actor intent through unsupervised prediction rather than map annotation
BEV-Based Implicit Maps: Methods like FIERY and MILE compress entire scenes into compact latent codes encoding static structure and moving actors simultaneously, creating a unified top-down representation without explicit cartographic annotation.
Occupancy-Based Encoding: 3D voxel grids where each cell is assigned an occupancy probability yield "far richer geometric detail than 2D projections," enabling fine-grained reasoning about occlusions and spatial relationships.
Neural Volumetric Representations: Methods like UniSim learn photoreal feature grids from a single drive, while implicit-field methods (e.g., UnO) leverage future point clouds to learn NeRF-style 4D occupancy fields. These turn the map into a differentiable oracle for visibility or risk gradients.
The Conceptual Shift
The implicit map thesis holds that a sufficiently powerful world model renders explicit maps redundant:
| Traditional Pipeline | World Model Pipeline |
|---|---|
| Survey -> Annotate -> Store -> Retrieve -> Plan | Observe -> Encode -> Predict -> Plan |
| Map is an external artifact | Map is an emergent property of the model |
| Updates require re-surveying | Updates occur through continued observation |
| Discrete, tile-based | Continuous, differentiable |
| Static between updates | Dynamic, frame-by-frame |
Tesla's FSD v12+ exemplifies this at production scale: the system relies primarily on visual perception using cameras, with neural networks making path decisions based on fleet-learned driving patterns rather than pre-built maps. By January 2025, Tesla customers had driven 3 billion miles on FSD (Supervised), and the company launched its Robotaxi service in Austin, Texas in June 2025.
1.6 The CVPR 2024 Mapless Driving Challenge
The CVPR 2024 Autonomous Grand Challenge included a "Mapless Driving" track, explicitly requiring autonomous driving without HD maps. The winning approach, MapVision, demonstrated:
- Multi-perspective camera images combined with Standard-Definition (SD) maps from OpenStreetMap
- SD map integration into BEV feature maps via map encoder pre-training
- Enhanced traffic element detection via YOLOX
- Auxiliary tasks borrowed from MapTRv2 for comprehensive scene understanding
Key insight: purely map-free approaches struggled at road far-ends and under occlusion, suggesting that lightweight SD map priors (available globally from sources like OpenStreetMap) provide a valuable complement to sensor-only perception.
2. Online Mapping and Scene Understanding
2.1 Real-Time Vectorized Map Construction from Sensors
The modern online mapping pipeline follows a consistent architecture:
Multi-Camera Images -> 2D Feature Extraction -> View Transform -> BEV Features -> Map Element Decoder -> Vectorized Output
LiDAR Points ------> Point Cloud Encoding ------------^
View Transformation is the critical step converting perspective image features to BEV space. Approaches include:
- IPM (Inverse Perspective Mapping): Geometric projection assuming flat ground plane
- Lift-Splat-Shoot (LSS): Predicts depth distribution per pixel and "lifts" 2D features to 3D, then "splats" onto BEV grid
- BEVFormer: Uses deformable attention with learnable BEV queries to sample spatial features from multi-camera images
- Neural view transformer: MLP-based direct spatial transformation (as in HDMapNet)
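The lift-then-splat idea can be sketched directly: each pixel spreads its feature along a discrete depth distribution into 3D, and the resulting points are accumulated onto a BEV grid. This is a brute-force toy (camera extrinsics omitted, naive loops instead of LSS's cumulative-sum pooling, and a given rather than learned depth head):

```python
import numpy as np

def lift_splat(features, depth_probs, depth_bins, intrinsics_inv,
               bev_shape, cell=1.0):
    """Minimal Lift-Splat sketch: lift pixel features along a discrete
    depth distribution into camera-frame 3D, then splat onto a BEV grid
    by depth-weighted summation.

    features: (H, W, C); depth_probs: (H, W, D) softmax over depth_bins;
    depth_bins: (D,) metric depths; intrinsics_inv: (3, 3) inverse K.
    Camera convention: x lateral, z forward (BEV rows = x, cols = z).
    """
    H, W, C = features.shape
    bev = np.zeros((*bev_shape, C))
    for v in range(H):
        for u in range(W):
            ray = intrinsics_inv @ np.array([u, v, 1.0])  # pixel back-projection
            for d in range(len(depth_bins)):
                p = ray * depth_bins[d]                   # 3D point on the ray
                ix = int(p[0] / cell) + bev_shape[0] // 2 # lateral cell (centered)
                iz = int(p[2] / cell)                     # forward cell
                if 0 <= ix < bev_shape[0] and 0 <= iz < bev_shape[1]:
                    bev[ix, iz] += depth_probs[v, u, d] * features[v, u]
    return bev
```

Because the splat is a weighted sum, the whole transform is differentiable in both the features and the depth probabilities, which is what lets the depth head be trained end-to-end.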
Decoding strategies have evolved from two-stage (detect then refine) to single-stage parallel decoding:
- Two-stage: VectorMapNet's DETR-like detection followed by polyline generation
- Single-stage: MapTR's permutation-equivalent queries enabling direct parallel polyline prediction
- Streaming: StreamMapNet's temporal propagation for consistent reconstruction across frames
2.2 Lane Detection and Road Boundary Detection
Lane detection in the BEV paradigm has advanced significantly:
- Monocular lane detection remains active; a November 2024 deep-learning survey covers 100+ methods
- BEV-based methods focus on the view transformation challenge from front-view to BEV features
- LLFormer4D (2025): LiDAR-based lane detection using temporal feature fusion and sparse transformers
- TopoSD (2024): Topology-enhanced lane segment perception with SD map priors
- HeightMapNet (WACV 2025): Explicit height modeling addressing the flat-ground assumption that causes errors on slopes and overpasses
Real-time performance has reached practical levels: SparseBEV achieves 67.5 NDS on nuScenes at 23.5 FPS on a single GPU.
2.3 Semantic Scene Understanding
Online map construction methods now handle multiple semantic categories simultaneously:
- Drivable area segmentation: Binary classification of navigable surfaces
- Lane boundary detection: Vectorized polylines for lane dividers
- Pedestrian crossing detection: Both vectorized and instance-level
- Road edge/curb detection: Boundary between road and non-road surfaces
- Traffic element detection: Signs, signals, and their lane associations
- Centerline extraction: For routing and topology reasoning
2.4 Topological Mapping
Understanding not just where map elements are but how they connect is critical for navigation:
T2SG (Traffic Topology Scene Graph) (CVPR 2025) defines a unified scene graph that explicitly models:
- Lanes controlled and guided by different road signals (e.g., right turn)
- Topology relationships among lanes
- TopoFormer: A one-stage transformer with Lane Aggregation Layer (leveraging geometric distance among centerlines) and Counterfactual Intervention Layer (modeling reasonable road structures like intersections)
- Achieves 46.3 OLS on OpenLane-V2 benchmark
Layered Topology Mapping: Uses intersections as reference landmark nodes, with each node organized across metric, semantic, and topology layers, providing a hierarchical representation suitable for long-range navigation.
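At the topology layer, such a map reduces to a graph over junction nodes, and routing becomes graph search. The sketch below uses breadth-first search over a hypothetical taxiway-junction graph (the node names and connectivity are invented for illustration, not taken from any real aerodrome):

```python
from collections import deque

def shortest_route(graph, start, goal):
    """Breadth-first search over a topology layer whose nodes are
    intersections and whose edges are connecting segments.
    graph: dict node -> list of neighbor nodes.
    Returns the node sequence of a shortest hop-count route, or None.
    """
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

# Hypothetical junction graph: taxiway A junctions A1-A3, taxiway B B1-B2
taxi_topology = {
    'A1': ['A2'], 'A2': ['A1', 'A3', 'B1'], 'A3': ['A2'],
    'B1': ['A2', 'B2'], 'B2': ['B1'],
}
```

The metric and semantic layers then attach geometry and marking/signage attributes to each node and edge, which is what makes the hierarchy usable for both long-range routing and local control.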
2.5 Airport Surface Applicability
Online mapping approaches are directly relevant to airport surfaces, though adaptation is needed:
| Road Domain | Airport Equivalent | Challenge Level |
|---|---|---|
| Lane markings (white/yellow lines) | Taxiway centerlines (yellow), runway markings (white) | Moderate -- color-coded but different grammar |
| Road boundaries / curbs | Taxiway edge markings, safety lines | High -- less distinct physical boundaries |
| Pedestrian crossings | Vehicle crossing points, personnel corridors | High -- not standardized like zebra crossings |
| Traffic signals | Taxiway guidance signs, stop bars | High -- aviation-specific signage system |
| Drivable area | Movement area vs. non-movement area | Critical -- misclassification = runway incursion |
| Intersection topology | Taxiway junction topology | Critical -- wrong turn = safety incident |
The key technical gap: existing models are trained exclusively on road driving datasets (nuScenes, Argoverse). Airport surface environments have different marking conventions, wider operational areas, and unique obstacles (aircraft, jet bridges, GSE equipment). Transfer learning and domain-specific datasets would be required.
3. Localization Approaches
3.1 Visual Localization
Visual localization determines vehicle pose from camera images, either relative to a previously built map or through visual odometry:
Visual Odometry (VO) estimates ego-motion from frame-to-frame visual changes:
- Feature-based methods (ORB-SLAM3) extract and track sparse keypoints
- Direct methods (DSO, LDSO) minimize photometric error across entire image regions
- Hybrid approaches combine both for robustness
Map-based visual localization matches current observations against a pre-built visual map:
- Feature matching against 3D point cloud maps (e.g., HLoc pipeline)
- Scene coordinate regression (learning to predict 3D coordinates directly)
- Image retrieval followed by local feature matching
Advantages for airport environments: Cameras are low-cost, provide rich semantic information, and work in GPS-degraded areas. Limitations: Sensitive to lighting changes, weather conditions, and dynamic scene content.
3.2 LiDAR-Based Localization and SLAM
LiDAR provides precise geometric measurements that are invariant to lighting conditions:
LiDAR Odometry approaches:
- ICP-based: Point-to-Point and Point-to-Plane Iterative Closest Point algorithms
- Feature-based: LOAM (LiDAR Odometry and Mapping) and its variants extract edge and planar features
- Degeneracy-aware: GenZ-ICP (2024) uses adaptive weighting for generalizable and degeneracy-robust odometry
GPS-Denied LiDAR SLAM (comprehensive 2025 survey):
Key challenges identified:
- Degeneracy: In geometrically feature-less environments (long corridors, open fields, wide aprons), scan matching lacks sufficient constraints. Solid-state LiDARs are particularly susceptible due to limited FOV and fewer feature points per scan
- Drift: Accumulated odometry error over long trajectories. Sliding window optimization (as in Fast-LIO) reduces cumulative errors
- Dynamic environments: Moving objects interfere with mapping accuracy. Dynamic object filtering and segmentation are active research areas
Recent advances (2024-2025):
- DALI-SLAM: Degeneracy-aware LiDAR-inertial SLAM with novel distortion correction
- GenZ-ICP: Adaptive weighting scheme that is robust to degeneracy
- Anti-degeneracy schemes using ResNet and transformer-based particle filter classification
- SLAM2REF: Long-term mapping with 3D LiDAR and reference map integration for precise 6-DoF estimation
3.3 Multi-Sensor Fusion for Robust Localization
No single sensor is sufficient for all conditions. Multi-sensor fusion combines complementary strengths:
Typical fusion stack:
GNSS/RTK -----\
IMU ----------\
LiDAR ---------> State Estimation (EKF/Factor Graph) -> Pose Estimate
Camera --------/
Wheel Odom ----/
Fusion strategies:
- Loosely coupled: Each sensor produces independent estimates that are fused at the state level
- Tightly coupled: Raw sensor measurements are jointly optimized (higher accuracy, greater complexity)
- Adaptive fusion: Sensor trust weights are dynamically adjusted based on real-time quality evaluation
A 2024 study demonstrated tightly coupled integration of vector HD map, LiDAR, GNSS, and INS for precise vehicle navigation in GNSS-challenging environments, showing that map-aided localization can maintain centimeter accuracy even when GNSS degrades.
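A loosely coupled fusion step can be sketched with a one-dimensional constant-velocity Kalman filter that absorbs a GNSS position fix. This is a didactic toy, not a production stack (real systems run full 6-DoF EKFs or factor graphs over all sensors; the process/measurement noise values here are arbitrary):

```python
import numpy as np

def ekf_step(x, P, z, dt, q=0.1, r=1.0):
    """One predict/update cycle of a loosely coupled 1D constant-velocity
    Kalman filter fusing a GNSS position measurement.

    x: state [position, velocity]; P: 2x2 covariance;
    z: measured position; q, r: process/measurement noise (illustrative).
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])                 # motion model
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])                   # process noise
    x = F @ x                                             # predict state
    P = F @ P @ F.T + Q                                   # predict covariance
    H = np.array([[1.0, 0.0]])                            # observe position only
    S = H @ P @ H.T + r                                   # innovation covariance
    K = P @ H.T / S                                       # Kalman gain
    x = x + (K * (z - H @ x)).ravel()                     # correct state
    P = (np.eye(2) - K @ H) @ P                           # correct covariance
    return x, P
```

Tightly coupled designs replace the position fix `z` with raw pseudoranges or LiDAR residuals inside the same predict/correct loop, trading complexity for accuracy; adaptive fusion schemes modulate `r` per sensor from real-time quality estimates.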
3.4 Localization in GPS-Degraded Environments
This challenge is directly relevant to airport airside operations, where GPS signals are degraded by:
- Multipath reflections from terminal buildings, hangars, and aircraft fuselages
- Signal blockage under aircraft wings and near tall structures
- Electromagnetic interference from radar and communication equipment
GNSS multipath effects in urban canyons (and analogously, airport aprons) cause position errors of several meters. Traditional RTK accuracy of 1-2 cm degrades to meter-level estimates.
Mitigation approaches:
- LiDAR-map matching: Pre-built 3D point cloud maps of the airport surface enable centimeter-level localization without GNSS. The vehicle matches its current LiDAR scan against the stored map using ICP or NDT algorithms
- Visual-inertial odometry (VIO): Combines camera and IMU for drift-limited dead reckoning between GPS fixes
- VLOAM (Visual LiDAR Odometry and Mapping): Fuses visual and LiDAR data for robust navigation in degraded conditions
- LiDAR-OSM matching: Using OpenStreetMap or equivalent databases to constrain particle filter localization when GPS is denied
- Infrastructure-aided positioning: UWB beacons, magnetic markers, or reflective targets placed at known positions on the airport surface
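The LiDAR-map-matching option above reduces, at its core, to iteratively aligning the current scan against the stored point cloud. A minimal point-to-point 2D ICP sketch follows (brute-force nearest neighbours and a Kabsch/SVD alignment; production systems use k-d trees, point-to-plane costs, or NDT as noted):

```python
import numpy as np

def icp_2d(scan, ref, iters=20):
    """Minimal point-to-point 2D ICP aligning a LiDAR scan to a stored
    map. scan, ref: (N, 2) arrays. Returns (R, t) with scan @ R.T + t ~= ref.
    """
    R, t = np.eye(2), np.zeros(2)
    cur = scan.copy()
    for _ in range(iters):
        # brute-force nearest reference point for each scan point
        d = np.linalg.norm(cur[:, None] - ref[None, :], axis=2)
        matched = ref[d.argmin(axis=1)]
        # closed-form rigid alignment of the matched pairs (Kabsch/SVD)
        mc, mm = cur.mean(0), matched.mean(0)
        U, _, Vt = np.linalg.svd((cur - mc).T @ (matched - mm))
        Ri = Vt.T @ U.T
        if np.linalg.det(Ri) < 0:        # guard against a reflection
            Vt[-1] *= -1
            Ri = Vt.T @ U.T
        ti = mm - Ri @ mc
        cur = cur @ Ri.T + ti
        R, t = Ri @ R, Ri @ t + ti       # accumulate the transform
    return R, t
```

The degeneracy problem discussed above is visible here: on a featureless open apron the matched pairs under-constrain the SVD, and the estimated transform drifts along the unconstrained direction, which is exactly what adaptive-weighting schemes like GenZ-ICP address.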
3.5 Centimeter-Level Positioning Requirements
Autonomous airport operations demand centimeter-level accuracy for:
- Precise stand positioning (aircraft door alignment with jet bridge)
- Baggage cart navigation between conveyor and aircraft
- Pushback operations near aircraft and other GSE
- Taxiway centerline following
RTK-GNSS provides the baseline centimeter accuracy when signals are available:
- Carrier-phase based differential GNSS technique
- Requires base station network transmitting real-time corrections
- PPP-RTK: Hybrid approach leveraging both RTK and Precise Point Positioning
When RTK fails (multipath, blockage), multi-sensor fusion maintains accuracy:
- Adaptive multi-sensor frameworks: Dynamically fuse LiDAR, IMU, and RTK-GNSS data based on real-time sensor quality evaluation
- Vision-RTK systems: Feed all available sensor data into fusion engines, combining GNSS with relative positioning
- Map-aided localization: Pre-built HD maps of the airport surface provide geometric constraints that bound drift
AeroVect case study: AeroVect's autonomous GSE system integrates Point One Navigation's RTK corrections from a network of thousands of ground-based reference stations. Their AeroVect Explorer mapping vehicle can create a digital twin of a major airport in less than 2 hours, providing the reference map for subsequent autonomous operations. Centimeter-level accuracy enables safe operation in confined apron spaces.
4. Spatial Memory and Place Recognition
4.1 Visual Place Recognition for Driving
Visual Place Recognition (VPR) answers the question "where am I?" by matching current observations against a database of previously visited locations. This is critical for loop closure in SLAM and for re-localization after GPS outages.
Key methods and evolution:
- NetVLAD (2016): Aggregates CNN features with a differentiable VLAD layer. Established the deep learning baseline for VPR
- MixVPR (WACV 2023): Feature mixing for VPR, achieving 58.4% recall@1 on the challenging Nordland benchmark -- a 69% improvement over CosPlace and 79% over NetVLAD
- DINO-Mix (2024): Combines DINOv2 foundation model features with feature mixing, succeeding where other methods fail under viewpoint changes, illumination changes, and seasonal variations
- MS-MixVPR (2024): Multi-scale feature extraction from different CNN layers, creating compact holistic representations robust to environmental changes
LiDAR-based place recognition (2024 survey) has become equally important:
- 3D point cloud descriptors (PointNetVLAD, MinkLoc3D)
- Scan context representations for efficient retrieval
- Advantages: invariant to illumination, robust in adverse weather
- Provides long measurement distance and rich 3D information
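Regardless of modality, the retrieval step and its standard recall@1 metric look the same: each query descriptor matches the database entry with the highest cosine similarity, and the match counts if both come from the same place. The descriptors below stand in for NetVLAD/MixVPR-style global embeddings (the function is a sketch, not any library's API):

```python
import numpy as np

def recall_at_1(query_desc, db_desc, query_place, db_place):
    """Recall@1 for descriptor-retrieval place recognition.

    query_desc: (Q, D) and db_desc: (N, D) global descriptors;
    query_place, db_place: parallel place labels.
    """
    # L2-normalize so the dot product is cosine similarity
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    top1 = (q @ d.T).argmax(axis=1)          # best database match per query
    hits = sum(query_place[i] == db_place[j] for i, j in enumerate(top1))
    return hits / len(query_place)
```

Benchmarks like Nordland stress exactly this loop by pairing queries and database images from different seasons, which is why feature robustness dominates the recall@1 numbers quoted above.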
4.2 Long-Term Mapping and Change Detection
Real-world environments change over time. Autonomous systems must:
- Detect changes between current observations and stored maps
- Update maps to reflect new reality
- Maintain historical versions for temporal reasoning
SLAM2REF (2024) addresses long-term mapping by integrating 3D LiDAR SLAM with reference map data for precise 6-DoF trajectory estimation and map extension, explicitly handling the challenge of environments that evolve over construction seasons.
Crowd-sourced map updates (the Lite Map paradigm) address this at fleet scale: each vehicle uploads vectorized observations, and the aggregation system detects changes by comparing new observations against the existing map. Tesla's auto-labeling and Mobileye REM are production implementations of this approach.
4.3 Seasonal and Lighting Variation Handling
The 4Seasons benchmark (IJCV 2024) provides the definitive evaluation for localization under appearance variation:
- 300+ km of recordings across 9 environments (parking garages, urban tunnels, countryside, highway)
- >1 year of data collection capturing snow, rain, sun, and night conditions
- Jointly evaluates visual odometry, global place recognition, and map-based localization
- Centimeter-level ground truth from fused stereo-inertial odometry with RTK GNSS
Key findings: Methods that combine geometric and appearance features (multi-scale, multi-modal) significantly outperform those relying on appearance alone. Foundation model features (DINOv2) show remarkable robustness to appearance changes due to pre-training on diverse visual data.
Airport relevance: Airport surfaces experience significant appearance variation -- day/night operations, rain pooling on aprons, snow/ice coverage, jet blast heat shimmer, reflective surfaces from wet pavement. Systems must maintain localization across all these conditions.
4.4 AiRLoc and Aviation-Specific Localization
AiRLoc (ICLR 2023 Workshop) applies reinforcement learning to aerial view localization, specifically targeting search-and-rescue scenarios. While not directly designed for airport surface operations, it demonstrates:
- Aerial patch-based goal specification for localization
- RL-based active search strategy for visual matching
- Follow-up work GOMAA-Geo generalizes to ground-level imagery and natural language goal specifications
Other aerial localization methods:
- CrossLoc (CVPR 2022): Scalable aerial localization assisted by multimodal synthetic data
- LoD-Loc v2 (ICCV 2025): Aerial visual localization over low level-of-detail city models using explicit silhouette alignment
- FoundLoc: Vision-based onboard aerial localization using foundation models (AnyLoc)
- UAV-VisLoc: Large-scale dataset for UAV visual localization
Gap analysis: No published research specifically targets airport surface vehicle localization as a distinct domain. The closest work comes from autonomous GSE companies (AeroVect, reference airside AV stack) who develop proprietary solutions combining RTK-GNSS, LiDAR, and cameras, but publish limited technical details.
5. Airport-Specific Mapping Challenges
5.1 Frequently Changing Airport Layouts
Airports are among the most dynamic operational environments:
- Construction projects: Terminal expansions, runway extensions, taxiway reconfigurations occur continuously at major airports
- Seasonal changes: Deicing pad activation/deactivation, seasonal stand configurations
- Temporary configurations: Special event layouts, temporary barriers, construction zones
- Time-of-day changes: Gate assignments, remote stand usage varies with schedule
Impact on autonomous systems: Any autonomous vehicle operating airside must handle map staleness as a first-class concern. A baggage tractor that learned the layout last week may encounter a newly closed taxiway segment today.
Map-free approaches are especially valuable here: The argument for online map construction and world-model-based spatial understanding is stronger in airports than on public roads, because:
- Airport layouts change more frequently than road networks
- The operational area is geographically bounded (enabling manageable prior mapping)
- Lite/implicit map approaches can be updated through fleet observations during normal operations
5.2 Construction Zones and Temporary Closures
Dynamic geofencing is the primary mechanism for managing temporary restrictions:
- Keep-in geofences: Define the allowed operational area for autonomous vehicles
- Keep-out geofences: Exclude construction zones, closed taxiways, and restricted areas
- Virtual boundaries triggered by GNSS, RFID, or cellular data
- Must be updated in real-time as conditions change
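The core geofence check is a point-in-polygon test against each keep-out boundary. The ray-casting sketch below works in planar coordinates for clarity; a deployed system would use geodetic coordinates, polygon buffers, and the vehicle's footprint rather than a single point:

```python
def in_keepout(point, polygon):
    """Ray-casting point-in-polygon test for a keep-out geofence.

    point: (x, y); polygon: list of (x, y) vertices in order.
    Casts a horizontal ray from the point and counts edge crossings:
    an odd count means the point is inside the fence.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge spans the ray
            # x-coordinate where this edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Keep-in fences invert the logic: the vehicle must stop or replan whenever this test goes false for the allowed operational polygon.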
FAA guidance (Emerging Entrants Bulletin 25-02, CertAlert 24-02):
- AGVS may operate in movement areas that are closed to aircraft operations
- Airport sponsors must ensure risks are "understood, properly considered, and mitigated"
- Remote areas and landside locations are preferred for initial testing
- Coordination required with regional FAA Airport Certification and Safety Inspectors
Dynamic map architectures combine static base maps with real-time overlay layers:
Static Base Map (AMXM/AIXM geometry)
+ Real-time NOTAM layer (closures, restrictions)
+ Live sensor layer (detected obstacles, vehicles)
+ Operational layer (active stand assignments, pushback clearances)
= Current operational map
5.3 NOTAM Integration for Mapping Dynamic Restrictions
NOTAMs (Notices to Air Missions) are the aviation standard for communicating temporary changes to airport operations. Digital NOTAM integration is directly relevant to autonomous airside vehicles:
What NOTAMs cover for surface operations:
- Runway closures and threshold displacements
- Taxiway closures (full or partial)
- Runway declared distance changes
- Construction zone boundaries
- Temporary obstacle notifications
- Equipment outages (lighting, navigation aids)
Integration approaches:
- Notamify MCP: Real-time NOTAM intelligence with AI agents, enabling automated assessment of runway availability, taxiway closures, and approach availability
- Digital NOTAM services: Graphical presentation of airspace availability on "current airspace activity maps"
- EFB integration: Electronic Flight Bag applications depict NOTAM information graphically within 15-20 seconds of publication
For autonomous vehicles: NOTAMs must be parsed into machine-readable geofence updates. A closed taxiway becomes a keep-out zone; a displaced threshold changes the available movement area. The Digital NOTAM format (based on AIXM) enables direct machine processing, but integration with vehicle-level mapping remains an engineering challenge requiring:
- NOTAM parsing into geometric constraints
- Constraint projection onto the vehicle's local map representation
- Path replanning to respect new restrictions
- Confirmation/verification that the physical environment matches the NOTAM (e.g., physical barricades present at closed taxiway)
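The first step of that pipeline can be sketched as a toy parser that turns a surface-closure notice into keep-out geofences. The `TWY <name> CLSD` phrasing follows common NOTAM shorthand, but this is deliberately simplified: real Digital NOTAMs arrive as structured AIXM rather than free text, and the `taxiway_geoms` lookup table here is entirely hypothetical:

```python
import re

def parse_closure(notam_text, taxiway_geoms):
    """Toy translation of a taxiway-closure NOTAM into keep-out geofences.

    notam_text: free-text notice, e.g. 'TWY B CLSD WEF 2501010600';
    taxiway_geoms: dict mapping taxiway designators to polygon vertex
    lists (hypothetical geometry source).
    Returns {designator: polygon} for each closed, known taxiway.
    """
    closed = re.findall(r'TWY ([A-Z]\d?) CLSD', notam_text)
    return {name: taxiway_geoms[name]
            for name in closed if name in taxiway_geoms}
```

The returned polygons would feed the keep-out geofence layer described in Section 5.2, with the physical-verification step (barricades actually present) closing the loop.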
5.4 Airport Surface Markings and Signage Recognition
Airport surface markings follow strict ICAO/FAA standards that differ significantly from road markings:
Color coding:
- White: Runway markings (centerline, threshold, touchdown zone, aiming point)
- Yellow: Taxiway markings (centerline, edge, holding position)
- Red: Runway guard lights, stop bars
- Black background with yellow inscription: Surface painted location signs
Signage system:
- Mandatory instruction signs: Red background, white text (runway hold position, ILS critical area)
- Location signs: Yellow text, black background (taxiway designators)
- Direction signs: Yellow background, black text (routing guidance)
- Information signs: Yellow background, black text (supplementary info)
Computer vision for airport markings (2025 research):
- AssistNet: CNN classifier distinguishing runways from taxiways with 99.5% validation accuracy
- ALINA (Automated Line Identification and Notation Algorithm): Trapezoidal ROI establishment, color space transformation, CIRCLEDAT algorithm for pixel identification
- Synth_Airport_Taxii: Synthetic dataset generation for airport taxiway navigation training
Key challenges:
- Shadows, tire marks, and varying surface conditions degrade detection
- Taxiways and runways share road-like appearances, differing mainly in marking color
- Dynamic ROI adjustment needed as vehicle perspective changes
- Environmental variations (lighting, weather, surface contamination)
- Labeled training data is scarce, motivating synthetic data approaches
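Because the ICAO color coding above is the main cue separating runway from taxiway markings, a minimal baseline is a per-pixel color-space test. The thresholds below are illustrative assumptions, not values from AssistNet or ALINA; as the challenges list notes, shadows, tire marks, and contamination defeat fixed thresholds, which is exactly why the cited work uses learned models and dynamic ROIs.

```python
import colorsys

# Illustrative HSV thresholds, chosen for the example only: real systems must
# handle shadows, tire marks, and surface contamination with learned models.
def classify_marking_pixel(r, g, b):
    """Heuristic: bright low-saturation pixels suggest white runway markings,
    yellow-hued saturated pixels suggest taxiway markings."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v > 0.8 and s < 0.15:
        return "runway_marking"    # white (ICAO: centerline, threshold, TDZ)
    if 0.10 < h < 0.20 and s > 0.4 and v > 0.5:
        return "taxiway_marking"   # yellow (ICAO: centerline, edge, holding)
    return "surface"

print(classify_marking_pixel(240, 240, 235))  # near-white paint → runway_marking
print(classify_marking_pixel(230, 200, 40))   # yellow paint → taxiway_marking
print(classify_marking_pixel(70, 70, 75))     # asphalt → surface
```

The yellow hue window (0.10–0.20, i.e. roughly 36°–72°) brackets pure yellow at h ≈ 1/6; shifting illumination moves real paint outside any such fixed window, motivating the synthetic-data and CNN approaches above.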
5.5 Integration with AIXM and AMXM
AIXM (Aeronautical Information Exchange Model):
- Developed jointly by FAA, NGA, and EUROCONTROL
- Provides a logical data model (UML class diagrams) and XML schema
- Encodes aeronautical information for AIS (Aeronautical Information Services)
- Based on Geography Markup Language (GML)
- Current version: AIXM 5.2
- Covers: airspace, procedures, obstacles, airport mapping data
- Enables coding of dynamic aeronautical data including closures and restrictions
AMXM (Aerodrome Mapping Exchange Model):
- EUROCAE WG-44 / RTCA SC-217 specification for Aerodrome Mapping Databases (AMDB)
- Provides detailed geometric information for airport surfaces:
- Runway geometry and markings
- Taxiway centerlines and edges
- Apron boundaries
- Stand/gate positions
- Building footprints
- Obstacle locations
- Based on ISO 19100/OGC standards with GML 3.2 schema
- Compliant with ICAO Annex 14/15 SARPs
- Can be used with OGC Web Feature Service (WFS) for SWIM information services
- Bidirectional data exchange with AIXM 5.1
ARINC 816 extends AMXM with additional elements including anchor points for map label placement and tessellated polygons.
For autonomous vehicles: AIXM/AMXM provides the authoritative geometric base map for airport surface operations. This serves as the "HD map equivalent" for airports -- but unlike road HD maps, AIXM/AMXM is:
- Maintained by airport authorities as a regulatory requirement
- Updated through formal change management processes
- Available in standardized, machine-readable formats
- Already includes detailed geometry for taxiways, aprons, and stands
The integration opportunity: use AIXM/AMXM as the static base map (analogous to SD/lite maps in road driving), augmented by online perception for dynamic elements and fine-grained local features not captured in the database.
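Since AMXM data is GML-encoded, extracting geometry for a vehicle-side base map is ordinary XML processing. The fragment below is written in the spirit of an AMDB taxiway feature, but the element layout is simplified for illustration and is not the exact AMXM 2.x schema; only the GML namespace and the general posList convention are standard.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment in the spirit of an AMXM/GML taxiway feature; the
# element layout is simplified for the example, not the exact AMXM schema.
AMDB_SNIPPET = """
<TaxiwayElement xmlns:gml="http://www.opengis.net/gml/3.2">
  <idlin>A</idlin>
  <gml:LineString srsName="urn:ogc:def:crs:EPSG::4326">
    <gml:posList>51.4700 -0.4543 51.4705 -0.4539 51.4710 -0.4534</gml:posList>
  </gml:LineString>
</TaxiwayElement>
"""

GML = "{http://www.opengis.net/gml/3.2}"

def parse_centerline(xml_text):
    """Extract a taxiway designator and its centerline as (lat, lon) pairs."""
    root = ET.fromstring(xml_text)
    designator = root.findtext("idlin")
    pos_list = root.find(f"{GML}LineString/{GML}posList").text
    coords = [float(t) for t in pos_list.split()]
    return designator, list(zip(coords[0::2], coords[1::2]))

name, centerline = parse_centerline(AMDB_SNIPPET)
print(name, centerline[0])  # → A (51.47, -0.4543)
```

In a full pipeline these WGS-84 centerlines would be projected into the vehicle's local metric frame and rasterized or vectorized into the Layer 1 base map that online perception then augments.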
5.6 Indoor-Outdoor Transitions
Airport operations span multiple environments with different characteristics:
| Environment | GPS | LiDAR | Vision | Challenges |
|---|---|---|---|---|
| Open apron | Good (with multipath) | Excellent | Good | Aircraft occlusion, jet blast |
| Under aircraft | Blocked | Partially occluded | Limited | Confined space, dynamic |
| Near terminal | Degraded (multipath) | Good | Variable (shadows) | Pedestrians, GSE congestion |
| Inside baggage hall | None | Good | Controlled lighting | Indoor SLAM required |
| Tunnel/underpass | None | Limited range | Low light | Degeneracy risk |
Transition handling requires:
- Seamless switching between GPS-aided and GPS-denied localization modes
- Map continuity across indoor/outdoor boundaries
- Consistent coordinate frame maintenance
- Degradation detection and fallback strategies
6. Synthesis: Implications for Airport Airside Autonomy
6.1 Recommended Technical Architecture
Based on this analysis, the optimal mapping and localization architecture for airport airside autonomous vehicles combines elements from multiple approaches:
Layer 1: STATIC BASE MAP
Source: AIXM/AMXM airport mapping database
Content: Taxiway geometry, stand positions, building footprints
Update cycle: Formal change management (weeks/months)
Layer 2: OPERATIONAL MAP
Source: NOTAM integration + ATC clearances
Content: Closed areas, active runway assignments, temporary restrictions
Update cycle: Real-time (seconds/minutes)
Layer 3: ONLINE PERCEPTION MAP
Source: Vehicle sensors (cameras, LiDAR)
Content: Detected markings, obstacles, other vehicles, aircraft positions
Architecture: MapTR-style online vectorized construction
Update cycle: Frame-by-frame (10-30 Hz)
Layer 4: NEURAL MAP PRIOR
Source: Fleet-aggregated observations
Content: Learned spatial priors for adverse conditions
Architecture: NMP-style global neural tiles
Update cycle: Continuous fleet learning (hours/days)
LOCALIZATION STACK:
Primary: RTK-GNSS (centimeter accuracy in open areas)
Secondary: LiDAR-map matching against Layer 1 geometry
Tertiary: Visual-inertial odometry for GPS gaps
Fusion: Adaptive EKF/factor graph with quality-based weighting

6.2 Key Research Gaps for Airport Application
Airport surface training data: No public dataset exists for airport surface online mapping (equivalent to nuScenes/Argoverse for roads). Synthetic data generation (as in Synth_Airport_Taxii) is a promising but insufficient substitute.
Aviation marking recognition models: Current road-trained models do not understand taxiway centerlines, holding position markings, or ILS critical area boundaries. Domain-specific fine-tuning or training is required.
AIXM/AMXM to perception pipeline: No published work bridges standardized aviation mapping formats with modern online map construction frameworks. Building this bridge would give airports an advantage road environments lack: an authoritative, regulatorily maintained base map to anchor online perception.
Multi-vehicle coordination mapping: Airport operations involve coordinated movement of multiple autonomous vehicles (baggage tractors, pushback tugs, cargo loaders). Shared/cooperative mapping and localization across a fleet is essential but underexplored.
Regulatory framework: The FAA acknowledges AGVS testing but has not issued comprehensive technical standards for navigation and positioning systems. ICAO-level standardization would be preferable but has not yet been developed.
6.3 Why World Models Matter for Airports
The world model paradigm is particularly compelling for airport airside operations:
- Bounded operational domain: Unlike road driving with infinite geographic scope, airports are finite, surveyable spaces. A world model can be comprehensively trained on a specific airport.
- Repetitive operations: Airport GSE performs the same routes repeatedly (baggage hall to stand, cargo area to aircraft). This repetition is ideal for world model learning.
- High change frequency: The argument against HD maps is strongest in environments that change frequently. Airports change more often than roads: closures, construction, and stand configurations shift on timescales of hours to days.
- Safety-critical with fallback options: Airport operations can be paused (vehicles stopped) in ways that highway driving cannot, providing a safer context for world model deployment with human oversight.
- Rich prior structure: AIXM/AMXM provides geometric priors that can bootstrap world model training, combining the structure of explicit maps with the adaptability of neural representations.
Sources
Map-Free / Map-Lite Autonomous Driving
- MapTR/MapTRv2 GitHub Repository
- MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction (IJCV 2024)
- MapTRv2 ArXiv Paper
- HDMapNet Project Page
- HDMapNet ArXiv Paper
- VectorMapNet Project Page
- VectorMapNet ICML Paper
- Neural Map Prior Project Page
- Neural Map Prior CVPR 2023 Paper
- StreamMapNet WACV 2024
- MapExpert AAAI 2025
- Maps for Autonomous Driving: Full-process Survey and Frontiers
- Online HD Map Construction Survey
- Awesome HD Map Construction Repository
- MapVision: CVPR 2024 Mapless Driving Challenge
World Models
- A Survey of World Models for Autonomous Driving
- OccWorld: Learning a 3D Occupancy World Model (ECCV 2024)
- GAIA-1 Technical Report
- GAIA-2 Technical Report
- GAIA-3 Announcement
- Awesome World Model Repository
Localization and SLAM
- GPS-Denied LiDAR-Based SLAM Survey (2025)
- LiDAR-Based Place Recognition Survey
- Vision-based Localization in GPS-Denied Environments
- RTK and Sensor Fusion for Autonomous Vehicle Navigation
- Tightly Coupled HD Map-LiDAR-GNSS-INS Integration
- 4Seasons Benchmark (IJCV 2024)
- Odyssey: Automotive LiDAR-Inertial Odometry Dataset for GNSS-denied
Place Recognition
- MixVPR: Feature Mixing for Visual Place Recognition
- DINO-Mix: Enhancing VPR with Foundation Models
- MS-MixVPR: Multi-scale Feature Mixing
- 3D Point Cloud-based Place Recognition Survey
Airport-Specific
- FAA: Autonomous Ground Vehicle Systems on Airports
- Autonomous GSE and the Future of Airside Operations
- AeroVect Autonomous GSE with Point One RTK
- Runway vs. Taxiway: Challenges in Automated Line Identification
- Synth_Airport_Taxii Synthetic Dataset
- Airport Markings Recognition for Automatic Taxiing
- AIXM (EUROCONTROL)
- AMXM FAQ
- AIXM Official Site
- FAA AIXM
- Digital NOTAM Use Cases
- Notamify MCP Real-Time NOTAM Intelligence
- Smart Airport 2026 Technologies
- Dynamic Maps for Automated Driving and UAV Geofencing
- Computer Vision in Aviation