World Models & AI for Autonomous Vehicles: Airside Reference ODD Master Synthesis

Date: 2026-03-21

Scope: Comprehensive research survey across 20 deep-dive reports covering world models, VLAs, simulation, end-to-end driving, and their applicability to autonomous vehicles. Airport airside operations are the detailed reference ODD in this synthesis, not the default lens for the whole repository.


Table of Contents

  1. Executive Summary
  2. The Landscape: What Exists Today
  3. Top Candidate Approaches for Airside AV
  4. NVIDIA Alpamayo: Deep Dive
  5. Recommended Architecture
  6. The Airside Data Problem & Solutions
  7. Simulation Strategy
  8. Safety & Certification Path
  9. Hardware & Deployment
  10. Research Report Index
  11. Key Papers & Repos
  12. Strategic Recommendations

1. Executive Summary

Read this page as an airside reference case for a generic AV stack. The same stack questions should be re-evaluated for road AVs, warehouses, logistics yards, ports, mines, construction sites, farms, delivery robots, and campuses before deployment claims are transferred.

The autonomous driving field has undergone a paradigm shift from modular perception-planning-control pipelines toward learned world models and end-to-end architectures. Three converging trends make this the right time to adopt these approaches for airside AV:

  1. World models are production-ready. Comma.ai's openpilot v0.11 ships a 2B-parameter diffusion transformer world model — the first robotics agent trained fully in learned simulation. Wayve's GAIA series (up to 15B params) navigates 500+ cities with a single model.

  2. NVIDIA Alpamayo provides a research foundation. Released in January 2026, this 10.5B-parameter VLA (8.2B Cosmos-Reason backbone + 2.3B action expert) ships with Chain-of-Causation reasoning, 1,727 hours of driving data, and the AlpaSim closed-loop simulator. Important: the model weights are non-commercial (research/eval only) and the model is camera-only. It is designed as a teacher model for distillation to smaller edge-deployable models.

  3. Airside is an ideal ODD for world models. Speeds are low (5-30 km/h) and the environment is structured. The absence of large-scale airside datasets, a weakness for traditional approaches, turns into a relative strength for world models, which can leverage pre-training on road driving and adapt with minimal domain-specific data.

The Core Thesis

A world model pre-trained on road driving data, fine-tuned on airside data, and combined with occupancy-based planning can leapfrog traditional airside AV approaches that rely on hand-crafted rules and HD maps.

Key advantages over current airside AV solutions (TractEasy, reference airside AV stack, etc.):

  • No per-class object detection needed — occupancy prediction is class-agnostic, handling aircraft, GSE, personnel without per-type training
  • Generalization across airports — world models + VLAs can adapt to new layouts with minimal data, vs. re-mapping for each airport
  • Language-grounded reasoning — VLAs can process ground control instructions and provide explainable decisions for regulatory compliance
  • Synthetic data generation — world models generate training scenarios, addressing the zero-public-dataset problem for airside

2. The Landscape: What Exists Today

2.1 World Model Architectures (Maturity Ranking)

| Approach | Maturity | Key Systems | Domain Fit Notes |
|---|---|---|---|
| Latent world models + RL | Production | DreamerV3/V4, TD-MPC2, Think2Drive | High: proven for control |
| VLA (Vision-Language-Action) | Near-production | Alpamayo, LINGO-2, DriveVLM, pi0 | Very High: language grounding |
| Diffusion world models | Research+ | GAIA-2, Vista, DriveDreamer-2, DIAMOND | High: simulation & planning |
| Occupancy world models | Research+ | OccWorld, Drive-OccWorld, OccSora | Very High: class-agnostic |
| Autoregressive video models | Research | DrivingGPT, DrivingWorld, Copilot4D | Medium: simulation |
| JEPA (embedding prediction) | Early research | V-JEPA 2, AD-L-JEPA | High potential: efficiency |
| 3DGS/NeRF reconstruction | Research+ | Street Gaussians, SplatAD, PVG | High: airport digital twins |

2.2 Current Airside AV Players

| Company | Product | Autonomy | Deployments | Tech Stack |
|---|---|---|---|---|
| TractEasy (TLD + EasyMile) | EZTow, EZDolly | L4 | Narita, Changi, Munich, Dubai | GPS + 3D LiDAR + fusion |
| reference airside AV stack | autonomous baggage/cargo tug, autonomous cargo vehicle | L4 | Zurich, Schiphol, Heathrow | LiDAR + 360 cam + GPS + IMU |
| Charlatte Autonom | AT135 | L4 | CDG (Air France), Frankfurt | V2X + sensor fusion |
| Fernride | Teleoperation + autonomy | L4 tele | NVIDIA partnership | Progressive autonomy |
| Ohmio | Autonomous shuttles | L4 | JFK, Schiphol, Brussels | Undisclosed |

Critical gaps in current solutions:

  • All use traditional perception pipelines — no world models
  • No public airside driving datasets exist
  • FAA CertAlert 24-02 (Feb 2024) is the only formal US guidance
  • No A-SMGCS integration with autonomous GSE
  • Limited handling of adverse conditions (de-icing, jet blast, FOD)

2.3 Regulatory Landscape

| Standard | Status | Relevance |
|---|---|---|
| FAA CertAlert 24-02 | Only formal US guidance (2024) | Acknowledges AGVS, standards in development |
| ISO 3691-4:2020 | Current certification path | Driverless industrial trucks; used by TractEasy/reference airside AV stack |
| ISO/PAS 8800 | Published Dec 2024 | AI safety lifecycle for AV; bridges ISO 26262 gap |
| EASA AI Roadmap 2.0 | In progress, targeting 2028 | W-shaped development process for aviation AI |
| ISO 21448 (SOTIF) | Active | Safety of intended functionality; key for ML models |
| UL 4600 | Active | Safety case-based evaluation of autonomous products |

3. Top Candidate Approaches for Airside AV

3.1 Tier 1: Build On Now

A. NVIDIA Alpamayo

  • What: 10.5B VLA (8.2B Cosmos-Reason backbone + 2.3B action expert)
  • Why: Open-source, Chain-of-Causation reasoning generates trajectories AND explanations, 1,727h driving data across 25 countries, AlpaSim for closed-loop testing
  • Alpamayo 1.5: Adds RL post-training, flexible multi-camera support, text-guided planning
  • Airside path: Fine-tune on airside data, leverage language reasoning for ground instructions, use AlpaSim for scenario testing
  • License: Apache 2.0 + NVIDIA Open Model License; the model weights are restricted to non-commercial research and evaluation use
  • Partners: Lucid, JLR, Uber, Berkeley DeepDrive

B. Occupancy World Models (OccWorld / Drive-OccWorld)

  • What: Predict future 3D occupancy grids — class-agnostic scene prediction
  • Why: No per-class detection needed. Handles aircraft, GSE, personnel, FOD without type-specific training. Drive-OccWorld (AAAI 2025) adds action conditioning — 33% improvement over UniAD
  • Airside path: Pre-train on nuScenes/Waymo occupancy, fine-tune on airside. Jet blast zones become hazard occupancy predictions
  • Open source: OccWorld on GitHub (ECCV 2024)
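
The class-agnostic argument above can be illustrated with a minimal sketch. It assumes the world model's forecast has already been reduced to one 2D occupancy grid per future time step. The grid layout, cell size, threshold, and the function name `trajectory_is_free` are illustrative assumptions, not part of OccWorld or Drive-OccWorld.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # grid[row][col] = predicted occupancy probability


def trajectory_is_free(
    predicted: Sequence[Grid],
    waypoints: Sequence[Tuple[float, float]],
    cell_size_m: float = 0.5,
    threshold: float = 0.3,
) -> bool:
    """Return True if no waypoint lands in a cell predicted as occupied.

    predicted[t] is the forecast grid for time step t and waypoints[t] is the
    planned (x, y) position in metres at the same step. No object class is
    consulted: an aircraft wheel, a tug and a person all just raise occupancy.
    """
    for grid, (x, y) in zip(predicted, waypoints):
        row, col = int(y // cell_size_m), int(x // cell_size_m)
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            return False  # leaving the predicted area is treated as unsafe
        if grid[row][col] >= threshold:
            return False
    return True


# Two-step forecast on a 4x4 grid; something moves into row 1, col 2 at t=1.
empty = [[0.0] * 4 for _ in range(4)]
blocked = [row[:] for row in empty]
blocked[1][2] = 0.9
forecast = [empty, blocked]

print(trajectory_is_free(forecast, [(0.2, 0.2), (0.7, 0.7)]))  # stays clear
print(trajectory_is_free(forecast, [(0.2, 0.2), (1.2, 0.7)]))  # enters the blocked cell
```

A real planner would check the full vehicle footprint and a 3D grid rather than single points, but the structure of the check stays the same.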

C. NVIDIA Cosmos World Foundation Models

  • What: Open-weight world foundation models (4-14B params, trained on 9000T+ tokens)
  • Why: Pre-trained world models for fine-tuning on domain-specific data. Up to 2048x compression tokenizers
  • Airside path: Fine-tune Cosmos on airside video for simulation and data augmentation
  • License: Apache 2.0 + NVIDIA Open Model

3.2 Tier 2: Integrate Short-Term

D. 3D Gaussian Splatting for Airport Digital Twins

  • What: Reconstruct airport environments from sensor logs for photorealistic simulation
  • Key systems: Street Gaussians (135 FPS, 30min training), SplatAD (CVPR 2025, joint camera+LiDAR)
  • Airside path: Drive around airport, reconstruct as 3DGS scene, use for closed-loop testing with counterfactual scenarios (insert aircraft, simulate pushback, test FOD response)

E. Foundation Model Perception (Open-Vocabulary Detection)

  • What: SAM, Grounding DINO, YOLO-World for detecting objects never seen in training
  • Why: Airside has unique object classes (30+ types of GSE, 100+ aircraft variants). Open-vocabulary detection handles these without per-class annotation
  • Airside path: Zero-shot detection with text prompts ("baggage tractor", "jet bridge", "belt loader"), then fine-tune with LoRA on small airside dataset
  • Data estimate: 0-50 examples per class for zero-/few-shot use; 200-500 per class to fine-tune to 80%+ mAP

F. Map-Free Driving (MapTR/MapTRv2)

  • What: Online vectorized map construction from sensors — no HD map needed
  • Why: Airport layouts change frequently (construction, seasonal operations, NOTAMs). HD maps are expensive to maintain per-airport
  • Airside path: Combine with AIXM/AMXM airport data as prior, NOTAM integration for dynamic restrictions

3.3 Tier 3: Research & Future

G. JEPA-Style World Models (V-JEPA 2, AD-L-JEPA)

  • What: Predict future scene embeddings, not pixels — ~15x faster planning than video-generation baselines
  • Why: Massive efficiency gains. V-JEPA 2-AC plans in 16 seconds vs. 4 minutes for pixel-generation baselines
  • Status: Early but Meta open-sourcing aggressively. AD-L-JEPA is first driving-specific JEPA (AAAI 2026)

H. LLM-Based Planning & ATC Integration

  • What: LLMs for reasoning about traffic rules, ground control instructions, edge cases
  • Key: DriveReg achieves 100% traffic rule compliance via RAG. Delft's LLM-based ATC agent resolved 119/120 conflicts
  • Airside path: RAG over ICAO/FAA rules + airport-specific procedures. Process digital taxi instructions (NASA NLU work exists)
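
The retrieval step of such a RAG setup might look like the following sketch. The rule texts are placeholders written for illustration, not quotations from ICAO or FAA documents, and the token-overlap scoring stands in for the embedding search a real system would more likely use. All names here are hypothetical.

```python
import re
from typing import List, Set, Tuple

Rule = Tuple[str, str]  # (rule id, rule text)

# Placeholder rules for illustration only; not taken from any regulation.
RULES: List[Rule] = [
    ("R1", "Vehicles must give way to aircraft that are taxiing or under tow."),
    ("R2", "Do not cross an active taxiway without clearance from ground control."),
    ("R3", "Observe the posted speed limit on the apron and service roads."),
]


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, rules: List[Rule], k: int = 1) -> List[Rule]:
    """Return the k rules sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        rules,
        key=lambda rule: len(query_tokens & tokenize(rule[1])),
        reverse=True,
    )
    return ranked[:k]


top = retrieve("May I cross the taxiway ahead?", RULES)
print(top[0][0])  # id of the best-matching rule, to be passed to the LLM as context
```

The retrieved rule text would then be placed in the LLM prompt alongside the scene description, so the model reasons over the cited rule instead of relying on recall.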

I. Multi-Agent Fleet Coordination

  • What: Shared world models across vehicle fleet, cooperative perception, turnaround sequencing
  • Key: Moonware HALO (deployed at US hubs, 20% delay reduction), SuperMap Apron Commander
  • Airside path: Fleet world model shared via 5G/CBRS, A-CDM integration for turnaround optimization

4. NVIDIA Alpamayo: Deep Dive

Alpamayo is the most directly applicable system for airside AV development. Here's why:

Architecture

Input: Multi-camera images + vehicle state
    |
    v
[Cosmos-Reason Backbone (8.2B)] -- Vision encoder + language model
    |
    v
[Chain-of-Causation Reasoning] -- Generates causal explanation of scene
    |                              ("Vehicle ahead is braking because...")
    v
[Action Expert (2.3B)] -- Generates driving trajectories
    |
    v
Output: Trajectory + Natural language explanation

Key Capabilities

  • Chain-of-Causation reasoning: Generates both trajectories AND interpretable explanations — critical for regulatory compliance in aviation
  • Text-guided planning (v1.5): Can follow natural language instructions — maps directly to ground control instructions
  • RL post-training (v1.5): Reinforcement learning fine-tuning improves real-world performance
  • 700K reasoning traces: Pre-built dataset for reasoning-aware training
  • AlpaSim: Microservice-based closed-loop simulator for testing

Airside Adaptation Strategy

  1. Phase 1 (0-3 months): Deploy Alpamayo in shadow mode on airside vehicles. Collect data with explanations.
  2. Phase 2 (3-6 months): Fine-tune on collected airside data. Add airport-specific reasoning traces (jet blast awareness, FOD detection, stand procedures).
  3. Phase 3 (6-12 months): RL post-training on airside scenarios via AlpaSim. Add ground control instruction following.
  4. Phase 4 (12+ months): Multi-airport deployment with few-shot adaptation.

5. Recommended Architecture

5.1 Layered Architecture for Airside AV

┌─────────────────────────────────────────────────┐
│              SAFETY LAYER (Always Active)         │
│  Simplex architecture: High-performance + safe    │
│  RSS-based safety envelope │ OOD detection        │
│  Teleoperation fallback (Fernride-style)          │
└───────────────────────┬─────────────────────────┘

┌───────────────────────▼─────────────────────────┐
│           PLANNING & REASONING LAYER              │
│  Alpamayo VLA (trajectory + explanation)           │
│  OR Occupancy-based MPC (Drive-OccWorld)           │
│  LLM reasoning for edge cases (RAG over rules)    │
│  Multi-agent coordination (fleet world model)     │
└───────────────────────┬─────────────────────────┘

┌───────────────────────▼─────────────────────────┐
│           WORLD MODEL LAYER                       │
│  4D Occupancy prediction (OccWorld-style)          │
│  Motion prediction (MotionLM / MTR++)              │
│  Scene understanding (DINOv2 + open-vocab det.)   │
│  Online mapping (MapTRv2 + AIXM prior)            │
└───────────────────────┬─────────────────────────┘

┌───────────────────────▼─────────────────────────┐
│           PERCEPTION LAYER                        │
│  BEV fusion (BEVFormer/BEVFusion)                 │
│  Multi-sensor: cameras + LiDAR + 4D radar         │
│  Foundation model features (DINOv2 backbone)      │
│  Open-vocabulary detection (Grounding DINO)       │
└───────────────────────┬─────────────────────────┘

┌───────────────────────▼─────────────────────────┐
│           SENSOR & LOCALIZATION LAYER             │
│  Cameras (surround view) + LiDAR + 4D radar       │
│  RTK-GNSS + visual/LiDAR SLAM fallback           │
│  UWB beacons for GPS-degraded areas              │
│  ADS-B receiver for aircraft awareness            │
└─────────────────────────────────────────────────┘

5.2 Integration Points

| Airport System | Integration Method | Purpose |
|---|---|---|
| A-SMGCS | API / ASTERIX Cat 010/062 | Aircraft position awareness |
| A-CDM / AODB | ACRIS Semantic Model | Flight schedule, turnaround timing |
| NOTAM system | AIXM/AMXM parser | Dynamic restriction awareness |
| Digital tower | Data feed | Surface movement clearances |
| ADS-B | 1090ES receiver | Aircraft proximity detection |
| Fleet management | 5G/CBRS mesh | Vehicle coordination |
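
As an illustration of the ADS-B row, the sketch below flags aircraft within a given radius of the vehicle using the haversine great-circle distance. It assumes aircraft positions have already been decoded into latitude/longitude pairs. The callsigns, coordinates, and the function `aircraft_within` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def aircraft_within(
    vehicle: Tuple[float, float],
    aircraft: Dict[str, Tuple[float, float]],
    radius_m: float,
) -> List[str]:
    """Return the sorted callsigns of aircraft closer than radius_m to the vehicle."""
    return sorted(
        callsign
        for callsign, (lat, lon) in aircraft.items()
        if haversine_m(vehicle[0], vehicle[1], lat, lon) < radius_m
    )


# Hypothetical positions: one aircraft about 56 m away, one over 1 km away.
positions = {"ACFT_A": (47.4505, 8.5600), "ACFT_B": (47.4600, 8.5600)}
print(aircraft_within((47.4500, 8.5600), positions, radius_m=100.0))
```

At apron scale a local planar approximation would also do. Haversine is used here only because it needs no projection setup.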

5.3 Sensor Suite Recommendation

| Sensor | Qty | Purpose | Airside Rationale |
|---|---|---|---|
| Cameras (surround) | 6-8 | Primary perception | Rich semantic info, low cost |
| LiDAR (128-ch) | 1 | 3D geometry, localization | Handles reflective surfaces, large objects |
| 4D Radar | 2-4 | Velocity, all-weather | Works in rain, de-icing spray, fog |
| RTK-GNSS | 1 | Primary localization | cm-level in open areas |
| UWB anchors | Site-based | GPS fallback | Near terminals, under aircraft |
| ADS-B receiver | 1 | Aircraft awareness | Real-time aircraft positions |
| IMU | 1 | Dead reckoning | Bridge GPS gaps |
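
The IMU row's "bridge GPS gaps" role can be sketched as planar dead reckoning: integrate speed and yaw rate forward from the last good GNSS fix. This is a simplified illustration with assumed inputs (wheel-odometry speed, gyro yaw rate). A production stack would more likely fuse these sources in a Kalman-style filter.

```python
import math
from typing import Tuple


def dead_reckon(
    x: float,
    y: float,
    heading_rad: float,
    speed_mps: float,
    yaw_rate_rps: float,
    dt: float,
) -> Tuple[float, float, float]:
    """Advance a planar pose by one time step using speed and yaw rate."""
    heading = heading_rad + yaw_rate_rps * dt
    x += speed_mps * math.cos(heading) * dt
    y += speed_mps * math.sin(heading) * dt
    return x, y, heading


# Bridge a 2-second GNSS outage while driving straight at 5 m/s (18 km/h).
x, y, heading = 0.0, 0.0, 0.0
for _ in range(20):
    x, y, heading = dead_reckon(x, y, heading, speed_mps=5.0, yaw_rate_rps=0.0, dt=0.1)
print(round(x, 3), round(y, 3))
```

Because integration error grows with time, the estimate should be reset whenever RTK-GNSS or a UWB fix becomes available again.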

6. The Airside Data Problem & Solutions

6.1 The Problem

No public airside driving datasets exist. The only related resources are:

  • Synth_Airport_Taxii (synthetic, limited)
  • AssistTaxi (marking recognition, limited)
  • AeroVect proprietary (not available)

6.2 The Solution: 4-Phase Data Engine

| Phase | Timeline | Strategy | Data Volume |
|---|---|---|---|
| 1. Foundation | 0-6 months | Pre-train on road driving (nuScenes, Waymo, nuPlan). Transfer learn with domain adaptation | 10,000+ hours (existing) |
| 2. Bootstrap | 3-9 months | Deploy in shadow mode at 1-2 airports. Manual annotation + auto-labeling (SAM + Grounding DINO) | 100-500 hours |
| 3. World Model Training | 6-18 months | Train world model on airside data. Generate synthetic scenarios via Cosmos/GAIA. 3DGS digital twin of airports | 1,000+ hours (real + synthetic) |
| 4. Scaling | 12+ months | Active learning from fleet. Federated learning across airports. Continuous world model improvement | 5,000+ hours |

6.3 Synthetic Data Generation Pipeline

Real Airport Scan (3DGS/NeRF)
    → Reconstructed Digital Twin
    → Insert synthetic agents (aircraft, GSE, personnel)
    → Vary weather, lighting, time-of-day
    → Render multi-sensor data (camera + LiDAR + radar)
    → World model training data

Tools: NVIDIA Cosmos for video generation, 3DGS for scene reconstruction, DriveDreamer-2 for LLM-prompted scenario generation, Unreal Engine with airport assets for structured simulation.
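
The "insert agents, vary weather, lighting, time-of-day" steps amount to enumerating a scenario grid before rendering. The sketch below shows that enumeration only. The category values and the `ScenarioVariant` type are illustrative assumptions, and the rendering itself (Cosmos, 3DGS, Unreal) is out of scope.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioVariant:
    """One combination of inserted agent and environmental conditions."""
    agent: str
    weather: str
    time_of_day: str


# Illustrative values; a real pipeline would draw these from the scenario catalogue.
AGENTS = ("aircraft_pushback", "baggage_tug", "ground_crew")
WEATHER = ("clear", "rain", "fog", "snow")
TIMES = ("day", "dusk", "night")


def enumerate_variants() -> List[ScenarioVariant]:
    """Return every agent/weather/time combination for one reconstructed scene."""
    return [
        ScenarioVariant(agent, weather, time_of_day)
        for agent, weather, time_of_day in itertools.product(AGENTS, WEATHER, TIMES)
    ]


variants = enumerate_variants()
print(len(variants))  # 3 agents * 4 weather states * 3 times of day = 36
print(variants[0])
```

Each variant would then be handed to the renderer together with the digital twin, so one scanned scene yields dozens of training clips.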

6.4 Domain Adaptation Strategy

Pre-training curriculum:

  1. General vision (DINOv2 on ImageNet)
  2. Driving video (Cosmos on road data)
  3. Structured low-speed driving (parking lots, industrial areas)
  4. Synthetic airport scenes
  5. Real airport data (fine-tuning)
  6. Continual learning from deployment

7. Simulation Strategy

7.1 Multi-Fidelity Simulation Stack

| Level | Tool | Fidelity | Speed | Purpose |
|---|---|---|---|---|
| L1: Unit test | AlpaSim | Low | Real-time | Component testing |
| L2: Scenario | CARLA + airport assets | Medium | 10-100x | Scenario validation |
| L3: Neural sim | Cosmos / GAIA | High visual | 1-10x | Perception testing |
| L4: Digital twin | 3DGS reconstruction | Photorealistic | Real-time | Closed-loop testing |
| L5: Shadow mode | Real vehicle | Real | 1x | Pre-deployment validation |

7.2 Key Scenarios to Simulate

| Scenario | Priority | Challenge |
|---|---|---|
| Aircraft pushback (nose-in/nose-out) | Critical | Large dynamic object, jet blast |
| Crossing active taxiway | Critical | ATC clearance required |
| FOD on apron | High | Small object detection at distance |
| De-icing operations | High | Glycol spray, reduced visibility |
| Multi-vehicle turnaround sequencing | High | Coordination, timing |
| Emergency vehicle response | Critical | Yield behavior |
| Night operations with apron lighting | Medium | Lighting variation |
| Snow/ice operations | Medium | Traction, visibility |
| Personnel walking near aircraft | Critical | Vulnerable road user prediction |

8. Safety & Certification Path

Airside AV falls between the automotive and aviation domains, so the certification path draws on standards from both:

  1. Primary: ISO 3691-4:2020 (driverless industrial trucks) — used by current deployments
  2. AI safety: ISO/PAS 8800 (AI safety lifecycle) — newly published Dec 2024
  3. Functional safety: ISO 26262 adapted for airside ODD
  4. Intended functionality: ISO 21448 (SOTIF) — critical for ML-based perception
  5. Safety case: UL 4600 framework with GSN notation
  6. Future-proofing: Track EASA AI Roadmap 2.0 for aviation-specific AI certification

8.2 Safety Architecture for World Models

┌──────────────────────────────────────┐
│         WORLD MODEL PLANNER          │
│  (High performance, learned)         │
├──────────────────────────────────────┤
│         SAFETY MONITOR               │
│  - OOD detection (ensemble)          │
│  - Confidence calibration            │
│  - RSS safety envelope check         │
│  - Occupancy prediction validation   │
├──────────────────────────────────────┤
│      DECISION: Safe? ─── Yes ──→ Execute world model trajectory
│                  │
│                  No
│                  ↓
│         SAFE FALLBACK CONTROLLER     │
│  (Rule-based, formally verified)     │
│  - Graceful stop                     │
│  - Maintain safe distance            │
│  - Request teleoperation             │
└──────────────────────────────────────┘
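
The decision box in the diagram can be sketched as a Simplex-style arbiter that passes the learned planner's command through only when the safety monitor agrees. The gap check uses the longitudinal minimum-distance formula from the RSS model. The parameter values (response time, acceleration and braking limits) and the function names are illustrative assumptions, not calibrated airside figures.

```python
def rss_min_gap_m(
    v_rear: float,
    v_front: float,
    response_s: float = 0.5,
    a_accel_max: float = 1.0,
    b_rear_min: float = 2.0,
    b_front_max: float = 4.0,
) -> float:
    """RSS longitudinal safe distance in metres (speeds in m/s).

    Worst case: the rear vehicle accelerates during its response time and then
    brakes gently, while the front object brakes as hard as it can.
    """
    v_after = v_rear + response_s * a_accel_max
    gap = (
        v_rear * response_s
        + 0.5 * a_accel_max * response_s ** 2
        + v_after ** 2 / (2 * b_rear_min)
        - v_front ** 2 / (2 * b_front_max)
    )
    return max(gap, 0.0)


def select_command(planner_cmd, fallback_cmd, gap_m, v_rear, v_front, ood_flag):
    """Simplex arbitration: use the learned planner only if all checks pass."""
    safe = (not ood_flag) and gap_m >= rss_min_gap_m(v_rear, v_front)
    return planner_cmd if safe else fallback_cmd


v = 30 / 3.6  # 30 km/h, the top of the airside speed range, in m/s
print(round(rss_min_gap_m(v, 0.0), 1))  # required gap to a stationary object
print(select_command("follow_trajectory", "graceful_stop", 30.0, v, 0.0, ood_flag=False))
print(select_command("follow_trajectory", "graceful_stop", 10.0, v, 0.0, ood_flag=False))
```

Keeping the arbiter this small is the point of the Simplex pattern: only the monitor and fallback need formal verification, not the learned planner.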

8.3 AMLAS (Assurance of ML for Autonomous Systems) Process

  1. ML Safety Requirements → derive from SOTIF analysis
  2. ML Data Management → curate airside dataset with bias analysis
  3. ML Model Learning → train with safety-aware objectives (SafeDreamer)
  4. ML Model Verification → test against safety requirements
  5. ML Model Deployment → runtime monitoring + OOD detection
  6. ML Model Operation → continuous validation from fleet data
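
For the runtime monitoring in step 5, one common way to realise ensemble-based OOD detection is to measure disagreement between ensemble members. The sketch below assumes each member outputs a flat list of per-cell occupancy probabilities. The threshold of 0.15 and the function names are illustrative and would need calibration on held-out data.

```python
from statistics import pstdev
from typing import Sequence


def ensemble_disagreement(predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-cell standard deviation across ensemble members.

    predictions[m][i] is member m's probability for cell i.
    """
    per_cell = [pstdev(cell_values) for cell_values in zip(*predictions)]
    return sum(per_cell) / len(per_cell)


def is_out_of_distribution(
    predictions: Sequence[Sequence[float]], threshold: float = 0.15
) -> bool:
    """Flag the input as OOD when the ensemble members disagree strongly."""
    return ensemble_disagreement(predictions) > threshold


familiar = [[0.90, 0.10, 0.00], [0.88, 0.12, 0.02], [0.92, 0.08, 0.01]]
unfamiliar = [[0.90, 0.10, 0.00], [0.10, 0.80, 0.50], [0.50, 0.40, 0.90]]

print(is_out_of_distribution(familiar))    # members agree
print(is_out_of_distribution(unfamiliar))  # members disagree
```

The resulting flag is the kind of signal the safety monitor in section 8.2 would consume when deciding whether to hand over to the fallback controller.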

9. Hardware & Deployment

9.1 Compute Platform Recommendation

| Option | TOPS | Power | Cost | Recommendation |
|---|---|---|---|---|
| NVIDIA Orin | 275 | 60W | ~$1,500 | Best for initial deployment: proven, Alpamayo compatible |
| NVIDIA Thor | 2,000+ | 130W | TBD | Future upgrade: runs full world models on-vehicle |
| Qualcomm Ride | 1,280 (dual) | Varies | Competitive | Alternative if NVIDIA supply constrained |

An NVIDIA Orin drawing 60 W consumes 0.06 kWh per hour, which is negligible on electric GSE with 30-100 kWh batteries (at most 0.2% of battery capacity per hour).
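
The battery figure follows from simple arithmetic, shown here as a short check. The function name is illustrative, and the calculation ignores sensors, cooling, and power-conversion losses.

```python
def hourly_battery_share(compute_w: float, battery_kwh: float) -> float:
    """Fraction of battery capacity consumed by the compute unit in one hour."""
    energy_kwh = compute_w / 1000.0  # watts sustained for one hour, in kWh
    return energy_kwh / battery_kwh


for capacity_kwh in (30, 100):
    share = hourly_battery_share(60, capacity_kwh)
    print(f"{capacity_kwh} kWh battery: {share:.2%} per hour")
```

Over a 10-hour shift at the small end of the range (30 kWh), the compute unit would use about 2% of capacity.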

9.2 Cloud-Edge Hybrid Architecture

| Workload | Location | Latency | Rationale |
|---|---|---|---|
| Perception + planning | On-vehicle (Orin) | <100ms | Safety-critical, must be real-time |
| World model inference | On-vehicle (Orin) | <200ms | Latent-space models fit in 8GB |
| World model training | Cloud/edge server | Async | Too compute-intensive for on-vehicle |
| Simulation (AlpaSim) | Airport edge server | N/A | Scenario testing, not real-time |
| Fleet coordination | Airport edge server | <50ms | 5G/CBRS connectivity |
| Data upload & labeling | Cloud | Async | Large data volumes (1-4 TB/hour) |

9.3 Fleet Cost Model (50 vehicles)

| Component | Year 1 Cost | Notes |
|---|---|---|
| Compute (Orin per vehicle) | $75K | Hardware + integration |
| Sensors per vehicle | $30-50K | LiDAR + cameras + radar + GNSS |
| 5G/CBRS infrastructure | $200-400K | Airport-wide coverage |
| Edge server | $100-200K | Training + fleet coordination |
| Cloud (training) | $200-500K | GPU cluster rental |
| Software development | $1-5M | World model adaptation, integration |
| Total Year 1 | $2.1-9.2M | |
| Payback | 1-3 years | Via labor cost savings |
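
To make the cost roll-up reproducible, the sketch below sums the line items under one explicit reading of the table: the sensor figure is per vehicle (multiplied by the 50-vehicle fleet) and every other row is already a fleet-wide figure. That reading is an assumption. Under it the components sum to roughly $3.1-8.7M, which differs from the listed $2.1-9.2M total, so the listed total presumably rests on assumptions not shown in the table.

```python
from typing import Dict, Tuple

FLEET_SIZE = 50

# (low, high) in thousands of USD, copied from the table; assumed fleet-wide.
FLEET_WIDE_K: Dict[str, Tuple[int, int]] = {
    "compute": (75, 75),
    "connectivity_5g_cbrs": (200, 400),
    "edge_server": (100, 200),
    "cloud_training": (200, 500),
    "software_development": (1000, 5000),
}

# Assumption: this row is a per-vehicle cost, unlike the rows above.
SENSORS_PER_VEHICLE_K: Tuple[int, int] = (30, 50)


def year_one_total_k(fleet_size: int = FLEET_SIZE) -> Tuple[int, int]:
    """Return the (low, high) Year 1 total in thousands of USD."""
    low = sum(lo for lo, _ in FLEET_WIDE_K.values()) + SENSORS_PER_VEHICLE_K[0] * fleet_size
    high = sum(hi for _, hi in FLEET_WIDE_K.values()) + SENSORS_PER_VEHICLE_K[1] * fleet_size
    return low, high


print(year_one_total_k())  # totals in $K under the stated assumptions
```

Changing `fleet_size` or the per-vehicle sensor range shows how strongly the sensor suite drives the low end of the estimate.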

10. Research Report Index

Core World Models & Architecture

| Report | File | Key Findings |
|---|---|---|
| World Models for AV | 30-autonomy-stack/world-models/overview.md | 17+ architectures, readiness assessment |
| End-to-End AV | 30-autonomy-stack/end-to-end-driving/e2e-architectures.md | SparseDrive SOTA (0.06% collision), recommended architecture |
| Diffusion World Models | 30-autonomy-stack/world-models/diffusion-world-models.md | Sora, Genie 2/3, 13 driving-specific models |
| RL with World Models | 30-autonomy-stack/world-models/rl-with-world-models.md | Dreamer v1-v4, TD-MPC2, SafeDreamer |
| Tokenized & JEPA | 30-autonomy-stack/world-models/tokenized-and-jepa.md | VQ-VAE world models, JEPA ~15x faster planning |
| 4D Occupancy | 30-autonomy-stack/world-models/occupancy-world-models.md | Class-agnostic prediction, jet blast as hazard occupancy |
| Occupancy Comparison | 30-autonomy-stack/world-models/occupancy-networks-comparison.md | 20 methods compared, FlashOcc 197.6 FPS |
| Open-Source Repos | 30-autonomy-stack/world-models/opensource-implementations.md | 21 repos rated, only 6 fully usable |

Models & Approaches

| Report | File | Key Findings |
|---|---|---|
| VLAs | 30-autonomy-stack/vla-vlm/vla-for-driving.md | 15+ VLA models, LINGO-2, DriveVLM |
| Alpamayo | 30-autonomy-stack/vla-vlm/alpamayo-setup.md | Camera-only, non-commercial, teacher model for distillation |
| Company Deep Dives | 30-autonomy-stack/end-to-end-driving/company-approaches.md | Wayve/NVIDIA/Tesla/Waymo/Comma.ai strategies |
| LLM Reasoning | 30-autonomy-stack/planning/llm-reasoning-planning.md | DriveReg 100% rule compliance, Moonware HALO |
| Motion Prediction | 30-autonomy-stack/planning/motion-prediction.md | MotionLM, gate turnaround sequencing |
| Foundation Models | 30-autonomy-stack/perception/overview/vision-foundation-models.md | SAM, DINO, open-vocab detection |
| DINOv2 | 30-autonomy-stack/perception/overview/dinov2-foundation-models-driving.md | Adapter-mediated integration, LoRA rank 32 optimal |

Simulation & Data

| Report | File | Key Findings |
|---|---|---|
| Neural Simulation | 30-autonomy-stack/simulation/neural-simulation-platforms.md | Alpamayo, Cosmos, 12+ sim platforms |
| 3DGS & NeRF | 30-autonomy-stack/simulation/neural-scene-reconstruction.md | Street Gaussians 135 FPS, SplatAD |
| Airport Digital Twins | 30-autonomy-stack/simulation/airport-digital-twins.md | Autonoma, SITA, $50K-$4M |
| Simulators for Airside | 30-autonomy-stack/simulation/simulators-for-airside.md | CARLA/AWSIM/Isaac Sim comparison |
| Datasets & Benchmarks | 50-cloud-fleet/data-platform/data-engines-datasets.md | Zero airside datasets, 4-phase bootstrap plan |
| Synthetic Data | 50-cloud-fleet/data-platform/synthetic-data-generation.md | Cosmos +16.2% mAP foggy, 7-phase pipeline |

Deployment & Safety

| Report | File | Key Findings |
|---|---|---|
| Safety & Certification | 60-safety-validation/standards-certification/certification-guide.md | ISO/PAS 8800, EASA AI Roadmap, AMLAS |
| ISO 3691-4 | 60-safety-validation/standards-certification/iso-3691-4-deep-dive.md | $130K-380K, harmonized May 2024, 27 safety functions |
| Regulatory Trajectory | 80-industry-intel/regulations/regulatory-trajectory-deep-dive.md | FAA AC ~2028-2029, EASA ~2028 |
| Ground Crew Safety | 70-operations-domains/airside/safety/ground-crew-pedestrian-safety.md | 27K accidents/yr, hi-vis paradox |
| Insurance & Liability | 80-industry-intel/regulations/insurance-liability-airside.md | $35M per engine exposure |
| Compute & Hardware | 20-av-platform/compute/edge-platforms.md | Orin 275 TOPS @ 60W, fleet cost model |
| NVIDIA Orin | 20-av-platform/compute/nvidia-orin-technical.md | 8 power modes, DLA 74% at 15W |
| NVIDIA Thor | 20-av-platform/compute/nvidia-drive-thor.md | ~1000 TOPS, FP8 native, 2025+ |
| TensorRT Guide | 20-av-platform/compute/tensorrt-deployment-guide.md | PointPillars 6.84ms, DLA deployment |
| 4D Radar | 20-av-platform/sensors/4d-radar.md | Continental ARS548, immune to all weather |
| Airside Industry | 70-operations-domains/airside/operations/industry-overview.md | 21 companies, FAA CertAlert 24-02 |
| Multi-Agent & Fleet | 30-autonomy-stack/multi-agent-v2x/fleet-coordination.md | V2X, Moonware, 5G/CBRS, A-CDM |
| Mapping & Localization | 30-autonomy-stack/localization-mapping/overview/mapping-and-localization.md | Map-free driving, AIXM, NOTAM integration |
| Robustness | 60-safety-validation/verification-validation/robustness/adverse-conditions.md | De-icing, jet blast, FOD, 4D radar |

11. Key Papers & Repos

Must-Read Papers

  1. Alpamayo — NVIDIA's 10.5B VLA for L4 AV (CES 2026)
  2. GAIA-1/2/3 — Wayve's world model series (9B-15B params)
  3. DreamerV3/V4 — Model-based RL at scale (Nature 2025)
  4. OccWorld — 3D occupancy world model (ECCV 2024)
  5. Drive-OccWorld — Action-conditioned occupancy (AAAI 2025)
  6. EMMA — Waymo's end-to-end multimodal model (on Gemini)
  7. Cosmos — NVIDIA world foundation models (9000T+ tokens)
  8. SafeDreamer — Safe RL with world models (ICLR 2024)
  9. WorldRFT — RL fine-tuning for world models (AAAI 2026, 83% collision reduction)
  10. V-JEPA 2 — Meta's efficient video world model (2025)

Key Open-Source Repos

| Repo | Description |
|---|---|
| NVIDIA Alpamayo | 10.5B VLA + AlpaSim |
| NVIDIA Cosmos | World foundation models |
| OpenDWM | Driving world model framework |
| OccWorld | 3D occupancy world model |
| DriveDreamer | World model from real driving |
| DiffusionDrive | Real-time E2E driving (CVPR 2025) |
| DIAMOND | Diffusion RL world model |
| V-JEPA 2 | Efficient video prediction |
| AD-L-JEPA | JEPA for driving LiDAR |
| openpilot | Open-source E2E driving |
| Awesome-World-Model | Paper collection |

12. Strategic Recommendations

Immediate Actions (0-3 months)

  1. Set up Alpamayo + AlpaSim — Download, run inference, understand the architecture
  2. Begin airside data collection — Mount cameras on existing GSE in shadow mode
  3. Build 3DGS digital twin — Scan your primary airport for simulation
  4. Deploy open-vocab detection — Grounding DINO for zero-shot airside object detection

Short-Term (3-12 months)

  1. Fine-tune Alpamayo on airside data — Start with simple scenarios (straight-line baggage tug routes)
  2. Train occupancy world model — Pre-train on nuScenes, fine-tune on airside
  3. Build synthetic data pipeline — Cosmos for video generation, 3DGS for scenarios
  4. Engage with FAA/EASA — Begin safety case development using AMLAS methodology

Medium-Term (12-24 months)

  1. RL post-training — Fine-tune world model policy with SafeDreamer-style constraints
  2. Multi-airport adaptation — Test few-shot generalization to new airports
  3. Fleet coordination — Shared world model via 5G, A-CDM integration
  4. Ground control instruction following — LLM-based digital taxi instruction processing

Long-Term (24+ months)

  1. Full autonomous turnaround sequencing — Multi-agent world model for coordinated GSE
  2. Continuous learning data engine — Fleet-wide active learning, federated model updates
  3. Certification — ISO 3691-4 + ISO/PAS 8800 + SOTIF safety case

Key Risk Mitigations

| Risk | Mitigation |
|---|---|
| No airside training data | Synthetic data + transfer learning + active learning |
| Regulatory uncertainty | Build safety case with established frameworks (AMLAS, UL 4600) |
| World model hallucination | Simplex architecture with verified fallback controller |
| Compute constraints | Start with Orin (latent models), upgrade to Thor for full models |
| Airport stakeholder buy-in | Demonstrate in shadow mode first, show explainable decisions via VLA |

This synthesis draws from 159 research documents totaling ~134,000 lines of analysis across 300+ papers, models, and systems. See INDEX.md for topic-based navigation.

| Next Step | Document |
|---|---|
| Start coding today | 90-synthesis/master/getting-started.md |
| Choose POCs | 90-synthesis/poc-roadmaps/poc-proposals.md |
| Assess readiness | 90-synthesis/readiness-risk/technology-readiness.md |
| Understand competition | 80-industry-intel/market-competitive/competitive-landscape.md |
| Manage risks | 90-synthesis/readiness-risk/risk-register.md |
| Full architecture | 90-synthesis/decisions/design-spec.md |

Public research notes collected from public sources.