Cutting-Edge 2025-2026 Papers & Developments

World Models, VLAs, and End-to-End Driving — What's Happening Right Now


1. Major Conference Papers (Late 2025 — Early 2026)

1.1 CVPR 2025 (June 2025)

| Paper | Key Contribution | Impact |
| --- | --- | --- |
| DiffusionDrive (Highlight) | Truncated diffusion for real-time E2E planning, 10x faster than vanilla diffusion | First real-time diffusion planner |
| SplatAD | Joint camera + LiDAR 3D Gaussian Splatting for AD simulation | First 3DGS with multi-modal sensor sim |
| Data Scaling Laws for E2E AD | Log-linear gains from 16 to 8,192 hours of driving data | Validates scaling for driving |
| DriveTransformer | Unified transformer for all driving tasks | End-to-end architecture simplification |
| SparseDrive | 0.06% collision rate, 7.3 FPS, fully sparse E2E driving | Current SOTA on nuScenes planning |
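The log-linear scaling result above (gains proportional to the log of data volume) can be illustrated with a quick least-squares fit. The (hours, error) pairs below are invented for illustration, not numbers from the paper:

```python
import math

# Hypothetical (hours, planning-error) pairs illustrating a log-linear
# trend: error falls by a roughly fixed amount each time data doubles.
hours = [16, 64, 256, 1024, 4096, 8192]
errors = [2.10, 1.80, 1.50, 1.20, 0.90, 0.75]

# Least-squares fit of error = a + b * log2(hours).
n = len(hours)
x = [math.log2(h) for h in hours]
mx = sum(x) / n
my = sum(errors) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, errors)) \
    / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

def predict(h):
    """Predicted error at h hours of training data under the fit."""
    return a + b * math.log2(h)

# Extrapolate one doubling beyond the observed range.
print(round(predict(16384), 2))
```

The useful property of such a fit is that it turns "more data helps" into a concrete forecast: each doubling buys a fixed error reduction (`b` per log2 step), which is how teams budget data collection against target metrics.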

1.2 NeurIPS 2025 (December 2025)

| Paper | Key Contribution | Impact |
| --- | --- | --- |
| WorldModelBench | First benchmark evaluating video generators AS world models | Standardizes world model evaluation |
| Vista | Generalizable driving world model with multi-level controllability | 55% FID improvement; NeurIPS 2024 but widely adopted in 2025 |
| AutoVLA | Autonomous driving VLA with online RL fine-tuning | Bridge between VLAs and RL |

1.3 ICLR 2025-2026

| Paper | Key Contribution | Impact |
| --- | --- | --- |
| GS-LiDAR (ICLR 2025) | Panoramic 2D Gaussians for LiDAR simulation | LiDAR-native 3DGS |
| AdaWM (ICLR 2025) | Adaptive world model alignment for distribution shift | 2x success rate over DreamerV3 |
| Copilot4D (ICLR 2024, continued impact) | Discrete diffusion on LiDAR tokens | 65% Chamfer distance reduction |

1.4 AAAI 2026 (February 2026)

| Paper | Key Contribution | Impact |
| --- | --- | --- |
| WorldRFT | RL fine-tuning of world models with GRPO | 83% collision reduction on nuScenes |
| AD-L-JEPA | First JEPA for driving LiDAR pre-training | 1.9-2.7x GPU-hour reduction vs MAE |
| Drive-OccWorld | Action-conditioned 4D occupancy prediction | 33% improvement over UniAD |
| DrivingGPT | Unified driving language model (interleaved image + action tokens) | PDMS 82.4% on NAVSIM |

1.5 CVPR 2026 / ICML 2026 (Expected)

| Paper/System | What's Known | Status |
| --- | --- | --- |
| WorldLens | Full-spectrum evaluation of driving world models in real-world settings, 24 dimensions across 5 axes | Accepted to CVPR 2026 |
| Epona (ICCV 2025) | Autoregressive + diffusion hybrid, minute-long drift-free generation | Open code expected |
| Dreamer V4 | Block-causal transformer world model; obtains diamonds in Minecraft from offline data | May 2025 preprint, venue expected 2026 |

2. Industry Developments (2025-2026)

2.1 NVIDIA (January 2026 — CES)

  • Alpamayo-R1-10B released: 10B VLA (8.2B Cosmos-Reason backbone + 2.3B action expert) with Chain-of-Causation reasoning
  • Alpamayo 1.5: RL post-training via GRPO, text-guided planning, flexible multi-camera
  • AlpaSim: Open-source microservice closed-loop simulator
  • Physical AI dataset: 1,727 hours across 25 countries, 700K reasoning traces (v1), 3M traces (v1.5)
  • Partners: Lucid, JLR, Uber, Berkeley DeepDrive
  • Key detail: Model weights are non-commercial (research/eval only). Designed as teacher for distillation.

2.2 Wayve

  • GAIA-3: 15B parameters, 10x data of GAIA-2, 9 countries
  • Wayve AV2.0: Single model navigating 500+ cities, no fine-tuning needed
  • US expansion: Adapting to US roads with just 500 hours of incremental data
  • Uber partnership: L4 trials in London planned for 2026
  • Valuation: $8.6B

2.3 Waymo

  • EMMA: End-to-end multimodal model built on Gemini, chain-of-thought reasoning
  • Waymo World Model: Built on DeepMind Genie 3, generates camera AND LiDAR data
  • Sim Agents: Scenario generation for testing
  • 200M+ autonomous miles in production
  • Scaling laws paper (June 2025): Validated power-law improvements for motion planning

2.4 Tesla

  • FSD V13: 48 neural nets, fully end-to-end since V12
  • Temporal-Voxel Transformers: Latest architecture evolution
  • 3B+ FSD miles: Largest real-world E2E driving dataset
  • Dojo 2: Restarted, targeting 100K H100-equivalent scale in 2026

2.5 comma.ai

  • openpilot 0.11 (March 2026): First driving model fully trained in learned simulation
  • DiT world model: 2B parameters (500M/1B variants also available)
  • CVPR 2025 paper: On-policy learning from world model outperforms reprojective sim (52.49% vs 48.10% distance engagement)
  • 325+ car models, 100M+ miles, MIT licensed

2.6 Google DeepMind

  • Genie 2/3: Interactive world models generating photorealistic environments at 24fps/720p
  • Scaling to driving: Waymo World Model uses Genie 3 backbone
  • V-JEPA 2 (Meta, not DeepMind): 1B params, ~15x faster planning than video-generation baselines (16s vs ~4min)

2.7 Physical Intelligence

  • pi0.5: Updated foundation model for robotics
  • Flow matching action head: More stable than diffusion for action generation
  • Cross-embodiment: Transfers across robot types — driving is the next target
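The flow-matching action head mentioned above can be sketched with its training target. This is a minimal illustration of conditional flow matching in general (all shapes and values here are invented), not Physical Intelligence's actual implementation:

```python
import random

# Minimal sketch of the conditional flow-matching target for an action
# head. For an expert action a and Gaussian noise z, the network sees the
# interpolant x_t = (1 - t) * z + t * a at a random t in [0, 1] and
# regresses the constant velocity field v* = a - z. This regression target
# is deterministic given (a, z, t), which is one reason flow matching
# trains more stably than diffusion for action generation.
def flow_matching_target(action, noise, t):
    x_t = [(1 - t) * z + t * a for z, a in zip(noise, action)]
    v_star = [a - z for z, a in zip(noise, action)]
    return x_t, v_star

random.seed(0)
action = [0.3, -1.2]                      # e.g. normalized steering, accel
noise = [random.gauss(0, 1) for _ in action]
x_t, v_star = flow_matching_target(action, noise, t=0.5)

# At inference, actions are generated by integrating dx/dt = v_theta(x, t)
# from x(0) = z to x(1). With the exact field, one Euler step over the
# remaining half interval recovers the action:
a_rec = [x + 0.5 * v for x, v in zip(x_t, v_star)]
print(a_rec)  # equals `action` up to float error
```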

3. Emerging Paradigms

3.1 World Models as Simulators (The "Learned Sim" Paradigm)

The biggest paradigm shift: world models ARE the simulator. Instead of building a physics engine, you train a model that generates realistic sensor data given actions.

| System | Approach | Status |
| --- | --- | --- |
| comma.ai openpilot | Train driving policy entirely in DiT world model | Production (March 2026) |
| Waymo World Model | Genie 3 generates camera + LiDAR for testing | Research |
| NVIDIA Cosmos | Foundation model for world simulation | Available |
| Wayve GAIA | Generative world model as simulator | Internal use |

Why this matters for airside: you don't need to build a CARLA-like airport simulator. Train a world model on your logged drives (e.g. ROS bags from baggage-tractor runs), then train your planner inside that world model. comma.ai proved this recipe works in production.
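The learned-sim loop is simple in outline: the world model plays the role of the environment, and the policy is improved purely on imagined rollouts. The toy below uses stub dynamics and a one-parameter policy (everything here is a hypothetical stand-in, not any vendor's model) just to show the control flow:

```python
import random

random.seed(1)

def world_model_step(state, action):
    # Stub "learned dynamics": lateral offset responds to steering action,
    # plus a little model noise. A real system steps a DiT/latent model.
    return state + 0.5 * action + random.gauss(0, 0.01)

def reward(state):
    # Stay near lane center (state 0.0).
    return -abs(state)

def rollout(policy_gain, horizon=20):
    """Imagined return of a proportional policy inside the world model."""
    state, total = 1.0, 0.0
    for _ in range(horizon):
        action = -policy_gain * state
        state = world_model_step(state, action)
        total += reward(state)
    return total

# Crude policy improvement: pick the gain with the best imagined return.
# No physics engine, no real-world miles -- only world-model rollouts.
best = max((0.1 * k for k in range(1, 20)),
           key=lambda g: sum(rollout(g) for _ in range(5)))
print(round(best, 1))
```

Real systems replace the grid search with gradient-based RL and the stub with a large generative model, but the structure (policy acts, world model dreams, reward scores the dream) is the same.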

3.2 VLAs Replacing Modular Stacks

The VLA paradigm (single model: sensor input → language reasoning → action output) is replacing modular perception-prediction-planning pipelines.

| Stage | 2024 | 2025-2026 |
| --- | --- | --- |
| Perception | Separate detector + tracker | VLA backbone handles it |
| Prediction | Separate motion forecaster | World model predicts implicitly |
| Planning | Separate trajectory optimizer | VLA directly outputs trajectories |
| Explanation | None | Chain-of-Causation reasoning |

3.3 RL Post-Training for Driving

Following the ChatGPT playbook: pre-train → SFT → RLHF. For driving: pre-train world model → supervised fine-tune on driving data → RL post-train in world model imagination.

| System | RL Method | Result |
| --- | --- | --- |
| Alpamayo 1.5 | GRPO (3 reward signals) | AlpaSim 0.73 → 0.81, minADE 1.22 → 1.11 m |
| WorldRFT | GRPO in latent world model | 83% collision reduction |
| SafeDreamer | Lagrangian-constrained RL | 94.3% cost reduction |
| Think2Drive | MBRL in latent space | Expert-level CARLA in 3 days |
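GRPO, which both Alpamayo 1.5 and WorldRFT use, replaces a learned value critic with a group-relative baseline: sample several rollouts for the same scene and standardize each rollout's reward against the group. A minimal sketch (the reward values are invented):

```python
# Group-relative advantage at the core of GRPO: for a group of rollouts
# sampled from the same scene/prompt, each rollout's advantage is its
# reward standardized against the group mean and std -- no critic needed.
def grpo_advantages(rewards):
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# e.g. combined comfort/progress/collision scores for 4 imagined rollouts
advs = grpo_advantages([0.73, 0.81, 0.55, 0.91])
print([round(a, 2) for a in advs])
```

Because advantages within a group sum to zero, the update pushes probability toward the better-than-average rollouts and away from the worse ones, which is what makes the method cheap enough to run inside world-model imagination.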

3.4 Tokenize Everything

The field is converging on: tokenize all modalities → predict next token → scale.

Images → VQ-VAE/FSQ tokens → Transformer predicts next image tokens
LiDAR → Pillar/voxel tokens → Transformer predicts next LiDAR tokens
Actions → Discretized bins → Transformer predicts next action tokens
Language → BPE tokens → Transformer predicts next language tokens
Occupancy → VQ-VAE tokens → Transformer predicts next occupancy tokens

ALL IN ONE MODEL: DrivingGPT, GAIA-1, Alpamayo
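The action leg of this recipe is the least obvious: continuous controls must be discretized into bins so they can live in the same next-token stream as image, LiDAR, and language tokens. A minimal sketch (bin count, ranges, and the vocabulary offset below are hypothetical, not any specific model's layout):

```python
# Discretize a continuous action into one of N_BINS tokens, so a single
# transformer can predict actions as ordinary next tokens.
N_BINS = 256
A_MIN, A_MAX = -1.0, 1.0  # e.g. normalized steering range

def action_to_token(a):
    a = min(max(a, A_MIN), A_MAX)  # clamp to the representable range
    return round((a - A_MIN) / (A_MAX - A_MIN) * (N_BINS - 1))

def token_to_action(tok):
    return A_MIN + tok / (N_BINS - 1) * (A_MAX - A_MIN)

# Interleave (image, action) tokens into one sequence, DrivingGPT-style.
# Action tokens are offset into their own slice of the shared vocabulary.
image_tokens = [[5, 17, 42], [8, 3, 99]]   # stand-in VQ codes per frame
actions = [0.12, -0.4]
seq = []
for img, a in zip(image_tokens, actions):
    seq.extend(img)
    seq.append(1000 + action_to_token(a))
print(seq)
```

The discretization loses at most half a bin width of precision (here 1/255 of the range), a trade most systems accept in exchange for reusing the entire next-token-prediction stack unchanged.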

4. Current SOTA Leaderboards (March 2026)

4.1 nuScenes Planning

| Rank | Method | Collision Rate | L2 Error (3s) | Year |
| --- | --- | --- | --- | --- |
| 1 | SparseDrive | 0.06% | 1.55 m | 2024 |
| 2 | DiffusionDrive | 0.08% | 1.48 m | 2025 |
| 3 | VADv2 | 0.12% | 1.62 m | 2024 |
| 4 | UniAD | 0.31% | 1.65 m | 2023 |

4.2 NAVSIM

| Rank | Method | PDMS Score | Year |
| --- | --- | --- | --- |
| 1 | NVIDIA GTRS | 89.3% | 2025 |
| 2 | DrivingGPT | 82.4% | 2025 |
| 3 | DiffusionDrive | 80.1% | 2025 |

4.3 nuScenes Occupancy Prediction

| Rank | Method | mIoU | Year |
| --- | --- | --- | --- |
| 1 | GaussianFormer-2 | 44.2% | 2025 |
| 2 | FB-OCC | 43.5% | 2024 |
| 3 | FlashOcc | 42.1% | 2024 |
| 4 | SurroundOcc | 40.7% | 2023 |

4.4 World Model Video Generation (nuScenes)

| Rank | Method | FID | FVD | Year |
| --- | --- | --- | --- | --- |
| 1 | DriveDreamer-2 | 11.2 | 55.7 | 2025 |
| 2 | DrivingGPT | 12.78 | 142.6 | 2025 |
| 3 | Drive-WM | 15.8 | 122.7 | 2024 |

5. What's Ahead of the Curve

5.1 Predictions for 2026-2027

  1. World model simulators go mainstream — comma.ai proved it; expect Wayve, NVIDIA, and others to ship products trained primarily in learned simulators.

  2. VLA distillation pipelines mature — Alpamayo is designed as a teacher. Expect standardized distillation → edge deployment pipelines for 100M-500M parameter student models on Orin/Thor.

  3. Multi-modal world models — Models that jointly predict camera + LiDAR + radar futures (Waymo World Model already does camera + LiDAR). This enables full sensor simulation.

  4. Airside as a proving ground — Low speed, structured environment, high economic value = ideal for early world model deployment. Expect 2-3 more companies to enter airside AV with learned approaches.

  5. Regulatory frameworks catch up — ISO/PAS 8800 (Dec 2024) is the first standard addressing AI safety lifecycle. EASA AI Roadmap 2.0 targeting 2028 certification frameworks. FAA likely follows.

  6. Open-source world models reach production quality — OpenDWM, Cosmos, OccWorld ecosystem maturing rapidly. The gap between research and deployable code is shrinking.

5.2 Research Gaps Worth Pursuing

| Gap | Why It Matters | Who Should Work On It |
| --- | --- | --- |
| Airside driving dataset | Zero public datasets exist | You — publish a subset, become the benchmark |
| LiDAR-native world models | Most work is camera-centric | Copilot4D direction, but needs more work |
| Multi-agent world models for GSE | No airside coordination models | Combine with Moonware-style turnaround prediction |
| Safety certification for world models | No proven methodology exists | Build AMLAS + UL 4600 safety case, publish it |
| Jet blast as learned hazard | Only lookup tables today | Learn residuals from CFD + real data |

Sources

Research notes compiled from publicly available sources.