
Open-Source World Model Implementations for Driving

Research date: 2026-03-22
Purpose: Evaluate every usable open-source world model implementation for autonomous driving research.


Table of Contents

  1. Recommendation Matrix
  2. Tier 1: Production-Ready Research Tools
  3. Tier 2: Usable with Effort
  4. Tier 3: Partial Code / Inference Only
  5. Tier 4: Paper-Only / Broken Code
  6. Detailed Evaluations
  7. Dependency Compatibility Matrix

Recommendation Matrix

Actually Usable for Research Today

| Repo | Usability | Training Code | Pretrained Weights | Active Maint. | Best For |
|---|---|---|---|---|---|
| NVIDIA Cosmos | YES | YES | YES | YES | General world foundation model, fine-tuning for AV |
| OpenDWM | YES | YES | YES | YES | Multi-view driving video generation, LiDAR generation |
| CarDreamer | YES | YES | YES | YES | RL-based driving in CARLA with world models |
| DIAMOND | YES | YES | YES | YES | Diffusion world model RL (Atari/CS:GO, adaptable) |
| DiffusionDrive | YES | YES | YES | YES | Real-time end-to-end planning with diffusion |
| MagicDrive | YES | YES | YES | Moderate | Street-view image/video generation with 3D control |
| Epona | YES | YES | YES | YES | Autoregressive diffusion, video gen + planning |
| Vista | YES | YES | YES | Low | Generalizable driving video prediction |
| OccWorld | YES | YES | YES | Low | 3D occupancy world modeling |
| Panacea | YES | Partial | YES | Low | Panoramic multi-view video generation |
| comma.ai commaVQ | Partial | Partial | YES | YES | Tokenized driving video prediction |
| NVIDIA Alpamayo | YES | Inference only | YES | YES | VLA reasoning + trajectory prediction |

Paper-Only or Broken/Incomplete Code

| Repo | Status | Notes |
|---|---|---|
| DrivingGPT | NO CODE RELEASED | Project page only, no GitHub repo |
| Copilot4D | NO CODE RELEASED | Waabi proprietary, paper only |
| WorldDreamer | PLACEHOLDER REPO | Repo initialized but no code |
| Drive-WM | INCOMPLETE | Weights "coming soon" since 2023, training code missing |
| Think2Drive | NO CODE | Only a landing page; Bench2Drive has student models only |
| GenAD | PAPER REPO | DriveAGI repo is a paper/dataset hub, not runnable GenAD code |
| DriveDreamer | PARTIAL | Code structure exists but sparse docs, unclear weight availability |
| DriveDreamer-2 | PARTIAL | Inference code + weights released Dec 2024, no training code |
| DrivingWorld | INFERENCE ONLY | Inference code released, training code not available |

Tier 1: Production-Ready Research Tools

These repos have working training code, pretrained weights, clear documentation, and active maintenance.

1. NVIDIA Cosmos (Predict 2 / 2.5)

| Field | Details |
|---|---|
| GitHub URL | https://github.com/nvidia-cosmos/cosmos-predict2.5 (latest) / https://github.com/nvidia-cosmos/cosmos-predict2 (archived) |
| Stars | ~971 (Predict2.5) / ~756 (Predict2) |
| Last Commit | 2026-02-24 (Predict2.5) |
| License | Apache 2.0 (code); NVIDIA Open Model License (weights) |
| Dependencies | Python 3.10, PyTorch 2.6.0, CUDA 12.6, Ampere+ GPUs (RTX 30xx/A100+) |
| Supported Datasets | General video; AV-specific multiview post-training; robotics datasets (Bridge, RoboCasa, Libero) |
| Pretrained Weights | YES - extensive: 0.6B, 2B, and 14B parameter variants on HuggingFace, including AV-specific and action-conditioned versions |
| Ease of Setup | 4/5 - pip install via PyPI, Docker support, conda/uv environments |
| Documentation | 5/5 - comprehensive: setup, inference, post-training, distillation, troubleshooting, Cosmos Cookbook |
| Known Issues | Predict2 archived Dec 2025; must use Predict2.5. Large model sizes require significant VRAM. |

Verdict: The most comprehensive open-source world foundation model. Cosmos-Predict2.5 unifies Text2World, Image2World, and Video2World. Has AV-specific post-trained variants. Best starting point for fine-tuning a general world model to driving domains. LoRA post-training and DMD2 distillation supported.
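
LoRA post-training is what makes a model of this size practical to adapt. As a reminder of what LoRA actually trains, here is a minimal PyTorch sketch of the technique itself; it is illustrative only, not Cosmos's post-training API, and the layer sizes and rank are arbitrary:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update (W + BA)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as identity: no change at init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Example: adapt one (hypothetical) projection of a frozen backbone.
proj = nn.Linear(1024, 1024)
adapted = LoRALinear(proj, rank=8)
out = adapted(torch.randn(2, 1024))  # only lora_a/lora_b receive gradients
```

Only the two low-rank matrices receive gradients, which is why LoRA fine-tuning of multi-billion-parameter checkpoints fits on modest hardware.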


2. OpenDWM (SenseTime)

| Field | Details |
|---|---|
| GitHub URL | https://github.com/SenseTime-FVG/OpenDWM |
| Stars | 382 |
| Last Commit | 2025-01-15 |
| License | MIT |
| Dependencies | Python >=3.9, PyTorch >=2.5 (tested 2.5.1), Git >=2.25, CUDA-capable GPU |
| Supported Datasets | nuScenes, Waymo, Argoverse, KITTI-360, CARLA |
| Pretrained Weights | YES - extensive model zoo: SD 2.1/3.0/3.5 variants, LiDAR VQVAE/VAE/MaskGIT/DiT models, CogVideoX VAE |
| Ease of Setup | 4/5 - standard pip install, git submodules, clear requirements.txt |
| Documentation | 4/5 - comprehensive guides for datasets, training, evaluation (FID/FVD), interactive generation |
| Known Issues | 38 open / 0 closed issues, mostly config questions and dataset generation scripts. 32GB GPU minimum for short video; 80GB for long sequences. |

Verdict: Best open-source toolkit specifically designed for driving world models. Supports the widest range of driving datasets. MIT license is ideal. Active development with CogVideoX VAE integration (May 2025). The multi-dataset support (nuScenes + Waymo + Argoverse + KITTI-360) is unmatched. Highly recommended for driving video generation research.
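
Since OpenDWM reports FID/FVD for evaluation, a quick reference implementation of FID may help: the sketch below applies the Fréchet distance formula to two sets of feature vectors (FVD uses the same formula over video-network features). This is a generic implementation, not OpenDWM's evaluation code:

```python
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet Inception Distance between two sets of feature vectors
    (rows = samples): ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 sqrt(C1 C2))."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # numerical noise can add tiny imaginary parts
    return float(((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2 - 2 * covmean))

# Toy usage with random 64-dim features (real FID uses 2048-dim Inception features):
print(fid(np.random.randn(512, 64), np.random.randn(512, 64)))
```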


3. CarDreamer

| Field | Details |
|---|---|
| GitHub URL | https://github.com/ucd-dare/CarDreamer |
| Stars | 326 |
| Last Commit | Active (2024-2025) |
| License | MIT (per repo structure) |
| Dependencies | Python 3.10, CARLA 0.9.15, DreamerV2/V3 (separate install), 10-20GB GPU VRAM |
| Supported Datasets | CARLA simulator (real-time generation, not static datasets) |
| Pretrained Weights | YES - all task checkpoints on HuggingFace (ucd-dare/CarDreamer) |
| Ease of Setup | 3/5 - requires a CARLA installation, which is finicky; uses the Flit package manager |
| Documentation | 5/5 - full ReadTheDocs, API docs, customization guides, web-based visualization |
| Known Issues | Only 2 open issues. The CARLA dependency is the main friction point. |

Verdict: Best platform for RL-based world model driving research. First open-source platform specifically for WM-based autonomous driving with CARLA integration. Built-in DreamerV2/V3 support. Multi-modal observations (BEV, camera, LiDAR). The agent learns to drive from scratch in a "dream world." Excellent for sim-to-real research pipelines.
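
To make "learns to drive in a dream world" concrete, the sketch below shows a heavily simplified Dreamer-style imagination rollout: the policy is optimized purely on trajectories predicted by the latent dynamics model, with no simulator steps inside the loop. Every module name and size here is a hypothetical stand-in, not CarDreamer's API:

```python
import torch
import torch.nn as nn

# Hypothetical components standing in for a Dreamer-style world model.
rssm = nn.GRUCell(input_size=64 + 4, hidden_size=256)   # latent dynamics model
actor = nn.Sequential(nn.Linear(256, 4), nn.Tanh())     # policy acting in latent space
reward_head = nn.Linear(256, 1)                         # predicted reward

def imagine(h: torch.Tensor, z: torch.Tensor, horizon: int = 15):
    """Roll out a trajectory entirely inside the learned latent space
    (the "dream world"): no simulator calls, only the dynamics model."""
    rewards = []
    for _ in range(horizon):
        a = actor(h)                             # act on the imagined state
        h = rssm(torch.cat([z, a], dim=-1), h)   # predict the next latent state
        z = h[:, :64]                            # stand-in for the stochastic latent
        rewards.append(reward_head(h))
    # The actor is trained to maximize imagined returns, so gradients
    # flow back through the whole rollout.
    return torch.stack(rewards).sum(0).mean()

loss = -imagine(torch.zeros(8, 256), torch.zeros(8, 64))
loss.backward()
```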


4. DIAMOND

| Field | Details |
|---|---|
| GitHub URL | https://github.com/eloialonso/diamond |
| Stars | 2,000 |
| Last Commit | 2024-10-13 |
| License | MIT |
| Dependencies | Python 3.10, PyTorch (via requirements.txt), k-diffusion, HuggingFace Hub |
| Supported Datasets | Atari 100k (26 games), CS:GO (separate branch) |
| Pretrained Weights | YES - via HuggingFace Hub, one-command download |
| Ease of Setup | 5/5 - conda create + pip install, single command to play pretrained models |
| Documentation | 5/5 - quick start, Hydra config guide, visualization controls, Discord community |
| Known Issues | Only 5 open issues. Not driving-specific, but the architecture is transferable. |

Verdict: Not a driving world model per se, but the highest-quality diffusion world model codebase available. NeurIPS 2024 Spotlight. The architecture (2D U-Net with action conditioning) is directly adaptable to driving. Cleanest codebase of all repos evaluated. If you want to build a custom diffusion world model for driving, start from DIAMOND's architecture.
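
The transferable idea is the conditioning scheme: the denoiser sees a noised next frame, a stack of past frames, and the agent's actions. Below is a toy PyTorch sketch of that pattern; it is not DIAMOND's actual U-Net (which uses proper noise-level and AdaGN-style conditioning), just the minimal shape of an action-conditioned denoiser:

```python
import torch
import torch.nn as nn

class ActionConditionedDenoiser(nn.Module):
    """Toy stand-in for a DIAMOND-style denoiser: predict the clean next frame
    from a noised frame, conditioned on past frames and discrete actions."""
    def __init__(self, channels=3, context_frames=4, num_actions=16, dim=64):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, dim)
        self.noise_emb = nn.Linear(1, dim)
        # Past frames are stacked on the channel axis with the noised frame.
        in_ch = channels * (context_frames + 1)
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.SiLU(),
            nn.Conv2d(dim, channels, 3, padding=1),
        )

    def forward(self, noised, context, actions, sigma):
        # Conditioning vector: summed action embeddings + noise-level embedding,
        # broadcast over spatial dims (a real model would use AdaGN/FiLM).
        cond = self.action_emb(actions).sum(1) + self.noise_emb(sigma[:, None])
        x = torch.cat([noised, context.flatten(1, 2)], dim=1)
        h = self.net[0](x) + cond[:, :, None, None]
        return self.net[2](self.net[1](h))

model = ActionConditionedDenoiser()
noised = torch.randn(2, 3, 64, 64)
context = torch.randn(2, 4, 3, 64, 64)   # 4 past frames
actions = torch.randint(0, 16, (2, 4))   # one action per past frame
pred = model(noised, context, actions, torch.rand(2))
```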


5. DiffusionDrive

| Field | Details |
|---|---|
| GitHub URL | https://github.com/hustvl/DiffusionDrive |
| Stars | 1,300 |
| Last Commit | 2025 (active) |
| License | MIT |
| Dependencies | PyTorch (standard DL stack); environment.yml + requirements.txt provided |
| Supported Datasets | NAVSIM (primary), nuScenes (separate branch nusc) |
| Pretrained Weights | YES - ResNet-34 (60M params, 88.1 PDMS) and ResNet-50 variants on HuggingFace |
| Ease of Setup | 3/5 - requires NAVSIM environment setup, which adds complexity |
| Documentation | 4/5 - install.md, train_eval.md, qualitative results, video demos |
| Known Issues | 28 open / 0 closed issues. PyTorch compatibility issues (torch.xpu), assertion errors; some users report config confusion with NAVSIM. |

Verdict: State-of-the-art for real-time end-to-end planning with diffusion. 88.1 PDMS on NAVSIM at 45 FPS. CVPR 2025 Highlight. Not a "world model" in the generative sense but uses truncated diffusion for planning. DiffusionDriveV2 achieves 91.2 PDMS. Excellent for planning research, less relevant for future prediction/simulation.
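
The real-time speed comes from truncating the diffusion process: denoising starts from lightly noised anchor trajectories instead of pure Gaussian noise, so only a handful of refinement steps are needed. A schematic sketch of that idea, with a toy denoiser and hypothetical shapes (not the DiffusionDrive codebase):

```python
import torch
import torch.nn as nn

# Toy denoiser standing in for the trajectory decoder: maps a noisy
# trajectory + scene feature to a refined trajectory (T waypoints x 2).
T = 8
denoiser = nn.Sequential(nn.Linear(T * 2 + 128, 256), nn.ReLU(), nn.Linear(256, T * 2))

def truncated_diffusion_plan(anchors, scene_feat, steps=2, noise_scale=0.5):
    """Truncated diffusion: instead of denoising from pure Gaussian noise over
    many steps, start from slightly-noised anchor trajectories and run only a
    couple of refinement steps (the key to real-time planning)."""
    traj = anchors + noise_scale * torch.randn_like(anchors)
    for _ in range(steps):
        inp = torch.cat([traj.flatten(1), scene_feat], dim=-1)
        traj = denoiser(inp).view_as(anchors)  # each step predicts the clean traj
    return traj

anchors = torch.zeros(4, T, 2)     # 4 anchor trajectories (e.g. from clustering)
scene_feat = torch.randn(4, 128)   # per-anchor scene/ego features
plans = truncated_diffusion_plan(anchors, scene_feat)
# A score head (omitted here) would then rank the 4 refined candidates.
```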


6. Epona

| Field | Details |
|---|---|
| GitHub URL | https://github.com/Kevin-thu/Epona |
| Stars | 312 |
| Last Commit | 2025-06-26 |
| License | MIT |
| Dependencies | Python 3.10, PyTorch >=2.1.0+cu121, TorchVision >=0.16.0+cu121, CUDA 12.1 |
| Supported Datasets | nuPlan (primary), nuScenes |
| Pretrained Weights | YES - world models for nuPlan and nuScenes + finetuned temporal-aware DCAE autoencoder on HuggingFace |
| Ease of Setup | 4/5 - standard conda/pip; inference runs on a single RTX 4090 |
| Documentation | 4/5 - installation, inference scripts, data preparation README, DeepSpeed training docs |
| Known Issues | Only 3 open issues (nuScenes preprocessing, trajectory quality, NAVSIM integration). |

Verdict: Strong recent entry (ICCV 2025). Generates consistent minutes-long driving videos at high resolution. Doubles as a motion planner (outperforms end-to-end planners on NAVSIM). Runs inference on consumer GPU (4090). Well-suited for both world modeling and planning research. MIT license.
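
Minutes-long generation follows the standard autoregressive recipe: generate the next chunk of frames conditioned on a sliding window of history, append, and repeat. The sketch below shows that loop in generic form; the model call and tensor shapes are placeholders, not Epona's interfaces:

```python
import torch

def rollout(model, context, n_frames=16, chunk=4, window=8):
    """Autoregressive video rollout: repeatedly generate the next chunk of
    frames conditioned on a fixed-size window of history, then append and
    continue. The sliding window keeps memory flat over long videos."""
    frames = list(context.unbind(0))
    for _ in range(n_frames // chunk):
        history = torch.stack(frames[-window:])  # (window, C, H, W)
        new = model(history, chunk)              # (chunk, C, H, W)
        frames.extend(new.unbind(0))
    return torch.stack(frames)

# Dummy "model": copies the last frame forward with noise, just to run the loop.
dummy = lambda hist, k: hist[-1:].repeat(k, 1, 1, 1) + 0.01 * torch.randn(k, *hist.shape[1:])
video = rollout(dummy, torch.randn(8, 3, 32, 32))
print(video.shape)  # torch.Size([24, 3, 32, 32])
```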


Tier 2: Usable with Effort

These repos have working code but require more setup effort, have documentation gaps, or limited maintenance.

7. MagicDrive (v1 / v2)

| Field | Details |
|---|---|
| GitHub URL | https://github.com/cure-lab/MagicDrive (v1) / https://github.com/flymin/MagicDrive-V2 (v2) |
| Stars | 1,200 (v1) / 702 (v2) |
| Last Commit | 2023-09-07 (v1) / 2025-06-26 (v2) |
| License | Apache-2.0 (v1) / AGPL-3.0 (v2) |
| Dependencies | v1: PyTorch 1.10.2, CUDA 10.2, diffusers 0.17.1, xformers 0.0.19. v2: PyTorch 2.4.0, torchvision 0.19.0, xformers, flash-attn, ColossalAI |
| mmdet3d | Not required directly |
| Supported Datasets | nuScenes (primary), Waymo (v2, partial) |
| Pretrained Weights | YES - v1: 3 resolution variants (OneDrive + HuggingFace); v2: Stage-3 checkpoint on HuggingFace |
| Ease of Setup | 3/5 (v1) / 2/5 (v2: requires ColossalAI custom builds, multi-node setup) |
| Documentation | 4/5 (v1: includes FAQ, GUI) / 3/5 (v2: multi-platform guides but complex) |
| Known Issues | v1: 8 open issues, dead dataset links (CUHK storage expired), broken external resource links. v2: stage 1-2 weights not yet released; the AGPL-3.0 license is restrictive. |

Verdict: MagicDrive v1 is a solid street-view generation framework with diverse 3D geometry controls (camera poses, road maps, 3D bounding boxes). Good for data augmentation. v2 adds high-res long video but uses AGPL-3.0 (copyleft) and requires multi-node training infrastructure. v1 is more practical for most research groups.


8. Vista (OpenDriveLab)

| Field | Details |
|---|---|
| GitHub URL | https://github.com/OpenDriveLab/Vista |
| Stars | 862 |
| Last Commit | 2024-05-28 |
| License | Apache-2.0 |
| Dependencies | Python 3.9, PyTorch 2.0.1, CUDA 11.7, torchvision 0.15.2, Stability AI generative-models base |
| Supported Datasets | OpenDV (1700+ hours of driving video), nuScenes |
| Pretrained Weights | YES - vista.safetensors on HuggingFace and Google Drive (note: EMA weight-merging error in the early release; use the latest) |
| Ease of Setup | 3/5 - standard conda setup, but built on the Stability AI codebase, which adds complexity |
| Documentation | 3/5 - INSTALL.md, TRAINING.md, SAMPLING.md exist but are sparse. TODO items remain (memory-efficient training, online demo). |
| Known Issues | 21 open issues, 2 pending PRs. EMA weight-merging error in the initial release. No updates since May 2024; appears abandoned. |

Verdict: NeurIPS 2024 paper with strong benchmark results (55% better FID, 27% better FVD than prior SOTA). Supports multi-modal action conditioning (steering, speed, commands, trajectories, goal points) and provides a reward function for different actions. However, with no updates in nearly two years, the project appears inactive. Usable, but expect to self-maintain.


9. OccWorld

| Field | Details |
|---|---|
| GitHub URL | https://github.com/wzzheng/OccWorld |
| Stars | 527 |
| Last Commit | 2023-11-23 |
| License | Apache-2.0 |
| Dependencies | Python 3.8, mmdetection3d (full mmdet3d stack), CUDA-capable GPU (RTX 4090 24GB recommended) |
| Supported Datasets | nuScenes, Occ3D (semantic occupancy annotations) |
| Pretrained Weights | YES - Tsinghua cloud storage link |
| Ease of Setup | 2/5 - the mmdet3d dependency chain is notoriously painful; an environment.yaml is provided, but mmdet3d version conflicts are common |
| Documentation | 3/5 - structured README with install/train/eval/vis sections, but gaps in custom-dataset guidance |
| Known Issues | 23 open issues, none closed: NaN during training, missing PKL files, unclear VQ-VAE selection, camera projection questions. Maintainers appear unresponsive. |

Verdict: ECCV 2024 paper. Unique approach: models 3D occupancy evolution over time. Relevant for spatial understanding in driving. However, the mmdet3d dependency makes setup painful, maintainers have not closed a single issue, and last commit was Nov 2023. Only use if you specifically need occupancy-based world modeling and are comfortable with mmdet3d.


10. Panacea / Panacea+

| Field | Details |
|---|---|
| GitHub URL | https://github.com/wenyuqing/panacea |
| Stars | 254 |
| Last Commit | 2023-12-06 |
| License | Apache-2.0 |
| Dependencies | PyTorch (version unspecified), Stability-AI generative models, ControlNet, StreamPETR |
| Supported Datasets | nuScenes, Gen-nuScenes (self-generated synthetic data) |
| Pretrained Weights | YES - panaceaplus_40k_deepspeed.ckpt on HuggingFace |
| Ease of Setup | 3/5 - multi-GPU (8-GPU) inference pipeline, DeepSpeed required |
| Documentation | 3/5 - environment setup and data prep docs exist, but inference requires an 8-GPU distributed setup |
| Known Issues | 7 open issues. Requires significant compute (8 GPUs for inference). Last commit Dec 2023. |

Verdict: CVPR 2024. Panoramic multi-view video generation with weather/time/scene control. Good for data augmentation and rare scenario generation (rain, snow). Panacea+ (Aug 2024) improved performance. However, 8-GPU inference requirement limits accessibility. StreamPETR integration enables downstream perception evaluation.


11. NVIDIA Alpamayo + AlpaSim

| Field | Details |
|---|---|
| GitHub URL | https://github.com/NVlabs/alpamayo (model) / https://github.com/NVlabs/alpasim (simulator) |
| Stars | 1,600 (Alpamayo) / 910 (AlpaSim) |
| Last Commit | 2025-11-19 (Alpamayo) / ~2025-10 (AlpaSim) |
| License | Apache 2.0 (inference code); non-commercial (model weights) |
| Dependencies | Python 3.12, PyTorch (via pyproject.toml), NVIDIA GPU >= 24GB VRAM, Flash Attention 2 |
| Supported Datasets | Physical AI AV Dataset (1700+ hours, gated access on HuggingFace) |
| Pretrained Weights | YES - 10B-parameter model (22GB of weights) on HuggingFace (gated) |
| Ease of Setup | 3/5 - standard Python setup, but gated access, large model, 24GB VRAM minimum |
| Documentation | 4/5 - comprehensive README, FAQ, troubleshooting, Jupyter notebook for interactive inference |
| Known Issues | Non-commercial license on weights limits deployment. No RL post-training. No navigation/route conditioning. Research-only, explicitly not for production. |

Verdict: Not a world model per se -- it is a Vision-Language-Action (VLA) model with Chain-of-Causation reasoning for trajectory prediction. Alpamayo 1.5 released with improvements. AlpaSim provides closed-loop simulation for testing. The non-commercial weight license is a significant limitation. Best for understanding NVIDIA's approach to reasoning-based AV development.


12. comma.ai commaVQ / World Model

FieldDetails
GitHub URLhttps://github.com/commaai/commavq (world model dataset + code) / https://github.com/commaai/openpilot (production system)
Stars358 (commaVQ) / 52,000+ (openpilot)
Last Commit2026-03-22 (commaVQ, actively maintained)
LicenseMIT (commaVQ)
Dependenciesnumpy, HuggingFace datasets, PyTorch
Supported Datasets100,000 minutes of compressed driving video (VQ-VAE tokenized), 3M minutes used for GPT training
Pretrained WeightsYES - VQ-VAE encoder/decoder + GPT world model weights included
Ease of Setup4/5 - Lightweight, Jupyter notebooks, dataset via HuggingFace
Documentation4/5 - Clear README, notebooks for encoding/decoding/prediction, dataset loading examples
Known IssuesTraining code is limited (focus on inference/dataset). The production openpilot world model training pipeline is NOT open-source. Model architecture details for the production system are partially documented but training is proprietary.

Verdict: Unique position -- comma.ai is the only company shipping a world-model-trained driving system to real cars (openpilot 0.10+, 2025). commaVQ provides the tokenized dataset and a basic GPT world model for research. The gap: production training pipeline is closed-source. Still valuable for experimenting with tokenized driving video prediction at scale. MIT license.
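
A minimal starting point for the dataset, assuming the HuggingFace dataset id `commaai/commavq` and the numbered-split, path-based sample layout shown in the repo README (verify both against the current dataset card):

```python
import numpy as np
import torch
from datasets import load_dataset  # pip install datasets

# Assumptions from the commaVQ README: numbered splits, and each sample
# holds a path to a .npy array of VQ-VAE token ids (one 8x16 grid per frame).
ds = load_dataset("commaai/commavq", split="0")
tokens = np.load(ds[0]["path"])  # e.g. shape (n_frames, 8, 16)

# Next-token prediction setup for a GPT-style world model: flatten the
# per-frame token grids into one sequence and shift by one position.
flat = torch.from_numpy(tokens.reshape(-1).astype(np.int64))
inputs, targets = flat[:-1], flat[1:]
print(inputs.shape, targets.shape)
```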


Tier 3: Partial Code / Inference Only

13. DrivingWorld

| Field | Details |
|---|---|
| GitHub URL | https://github.com/YvanYin/DrivingWorld |
| Stars | 238 |
| Last Commit | 2024-12-25 |
| License | MIT |
| Pretrained Weights | YES - Video VQVAE + world model on HuggingFace |
| Code Status | Inference code released. Evaluation code available. Training code NOT released. |
| Documentation | 3/5 - installation (3 steps), data prep, multiple demo scripts |
| Known Issues | No training code. 5 open TODO items, including missing HuggingFace demos. |

Verdict: Generates 40s+ driving videos via Video GPT. Inference-only limits research utility. MIT license is good. Wait for training code release or use for evaluation/benchmarking only.


14. DriveDreamer

| Field | Details |
|---|---|
| GitHub URL | https://github.com/JeffWang987/DriveDreamer |
| Stars | 551 |
| Last Commit | 2023-09-17 |
| License | Apache-2.0 |
| Supported Datasets | nuScenes |
| Code Status | Code structure released (dreamer-datasets, dreamer-models, dreamer-train directories). An announcement claims research code was released Nov 2024, yet the repo was last updated Sep 2023. Weight availability unclear. |
| Documentation | 2/5 - Getting Started docs exist, but completeness is uncertain |
| Known Issues | 24 open issues. Significant gap between announcement and actual code quality. Future updates redirected to "GigaAI-research". |

Verdict: ECCV 2024. Historically important as first real-world-driven world model for driving. However, code quality and completeness are questionable. Development moved to GigaAI-research org. Not recommended for new research projects.


15. DriveDreamer-2

| Field | Details |
|---|---|
| GitHub URL | https://github.com/f1yfisher/DriveDreamer2 |
| Stars | 242 |
| Last Commit | 2024-03-11 |
| License | Apache-2.0 |
| Supported Datasets | nuScenes |
| Pretrained Weights | YES - available via Baidu Cloud (password: dkjq) |
| Code Status | Inference code + weights released Dec 2024. Full training code availability uncertain. |
| Documentation | 3/5 - install, dataset prep, train/test/vis docs provided |
| Known Issues | 24 open issues. Weights only on Baidu Cloud (slow/inaccessible outside China). Delayed code release. |

Verdict: AAAI 2025. LLM-enhanced trajectory generation for diverse driving videos. Inference works but training reproducibility is uncertain. Baidu Cloud for weights is a friction point for international researchers.


Tier 4: Paper-Only / Broken Code

16. DrivingGPT

| Field | Details |
|---|---|
| Project Page | https://rogerchern.github.io/DrivingGPT/ |
| GitHub URL | No public repository found |
| Status | PAPER ONLY - no code released |

Verdict: Promising approach (unified world modeling + planning via autoregressive transformers, strong nuPlan/NAVSIM results) but no code available. Cannot be used for research.


17. Copilot4D (Waabi)

| Field | Details |
|---|---|
| Project Page | https://waabi.ai/copilot-4d/ |
| GitHub URL | No public repository |
| Status | PROPRIETARY - no code released |

Verdict: ICLR 2024. Impressive point cloud forecasting via discrete diffusion (65%+ Chamfer distance improvement). But Waabi is a commercial company and has not released code or weights. Completely unusable for open research.


18. WorldDreamer

| Field | Details |
|---|---|
| GitHub URL | https://github.com/JeffWang987/WorldDreamer |
| Stars | 201 |
| Last Commit | 2024-01-18 |
| License | MIT |
| Status | PLACEHOLDER - only README.md and LICENSE files; no code |
| Known Issues | 4 open issues (likely requesting code release) |

Verdict: General world model for video generation (not driving-specific). Repo is a placeholder with zero implementation code. Do not use.


19. Drive-WM

| Field | Details |
|---|---|
| GitHub URL | https://github.com/BraveGroup/Drive-WM |
| Stars | 415 |
| Last Commit | 2023-11-18 |
| License | Apache-2.0 |
| Status | INCOMPLETE - weights marked "coming soon" since Nov 2023 (over 2 years); training code missing |
| Known Issues | 8 open issues. All core deliverables (conditional image/video generation weights, action-conditioned weights, training code) remain undelivered. |

Verdict: CVPR 2024 paper with a promising multiview forecasting + planning approach, but the repo is effectively abandoned: "coming soon" for over two years. Built on the diffusers library. Do not depend on this.


20. Think2Drive

FieldDetails
GitHub URLhttps://github.com/Thinklab-SJTU/CornerCaseRepo (landing page only)
Related Reposhttps://github.com/Thinklab-SJTU/Bench2Drive (benchmark, 1.8k stars) / https://github.com/Thinklab-SJTU/Bench2DriveZoo (student models, 369 stars)
StatusNO WORLD MODEL CODE - CornerCaseRepo is just HTML. Bench2DriveZoo has student models trained BY Think2Drive but not the Think2Drive world model itself.

Verdict: Think2Drive is historically important (first model-based RL for driving, DreamerV3-based, expert-level CARLA v2 performance in 3 days on 1 A6000). But the actual world model code is not released. Bench2Drive/Bench2DriveZoo are useful benchmarking tools but do not contain the world model. If you want to reproduce Think2Drive, use CarDreamer instead (which provides the DreamerV3 backbone + CARLA integration).


21. GenAD (OpenDriveLab)

| Field | Details |
|---|---|
| GitHub URL | https://github.com/OpenDriveLab/DriveAGI |
| Stars | 791 |
| Last Commit | Active (dataset tools) |
| Status | PAPER + DATASET REPO - GenAD model code is not in this repo; it contains OpenDV-YouTube dataset tools and paper references |

Verdict: CVPR 2024 Highlight. GenAD's actual model implementation is not publicly available. The DriveAGI repo is a hub for OpenDriveLab's driving foundation model papers and the OpenDV dataset. Use Vista instead if you want OpenDriveLab's actual runnable world model code.


Dependency Compatibility Matrix

| Repo | PyTorch | CUDA | Python | mmdet3d | Min VRAM | Multi-GPU Required |
|---|---|---|---|---|---|---|
| Cosmos 2.5 | 2.6.0 | 12.6 | 3.10 | No | Varies by model size | No (inference) |
| OpenDWM | >=2.5 | Any | >=3.9 | No | 32GB (short) / 80GB (long) | Recommended |
| CarDreamer | Flexible | Any | 3.10 | No | 10-20GB | No |
| DIAMOND | Recent | Any | 3.10 | No | 8GB+ | No |
| DiffusionDrive | Recent | Any | 3.8+ | No | 16GB+ | No |
| Epona | >=2.1.0 | 12.1 | 3.10 | No | Single 4090 (24GB) | No (inference) |
| MagicDrive v1 | 1.10.2 | 10.2 | 3.8+ | No (uses bevfusion) | 32GB (V100) | Yes (8x for training) |
| MagicDrive v2 | 2.4.0 | Recent | 3.10+ | No | 80GB+ | Yes (multi-node) |
| Vista | 2.0.1 | 11.7 | 3.9 | No | 80GB (A100) | Recommended |
| OccWorld | <2.0 | 10.x-11.x | 3.8 | YES | 24GB (4090) | No |
| Panacea | Unspecified | Unspecified | 3.8+ | No | 8x GPU for inference | Yes |
| Alpamayo | Recent | Recent | 3.12 | No | 24GB | No |
| commaVQ | Any | Optional | 3.8+ | No | Minimal | No |

Quick Decision Guide

"I want to fine-tune a general world model for driving" --> NVIDIA Cosmos Predict 2.5. Largest pretrained model, best documentation, active support. Has AV-specific variants already.

"I want to generate realistic multi-view driving videos" --> OpenDWM. Widest dataset support, MIT license, active development, full training pipeline.

"I want to train an RL agent using a world model in simulation" --> CarDreamer (CARLA + DreamerV3) for driving-specific. DIAMOND for general diffusion world model RL.

"I want a world model that also does planning" --> Epona (autoregressive diffusion, ICCV 2025) or DiffusionDrive (truncated diffusion, CVPR 2025).

"I want 3D occupancy-based world modeling" --> OccWorld. Only option, but expect mmdet3d pain and unresponsive maintainers.

"I want street-view generation with 3D control for data augmentation" --> MagicDrive v1 (Apache-2.0) or Panacea (panoramic multi-view, weather control).

"I want to understand how a production world model works" --> comma.ai commaVQ + openpilot. Only shipping system. Dataset + basic model open, production training closed.

"I need a VLA reasoning model for trajectory prediction" --> NVIDIA Alpamayo 1.5 + AlpaSim. Non-commercial weights limit deployment.


Summary Statistics

| Category | Count | Repos |
|---|---|---|
| Fully usable today | 6 | Cosmos, OpenDWM, CarDreamer, DIAMOND, DiffusionDrive, Epona |
| Usable with effort | 6 | MagicDrive, Vista, OccWorld, Panacea, Alpamayo, commaVQ |
| Inference only | 3 | DrivingWorld, DriveDreamer-2, DriveDreamer |
| Paper-only / broken | 6 | DrivingGPT, Copilot4D, WorldDreamer, Drive-WM, Think2Drive, GenAD |

Top 3 Recommendations for Airside AV Research:

  1. NVIDIA Cosmos Predict 2.5 -- Fine-tune the foundation model on airport airside data. Best generalization, largest pretrained model, LoRA support for efficient adaptation.

  2. OpenDWM -- If you need multi-view driving video generation with layout control, this is the most complete toolkit. MIT license. Supports custom data pipelines.

  3. CarDreamer + DreamerV3 -- If your goal is training an RL policy using a learned world model in simulation, this is the most practical path. Directly integrates with CARLA for closed-loop training.

Research notes compiled from public sources.