Open-Source AV Ecosystem: Stacks, Perception, Planning, and Tools
Status: Research complete | Last updated: 2026-03-21
Table of Contents
- Autoware.Universe
- comma.ai openpilot
- NVIDIA DriveWorks / DRIVE SDK
- Perception Frameworks
- Planning Frameworks
- Labeling and Visualization Tools
- Simulation Platforms
- Airside Relevance Summary
1. Autoware.Universe
1.1 Overview and Deployment Status
Autoware is the world's leading open-source autonomous driving software project, celebrating its 10th anniversary in August 2025. It powers deployments across 500+ companies, 30+ vehicle types, and 20+ countries. Real-world use cases span autonomous shuttles/buses, robotaxis, low-speed vehicles, racing platforms, and private vehicle automation.
A landmark 2025-2026 deployment: TIER IV and Isuzu Motors announced Level 4 autonomous driving for Isuzu ERGA electric and diesel bus models using the NVIDIA DRIVE Hyperion computing platform with Autoware's open-source stack. This represents expansion into heavy-duty commercial vehicles with functional safety rigor.
1.2 ROS 2 Architecture
Autoware is built on Service-Oriented Architecture (SOA) principles implemented in ROS 2. It leverages ROS 2's DDS-based QoS model for reliable communication, structured lifecycle management, and fault containment across distributed compute nodes.
Communication patterns:
- Topic-based (pub/sub): Asynchronous many-to-many messaging for sensor streams and perception outputs
- Service-based: Synchronous request-response for configuration and queries
- Action-based: Goal-oriented interfaces for long-running planning/control tasks
Message architecture uses three standardized packages:
- autoware_msgs -- common cross-domain messages
- autoware_adapi_msgs -- Automated Driving API contracts
- autoware_internal_msgs -- internal component-specific protocols
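The topic-based pattern carries the sensor streams and perception outputs listed above. As a minimal sketch, a subscriber written with rclpy might look like the following; the topic name and the best-effort QoS choice are illustrative, not taken from the Autoware source tree.

```python
# Minimal sketch of ROS 2's topic-based (pub/sub) pattern using rclpy.
# The topic name below is illustrative, not an actual Autoware topic.
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile, ReliabilityPolicy
from sensor_msgs.msg import PointCloud2


class LidarListener(Node):
    def __init__(self):
        super().__init__("lidar_listener")
        # Sensor streams are commonly published with best-effort QoS.
        qos = QoSProfile(depth=5, reliability=ReliabilityPolicy.BEST_EFFORT)
        self.create_subscription(
            PointCloud2,
            "/sensing/lidar/concatenated/pointcloud",  # illustrative topic name
            self.on_cloud,
            qos,
        )

    def on_cloud(self, msg: PointCloud2) -> None:
        self.get_logger().info(f"cloud with {msg.width * msg.height} points")


def main():
    rclpy.init()
    rclpy.spin(LidarListener())


if __name__ == "__main__":
    main()
```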
Agnocast middleware (2025): TIER IV developed Agnocast, a true zero-copy IPC middleware that coexists with ROS 2. Unlike IceOryx, it supports zero-copy for all message types including unsized types (those using std::vector). Results: 16% improvement in average response time, 25% improvement in worst-case response time for PointCloud preprocessing, plus significant memory and CPU reductions. Paper accepted at ISORC 2025 and production version integrated into Autoware.
1.3 Module Architecture
The system is organized into six functional domains with clean interfaces:
Sensing
- LiDAR preprocessing and filtering (distortion correction, synchronization)
- Radar object adaptation
- IMU correction
- Camera drivers and preprocessing
Localization
- NDT Matching -- point cloud registration against HD maps
- EKF Localizer -- Extended Kalman Filter fusion of multiple modalities
- Eagleye -- GNSS/RTK-based positioning
- YabLoc -- vision-based localization
- Landmark-based -- AR tags, LiDAR markers
- Pose estimation and error monitoring
Perception
- 3D Object Detection: BEVFusion, CenterPoint, FRNet, TransFusion
- Multi-Object Tracking: ByteTrack
- Traffic Light: classification and detection
- Ground Segmentation: RANSAC, Ray, scan filters
- Occupancy Grid Mapping
- Clustering and merging utilities
Planning (hierarchical, modular)
- Mission Planning: route computation between waypoints
- Behavior Planning: lane changes, merges, intersections, avoidance, goal planning
- Velocity Planning: intersection, crosswalk, traffic light, stop line handling
- Motion Planning: trajectory optimization, sampling-based planning, velocity limiting
- Parking: freespace planning with RRT* algorithm
- Scenario Planning: context-aware decision selection
Control
- MPC (Model Predictive Control) lateral controller
- PID longitudinal controller
- Pure Pursuit trajectory follower
- Emergency braking and collision detection
- Vehicle Command Gate -- safety validation layer
Vehicle Interface
- Converts control signals for specific vehicle types via CAN bus
1.4 BEV Integration (Vision-First Perception)
MulticoreWare partnered with the Autoware Foundation to integrate two BEV models into the perception stack, signaling a shift from LiDAR-heavy to scalable vision-based perception:
BEVDet:
- Implements Lift-Splat-Shoot: lifts image features into 3D, splats into unified BEV grid, detects in top-down space
- Camera-only operation, no LiDAR dependency
- ROS 2 native integration, TensorRT 10.x optimization with FP16 mixed-precision
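The lift-splat pooling that BEVDet relies on can be illustrated with a toy numpy sketch: predict a depth distribution per pixel, take its outer product with the pixel's features, back-project into 3D, and scatter-add into a BEV grid. All shapes, bin counts, and camera parameters below are made up for clarity and are not BEVDet's actual configuration.

```python
# Toy numpy sketch of the Lift-Splat idea: lift per-pixel image features into a
# camera frustum weighted by a predicted depth distribution, then splat them
# into a top-down BEV grid. Shapes, intrinsics, and bin counts are illustrative.
import numpy as np

H, W, C, D = 8, 16, 32, 24                  # toy feature-map size, channels, depth bins
feat = np.random.rand(H, W, C)              # per-pixel image features (from a CNN backbone)
logits = np.random.rand(H, W, D)            # per-pixel depth-bin logits
depth_prob = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)   # softmax over bins

# Lift: outer product of depth distribution and features -> frustum features (H, W, D, C).
frustum = depth_prob[..., None] * feat[..., None, :]

# Back-project every (pixel, depth-bin) to 3D with a toy pinhole model.
fx, cx = 10.0, W / 2.0
depths = np.linspace(1.0, 40.0, D)          # metres assigned to each depth bin
us, _vs, ds = np.meshgrid(np.arange(W), np.arange(H), depths, indexing="xy")
xs = (us - cx) / fx * ds                    # lateral offset (right)
zs = ds                                     # forward range

# Splat: scatter-add frustum features into a top-down BEV grid (0.5 m cells, 50 m x 50 m).
res, size = 0.5, 100
bev = np.zeros((size, size, C))
ix = np.clip((zs / res).astype(int), 0, size - 1)
iy = np.clip((xs / res + size / 2).astype(int), 0, size - 1)
np.add.at(bev, (ix.reshape(-1), iy.reshape(-1)), frustum.reshape(-1, C))
print(bev.shape)                            # (100, 100, 32): ready for a BEV detection head
```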
BEVFormer:
- Adds temporal reasoning across multiple frames via spatial cross-attention and temporal self-attention
- Completely reimplemented in C++ for ROS 2 using ONNX-to-TensorRT workflows
- RViz visualization with real-time 3D bounding box rendering
1.5 End-to-End AI and World Model Integration Potential
Autoware is actively exploring end-to-end AI integration:
- Autoware.Flex (2024 paper): incorporates human natural-language instructions into ADS decision-making using LLMs. Validated in both simulation and real-world vehicles.
- AMD Silo AI collaboration: accelerating E2E model training and deployment using AMD Instinct/Radeon Pro GPUs.
- BrightDrive validation: testing E2E models (AutoSpeed, SceneSeg, DomainSeg) under real-world operational constraints, while Autoware serves as a fallback stack and benchmarking environment.
World model integration potential for airside: Autoware's modular architecture with standardized interfaces makes it straightforward to swap perception or planning modules with world-model-based alternatives. The BEV integration work demonstrates the pathway: implement as a ROS 2 node with TensorRT optimization, publish to standard autoware_msgs topics.
2. comma.ai openpilot
2.1 End-to-End Architecture
openpilot is an open-source operating system for robotics that currently upgrades ADAS on 300+ supported car models. It takes a fundamentally different approach from Autoware: rather than a modular pipeline, it uses end-to-end learned driving policies that directly map sensor inputs to steering and acceleration commands.
2.2 World Model Architecture
comma.ai's world model work, documented in their CVPR 2025 Workshop paper "Learning to Drive from a World Model," represents one of the most advanced deployed world models in autonomous driving.
DiT (Diffusion Transformer) configurations trained:
| Config | Parameters | Notes |
|---|---|---|
| gpt | 250M | Small |
| gpt-medium | 500M | Primary experiment model |
| gpt-large | 1B | Larger scale |
openpilot 0.11 world model (2B params, released March 2026):
- Architecture: n_layer=48, n_head=25, n_embd=1600
- Input: 2 seconds past context, 1 second future conditioning, 0-7 seconds simulation window at 5 fps
- Training data: 2.5M minutes of driving video
- Sampling: 15 Euler steps with Classifier-Free Guidance (strength 2.0)
- Throughput: 12.2 frames/sec/GPU
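The sampler described above (Euler steps with classifier-free guidance) follows a standard recipe; a generic sketch is given below. This is not comma.ai's code: the denoiser interface and the sigma schedule are assumptions.

```python
# Generic sketch of Euler sampling with classifier-free guidance (CFG), matching
# the figures above only in step count (15) and guidance strength (2.0).
# The denoiser callable, conditioning tensors, and noise schedule are placeholders.
import torch

def sample_euler_cfg(denoise, x, cond, uncond, steps=15, guidance=2.0,
                     sigma_max=80.0, sigma_min=0.02):
    """denoise(x, sigma, cond) should return a prediction of the clean latent."""
    sigmas = torch.linspace(sigma_max, sigma_min, steps + 1)
    for i in range(steps):
        sigma = sigmas[i]
        # Classifier-free guidance: extrapolate from the unconditional prediction
        # toward the conditional one by the guidance strength.
        d_cond = denoise(x, sigma, cond)
        d_uncond = denoise(x, sigma, uncond)
        denoised = d_uncond + guidance * (d_cond - d_uncond)
        # Euler step toward the next noise level.
        deriv = (x - denoised) / sigma
        x = x + deriv * (sigmas[i + 1] - sigma)
    return x
```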
Frame compression (VAE):
- Encoder ViT: 50M params
- Decoder ViT: 100M params
- Latent space: 32x16x32 (from 3x128x256 dual camera input)
- Compression: 8x8 spatial with 4 latent channels per image
- Training losses: LPIPS + adversarial + least squares error
- Masked Auto Encoder formulation achieves 2.7x better performance than baseline
2.3 Training from Real Data
Two simulation approaches for on-policy training:
Reprojective Simulation:
- Uses dense depth maps and 6-DOF pose to render new views by reprojecting 3D points
- Ships in openpilot since 0.8.15 (lateral) and 0.9.0 (longitudinal)
- Limitations: static scene assumption, depth errors, occlusion artifacts, lighting artifacts (especially at night), range limited to <4m translation
World Model Simulation:
- Completely end-to-end, general-purpose method that scales with compute
- Overcomes reprojective simulation limitations
- Future State Anchoring: provides future states at fixed timesteps so the model can recover from errors and converge predictions toward goal states
- Uses IMPALA-style distributed, asynchronous rollout data collection
Training data scales tested: 100k, 200k, and 400k segments (each 1 minute of driving at 5 Hz)
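Future state anchoring, as described above, amounts to periodically pinning a world-model rollout to known future states so the policy being trained can recover from the model's own prediction errors. A schematic sketch, with hypothetical function names, horizon, and anchor interval:

```python
# Schematic sketch of future state anchoring during a world-model rollout:
# at fixed timesteps the rollout is reset to known (logged) future states so
# prediction errors cannot compound indefinitely. The world_model and policy
# callables, the horizon, and the anchor interval are assumptions.
def rollout_with_anchoring(world_model, policy, start_state, anchors,
                           horizon=35, anchor_every=5):
    """anchors[t] holds the logged real state at timestep t (where available)."""
    state, trajectory = start_state, []
    for t in range(horizon):
        action = policy(state)
        state = world_model(state, action)      # predicted next state
        if t % anchor_every == 0 and t in anchors:
            state = anchors[t]                  # pin the rollout to a known future state
        trajectory.append((state, action))
    return trajectory
```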
2.4 CVPR 2025 Results and Real-World Deployment
The CVPR 2025 Workshop paper (arXiv 2504.19077) by Mitchell Goff, Greg Hogan, George Hotz et al. demonstrates:
- Both on-policy learning methods (reprojective and world model) successfully deployed in real-world openpilot
- Field deployment across 500 users over two months showed 29.92% engagement time and 52.49% engagement distance for the world-model-trained policy
openpilot 0.11 (March 2026) -- first model fully trained in learned simulation:
- First driving model trained using both videos AND plans from world model (0.10 used world model for plans only, reprojective sim for videos)
- Users on nightly now prefer Experimental mode (E2E longitudinal) over ACC policy
- Improved highway speed convergence (a persistent issue with previous simulator-trained models)
- Noticeably improved reactivity around parked cars
- Hardware: comma four idle power reduced 77% (225mW to 52mW)
2.5 What to Borrow for Airside
- On-policy training methodology -- training in learned simulation produces policies that handle real-world distribution shift better than off-policy approaches
- Future state anchoring -- technique for overcoming off-policy training limitations, directly applicable to world models for airside navigation
- VAE + DiT architecture -- the frame compression + diffusion transformer combo achieves good quality at practical inference speeds
- Data pipeline design -- 2.5M minutes of driving data processed into on-policy training; similar scale achievable from airport fleet data
- Incremental deployment -- started with reprojective sim (simpler, less compute), graduated to full world model sim; a pragmatic adoption path
- IMPALA-style distributed training -- asynchronous rollout collection scales well for fleet-based data collection
3. NVIDIA DriveWorks / DRIVE SDK
3.1 DriveWorks SDK
NVIDIA DriveWorks SDK is middleware for autonomous vehicle software development, part of the larger NVIDIA DRIVE platform. It provides:
- Sensor Abstraction Layer (SAL): unified interface for cameras, LiDARs, radars, GPS, IMU
- Sensor Plugins: extensible sensor integration
- DNN Framework: infrastructure to optimize pre-trained DNNs with TensorRT, prepare input data, perform inference, and post-process results
- Data Recorder: for sensor data logging
- Vehicle I/O Support: CAN bus and vehicle interface
- Calibration Tools: multi-sensor extrinsic/intrinsic calibration
The DNN module specifically takes pre-trained networks, optimizes them with TensorRT (FP32, FP16, INT8), and handles input preparation and output post-processing. The SDK abstracts low-level GPU programming so developers can focus on algorithms.
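DriveWorks' DNN module is a C/C++ API; to give a feel for the optimization step it wraps, here is a sketch using the standalone TensorRT Python API instead (not DriveWorks itself). The ONNX input and file names are placeholders.

```python
# Not DriveWorks code: a standalone-TensorRT sketch of the same optimization step
# (import a pre-trained network from ONNX, build an FP16 engine, serialize it).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)              # explicit-batch network (the modern default)
parser = trt.OnnxParser(network, logger)

with open("detector.onnx", "rb") as f:           # placeholder pre-trained model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)            # FP16; INT8 would additionally need calibration
engine = builder.build_serialized_network(network, config)
with open("detector.engine", "wb") as f:
    f.write(engine)
```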
Design philosophy: open and modular -- developers pick and choose components.
3.2 NVIDIA DRIVE AGX Thor
The latest NVIDIA DRIVE platform hardware, featuring:
- Next-generation SoC for autonomous vehicles
- Supports the full DRIVE software stack
- TIER IV's Autoware-based solutions run on DRIVE Hyperion computing platform
3.3 Cosmos World Foundation Models
NVIDIA Cosmos is a platform of generative world foundation models for physical AI development:
Model families:
- Cosmos Predict: generates future world states from sensor context
- Cosmos Transfer 2.5: high-fidelity world-to-world style transfer (relight, change weather/lighting/terrain conditions across multiple cameras)
- Cosmos Reason 2: reasoning model for physical AI, foundation for Alpamayo
Timeline:
- January 2025: initial Cosmos launch at CES
- March 2025: major release with reasoning capabilities
- December 2025: Cosmos-Predict2.5-2B with Diffusers support, Image2Image for Transfer 2.5
- February 2026: enhanced compute support and quantization for Cosmos Reason 2
Adopters: 1X, Agility Robotics, Figure AI, Foretellix, Skild AI, Uber, Nexar, Oxa
Integration with CARLA: the latest CARLA release includes Cosmos Transfer APIs for generating scene variants with diverse weather/lighting conditions, plus 40,000 clips generated through Cosmos.
3.4 Alpamayo VLA Model
Alpamayo is NVIDIA's Vision-Language-Action model for autonomous driving:
Architecture:
- 8.2B-parameter Cosmos Reason backbone + 2.3B-parameter action expert (roughly 10.5B parameters total)
- Diffusion-based trajectory decoder
- Chain-of-Causation reasoning with trajectory planning
Inputs:
- Multi-camera images (4 cameras: front-wide, front-tele, cross-left, cross-right)
- 0.4 second history window at 10Hz
- 3D translation and 9D rotation egomotion history
- Navigation context and text prompts
Training data:
- NVIDIA Physical AI Open Dataset: 1,727 hours of driving data (100 TB)
- 360-degree coverage from 7 cameras, LiDAR, up to 10 radars
- 25 countries
Capabilities:
- Generates human-readable reasoning traces explaining driving decisions
- Functions as a teacher model for distillation into smaller, deployable models (not intended to run directly in-vehicle)
- Supports fine-tuning and post-training with custom data
AlpaSim: Python-based closed-loop simulation testbed with microservice-based architecture, modular APIs, and pipeline parallelism for evaluation
Requirements: minimum 24 GB VRAM, optimized for NVIDIA GPUs
Resources: code on GitHub (NVlabs/alpamayo), weights on HuggingFace, post-training scripts, inference notebooks
3.5 Relevance for Airside
- Cosmos Transfer: generate synthetic airside training data with varied weather/lighting from limited real captures
- Alpamayo as teacher: distill airport-specific driving behaviors from fine-tuned Alpamayo into smaller models that run on edge compute
- DriveWorks DNN inference: TensorRT optimization pipeline directly applicable for deploying custom perception models on NVIDIA hardware
- NuRec: reconstruct 3D scenes from fleet sensor data for digital twin simulation
4. Perception Frameworks
4.1 MMDetection3D
Repository: open-mmlab/mmdetection3d | Latest version: v1.4.0 (August 2024) | License: Apache 2.0
Supported models (partial list):
| Category | Models |
|---|---|
| LiDAR Outdoor | SECOND, PointPillars, SSN, 3DSSD, SA-SSD, PointRCNN, Part-A2, CenterPoint, PV-RCNN, CenterFormer |
| LiDAR Indoor | VoteNet, H3DNet, Group-Free-3D, FCAF3D, TR3D |
| Camera Outdoor | ImVoxelNet, SMOKE, FCOS3D, PGD, MonoFlex, DETR3D, PETR |
| Multi-Modal | MVXNet, BEVFusion, ImVoteNet |
| 3D Segmentation | MinkUNet, SPVCNN, Cylinder3D, TPVFormer, PointNet++, PAConv, DGCNN |
Supported datasets: KITTI, Waymo, nuScenes, Lyft (outdoor); ScanNet, SUN RGB-D (indoor); SemanticKITTI, S3DIS (segmentation)
Key features:
- 300+ models and methods from 40+ papers via MMDetection integration
- Modular design: encoders, backbones, necks, and heads can be configured independently
- Multi-modal detection combining LiDAR + camera
- Custom dataset support via KITTI-format conversion or native data pipeline customization
- Dataset wrappers: RepeatDataset, ClassBalancedDataset, ConcatDataset for training distribution control
Custom dataset training:
- Convert data to KITTI format (easiest path) with ImageSets, calib, image_2, velodyne, label_2 directories
- Or use native data pipeline customization with custom dataset class
- Support for custom class definitions in config files
Custom LiDAR support: any LiDAR producing point clouds can be used -- convert scans to the KITTI .bin format (x, y, z, intensity per point) or define a custom data loader. The 3D box definition is unified as (x, y, z, dx, dy, dz, heading).
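A minimal conversion sketch for the KITTI .bin path is shown below; the input file layout and the intensity normalization are assumptions about a particular sensor export.

```python
# Minimal sketch of converting an arbitrary LiDAR point cloud to the KITTI .bin
# layout used by these frameworks: float32 rows of (x, y, z, intensity).
# The CSV input layout and intensity scaling are assumptions about your export.
import numpy as np

def to_kitti_bin(points_xyz: np.ndarray, intensity: np.ndarray, out_path: str) -> None:
    """points_xyz: (N, 3) in metres; intensity: (N,) scaled to [0, 1]."""
    cloud = np.hstack([points_xyz, intensity[:, None]]).astype(np.float32)
    cloud.tofile(out_path)                       # raw float32, no header

# Example: a CSV export with columns x, y, z, intensity (assumed layout).
raw = np.loadtxt("scan_000001.csv", delimiter=",", dtype=np.float32)
to_kitti_bin(raw[:, :3], raw[:, 3] / max(raw[:, 3].max(), 1e-6), "velodyne/000001.bin")
```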
4.2 OpenPCDet
Repository: open-mmlab/OpenPCDet | License: Apache 2.0
Supported models:
- SECOND, PointPillars, PointRCNN, Part-A2, PV-RCNN, PV-RCNN++
- CenterPoint, Voxel R-CNN, VoxelNeXt
- DSVT (state-of-the-art on Waymo, real-time at 27Hz with TensorRT)
- TransFusion-Lidar (69.43% NDS on nuScenes validation)
- BEVFusion (70.98% NDS on nuScenes validation)
Key differentiator: data-model separation with unified point cloud coordinate system for easy custom dataset extension. Uses (x, y, z, dx, dy, dz, heading) box definition.
Custom dataset support:
- Dedicated custom_dataset.py and custom_dataset.yaml templates
- Official CUSTOM_DATASET_TUTORIAL.md with step-by-step guide
- Shared memory support (USE_SHARED_MEMORY) for training I/O optimization
Pre-trained models: available in model zoo for KITTI, Waymo, nuScenes datasets
4.3 Det3D
Repository: V2AI/Det3D | License: Apache 2.0
Supported models: PointPillars, SECOND, PIXOR (with VoteNet and STD planned)
Supported datasets: KITTI, nuScenes, Waymo
Status: Det3D was historically the first general-purpose 3D detection codebase but has seen less active development than MMDetection3D and OpenPCDet. Its Multi-Group Head implementation for SECOND differs from other codebases, so direct speed comparisons are not meaningful.
4.4 Framework Comparison
| Feature | MMDetection3D | OpenPCDet | Det3D |
|---|---|---|---|
| Model count | 20+ native, 300+ via MM ecosystem | 12+ | 3 native |
| Dataset support | KITTI, Waymo, nuScenes, Lyft, ScanNet, SUN RGB-D | KITTI, Waymo, nuScenes | KITTI, nuScenes, Waymo |
| Multi-modal | Yes (BEVFusion, MVXNet) | Yes (BEVFusion, TransFusion) | No |
| Custom dataset | KITTI-format or native | Dedicated tutorial + template | Manual |
| Indoor detection | Yes (VoteNet, FCAF3D) | No | No |
| Segmentation | Yes (MinkUNet, Cylinder3D) | No | No |
| TensorRT deploy | Via MMDeploy | Community scripts | Limited |
| Active maintenance | Yes | Yes | Minimal |
| Best for | Breadth, multi-modal, research | LiDAR-focused, production | Legacy |
Recommendation for airside: Start with OpenPCDet for LiDAR-focused detection (simpler setup, good custom dataset support, DSVT achieves real-time with TensorRT). Use MMDetection3D if multi-modal fusion (LiDAR + camera) or BEV-based detection is needed. Both support adding custom classes (GSE, aircraft, tow tractors, etc.) through config modification and dataset annotation.
4.5 Adding Custom Classes
Both MMDetection3D and OpenPCDet support custom classes through:
- Annotation: label your airside objects (GSE, aircraft, baggage carts, etc.) with 3D bounding boxes in KITTI or nuScenes format
- Config modification: update class lists in dataset config YAML
- Training: either train from scratch or fine-tune from a pre-trained checkpoint (recommended for faster convergence)
- Pre-trained backbone transfer: use a checkpoint trained on nuScenes/Waymo car/pedestrian/cyclist, freeze the backbone, and train only the detection head on airside classes
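As a sketch of the config-modification step, an MMDetection3D-style Python config fragment for custom airside classes might look like the following. The base config name, class names, and checkpoint path are illustrative, and the exact override keys (especially for anchor-based heads) depend on the model config you inherit from.

```python
# Illustrative MMDetection3D-style config fragment for custom airside classes.
# Base config, class names, and paths are placeholders, not a tested recipe.
_base_ = ['pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py']   # example base config

class_names = ['Aircraft', 'GSE', 'BaggageCart', 'Person']          # airside classes (illustrative)
metainfo = dict(classes=class_names)

# Point the dataloaders at the new class metadata.
train_dataloader = dict(dataset=dict(metainfo=metainfo))
val_dataloader = dict(dataset=dict(metainfo=metainfo))

# The detection head must match the new class count.
model = dict(bbox_head=dict(num_classes=len(class_names)))

# Fine-tune from an existing checkpoint rather than training from scratch.
load_from = 'checkpoints/pointpillars_kitti_3class.pth'              # placeholder path
```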
5. Planning Frameworks
5.1 CommonRoad
Website: commonroad.in.tum.de | Origin: Technical University of Munich
CommonRoad is a collection of composable benchmarks for motion planning on roads.
Core components:
- Scenario Database: continuously growing collection of naturalistic, handcrafted, and auto-generated scenarios in XML format (real traffic recordings + dangerous edge cases)
- Drivability Checker: validates trajectory feasibility
- Route Planner: high-level route computation
- Criticality Measurement Toolbox: evaluates scenario safety
- Vehicle Dynamics Models: multiple fidelity levels from kinematic bicycle to multi-body
Evaluation dimensions: efficiency, safety, comfort, and traffic rule compliance
Annual Competition: the 4th CommonRoad Motion Planning Competition (2024) used 500+ scenarios including interactive and non-interactive settings, and provides an open-source, reproducible benchmarking framework.
Scenario types: highway and urban environments with diverse participants (cars, buses, bicycles)
Airside relevance: CommonRoad's scenario-based approach maps well to airside planning. Its drivability checker and criticality measurement tools can be adapted for taxiway/ramp scenarios. However, it lacks native support for non-road environments and would require scenario format adaptation.
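For reference, loading a CommonRoad scenario with the commonroad-io Python package takes only a few lines; the scenario file name below is a placeholder, and an airside adaptation would first require taxiway/apron geometry expressed in the lanelet-based scenario format.

```python
# Minimal sketch of loading a CommonRoad XML scenario with commonroad-io
# (pip install commonroad-io). The file name is a placeholder.
from commonroad.common.file_reader import CommonRoadFileReader

scenario, planning_problems = CommonRoadFileReader("example_scenario.xml").open()
print(f"{len(scenario.lanelet_network.lanelets)} lanelets, "
      f"{len(scenario.dynamic_obstacles)} dynamic obstacles, dt={scenario.dt}s")
```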
5.2 nuPlan
Origin: Motional (now part of Hyundai Motor Group)
nuPlan is the world's first real-world autonomous driving dataset and benchmark specifically for motion planning.
Dataset:
- 1,282 hours of diverse driving scenarios
- 4 cities: Las Vegas, Boston, Pittsburgh, Singapore
- High-quality auto-labeled object tracks and traffic light data
- Modular simulation framework initialized with real-world observations
Simulation architecture:
- Agent model predicts future trajectories of all agents
- Planner predicts best route for ego vehicle
- Controller converts intended route into feasible trajectory
- Supports both open-loop (playback) and closed-loop (reactive) evaluation
nuPlan-R (2025): reactive closed-loop extension that replaces rule-based IDM agents with noise-decoupled diffusion-based reactive agents:
- Learning-based reactive agents as configurable simulation components
- Interaction-aware agent selection mechanism
- Extended metrics: Success Rate (SR) and All-Core Pass Rate (PR)
- More realistic, diverse, and human-like traffic behaviors
Airside relevance: nuPlan's closed-loop simulation framework is the gold standard for planner evaluation. Its reactive agent models could be adapted for simulating airside traffic (aircraft, GSE, personnel). The evaluation metrics framework is directly applicable.
5.3 Diffusion-Based Planners
Diffusion Planner (ICLR 2025 Oral):
- Transformer-based diffusion model for closed-loop planning
- Joint modeling of prediction and planning under same architecture
- Classifier guidance for safe and adaptable planning behaviors
- Achieves state-of-the-art closed-loop performance on nuPlan
- Validated on 200-hour delivery-vehicle dataset with robust style transfer
- GitHub: ZhengYinan-AIR/Diffusion-Planner
DiffusionDrive (CVPR 2025 Highlight):
- Truncated diffusion policy with multi-mode anchors
- 10x reduction in denoising steps (2 steps vs 20+ for vanilla diffusion)
- 3.5 points higher PDMS on the NAVSIM benchmark
- 64% higher mode diversity score
- Real-time capable for end-to-end autonomous driving
- GitHub: hustvl/DiffusionDrive
CoPlanner (2025):
- Interactive motion planner with contingency-aware diffusion
- Models multi-agent interactions with contingency planning
Airside relevance: Diffusion planners are particularly promising for airside because they naturally handle multi-modal action distributions (e.g., yield or proceed, multiple valid parking paths). DiffusionDrive's 2-step inference is practical for real-time deployment. The joint prediction-planning architecture of Diffusion Planner aligns with the world-model approach where prediction and planning are unified.
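The truncated-diffusion idea behind DiffusionDrive (denoise from multi-mode anchor trajectories plus a little noise rather than from pure noise) can be sketched generically as below; this is not the DiffusionDrive implementation, and the denoiser, scorer, step count, and schedule are placeholders.

```python
# Generic sketch of truncated diffusion from multi-mode anchors: start denoising
# from anchor trajectories with a small amount of added noise and run only a
# couple of denoising steps. Denoiser, scorer, and schedule are placeholders.
import torch

def plan_truncated_diffusion(denoiser, scorer, anchors, scene_feat,
                             steps=2, sigma_start=0.5):
    """anchors: (M, T, 2) candidate trajectories; returns the best denoised plan."""
    sigmas = torch.linspace(sigma_start, 0.0, steps + 1)
    trajs = anchors + sigma_start * torch.randn_like(anchors)    # truncated starting point
    for i in range(steps):
        denoised = denoiser(trajs, sigmas[i], scene_feat)        # predict clean trajectories
        # Move each sample toward its denoised estimate (DDIM-like update).
        ratio = sigmas[i + 1] / max(sigmas[i].item(), 1e-6)
        trajs = denoised + ratio * (trajs - denoised)
    scores = scorer(trajs, scene_feat)                           # rank the candidate modes
    return trajs[scores.argmax()]
```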
5.4 CARLA Leaderboard
The CARLA Autonomous Driving Challenge provides standardized benchmarks for planning and driving agents:
- Leaderboard 2.0/2.1: updated March 2025 with modified infraction scoring
- 2024 winners: top entries used learning-based approaches (Team Tuebingen_AI's TransFuser++ took 2nd place in the CVPR 2024 challenge)
- 2025 challenge: not running due to unforeseen circumstances, but leaderboard platform remains available for local evaluation
- Evaluation criteria: driving score combining route completion and infraction penalty
6. Labeling and Visualization Tools
6.1 CVAT (Computer Vision Annotation Tool)
Website: cvat.ai | License: MIT (self-hosted) / SaaS available | Strengths: computer vision specialist, especially strong for video
3D Point Cloud capabilities:
- Native rendering and annotation of raw 3D LiDAR data
- 3D cuboid/bounding box annotation
- Synchronized multi-sensor views (camera + LiDAR)
- Object tracking and interpolation between frames
- Semantic segmentation support
2025 improvements:
- Double-click to center camera on objects (speeds up small object annotation)
- Standardized zoom across all annotation modes including 3D side views
- Persistent zoom levels between object switches
- ML-assisted annotation with model backends
Best for: teams focused on vision and LiDAR annotation with strong tracking needs
6.2 Label Studio
Website: labelstud.io | License: Apache 2.0 (open source)
Capabilities:
- Multi-modality: text, audio, images, video, time series, 3D
- Flexible UI customization via templating system
- ML backend integration for active learning
- 3D tasks through extensions
- Cloud-friendly deployment
Best for: teams needing flexible, multi-modality labeling with custom UIs and MLOps integration
Comparison with CVAT: CVAT is stronger for dedicated 3D point cloud work and video annotation (tracking, interpolation, segmentation assist). Label Studio is more flexible for mixed data types and custom labeling workflows.
6.3 Scalabel
Website: scalabel.ai | License: BSD
Capabilities:
- Scalable web-based annotation tool
- 3D bounding box annotation with four-click method
- Object tracking and interpolation between frames
- Lane marking annotation
- Drivable area annotation
- LiDAR point cloud support
Best for: research teams needing a lightweight, web-based annotation platform with basic 3D capabilities
6.4 Foxglove
Website: foxglove.dev | License: Source-available (MPL 2.0 for Studio, commercial for platform features)
Capabilities:
- Full-platform visualization and observability for robotics
- Native ROS 1/2 and MCAP support
- Real-time and recorded data playback
- 3D scene visualization with point clouds, transforms, markers
- Organization-wide search and collaboration features
- Fleet operations and data management
- Extensions SDK (React/TypeScript) for custom visualizations
- Plugin-based extensibility with online registry
MCAP format: Foxglove with MCAP is positioned as a leading solution for multimodal data management in physical AI
Best for: production robotics teams needing fleet-scale data management, visualization, and collaboration
6.5 Rerun
Website: rerun.io | License: Apache 2.0 / MIT dual license | Funding: $20.2M total (including $17M seed in March 2025)
Capabilities:
- Open-source SDK for visualizing multimodal time-varying data
- SDKs for Rust, C++, and Python
- Interactive 2D/3D visualization: images, point clouds, coordinate frames, bounding boxes
- Time-aware: synchronized playback across multiple data streams
- Desktop or browser-based viewer
- Live streaming or from-disk replay
- MCAP format support (with ROS 2 reflection-based support in 0.26)
Recent updates (2025-2026):
- v0.27: experimental coordinate frame hierarchies, blueprint controls for 3D views
- v0.26: major performance improvements, reflection-based ROS 2 MCAP support
- Active development toward embedded/web-based deployment
Best for: engineering teams wanting a code-first, open-source visualization SDK with minimal setup. Particularly strong for debugging computer vision and ML pipelines.
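Rerun's code-first workflow is a few lines of Python; the entity paths and the synthetic point cloud below are illustrative.

```python
# Minimal sketch of Rerun's code-first logging API (rerun-sdk on PyPI).
# Entity paths and the fake point cloud are illustrative.
import numpy as np
import rerun as rr

rr.init("airside_debug", spawn=True)                   # spawn a local viewer window

points = np.random.uniform(-20, 20, size=(1000, 3))    # stand-in for a LiDAR sweep
rr.log("world/lidar", rr.Points3D(points, radii=0.05))
rr.log("world/ego", rr.Boxes3D(centers=[[0.0, 0.0, 0.0]],
                               half_sizes=[[2.0, 1.0, 0.8]]))
```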
6.6 Tool Comparison for Airside
| Tool | 3D LiDAR | Tracking | ROS 2 | Self-hosted | Cost | Best Airside Use |
|---|---|---|---|---|---|---|
| CVAT | Native cuboids | Yes | No | Yes | Free/SaaS | Annotation pipeline |
| Label Studio | Via extensions | Limited | No | Yes | Free | Multi-modal labeling |
| Scalabel | Basic | Yes | No | Yes | Free | Quick annotation |
| Foxglove | Visualization | No | Native | Yes/SaaS | Free/Paid | Fleet data review, debugging |
| Rerun | Visualization | No | Experimental | Yes | Free | ML pipeline debugging |
Recommended stack for airside:
- Annotation: CVAT for 3D point cloud + camera annotation (best native 3D support, tracking interpolation)
- Visualization/Debugging: Foxglove for ROS 2 bag review and fleet data; Rerun for ML pipeline development
- Data management: Foxglove + MCAP for fleet-scale recorded data
7. Simulation Platforms
7.1 CARLA
Website: carla.org | License: MIT | Engine: Unreal Engine 5.5 (ue5-dev branch) | Latest: v0.10.0
Key features:
- Open-source simulator for autonomous driving research
- 150,000+ active developers worldwide
- Native ROS 2 support compiled directly into server (as of 0.10.0)
- Python API for full scenario control
- Multi-agent traffic simulation
- Weather and lighting variation
- Multiple sensor models (camera, LiDAR, radar, GNSS, IMU)
- Digital Twins from OpenStreetMap data (v0.1)
- New maps: Town13 (large), Town15
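Scripting a scenario through the Python API is straightforward; the sketch below connects to a local server, varies the weather, and spawns one autopiloted vehicle. Host, port, and blueprint choices are the usual defaults, and an airside scenario would instead spawn custom aircraft/GSE assets built in the UE5 editor.

```python
# Minimal sketch of CARLA's Python API: connect, set weather, spawn one vehicle.
# Host/port and blueprint IDs are the usual defaults; airside scenarios would
# use custom assets and maps instead.
import random
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Vary weather and lighting programmatically.
world.set_weather(carla.WeatherParameters(cloudiness=80.0, precipitation=30.0,
                                          fog_density=10.0, sun_altitude_angle=15.0))

# Spawn a vehicle at a random predefined spawn point and hand it to the traffic manager.
bp = world.get_blueprint_library().filter("vehicle.*")[0]
spawn_point = random.choice(world.get_map().get_spawn_points())
vehicle = world.spawn_actor(bp, spawn_point)
vehicle.set_autopilot(True)
```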
UE5 features:
- Chaos physics engine
- Nanite/Lumen for high-fidelity rendering
- Requires Ubuntu 22.04+ or Windows 11+
2025-2026 NVIDIA integrations:
- Cosmos Transfer APIs: generate scene variants with diverse weather/lighting
- NuRec tools: neural reconstruction and rendering for high-fidelity simulation
- Physical AI Dataset: 40,000 clips generated through Cosmos
- Sample reconstructed scenes for neural rendering
Leaderboard: 2.1 version since March 2025 with modified infraction scoring. Challenge not running in 2025, but platform available for local evaluation.
7.2 AWSIM
Website: github.com/tier4/AWSIM | License: Apache 2.0 | Engine: Unity3D
Key features:
- Built specifically for Autoware integration (works out-of-the-box)
- Sensor simulation: IMU, GNSS, RGB camera, various LiDARs
- GPU LiDAR: high-performance simulation via Robotec GPU Lidar (RGL)
- Digital twin: Shinjuku Tokyo area included with Lanelet2 and point cloud maps
- Traffic: NPC vehicles respecting traffic lights and right of way, configurable density and spawn points
- Communication: ROS2ForUnity with higher throughput and lower latency than bridge solutions
- No API support (unlike CARLA)
2025 updates:
- Rain and fog simulation
- GPU skinning
- Multi-return LiDAR modes
- v2.0.0: major refactoring for improved scalability
AWSIM-Labs: Autoware Foundation maintained fork for self-driving research
D-AWSIM (2025 paper): distributed version for dynamic map generation framework
7.3 NVIDIA Isaac Sim
Website: developer.nvidia.com/isaac/sim | License: Free for individual developers (NVIDIA Omniverse license) | Engine: NVIDIA Omniverse (USD-based)
Key features (Isaac Sim 5.0):
- Open-source and fully customizable (available on GitHub)
- Neural reconstruction and rendering
- Advanced synthetic data generation pipelines
- MobilityGen: generates diverse physics-based data for autonomous robots/vehicles
- New OmniSensor USD schema for improved sensor simulation
- Software-in-the-Loop (SIL) testing
- Domain randomization for training data generation
- Physically accurate sensor models
Primary focus: general robotics simulation (manipulators, mobile robots, humanoids), not specifically autonomous driving
AV-relevant capabilities:
- High-fidelity sensor simulation (cameras, LiDAR, radar, ultrasonic)
- Digital twin creation from real-world data
- Cosmos world foundation model integration
- FMU integration for vehicle dynamics
Limitation for AV: not purpose-built for driving scenarios. NVIDIA has DRIVE Sim (not fully open) for production AV simulation; Isaac Sim is more general-purpose robotics.
7.4 Comparison for Airside Use
| Feature | CARLA | AWSIM | Isaac Sim |
|---|---|---|---|
| Engine | UE5.5 | Unity3D | Omniverse |
| License | MIT | Apache 2.0 | Free/Commercial |
| Primary use | AV research | Autoware testing | General robotics |
| ROS 2 | Native (0.10+) | Native | Via bridge |
| API | Full Python API | No API | USD/Python API |
| LiDAR sim | Built-in | GPU-accelerated (RGL) | Omniverse RTX |
| Custom environments | UE5 editor | Unity editor | USD/Omniverse |
| Traffic sim | Yes (multi-agent) | Yes (configurable NPCs) | Limited |
| Weather/lighting | Yes + Cosmos | Rain/fog (2025) | Domain randomization |
| Digital twins | OpenStreetMap import | Shinjuku included | USD scene import |
| Neural rendering | NuRec integration | No | Built-in |
| Cosmos integration | Yes (Transfer APIs) | No | Yes |
| Community size | 150,000+ | Autoware ecosystem | Broad NVIDIA ecosystem |
| Airside fit | Best general choice | Best if using Autoware | Best for custom physics |
Recommendation for airside simulation:
Short-term: CARLA is the best starting point. It has:
- Largest community and most resources
- Full API for scripting custom airside scenarios (aircraft, GSE, personnel)
- Native ROS 2 for integration with Autoware or custom stacks
- Cosmos/NuRec integration for neural rendering and scene variation
- UE5 editor for creating custom airport environments
For Autoware integration: AWSIM is the natural choice if the production stack is Autoware-based, as it works out-of-the-box with Autoware's full pipeline.
For synthetic data generation: Isaac Sim excels at generating diverse, physically accurate training data through MobilityGen and domain randomization. Best used to supplement CARLA-based scenario testing.
Key gap: none of these simulators ship with airport/airside environments. Custom environment creation is required regardless of choice. CARLA's UE5 editor is the most mature for custom environment authoring. For airside, you need to model:
- Taxiways, aprons, and ramp areas (non-road surfaces)
- Aircraft (parked and taxiing)
- Ground support equipment (tugs, belt loaders, fuel trucks, baggage carts)
- Personnel (marshalling, ramp workers)
- Jet bridges and terminal buildings
- Airport-specific signage and markings
8. Airside Relevance Summary
Architecture Decision
| Approach | Stack | When to Use |
|---|---|---|
| Modular pipeline | Autoware.Universe | Production deployment, safety certification, multi-sensor fusion, gradual component upgrades |
| End-to-end learned | openpilot-style | R&D, world model research, rapid iteration, data-driven improvement |
| Hybrid | Autoware + E2E modules | Best of both: use Autoware pipeline for safety-critical control, swap perception/planning with learned models |
Recommended Tool Chain for Airside AV Development
Data Collection: ROS 2 bags from fleet
|
Annotation: CVAT (3D LiDAR + camera cuboids)
|
Perception: OpenPCDet or MMDetection3D (custom airside classes)
|
World Model: Custom DiT (borrow comma.ai architecture)
+ Cosmos Transfer (synthetic data augmentation)
|
Planning: Diffusion Planner (multi-modal, handles airside ambiguity)
+ CommonRoad (scenario-based safety validation)
|
Simulation: CARLA (custom airport environment)
+ AWSIM (if Autoware-based)
|
Visualization: Foxglove (fleet data review)
+ Rerun (ML pipeline debugging)
|
Deployment: Autoware.Universe (modular, certifiable)
+ TensorRT (NVIDIA edge inference)
Key Takeaways
Autoware.Universe provides the most production-ready open-source AV stack with strong ROS 2 architecture, active BEV perception integration, and Agnocast for real-time performance. Its modular design allows swapping in world-model-based components.
comma.ai's world model is the most advanced deployed learned driving simulator. The 2B DiT architecture, future state anchoring, and on-policy training methodology are directly applicable to airside world models. The progression from reprojective sim to full world model sim provides a practical adoption roadmap.
NVIDIA's ecosystem (Cosmos + Alpamayo + DriveWorks) provides the most comprehensive toolchain for going from data to deployed model, with Cosmos enabling synthetic data amplification and Alpamayo serving as a teacher model for knowledge distillation.
Perception: OpenPCDet for LiDAR-first development with custom airside classes; MMDetection3D for multi-modal fusion. Both support custom datasets with pre-trained backbone transfer.
Planning: Diffusion-based planners (ICLR/CVPR 2025) represent the state of the art, with natural multi-modality handling that suits airside's ambiguous scenarios. nuPlan provides the evaluation framework standard.
Simulation: CARLA with NVIDIA Cosmos integration is the strongest platform but requires custom airport environment development. No open-source airside simulation environments exist.
Tooling: CVAT for annotation, Foxglove for fleet data management, Rerun for ML debugging -- all open source and self-hostable.
Sources
Autoware
- Autoware Overview
- Autoware System Architecture (DeepWiki)
- BEVDet to BEVFormer in Autoware
- Autoware Universe Documentation
- Autoware Centers of Excellence August 2025 Update
- Autoware.Flex Paper
- Advancing Open Source E2E AI for AD at Scale
- Agnocast: Zero-Copy IPC for ROS 2
- Agnocast Paper (ISORC 2025)
comma.ai
- Learning to Drive from a World Model (CVPR 2025)
- Learning to Drive from a World Model (Blog)
- openpilot 0.11 Release
- openpilot 0.10.1 Release
- openpilot GitHub
NVIDIA
- NVIDIA Alpamayo Developer Page
- Alpamayo-R1 on HuggingFace
- Alpamayo-R1 Paper
- NVIDIA Cosmos Platform
- Cosmos Major Release Announcement (March 2025)
- DriveWorks DNN Reference
- Accelerating AV Simulation with Neural Reconstruction
- Isaac Sim 5.0 Announcement
Perception Frameworks
- MMDetection3D GitHub
- MMDetection3D Models (DeepWiki)
- OpenPCDet GitHub
- OpenPCDet Custom Dataset Tutorial
- Det3D GitHub
Planning Frameworks
- CommonRoad
- CommonRoad 2024 Competition Results
- nuPlan Benchmark
- nuPlan-R: Reactive Closed-Loop Benchmark
- Diffusion Planner (ICLR 2025 Oral)
- DiffusionDrive (CVPR 2025 Highlight)