Augmenting a Frenet-Frame Trajectory Planner with Learned World Model Cost Functions

Executive Summary

This report details a practical architecture for augmenting a classical Frenet-frame trajectory planner (Werling et al., 2010) with cost functions derived from a learned world model. The core idea: keep the proven, certifiable sampling-based planner that generates candidate trajectories, but add neural cost terms — occupancy collision cost, hazard proximity cost, predicted smoothness — evaluated by running each candidate through a world model. The classical planner remains the safety backbone. The world model provides richer scene understanding that hand-tuned heuristics cannot capture. This document covers the full stack from mathematical foundations through GPU inference engineering, ROS2 integration, safety constraints, and the eventual migration path toward fully learned planning.


1. Frenet Planner Fundamentals

1.1 The Werling (2010) Formulation

The seminal paper "Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame" (Werling, Ziegler, Kammel, Thrun — ICRA 2010) introduced the dominant sampling-based planning paradigm for structured roads. The key insight: decompose vehicle motion into a longitudinal component along a reference path (arc-length coordinate s) and a lateral component perpendicular to it (offset coordinate d). In this curvilinear Frenet frame, highway-like planning problems become two independent 1D optimization problems.

Coordinate system. Given a reference path (road centerline), any position can be described by (s, d) where s is the distance traveled along the reference and d is the signed perpendicular offset. The Frenet-Serret frame provides local tangent and normal vectors at each point along the reference, enabling smooth coordinate transforms even on curved roads.

Why Frenet works for structured environments. Airport airside roads, taxiways, and service lanes are highly structured — well-defined lanes, known geometry, predictable curvature. This is the ideal operating domain for Frenet-frame planning, as the road structure provides natural reference paths and the operational design domain constrains the lateral offset range.

1.2 Trajectory Generation via Quintic Polynomials

Each candidate trajectory is generated by connecting the current vehicle state to a sampled terminal state using polynomial functions that minimize jerk (the derivative of acceleration).

Lateral trajectory generation. For high-speed mode, lateral motion d(t) is represented as a quintic (5th-order) polynomial:

d(t) = a0 + a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5

The six coefficients are determined by six boundary conditions:

  • Initial: d(0) = d0, d'(0) = d0_dot, d''(0) = d0_ddot
  • Terminal: d(T) = d1, d'(T) = 0, d''(T) = 0

Terminal velocity and acceleration are set to zero (smooth lane centering). This formulation inherently minimizes the integral of squared jerk, producing comfortable trajectories.
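The six boundary conditions reduce to a small linear solve for the top three coefficients. A minimal sketch with NumPy (the function name and argument layout are illustrative, not from a specific library):

```python
import numpy as np

def quintic_coeffs(d0, d0_dot, d0_ddot, d1, T):
    """Coefficients of d(t) = a0 + a1*t + ... + a5*t^5 from the six
    boundary conditions above (terminal velocity/acceleration are zero)."""
    a0, a1, a2 = d0, d0_dot, d0_ddot / 2.0   # initial conditions, directly
    # Terminal conditions d(T) = d1, d'(T) = 0, d''(T) = 0 as a 3x3 system
    A = np.array([
        [T**3,     T**4,     T**5],     # d(T)
        [3*T**2,   4*T**3,   5*T**4],   # d'(T)
        [6*T,      12*T**2,  20*T**3],  # d''(T)
    ])
    b = np.array([
        d1 - (a0 + a1*T + a2*T**2),
        -(a1 + 2*a2*T),
        -2*a2,
    ])
    a3, a4, a5 = np.linalg.solve(A, b)
    return np.array([a0, a1, a2, a3, a4, a5])
```

The quartic longitudinal case below is analogous, with a 2x2 system for the terminal velocity and acceleration conditions.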

Longitudinal trajectory generation. For velocity-keeping mode, longitudinal motion s(t) uses quartic (4th-order) polynomials:

s(t) = b0 + b1*t + b2*t^2 + b3*t^3 + b4*t^4

With five boundary conditions:

  • Initial: s(0) = s0, s'(0) = v0, s''(0) = a0
  • Terminal: s'(T) = v_target, s''(T) = 0

For stopping/merging modes, quintic polynomials are used with an explicit terminal position constraint.

1.3 Candidate Sampling Strategy

The planner generates a combinatorial set of candidates by sampling across three dimensions:

Parameter          Symbol     Typical Range                        Step Size
Lateral offset     d1         -MAX_ROAD_WIDTH to +MAX_ROAD_WIDTH   0.5–1.0 m
Time horizon       T          MIN_T to MAX_T (e.g. 3.0–5.0 s)      0.2–0.5 s
Target velocity    v_target   v_desired +/- delta_v                1.0–2.0 m/s

For a typical configuration with 7 lateral samples, 5 time samples, and 5 velocity samples, this yields 175 candidate trajectories per planning cycle. Production systems like TUM's Frenetix generate 400–800+ candidates and evaluate them in under 8 ms using C++ acceleration.

Each lateral trajectory is combined with each longitudinal trajectory, transformed from Frenet back to Cartesian coordinates, and checked for feasibility (curvature limits, acceleration bounds).
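The combinatorial sampling can be sketched as a product over the three dimensions; the range constants below are hypothetical values consistent with the table above:

```python
import itertools
import numpy as np

# Hypothetical configuration matching the 7 x 5 x 5 example in the text
MAX_ROAD_WIDTH = 3.0                        # m, lateral sampling half-width
D_STEP = 1.0                                # m
MIN_T, MAX_T, T_STEP = 3.0, 5.0, 0.5        # s
V_DESIRED, DELTA_V, V_STEP = 8.0, 2.0, 1.0  # m/s

def sample_terminal_states():
    """Enumerate (d1, T, v_target) terminal states for candidate generation."""
    d_samples = np.arange(-MAX_ROAD_WIDTH, MAX_ROAD_WIDTH + 1e-9, D_STEP)
    t_samples = np.arange(MIN_T, MAX_T + 1e-9, T_STEP)
    v_samples = np.arange(V_DESIRED - DELTA_V, V_DESIRED + DELTA_V + 1e-9, V_STEP)
    return list(itertools.product(d_samples, t_samples, v_samples))
```

Each tuple then parameterizes one lateral quintic and one longitudinal quartic, giving 7 * 5 * 5 = 175 candidates for this configuration.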

1.4 Standard Cost Functions

The total cost for a candidate trajectory combines lateral and longitudinal components:

C_total = K_LAT * C_lat + K_LON * C_lon

Lateral cost:

C_lat = K_J * J_d + K_T * T + K_D * d1^2

Where:

  • J_d = sum of squared lateral jerk (comfort)
  • T = time horizon (prefer faster maneuvers)
  • d1^2 = squared final lateral offset (prefer lane center)

Longitudinal cost (velocity-keeping mode):

C_lon = K_J * J_s + K_T * T + K_V * (v_target - v_final)^2

Where:

  • J_s = sum of squared longitudinal jerk
  • (v_target - v_final)^2 = velocity deviation penalty

Typical weight values (from PythonRobotics reference implementation):

Weight          Value   Purpose
K_J             0.1     Jerk penalty
K_T             0.1     Time penalty
K_D             1.0     Lateral offset penalty
K_V (K_S_DOT)   1.0     Velocity deviation
K_LAT           1.0     Lateral-longitudinal balance
K_LON           1.0     Lateral-longitudinal balance

After cost evaluation, candidates are filtered by feasibility (max curvature, max acceleration) and collision-checked against known obstacles. The lowest-cost feasible, collision-free trajectory is selected.
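The cost combination above, with the weight table's defaults, can be sketched as follows (the function signature and jerk arrays are illustrative):

```python
import numpy as np

# Defaults from the weight table above
K_J, K_T, K_D, K_V, K_LAT, K_LON = 0.1, 0.1, 1.0, 1.0, 1.0, 1.0

def classical_cost(d_jerk, s_jerk, T, d1, v_target, v_final):
    """C_total = K_LAT * C_lat + K_LON * C_lon for one candidate.

    d_jerk, s_jerk: sampled lateral/longitudinal jerk along the trajectory.
    """
    c_lat = K_J * np.sum(d_jerk**2) + K_T * T + K_D * d1**2
    c_lon = K_J * np.sum(s_jerk**2) + K_T * T + K_V * (v_target - v_final)**2
    return K_LAT * c_lat + K_LON * c_lon
```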

1.5 Path Tracking: Stanley and Pure Pursuit

Once a trajectory is selected, a tracking controller follows it.

Pure Pursuit. A geometric controller that computes the steering angle by chasing a look-ahead point on the reference path at a fixed distance ahead of the rear axle. Steering angle delta = arctan(2 * L * sin(alpha) / L_d), where L is the wheelbase, alpha is the angle to the look-ahead point, and L_d is the look-ahead distance. Simple and stable at low speeds, but it tracks poorly at high speed and cuts corners.

Stanley Controller. Uses the front axle as reference and considers both heading error psi_e and cross-track error e:

delta = psi_e + arctan(k * e / v)

Where k is a gain parameter and v is vehicle speed. Stanley provides tighter tracking with better cross-track error convergence but can produce aggressive steering inputs.

For airside operations (low speeds, 5–25 km/h, tight maneuvers around aircraft), a hybrid approach is practical: Pure Pursuit for straight segments, Stanley for tight curves, with MPC for scenarios requiring precise constraint satisfaction near obstacles.
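Both geometric control laws are one-liners; a sketch of each (the gain default and the low-speed epsilon guard are illustrative additions):

```python
import math

def pure_pursuit_steer(alpha, wheelbase, lookahead):
    """delta = arctan(2 * L * sin(alpha) / L_d), as given above."""
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

def stanley_steer(heading_error, cross_track_error, speed, k=1.0, eps=0.1):
    """delta = psi_e + arctan(k * e / v); eps keeps low speeds well-behaved."""
    return heading_error + math.atan2(k * cross_track_error, max(speed, eps))
```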


2. Adding World Model Cost Functions

2.1 Architecture: Classical Planner + Neural Scorer

The core architecture separates trajectory generation from trajectory scoring:

                    ┌─────────────────────────┐
                    │   Frenet Planner Core    │
                    │  (Trajectory Generation) │
                    │                          │
                    │  Sample N candidates     │
                    │  Compute classical costs │
                    │  Filter infeasible       │
                    └──────────┬──────────────┘

                    N candidates (positions, velocities, accelerations)

              ┌────────────────┼────────────────┐
              │                │                │
              v                v                v
    ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐
    │  Classical   │  │ World Model │  │   Combined      │
    │  Cost        │  │ Cost        │  │   Selection     │
    │  (jerk, d,   │  │ (occupancy, │  │                 │
    │   velocity)  │  │  hazard,    │  │  C = alpha*C_cl │
    │              │  │  smoothness)│  │    + beta*C_wm  │
    └─────────────┘  └─────────────┘  └─────────────────┘

The world model evaluates each candidate trajectory by:

  1. Rolling out the trajectory through its predicted future scene state
  2. Computing neural cost terms based on predicted occupancy, agent interactions, and scene evolution
  3. Returning bounded cost values that are combined with classical costs

2.2 Occupancy Collision Cost

The world model predicts future occupancy grids — spatial probability maps showing where obstacles, vehicles, and pedestrians will be at each future time step.

Formulation:

C_occ = sum_{t=0}^{T} sum_{cells in ego_footprint(t)} P_occupied(cell, t) * gamma^t

Where:

  • P_occupied(cell, t) is the predicted occupancy probability at grid cell for time step t
  • ego_footprint(t) is the set of grid cells covered by the ego vehicle at time t along the candidate trajectory
  • gamma is a temporal discount factor (e.g., 0.95) — near-term collisions matter more

Implementation detail. The ego vehicle footprint is rasterized onto the occupancy grid at each trajectory waypoint. The predicted occupancy can come from:

  • 3D occupancy networks (OccWorld-style) predicting voxel occupancy evolution
  • BEV occupancy flow models predicting 2D top-down occupancy + motion vectors
  • Agent-based prediction converted to occupancy (Gaussian splats from predicted trajectories)

For airside operations, the occupancy model should be trained or fine-tuned on airside-specific actors: aircraft (large, slow-moving), ground support equipment (GSE), fuel trucks, baggage tugs, and pedestrians (ground crew, marshals).

Cost normalization. Raw occupancy collision cost is bounded: C_occ_normalized = min(C_occ / C_occ_max, 1.0). This prevents a single high-confidence prediction from dominating the cost.
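Putting the formula, the footprint rasterization, and the normalization together (array shapes and the c_max bound are illustrative):

```python
import numpy as np

def occupancy_collision_cost(pred_occ, footprints, gamma=0.95, c_max=5.0):
    """Discounted occupancy mass swept by the ego footprint, normalized.

    pred_occ:   [T, H, W] predicted occupancy probabilities per timestep.
    footprints: [T, H, W] boolean ego-footprint masks along one candidate.
    """
    T = pred_occ.shape[0]
    discounts = gamma ** np.arange(T)            # gamma^t, near-term weighted
    raw = sum(discounts[t] * pred_occ[t][footprints[t]].sum() for t in range(T))
    return min(raw / c_max, 1.0)                 # bounded normalization
```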

2.3 Hazard Proximity Cost

Beyond binary collision detection, the world model provides soft proximity penalties based on predicted distances to hazards.

Formulation:

C_hazard = sum_{t=0}^{T} sum_{agents} max(0, d_safe - d_predicted(agent, t))^2 / d_safe^2

This penalizes trajectories that bring the ego vehicle within a safety margin of predicted agent positions, with quadratic penalty increasing as distance decreases.

Time-to-collision (TTC) component:

C_ttc = sum_{t=0}^{T} max(0, TTC_threshold - TTC(t)) / TTC_threshold

The world model's predicted agent trajectories enable more accurate TTC computation than rule-based constant-velocity assumptions, particularly for:

  • Vehicles executing turns or lane changes
  • GSE with erratic stopping patterns
  • Pedestrians crossing apron areas

Learned hazard proximity extends beyond geometric distance. The world model can learn that certain configurations are dangerous even at seemingly safe distances — e.g., a fuel truck approaching a running engine, or a baggage cart near an aircraft door zone.
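The two penalty terms translate directly into vectorized form (array shapes and threshold defaults are illustrative):

```python
import numpy as np

def hazard_proximity_cost(pred_dists, d_safe=5.0):
    """Quadratic penalty inside the safety margin.

    pred_dists: [T, A] predicted ego-to-agent distances per timestep/agent.
    """
    violation = np.maximum(0.0, d_safe - pred_dists)
    return float(np.sum(violation**2) / d_safe**2)

def ttc_cost(ttc, ttc_threshold=4.0):
    """Penalty for predicted time-to-collision below the threshold.

    ttc: [T] per-timestep TTC from the world model's agent predictions.
    """
    return float(np.sum(np.maximum(0.0, ttc_threshold - ttc)) / ttc_threshold)
```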

2.4 Predicted Smoothness Cost

The world model can evaluate trajectory smoothness in the context of predicted scene evolution, not just kinematic jerk.

Reactive smoothness. A trajectory may be kinematically smooth but require abrupt corrections if the scene evolves unfavorably. The world model scores trajectories by rolling them out against predicted futures and measuring:

C_smooth = sum_{t=1}^{T} ||a_predicted(t) - a_planned(t)||^2

Where a_predicted(t) is the acceleration the ego would need to remain safe given the predicted scene at time t, and a_planned(t) is the planned acceleration. A trajectory that remains feasible without modification scores low; one that would require emergency braking scores high.

Passenger comfort proxy. For airport shuttle/transport vehicles, the world model can learn comfort-relevant features beyond jerk:

  • Anticipated stop-and-go patterns near gates
  • Queuing behavior behind other GSE
  • Smooth deceleration profiles approaching aircraft stands

2.5 Combining Classical and World Model Costs

The final trajectory cost is a weighted combination:

C_total = alpha * C_classical + beta * C_world_model

Where:

C_classical = K_LAT * C_lat + K_LON * C_lon + C_collision_check
C_world_model = w_occ * C_occ + w_hazard * C_hazard + w_smooth * C_smooth

Weight scheduling. During initial deployment, alpha >> beta (classical dominates). As confidence in the world model grows through validation, beta increases. This is a continuous dial, not a binary switch.

Candidate selection with dual scoring:

  1. Compute C_classical for all N candidates
  2. Rank by classical cost; take top-M (e.g., M = N/2) as pre-filtered set
  3. Evaluate world model costs only on top-M candidates (saves GPU inference)
  4. Compute C_total for top-M; select the minimum

This two-stage approach reduces GPU inference load while ensuring that only classically-reasonable trajectories are evaluated by the world model.
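The four steps can be sketched as one selection function; the wm_cost_fn callback is a stand-in for the batched GPU evaluation described later:

```python
import numpy as np

def select_trajectory(classical_costs, wm_cost_fn, alpha=1.0, beta=0.5, top_m=None):
    """Two-stage dual scoring: classical pre-rank, world model on top-M only.

    classical_costs: [N] classical costs for all candidates.
    wm_cost_fn:      maps an index array -> world model costs for those candidates.
    """
    n = len(classical_costs)
    m = top_m if top_m is not None else n // 2
    top_idx = np.argsort(classical_costs)[:m]      # steps 1-2: classical top-M
    wm_costs = wm_cost_fn(top_idx)                 # step 3: WM scoring, M only
    total = alpha * classical_costs[top_idx] + beta * wm_costs   # step 4
    return int(top_idx[np.argmin(total)])
```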


3. Practical Implementation

3.1 Batched GPU Inference for N Candidates

The key engineering challenge is evaluating the world model on N candidate trajectories within the planning cycle budget (typically 50–100 ms at 10–20 Hz).

Batching architecture:

python
# Pseudocode for batched world model evaluation
def evaluate_candidates_batch(world_model, scene_state, candidates):
    """
    scene_state: BEV features, agent states [B=1, C, H, W]
    candidates:  trajectory waypoints [N, T, 2]  (x, y per timestep)
    """
    N = candidates.shape[0]

    # Expand scene state to batch dimension (a view, not a copy)
    scene_batch = scene_state.expand(N, -1, -1, -1)  # [N, C, H, W]

    # Encode trajectory candidates as action embeddings
    action_embeddings = trajectory_encoder(candidates)  # [N, T, D]

    # Single forward pass through the world model:
    # predicts the future scene conditioned on each candidate trajectory
    predicted_futures = world_model(scene_batch, action_embeddings)  # [N, T, C, H, W]

    # Compute costs from predicted futures
    occ_costs = occupancy_collision_cost(predicted_futures, candidates)  # [N]
    hazard_costs = hazard_proximity_cost(predicted_futures, candidates)  # [N]
    smooth_costs = smoothness_cost(predicted_futures, candidates)        # [N]

    return occ_costs, hazard_costs, smooth_costs

Performance targets. A BEV world-model trajectory-evaluation study (arXiv 2504.01941) reports that scoring 256 trajectories through a transformer-based world model takes 18.7 ms end-to-end on an NVIDIA L20 GPU, well within the real-time budget of a 20 Hz planner.

Key optimizations:

  • Shared scene encoding. The expensive BEV feature extraction is done once, then reused across all N trajectory evaluations. Only the trajectory-conditioned decoding varies per candidate.
  • TensorRT compilation. Convert PyTorch models to TensorRT for 2–5x inference speedup. Use FP16 precision for further gains with minimal accuracy loss.
  • CUDA graphs. Capture the inference computation graph once, then replay it each cycle. Eliminates kernel launch overhead for repeated identical graph shapes.
  • Dynamic batching. If N varies (due to pre-filtering), use padding to maintain fixed batch size for optimal GPU utilization.

3.2 Pre-Filtering Strategy

Not all candidates deserve GPU evaluation. A two-stage filter dramatically reduces computational load:

Stage 1: Classical filter (CPU, < 1 ms)

  • Discard kinematically infeasible trajectories (curvature > max, acceleration > max)
  • Discard trajectories with immediate collision (overlap with known static obstacles)
  • Discard trajectories leaving drivable area

Stage 2: Cheap heuristic filter (CPU, < 1 ms)

  • Rank remaining candidates by classical cost
  • Take top-K candidates (e.g., K = 50–100 from 400+ initial samples)
  • These are the candidates sent to GPU for world model evaluation

Result: From 400+ initial samples, 50–100 reach the GPU. At 18.7 ms for 256 candidates, 50–100 candidates should evaluate in under 10 ms.

3.3 C++/Python Interop Architecture

Production autonomous driving stacks are predominantly C++ (ISO 26262 compliance, real-time guarantees). The world model is trained in Python/PyTorch. Bridging them requires careful engineering.

Option A: TensorRT C++ API (Recommended for production)

┌──────────────────────────────┐
│  C++ Planner Node            │
│                              │
│  1. Generate candidates      │
│  2. Classical cost (C++)     │
│  3. Pre-filter               │
│  4. Fill input tensors       │  ──→  GPU shared memory
│  5. Call TensorRT engine     │  ←──  GPU shared memory
│  6. Read cost tensors        │
│  7. Select best trajectory   │
└──────────────────────────────┘

The world model is exported from PyTorch to ONNX, then compiled to a TensorRT engine. The C++ planner loads the engine and executes inference natively. No Python runtime involvement at inference time.

Option B: Shared memory bridge (Development/prototyping)

┌───────────────────┐     Shared Memory      ┌───────────────────┐
│  C++ Planner      │ ◄──────────────────────►│  Python WM Server │
│                   │   CUDA IPC handles       │                   │
│  candidates →     │   or POSIX shm           │  ← candidates     │
│  costs ←          │                          │  → costs           │
└───────────────────┘                          └───────────────────┘

Using CUDA IPC memory handles, GPU tensors can be shared between processes without copy. The C++ planner writes candidate trajectories to shared GPU memory; the Python process reads them, runs inference, and writes costs back. Latency overhead is ~0.1 ms for the IPC handshake.
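A CPU stand-in for the POSIX-shm variant, using Python's multiprocessing.shared_memory (names and shapes are illustrative; the CUDA IPC version follows the same write/attach pattern with device memory instead of host memory):

```python
import numpy as np
from multiprocessing import shared_memory

SHAPE, DTYPE = (4, 10, 2), np.float32     # [N, T, (x, y)] candidate waypoints

def planner_write(name, candidates):
    """'C++ planner' side: create a named block and fill it with waypoints."""
    shm = shared_memory.SharedMemory(create=True, name=name,
                                     size=int(np.prod(SHAPE)) * 4)
    np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf)[:] = candidates
    return shm   # keep alive until the scorer has read

def scorer_read(name):
    """'Python WM server' side: attach by name, read directly from the block."""
    shm = shared_memory.SharedMemory(name=name)
    out = np.ndarray(SHAPE, dtype=DTYPE, buffer=shm.buf).copy()
    shm.close()
    return out
```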

Option C: PyBind11 in-process (Research/testing)

cpp
// C++ side (sketch): embed the interpreter and call the Python scorer
#include <pybind11/embed.h>
#include <pybind11/numpy.h>
namespace py = pybind11;

py::scoped_interpreter guard{};
py::object wm_module = py::module_::import("world_model_scorer");
py::object score_fn = wm_module.attr("score_trajectories");
auto costs = score_fn(candidates_numpy).cast<py::array_t<float>>();

Embeds the Python interpreter in the C++ process. Simple but introduces GIL contention and Python runtime overhead. Suitable for research prototyping only.

3.4 Memory Layout and Zero-Copy Transfers

Trajectory tensor format. Candidates should be stored in a contiguous GPU buffer:

candidates: float32[N, T, 5]  // (x, y, heading, velocity, curvature) per waypoint

Scene state format. BEV features from the perception pipeline:

bev_features: float16[C, H, W]  // e.g., [256, 200, 200] for 100m x 100m at 0.5m resolution

Zero-copy pipeline:

  1. C++ planner allocates candidates tensor in CUDA pinned memory
  2. Fill trajectory waypoints (CPU computation, but memory is GPU-accessible)
  3. cudaMemcpyAsync to device memory (overlaps with other computation)
  4. TensorRT inference on device
  5. Cost results remain in device memory; cudaMemcpyAsync back for final selection

Use DLPack or __cuda_array_interface__ for framework-agnostic, zero-copy tensor exchange between C++, Python, PyTorch, and TensorRT.


4. Existing Work: Learned Planning in Practice

4.1 Think2Drive (ECCV 2024)

Think2Drive is the first model-based reinforcement learning method for autonomous driving. It uses a latent world model based on DreamerV3 architecture.

Architecture:

  • World model: Learns environment transitions in a compact latent space — state transitions, rewards, and termination conditions
  • Planner: Trained entirely within the world model's latent space ("thinking" in imagination)
  • Input: 3D occupancy as scene representation

Key results: Achieved expert-level driving in CARLA v2 within 3 days on a single A6000 GPU. First method to achieve 100% route completion on CARLA v2 scenarios.

Relevance to Frenet augmentation: Think2Drive demonstrates that world models can serve as effective neural simulators for planner training. For augmenting a Frenet planner, a similar world model could be trained to predict scene evolution, but used at inference time for trajectory scoring rather than for policy learning. The latent space efficiency (low-dimensional state, parallel tensor computation) is directly applicable to our batched evaluation architecture.

4.2 WorldRFT (AAAI 2026)

WorldRFT combines latent world models with reinforcement fine-tuning (RFT) for end-to-end driving.

Architecture:

  • Spatial-Aware World Encoder (SWE): Processes multi-view images through ResNet backbone, integrates VGGT (visual geometry grounded transformer) frozen tokens via cross-attention for 3D spatial awareness
  • Hierarchical Planning Refinement (HPR): Three parallel subtasks — target region localization (probabilistic Laplace distributions), spatial path planning (2m intervals), temporal trajectory prediction (0.5s intervals)
  • Local-Aware Iterative Refinement: K iterations of deformable-convolution-based feature sampling along projected trajectory points

Reinforcement fine-tuning: Uses Group Relative Policy Optimization (GRPO) with collision-aware rewards (-1 for collision, 0 otherwise). Trajectory Gaussianization recasts deterministic regression as probabilistic modeling with auxiliary variance networks, enabling sampling-based exploration.

Safety results: 83% collision rate reduction on nuScenes (0.30% to 0.05%). The RFT approach specifically improves safety-critical performance beyond what supervised learning alone achieves.

Relevance: The collision-aware reward formulation and GRPO optimization could be adapted for fine-tuning the world model cost function in our architecture. The iterative refinement mechanism (K iterations with deformable attention) is a good pattern for refining trajectory scores based on local scene features.

4.3 DiffusionDrive (CVPR 2025 Highlight)

DiffusionDrive uses a truncated diffusion model for end-to-end trajectory generation.

Architecture:

  • Anchored Gaussian distribution: Instead of denoising from pure noise, starts from K-means clustered trajectory anchors (20 anchors from training data)
  • Truncated schedule: Only 50/1000 diffusion steps during training; 2 denoising steps at inference
  • Cascade transformer decoder: Stacked transformer layers with deformable spatial cross-attention to BEV/perspective features

Performance: 88.1 PDMS on NAVSIM at 45 FPS on an NVIDIA 4090, using 10x fewer denoising steps than a vanilla diffusion policy and running 6x faster at inference.

Relevance to Frenet augmentation: DiffusionDrive's anchor-based approach mirrors the Frenet planner's sampling philosophy — both generate diverse candidates and score them. The key insight is that "human drivers adhere to established driving patterns," justifying starting from anchored proposals rather than random noise. A hybrid could use Frenet-generated candidates as anchors for a diffusion-based refinement step.

4.4 NVIDIA GTRS (NAVSIM v2 Challenge Winner, 2025)

Generalized Trajectory Scoring is NVIDIA's winning approach to the NAVSIM v2 Challenge.

Architecture:

  • Diffusion-based trajectory generator: Produces 100 diverse fine-grained proposals conditioned on BEV features
  • Vocabulary Generalization Scorer: Transformer decoder trained on super-dense vocabulary (16,384 trajectories), tested on 8,192. Uses trajectory dropout during training for robustness
  • Sensor-Augmented Scorer: Horizontal rotation perturbations for out-of-domain robustness with EMA teacher self-distillation

Scoring pipeline: Coarse trajectory sets covering broad situations combined with fine-grained trajectories for safety-critical scenarios. A transformer decoder distilled from perception-dependent metrics (safety, comfort, traffic compliance) progressively filters candidates.

Relevance: GTRS directly validates the "generate-then-score" paradigm. The vocabulary generalization technique — training the scorer on more trajectories than it sees at inference — is directly applicable to our Frenet candidates. The scorer learned to generalize across trajectory distributions rather than memorizing specific patterns.

4.5 Comma.ai's World Model Approach (Production, 2025–2026)

Comma.ai shipped the first production driving model trained entirely in a learned world model simulator.

Architecture:

  • Compressor Model: Based on Stable Diffusion's image VAE, compresses world states to lower-dimensional latent representations
  • Dynamics Model: Video Diffusion Transformer that models latent space dynamics — predicts future states given history and actions
  • Plan Head: Added to the dynamics model, predicts the trajectory to take. Trained on human driving data as ground truth

Key innovation — future anchoring: Providing only past states led to offline training failures (model couldn't recover from errors). Solution: anchor the model on future observations, allowing it to "know where the car is going to be" and recover from prediction mistakes.

Production deployment: openpilot 0.11 (2025) is the first real-world robotics agent fully trained in a learned simulation, shipped to consumer vehicles. This replaces hand-coded MPC planners entirely.

Relevance: Comma's approach represents the end state of our migration path. Their journey — from hand-coded MPC to MPC augmented with learned models to fully learned planning — is exactly the trajectory this augmented Frenet architecture enables. The future-anchoring technique addresses a fundamental challenge in world model training that applies to our setting.

4.6 Tesla FSD: MCTS + Neural Scoring (Historical, pre-2024)

Before transitioning to end-to-end learning, Tesla's FSD used a hybrid architecture directly relevant to Frenet augmentation:

  • Tree search: Monte Carlo Tree Search (MCTS) generates candidate action sequences
  • Neural network guidance: A trained network provides heuristics to guide the search, assessing which paths are most promising
  • Trajectory scoring: Combined classical physics-based checks (collision, comfort) with neural network evaluators predicting intervention likelihood and human-likeness

This is precisely the "classical generator + neural scorer" pattern proposed for Frenet augmentation. Tesla's experience validates the approach but also shows its limitations — the hand-off between classical generation and neural scoring creates a bottleneck that eventually motivated their move to end-to-end learning.

4.7 AdaWM: Adaptive World Model Planning (ICLR 2025)

AdaWM addresses a critical practical problem: performance degradation when adapting pretrained world models to new tasks.

Core insight: When pretrained dynamics models and policies are applied to new domains, distribution shift causes two types of mismatch — dynamics model mismatch (model fails to capture new environment) and policy mismatch (pretrained policy becomes suboptimal).

Solution: Quantify which mismatch dominates using Total Variation distance, then selectively update either the policy or the model using LoRA-based low-rank adaptation. Decision criterion: "Update dynamics model if model error >= C1 * policy error - C2" where C1, C2 are theoretically derived constants.

Results: Significantly outperforms baselines on challenging CARLA tasks (roundabouts, dense traffic left turns) with only 1 hour of fine-tuning on a single V100 GPU after 12 hours of pretraining.

Relevance: AdaWM's adaptive fine-tuning approach is directly applicable to deploying world model costs in new airside environments. When transitioning between airports or operating conditions, the LoRA-based adaptation enables rapid domain transfer without full retraining.

4.8 Woven by Toyota: ML Planner Deployment (Production)

Woven Planet (Toyota) deployed a machine-learned planner as the default in their autonomous vehicles in San Francisco and Palo Alto.

Deployment methodology:

  • Shadow mode validation: The ML planner ran alongside the classical planner, with outputs compared but not actuated
  • Data-driven evaluation: Built a system that scales with planner performance, learning from safety driver disengagements and notes
  • Gradual transition: ML planner became default only after extensive validation against classical baseline

This is the most relevant production precedent for our migration path.


5. Safety Architecture

5.1 Handling Wrong World Model Scores

World models will produce incorrect predictions. The safety architecture must assume this and bound the damage.

Failure modes:

  1. Overconfident false safety: World model scores a dangerous trajectory as low-cost (missed obstacle, wrong prediction)
  2. Overconfident false danger: World model scores a safe trajectory as high-cost (hallucinated obstacle, prediction error)
  3. Out-of-distribution collapse: World model produces nonsensical scores on unseen scenarios

Mode 1 (false safety) is safety-critical. Mode 2 causes conservatism. Mode 3 is detectable.

5.2 Bounded Cost Functions

All world model cost functions must be bounded to prevent them from overriding classical safety checks:

C_wm_bounded = clamp(C_wm_raw, 0, C_max)

Hard safety invariants that the world model cannot override:

  • Minimum following distance (physics-based, not learned)
  • Maximum speed limits
  • Drivable area boundaries
  • Emergency stopping distance at current velocity

These are implemented as hard constraints in the classical planner, evaluated after world model scoring. Any trajectory violating hard constraints is discarded regardless of its world model score.
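A sketch of the invariant check (the predicate signatures are hypothetical; a real implementation would query the map and vehicle model):

```python
def violates_hard_invariants(traj, v_max, is_drivable, stopping_distance):
    """Hard constraints the world model can never override.

    traj: iterable of (x, y, v, gap) per timestep; gap is the distance to the
    nearest obstacle ahead. is_drivable and stopping_distance are callbacks.
    """
    for x, y, v, gap in traj:
        if v > v_max:                        # speed limit (not learned)
            return True
        if not is_drivable(x, y):            # drivable-area boundary
            return True
        if gap < stopping_distance(v):       # emergency stopping distance
            return True
    return False
```

Any candidate for which this returns True is discarded after world model scoring, regardless of its learned cost.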

5.3 Fallback to Traditional Scoring

Watchdog mechanisms for world model reliability:

python
import torch

class WorldModelScorer:
    def __init__(self):
        self.inference_timeout_ms = 30     # max time for WM inference
        self.min_score_std = 0.95          # below this, predictions have collapsed
        self.ood_detector = OODDetector()  # out-of-distribution input detector

    def score_with_fallback(self, candidates, scene_state):
        # Check whether the scene is in-distribution for the world model
        if self.ood_detector.is_ood(scene_state):
            return None  # signal: use classical costs only

        try:
            costs = self.run_inference_with_timeout(candidates, scene_state)
        except TimeoutError:
            return None  # inference too slow, use classical

        # Check for NaN/Inf before computing statistics on the scores
        if not torch.isfinite(costs).all():
            return None  # numerical instability, use classical

        # Check for collapsed predictions (all scores nearly identical)
        if costs.std() < self.min_score_std:
            return None  # model not discriminating, use classical

        return costs

OOD detection approaches:

  • Embedding distance: Compare current scene embedding to training distribution. If Mahalanobis distance exceeds threshold, flag as OOD
  • Ensemble disagreement: Run multiple world model heads; if their cost predictions disagree significantly, reduce world model weight
  • Reconstruction error: If a VAE-based world model cannot reconstruct the current scene well, its predictions are unreliable
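The embedding-distance approach is the simplest of the three; a sketch fitting a Mahalanobis detector on training-scene embeddings (class name and threshold default are illustrative):

```python
import numpy as np

class MahalanobisOOD:
    """Flag scenes whose embedding is far from the training distribution."""

    def __init__(self, train_embeddings, threshold=5.0):
        self.mu = train_embeddings.mean(axis=0)
        cov = np.cov(train_embeddings, rowvar=False)
        # Small ridge term keeps the inverse well-conditioned
        self.cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = threshold

    def is_ood(self, embedding):
        delta = embedding - self.mu
        return float(np.sqrt(delta @ self.cov_inv @ delta)) > self.threshold
```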

5.4 Confidence-Weighted Blending

Rather than binary fallback, use continuous confidence weighting:

C_total = C_classical + confidence * beta * C_world_model

Where confidence is derived from:

  • World model prediction entropy (lower entropy = higher confidence)
  • OOD score (in-distribution = higher confidence)
  • Ensemble agreement (high agreement = higher confidence)
  • Historical accuracy (running average of prediction quality)

This approach, inspired by HyPlan's confidence-calibrated blending, ensures graceful degradation rather than abrupt mode switches.
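A sketch of the blend; the multiplicative combination of the factors is a simple heuristic stand-in, and the real calibration would come from validation data:

```python
import numpy as np

def blended_cost(c_classical, c_wm, entropy, ood_score, agreement, beta=1.0):
    """Confidence-weighted blending: each factor is assumed scaled to [0, 1].

    entropy:   normalized prediction entropy (0 = certain).
    ood_score: 0 in-distribution, 1 far out-of-distribution.
    agreement: ensemble agreement (1 = all heads agree).
    """
    confidence = (1.0 - entropy) * (1.0 - ood_score) * agreement
    confidence = float(np.clip(confidence, 0.0, 1.0))
    return c_classical + confidence * beta * c_wm
```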

5.5 Safety Envelope Verification

Post-scoring verification ensures the selected trajectory satisfies safety constraints:

1. Select trajectory with lowest C_total
2. Verify: is trajectory within drivable area at all timesteps?
3. Verify: does trajectory maintain minimum clearance from static obstacles?
4. Verify: does trajectory satisfy kinematic constraints?
5. Verify: does trajectory have non-negative TTC at all timesteps?
6. If any verification fails: fall back to next-best trajectory
7. If no trajectory passes: execute emergency stop profile

The emergency stop profile is a pre-computed, kinematically-feasible braking trajectory that is always available as the ultimate fallback. This is the classical safety backbone that makes the hybrid architecture certifiable.
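The verification cascade reduces to walking the ranked candidates until one passes every check (the predicates are placeholders for the map and kinematics queries listed above):

```python
def select_with_verification(ranked_trajs, checks, emergency_stop):
    """First candidate passing all safety checks wins; otherwise fall back
    to the pre-computed emergency stop profile.

    ranked_trajs: candidates sorted by C_total ascending.
    checks:       predicates for drivable area, clearance, kinematics, TTC.
    """
    for traj in ranked_trajs:
        if all(check(traj) for check in checks):
            return traj
    return emergency_stop   # always-available classical fallback
```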


6. Migration Path: Augmented Frenet to Fully Learned Planning

6.1 Phase 1: Shadow Mode (Months 1–3)

Objective: Validate world model cost function without affecting vehicle behavior.

┌───────────────────────────────────────────────┐
│                 Planning Node                 │
│                                               │
│  Frenet Planner ──→ Classical Selection ──────┼──→ Control
│       │                                       │
│       └──→ World Model Scoring (logged only)  │
│             - Compare WM selection vs actual  │
│             - Log disagreements               │
│             - Measure WM inference latency    │
└───────────────────────────────────────────────┘

Metrics to track:

  • Agreement rate: How often does the WM's top-ranked trajectory match the classical selection?
  • Counterfactual safety: When WM disagrees, would the WM's choice have been safer?
  • Latency distribution: P50, P95, P99 of WM inference time
  • OOD frequency: How often does the OOD detector trigger?

6.2 Phase 2: Augmented Mode (Months 3–6)

Objective: World model costs influence trajectory selection with conservative weighting.

C_total = C_classical + 0.1 * confidence * C_world_model

Start with beta = 0.1 and increase based on validation metrics. The world model acts as a "tiebreaker" — it only changes the selected trajectory when classical costs are similar between candidates.
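One way to make the tiebreaker behavior explicit, rather than relying on a small beta alone, is to gate the world model term so it applies only to candidates whose classical cost is within a margin of the best. A sketch (the function name and margin value are illustrative):

```python
import numpy as np

def tiebreak_costs(c_classical, c_wm, beta=0.1, confidence=1.0, margin=0.05):
    """Add the WM term only among near-optimal classical candidates."""
    c_classical = np.asarray(c_classical, dtype=float)
    c_wm = np.asarray(c_wm, dtype=float)
    # Candidates within an absolute margin of the classical optimum
    # are considered "tied"; only they receive the learned term.
    near_optimal = c_classical <= c_classical.min() + margin
    total = c_classical.copy()
    total[near_optimal] += beta * confidence * c_wm[near_optimal]
    return total
```

Clearly inferior candidates can then never be promoted by the world model, regardless of how the weight is tuned.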

Validation criteria for weight increase:

  • Zero safety-critical disagreements where WM overrode a safer classical choice
  • WM inference latency P99 < planning cycle budget
  • OOD rate < 5% of planning cycles
  • WM-influenced selections show equal or better smoothness in tracking

6.3 Phase 3: Co-Equal Mode (Months 6–12)

Objective: World model and classical costs contribute equally.

C_total = C_classical + 1.0 * confidence * C_world_model

The world model now significantly influences trajectory selection. This requires:

  • Extensive closed-loop simulation validation (millions of scenarios)
  • Real-world A/B testing with safety driver oversight
  • Formal safety analysis of the hybrid cost function

6.4 Phase 4: WM-Primary Mode (Months 12–18)

Objective: World model is the primary scorer; classical costs serve as regularization.

C_total = 0.3 * C_classical + 1.0 * C_world_model

Classical costs prevent the world model from selecting kinematically absurd trajectories but no longer dominate selection.

6.5 Phase 5: Fully Learned Planning (18+ months)

Objective: Replace Frenet sampling with learned trajectory generation.

Options:

  • DiffusionDrive-style: Diffusion model generates candidates from anchored distributions, scored by the world model
  • End-to-end: Single model from perception to trajectory, world model used for training (comma.ai approach)
  • Hybrid generation: Frenet candidates mixed with learned candidates, all scored by unified cost function

Classical planner remains as safety fallback even in the fully-learned phase. It runs in parallel, and if the learned planner's output violates safety constraints, the classical trajectory is substituted.

6.6 Validation Framework

Offline validation (continuous):

  • nuPlan-style replay on recorded scenarios
  • Closed-loop simulation in CARLA/SUMO with airside scenarios
  • Adversarial scenario generation targeting WM failure modes
  • Regression testing: any model update must not degrade on existing scenario suite

Online validation (deployment):

  • A/B testing with matched vehicle pairs (one augmented, one classical)
  • Safety driver intervention rate as primary metric
  • Passenger comfort surveys (for transport vehicles)
  • Scenario-specific tracking: roundabout performance, tight-space navigation, aircraft proximity operations

Metrics hierarchy:

  1. Safety: Collision rate, near-miss rate, intervention rate
  2. Compliance: Rule violations, drivable area departures, speed limit adherence
  3. Efficiency: Route completion time, average speed, unnecessary stops
  4. Comfort: Jerk distribution, lateral acceleration, stop smoothness

7. ROS2 Integration Architecture

7.1 Node Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        ROS2 System Architecture                  │
│                                                                  │
│  ┌──────────┐    ┌──────────────┐    ┌───────────────────────┐  │
│  │Perception│───►│ BEV Encoder  │───►│   World Model Node    │  │
│  │  Node    │    │    Node      │    │  (GPU inference)      │  │
│  └──────────┘    └──────────────┘    │                       │  │
│                         │            │  - Maintains scene    │  │
│                         │            │    state buffer       │  │
│                         │            │  - Runs occupancy     │  │
│                         ▼            │    prediction         │  │
│                  ┌──────────────┐    │  - Provides scoring   │  │
│                  │   Planner    │◄──►│    service            │  │
│                  │    Node      │    └───────────────────────┘  │
│                  │              │                                │
│                  │  - Frenet    │    ┌───────────────────────┐  │
│                  │    sampling  │    │   Safety Monitor      │  │
│                  │  - Classical │───►│      Node             │  │
│                  │    + WM cost │    │                       │  │
│                  │  - Selection │    │  - Constraint check   │  │
│                  └──────┬───────┘    │  - Fallback trigger   │  │
│                         │            │  - Emergency stop     │  │
│                         ▼            └───────────────────────┘  │
│                  ┌──────────────┐                                │
│                  │  Controller  │                                │
│                  │    Node      │                                │
│                  │  (Stanley/   │                                │
│                  │   MPC)       │                                │
│                  └──────────────┘                                │
└─────────────────────────────────────────────────────────────────┘

7.2 Communication Patterns

Option A: Service-based scoring (synchronous)

python
# Planner node calls the world model as a service
import numpy as np
import rclpy
from rclpy.node import Node
from world_model_msgs.srv import ScoreTrajectories

class PlannerNode(Node):
    def __init__(self):
        super().__init__('planner_node')
        self.wm_client = self.create_client(
            ScoreTrajectories, '/world_model/score')
        self.planning_timer = self.create_timer(0.05, self.planning_callback)  # 20 Hz

    def planning_callback(self):
        candidates = self.generate_frenet_candidates()
        classical_costs = np.asarray(self.compute_classical_costs(candidates))

        # Pre-filter: keep the 50 candidates with the lowest classical cost,
        # and keep their costs aligned with the filtered set
        keep = np.argsort(classical_costs)[:50]
        top_candidates = [candidates[i] for i in keep]
        top_classical = classical_costs[keep]

        # Call world model service with a hard timeout
        request = ScoreTrajectories.Request()
        request.trajectories = top_candidates
        request.scene_stamp = self.latest_scene_stamp

        # Note: spinning inside a timer callback deadlocks a single-threaded
        # executor; this pattern requires a MultiThreadedExecutor with the
        # timer in its own callback group.
        future = self.wm_client.call_async(request)
        rclpy.spin_until_future_complete(self, future, timeout_sec=0.03)

        if future.done() and future.result() is not None:
            wm_costs = future.result().costs
            total_costs = self.combine_costs(top_classical, wm_costs)
        else:
            # Fallback: use classical costs only
            total_costs = top_classical

        best_trajectory = top_candidates[int(np.argmin(total_costs))]
        self.publish_trajectory(best_trajectory)

Option B: Topic-based scoring (asynchronous, recommended)

python
import numpy as np
from rclpy.node import Node
from world_model_msgs.msg import TrajectoryBundle, CostArray  # custom msgs

class PlannerNode(Node):
    def __init__(self):
        super().__init__('planner_node')
        # Publish candidates for WM scoring
        self.candidates_pub = self.create_publisher(
            TrajectoryBundle, '/planner/candidates', 10)

        # Subscribe to WM scores (they arrive during the next cycle)
        self.wm_scores_sub = self.create_subscription(
            CostArray, '/world_model/scores', self.wm_scores_callback, 10)

        self.latest_wm_scores = None  # Scores from previous cycle
        self.planning_timer = self.create_timer(0.05, self.planning_callback)

    def wm_scores_callback(self, msg):
        self.latest_wm_scores = msg

    def planning_callback(self):
        candidates = self.generate_frenet_candidates()
        classical_costs = self.compute_classical_costs(candidates)

        # Use WM scores from the PREVIOUS cycle (1-cycle latency, never blocks)
        if self.latest_wm_scores is not None:
            total_costs = self.combine_with_stale_wm(
                classical_costs, self.latest_wm_scores, staleness_penalty=0.1)
        else:
            total_costs = classical_costs

        best = candidates[int(np.argmin(total_costs))]
        self.publish_trajectory(best)

        # Publish current top candidates for next cycle's WM scoring
        top_candidates = self.pre_filter(candidates, classical_costs, top_k=50)
        self.candidates_pub.publish(top_candidates)

The asynchronous pattern never blocks the planner on world model inference. WM scores arrive one cycle stale, but at 20 Hz (50 ms of staleness) this is acceptable, as the scene changes minimally between cycles.
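A possible body for the combine_with_stale_wm helper referenced in Option B (a sketch under the assumption that the stale score message carries each scored candidate's terminal (s, d) state, so stale scores can be re-associated with the current, different candidate set by nearest terminal state):

```python
import numpy as np

def combine_with_stale_wm(classical_costs, cand_terminals,
                          scored_terminals, scored_costs,
                          beta=0.5, staleness_penalty=0.1):
    """Blend one-cycle-stale WM scores into current classical costs.
    cand_terminals: (N, 2) terminal (s, d) of current candidates.
    scored_terminals/scored_costs: (K, 2) / (K,) from the previous cycle."""
    totals = np.asarray(classical_costs, dtype=float).copy()
    scored_terminals = np.asarray(scored_terminals, dtype=float)
    scored_costs = np.asarray(scored_costs, dtype=float)
    weight = beta * (1.0 - staleness_penalty)  # discount for staleness
    for i, term in enumerate(np.asarray(cand_terminals, dtype=float)):
        # Match this candidate to the nearest previously scored candidate.
        d2 = np.sum((scored_terminals - term) ** 2, axis=1)
        totals[i] += weight * scored_costs[np.argmin(d2)]
    return totals
```

The nearest-terminal matching is the simplest re-association scheme; stable candidate IDs across cycles would avoid the matching step entirely.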

7.3 Latency Management

Latency budget breakdown (20 Hz planning, 50 ms cycle):

| Stage | Budget | Notes |
| --- | --- | --- |
| BEV encoding | 10–15 ms | Shared with perception |
| Frenet sampling + classical cost | 2–5 ms | C++, 400+ candidates |
| Pre-filtering | < 1 ms | Sort + threshold |
| World model inference | 10–20 ms | GPU, 50–100 candidates |
| Cost combination + selection | < 1 ms | CPU |
| Safety verification | < 1 ms | CPU |
| Total | 25–43 ms | Within 50 ms budget |

Latency mitigation strategies:

  1. Pipelined execution. While the current cycle's trajectory is being tracked, the next cycle's WM inference is already running. The planner publishes candidates at t=0, receives scores at t=50ms, uses them for t=50ms selection.

  2. QoS configuration. Use BEST_EFFORT reliability for WM score topics (dropped messages are acceptable; classical costs provide fallback). Use RELIABLE for trajectory output to controller.

  3. Dedicated executor threading. Assign WM inference callbacks to a separate MultiThreadedExecutor with MutuallyExclusiveCallbackGroup to prevent blocking planner callbacks.

  4. CUDA stream prioritization. Use high-priority CUDA streams for the WM inference node, ensuring it preempts lower-priority GPU tasks (visualization, logging).

7.4 Handling Slow or Crashed World Model Node

Lifecycle node management. The world model node implements ROS2 Managed (Lifecycle) Node states:

Unconfigured → Inactive → Active
Active → Deactivating → Inactive           (graceful deactivation)
Active → ErrorProcessing → Unconfigured    (error recovery)

Heartbeat-based watchdog:

python
# In the Safety Monitor Node
import time

class WMWatchdog:
    def __init__(self, node):
        self.node = node  # owning rclpy node, used for logging
        self.last_heartbeat = time.time()
        self.heartbeat_timeout = 0.2  # 200 ms (4 missed planning cycles)
        self.wm_healthy = True

    def heartbeat_callback(self, msg):
        self.last_heartbeat = time.time()
        self.wm_healthy = True

    def check_health(self):
        if time.time() - self.last_heartbeat > self.heartbeat_timeout:
            self.wm_healthy = False
            self.node.get_logger().warn("World Model node unresponsive")
            # Trigger lifecycle transition to Inactive;
            # the planner then automatically uses classical-only costs
Using the ros-safety/software_watchdogs package, the WM node publishes periodic heartbeats. A SimpleWatchdog grants a lease on each heartbeat; if a heartbeat misses the lease period, the watchdog transitions the node to the Inactive state.

Recovery strategy:

  1. Timeout (< 200 ms): Planner uses classical costs only for missed cycles. No user-visible impact.
  2. Sustained failure (> 1 s): Safety monitor triggers lifecycle deactivation of WM node. Logs diagnostic data.
  3. Crash recovery: systemd/launch file restarts the WM node. On restart, node enters Unconfigured state, loads TensorRT engine, warms up with dummy inference, then transitions to Active.
  4. Repeated crashes: Safety monitor disables WM node for the remainder of the mission. Vehicle operates on classical planner only.
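The four-step escalation above can be captured in a small policy object that the safety monitor queries each cycle. A sketch (class and method names are illustrative; the thresholds mirror the text):

```python
class WMRecoveryPolicy:
    """Map seconds of heartbeat silence to a planner-side action."""
    def __init__(self, timeout=0.2, sustained=1.0, max_restarts=3):
        self.timeout = timeout        # step 1 threshold (200 ms)
        self.sustained = sustained    # step 2 threshold (1 s)
        self.max_restarts = max_restarts
        self.restarts = 0
        self.disabled = False

    def action(self, silence):
        if self.disabled:
            return "classical_only"   # step 4: WM off for the mission
        if silence < self.timeout:
            return "use_wm"           # healthy
        if silence < self.sustained:
            return "classical_only"   # step 1: ride out short gaps
        if self.restarts < self.max_restarts:
            self.restarts += 1
            return "restart_wm"       # steps 2-3: deactivate and restart
        self.disabled = True          # step 4: repeated crashes
        return "classical_only"
```

Keeping the escalation in one queryable object makes the fallback behavior testable in isolation from ROS2.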

GPU error handling:

  • CUDA OOM: Reduce batch size or disable WM until GPU memory is freed
  • TensorRT engine corruption: Reload engine from disk
  • GPU hang: Watchdog timer triggers cudaDeviceReset() and engine reload

7.5 Composition and Zero-Copy IPC

For production deployments, using ROS2 node composition with intra-process communication eliminates serialization overhead:

xml
<!-- launch file composing planner and WM in single process -->
<node_container pkg="rclcpp_components" exec="component_container_mt">
  <composable_node pkg="frenet_planner" plugin="FrenetPlannerNode"/>
  <composable_node pkg="world_model_scorer" plugin="WorldModelScorerNode"/>
</node_container>

With composition, message passing between the planner and WM scorer uses shared pointers: zero copy, zero serialization. Published ROS2 benchmarks report roughly an order-of-magnitude reduction in CPU usage and more stable, lower latency compared to inter-process communication.

For the GPU inference specifically, the composed node shares the CUDA context, enabling direct tensor exchange without CUDA IPC overhead.


8. Airside-Specific Considerations

8.1 Operational Domain Differences

Airport airside operations differ from public roads in ways that affect the Frenet planner and world model:

| Factor | Public Road | Airport Airside |
| --- | --- | --- |
| Speed range | 0–130 km/h | 0–25 km/h |
| Actor types | Cars, trucks, pedestrians | Aircraft, GSE, fuel trucks, personnel |
| Actor dynamics | Predictable lane-following | Erratic, priority-based, marshaller-directed |
| Road structure | Lanes, intersections | Taxiways, aprons, service roads, stand areas |
| Traffic rules | Highway code, signals | ATC instructions, marshalling, right-of-way zones |
| Failure consequence | Vehicle damage, injury | Aircraft damage ($millions), fuel hazard, injury |

8.2 World Model Training Data Requirements

The world model must be trained or fine-tuned on airside-specific data:

  • Aircraft pushback trajectories: Highly non-standard motion patterns
  • GSE interaction patterns: Baggage tugs forming trains, fuel trucks with extended hoses, belt loaders with varying reach
  • Pedestrian behavior: Ground crew crossing aprons, marshalling signals, safety zone awareness
  • Temporal patterns: Gate turnaround sequences, departure/arrival rushes

8.3 Frenet Reference Path Considerations

Standard Frenet planning assumes a single lane-like reference path. Airside environments require:

  • Multiple reference paths: Service roads, taxi lanes, stand approach paths
  • Dynamic reference paths: Routes around parked aircraft change as stands are occupied/vacated
  • Reference path switching: Transition between service road and stand approach requires smooth reference path blending

9. Implementation Roadmap

9.1 Quick Start: Minimum Viable Augmentation

Week 1–2: Integrate a pre-trained BEV occupancy model (e.g., from OccWorld or similar) as a ROS2 node. Output: future occupancy grids.

Week 3–4: Implement the occupancy collision cost function. For each Frenet candidate, rasterize ego footprint onto predicted occupancy grids and sum collision probabilities.
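As a sketch of that cost term, assuming an axis-aligned square footprint for simplicity (a production version would rasterize the oriented rectangle at each timestep's heading):

```python
import numpy as np

def occupancy_collision_cost(traj_xy, occupancy, origin, resolution, half_extent):
    """Occupancy collision cost for one Frenet candidate.
    traj_xy: (T, 2) ego positions per predicted timestep, in map frame.
    occupancy: (T, H, W) predicted occupancy probabilities per timestep.
    half_extent: half the ego footprint side length, in grid cells."""
    T, H, W = occupancy.shape
    cost = 0.0
    for t in range(T):
        # Rasterize the (simplified, axis-aligned) footprint at timestep t.
        cx = int((traj_xy[t, 0] - origin[0]) / resolution)
        cy = int((traj_xy[t, 1] - origin[1]) / resolution)
        x0, x1 = max(cx - half_extent, 0), min(cx + half_extent + 1, W)
        y0, y1 = max(cy - half_extent, 0), min(cy + half_extent + 1, H)
        # Sum the occupancy probability mass under the footprint.
        cost += occupancy[t, y0:y1, x0:x1].sum()
    return cost
```

Per-candidate loops like this vectorize naturally over the whole candidate batch once footprints are precomputed, which is what makes 50–100 candidates tractable on GPU.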

Week 5–6: Add shadow mode logging. Run augmented planner in parallel with classical planner. Log disagreements.

Week 7–8: Analyze shadow mode data. Tune beta weight. Identify failure cases. Build initial OOD detector.

9.2 Reference Implementations

| Component | Recommended Starting Point |
| --- | --- |
| Frenet planner | TUM Frenetix (C++/Python, modular cost functions) |
| BEV encoding | OccWorld or SimpleBEV |
| Occupancy prediction | OccWorld flow-based prediction |
| World model | DreamerV3 (Think2Drive-style) for latent world model |
| Trajectory scoring | BEV world model trajectory evaluation (arXiv:2504.01941) |
| ROS2 integration | Autoware planning module architecture |
| Safety monitoring | ros-safety/software_watchdogs |

9.3 Computational Requirements

| Component | GPU Memory | Inference Time | Hardware |
| --- | --- | --- | --- |
| BEV encoder | 2–4 GB | 10–15 ms | NVIDIA Orin / L4 |
| World model (50 candidates) | 2–4 GB | 10–15 ms | Shared GPU |
| World model (256 candidates) | 4–8 GB | 15–20 ms | Shared GPU |
| TensorRT optimized total | 4–6 GB | 8–12 ms | NVIDIA Orin |

For edge deployment on vehicle hardware (NVIDIA Orin), the entire augmented pipeline fits within the 32 GB memory budget and the 50 ms latency budget with room to spare.


10. Key Takeaways

  1. The Frenet planner is the safety backbone. It generates kinematically-feasible, collision-checked candidates using proven polynomial trajectory generation. The world model augments but never replaces this foundation during the transition period.

  2. Batch evaluation makes it practical. Evaluating 50–100 pre-filtered candidates through a world model takes 10–20 ms on modern GPUs. This fits within a 20 Hz planning cycle with margin.

  3. Bounded costs and fallback are non-negotiable. World model scores are clamped, confidence-weighted, and subject to hard safety constraints. If the world model fails, the classical planner operates independently.

  4. The industry is converging on this pattern. Tesla (historical MCTS + neural scoring), NVIDIA GTRS (generate + score), Woven/Toyota (ML planner with classical fallback), and comma.ai (progressive replacement of classical planners) all validate the "classical generation + learned scoring" approach as a practical migration path.

  5. Production deployment requires shadow mode first. Every system that successfully deployed learned planning (Woven, comma.ai) used extensive shadow mode validation with A/B testing before the learned component influenced vehicle behavior.

  6. ROS2 architecture supports this natively. Lifecycle nodes, QoS policies, composition, and watchdog packages provide the infrastructure for safe integration of a potentially-failing GPU inference node into a safety-critical planning pipeline.


References

Foundational Papers

  • Werling, M., Ziegler, J., Kammel, S., Thrun, S. "Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame." ICRA 2010.
  • Werling, M., Kammel, S., Ziegler, J., Groll, L. "Optimal trajectories for time-critical street scenarios using discretized terminal manifolds." IJRR 2012.

World Model Planning

  • Li, Q. et al. "Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Autonomous Driving." ECCV 2024.
  • Xue, Z. et al. "WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving." AAAI 2026.
  • Liao, B. et al. "DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving." CVPR 2025 Highlight.
  • Zheng, Z. et al. "AdaWM: Adaptive World Model based Planning for Autonomous Driving." ICLR 2025.
  • Li, Z. et al. "Generalized Trajectory Scoring for End-to-end Multimodal Planning." NAVSIM v2 Challenge Winner, 2025.
  • Wang, Z. et al. "End-to-End Driving with Online Trajectory Evaluation via BEV World Model." 2025.

Production Systems

  • comma.ai. "Learning to Drive from a World Model." 2025. https://blog.comma.ai/mlsim
  • Woven by Toyota. "Deploying a Machine-Learned Planner for Autonomous Vehicles in San Francisco." 2023.
  • NVIDIA. "Building Autonomous Vehicles That Reason with NVIDIA Alpamayo." 2026.

Hybrid Planning

  • Navarro, I. et al. "Hybrid Imitation-Learning Motion Planner for Urban Driving." 2024.
  • Bey, H. et al. "HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving." 2025.

Path Tracking

  • Hoffmann, G. et al. "Autonomous automobile trajectory tracking for off-road driving." (Stanley controller)
  • Coulter, R.C. "Implementation of the Pure Pursuit Path Tracking Algorithm." CMU-RI-TR-92-01.

Safety and Validation

  • Caesar, H. et al. "nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles." 2021.
  • NIO / NVIDIA. "Designing an Optimal AI Inference Pipeline for Autonomous Driving." NVIDIA Developer Blog.
