Edge-Cloud Hybrid Inference Architecture for Fleet-Scale Autonomous GSE
Last Updated: 2026-04-11
Context: Autonomous electric GSE fleet (20-100+ vehicles) operating 16-20 hrs/day on airport airside
Platforms: NVIDIA Jetson AGX Orin 64GB (on-vehicle), NVIDIA DGX/HGX or multi-GPU servers (airport edge), cloud GPU clusters (training/analytics)
Connectivity: Airport private 5G/CBRS (100 Mbps-1 Gbps DL, 50-200 Mbps UL, 5-20ms RTT)
Key Takeaway: On-vehicle Orin AGX (275 TOPS) is sufficient for the safety-critical perception and planning loop but cannot simultaneously run VLMs, world models, foundation model backbones, cooperative perception fusion, and fleet-level intelligence. Airport private 5G provides the missing link that highway AVs lack: reliable, high-bandwidth, low-latency connectivity to local compute infrastructure. A three-tier architecture (on-vehicle Orin + airport MEC edge server + cloud) enables advanced AI capabilities without replacing vehicle hardware, amortizes expensive GPU infrastructure across the fleet ($2,500/vehicle for a shared edge server vs $2,000-5,000/vehicle for individual Thor upgrades), and degrades gracefully when the network fails because the safety-critical Simplex baseline controller never leaves the vehicle. The critical design constraint is that the vehicle must always be able to operate fully autonomously -- the edge enhances but never gates safety. This architecture uniquely suits airports because the operating environment is geographically bounded, the infrastructure is owned or co-managed, the connectivity is private and controllable, and vehicles return to depots where edge servers are co-located.
Table of Contents
- Introduction and Motivation
- Three-Tier Compute Architecture
- Model Placement Decision Framework
- Bandwidth and Latency Analysis
- Split Inference Patterns
- Airport Edge Server Architecture
- Graceful Degradation When Network Fails
- Security and Privacy
- Cost-Benefit Analysis
- Integration with Existing reference airside AV stack Systems
- Industry Approaches
- Implementation Roadmap
- Key Takeaways
- References
1. Introduction and Motivation
1.1 The On-Vehicle Compute Bottleneck
The NVIDIA Jetson AGX Orin 64GB provides 275 TOPS (sparse INT8), 138 TFLOPS (FP16), and 64 GB unified LPDDR5 memory at 15-60W. For the reference airside AV stack's safety-critical perception and planning loop, this is sufficient with headroom:
Current reference airside AV stack Orin Compute Budget (from model-compression-edge-deployment.md):
┌─────────────────────────────────┬──────────┬──────────┐
│ Component │ Latency │ Memory │
├─────────────────────────────────┼──────────┼──────────┤
│ LiDAR preprocessing │ 5ms │ 0.5 GB │
│ 3D Detection (PointPillars) │ 6.84ms │ 0.8 GB │
│ 3D Segmentation (FlatFormer) │ 25ms │ 1.5 GB │
│ Tracking (Kalman + association) │ 3ms │ 0.2 GB │
│ Occupancy grid (nvblox) │ 10ms │ 1.0 GB │
│ Localization (GTSAM + VGICP) │ 8ms │ 0.5 GB │
│ Planning (Frenet, 420 cands) │ 5ms │ 0.3 GB │
│ CBF safety filter │ 1ms │ 0.1 GB │
│ Safety monitoring (STL) │ 2ms │ 0.1 GB │
├─────────────────────────────────┼──────────┼──────────┤
│ TOTAL │ ~66ms │ 5.0 GB │
│ Remaining budget (100ms cycle) │ ~34ms │ 59.0 GB │
└─────────────────────────────────┴──────────┴──────────┘
The problem emerges when attempting to add advanced AI capabilities simultaneously:
Desired Additional Models (cannot all fit on Orin simultaneously):
┌─────────────────────────────────┬──────────┬──────────┬───────────┐
│ Model │ Orin ms │ Memory │ Frequency │
├─────────────────────────────────┼──────────┼──────────┼───────────┤
│ VLM co-pilot (InternVL2-2B) │ 300ms │ 3.0 GB │ 1-2 Hz │
│ World model (3-step prediction) │ 50-100ms │ 2.0 GB │ 5 Hz │
│ Foundation backbone (PTv3) │ 30-40ms │ 2.5 GB │ 10 Hz │
│ Scene flow (DeFlow) │ 26-40ms │ 1.5 GB │ 10 Hz │
│ Multi-task perception head │ 14.8ms │ 1.8 GB │ 10 Hz │
│ Place recognition (MinkLoc3D) │ 15ms │ 0.8 GB │ 1 Hz │
│ Cooperative perception fusion │ 10-20ms │ 1.0 GB │ 10 Hz │
│ Uncertainty quantification │ 7.5ms │ 0.5 GB │ 10 Hz │
│ Neural motion planner │ 15-45ms │ 1.0 GB │ 10 Hz │
│ Thermal fusion │ 6-8ms │ 0.5 GB │ 10 Hz │
├─────────────────────────────────┼──────────┼──────────┼───────────┤
│ TOTAL additional │ ~500ms+ │ 14.6 GB │ (mixed) │
│ Combined with safety stack │ ~570ms+ │ 19.6 GB │ │
└─────────────────────────────────┴──────────┴──────────┴───────────┘
Even with CUDA streams enabling concurrent execution, the GPU contention from running all models simultaneously would blow the 100ms cycle budget by 5-6x. The memory fits in 64 GB, but the compute does not fit in time. Running models at reduced frequency (VLM at 1 Hz, world model at 5 Hz) helps but still leaves the perception cycle overloaded during complex scenarios like turnaround operations where all capabilities are needed most.
1.2 Why Airports Are Different from Highways
Highway autonomous vehicles face a fundamental connectivity problem: the vehicle moves through heterogeneous coverage zones at 100+ km/h, traverses rural areas with no infrastructure, and cannot depend on any particular wireless connection. This forces all compute onto the vehicle.
Airport airside operations have the opposite characteristics:
| Property | Highway AV | Airport GSE |
|---|---|---|
| Speed | 30-130 km/h | 5-25 km/h |
| Coverage area | Open road, 100s of km | Bounded apron, <5 km² |
| Connectivity | Variable cellular, no guarantee | Private 5G/CBRS, owned infrastructure |
| Base station distance | Macro cells, 500m-5km | Small cells, 50-200m |
| RTT achievable | 20-100ms (variable) | 5-20ms (consistent) |
| Bandwidth | 10-100 Mbps (contested) | 100 Mbps-1 Gbps (dedicated) |
| Infrastructure ownership | Carrier-owned, shared | Airport-owned or co-managed |
| Vehicle return to base | Unpredictable | Every shift (depot/charging) |
| Environment predictability | Open world | Semi-structured, mapped |
| Fleet co-location | Geographically dispersed | Co-located on same apron |
The combination of owned connectivity, bounded geography, predictable routes, and co-located fleet creates a uniquely favorable environment for edge-cloud offloading. This is not aspirational -- DFW Airport has deployed private 5G/CBRS at $10M covering 27 square miles (see airport-5g-cbrs.md Section 1), and Changi operates autonomous tractors on private 5G today (Section 2).
1.3 The Edge-Cloud Opportunity
The key insight is that airport 5G connectivity bridges the gap between what Orin can compute locally and what a fleet needs for advanced AI capabilities:
Without Edge-Cloud With Edge-Cloud
(Vehicle Only) (Three-Tier)
┌──────────────┐ ┌──────────────┐
│ Vehicle │ │ Vehicle │
│ Orin 275T │ │ Orin 275T │
│ │ │ │
│ PointPillars │ │ PointPillars │ <- Safety: always on-vehicle
│ Frenet │ │ Frenet │
│ CBF/Simplex │ │ CBF/Simplex │
│ GTSAM │ │ GTSAM │
│ FlatFormer │ │ FlatFormer │
│ │ │ │
│ VLM? No room │ │ + BEV feat. │──┐ 5G (5-20ms RTT)
│ WM? Maybe 5Hz│ │ + UQ heads │ │
│ PTv3? No room│ │ + Flow input │ │
│ │ │ │ │
│ Capability: │ │ Capability: │ │
│ ★★★☆☆ │ │ ★★★★★ │ │
└──────────────┘ └──────────────┘ │
│
┌──────────────┐ │
│ Airport Edge │<─┘
│ 4-8x A100 │
│ │
│ VLM co-pilot │
│ World model │
│ Coop. fusion │
│ Map updates │
│ Fleet percep.│
│ │
└──────┬───────┘
│ Internet (>100ms)
┌──────┴───────┐
│ Cloud │
│ │
│ Training │
│ Auto-labeling│
│ Analytics │
│ Regulatory │
└──────────────┘
1.4 Relation to Existing Documents
This document builds on and cross-references:
- nvidia-orin-technical.md: Orin AGX hardware capabilities, power modes, and memory subsystem
- energy-efficient-inference-24-7.md: Power management and compute scheduling on-vehicle
- model-compression-edge-deployment.md: Compression techniques for on-vehicle deployment
- edge-platforms.md: Compute platform survey including Orin and Thor
- airport-5g-cbrs.md: Airport connectivity infrastructure (DFW, Changi, LAX case studies)
- v2x-protocols-airside.md: V2X message standards and bandwidth planning
- collaborative-fleet-perception.md: V2V cooperative perception algorithms
- vlm-scene-understanding.md: VLM co-pilot architecture and deployment considerations
- lidar-native-world-models.md: LiDAR world model inference requirements
- runtime-verification-monitoring.md: Safety monitoring and Simplex architecture
- tensorrt-deployment-guide.md: TensorRT optimization pipeline
2. Three-Tier Compute Architecture
2.1 Architecture Overview
┌────────────────────────────────────────────────────────────────────────────┐
│ THREE-TIER COMPUTE ARCHITECTURE │
│ │
│ TIER 1: ON-VEHICLE TIER 2: AIRPORT EDGE TIER 3: CLOUD │
│ (Per Vehicle) (Per Airport) (Global) │
│ ───────────────── ───────────────── ─────────────── │
│ Orin AGX 64GB DGX H100 / 4-8x A100 GPU Cluster │
│ 275 TOPS, 60W ~600-8,000 TOPS ~Petascale │
│ 64 GB LPDDR5 40-640 GB HBM TB+ RAM │
│ $1,599/vehicle $50K-400K/airport $10K+/mo │
│ │
│ Latency: <10ms local Latency: 20-100ms E2E Latency: >1s │
│ Availability: 100% Availability: 99.9% Availability: 99% │
│ Bandwidth: N/A (local) Bandwidth: 5G private Bandwidth: Inet │
│ │
│ RESPONSIBILITIES: RESPONSIBILITIES: RESPONSIBILITIES: │
│ • Safety-critical percep. • VLM co-pilot inference • Model training │
│ • Object detection (PP) • World model prediction • Auto-labeling │
│ • Planning (Frenet) • Foundation model heads • Data processing │
│ • CBF safety filter • Cooperative perception • Fleet analytics │
│ • Simplex BC controller • Map change detection • Regulatory logs │
│ • GTSAM localization • Fleet-level fusion • SW OTA packaging │
│ • STL runtime monitors • Place recognition DB • Sim/validation │
│ • Emergency stop • Advanced UQ analysis • Federated agg. │
│ • Basic segmentation • Neural planner (verify) • Incident review │
│ │
│ FAILURE MODE: FAILURE MODE: FAILURE MODE: │
│ N/A (always available) → Vehicle goes autonomous → Edge handles all │
│ If vehicle fails → stop Network loss = transparent short-term tasks │
│ for safety-critical path │
└────────────────────────────────────────────────────────────────────────────┘
2.2 Tier 1: On-Vehicle (NVIDIA Orin AGX)
The on-vehicle tier runs everything required for safe autonomous operation with zero network dependency. This is the irreducible compute core that cannot be offloaded.
Design principle: The vehicle must always be capable of completing its current mission if the network disappears.
| Component | Latency | Model | Why On-Vehicle |
|---|---|---|---|
| 3D detection | 6.84ms | PointPillars INT8 | Safety: primary obstacle detection |
| Segmentation | 16-25ms | FlatFormer INT8 | Safety: ground/obstacle classification |
| Tracking | 3ms | SimpleTrack/Kalman | Safety: temporal consistency |
| Occupancy grid | 2-5ms | GPU raycasting (nvblox) | Safety: collision avoidance |
| Localization | 8ms | GTSAM + VGICP | Safety: position knowledge |
| Planning | 5ms | Frenet (420 candidates) | Safety: trajectory generation |
| CBF filter | <1ms | OSQP solver | Safety: formal collision avoidance |
| Simplex BC | <0.5ms | Emergency fallback | Safety: system failover |
| STL monitors | <2ms | 20 airside specs | Safety: runtime verification |
| Safety MCU | Continuous | STM32H725 (MISRA C) | Safety: HW speed limiter, watchdog |
Total on-vehicle safety budget: ~50-55ms within 100ms cycle (10 Hz)
The remaining ~45-50ms is available for on-vehicle portions of split inference (backbone execution, feature extraction, local uncertainty estimation) and for receiving/integrating edge results from the previous cycle.
Memory allocation (on-vehicle):
Safety stack (always resident): ~5.0 GB
Feature extraction backbone: ~1.5 GB
V2X communication buffers: ~0.5 GB
Edge result caching (last known good): ~1.0 GB
ROS node overhead: ~2.0 GB
TensorRT execution contexts: ~1.5 GB
OS + drivers: ~4.0 GB
────────────────────────────────────────────────
Total resident: ~15.5 GB
Available for optional models: ~48.5 GB
The 48.5 GB headroom enables selective on-vehicle execution of enhanced models during network degradation, or caching of edge-provided priors (neural map prior, fleet perception state, world model predictions) that remain valid for seconds even after network loss.
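A minimal sketch of how such a cache of edge-provided priors might be structured, assuming each entry carries its own validity window (the windows themselves are discussed in Section 7.6); the class and key names are illustrative, not part of the deployed stack:

import time

class EdgePriorCache:
    """Last-known-good store for edge results that outlive a network drop."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry monotonic time)

    def put(self, key, value, ttl_s):
        """Cache an edge-provided prior with its validity window in seconds."""
        self._store[key] = (value, time.monotonic() + ttl_s)

    def get(self, key):
        """Return the cached value if still within its window, else None."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        return value if time.monotonic() < expiry else None

# Example: world-model predictions stay usable for ~1 s after network loss
# cache.put("world_model", prediction_msg, ttl_s=1.0)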
2.3 Tier 2: Airport Edge Server (MEC)
The airport edge server is a Multi-access Edge Computing (MEC) node co-located with the airport's 5G infrastructure. Physically, it sits in an equipment room at the airport -- often in the same facility as the 5G core network, within 1-2 network hops of the radio access network.
Purpose: Run compute-intensive AI models that enhance safety and capability but are not required for baseline safe operation.
| Function | Model | Edge Latency | Value Added |
|---|---|---|---|
| VLM co-pilot | InternVL2-7B or Qwen-VL-7B | 40-80ms | Scene reasoning, anomaly explanation |
| World model prediction | LiDAR-native (UnO variant) | 30-60ms | 3-step future occupancy prediction |
| Foundation perception | PTv3 backbone + multi-task heads | 15-30ms | Higher accuracy seg/det/prediction |
| Cooperative fusion | Where2comm aggregation | 10-20ms | Fleet-level perception merge |
| Map change detection | RTMap incremental update | 20-40ms | Real-time HD map maintenance |
| Neural map prior | NMP inference | 30-50ms | Enhanced mapping in adverse conditions |
| Advanced UQ | Deep ensemble (M=5) | 50-100ms | Gold-standard uncertainty estimates |
| Place recognition DB | FAISS + MinkLoc3D verification | 5-15ms | Multi-session map alignment |
| Fleet state estimation | Graph-based fleet optimizer | 10-30ms | Cross-vehicle consistency |
| Auto-labeling (near-RT) | SAM + CLIP on selected frames | 200-500ms | Trigger-based edge labeling |
The edge server operates as a shared resource. Every vehicle in the fleet submits requests and receives enhanced results. The server must handle concurrent requests from all vehicles, with prioritization based on the vehicle's current operational context (turnaround stand operations get priority over taxiway transit).
2.4 Tier 3: Cloud
The cloud tier handles tasks where latency tolerance exceeds 1 second and compute requirements exceed what a single airport edge server can provide.
| Function | Latency Tolerance | Compute Need | Frequency |
|---|---|---|---|
| Model training (full) | Hours-days | Multi-node GPU | Weekly-monthly |
| Federated learning aggregation | Minutes | CPU/GPU hybrid | Per FL round (hours) |
| Auto-labeling (batch) | Hours | Multi-GPU | Daily |
| Simulation / validation | Hours | GPU cluster | Per release |
| Fleet analytics dashboard | Seconds | CPU | Continuous |
| Regulatory log archival | Minutes | Storage-heavy | Continuous |
| OTA update packaging | Minutes | CPU/storage | Per release |
| Cross-airport model transfer | Hours | Multi-GPU | Per new airport |
| Incident replay / analysis | Minutes | GPU | Per incident |
| Causal SCM inference (batch) | Minutes | CPU/GPU | Per shift |
Cloud provider considerations for aviation:
- Data sovereignty: some airports (especially EU, Middle East) require data to remain in-country
- Aviation cybersecurity compliance (see cybersecurity-airside-av.md)
- Hybrid cloud: airport-owned compute for sensitive data + public cloud for training
2.5 Data Flow Architecture
TIER 3: CLOUD
┌────────────────────────────┐
│ Training Analytics │
│ AutoLabel Simulation │
│ OTA Mgmt Regulatory │
└──────────┬─────────────────┘
│ Internet
│ (100-500ms RTT)
│
TIER 2: AIRPORT EDGE
┌──────────┴─────────────────┐
│ VLM WorldModel CoopFuse │
│ MapMgr PlaceRecDB UQ │
│ │
│ NVIDIA Triton Server │
│ Request Queue + Priority │
│ Result Cache + Broadcast │
└───┬────┬────┬────┬──────────┘
│ │ │ │ Private 5G
│ │ │ │ (5-20ms RTT)
┌────────────┘ │ │ └────────────┐
│ │ │ │
┌─────┴──────┐ ┌─────┴──┐ ┌──────┴───┐ ┌──┴───────┐
│ Vehicle 1 │ │ Veh. 2 │ │ Veh. 3 │ │ Veh. N │
│ Orin AGX │ │ Orin │ │ Orin │ │ Orin │
│ │ │ │ │ │ │ │
│ Safety PP │ │ Safety │ │ Safety │ │ Safety │
│ Frenet+CBF │ │ stack │ │ stack │ │ stack │
│ GTSAM │ │ │ │ │ │ │
└────────────┘ └────────┘ └──────────┘ └──────────┘
Data flows:
UPLINK (vehicle → edge):
• Compressed LiDAR features: 50-200 KB @ 10 Hz
• Camera frame (selected): 100-300 KB @ 2-5 Hz
• BEV feature map: 50-100 KB @ 10 Hz
• Detection results: 5-10 KB @ 10 Hz
• Health/telemetry: 1-2 KB @ 1 Hz
DOWNLINK (edge → vehicle):
• VLM scene description: 1-5 KB @ 1-2 Hz
• World model predictions: 20-50 KB @ 5 Hz
• Enhanced detections: 10-20 KB @ 10 Hz
• Cooperative perception: 50-100 KB @ 10 Hz
• Map updates: 10-50 KB @ 1 Hz
• Fleet state: 5-10 KB @ 1 Hz
3. Model Placement Decision Framework
3.1 Latency Budget Taxonomy
Every model in the reference airside AV stack falls into one of four latency categories. The category determines which tier(s) can host the model:
LATENCY CATEGORIES:
┌─────────────────────────────────────────────────────────────────────┐
│ CATEGORY A: HARD REAL-TIME (<10ms) │
│ Tier: ON-VEHICLE ONLY │
│ Rationale: Any network hop adds 5-20ms minimum. Cannot risk │
│ network jitter or failure. Controls actuators directly. │
│ Examples: CBF filter, Simplex BC, STL monitors, safety MCU │
├─────────────────────────────────────────────────────────────────────┤
│ CATEGORY B: SOFT REAL-TIME (10-100ms) │
│ Tier: ON-VEHICLE PRIMARY, EDGE ENHANCEMENT │
│ Rationale: On-vehicle model provides baseline. Edge provides │
│ higher-quality result that is used if it arrives in time. │
│ Examples: Detection, segmentation, tracking, planning │
├─────────────────────────────────────────────────────────────────────┤
│ CATEGORY C: NEAR REAL-TIME (100ms-1s) │
│ Tier: EDGE PRIMARY, ON-VEHICLE CACHE │
│ Rationale: These models run at 1-5 Hz. Edge compute is sufficient. │
│ Vehicle caches last-known-good result for network loss periods. │
│ Examples: VLM co-pilot, world model, cooperative fusion, map update│
├─────────────────────────────────────────────────────────────────────┤
│ CATEGORY D: OFFLINE (>1s) │
│ Tier: CLOUD ONLY │
│ Rationale: Not time-critical. Compute requirements exceed edge. │
│ Examples: Training, auto-labeling, simulation, regulatory reports │
└─────────────────────────────────────────────────────────────────────┘
3.2 Decision Tree
For each model, walk this decision tree to determine optimal placement:
┌──────────────────┐
│ Does model output │
│ directly control │
│ actuators? │
└────────┬─────────┘
Yes │ No
┌────────┘ └────────┐
▼ ▼
┌────────────┐ ┌──────────────┐
│ TIER 1 ONLY │ │ Latency req. │
│ (on-vehicle)│ │ < 100ms? │
└────────────┘ └──────┬───────┘
Yes │ No
┌──────────┘ └──────────┐
▼ ▼
┌────────────┐ ┌────────────┐
│ Orin can run│ │ Latency req.│
│ within 50ms?│ │ < 1s? │
└──────┬─────┘ └──────┬─────┘
Yes │ No Yes │ No
┌────────┘ └───────┐ ┌────────┘ └────┐
▼ ▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ TIER 1 │ │ TIER 1+2 │ │ TIER 3 │
│ (on-vehicle│ │ Split: │ │ (cloud) │
│ sufficient)│ │ backbone T1 │ └────────────┘
└────────────┘ │ head on T2 │
└────────────┘
3.3 Complete Model Placement Table
| Model | Category | Tier | On-Vehicle Role | Edge Role | Cloud Role | Fallback |
|---|---|---|---|---|---|---|
| PointPillars (detection) | A | T1 only | Full inference (6.84ms) | N/A | N/A | Is the fallback |
| CBF-QP safety filter | A | T1 only | Full solve (<1ms) | N/A | N/A | Is the fallback |
| Simplex BC (Frenet) | A | T1 only | Full planning (5ms) | N/A | N/A | Is the fallback |
| STL runtime monitors | A | T1 only | 20 specs (<2ms) | N/A | N/A | Is the fallback |
| Safety MCU (STM32) | A | T1 only | HW watchdog | N/A | N/A | Hardware independent |
| GTSAM localization | A | T1 only | Full filter (8ms) | N/A | N/A | Dead reckoning |
| FlatFormer (seg.) | B | T1+T2 | INT8 on-vehicle (16-25ms) | PTv3 head (higher acc.) | N/A | On-vehicle result |
| CenterPoint (det.) | B | T1+T2 | INT8 on-vehicle (12ms) | Foundation head | N/A | On-vehicle result |
| Scene flow (DeFlow) | B | T1+T2 | On-vehicle (26-40ms) | Enhanced resolution | N/A | On-vehicle result |
| Multi-task perception | B | T1+T2 | Shared backbone (14.8ms) | Additional heads | N/A | On-vehicle heads |
| Tracking (SimpleTrack) | B | T1 | On-vehicle (3ms) | Fleet-consistent IDs | N/A | On-vehicle tracks |
| Uncertainty (evidential) | B | T1+T2 | Single-pass (7.5ms) | Deep ensemble (M=5) | N/A | Single-pass UQ |
| Neural planner | B | T1+T2 | On-vehicle + CBF (16.3ms) | Verify/correct | N/A | Frenet (Simplex) |
| Thermal fusion | B | T1 | YOLO-Thermal (6-8ms) | Enhanced fusion | N/A | LiDAR-only |
| VLM co-pilot | C | T2 | Cache last result | InternVL2-7B (40-80ms) | InternVL2-26B | No VLM (safe without) |
| World model | C | T2 | Cache predictions | LiDAR-native (30-60ms) | Training only | Occupancy flow only |
| Cooperative fusion | C | T2 | Local features only | Where2comm agg. | N/A | Single-vehicle percep. |
| Map change detection | C | T2 | Report observations | RTMap fusion | Archival | Static HD map |
| Neural map prior | C | T2 | Cache prior | NMP inference | Training | Standard map |
| Place recognition | C | T1+T2 | ScanContext CPU (5ms) | FAISS DB + MinkLoc3D | N/A | Odometry only |
| Fleet state estimation | C | T2 | N/A | Graph optimizer | Dashboard | No fleet state |
| Auto-labeling (RT) | C | T2 | Select trigger frames | SAM + CLIP | Batch processing | Manual labeling |
| Model training | D | T3 | N/A | N/A | Full training | N/A |
| Batch auto-labeling | D | T3 | N/A | N/A | SAM + CLIP pipeline | N/A |
| Simulation | D | T3 | N/A | N/A | CARLA/NVIDIA Isaac | N/A |
| Federated aggregation | D | T3 | Local LoRA training | N/A | Global aggregation | N/A |
| OTA updates | D | T3 | Receive + apply | Stage locally | Package + sign | N/A |
| Regulatory reporting | D | T3 | Log locally | Forward | Archive + query | Local logs only |
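The Section 3.2 decision tree can also be expressed as a small placement function. The sketch below is illustrative only: the ModelProfile fields and thresholds mirror the tree above and are not a definitive policy.

from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    controls_actuators: bool   # Category A test
    latency_req_ms: float      # worst acceptable end-to-end latency
    orin_latency_ms: float     # measured on-vehicle latency (inf if infeasible)

def place_model(m: ModelProfile) -> str:
    """Walk the Section 3.2 decision tree for a single model."""
    if m.controls_actuators:
        return "TIER 1 only (on-vehicle)"
    if m.latency_req_ms < 100:
        if m.orin_latency_ms <= 50:
            return "TIER 1 (on-vehicle sufficient)"
        return "TIER 1+2 (split: backbone on-vehicle, head on edge)"
    if m.latency_req_ms < 1000:
        return "TIER 2 (edge primary, on-vehicle cache)"
    return "TIER 3 (cloud)"

# Example: the VLM co-pilot tolerates ~500 ms and cannot meet 50 ms on Orin
print(place_model(ModelProfile("vlm_copilot", False, 500.0, float("inf"))))
# -> TIER 2 (edge primary, on-vehicle cache)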
3.4 Context-Adaptive Placement
Model placement is not static. The decision framework adapts based on the vehicle's current operational context, following the context-aware switching strategy from energy-efficient-inference-24-7.md:
CONTEXT → PLACEMENT ADJUSTMENT:
Taxiway transit (low complexity):
T1: Safety stack only (PointPillars, Frenet, CBF)
T2: VLM at 0.5 Hz, world model off, cooperative perception at 2 Hz
Power: 15-30W on Orin
Apron transit (medium complexity):
T1: Safety stack + segmentation + tracking
T2: VLM at 1 Hz, world model at 2 Hz, cooperative at 5 Hz
Power: 30-45W on Orin
Stand approach (high complexity):
T1: Full safety stack + feature backbone
T2: VLM at 2 Hz, world model at 5 Hz, cooperative at 10 Hz, map updates
Power: 45-60W on Orin
Runway crossing (maximum alert):
T1: Full safety stack, no optional models
T2: All models at maximum rate, fleet-level fusion priority
Power: MAXN on Orin, all available edge GPU allocated
4. Bandwidth and Latency Analysis
4.1 Airport 5G Network Characteristics
Based on deployed airport 5G networks (DFW, Changi, LAX -- see airport-5g-cbrs.md):
| Parameter | Private 5G (CBRS n48) | Private 5G (mmWave n260) | WiFi 6E |
|---|---|---|---|
| Downlink peak | 300 Mbps-1 Gbps | 1-4 Gbps | 1-2 Gbps |
| Uplink peak | 50-200 Mbps | 200-500 Mbps | 500 Mbps-1 Gbps |
| RTT (UE to MEC) | 5-15ms | 2-8ms | 3-10ms |
| RTT (UE to internet) | 20-50ms | 15-30ms | 10-30ms |
| Reliability | 99.999% (URLLC) | 99.9% (coverage limited) | 99% |
| Range per cell | 200-500m | 50-150m | 30-100m |
| Handover time | <20ms | <30ms | 50-200ms |
| Vehicle density per cell | 20-50 | 10-20 | 10-30 |
| Frequency | 3.55-3.7 GHz | 24-40 GHz | 6 GHz |
Recommended for airside GSE: Private 5G CBRS (sub-6 GHz) -- best balance of range, reliability, and latency. mmWave provides higher bandwidth but smaller cells and is susceptible to rain fade and jet blast turbulence. WiFi 6E lacks URLLC guarantees.
4.2 Per-Vehicle Data Payloads
The choice of what data to send from vehicle to edge server determines bandwidth consumption. The options range from raw sensor data (maximum bandwidth, maximum flexibility) to compressed features (minimum bandwidth, requires on-vehicle preprocessing).
| Data Type | Raw Size | Compressed | Rate | Per-Vehicle BW | Direction |
|---|---|---|---|---|---|
| LiDAR point cloud (4-8 sensors) | 4-12 MB/frame | 200-500 KB (Draco) | 10 Hz | 16-40 Mbps | UL |
| Camera frame (1-2 cameras) | 6-12 MB/frame | 100-300 KB (JPEG95) | 10-30 Hz | 8-72 Mbps | UL |
| BEV feature map | 2-8 MB/frame | 50-200 KB (FP16+LZ4) | 10 Hz | 4-16 Mbps | UL |
| Pillar features | 1-3 MB/frame | 30-80 KB (sparse) | 10 Hz | 2.4-6.4 Mbps | UL |
| Detection results | 10-50 KB | 5-10 KB | 10 Hz | 0.4-0.8 Mbps | UL |
| Vehicle telemetry | 2-5 KB | 1-2 KB | 10 Hz | 0.08-0.16 Mbps | UL |
| Health diagnostics | 1-2 KB | 1-2 KB | 1 Hz | 0.008-0.016 Mbps | UL |
| Edge detection results | 10-50 KB | 5-20 KB | 10 Hz | 0.4-1.6 Mbps | DL |
| VLM scene description | 1-10 KB | 1-5 KB | 1-2 Hz | 0.008-0.08 Mbps | DL |
| World model predictions | 50-200 KB | 20-50 KB | 5 Hz | 0.8-2.0 Mbps | DL |
| Cooperative percep. result | 100-500 KB | 50-100 KB | 10 Hz | 4.0-8.0 Mbps | DL |
| Map updates | 10-100 KB | 10-50 KB | 1 Hz | 0.08-0.4 Mbps | DL |
| Fleet state broadcast | 5-20 KB | 5-10 KB | 1 Hz | 0.04-0.08 Mbps | DL |
4.3 Per-Vehicle Bandwidth Profiles
Different offloading strategies consume different bandwidth:
Profile A: Feature offload (RECOMMENDED)
UL: BEV features (4-16 Mbps) + detections (0.8 Mbps) + telemetry (0.16 Mbps)
= 5-17 Mbps uplink per vehicle
DL: Edge results (1.6 Mbps) + VLM (0.08 Mbps) + world model (2.0 Mbps) +
coop percep (8.0 Mbps) + map (0.4 Mbps) + fleet (0.08 Mbps)
= 5-12 Mbps downlink per vehicle
Profile B: Raw sensor offload (for auto-labeling, not real-time)
UL: LiDAR (16-40 Mbps) + camera (8-72 Mbps)
= 24-112 Mbps uplink per vehicle
DL: Minimal return data = 1-5 Mbps downlink per vehicle
Profile C: Minimal (degraded network)
UL: Detection results only (0.8 Mbps) + telemetry (0.16 Mbps)
= ~1 Mbps uplink per vehicle
DL: Fleet state broadcast only (0.08 Mbps)
= ~0.1 Mbps downlink per vehicle
4.4 Fleet-Scale Bandwidth Planning
| Fleet Size | Profile A (UL/DL) | Profile B (UL/DL) | Profile C (UL/DL) |
|---|---|---|---|
| 20 vehicles | 100-340 / 100-240 Mbps | 480-2,240 / 20-100 Mbps | 20 / 2 Mbps |
| 50 vehicles | 250-850 / 250-600 Mbps | 1,200-5,600 / 50-250 Mbps | 50 / 5 Mbps |
| 100 vehicles | 500-1,700 / 500-1,200 Mbps | Infeasible | 100 / 10 Mbps |
| 200 vehicles | 1,000-3,400 / 1,000-2,400 Mbps | Infeasible | 200 / 20 Mbps |
Key observation: Profile A (feature offload) supports up to ~50 vehicles on a single 5G sector. Beyond 50, either additional sectors or bandwidth optimization is needed. Profile B (raw offload) is only viable for up to 5-10 vehicles simultaneously and should be reserved for selected trigger-frame uploads, not continuous streaming.
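The table above is straightforward arithmetic over the per-vehicle profiles from Section 4.3. A minimal sketch of that planning calculation (per-vehicle figures copied from Section 4.3; names are illustrative):

# (uplink_lo, uplink_hi), (downlink_lo, downlink_hi) in Mbps per vehicle
PROFILES_MBPS = {
    "A_feature_offload": ((5, 17), (5, 12)),
    "B_raw_offload":     ((24, 112), (1, 5)),
    "C_minimal":         ((1, 1), (0.1, 0.1)),
}

def fleet_bandwidth_mbps(profile: str, n_vehicles: int):
    """Scale per-vehicle bandwidth linearly to fleet size."""
    (ul_lo, ul_hi), (dl_lo, dl_hi) = PROFILES_MBPS[profile]
    return ((ul_lo * n_vehicles, ul_hi * n_vehicles),
            (dl_lo * n_vehicles, dl_hi * n_vehicles))

# 50 vehicles on Profile A: UL 250-850 Mbps, DL 250-600 Mbps -- already
# approaching the uplink capacity of a single CBRS sector
print(fleet_bandwidth_mbps("A_feature_offload", 50))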
4.5 Congestion Management
When multiple vehicles operate near the same stand during turnaround (the highest-demand scenario), network congestion must be managed:
# QoS Priority Levels for 5G Network Slicing
# 5QI (5G QoS Identifier) mapping for autonomous GSE
QOS_PROFILES = {
# 5QI 82: Delay-critical GBR (guaranteed bit rate)
"safety_critical": {
"5qi": 82,
"priority": 1, # Highest
"guaranteed_br": "1 Mbps", # Detections + V2X safety
"max_latency": "10ms",
"packet_error": "1e-6",
"slice": "URLLC",
"contents": ["V2X_safety", "emergency_stop", "detection_results"]
},
# 5QI 7: Non-GBR, real-time
"edge_inference": {
"5qi": 7,
"priority": 2,
"guaranteed_br": "N/A",
"max_latency": "50ms",
"packet_error": "1e-3",
"slice": "eMBB",
"contents": ["BEV_features", "edge_results", "cooperative_percep"]
},
# 5QI 8: Non-GBR, best effort with priority
"enhanced_percep": {
"5qi": 8,
"priority": 3,
"guaranteed_br": "N/A",
"max_latency": "100ms",
"packet_error": "1e-2",
"slice": "eMBB",
"contents": ["VLM_results", "world_model", "map_updates"]
},
# 5QI 9: Non-GBR, best effort
"analytics": {
"5qi": 9,
"priority": 4, # Lowest
"guaranteed_br": "N/A",
"max_latency": "1000ms",
"packet_error": "1e-2",
"slice": "eMBB",
"contents": ["raw_sensor_upload", "logging", "diagnostics"]
}
}
Network slicing partitions the 5G network into logically separate networks with guaranteed resources. The safety-critical slice (URLLC) is provisioned with guaranteed bit rate and never contends with analytics traffic. This is a standard 5G SA (standalone) feature available in private deployments.
4.6 End-to-End Latency Breakdown
For the recommended Profile A (feature offload) path:
End-to-end latency for edge-enhanced inference:
Step 1: On-vehicle backbone (feature extraction)
LiDAR preprocess: 5ms
Pillar/voxel encoding: 3ms
Backbone forward pass: 8ms
Feature compression (LZ4): 1ms
───────────────────────────────────────
Subtotal: 17ms
Step 2: Network transport (vehicle → edge)
5G UL scheduling + encoding: 2-5ms
Air interface: 1-3ms
Backhaul to MEC: 1-3ms
───────────────────────────────────────
Subtotal: 4-11ms
Step 3: Edge server inference
Triton request deserialization: 0.5ms
Feature decompression: 0.5ms
Model inference (varies): 10-50ms
Result serialization: 0.5ms
───────────────────────────────────────
Subtotal: 11.5-51.5ms
Step 4: Network transport (edge → vehicle)
Backhaul from MEC: 1-3ms
Air interface: 1-3ms
5G DL scheduling: 1-2ms
───────────────────────────────────────
Subtotal: 3-8ms
Step 5: On-vehicle result integration
Deserialization: 0.5ms
Confidence-weighted fusion: 1ms
───────────────────────────────────────
Subtotal: 1.5ms
═══════════════════════════════════════════
TOTAL END-TO-END: 37-89ms
Typical: ~55ms
This 55ms typical latency means edge results arrive during the same 100ms perception cycle or the next one. The vehicle never waits -- it uses its on-vehicle results immediately and integrates edge results when they arrive, which is typically within 1 cycle.
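As a quick planning check, the same budget can be summed programmatically; the per-stage numbers below are the typical mid-range values from this section, not measurements:

STAGES_MS = {
    "onboard_backbone":   17.0,  # preprocess + encode + backbone + compress
    "uplink_transport":    7.0,  # mid-range of 4-11 ms
    "edge_inference":     25.0,  # mid-range for a PTv3-class head
    "downlink_transport":  5.0,
    "result_integration":  1.5,
}

CYCLE_MS = 100.0
total_ms = sum(STAGES_MS.values())
print(f"E2E ~{total_ms:.0f} ms -> edge result usable "
      f"{'this cycle' if total_ms < CYCLE_MS else 'next cycle'}")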
5. Split Inference Patterns
5.1 Pattern A: Full Offload
VEHICLE NETWORK EDGE SERVER
┌────────────┐ ┌────────┐ ┌────────────────┐
│ Raw LiDAR │────────────>│ 5G UL │────────────>│ Full model │
│ (200-500KB │ 16-40 Mbps │ 5-15ms │ │ inference │
│ per frame)│ └────────┘ │ (PointPillars │
│ │ │ + PTv3 + VLM │
│ Wait for │<────────────┐────────┐<────────────│ + world model)│
│ results... │ 1-5 Mbps │ 5G DL │ │ │
│ │ │ 5-15ms │ │ 50-200ms │
└────────────┘ └────────┘ └────────────────┘
Total latency: 60-230ms
Bandwidth: 16-40 Mbps UL per vehicle
Use case: Not for real-time safety perception. Viable only for:
- Batch auto-labeling of recorded data (upload during depot charging)
- Post-hoc analytics on full sensor streams
- Shadow-mode evaluation of new models against on-vehicle results
Advantages: Simplest vehicle-side code. Edge runs any model without vehicle changes.
Disadvantages: Highest bandwidth. Highest latency. Vehicle cannot act until results return. Single point of failure if network drops.
Verdict: Use for data pipeline and analytics only. Never for safety-relevant perception.
5.2 Pattern B: Feature Offload (Split Backbone-Head)
VEHICLE NETWORK EDGE SERVER
┌────────────┐ ┌────────┐ ┌────────────────┐
│ LiDAR │ │ │ │ │
│ preprocess │ │ │ │ │
│ ↓ │ │ │ │ │
│ Backbone │ │ │ │ │
│ (pillars/ │ │ │ │ │
│ voxels) │ │ │ │ │
│ ↓ │ │ │ │ │
│ BEV feats │────────────>│ 5G UL │────────────>│ Foundation head│
│ (50-200KB) │ 4-16 Mbps │ 5-15ms │ │ (PTv3 decoder) │
│ │ └────────┘ │ ↓ │
│ Meanwhile: │ │ VLM head │
│ Safety det.│ │ ↓ │
│ (PP 6.84ms)│ │ World model │
│ Frenet plan│<────────────┐────────┐<────────────│ ↓ │
│ CBF filter │ 5-12 Mbps │ 5G DL │ │ Enhanced dets │
│ │ │ 5-15ms │ │ + VLM output │
│ Merge edge │ └────────┘ │ + predictions │
│ results │ │ │
└────────────┘ └────────────────┘
Total latency: 35-90ms
Bandwidth: 4-16 Mbps UL per vehicle
This is the recommended primary pattern for the reference airside AV stack.
The vehicle runs its lightweight backbone (pillar/voxel encoding) and sends the resulting BEV feature map to the edge. The edge runs multiple heads on those features simultaneously: a high-accuracy detection head (PTv3 decoder), a VLM head for scene reasoning, and a world model head for future prediction.
Meanwhile, the vehicle runs its safety stack (PointPillars detection, Frenet planning, CBF filter) on the same raw data. The vehicle acts on its own results immediately. When edge results arrive (typically within the same or next 100ms cycle), they are fused with on-vehicle results using confidence-weighted merging.
# Feature offload: vehicle-side ROS node (simplified)
import rospy
import numpy as np
import lz4.frame
from std_msgs.msg import Header
from sensor_msgs.msg import PointCloud2
from vision_msgs.msg import Detection3DArray
from edge_msgs.msg import BEVFeatureMap, EdgeInferenceResult
class FeatureOffloadNode:
"""Extracts BEV features on-vehicle, sends to edge, fuses results."""
def __init__(self):
rospy.init_node('feature_offload')
        # On-vehicle backbone (TensorRT engine; TensorRTEngine is assumed to be
        # a project-local wrapper around the serialized INT8 engine)
self.backbone = TensorRTEngine('/models/pillar_backbone_int8.engine')
# Publisher to edge (via rosbridge or custom UDP transport)
self.feat_pub = rospy.Publisher(
'/edge/bev_features', BEVFeatureMap, queue_size=1
)
# Subscriber for edge results
self.edge_sub = rospy.Subscriber(
'/edge/inference_result', EdgeInferenceResult,
self.edge_result_cb, queue_size=1
)
# Local safety detections (from on-vehicle PointPillars)
self.local_det_sub = rospy.Subscriber(
'/perception/detections_3d', Detection3DArray,
self.local_det_cb, queue_size=1
)
# Fused output
self.fused_pub = rospy.Publisher(
'/perception/fused_detections', Detection3DArray, queue_size=1
)
        # LiDAR input driving feature extraction (topic name is illustrative)
        self.lidar_sub = rospy.Subscriber(
            '/lidar/points_fused', PointCloud2, self.lidar_cb, queue_size=1
        )
        # State
        self.vehicle_id = rospy.get_param('~vehicle_id', 'gse_01')
        self.last_edge_result = None
        self.last_edge_result_time = rospy.Time(0)
        self.MAX_EDGE_AGE = 0.2  # 200ms - discard stale edge results
def lidar_cb(self, msg):
"""Process LiDAR, extract features, send to edge."""
t0 = rospy.Time.now()
# 1. Voxelize point cloud (on-vehicle, ~3ms)
pillars = self.voxelize(msg)
# 2. Run backbone (on-vehicle, ~8ms)
bev_features = self.backbone.infer(pillars)
# 3. Compress and send to edge (~1ms)
compressed = lz4.frame.compress(
bev_features.astype(np.float16).tobytes()
)
feat_msg = BEVFeatureMap()
feat_msg.header = msg.header
feat_msg.data = compressed
feat_msg.shape = list(bev_features.shape)
feat_msg.vehicle_id = self.vehicle_id
feat_msg.send_time = t0
self.feat_pub.publish(feat_msg)
    def edge_result_cb(self, msg):
        """Receive enhanced detections from edge server."""
        latency = (rospy.Time.now() - msg.send_time).to_sec()
        self.last_edge_result = msg
        self.last_edge_result_time = rospy.Time.now()
        rospy.logdebug(f"Edge result received, RTT={latency*1000:.1f}ms")
def local_det_cb(self, local_dets):
"""Fuse local safety detections with edge-enhanced detections."""
fused = Detection3DArray()
fused.header = local_dets.header
# Always include local detections (safety baseline)
for det in local_dets.detections:
det.source = "on_vehicle"
fused.detections.append(det)
        # Merge edge results if fresh (age is computed at fusion time because
        # edge results arrive asynchronously)
        edge_age = (rospy.Time.now() - self.last_edge_result_time).to_sec()
        if (self.last_edge_result is not None and
                edge_age < self.MAX_EDGE_AGE):
for edet in self.last_edge_result.detections:
match = self.find_matching_detection(
edet, local_dets.detections
)
if match:
# Confidence-weighted merge of matched detections
merged = self.merge_detections(match, edet)
# Replace local with merged
self.replace_detection(fused, match, merged)
else:
# Edge-only detection (new object not seen locally)
edet.source = "edge_only"
edet.confidence *= 0.8 # Discount edge-only slightly
fused.detections.append(edet)
        self.fused_pub.publish(fused)
5.3 Pattern C: Ensemble Augmentation
VEHICLE EDGE SERVER
┌──────────────────────┐ ┌──────────────────┐
│ LiDAR → PointPillars │ (6.84ms, safety baseline) │ │
│ ↓ │ │ │
│ Detections_local │ │ │
│ ↓ │ │ │
│ [BEV features]───────│──── 5G (50-200KB) ───────>│ PTv3 decoder │
│ │ │ (15-30ms) │
│ Frenet planning │ │ ↓ │
│ CBF filter │ │ Detections_edge │
│ ↓ │ │ ↓ │
│ Initial trajectory │ │ Enhanced_dets────│──> back to vehicle
│ ↓ │ │ │
│ FUSION: merge local │<── 5G (10-20KB) ──────────│ │
│ + edge detections │ │ │
│ ↓ │ │ │
│ Updated trajectory │ │ │
│ (if edge improves) │ │ │
└──────────────────────┘                          └──────────────────┘
Key design: The vehicle runs its full safety stack and generates an initial trajectory. The edge runs a more accurate model on the same features. If the edge results arrive before the next planning cycle, the vehicle fuses them and may update its trajectory. If they arrive late or not at all, the vehicle has already acted safely on its own results.
When edge disagrees with vehicle:
Disagreement Resolution Matrix:

| | Edge says CLEAR | Edge says OBSTACLE |
|---|---|---|
| Vehicle says CLEAR | Both agree: proceed | CONSERVATIVE: treat as obstacle (edge may see more; reduce speed) |
| Vehicle says OBSTACLE | Keep obstacle (vehicle is safety-critical authority) | Both agree: obstacle (highest confidence wins for position/size) |

The rule is simple: any detection is real until proven otherwise. If either the vehicle or edge reports an obstacle, the planner treats it as present. False positives cause unnecessary stops; false negatives cause collisions. The asymmetry favors safety.
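A minimal sketch of that conservative rule, assuming simple detection dicts with position, size, and confidence fields and a hypothetical iou_3d() association helper:

def fuse_conservative(local_dets, edge_dets, iou_thresh=0.3):
    """Keep every detection from either source; matched pairs are refined."""
    fused = list(local_dets)          # vehicle detections are never dropped
    for edet in edge_dets:
        match = next((ldet for ldet in local_dets
                      if iou_3d(ldet, edet) > iou_thresh), None)
        if match is None:
            # Edge-only object: treated as real until proven otherwise
            fused.append(edet)
        elif edet["confidence"] > match["confidence"]:
            # Both agree the object exists; higher confidence refines pose/size
            match["position"] = edet["position"]
            match["size"] = edet["size"]
    return fused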
5.4 Pattern D: Speculative Execution
Timeline within one 100ms cycle:
t=0ms Vehicle receives LiDAR scan
t=5ms Preprocessing complete
t=12ms PointPillars detection complete → passed to planner
t=17ms Frenet planning complete → trajectory generated
t=18ms CBF filter applied → safe trajectory committed to actuators
SIMULTANEOUSLY at t=5ms: BEV features sent to edge
t=55ms Edge result arrives (typical)
t=56ms Compare edge vs vehicle detections
t=57ms If edge found NEW obstacle not in vehicle detections:
→ Insert into next planning cycle (t=100ms)
→ If critical (obstacle on current trajectory): trigger re-plan
If edge refines existing detections (better position/size):
→ Update tracking state for next cycle
If edge agrees with vehicle:
→ Confidence boost, no action needed
t=100ms Next cycle begins with updated state
This pattern treats edge inference as speculative look-ahead. The vehicle never delays its own safety loop waiting for edge results. Edge corrections arrive asynchronously and are incorporated at the next opportunity.
Implementation as a ROS callback-based pipeline:
import threading
from collections import namedtuple

import rospy

# Deferred edge correction record (fields unused by a given type stay None)
Correction = namedtuple('Correction', ['type', 'detection', 'old', 'new'],
                        defaults=(None, None, None))


class SpeculativeExecutionNode:
"""
Asynchronous edge inference integration.
Vehicle acts immediately on local results.
Edge corrections applied to NEXT cycle.
"""
def __init__(self):
self.correction_buffer = []
self.correction_lock = threading.Lock()
def planning_cycle(self, local_detections, ego_state):
"""Main 10 Hz planning cycle (runs on-vehicle)."""
# 1. Apply any pending edge corrections from previous cycle
with self.correction_lock:
corrected_dets = self.apply_corrections(
local_detections, self.correction_buffer
)
self.correction_buffer.clear()
# 2. Plan on corrected detections
trajectory = self.frenet_planner.plan(corrected_dets, ego_state)
# 3. Safety filter (always on-vehicle, never delayed)
safe_trajectory = self.cbf_filter.apply(trajectory, corrected_dets)
return safe_trajectory
def edge_correction_cb(self, edge_result):
"""Async callback when edge results arrive (any time)."""
latency = self.get_age(edge_result)
if latency > 0.15: # 150ms - too stale
return
with self.correction_lock:
# Check if edge found something vehicle missed
new_objects = self.find_novel_detections(edge_result)
refined_objects = self.find_refined_detections(edge_result)
for obj in new_objects:
if obj.confidence > 0.5: # Edge-only requires higher conf
self.correction_buffer.append(
Correction(type='ADD', detection=obj)
)
for old, new in refined_objects:
self.correction_buffer.append(
Correction(type='REFINE', old=old, new=new)
)
# CRITICAL: Check if correction is urgent
# (new obstacle on current trajectory)
for obj in new_objects:
if self.is_on_current_trajectory(obj):
rospy.logwarn("Edge found obstacle on trajectory, "
"triggering immediate re-plan")
                    self.trigger_immediate_replan()
5.5 Pattern E: Cooperative Fleet Fusion
Vehicle A Vehicle B Vehicle C
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Backbone │ │ Backbone │ │ Backbone │
│ ↓ │ │ ↓ │ │ ↓ │
│ BEV feat │ │ BEV feat │ │ BEV feat │
│ + pose │ │ + pose │ │ + pose │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ 5G UL │ 5G UL │ 5G UL
│ (50-200 KB) │ (50-200 KB) │ (50-200 KB)
└──────────┬───────────┴───────────┬──────────┘
│ │
┌─────┴───────────────────────┴─────┐
│ EDGE SERVER │
│ │
│ 1. Receive all vehicle features │
│ 2. Ego-motion compensate │
│ (transform to common frame) │
│ 3. Where2comm attention fusion │
│ 4. Run detection/segmentation head │
│ 5. Generate fleet-level perception │
│ │
│ Processing: 20-40ms per cycle │
└─────┬───────────┬───────────┬──────┘
│ │ │
┌──────────┘ │ └──────────┐
│ 5G DL │ 5G DL │ 5G DL
│ (50-100 KB) │ (50-100 KB) │ (50-100 KB)
┌────┴─────┐ ┌────┴─────┐ ┌────┴─────┐
│ Vehicle A │ │ Vehicle B │ │ Vehicle C │
│ │ │ │ │ │
│ Fuse with│ │ Fuse with│ │ Fuse with│
│ local │ │ local │ │ local │
│ dets │ │ dets │ │ dets │
└──────────┘   └──────────┘   └──────────┘
This extends the V2V cooperative perception architecture (see collaborative-fleet-perception.md) by centralizing the fusion on the edge server rather than requiring each vehicle to fuse with every other vehicle peer-to-peer. The edge server is the Where2comm aggregation point.
Advantages over pure V2V:
- Each vehicle sends features once (to edge) instead of N-1 times (to every other vehicle)
- Edge has more compute for attention-based fusion than any single Orin
- Edge can maintain global consistency (no conflicting pairwise fusions)
- Edge can incorporate infrastructure sensors (CCTV, SMR) in the same fusion
Bandwidth efficiency: With Where2comm's learned attention masks, each vehicle sends only the informative regions of its BEV feature map -- typically 50-200 KB per frame instead of the full 2-8 MB. The edge fuses all vehicles and returns a fleet-level perception result that each vehicle merges with its local detections. Per the Where2comm results, this achieves 95.3% of full raw-data sharing AP at 1/64 bandwidth (see finding 127 in CLAUDE.md).
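A heavily simplified sketch of the edge-side fusion step, under the assumptions that each vehicle uploads a (C, H, W) BEV feature map plus its position in a shared apron frame, and that a plain element-wise max stands in for Where2comm's learned attention fusion (grid size and names are illustrative):

import numpy as np

GRID_RES_M = 0.5                 # metres per BEV cell
FLEET_GRID = (256, 400, 400)     # C x H x W shared apron grid (assumption)

def place_in_fleet_grid(bev, position_xy_m):
    """Paste one vehicle's BEV features into the shared grid at its position.
    Yaw alignment is omitted for brevity; a real system would warp by pose."""
    C, H, W = bev.shape
    canvas = np.zeros(FLEET_GRID, dtype=bev.dtype)
    cx = max(0, int(position_xy_m[0] / GRID_RES_M))
    cy = max(0, int(position_xy_m[1] / GRID_RES_M))
    h = min(H, FLEET_GRID[1] - cy)
    w = min(W, FLEET_GRID[2] - cx)
    canvas[:, cy:cy + h, cx:cx + w] = bev[:, :h, :w]
    return canvas

def fuse_fleet(features_and_positions):
    """Element-wise max across all vehicles' projected feature maps."""
    projected = [place_in_fleet_grid(f, p) for f, p in features_and_positions]
    return np.maximum.reduce(projected)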
5.6 Pattern Selection Guide
| Scenario | Recommended Pattern | Why |
|---|---|---|
| Normal operations (stand approach) | B + C + E | Feature offload + ensemble + cooperative fusion |
| Taxiway transit (low complexity) | B + E (reduced rate) | Feature offload at 5 Hz, cooperative at 2 Hz |
| Turnaround (maximum complexity) | B + C + E (maximum rate) | All patterns active, all models at max rate |
| Degraded network (>50ms RTT) | C only (reduced rate) | Ensemble augmentation tolerates latency |
| Minimal network (<1 Mbps) | None (vehicle-only) | Fall back to on-vehicle safety stack |
| Data collection (depot/charging) | A | Full raw offload for auto-labeling |
| New airport (shadow mode) | A + B + C | All patterns for maximum data collection |
6. Airport Edge Server Architecture
6.1 Hardware Sizing by Fleet Scale
The edge server must handle concurrent inference requests from all active vehicles. The critical sizing parameter is not total TOPS but rather the number of concurrent model instances that can run without queuing.
| Fleet Size | Active Vehicles (peak) | GPU Requirement | Recommended Config | Estimated Cost |
|---|---|---|---|---|
| 10-20 | 15 | 2-4x A100 80GB | NVIDIA DGX Station A100 (4x A100) | $35,000-50,000 |
| 20-50 | 35 | 4-8x A100 80GB | NVIDIA DGX A100 (8x A100) | $100,000-150,000 |
| 50-100 | 70 | 2x DGX A100 or DGX H100 | DGX H100 (8x H100 80GB) | $200,000-400,000 |
| 100-200 | 140 | DGX SuperPOD (partial) | 2-4x DGX H100 | $500,000-1,000,000 |
Detailed sizing for 20-vehicle fleet (reference airside AV stack near-term):
Workload Analysis (20 vehicles, peak):
────────────────────────────────────────────────────────────────
Model Per-Vehicle Frequency GPU ms/req Total GPU-ms/s
────────────────────────────────────────────────────────────────
PTv3 detection head 1 req 10 Hz 20ms 4,000
VLM (InternVL2-7B) 1 req 2 Hz 50ms 2,000
World model 1 req 5 Hz 40ms 4,000
Cooperative fusion 1 req 10 Hz 15ms 3,000
Map update 1 req 1 Hz 30ms 600
Place recognition 1 req 0.5 Hz 15ms 150
UQ ensemble (M=5) 1 req 2 Hz 80ms 3,200
────────────────────────────────────────────────────────────────
TOTAL GPU-ms/s: 16,950
Available GPU-ms/s per A100: ~850 effective (1,000 raw per GPU-second,
derated to ≈85% utilization for overhead and memory transfers)
GPUs needed: 16,950 / 850 ≈ 20 GPU-seconds/second
= 20 concurrent GPU-slots needed at peak
= 4x A100 at ~5x concurrent streams via TensorRT
(4 GPUs * 5 streams ≈ 20 effective slots)
Memory: 4x 80GB = 320 GB total HBM
- Model weights shared across streams: ~25 GB
- Per-stream activations: ~2 GB * 20 streams = 40 GB
- KV cache for VLM: ~8 GB
- Workspace: ~40 GB
Total: ~113 GB (fits in 320 GB with headroom)
RECOMMENDATION: 4x A100 80GB (DGX Station or custom build)
6.2 Software Stack
┌─────────────────────────────────────────────────────────┐
│ EDGE SERVER SOFTWARE STACK │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ FLEET MANAGEMENT LAYER │ │
│ │ - Vehicle registry and health monitoring │ │
│ │ - Request prioritization (turnaround > transit) │ │
│ │ - Load balancing across GPUs │ │
│ │ - Result caching and broadcast │ │
│ └───────────────────────────┬───────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴───────────────────────┐ │
│ │ NVIDIA TRITON INFERENCE SERVER │ │
│ │ - Model repository (TensorRT engines) │ │
│ │ - Dynamic batching (across vehicles) │ │
│ │ - Model versioning and A/B testing │ │
│ │ - GPU scheduling and resource isolation │ │
│ │ - Health checks and auto-restart │ │
│ │ - Prometheus metrics export │ │
│ └───────────────────────────┬───────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴───────────────────────┐ │
│ │ GPU RUNTIME │ │
│ │ - TensorRT 10.x engines (FP16/INT8) │ │
│ │ - CUDA 12.x + cuDNN 9.x │ │
│ │ - MPS (Multi-Process Service) for isolation │ │
│ │ - CUDA streams for concurrent execution │ │
│ └───────────────────────────┬───────────────────────┘ │
│ │ │
│ ┌───────────────────────────┴───────────────────────┐ │
│ │ INFRASTRUCTURE │ │
│ │ - Kubernetes (K3s for single-node, K8s for multi) │ │
│ │ - Container runtime: NVIDIA Container Toolkit │ │
│ │ - Storage: NVMe SSD for model repo + result cache │ │
│ │ - Networking: SR-IOV for direct NIC-to-container │ │
│ │ - Monitoring: Prometheus + Grafana │ │
│ │ - Logging: Loki for structured inference logs │ │
│ └───────────────────────────────────────────────────┘ │
│ │
│ OS: Ubuntu 22.04 LTS + NVIDIA Driver 550+ │
│ Hardware: DGX Station A100 or custom 4U server │
└─────────────────────────────────────────────────────────┘
NVIDIA Triton Inference Server is the centerpiece. It provides:
Dynamic batching: When multiple vehicles submit BEV features within a short window, Triton batches them into a single GPU call. This is the primary efficiency mechanism -- batching 4 vehicles' features into one PTv3 forward pass costs ~30ms instead of 4x20ms = 80ms sequential.
Model ensemble pipelines: Chain backbone → detection head → VLM head as a single request, minimizing data movement.
Concurrent model execution: Different models run on different CUDA streams. While the VLM processes vehicle A's request, the detection head processes vehicle B's features.
Model versioning: Deploy new model versions alongside existing ones. Route a subset of vehicles (canary) to the new version while monitoring accuracy.
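For reference, a minimal vehicle-side client call against the detection-head configuration shown in the next subsection, using the standard tritonclient gRPC API (the server address is illustrative):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="edge-mec.airport.local:8001")

# One vehicle's BEV features and ego pose (batch dimension of 1)
bev = np.zeros((1, 256, 200, 200), dtype=np.float16)
pose = np.eye(4, dtype=np.float32)[None]

inputs = [
    grpcclient.InferInput("bev_features", list(bev.shape), "FP16"),
    grpcclient.InferInput("ego_pose", list(pose.shape), "FP32"),
]
inputs[0].set_data_from_numpy(bev)
inputs[1].set_data_from_numpy(pose)

outputs = [grpcclient.InferRequestedOutput(name)
           for name in ("boxes_3d", "scores", "labels")]

# Triton dynamically batches this request with other vehicles' requests
result = client.infer("ptv3_detection_head", inputs, outputs=outputs)
boxes = result.as_numpy("boxes_3d")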
6.3 Dynamic Batching Configuration
# Triton model configuration for edge server
# File: model_repository/ptv3_detection_head/config.pbtxt
name: "ptv3_detection_head"
platform: "tensorrt_plan"
max_batch_size: 8 # Batch up to 8 vehicles' features
# Dynamic batching: wait up to 5ms to collect a batch
dynamic_batching {
preferred_batch_size: [4, 8]
max_queue_delay_microseconds: 5000
priority_levels: 3
default_priority_level: 2
# Priority 1: runway crossing vehicles
# Priority 2: stand approach vehicles
# Priority 3: taxiway transit vehicles
}
# Input: BEV feature map from vehicle backbone
input [
{
name: "bev_features"
data_type: TYPE_FP16
dims: [256, 200, 200] # C x H x W BEV grid
},
{
name: "ego_pose"
data_type: TYPE_FP32
dims: [4, 4] # 4x4 transformation matrix
}
]
# Output: 3D detections
output [
{
name: "boxes_3d"
data_type: TYPE_FP32
dims: [-1, 9] # N x (x,y,z,w,l,h,yaw,vel_x,vel_y)
},
{
name: "scores"
data_type: TYPE_FP32
dims: [-1]
},
{
name: "labels"
data_type: TYPE_INT32
dims: [-1]
}
]
instance_group [
{
count: 2 # 2 instances on GPU 0
kind: KIND_GPU
gpus: [0]
},
{
count: 2 # 2 instances on GPU 1
kind: KIND_GPU
gpus: [1]
}
]
6.4 Redundancy and Failover
The edge server is an enhancement, not a dependency. Its failure mode is simple: vehicles revert to on-vehicle-only operation. However, for availability, redundancy is still important because the edge provides significant safety enhancements (cooperative perception, VLM anomaly detection).
PRIMARY EDGE SERVER SECONDARY EDGE SERVER
┌─────────────────────┐ ┌─────────────────────┐
│ DGX Station A100 │ │ DGX Station A100 │
│ 4x A100 80GB │◄────────►│ 4x A100 80GB │
│ │ heartbeat │ │
│ Active │ (1 Hz) │ Hot standby │
│ - Serving requests │ │ - Models loaded │
│ - State replicated │ │ - Ready to serve │
│ to standby │ │ - No GPU active │
└─────────────────────┘ └─────────────────────┘
Failover scenarios:
1. Primary healthy: secondary idle, models pre-loaded in GPU memory
2. Primary degraded (GPU failure): secondary promotes to active (<5s)
3. Primary down: secondary active, vehicles experience 1-5s edge gap
(during gap: vehicle operates fully autonomously — transparent)
4. Both down: all vehicles fully autonomous, alert sent to ops center
For cost-sensitive deployments, the secondary can be a smaller server (2x A100) that handles reduced models (cooperative perception + detection only, no VLM or world model) during failover. This halves the redundancy cost while maintaining the most safety-relevant edge functions.
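A minimal sketch of the heartbeat check driving that failover, assuming the standby is promoted through an external orchestration hook and that the monitor polls Triton's standard /v2/health/ready endpoint (host name and thresholds are illustrative):

import time
import requests

PRIMARY_HEALTH_URL = "http://edge-primary:8000/v2/health/ready"
HEARTBEAT_PERIOD_S = 1.0
MISSED_BEFORE_FAILOVER = 3   # ~3 s of missed heartbeats triggers promotion

def monitor_primary(promote_secondary):
    """Poll the primary edge server; promote the hot standby if it disappears."""
    missed = 0
    while True:
        try:
            ok = requests.get(PRIMARY_HEALTH_URL, timeout=0.5).status_code == 200
        except requests.RequestException:
            ok = False
        missed = 0 if ok else missed + 1
        if missed >= MISSED_BEFORE_FAILOVER:
            promote_secondary()   # standby starts serving (<5 s end-to-end target)
            return
        time.sleep(HEARTBEAT_PERIOD_S)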
6.5 Physical Deployment
The edge server is physically co-located with the airport's 5G MEC infrastructure to minimize network hops:
Typical Airport Network Topology:
Airport Operations Center
│
│ Fiber (1-10 Gbps)
│
┌─────────┴──────────┐
│ Airport Data Center│
│ (Terminal Building) │
│ │
│ ┌───────────────┐ │
│ │ 5G Core (UPF) │ │ UPF = User Plane Function
│ └───────┬───────┘ │ (where data plane terminates)
│ │ │
│ ┌───────┴───────┐ │
│ │ EDGE SERVER │ │ ← Co-located with UPF
│ │ (DGX Station) │ │ 1 hop from radio
│ └───────────────┘ │ <2ms additional latency
│ │
└──────────┬──────────┘
│ Fiber
┌──────────┴──────────┐
│ 5G Radio Units │
│ (gNodeB / small │
│ cells on apron) │
└──────────┬──────────┘
│ 5G NR (air interface)
┌──────────┴──────────┐
│ Vehicle 5G Modem │
│ (Cradlepoint/Sierra │
│ Wireless) │
└─────────────────────┘
Network path: Vehicle → 5G air (1-3ms) → gNodeB → Fiber → UPF (1ms) → Edge Server
Total: 2-4ms one-way network latency
Environmental requirements:
- Power: 2-6 kW per DGX Station (dedicated 30A/240V circuit)
- Cooling: 2-6 kW heat dissipation (airport data centers typically have this)
- Physical security: locked rack in controlled-access data center
- UPS: minimum 30 minutes backup (allows vehicles to transition to autonomous mode)
- Network: dual 25/100 GbE uplinks to 5G UPF
7. Graceful Degradation When Network Fails
7.1 Network Failure Is Expected
Network failure on an airport apron is not exceptional -- it is a routine operating condition:
| Failure Mode | Frequency | Duration | Cause |
|---|---|---|---|
| Coverage gap | Daily | 5-30s | Vehicle enters shadowed area behind hangar |
| Handover hiccup | Hourly | 50-200ms | Vehicle transitions between 5G cells |
| Congestion spike | Peak hours | 1-30s | Many vehicles near same stand during turnaround |
| Weather interference | Seasonal | Minutes-hours | Heavy rain attenuates mmWave; sub-6 more resistant |
| Planned maintenance | Monthly | 30-120 min | 5G equipment firmware updates |
| Equipment failure | Rare | Hours | gNodeB or switch failure |
| Construction | During works | Days | Terminal construction alters RF environment |
| RF interference | Unpredictable | Seconds-minutes | Radar, ILS, other airport RF sources |
The system must handle all of these without any discontinuity in safe vehicle operation.
7.2 Degradation Levels
NETWORK STATE MACHINE:
┌────────────────────────────────────────────────────────────────────┐
│ │
│ FULL DEGRADED MINIMAL OFFLINE │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│ │ RTT │ │ RTT │ │ RTT │ │ No │ │
│ │<20ms │──────>│20-100│─────────>│>100ms│────────>│ conn.│ │
│ │>10Mb │ degr. │ ms │ further │<1Mbps│ total │ │ │
│ │ │ │1-10 │ degrad. │ │ loss │ │ │
│ │ │<──────│ Mbps │<─────────│ │<────────│ │ │
│ └──────┘ recov └──────┘ recovery └──────┘ recovery└──────┘ │
│ │
│ Recovery requires sustained good conditions for 5+ seconds │
│ (asymmetric transition: fast degradation, slow recovery) │
└────────────────────────────────────────────────────────────────────┘
7.3 Capability Matrix by Network State
| Capability | FULL | DEGRADED | MINIMAL | OFFLINE |
|---|---|---|---|---|
| On-vehicle safety stack | Full (10 Hz) | Full (10 Hz) | Full (10 Hz) | Full (10 Hz) |
| Edge detection enhancement | Full (10 Hz) | Reduced (5 Hz) | Off | Off |
| VLM co-pilot | Full (2 Hz) | Reduced (0.5 Hz) | Off | Off |
| World model prediction | Full (5 Hz) | Reduced (2 Hz) | Off | Off |
| Cooperative perception | Full (10 Hz) | Reduced (5 Hz) | Safety msgs only | Off |
| Map updates | Full (1 Hz) | Reduced (0.2 Hz) | Off | Off |
| Fleet state | Full (1 Hz) | Reduced (0.2 Hz) | Off | Off |
| Auto-labeling | Background | Off | Off | Off |
| V2X safety messages | Full | Full | Full (prioritized) | PC5 sidelink only |
| Max speed | 25 km/h | 20 km/h | 15 km/h | 10 km/h |
| Safety margins | Standard | +20% | +50% | +100% (doubled) |
| Teleop available | Yes | Yes | Voice-only | No |
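The operational rows of this matrix (speed and margins) can be encoded as a single lookup consumed by the behaviour layer. The sketch below mirrors the table values, which are illustrative defaults rather than certified limits:

DEGRADATION_POLICY = {
    "FULL":     {"max_speed_kmh": 25, "margin_scale": 1.0},
    "DEGRADED": {"max_speed_kmh": 20, "margin_scale": 1.2},
    "MINIMAL":  {"max_speed_kmh": 15, "margin_scale": 1.5},
    "OFFLINE":  {"max_speed_kmh": 10, "margin_scale": 2.0},
}

def limits_for(network_state: str, base_margin_m: float) -> dict:
    """Return the speed cap and inflated safety margin for a network state."""
    policy = DEGRADATION_POLICY[network_state]
    return {
        "max_speed_kmh": policy["max_speed_kmh"],
        "safety_margin_m": base_margin_m * policy["margin_scale"],
    }

# Example: OFFLINE doubles a 0.5 m margin and caps speed at 10 km/h
print(limits_for("OFFLINE", base_margin_m=0.5))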
7.4 The Simplex Analogy
The edge-cloud architecture mirrors the Simplex fault-tolerance pattern already used in the reference airside AV stack's planning stack:
Simplex Pattern for Planning:
Advanced Controller (AC): Neural planner (better performance)
Baseline Controller (BC): Frenet planner (proven safe)
Decision Module (DM): CBF safety filter decides which to use
Edge-Cloud Simplex Pattern:
Advanced Controller (AC): Edge-enhanced perception (better accuracy)
Baseline Controller (BC): On-vehicle perception (proven safe)
Decision Module (DM): Freshness + confidence check decides which to use
┌─────────────────────┐
│ DECISION MODULE │
│ │
│ if edge_result is │
│ fresh (<200ms) │
│ AND consistent │
│ with local: │
│ → use FUSED │
│ else: │
│ → use LOCAL ONLY │
│ │
└──────┬──────────────┘
│
┌────────────┴────────────┐
│ │
┌────────┴────────┐ ┌────────┴────────┐
│ AC: Edge-Enhanced│ │ BC: On-Vehicle │
│ Perception │ │ Perception │
│ │ │ │
│ • PTv3 accuracy │ │ • PointPillars │
│ • VLM reasoning │ │ • FlatFormer │
│ • Fleet fusion │ │ • nvblox │
│ • World model │ │ • Basic tracking │
│ │ │ │
│ Performance: ★★★★★│ │ Performance: ★★★ │
│ Availability: 95% │ │ Availability: 100%│
└──────────────────┘  └──────────────────┘
7.5 Seamless Transition Implementation
The transition between network states must be invisible to the planning and control layers. They always receive detections from the same topic -- the fusion node handles the switching internally.
from enum import IntEnum

import rospy


class NetworkState(IntEnum):
    """Network states used below; a larger value means a more degraded state."""
    FULL = 0
    DEGRADED = 1
    MINIMAL = 2
    OFFLINE = 3


class GracefulDegradationNode:
"""
Monitors network health and adjusts edge utilization.
Provides seamless perception output regardless of network state.
"""
# Network state thresholds
FULL_RTT = 0.020 # <20ms
DEGRADED_RTT = 0.100 # <100ms
MINIMAL_BW = 1_000_000 # 1 Mbps
# Recovery hysteresis
RECOVERY_HOLD = 5.0 # 5 seconds of good before upgrading state
    def __init__(self):
        self.state = NetworkState.FULL
        self.last_edge_result_time = rospy.Time.now()
        self.rtt_ewma = 0.010   # Exponential weighted moving average
        self.rtt_alpha = 0.3    # Smoothing factor
        self.recovery_timer = 0.0
        self.dt = 0.1           # Update period: called at ~10 Hz
# Cache of last known good edge results
self.cached_edge_detections = None
self.cached_cooperative_map = None
self.cached_vlm_description = None
self.cached_world_prediction = None
def update_network_state(self, measured_rtt, measured_bw):
"""Called on each edge response (or timeout)."""
self.rtt_ewma = (self.rtt_alpha * measured_rtt +
(1 - self.rtt_alpha) * self.rtt_ewma)
new_state = self._classify_state(self.rtt_ewma, measured_bw)
# Fast degradation, slow recovery
if new_state.value > self.state.value:
# Degrading: switch immediately
self.state = new_state
self.recovery_timer = 0.0
self._adjust_edge_requests()
rospy.logwarn(f"Network degraded to {self.state.name}")
elif new_state.value < self.state.value:
# Recovering: require sustained good conditions
self.recovery_timer += self.dt
if self.recovery_timer >= self.RECOVERY_HOLD:
self.state = new_state
self.recovery_timer = 0.0
self._adjust_edge_requests()
rospy.loginfo(f"Network recovered to {self.state.name}")
def get_fused_perception(self, local_detections):
"""
Returns best available perception regardless of network state.
The planner/controller never knows (or cares) about network state.
"""
if self.state == NetworkState.FULL:
return self._fuse_local_and_edge(
local_detections,
self.cached_edge_detections,
max_age=0.2
)
elif self.state == NetworkState.DEGRADED:
return self._fuse_local_and_edge(
local_detections,
self.cached_edge_detections,
max_age=0.5 # Accept slightly staler edge results
)
else:
# MINIMAL or OFFLINE: local only
return local_detections
def _adjust_edge_requests(self):
"""Reduce edge request rate based on network state."""
rates = {
NetworkState.FULL: {'det': 10, 'vlm': 2, 'wm': 5, 'coop': 10},
NetworkState.DEGRADED: {'det': 5, 'vlm': 0.5, 'wm': 2, 'coop': 5},
NetworkState.MINIMAL: {'det': 0, 'vlm': 0, 'wm': 0, 'coop': 0},
NetworkState.OFFLINE: {'det': 0, 'vlm': 0, 'wm': 0, 'coop': 0},
}
for model, rate in rates[self.state].items():
            self.set_edge_request_rate(model, rate)
7.6 Edge Result Staleness Management
When the network degrades, the vehicle may still have recent edge results that remain valid. The validity window depends on the type of result and the vehicle's speed:
| Edge Result Type | Freshness Window | Rationale |
|---|---|---|
| Detection enhancement | 100-200ms | Objects move; stale detections may mislocate |
| VLM scene description | 2-5s | Scene semantics change slowly |
| World model prediction | 500ms-2s | Predictions are inherently future-looking |
| Cooperative perception | 100-200ms | Other vehicles move |
| Map updates | 30-60s | Map changes are slow |
| Fleet state | 5-10s | Fleet-level coordination is coarse-grained |
| Neural map prior | Minutes-hours | Map priors are quasi-static |
At 10 km/h (typical apron speed), a vehicle moves 2.8 m/s. A 200ms-stale detection is off by ~0.56m -- within the safety margin for most obstacles but problematic for precision docking. The staleness window should be proportional to the required positional accuracy.
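A minimal sketch of that proportionality, using a hypothetical helper name and illustrative tolerances (not values from the stack's configuration):
def max_edge_staleness_s(speed_mps, position_tolerance_m):
    """Longest an edge detection may be stale before the worst-case positional
    error (speed * staleness) exceeds the maneuver's tolerance."""
    if speed_mps <= 0.0:
        return float('inf')  # stationary: staleness adds no positional error
    return position_tolerance_m / speed_mps

# Apron driving at 10 km/h (2.8 m/s) with a 0.5 m obstacle margin -> ~180 ms
print(max_edge_staleness_s(2.8, 0.5))    # 0.178...
# Precision docking at 2 km/h (0.56 m/s) with 5 cm tolerance -> ~90 ms
print(max_edge_staleness_s(0.56, 0.05))  # 0.089...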
8. Security and Privacy
8.1 Threat Model
Data transmitted over the airport wireless network faces several threat categories:
THREAT MODEL FOR EDGE-CLOUD INFERENCE:
┌──────────────────────────────────────────────────────────────┐
│ THREAT 1: Eavesdropping │
│ Attacker intercepts sensor data or model outputs │
│ Risk: Exposure of airport layout, operational patterns │
│ Mitigation: TLS 1.3 for all vehicle-edge communication │
├──────────────────────────────────────────────────────────────┤
│ THREAT 2: Man-in-the-middle (injection) │
│ Attacker injects false edge results (phantom detections) │
│ Risk: Vehicle stops unnecessarily or ignores real obstacles │
│ Mitigation: Mutual TLS (mTLS) + signed inference results │
│ + consistency check with on-vehicle detections │
├──────────────────────────────────────────────────────────────┤
│ THREAT 3: Denial of service │
│ Attacker floods 5G network, preventing edge communication │
│ Risk: Loss of edge enhancement (vehicles go autonomous) │
│ Mitigation: 5G network slicing + graceful degradation │
│ (vehicles are safe without edge — by design) │
├──────────────────────────────────────────────────────────────┤
│ THREAT 4: Model extraction │
│ Attacker probes edge server to extract model weights │
│ Risk: IP theft, competitive intelligence │
│ Mitigation: Triton access control, API-only access │
│ (no direct GPU access), rate limiting │
├──────────────────────────────────────────────────────────────┤
│ THREAT 5: Data poisoning via vehicle │
│ Compromised vehicle sends corrupted features to edge │
│ Risk: Degrades cooperative perception for all vehicles │
│ Mitigation: Per-vehicle authentication, anomaly detection │
│ on input features, Byzantine-robust fusion │
│ (see federated-learning-fleet-scale.md FLTrust) │
├──────────────────────────────────────────────────────────────┤
│ THREAT 6: Multi-tenant data leakage │
│ Edge server serves multiple airlines/handlers at same airport │
│ Risk: Cross-tenant data exposure │
│ Mitigation: Kubernetes namespace isolation, separate model │
│ instances per tenant, no shared caches │
└──────────────────────────────────────────────────────────────┘
8.2 Encryption and Authentication
All vehicle-to-edge communication runs over mTLS (mutual Transport Layer Security):
Vehicle Edge Server
┌──────────┐ ┌──────────┐
│ Vehicle │ │ Server │
│ cert (X.509) │ cert (X.509)
│ signed by │ │ signed by │
│ fleet CA │ │ fleet CA │
│ │ ──── TLS 1.3 ──── │ │
│ Verifies │ ClientHello + │ Verifies │
│ server │ CertificateVerify │ vehicle │
│ identity │ │ identity │
└──────────┘ └──────────┘
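A minimal sketch of how the vehicle-side client could open this mTLS connection, assuming the ssl options exposed by tritonclient.grpc.InferenceServerClient and illustrative certificate paths (the actual paths and provisioning flow are deployment-specific):
import tritonclient.grpc as triton_grpc

# Illustrative paths; certificates are issued by the fleet CA and stored on
# the Orin's secure element / TPM-backed filesystem.
CA_CERT = '/etc/fleet-pki/fleet-ca.pem'
VEHICLE_KEY = '/etc/fleet-pki/vehicle.key'
VEHICLE_CERT = '/etc/fleet-pki/vehicle.pem'

def make_mtls_triton_client(edge_url='edge-server.local:8001'):
    """Triton gRPC client that authenticates both sides (mutual TLS)."""
    return triton_grpc.InferenceServerClient(
        url=edge_url,
        ssl=True,                       # TLS negotiated by the gRPC channel
        root_certificates=CA_CERT,      # verify the edge server's certificate
        private_key=VEHICLE_KEY,        # present the vehicle's identity
        certificate_chain=VEHICLE_CERT,
    )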
Certificate management:
- Fleet CA: reference airside AV stack-operated, issues certs to vehicles and edge servers
- Vehicle cert: Stored in Orin's secure element (if available) or TPM
- Rotation: Certificates rotate every 90 days via OTA
- Revocation: Compromised vehicle cert revoked immediately; vehicle
  continues operating autonomously but cannot access edge
8.3 Model IP Protection
Model weights on the edge server represent significant IP. Protection strategies:
No weight download: Vehicles send features and receive results. They never receive model weights. The edge exposes an inference API only.
Triton access control: Each vehicle has a unique API key. Requests are rate-limited (no more than 10 Hz per model per vehicle), and anomalous request patterns trigger alerts. A token-bucket sketch of this limiting follows this list.
Obfuscated outputs: The edge returns final detections (boxes, scores, labels) rather than intermediate representations. Final outputs reveal far less about the model architecture than feature maps, making extraction attacks substantially harder.
Physical security: Edge server is in a locked airport data center with badge access, CCTV, and tamper detection.
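A minimal token-bucket sketch of the per-vehicle, per-model rate limiting an edge-side gateway could apply before requests reach Triton (class name and limits are illustrative):
import time
from collections import defaultdict

class PerVehicleRateLimiter:
    """Token bucket per (vehicle, model): at most rate_hz sustained requests."""
    def __init__(self, rate_hz=10.0, burst=5.0):
        self.rate_hz = rate_hz
        self.burst = burst
        self.tokens = defaultdict(lambda: burst)   # keyed by (vehicle_id, model)
        self.last = defaultdict(time.monotonic)

    def allow(self, vehicle_id, model):
        key = (vehicle_id, model)
        now = time.monotonic()
        elapsed = now - self.last[key]
        self.last[key] = now
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens[key] = min(self.burst,
                               self.tokens[key] + elapsed * self.rate_hz)
        if self.tokens[key] >= 1.0:
            self.tokens[key] -= 1.0
            return True
        return False  # rejected -- also a useful signal for anomaly detection

# Example: gate each incoming request with
#   limiter.allow(request_vehicle_id, request_model_name)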
8.4 Multi-Tenant Isolation
Large airports serve multiple ground handling companies. If the edge server is shared:
Edge Server Multi-Tenancy:
┌────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Namespace: │ │ Namespace: │ │
│ │ airside_av │ │ handler-b │ │
│ │ │ │ │ │
│ │ Triton (GPU0,│ │ Triton (GPU2,│ │
│ │ GPU1)│ │ GPU3)│ │
│ │ │ │ │ │
│ │ Models: v2.3 │ │ Models: v1.8 │ │
│ │ Vehicles: │ │ Vehicles: │ │
│ │ third-generation tug-001 │ │ TRACTOR-X │ │
│ │ third-generation tug-002 │ │ TRACTOR-Y │ │
│ │ ... │ │ ... │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ GPU isolation: MIG (Multi-Instance GPU) or │
│ time-sharing with strict scheduling │
│ Network isolation: Calico network policies │
│ Storage isolation: Separate PVCs │
│ No shared state between namespaces │
└────────────────────────────────────────────────┘
8.5 Regulatory Compliance
| Regulation | Requirement | Edge-Cloud Implication |
|---|---|---|
| GDPR (EU) | Data minimization, purpose limitation | Feature maps (not raw camera) minimize personal data; edge processes and discards |
| EU AI Act | High-risk AI system transparency | Audit logs of every edge inference decision |
| Airport security (ICAO Annex 17) | Screening of all equipment entering restricted area | Edge server undergoes airport security assessment |
| NIS2 Directive (EU) | Cybersecurity for critical infrastructure | Edge server treated as critical infrastructure element |
| Data sovereignty (varies) | Data may not leave country/airport | Edge server physically at airport; cloud in same jurisdiction |
| FAA CertAlert 24-02 | Autonomous vehicle safety requirements | Edge enhancement documented in safety case as non-safety-critical |
9. Cost-Benefit Analysis
9.1 Edge Server vs. Vehicle Upgrade
The fundamental economic question: is it cheaper to add a shared edge server or upgrade every vehicle's on-board compute?
Option A: Edge Server (shared infrastructure)
| Component | 20-Vehicle Fleet | 50-Vehicle Fleet | 100-Vehicle Fleet |
|---|---|---|---|
| Edge hardware (4x A100) | $50,000 | $120,000 | $300,000 |
| Redundancy (secondary) | $30,000 | $70,000 | $200,000 |
| 5G vehicle modems ($500 each) | $10,000 | $25,000 | $50,000 |
| Installation + integration | $15,000 | $25,000 | $40,000 |
| Software development | $40,000 | $40,000 | $40,000 |
| Annual maintenance (HW + SW) | $15,000 | $25,000 | $50,000 |
| Annual power + cooling | $3,000 | $7,000 | $15,000 |
| Total Year 1 | $163,000 | $312,000 | $695,000 |
| Per vehicle Year 1 | $8,150 | $6,240 | $6,950 |
| Annual ongoing per vehicle | $900 | $640 | $650 |
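The per-vehicle figures follow directly from the line items above; a quick sanity check in Python, assuming (as in the table) that only maintenance and power/cooling recur in years 2-5:
fleets = {
    20:  dict(hw=50_000,  redundancy=30_000,  modems=10_000, install=15_000,
              software=40_000, maint=15_000, power=3_000),
    50:  dict(hw=120_000, redundancy=70_000,  modems=25_000, install=25_000,
              software=40_000, maint=25_000, power=7_000),
    100: dict(hw=300_000, redundancy=200_000, modems=50_000, install=40_000,
              software=40_000, maint=50_000, power=15_000),
}
for n, c in fleets.items():
    year1 = sum(c.values())
    ongoing = c['maint'] + c['power']          # recurring cost, years 2-5
    print(f"{n} vehicles: Y1 ${year1 / n:,.0f}/vehicle, "
          f"5-yr TCO ${(year1 + 4 * ongoing) / n:,.0f}/vehicle")
# 20 vehicles: Y1 $8,150/vehicle, 5-yr TCO $11,750/vehicle
# 50 vehicles: Y1 $6,240/vehicle, 5-yr TCO $8,800/vehicle
# 100 vehicles: Y1 $6,950/vehicle, 5-yr TCO $9,550/vehicle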
Option B: On-Vehicle Upgrade (per-vehicle compute)
| Component | Per Vehicle | 20 Vehicles | 50 Vehicles | 100 Vehicles |
|---|---|---|---|---|
| Orin → Thor module | $2,000-5,000 | $40K-100K | $100K-250K | $200K-500K |
| Carrier board redesign | $500-1,500 | $10K-30K | $25K-75K | $50K-150K |
| Thermal redesign | $300-800 | $6K-16K | $15K-40K | $30K-80K |
| Power supply upgrade (60W→130W) | $200-500 | $4K-10K | $10K-25K | $20K-50K |
| Integration + testing per vehicle | $1,000-2,000 | $20K-40K | $50K-100K | $100K-200K |
| Software port (Ampere→Blackwell) | $20,000 (one-time) | $20K | $20K | $20K |
| Total | $4K-10K | $100K-216K | $220K-510K | $420K-1M |
| Per vehicle | $4K-10K | $5K-10.8K | $4.4K-10.2K | $4.2K-10K |
Key comparison:
COST PER VEHICLE TO ACHIEVE EQUIVALENT CAPABILITY:
Edge Server Vehicle Upgrade (Thor)
────────── ──────────────────────
20 vehicles: $8,150 Y1 $5,000-10,800
50 vehicles: $6,240 Y1 $4,400-10,200
100 vehicles: $6,950 Y1 $4,200-10,000
Year 2+: $900/vehicle/yr $0 (hardware paid)
5-YEAR TCO PER VEHICLE:
20 vehicles: $11,750 $5,000-10,800
50 vehicles: $8,800 $4,400-10,200
  100 vehicles:   $9,550                $4,200-10,000
At first glance, per-vehicle upgrade appears cheaper over 5 years. However, the edge approach offers advantages not captured in raw cost:
9.2 Non-Monetary Advantages of Edge
| Factor | Edge Server | Vehicle Upgrade |
|---|---|---|
| Time to deploy | 4-6 weeks (one server) | 6-12 months (retrofit fleet) |
| Model update speed | Minutes (update edge server) | Weeks (OTA to fleet) |
| Model size limit | 80 GB HBM (A100) per model | 64 GB unified (Orin/Thor shared) |
| Fleet-wide consistency | All vehicles get same model version instantly | Staggered rollout, version fragmentation |
| Cooperative perception | Natural (all features at edge) | Requires peer-to-peer V2V (complex) |
| Experimentation velocity | Run A/B tests on edge instantly | Each experiment requires OTA cycle |
| Hardware refresh | Replace 1 server, fleet benefits | Replace every vehicle's compute |
| Power on vehicle | No additional vehicle power draw | +70W per vehicle (Thor vs Orin) |
| Vehicle complexity | No vehicle HW changes | Carrier board, thermal, power redesign |
| VLM capability | 7B+ parameter models feasible | 2B max on Orin, ~4B on Thor |
9.3 The Hybrid Answer
The optimal strategy is not either/or but both:
Today (Orin fleet): Deploy edge server to unlock VLM, world model, cooperative perception capabilities that Orin cannot run locally. Cost: $50-150K for the first airport.
Future (Thor upgrade cycle): When vehicles naturally reach hardware refresh (3-5 year cycle), upgrade to Thor. Thor handles more on-vehicle but edge still provides fleet fusion, VLM 7B+, and faster experimentation.
Multi-airport scale: Edge infrastructure at each airport, cloud for cross-airport training and federation. Edge cost amortizes as fleet grows.
9.4 ROI Calculation
SCENARIO: 20-vehicle fleet, first airport
COSTS (Year 1):
Edge server (4x A100 + redundancy): $80,000
5G modems (20 vehicles): $10,000
Integration + software: $55,000
Annual maintenance: $18,000
─────────────────────────────────────────────────────
Total Year 1: $163,000
BENEFITS (Year 1):
VLM anomaly detection:
Prevents 2-3 incidents/year at $50K avg. $100,000-150,000
Cooperative perception:
+18-22% detection AP → fewer safety events
Prevents 1-2 near-misses/year $50,000-100,000
World model prediction:
Smoother planning → 5-10% efficiency gain
20 vehicles * 16hr/day * 5% more missions $75,000-150,000
Faster model iteration:
Edge A/B testing saves 2-4 weeks/quarter
Engineering time saved $40,000-80,000
─────────────────────────────────────────────────────
Total Year 1 benefit: $265,000-480,000
NET YEAR 1: $102,000-317,000 positive
PAYBACK PERIOD: 4-8 months
9.5 Multi-Airport Edge Economics
| Airport # | Edge Server Cost | Incremental Vehicles | Per-Vehicle Edge Cost (Y1) |
|---|---|---|---|
| 1st (hub) | $163,000 | 20 | $8,150 |
| 2nd (hub) | $130,000 (copy playbook) | 30 | $4,333 |
| 3rd (regional) | $80,000 (smaller server) | 10 | $8,000 |
| 4th (regional) | $80,000 | 10 | $8,000 |
| 5th+ | $60,000 (standardized) | 10-20 | $3,000-6,000 |
Edge software and model management is developed once and deployed to each airport. Subsequent airports benefit from the first airport's development investment.
10. Integration with Existing reference airside AV stack Systems
10.1 ROS Noetic Integration Architecture
The reference airside AV stack runs ROS Noetic. The edge server does not run ROS -- it runs NVIDIA Triton. The bridge between them uses a lightweight transport layer.
ON-VEHICLE (ROS Noetic) EDGE SERVER (Non-ROS)
┌─────────────────────────┐ ┌─────────────────────┐
│ │ │ │
│ /lidar/merged │ │ NVIDIA Triton │
│ ↓ │ │ Inference Server │
│ /perception/backbone │ │ │
│ ↓ │ │ gRPC endpoint: │
│ /edge/bev_features │ gRPC/HTTP2 │ edge-server:8001 │
│ ↓ │ ─────────────> │ │
│ edge_client_node │ │ Process request │
│ (ROS node that sends │ │ Return results │
│ features via gRPC) │ │ │
│ │ gRPC/HTTP2 │ │
│ edge_result_node │ <───────────── │ │
│ (ROS node that receives │ │ │
│ and publishes results) │ │ │
│ ↓ │ │ │
│ /edge/detections_3d │ │ │
│ /edge/vlm_description │ │ │
│ /edge/world_prediction │ │ │
│ ↓ │ │ │
│ /perception/fused_dets │ │ │
│ │ │ │
└─────────────────────────┘          └─────────────────────┘
Why not rosbridge_server? Rosbridge (WebSocket-based) adds 2-5ms serialization overhead and does not support binary data efficiently. For high-throughput, low-latency communication, a direct gRPC client is superior. The Triton client library (tritonclient) provides both gRPC and HTTP interfaces with native numpy/tensor support.
10.2 Edge Client ROS Node
#!/usr/bin/env python3
"""
edge_inference_client.py - ROS node for edge server communication.
Subscribes to on-vehicle BEV features, sends to edge Triton server,
publishes results back to ROS topics.
Dependencies: tritonclient[grpc], rospy, numpy, lz4
"""
import rospy
import numpy as np
from concurrent.futures import ThreadPoolExecutor
import time
import lz4.frame
import tritonclient.grpc as triton_grpc
from sensor_msgs.msg import PointCloud2
from std_msgs.msg import String
from geometry_msgs.msg import PoseStamped
# Custom message types (would be defined in edge_msgs package)
# from edge_msgs.msg import BEVFeatureMap, Detection3DArray, VLMResult
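# NOTE: serialization helpers (_msg_to_numpy, _publish_detections,
# _publish_vlm) are omitted here for brevity; they convert between ROS
# messages and the numpy tensors Triton expects (applying the LZ4
# compression implied by the lz4 dependency above).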
class EdgeInferenceClient:
"""
Asynchronous edge inference client.
Sends features to Triton, receives results without blocking
the on-vehicle perception pipeline.
"""
def __init__(self):
rospy.init_node('edge_inference_client')
# Parameters
self.edge_url = rospy.get_param(
'~edge_server_url', 'edge-server.local:8001'
)
self.vehicle_id = rospy.get_param('~vehicle_id', 'third-generation tug-001')
self.enable_vlm = rospy.get_param('~enable_vlm', True)
self.enable_world_model = rospy.get_param('~enable_world_model', True)
# Triton client (gRPC, async)
self.triton_client = triton_grpc.InferenceServerClient(
url=self.edge_url,
verbose=False
)
# Check server health
if not self.triton_client.is_server_live():
rospy.logwarn("Edge server not reachable, starting in "
"offline mode")
self.connected = False
else:
self.connected = True
rospy.loginfo(f"Connected to edge server at {self.edge_url}")
# Subscribers
self.bev_sub = rospy.Subscriber(
'/perception/bev_features', PointCloud2,
self.bev_feature_cb, queue_size=1
)
self.pose_sub = rospy.Subscriber(
'/localization/pose', PoseStamped,
self.pose_cb, queue_size=1
)
# Publishers (edge results)
self.edge_det_pub = rospy.Publisher(
'/edge/detections_3d', PointCloud2, queue_size=1
)
self.edge_vlm_pub = rospy.Publisher(
'/edge/vlm_description', String, queue_size=1
)
# Latency tracking
self.rtt_history = []
self.rtt_pub = rospy.Publisher(
'/edge/rtt_ms', String, queue_size=1
)
# Async inference thread pool
        self.executor = ThreadPoolExecutor(max_workers=4)
# Health monitoring
self.health_timer = rospy.Timer(
rospy.Duration(1.0), self.health_check_cb
)
def bev_feature_cb(self, msg):
"""Non-blocking: submit edge inference request."""
if not self.connected:
return
# Submit async (does not block this callback)
self.executor.submit(self._run_edge_inference, msg)
def _run_edge_inference(self, feature_msg):
"""Run on thread pool. Sends to Triton, publishes result."""
try:
t_start = time.monotonic()
# Prepare Triton input
features_np = self._msg_to_numpy(feature_msg)
input_tensors = [
triton_grpc.InferInput(
'bev_features', features_np.shape, 'FP16'
),
]
input_tensors[0].set_data_from_numpy(features_np)
# Request detection + VLM + world model in one call
# (Triton ensemble pipeline handles routing)
output_names = ['boxes_3d', 'scores', 'labels']
if self.enable_vlm:
output_names.append('vlm_text')
outputs = [
triton_grpc.InferRequestedOutput(name)
for name in output_names
]
result = self.triton_client.infer(
model_name='airside_perception_ensemble',
inputs=input_tensors,
outputs=outputs,
client_timeout=0.1, # 100ms timeout
headers={
'x-vehicle-id': self.vehicle_id,
'x-timestamp': str(feature_msg.header.stamp.to_sec())
}
)
rtt = (time.monotonic() - t_start) * 1000
self.rtt_history.append(rtt)
# Publish results to ROS topics
self._publish_detections(result, feature_msg.header)
if self.enable_vlm and 'vlm_text' in output_names:
self._publish_vlm(result)
# Publish RTT for monitoring
self.rtt_pub.publish(String(data=f"{rtt:.1f}"))
except Exception as e:
rospy.logwarn_throttle(5.0,
f"Edge inference failed: {e}. Vehicle continues "
f"autonomously."
)
def health_check_cb(self, event):
"""Periodic edge server health check."""
try:
is_live = self.triton_client.is_server_live()
if is_live and not self.connected:
rospy.loginfo("Edge server connection restored")
self.connected = True
elif not is_live and self.connected:
rospy.logwarn("Edge server connection lost")
self.connected = False
except Exception:
if self.connected:
rospy.logwarn("Edge server health check failed")
                self.connected = False

if __name__ == '__main__':
    EdgeInferenceClient()
    rospy.spin()

10.3 Latency Monitoring Dashboard
Every edge inference request is instrumented with timestamps at each stage. A Prometheus metrics endpoint on the edge client node exposes the following series (a sketch of exporting them follows the dashboard table below):
# Prometheus metrics exported by edge_inference_client
# RTT histogram (ms)
edge_inference_rtt_ms{vehicle="third-generation tug-001", model="ptv3_detection"}
# Request rate (req/s)
edge_inference_requests_total{vehicle="third-generation tug-001", model="ptv3_detection"}
# Error rate
edge_inference_errors_total{vehicle="third-generation tug-001", error_type="timeout"}
# Network state
edge_network_state{vehicle="third-generation tug-001"} # 0=offline, 1=minimal, 2=degraded, 3=full
# Queue depth at edge server
edge_server_queue_depth{model="ptv3_detection"}
# GPU utilization on edge
edge_gpu_utilization{gpu="0"}
Grafana dashboard panels:
| Panel | Metric | Alert Threshold |
|---|---|---|
| P50/P95/P99 RTT per vehicle | edge_inference_rtt_ms | P99 > 100ms: warn |
| Request success rate | 1 - errors/requests | < 95%: warn, < 80%: alert |
| Fleet network state heatmap | edge_network_state per vehicle | Any vehicle offline > 60s |
| Edge GPU utilization | edge_gpu_utilization | > 90% sustained: capacity alert |
| Model queue depth | edge_server_queue_depth | > 20 requests: scaling needed |
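A minimal sketch of exporting these series from the edge client with the prometheus_client library (assumed to be installed on the vehicle; metric and label names match the list above):
from prometheus_client import Counter, Gauge, Histogram, start_http_server

RTT_MS = Histogram('edge_inference_rtt_ms', 'Edge inference round-trip time (ms)',
                   ['vehicle', 'model'],
                   buckets=(10, 20, 40, 60, 80, 100, 150, 200))
REQUESTS = Counter('edge_inference_requests_total', 'Edge inference requests',
                   ['vehicle', 'model'])
ERRORS = Counter('edge_inference_errors_total', 'Edge inference errors',
                 ['vehicle', 'error_type'])
NET_STATE = Gauge('edge_network_state', 'Network state (0=offline ... 3=full)',
                  ['vehicle'])

start_http_server(8000)  # Prometheus scrapes http://<vehicle>:8000/metrics

# Inside _run_edge_inference, after a successful request:
#   REQUESTS.labels(vehicle=self.vehicle_id, model='ptv3_detection').inc()
#   RTT_MS.labels(vehicle=self.vehicle_id, model='ptv3_detection').observe(rtt)
# and on a timeout:
#   ERRORS.labels(vehicle=self.vehicle_id, error_type='timeout').inc()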
10.4 ROS Topic Architecture with Edge Integration
/lidar/merged (PointCloud2, 10 Hz)
│
├──> /perception/preprocessor
│ │
│ ├──> /perception/pillars (on-vehicle backbone)
│ │ │
│ │ ├──> /perception/pointpillars_det (6.84ms, safety)
│ │ │ │
│ │ │ └──> /perception/detections_local
│ │ │
│ │ └──> /perception/bev_features
│ │ │
│ │ └──> edge_inference_client ──> [5G] ──> Edge
│ │ │
│ │ edge_result_subscriber <── [5G] <────────┘
│ │ │
│ │ ├──> /edge/detections_3d
│ │ ├──> /edge/vlm_description
│ │ ├──> /edge/world_prediction
│ │ └──> /edge/cooperative_map
│ │
│ └──> /perception/segmentation (FlatFormer, on-vehicle)
│
└──> /localization/vgicp ──> /localization/pose
/perception/detections_local ──┐
├──> perception_fusion_node
/edge/detections_3d ───────────┘ │
└──> /perception/fused_detections
│
└──> /planning/frenet
│
                         └──> /control/cmd_vel
11. Industry Approaches
11.1 Mobileye "Slow-Think" Cloud VLM Architecture
Mobileye has publicly described a dual-loop architecture where a fast, on-vehicle perception loop runs in real-time while a "slow-think" cloud-based VLM processes the same scene at a lower frequency for higher-level reasoning. Presented at CES 2025, this architecture:
- Fast loop (on-vehicle, EyeQ Ultra): 176 TOPS, runs detection, free space, lane detection at 10+ Hz. Produces driving commands.
- Slow loop (cloud/edge VLM): Processes camera frames at ~1 Hz. Provides scene understanding, anomaly detection, and planning verification.
- Reconciliation: If slow loop identifies a risk the fast loop missed (e.g., "construction zone ahead, workers on road"), it sends a constraint to the fast loop that tightens safety margins.
Relevance to reference airside AV stack: This is precisely the Pattern D (speculative execution) architecture. Mobileye validates the approach for production: the vehicle always acts on the fast loop's output, and cloud/edge reasoning arrives asynchronously to refine behavior.
Key difference for airside: Mobileye's slow loop runs in the cloud (100+ ms RTT) because highway driving lacks local edge infrastructure. The reference airside AV stack's edge server provides 20-90ms RTT, enabling the slow loop to contribute within the same or next planning cycle -- a significant advantage.
11.2 Waymo Cloud-Based Perception Refinement
Waymo has documented (in public talks and patents) a cloud-based perception pipeline that:
- Runs resource-intensive models on cloud GPUs for sensor data uploaded from the fleet
- Identifies perception errors (false negatives, misclassified objects) using larger models than can run on-vehicle
- Generates automatic labels for fine-tuning on-vehicle models
- Provides offline route analysis to pre-compute expected perception challenges
Waymo's approach is batch/offline (minutes to hours latency), not real-time. It focuses on improving the on-vehicle model rather than augmenting it in real-time.
Relevance to reference airside AV stack: Waymo's cloud pipeline corresponds to Tier 3 in our architecture. The auto-labeling and model improvement functions are directly applicable. The difference is that Waymo does not operate edge servers for real-time enhancement (highway AVs cannot rely on connectivity), while reference airside AV stack can.
11.3 Apollo Cloud-Based HD Map Updates
Baidu Apollo uses a cloud-based map pipeline:
- Vehicles continuously upload mapping data (point clouds, images, localization)
- Cloud pipeline detects map changes (new construction, lane changes)
- Updated map tiles are pushed to vehicles
This is essentially the map change detection function of our Tier 2/3. The "fleet-based map maintenance" architecture described in hd-map-change-detection.md follows a similar pattern but with the fusion happening at the airport edge server (lower latency, privacy-preserving).
11.4 Tesla Dojo and On-Vehicle Only Inference
Tesla takes the opposite approach: all inference runs on-vehicle (HW3/HW4), and Dojo is used exclusively for training. Tesla's rationale:
- Cannot depend on connectivity (highway driving)
- Latency requirements are stringent at highway speeds
- Custom silicon (FSD chip) is highly optimized for their models
Relevance to reference airside AV stack: Tesla's constraint (no reliable connectivity) does not apply to airport operations. However, Tesla's principle of "the vehicle must work without the network" is critical and directly maps to our Simplex-based degradation strategy.
11.5 Motional / Hyundai Edge Computing Approach
Motional (Hyundai's L4 subsidiary, previously Aptiv-Hyundai JV) has piloted edge computing for:
- Intersection perception augmentation using roadside units (RSUs)
- V2I communication for traffic signal phase and timing (SPaT)
- Cloud-based remote assistance for edge cases
Motional's RSU-based perception is closest to the airport V2I cooperative perception concept. Their finding that infrastructure sensors add 15-25% detection AP aligns with DAIR-V2X benchmark results cited in our cooperative perception documents.
11.6 UISEE Airport Edge Approach
UISEE, the leading airside AV company (1,000+ vehicles deployed, Changi driverless tractors), has not publicly detailed their compute architecture. However, based on their published specifications and Changi deployment:
- Vehicles run local perception and planning
- Connected to airport operations center via 5G (Changi uses Singtel private 5G)
- Central dispatch/scheduling system communicates with vehicles
- Remote monitoring with human oversight
UISEE appears to use centralized fleet management (Tier 3 equivalent) but it is unclear whether they use edge-based perception enhancement (Tier 2). Their vehicles reportedly run on custom compute platforms, not NVIDIA Orin.
11.7 Comparative Summary
| Company | On-Vehicle | Edge | Cloud | Connectivity |
|---|---|---|---|---|
| reference airside AV stack (proposed) | Safety stack (Orin) | VLM + world model + coop. fusion | Training + analytics | Private 5G |
| Mobileye | Fast perception (EyeQ) | N/A | Slow-think VLM | Public cellular |
| Waymo | Full perception + planning | N/A | Auto-label + map + refinement | Public cellular |
| Apollo | Full perception + planning | N/A | Map updates + training | Public cellular |
| Tesla | Full stack (FSD chip) | N/A | Training only (Dojo) | WiFi (OTA only) |
| Motional | Full perception + planning | RSU perception fusion | Remote assistance | V2I + cellular |
| UISEE | Perception + planning | Unknown | Fleet management | Private 5G |
The reference airside AV stack's edge approach is differentiated because the airport environment uniquely enables it. No highway AV company can rely on edge compute because connectivity is not guaranteed on public roads. This is a structural advantage of airport operations that should be exploited.
12. Implementation Roadmap
12.1 Phase 1: Foundation ($15,000-25,000, 6 weeks)
Goal: Edge server operational, VLM offloading working end-to-end.
| Week | Task | Deliverable |
|---|---|---|
| 1-2 | Edge server hardware setup and software installation | Triton running, accessible from vehicle network |
| 2-3 | Edge client ROS node development | gRPC client node, feature serialization, result deserialization |
| 3-4 | VLM deployment on Triton | InternVL2-7B serving, camera frame intake, text output |
| 4-5 | End-to-end integration | Vehicle sends camera frames, receives VLM descriptions |
| 5-6 | Latency measurement and optimization | P50 < 60ms RTT, monitoring dashboard live |
Hardware for Phase 1:
- 1x server with 2x A100 40GB (can be rented: ~$5K/month) or purchased (~$25K used)
- 1x 5G modem per test vehicle (Cradlepoint E3000, ~$500)
Success criteria:
- VLM co-pilot running at 1-2 Hz from vehicle
- RTT < 80ms at P95
- Vehicle operates safely when edge server is unplugged (graceful degradation)
12.2 Phase 2: Split Inference ($20,000-35,000, 8 weeks)
Goal: Feature offload pattern operational for cooperative perception.
| Week | Task | Deliverable |
|---|---|---|
| 1-2 | BEV feature extraction on-vehicle | Backbone feature map published as ROS topic |
| 2-3 | Feature transport optimization | LZ4 compression, gRPC streaming, bandwidth profiling |
| 3-5 | Edge detection head deployment | PTv3/foundation model head on Triton, batched inference |
| 5-6 | Perception fusion node | Confidence-weighted merge of on-vehicle + edge detections |
| 6-7 | Cooperative perception fusion | Multi-vehicle feature aggregation on edge (Where2comm) |
| 7-8 | A/B testing framework | Side-by-side evaluation of edge-enhanced vs local-only |
Success criteria:
- Feature offload running at 10 Hz for 5+ vehicles simultaneously
- Edge detection AP measurably higher than on-vehicle only (shadow mode comparison)
- Cooperative perception fusion showing new detections in occluded zones
- Graceful degradation tested: kill edge, verify vehicle continues safely
12.3 Phase 3: Fleet Orchestration ($15,000-25,000, 6 weeks)
Goal: Full graceful degradation, fleet-level optimization, production hardening.
| Week | Task | Deliverable |
|---|---|---|
| 1-2 | Network state machine implementation | 4-state degradation with hysteresis |
| 2-3 | Edge server redundancy | Dual-server failover, tested with kill switch |
| 3-4 | Dynamic batching optimization | Triton batch tuning for fleet-scale throughput |
| 4-5 | World model deployment on edge | LiDAR-native world model serving 5+ vehicles |
| 5-6 | Load testing | Simulate 20-50 concurrent vehicles, identify bottlenecks |
Success criteria:
- All 4 network degradation levels tested and validated
- Edge server failover < 5 seconds
- 20 vehicles served simultaneously at < 80ms P95 RTT
- World model predictions available at 5 Hz per vehicle
12.4 Phase 4: Multi-Airport ($10,000-20,000, 4 weeks)
Goal: Repeatable edge deployment pattern for second and subsequent airports.
| Week | Task | Deliverable |
|---|---|---|
| 1 | Edge deployment automation (Ansible/Terraform) | One-command edge server provisioning |
| 2 | Airport-specific model configuration | Per-airport model variants, A/B testing by airport |
| 3 | Cross-airport fleet management | Single dashboard for multiple airport edge servers |
| 4 | Documentation and handoff | Ops playbook for edge server at new airports |
Success criteria:
- Second airport edge server deployed in < 1 week
- Per-airport model serving with shared base + per-airport LoRA
- Centralized monitoring across airports
12.5 Total Investment Summary
| Phase | Duration | Cost | Cumulative |
|---|---|---|---|
| Phase 1: Foundation | 6 weeks | $15K-25K | $15K-25K |
| Phase 2: Split inference | 8 weeks | $20K-35K | $35K-60K |
| Phase 3: Fleet orchestration | 6 weeks | $15K-25K | $50K-85K |
| Phase 4: Multi-airport | 4 weeks | $10K-20K | $60K-105K |
| Total software development | 24 weeks | $60K-105K | |
| Edge hardware (first airport) | — | $50K-150K | |
| Grand total (first airport) | 24 weeks | $110K-255K | |
This is comparable to the cost of a single vehicle (third-generation tug production cost) but benefits the entire fleet.
13. Key Takeaways
Orin AGX is sufficient for safety, insufficient for advanced AI. The safety-critical stack (PointPillars, Frenet, CBF, Simplex) completes in roughly 66ms on Orin, well inside the 100ms cycle. Adding VLMs, world models, foundation backbones, and cooperative fusion simultaneously requires 500ms+ -- a 5x overrun. Edge offloading resolves this without replacing vehicle hardware.
Airports uniquely enable edge-cloud inference. Private 5G with 5-20ms RTT, bounded geography, owned infrastructure, and co-located fleets create conditions that highway AVs cannot exploit. The 55ms typical end-to-end edge latency fits within a single 100ms perception cycle.
Feature offload (Pattern B) is the optimal primary pattern. Sending compressed BEV features (50-200 KB) instead of raw sensor data (4-12 MB) reduces bandwidth 20-60x while enabling the edge to run multiple model heads on the same features. A 20-vehicle fleet needs 5-17 Mbps uplink per vehicle -- well within 5G capacity.
The vehicle must always operate safely without the network. The Simplex analogy applies: edge-enhanced perception is the advanced controller, on-vehicle perception is the baseline controller. Network loss degrades capability but never safety. Fast degradation, slow recovery (5s hysteresis) prevents oscillation.
Edge economics favor fleets of 20+ vehicles. A 4x A100 edge server ($50K) amortized across 20 vehicles is $2,500/vehicle -- less than a single Orin-to-Thor upgrade. The edge provides VLM 7B+ capabilities that even Thor cannot match, plus cooperative fleet fusion that no per-vehicle upgrade enables.
Dynamic batching across vehicles is the edge's key efficiency. Triton's batched inference processes multiple vehicles' features in a single GPU call. Batching 4 vehicles costs ~30ms instead of 4x20ms = 80ms. This is why a shared edge server is more GPU-efficient than per-vehicle compute for fleet workloads.
Network slicing with URLLC guarantees safety-critical V2X. 5G QoS profiles ensure V2X safety messages (1 Mbps) get guaranteed bit rate even when analytics traffic (100+ Mbps) saturates the network. This decouples safety communication from enhancement communication.
Cooperative perception is the highest-value edge function. Where2comm on the edge fuses features from all vehicles into fleet-level perception, providing +18-22% detection AP and eliminating occlusion blind spots. This is impossible without centralized edge compute -- pure V2V peer-to-peer does not scale.
Multi-tenant isolation is critical for airline customers. Airports serve multiple ground handlers. Kubernetes namespace isolation, MIG GPU partitioning, and separate Triton instances per tenant prevent cross-customer data leakage while sharing hardware cost.
Phase 1 (VLM offloading) delivers value in 6 weeks for $15-25K. The fastest path to demonstrating edge value is offloading a VLM co-pilot -- a capability that cannot run on Orin at acceptable frequency. This provides immediate anomaly detection and scene reasoning at 1-2 Hz with zero vehicle hardware changes.
14. References
Academic Papers
He, Z., Shorinwa, O., et al. "CoBEVFlow: Robust Asynchronous Collaborative 3D Object Detection." AAAI 2024. Asynchronous cooperative perception with BEV flow compensation up to 200ms delay.
Hu, Y., Fang, S., et al. "Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps." NeurIPS 2022. Achieves 95.3% of full-sharing performance at 1/64 bandwidth via learned attention masks.
Li, Y., et al. "InternVL2: Better than the Best -- Expanding the Boundaries of Open-Source Multimodal Models." 2024. InternVL2-2B achieves practical VLM performance in 300ms on Orin; 7B achieves SOTA on edge/cloud GPUs.
Wu, X., et al. "Point Transformer V3: Simpler, Faster, Stronger." CVPR 2024 (Oral). 80.4% mIoU on nuScenes, 3x faster, 10x less memory than PTv2.
Lang, A., et al. "PointPillars: Fast Encoders for Object Detection from Point Clouds." CVPR 2019. 6.84ms INT8 on Orin with TensorRT -- the safety baseline detection model.
Yang, B., et al. "Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion." ICLR 2024. LiDAR future prediction via discrete diffusion on tokenized point clouds.
Agand, P., et al. "UnO: Unsupervised Occupancy Fields for Perception and Forecasting." CVPR 2024. Self-supervised LiDAR occupancy forecasting outperforming supervised baselines.
Xu, Y., et al. "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer." ECCV 2022. Transformer-based V2X fusion achieving +18% AP over single-vehicle.
Li, Y., et al. "HEAL: An Extensible Framework for Open Heterogeneous Collaborative Perception." ICLR 2024. Heterogeneous agent fusion without retraining existing agents.
Tian, S., et al. "DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models." 2024. Chain-of-thought reasoning architecture for VLM driving co-pilots.
Industry Publications
NVIDIA. "Triton Inference Server: Model Serving at Scale." NVIDIA Developer Documentation, 2025. Dynamic batching, model ensembles, GPU scheduling for multi-model serving.
NVIDIA. "Jetson AGX Orin Developer Guide." JetPack 6.x, 2025. Hardware specifications, power modes, TensorRT deployment.
OnGo Alliance. "Private Wireless Revolution: CBRS at DFW Airport." 2024. DFW $10M private 5G deployment case study.
Singtel. "5G Aviation Testbed at Changi Airport." 2023-2026. Private 5G for autonomous tractor operations.
3GPP. "TS 23.501: System Architecture for the 5G System." Release 17, 2023. 5G QoS framework, network slicing, URLLC specifications.
3GPP. "TS 23.287: Application Layer Support for V2X Services." Release 17, 2023. C-V2X sidelink and network-based V2X communication.
ETSI. "MEC 003: Multi-access Edge Computing -- Framework and Reference Architecture." v3.1.1, 2023. MEC deployment architecture for edge servers co-located with 5G infrastructure.
ETSI. "GR MEC 022: Multi-access Edge Computing -- Study on MEC Support for V2X Use Cases." 2020. Edge computing for V2X applications including cooperative perception.
NVIDIA. "NVIDIA DGX Station A100 Datasheet." 2024. 4x A100 80GB, 320 GB total HBM, 6.4 kW power.
Mobileye. "Mobileye Drive: System Architecture." CES 2025 presentation. Dual-loop fast/slow-think architecture with cloud VLM integration.
Standards and Regulations
ISO 3691-4:2023. "Industrial trucks -- Safety requirements and verification -- Part 4: Driverless industrial trucks." Harmonized with EU Machinery Directive.
EU AI Act (Regulation 2024/1689). High-risk AI system requirements including robustness, transparency, and human oversight. Effective August 2024, compliance by August 2026.
NIS2 Directive (Directive 2022/2555). Network and information security requirements for critical infrastructure. Transport sector (including airports) in scope.
EU GDPR (Regulation 2016/679). Data protection requirements applicable to sensor data containing personal information (camera images of ground crew).
ICAO Annex 17. Aviation security standards applicable to equipment in restricted airside zones.
Software and Products
NVIDIA Triton Inference Server. https://github.com/triton-inference-server/server. Open-source inference serving with dynamic batching, GPU scheduling, and model management.
Cradlepoint E3000 Series. Enterprise 5G router for vehicle-mounted deployment. Multi-carrier, dual-modem, CBRS/sub-6/mmWave support.
Sierra Wireless AirLink XR90. Rugged 5G router for industrial vehicles. IP67 rated, -40C to +70C operating temperature.
NVIDIA Isaac ROS. GPU-accelerated ROS packages for robot perception. Includes nvblox, visual SLAM, and freespace segmentation.
Kubernetes + NVIDIA GPU Operator. Container orchestration with GPU resource management for edge server deployment.