Real-Time Occupancy Grid Mapping for Airport Airside Autonomous Vehicles
Occupancy grid mapping is the foundational spatial representation that sits between raw sensor data and motion planning. Unlike learned occupancy prediction (see 30-autonomy-stack/world-models/occupancy-deployment-orin.md), which uses neural networks to hallucinate unseen space, classical occupancy grid mapping uses probabilistic raycasting to build a live, evidence-based map of what is free, what is occupied, and what is unknown -- directly from LiDAR measurements. For the reference airside AV stack's 4-8 RoboSense LiDARs processing 400K-1.2M points every 100ms, this is the first line of spatial reasoning: a continuously-updated volumetric representation that feeds the Frenet planner's costmap, provides collision avoidance guarantees, and -- when shared across vehicles -- creates fleet-wide situational awareness of the airport apron.
This document covers the theory (log-odds Bayesian updates, inverse sensor models), the data structures (dense grids, octrees, voxel hashmaps, spatial hashing), GPU-accelerated implementation on NVIDIA Orin, multi-LiDAR fusion, dynamic object handling, multi-resolution strategies, TSDF/ESDF alternatives, costmap generation for path planning, fleet sharing protocols, and airside-specific adaptations for the unique geometry of aircraft stands, jet blast zones, and FOD detection.
The existing reference airside AV stack has no explicit occupancy grid layer -- RANSAC segmentation feeds obstacle clusters directly to the Frenet planner. Adding a proper occupancy grid is the single highest-value perception infrastructure upgrade available today, costing approximately $25-40K in development while providing the spatial backbone for every downstream capability: neural occupancy, flow prediction, CBF safety filtering, and fleet cooperative perception.
Table of Contents
- Occupancy Grid Fundamentals
- 3D Voxel Representations
- GPU-Accelerated Raycasting
- Multi-LiDAR Occupancy Fusion
- Dynamic Object Handling
- Multi-Resolution Grids
- TSDF vs Occupancy vs ESDF
- Costmap Generation for Planning
- Fleet-Shared Occupancy
- Airside-Specific Considerations
- Implementation with ROS Noetic
- Performance on Orin
- Key Takeaways
- Implementation Roadmap
- References
1. Occupancy Grid Fundamentals
1.1 The Problem
An autonomous vehicle needs to answer a deceptively simple question at every planning cycle: "Which regions of space around me are free to drive through, which are blocked, and which have I never observed?" This question must be answered:
- In real-time (10 Hz minimum for 30 km/h operations)
- Probabilistically (a single LiDAR return is noisy; repeated observations increase confidence)
- Volumetrically (aircraft wings overhang at 5m height; ground-level FOD is <10cm)
- Consistently (sensor noise should not cause the map to flicker between free and occupied)
The occupancy grid, introduced by Moravec and Elfes (1985), discretizes space into cells (2D) or voxels (3D) and maintains a probability of occupancy for each. Forty years later, it remains the dominant representation for real-time spatial reasoning in robotics -- not because nothing better has been proposed, but because its mathematical simplicity, update speed, and interpretability make it uniquely suited as the foundation layer that everything else builds upon.
1.2 2D vs 3D Occupancy Grids
| Property | 2D Grid | 3D Grid |
|---|---|---|
| Representation | Top-down bird's-eye-view | Volumetric voxel grid |
| Cell content | P(occupied) for ground plane | P(occupied) per voxel |
| Memory (100m x 100m) | 250K cells at 0.2m | 250K x 40 = 10M voxels at 0.2m |
| Planning use | Direct costmap for 2D planner | Requires projection to 2D for ground planner |
| Vertical resolution | None (height collapsed) | Resolves overhanging structures |
| Airside problem | Cannot distinguish wing overhang (passable) from fuselage (blocked) | Correctly models 3D aircraft geometry |
| Compute cost | Trivial (<1ms) | Significant (5-50ms) |
| Standard message | nav_msgs/OccupancyGrid | octomap_msgs/Octomap or custom |
For airside operations, 3D is mandatory. Aircraft wings overhang at 4-6m above ground. A 2D grid that collapses height sees the wing shadow as "occupied" and blocks all paths under the wing. A 3D grid correctly marks the wing voxels as occupied at z=5m while showing the ground-level voxels as free. The third-generation tug's height is approximately 2.5m -- it can pass under most aircraft wings, but only a 3D representation can make that determination.
However, the planner itself operates in 2D (the Frenet planner generates lateral/longitudinal trajectories on a reference path). The architecture is therefore: maintain a 3D occupancy grid, project to a height-filtered 2D costmap for planning. Section 8 covers this projection in detail.
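A minimal sketch of that height-filtered projection (illustrative slab bounds and threshold, not the stack's tuned parameters; Section 8 treats the full version):

```python
import numpy as np

def project_to_costmap(grid, voxel_size=0.2, z_min=0.1, z_max=2.7,
                       occ_log_odds=0.0):
    """Collapse a 3D log-odds grid (indexed [x, y, z]) to a 2D costmap.

    Only the height slab the ~2.5m vehicle actually sweeps (z_min..z_max,
    with clearance margin) is considered; structure above z_max -- e.g. a
    wing at 5m -- is ignored, so paths under it stay open.
    """
    k_lo = int(z_min / voxel_size)
    k_hi = int(z_max / voxel_size)
    slab = grid[:, :, k_lo:k_hi]
    # A 2D cell is blocked if ANY voxel in its slab reads occupied
    return (slab > occ_log_odds).any(axis=2)

grid = np.zeros((10, 10, 40), dtype=np.float32)  # 2m x 2m x 8m toy grid
grid[2, 2, 25] = 2.0   # wing voxel at z = 5.0m (above the slab)
grid[5, 5, 5] = 2.0    # fuselage voxel at z = 1.0m (inside the slab)
costmap = project_to_costmap(grid)
# costmap[2, 2] is False (passable under the wing); costmap[5, 5] is True
```

The `any` over the slab is deliberately conservative: a single occupied voxel at any driveable height blocks the 2D cell.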
1.3 Binary vs Probabilistic Occupancy
Binary occupancy assigns each cell a hard label: occupied (1), free (0), unknown (-1). Simple and fast, but fragile:
Binary update: single noisy LiDAR return marks cell as occupied.
Next scan: no return in that cell → cell flips to free.
Result: map flickers every frame. Planner sees phantom obstacles.

Probabilistic occupancy maintains a continuous probability P(occupied) in [0, 1] for each cell. Multiple consistent observations drive the probability toward 0 or 1. A single noisy reading barely moves the probability. This is the correct approach for any real-world deployment.
1.4 Log-Odds Representation
Maintaining probabilities directly is numerically problematic. The recursive Bayesian update requires multiplication:
P(occ | z_{1:t}) = P(z_t | occ) * P(occ | z_{1:t-1}) / P(z_t)

Repeatedly multiplying probabilities causes underflow. The log-odds representation (Thrun, Burgard, Fox 2005) solves this:
l(x) = log( P(occ=1|x) / P(occ=0|x) )
Properties:
l = 0 → P = 0.5 (unknown, no evidence)
l → +∞ → P → 1.0 (certainly occupied)
l → -∞ → P → 0.0 (certainly free)
l = +2.2 → P ≈ 0.9
l = -2.2 → P ≈ 0.1

The Bayesian update in log-odds is pure addition:
l(x | z_{1:t}) = l(x | z_{1:t-1}) + l(x | z_t) - l_0
Where:
l(x | z_{1:t-1}) = prior log-odds (accumulated from all previous observations)
l(x | z_t) = log-odds from current measurement (inverse sensor model)
l_0 = log(P_prior / (1 - P_prior)) = 0 when P_prior = 0.5

Why log-odds is perfect for real-time systems:
- Addition instead of multiplication -- numerically stable, no underflow
- O(1) per cell update -- no normalization needed
- Trivially parallelizable -- each cell updates independently (GPU-friendly)
- Clamping is simple -- clamp l to [-L_max, +L_max] to prevent over-confidence
- Memory-efficient -- store as int16 (65,536 levels) or even int8 (256 levels)
"""
Log-odds occupancy update -- the core algorithm.
This runs per-cell, per-observation. GPU version parallelizes over all cells.
"""
import numpy as np
# Configuration
L_OCC = 0.85 # Log-odds increment when a ray hits this cell (occupied)
L_FREE = -0.4 # Log-odds increment when a ray passes through this cell (free)
L_MAX = 4.6 # Clamp value (corresponds to P ≈ 0.99)
L_MIN = -4.6 # Clamp value (corresponds to P ≈ 0.01)
def log_odds_update(grid, ray_cells, hit_cell):
    """
    Update occupancy grid along a single LiDAR ray.
    Args:
        grid: np.array of log-odds values (float32 or int16)
        ray_cells: list of (i, j, k) cell indices along the ray (excluding hit)
        hit_cell: (i, j, k) cell index where the ray terminated (the LiDAR return)
    """
    # All cells the ray passed through are evidence of "free"
    for cell in ray_cells:
        grid[cell] += L_FREE
        grid[cell] = max(grid[cell], L_MIN)  # Clamp
    # The cell where the ray terminated is evidence of "occupied"
    grid[hit_cell] += L_OCC
    grid[hit_cell] = min(grid[hit_cell], L_MAX)  # Clamp

def log_odds_to_probability(l):
    """Convert log-odds to probability."""
    return 1.0 - 1.0 / (1.0 + np.exp(l))

1.5 Inverse Sensor Model for LiDAR
The inverse sensor model maps a LiDAR measurement to log-odds updates for each cell. For LiDAR, the model is straightforward because LiDAR provides direct range measurements along known ray directions:
Given: LiDAR at position p_sensor, measurement at point p_hit
with range r = ||p_hit - p_sensor||
For a cell at position p_cell:
d = distance from p_sensor to p_cell along the ray direction
If d < r - ε: → cell is FREE (ray passed through it)
If |d - r| < ε: → cell is OCCUPIED (ray terminated here)
If d > r + ε: → cell is UNKNOWN (no information beyond the hit point)
Where ε is a small tolerance (typically half a voxel size)

Sensor                              Hit
══╤══ ─────────────────────────── ●═══ ?????
  ║                               ║   ║
  ║   FREE cells (ray passed)     ║OCC║ Unknown
  ║                               ║   ║ (behind)

Asymmetric update strengths are critical. A single hit gives moderately strong per-ray evidence of "occupied" (it could still be noise, dust, or rain), while each pass-through gives weaker per-ray evidence of "free" -- but free evidence accumulates rapidly because many rays traverse every empty cell. Typical values:
| Parameter | Value | Rationale |
|---|---|---|
| L_OCC (hit) | +0.7 to +1.0 | Single hit is moderate evidence |
| L_FREE (pass-through) | -0.3 to -0.5 | Pass-through is weaker per-ray but accumulates fast |
| L_MAX (clamp) | +3.5 to +5.0 | P = 0.97-0.99, prevents over-confidence |
| L_MIN (clamp) | -3.5 to -5.0 | P = 0.01-0.03, prevents over-confidence |
| L_PRIOR | 0.0 | P = 0.5, no prior information |
Why asymmetric? A cell that has been confirmed free by 10 passing rays (accumulated log-odds -4.0) should not flip to "occupied" on a single noisy return: one hit at +0.85 leaves it at -3.15, still confidently free. The asymmetry keeps the map stable -- overturning accumulated evidence requires several consistent contrary observations, which prevents phantom obstacles.
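A quick numeric check of this claim, using the L_OCC/L_FREE constants from the Section 1.4 listing:

```python
import math

L_OCC, L_FREE = 0.85, -0.4
L_MIN, L_MAX = -4.6, 4.6

l = 0.0                          # prior: unknown (P = 0.5)
for _ in range(10):              # ten rays pass through -> free evidence
    l = max(l + L_FREE, L_MIN)
l = min(l + L_OCC, L_MAX)        # one noisy return hits the cell

p = 1.0 / (1.0 + math.exp(-l))
# l = -3.15, p ≈ 0.04: the cell stays confidently free
```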
Airside tuning note: For airport operations, the asymmetry should be reversed for safety-critical zones (near aircraft, in jet blast areas). In those zones, a single occupied observation should immediately raise the probability, while many free observations are needed to clear the cell. This implements the "assume occupied until proven otherwise" safety principle.
1.6 Comparison with Occupancy Prediction Networks
| Aspect | Classical Occupancy Grid | Neural Occupancy Prediction |
|---|---|---|
| Method | Raycasting + Bayesian update | Forward pass through CNN/Transformer |
| Training data | None | Thousands of annotated frames |
| Occlusion handling | Reports "unknown" | Predicts behind occlusions |
| Semantic labels | No (binary free/occupied) | Yes (per-class) |
| Novel objects | Detects any solid object | Detects if trained on similar objects |
| Accuracy | Exact to sensor noise | ~30-40 mIoU (benchmark) |
| Latency | 2-20ms | 10-100ms |
| Deterministic | Yes | No (inference variance) |
| Certifiable | Yes (fully inspectable math) | Difficult (black box) |
| Use case | Safety baseline, planning costmap | Scene understanding, prediction |
The two are complementary, not competing. Classical occupancy provides the certifiable safety baseline (see 60-safety-validation/runtime-assurance/runtime-verification-monitoring.md -- Simplex BC). Neural occupancy provides richer understanding. The architecture is:
Classical occupancy grid (always running, 5-20ms, certifiable)
→ Safety costmap → Frenet planner (safety-critical path)
Neural occupancy prediction (running in parallel, 20-100ms)
→ Semantic costmap → Neural planner (performance path)

2. 3D Voxel Representations
2.1 The Memory Problem
A naive dense 3D voxel grid is unusable for large spaces:
Airport apron coverage: 200m x 200m x 8m (ground to aircraft tail)
At 0.2m resolution: 1000 x 1000 x 40 = 40,000,000 voxels
Memory per voxel:
- float32 log-odds: 4 bytes → 160 MB
- int16 log-odds: 2 bytes → 80 MB
- int8 log-odds: 1 byte → 40 MB
At 0.1m resolution: 2000 x 2000 x 80 = 320,000,000 voxels
- float32: 1.28 GB — unacceptable on Orin with 32 GB shared memory

For a 200m apron at fine resolution, dense grids consume too much memory. But most of that space is empty air -- only a thin shell near surfaces and the ground plane is interesting. This motivates sparse representations.
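The arithmetic above as a reusable helper (decimal MB):

```python
def dense_grid_mb(extent_xyz_m, resolution_m, bytes_per_voxel):
    """Memory footprint of a dense voxel grid covering the given extent."""
    # round() before int() guards against float artifacts like 200/0.2 = 999.99...
    nx, ny, nz = (int(round(e / resolution_m)) for e in extent_xyz_m)
    return nx * ny * nz * bytes_per_voxel / 1e6

apron = (200.0, 200.0, 8.0)
# float32 log-odds at the two resolutions discussed above:
coarse_mb = dense_grid_mb(apron, 0.2, 4)   # 160.0 MB
fine_mb = dense_grid_mb(apron, 0.1, 4)     # 1280.0 MB (1.28 GB)
```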
2.2 Representation Comparison
| Representation | Memory | Query O() | Insert O() | GPU-Friendly | Dynamic Update | Notes |
|---|---|---|---|---|---|---|
| Dense 3D array | O(N^3) | O(1) | O(1) | Excellent | Excellent | Unusable for large spaces |
| Octree (OctoMap) | O(N_occupied * log N) | O(log N) | O(log N) | Poor | Moderate | Classic, CPU-bound |
| Voxel hashmap | O(N_occupied) | O(1) amortized | O(1) amortized | Good | Good | VDBFusion, nvblox |
| Spatial hashing | O(N_occupied) | O(1) | O(1) | Excellent | Excellent | Custom GPU implementations |
| Block-sparse (VDB) | O(N_blocks) | O(1) | O(1) | Good (OpenVDB GPU) | Good | OpenVDB heritage |
| Multi-resolution dense | O(sum N_i^3) | O(1) | O(1) | Excellent | Excellent | Multiple grids at different resolutions |
2.3 OctoMap: The Classic
OctoMap (Hornung et al., 2013) was the first widely-used 3D occupancy framework in robotics. It uses an octree -- a tree where each node has 8 children, recursively subdividing 3D space.
Architecture:
Root node (covers entire map volume)
├── Child 0 (octant 0)
│ ├── Child 0.0 ... (subdivide further if heterogeneous)
│ └── Child 0.7
├── ...
└── Child 7 (octant 7)
Key property: homogeneous regions are represented by a single node.
- Large empty area → single "free" node covering whole region
- Large solid area → single "occupied" node
- Only surfaces require fine subdivision
Memory: proportional to surface area, not volume
- For a 200m x 200m apron at 0.1m: ~2-10M nodes (~50-250 MB)
- Much better than 320M voxels for dense grid

OctoMap strengths:
- Proven in thousands of real robot deployments
- Lossless multi-resolution (query at any level)
- Built-in log-odds occupancy update
- octomap_server ROS package provides full pipeline
- Serialization for saving/loading maps
OctoMap weaknesses for reference airside AV stack:
- CPU-only: Tree traversal is inherently sequential, resists GPU parallelization
- Slow raycasting: Each ray requires tree traversal for every cell it passes through
- Cache-unfriendly: Pointer-chasing through the tree causes L1/L2 cache misses
- Performance: ~3-10 Hz on a single LiDAR at automotive resolution on modern CPUs
- Insufficient throughput: 4-8 LiDARs at 10 Hz = 40-80 updates/second; OctoMap cannot keep up
OctoMap benchmarks (approximate, single-threaded CPU):
| Configuration | Resolution | Update Rate | Memory |
|---|---|---|---|
| 1 LiDAR, 100K pts | 0.1m | 3-8 Hz | ~200 MB |
| 1 LiDAR, 100K pts | 0.2m | 8-15 Hz | ~80 MB |
| 4 LiDARs merged, 400K pts | 0.1m | 1-3 Hz | ~400 MB |
| 4 LiDARs merged, 400K pts | 0.2m | 3-6 Hz | ~150 MB |
Verdict: OctoMap is a useful reference implementation and works for prototyping, but its CPU-bound nature makes it unsuitable for production multi-LiDAR operation at the reference airside AV stack's data rates.
2.4 VDBFusion: OpenVDB Meets Robotics
VDBFusion (Vizzo et al., RAL 2022) brings OpenVDB -- the hierarchical sparse data structure developed by DreamWorks Animation for visual effects -- to real-time TSDF mapping.
OpenVDB data structure:
Level 0: Root node (hash map of Level 1 pointers)
Level 1: Internal node (32^3 = 32K children, 4KB bitmask)
Level 2: Internal node (16^3 = 4K children, 512B bitmask)
Level 3: Leaf node (8^3 = 512 voxels, dense)
Key properties:
- Root hash map provides O(1) access to any region of space
- Internal nodes use bitmasks (not pointers) for active children
- Leaf nodes are dense 8^3 blocks — cache-friendly
- Empty regions consume zero memory
- Adjacent leaf nodes are likely allocated nearby in memory → spatial locality

VDBFusion performance (from paper):
| Dataset | Resolution | FPS (CPU) | Memory | Comparison to OctoMap |
|---|---|---|---|---|
| Newer College (LiDAR) | 0.1m | 40+ Hz | ~100 MB | 5-10x faster |
| Newer College (LiDAR) | 0.05m | 20+ Hz | ~300 MB | 4-8x faster |
| KITTI (LiDAR) | 0.1m | 30+ Hz | ~80 MB | 4-6x faster |
VDBFusion strengths:
- 5-10x faster than OctoMap on CPU
- Memory-efficient sparse representation
- Handles unbounded environments (no pre-allocated grid)
- TSDF output (useful for mesh extraction, ESDF computation)
- Open source (MIT license):
https://github.com/PRBonn/vdbfusion
VDBFusion limitations:
- Still CPU-based (OpenVDB GPU support is experimental as of 2025)
- TSDF, not occupancy (requires conversion -- see Section 7)
- Less mature than OctoMap in the ROS ecosystem
- No built-in multi-resolution (though VDB natively supports it)
2.5 Spatial Hashing for GPU
The most GPU-friendly approach is spatial hashing: use a hash function to map 3D voxel coordinates to a flat array.
/**
* GPU spatial hash for occupancy grid.
* Each voxel is addressed by hashing its (x, y, z) integer coordinate.
* Collisions handled by open addressing (linear probing).
*/
// Hash function (FNV-1a variant for 3D coordinates)
__device__ uint32_t spatial_hash(int3 coord, uint32_t table_size) {
uint32_t h = 2166136261u;
h ^= (uint32_t)coord.x; h *= 16777619u;
h ^= (uint32_t)coord.y; h *= 16777619u;
h ^= (uint32_t)coord.z; h *= 16777619u;
return h % table_size;
}
struct VoxelEntry {
int3 coord; // Voxel coordinate (or EMPTY sentinel)
int16_t log_odds; // Occupancy log-odds
uint16_t timestamp; // Last observation time (for recency weighting)
};
// Sentinel coordinate marking an unclaimed slot
#define EMPTY_SENTINEL ((int)0x80000000)
// Table: flat array of VoxelEntry, pre-allocated on GPU
// Typical size: 4-16M entries (~64-256 MB at 16 bytes per entry)

Spatial hashing advantages:
- O(1) lookup and insert (amortized, with good hash function)
- Massively parallel: thousands of CUDA threads update simultaneously
- Fixed memory: pre-allocate the hash table, no dynamic allocation on GPU
- Cache-friendly: flat array, sequential memory access for nearby voxels
- Simple: ~200 lines of CUDA code for complete implementation
Spatial hashing disadvantages:
- Load factor matters: >70% occupancy causes collision chains, performance degrades
- No inherent multi-resolution (must maintain separate tables)
- No tree structure for hierarchical queries (e.g., "is this entire 1m^3 region free?")
- Pre-allocation means fixed maximum occupied voxels
For Orin: Spatial hashing is the recommended approach for custom GPU occupancy grids. Pre-allocate a 4-8M entry table (~64-128 MB at 16 bytes per entry), which can represent 4-8M occupied or observed voxels -- more than sufficient for a 200m airport apron.
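A Python mirror of the FNV-1a hash from the listing above is handy for unit tests and for checking collision behavior before committing to a table size (the `& MASK32` emulates CUDA's uint32 wraparound and the cast of negative coordinates):

```python
FNV_OFFSET, FNV_PRIME, MASK32 = 2166136261, 16777619, 0xFFFFFFFF

def spatial_hash(x, y, z, table_size):
    """FNV-1a over the three voxel coordinates, as in the CUDA kernel."""
    h = FNV_OFFSET
    for c in (x & MASK32, y & MASK32, z & MASK32):
        h = ((h ^ c) * FNV_PRIME) & MASK32
    return h % table_size

# Rough collision check: hash a 20x20x20 voxel block into a 4M-entry
# table (load factor ~0.2%) and count distinct slots.
table_size = 1 << 22
keys = [(x, y, z) for x in range(20) for y in range(20) for z in range(20)]
slots = {spatial_hash(x, y, z, table_size) for (x, y, z) in keys}
collision_rate = 1.0 - len(slots) / len(keys)
```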
2.6 nvblox Internal Representation
nvblox (NVIDIA) uses a block-based spatial hashing approach internally:
nvblox data structure:
- Space divided into blocks (default: 8x8x8 voxels per block)
- Blocks stored in a GPU hash map
- Each block is a dense 8^3 array (512 voxels, contiguous memory)
- Only blocks near surfaces are allocated
This is essentially VDB Level 3 (leaf nodes only) with a hash map root.
Performance benefit:
- Hash map lookup to find the block: O(1)
- Dense array index within block: O(1)
- No pointer chasing (unlike octrees)
- Blocks are contiguous in memory and cache-friendly on GPU

nvblox is covered in detail in 30-autonomy-stack/world-models/occupancy-deployment-orin.md Section 6. This document focuses on the broader occupancy grid architecture, where nvblox is one implementation option.
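The block/voxel address split is simple to state concretely (a sketch of the general scheme, not nvblox's actual code; the z-major intra-block order is an arbitrary choice here):

```python
def block_address(vx, vy, vz, b=8):
    """Split a global voxel coordinate into (block coordinate, linear
    index within the dense b^3 block). Python's floor division and
    modulo handle negative coordinates consistently."""
    block = (vx // b, vy // b, vz // b)
    lx, ly, lz = vx % b, vy % b, vz % b
    return block, (lz * b + ly) * b + lx

# Voxel (17, 9, 3) lives in block (2, 1, 0) at local (1, 1, 3):
block, idx = block_address(17, 9, 3)
```

Lookup is then two O(1) steps: hash `block` to find the allocated 8^3 array, then index it with `idx` -- no pointer chasing.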
2.7 Representation Decision Matrix for reference airside AV stack
| Criterion | OctoMap | VDBFusion | Spatial Hash (Custom) | nvblox |
|---|---|---|---|---|
| Throughput (4-8 LiDARs) | Insufficient | Marginal | Excellent | Good |
| GPU utilization | None | Minimal | Full | Full |
| Memory efficiency | Good | Good | Moderate | Good |
| Multi-resolution | Native | Possible | Manual | Block-level |
| ROS integration | Excellent | Moderate | Custom | ROS 2 (needs bridge) |
| Dynamic objects | Manual | Manual | Manual | Built-in (ESDF decay) |
| Development effort | Low (pkg exists) | Medium | High | Medium |
| Certifiability | Good (simple math) | Good | Good | Moderate (NVIDIA binary) |
| Recommended phase | Prototype only | Not recommended | Phase 2 (custom perf) | Phase 1 (fast start) |
Recommendation: Start with nvblox (Phase 1) for rapid deployment, then develop a custom GPU spatial hashing occupancy node (Phase 2) for full control, certifiability, and maximum performance.
3. GPU-Accelerated Raycasting
3.1 The Raycasting Problem
For each LiDAR point, we must:
- Cast a ray from the sensor origin to the hit point
- Mark all voxels along the ray as "free" (log-odds decrement)
- Mark the voxel containing the hit point as "occupied" (log-odds increment)
For 4-8 LiDARs at 10 Hz producing 400K-1.2M points per 100ms, this means 400K-1.2M rays per frame, each traversing 10-500 voxels (depending on range and resolution). Total voxel updates per frame: 4M-100M+.
This is inherently parallel -- each ray is independent -- making it ideal for GPU computation.
3.2 Ray Marching Algorithms
Two primary algorithms for traversing a ray through a voxel grid:
Bresenham's 3D Line Algorithm:
Extends 2D Bresenham to 3D. Steps through voxels one at a time,
always choosing the voxel that minimizes error from the ideal ray line.
Pros: Integer-only arithmetic, no floating point needed
Cons: Branches on every step (bad for GPU warp divergence)
Not constant-time per step

3D-DDA (Digital Differential Analyzer):
Computes the parametric t-value at which the ray crosses each axis-aligned
voxel boundary. Advances to the nearest boundary at each step.
For ray: p(t) = origin + t * direction
Initialize:
t_max_x = t at first x-boundary crossing
t_max_y = t at first y-boundary crossing
t_max_z = t at first z-boundary crossing
t_delta_x = t to cross one x-voxel = voxel_size / |dir.x|
t_delta_y = t to cross one y-voxel = voxel_size / |dir.y|
t_delta_z = t to cross one z-voxel = voxel_size / |dir.z|
Step:
If t_max_x < t_max_y and t_max_x < t_max_z:
advance in x, t_max_x += t_delta_x
Elif t_max_y < t_max_z:
advance in y, t_max_y += t_delta_y
Else:
advance in z, t_max_z += t_delta_z
Pros: Simple branches (3-way), predictable step count
Well-suited for GPU (minimal divergence within warps)
Cons: Floating point arithmetic (but GPU has fast FP units)

For GPU implementation, 3D-DDA is strongly preferred. The regular step pattern and minimal branching result in better warp utilization on NVIDIA hardware.
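A CPU reference of this traversal is useful for unit-testing the GPU kernel against known rays. This sketch follows the Amanatides-Woo formulation and, like the kernel's free-space pass, excludes the hit voxel:

```python
import math

def dda_traverse(origin, hit, voxel_size):
    """Voxels crossed from origin to hit (hit voxel excluded)."""
    d = [h - o for h, o in zip(hit, origin)]
    length = math.sqrt(sum(c * c for c in d))
    dirn = [c / length for c in d]
    voxel = [math.floor(o / voxel_size) for o in origin]
    hit_voxel = [math.floor(h / voxel_size) for h in hit]
    step, t_max, t_delta = [0] * 3, [math.inf] * 3, [math.inf] * 3
    for a in range(3):
        step[a] = 1 if dirn[a] >= 0 else -1
        if dirn[a] != 0.0:
            boundary = (voxel[a] + (1 if step[a] > 0 else 0)) * voxel_size
            t_max[a] = (boundary - origin[a]) / dirn[a]  # first crossing
            t_delta[a] = abs(voxel_size / dirn[a])       # per-voxel advance
    visited = []
    while voxel != hit_voxel:
        visited.append(tuple(voxel))
        a = t_max.index(min(t_max))   # axis of nearest boundary crossing
        voxel[a] += step[a]
        t_max[a] += t_delta[a]
    return visited

ray = dda_traverse((0.1, 0.1, 0.1), (1.1, 0.1, 0.1), voxel_size=0.2)
# Five free voxels before the hit voxel (5, 0, 0)
```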
3.3 CUDA Raycasting Kernel
/**
* GPU kernel: raycast all LiDAR points into occupancy grid.
* One CUDA thread per LiDAR point (ray).
*
* Grid stored as spatial hash table (Section 2.5).
* Log-odds updates via atomicAdd for thread safety.
*/
// Forward declaration (helper defined after the kernel)
__device__ void atomic_log_odds_update(
    VoxelEntry* table, uint32_t hash_start, int3 coord,
    int16_t delta, int16_t l_min, int16_t l_max,
    uint16_t timestamp, uint32_t table_size);

__global__ void raycast_occupancy_kernel(
const float3* __restrict__ points, // LiDAR hit points (N x 3)
const float3 sensor_origin, // Sensor position in world frame
const int num_points,
VoxelEntry* __restrict__ hash_table, // GPU spatial hash table
const uint32_t table_size,
const float voxel_size,
const int16_t l_occ, // Log-odds increment for occupied
const int16_t l_free, // Log-odds decrement for free
const int16_t l_max, // Clamp max
const int16_t l_min, // Clamp min
const uint16_t timestamp // Current frame timestamp
) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx >= num_points) return;
float3 hit = points[idx];
float3 dir = make_float3(
hit.x - sensor_origin.x,
hit.y - sensor_origin.y,
hit.z - sensor_origin.z
);
float ray_length = sqrtf(dir.x*dir.x + dir.y*dir.y + dir.z*dir.z);
// Skip invalid points (too close or too far)
if (ray_length < 0.5f || ray_length > 150.0f) return;
// Normalize direction
float inv_len = 1.0f / ray_length;
dir.x *= inv_len; dir.y *= inv_len; dir.z *= inv_len;
// 3D-DDA initialization
float inv_voxel = 1.0f / voxel_size;
int3 voxel = make_int3(
__float2int_rd(sensor_origin.x * inv_voxel),
__float2int_rd(sensor_origin.y * inv_voxel),
__float2int_rd(sensor_origin.z * inv_voxel)
);
int3 hit_voxel = make_int3(
__float2int_rd(hit.x * inv_voxel),
__float2int_rd(hit.y * inv_voxel),
__float2int_rd(hit.z * inv_voxel)
);
int3 step = make_int3(
(dir.x >= 0) ? 1 : -1,
(dir.y >= 0) ? 1 : -1,
(dir.z >= 0) ? 1 : -1
);
float3 t_delta = make_float3(
fabsf(voxel_size / dir.x),
fabsf(voxel_size / dir.y),
fabsf(voxel_size / dir.z)
);
// Compute initial t_max for each axis
float3 t_max;
t_max.x = ((dir.x >= 0 ? (voxel.x + 1) : voxel.x) * voxel_size
- sensor_origin.x) / dir.x;
t_max.y = ((dir.y >= 0 ? (voxel.y + 1) : voxel.y) * voxel_size
- sensor_origin.y) / dir.y;
t_max.z = ((dir.z >= 0 ? (voxel.z + 1) : voxel.z) * voxel_size
- sensor_origin.z) / dir.z;
// March along the ray, marking cells as free
int max_steps = (int)(ray_length * inv_voxel) + 2;
for (int s = 0; s < max_steps; s++) {
// Check if we've reached the hit voxel
if (voxel.x == hit_voxel.x &&
voxel.y == hit_voxel.y &&
voxel.z == hit_voxel.z) {
break;
}
// Update this voxel as FREE
uint32_t h = spatial_hash(voxel, table_size);
// Insert-or-update with atomic operations
atomic_log_odds_update(hash_table, h, voxel, l_free, l_min, l_max,
timestamp, table_size);
// DDA step: advance to next voxel boundary
if (t_max.x < t_max.y) {
if (t_max.x < t_max.z) {
voxel.x += step.x;
t_max.x += t_delta.x;
} else {
voxel.z += step.z;
t_max.z += t_delta.z;
}
} else {
if (t_max.y < t_max.z) {
voxel.y += step.y;
t_max.y += t_delta.y;
} else {
voxel.z += step.z;
t_max.z += t_delta.z;
}
}
}
// Mark the hit voxel as OCCUPIED
uint32_t h = spatial_hash(hit_voxel, table_size);
atomic_log_odds_update(hash_table, h, hit_voxel, l_occ, l_min, l_max,
timestamp, table_size);
}
/**
* Atomic log-odds update with spatial hash collision handling.
* Uses compare-and-swap for thread-safe insertion.
*/
__device__ void atomic_log_odds_update(
VoxelEntry* table, uint32_t hash_start, int3 coord,
int16_t delta, int16_t l_min, int16_t l_max,
uint16_t timestamp, uint32_t table_size
) {
uint32_t h = hash_start;
for (int probe = 0; probe < 32; probe++) { // Max 32 probes
VoxelEntry* entry = &table[h];
// Try to claim empty slot
int old = atomicCAS((int*)&entry->coord.x, EMPTY_SENTINEL, coord.x);
if (old == EMPTY_SENTINEL || old == coord.x) {
// Either we claimed it or it's already ours
if (old == EMPTY_SENTINEL) {
entry->coord = coord; // Initialize
}
if (entry->coord.x == coord.x &&
entry->coord.y == coord.y &&
entry->coord.z == coord.z) {
// Atomic log-odds update. CUDA has no 16-bit atomicAdd, so
// emulate with a CAS loop (16-bit atomicCAS needs sm_70+,
// satisfied by Orin's sm_87). This also makes the clamp race-free.
unsigned short* lo = (unsigned short*)&entry->log_odds;
unsigned short seen = *lo, assumed;
do {
    assumed = seen;
    int16_t updated = (int16_t)assumed + delta;
    if (updated > l_max) updated = l_max;
    if (updated < l_min) updated = l_min;
    seen = atomicCAS(lo, assumed, (unsigned short)updated);
} while (seen != assumed);
entry->timestamp = timestamp;
return;
}
}
// Collision: linear probe
h = (h + 1) % table_size;
}
// Table full or too many collisions — drop this update
// (conservative for free-space marking; a dropped occupied update
// simply recurs on the next frame's observation)
}

3.4 Raycasting Performance Analysis
Theoretical throughput on Orin AGX (2048 CUDA cores, ~1.3 GHz):
Per ray: ~50-500 voxel updates (depends on range / voxel_size)
Average ray: ~100 voxel updates at 0.2m resolution, 20m average range
Per voxel update: ~20 cycles (hash + atomic + DDA step)
→ Per ray: ~2000 cycles → ~1.5 microseconds
Points per frame: 300K (4 LiDARs) to 900K (8 LiDARs)
At 2048 cores:
300K rays: 300K / 2048 cores ≈ 150 waves * 1.5us per wave ≈ 0.22ms theoretical
(Actual: 2-5ms due to memory bandwidth, hash collisions, warp divergence)
Memory bandwidth:
300K rays * 100 voxels * 8 bytes (read+write) = 240 MB / frame
Orin bandwidth: 204.8 GB/s → 240 MB / 204.8 GB/s ≈ 1.2ms
(Bandwidth-limited, not compute-limited)

Expected GPU raycasting performance on Orin AGX:
| Points/Frame | Resolution | Est. Latency | Bottleneck |
|---|---|---|---|
| 150K (2 LiDARs) | 0.2m | 1-3ms | Compute |
| 300K (4 LiDARs) | 0.2m | 3-6ms | Memory bandwidth |
| 600K (6 LiDARs) | 0.2m | 5-10ms | Memory bandwidth |
| 900K (8 LiDARs) | 0.2m | 8-15ms | Memory bandwidth |
| 300K (4 LiDARs) | 0.1m | 5-10ms | Both (2x more voxels per ray) |
| 300K (4 LiDARs) | 0.4m | 2-4ms | Compute |
Key insight: At 0.2m resolution with 4-8 LiDARs, GPU raycasting on Orin fits comfortably within the 100ms planning cycle, consuming 3-15ms of GPU time and leaving ample headroom for other tasks.
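The bandwidth-side estimate above can be reproduced directly:

```python
rays = 300_000                 # 4 LiDARs, one 100ms frame
voxels_per_ray = 100           # average at 0.2m resolution, ~20m range
bytes_per_update = 8           # read + write of the log-odds field
traffic_bytes = rays * voxels_per_ray * bytes_per_update
orin_bw_bytes_s = 204.8e9      # Orin AGX peak memory bandwidth
latency_ms = traffic_bytes / orin_bw_bytes_s * 1e3
# 240 MB of traffic -> ~1.17ms per frame on bandwidth alone
```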
3.5 Optimization Techniques
1. Ray subsampling: Not every ray needs to update every voxel. For long-range rays (>50m), subsample free-space updates to every Nth voxel. The hit point always updates at full resolution.
Subsampling strategy:
Range 0-20m: update every voxel (full raycasting)
Range 20-50m: update every 2nd voxel along ray
Range 50-100m: update every 4th voxel along ray
Range >100m: only update hit voxel + 3 nearest free voxels
Result: ~40-60% fewer voxel updates, <5% accuracy loss

2. Skip-empty optimization: If the hash table has no entry for a voxel and the voxel is along a free-space ray, skip the update entirely (the default state is "unknown" = 0 log-odds, which is fine for free-space confirmation along long rays). Only create hash entries near surfaces.
3. Batch coalesced access: Sort rays by direction (using a quick radix sort on the octant) so that rays in the same warp traverse nearby voxels, improving GPU cache utilization.
4. Reduced free-space range: Instead of raycasting from sensor origin, start free-space marking 2-3m in front of the sensor (ego vehicle body is always free). Saves ~10-15% of voxel updates.
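The subsampling schedule from technique 1 reduces to a small lookup; the `0` sentinel meaning "skip ray marching, update only the hit region" is a convention chosen here:

```python
def free_space_stride(range_m):
    """Free-space update stride for a ray of the given length.
    The hit voxel itself is always updated at full resolution."""
    if range_m <= 20.0:
        return 1       # every voxel
    if range_m <= 50.0:
        return 2       # every 2nd voxel
    if range_m <= 100.0:
        return 4       # every 4th voxel
    return 0           # hit voxel + a few nearest free voxels only
```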
4. Multi-LiDAR Occupancy Fusion
4.1 The reference airside AV stack Multi-LiDAR Configuration
reference airside vehicles use 4-8 RoboSense LiDARs (RSHELIOS, RSBP) mounted around the vehicle for 360-degree coverage. Each LiDAR runs at 10 Hz, producing 50K-150K points per scan.
Typical third-generation tug configuration (8 LiDARs):
Front-left RSHELIOS ────┐
Front-right RSHELIOS ────┤
Rear-left RSHELIOS ────┤ Merged in /lidar_merger node
Rear-right RSHELIOS ────┤ → single PointCloud2 topic
Left RSBP ────┤
Right RSBP ────┤
Front-center RSBP ────┤
Rear-center RSBP ────┘
Total points per cycle: 400K-1.2M
Total data rate: ~64-192 MB/s (xyzi, 16 bytes/point as float32)

4.2 Fusion Strategies
Strategy 1: Merge-then-raycast (current approach recommendation)
All LiDARs → transform to vehicle frame → merge into single cloud
→ single GPU raycasting pass
Pros:
- Single raycasting pass (efficient)
- Overlapping FOVs contribute evidence from different viewpoints
- Simple pipeline
Cons:
- Must wait for all LiDAR scans (synchronization)
- Ego-motion during scan period (100ms) causes smearing
- Single sensor origin approximation (all rays from vehicle center)
Mitigation:
- Use per-LiDAR sensor origin during raycasting (array of origins)
- Ego-motion compensate each point using IMU (already done in reference airside AV stack)

Strategy 2: Per-LiDAR grids with probabilistic fusion
Each LiDAR → separate occupancy grid → fuse log-odds
Fusion: l_fused(x) = l_1(x) + l_2(x) + ... + l_N(x)
(This works because log-odds are additive for independent observations!)
Pros:
- No synchronization needed (each LiDAR updates independently)
- Correct sensor origins for each LiDAR
- Can handle different LiDAR rates/resolutions
Cons:
- N separate raycasting passes (more GPU time)
- N separate hash tables (more GPU memory)
- Over-counting evidence in overlapping FOVs
Over-counting mitigation:
- Reduce per-LiDAR L_OCC and L_FREE by factor 1/sqrt(N_overlap)
- Or: use covariance intersection instead of naive addition

Strategy 3: Hybrid — merge close LiDARs, separate far ones
Front-left + Front-right → merge (overlapping FOV)
Rear-left + Rear-right → merge (overlapping FOV)
Left + Right → separate (non-overlapping)
Front-center → separate
Rear-center → separate
Result: 4-5 raycasting passes instead of 8
- Overlapping pairs are merged (correct, efficient)
- Non-overlapping LiDARs raycast independently (correct sensor origin)

Recommendation: Strategy 1 (merge-then-raycast) with per-LiDAR sensor origins. This is the simplest and most GPU-efficient option, and it is accurate enough given that ego-motion compensation is already in the pipeline.
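Whichever strategy is chosen, the log-odds additivity that makes Strategy 2 work is worth stating concretely, including the 1/sqrt(N) overlap down-weighting mentioned above (a sketch, not the production fusion node):

```python
import numpy as np

def fuse_log_odds(per_sensor_grids, overlap_count=None):
    """Fuse per-sensor log-odds grids. With a 0.5 prior (l_0 = 0), the
    Bayesian fusion of independent observations is plain addition;
    overlap_count (per-cell) optionally damps double-counted evidence
    by 1/sqrt(N)."""
    stacked = np.stack(per_sensor_grids).astype(np.float64)
    if overlap_count is not None:
        stacked = stacked / np.sqrt(overlap_count)
    return stacked.sum(axis=0)

a = np.array([0.85, -0.4, 0.0])
b = np.array([0.85, 0.0, -0.4])
fused = fuse_log_odds([a, b])                        # [1.7, -0.4, -0.4]
damped = fuse_log_odds([a, b], overlap_count=np.array([4.0, 1.0, 1.0]))
```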
4.3 Ego-Motion Compensation
During a 100ms scan cycle at 15 km/h, the vehicle moves ~42cm -- more than a 0.2m voxel. Without compensation, the map smears.
"""
Ego-motion compensation for multi-LiDAR occupancy.
Uses the existing GTSAM pose at each LiDAR timestamp.
"""
def compensate_ego_motion(points, timestamps, pose_buffer):
"""
Transform each point to the reference time (end of scan).
Args:
points: (N, 3) point cloud
timestamps: (N,) per-point timestamps (from LiDAR hardware)
pose_buffer: GTSAM pose history at 500 Hz (IMU rate)
Returns:
compensated_points: (N, 3) points in reference frame
"""
t_ref = timestamps[-1] # Reference time = end of scan
T_ref = pose_buffer.interpolate(t_ref) # SE(3) pose at reference time, as a 4x4 homogeneous matrix
T_ref_inv = np.linalg.inv(T_ref)
compensated = np.empty_like(points)
for i in range(len(points)):
T_i = pose_buffer.interpolate(timestamps[i]) # Pose when point was captured
T_delta = T_ref_inv @ T_i # Relative motion since capture
compensated[i] = (T_delta @ np.append(points[i], 1.0))[:3] # Homogeneous transform to reference frame
return compensated
# In practice: vectorized with batch interpolation + matrix multiply on GPU
# The GTSAM 500 Hz pose buffer provides sub-millimeter interpolation accuracy
Integration with existing reference airside AV stack: The GTSAM localization node already produces high-rate poses. The occupancy node subscribes to /gtsam/pose and uses it for ego-motion compensation. No new sensor fusion is needed.
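The vectorized variant mentioned in the code comment can be sketched as follows — assuming poses are supplied as batched 4x4 homogeneous matrices, with the batch-interpolation step itself omitted:

```python
import numpy as np

def compensate_ego_motion_batch(points, point_T, ref_T_inv):
    """Vectorized ego-motion compensation (sketch).

    Args:
        points:    (N, 3) raw points, each valid at its own capture time
        point_T:   (N, 4, 4) interpolated SE(3) pose per point timestamp
        ref_T_inv: (4, 4) inverse of the pose at the reference time
    Returns:
        (N, 3) points expressed in the reference-time frame.
    """
    # T_delta = T_ref^-1 @ T_i for every point, in one batched contraction
    T_delta = np.einsum('ij,njk->nik', ref_T_inv, point_T)                # (N, 4, 4)
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)   # (N, 4)
    out = np.einsum('nij,nj->ni', T_delta, homog)                         # (N, 4)
    return out[:, :3]
```

The same two contractions map onto a batched GEMM on the GPU, which is where the per-point loop version becomes a single kernel launch.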
4.4 Temporal Consistency
Occupancy grids accumulate evidence over time. But when should old observations be forgotten?
Problem scenarios:
1. Vehicle drives past a parked baggage cart.
→ Cart reflected in occupancy grid. Vehicle moves on.
→ 10 seconds later, cart drives away.
→ Occupancy grid still shows "occupied" (no new observations from that area).
→ Ghost obstacle!
2. Aircraft pushback: tail sweeps through 15m arc over 30 seconds.
→ Old tail positions remain "occupied" in grid.
→ Planner sees a solid wall where the tail used to be.
→ Unnecessary detour!
Solutions:
| Method | Approach | Pros | Cons |
|---|---|---|---|
| Log-odds decay | Multiply all log-odds by factor <1 each frame | Simple, uniform forgetting | Forgets everything equally |
| Timestamp-based | Ignore observations older than T seconds | Sharp cutoff, tunable | Sudden transitions |
| Recency-weighted | Weight observations by age: w = exp(-alpha * age) | Smooth forgetting | More complex update |
| Detection-gated | Only forget cells associated with detected dynamic objects | Preserves static map | Requires object detection |
Recommended approach for reference airside AV stack:
Two-grid architecture:
Static grid (persistent):
- Updated only by points classified as static (ground, buildings, parked aircraft)
- Very slow decay (hours to days)
- Provides the persistent map layer
Dynamic grid (short-memory):
- Updated by all points
- Fast decay: log-odds *= 0.9 every 200ms (half-life ~1.3 seconds)
- Captures moving objects, temporary obstacles
Merged grid (for planning):
- l_merged(x) = max(l_static(x), l_dynamic(x))
- Static map provides baseline, dynamic grid adds transient occupancy
This two-grid approach prevents ghost obstacles from dynamic objects while maintaining a stable map of static infrastructure.
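A minimal sketch of one update cycle of this two-grid scheme, assuming log-odds arrays and the 0.9-per-200ms dynamic decay from above (function and constant names are illustrative):

```python
import numpy as np

DYNAMIC_DECAY = 0.9  # per 200ms update → half-life ~1.3s, as above

def update_two_grid(static_grid, dynamic_grid, static_update, all_update):
    """One cycle of the two-grid scheme (sketch).

    static_update: log-odds increments from points classified as static
    all_update:    log-odds increments from all points
    """
    static_grid = static_grid + static_update          # decay omitted: hours-to-days half-life
    dynamic_grid = dynamic_grid * DYNAMIC_DECAY + all_update
    merged = np.maximum(static_grid, dynamic_grid)     # merged grid fed to planning
    return static_grid, dynamic_grid, merged
```

A moving object raises only the dynamic grid, so once it leaves, its evidence fades within a few half-lives instead of ghosting in the persistent map.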
5. Dynamic Object Handling
5.1 The Static-Dynamic Separation Problem
An airport apron contains:
- Static: Buildings, fixed infrastructure, parked aircraft (for minutes-hours)
- Quasi-static: Parked GSE (may move without warning), ground crew (stationary but will move)
- Dynamic: Moving GSE, walking personnel, taxiing aircraft, pushback operations
The occupancy grid should treat these differently:
| Category | Map behavior | Safety behavior |
|---|---|---|
| Static | Permanent, high confidence | Plan around with fixed margins |
| Quasi-static | Persistent until observed moving | Plan around, monitor for change |
| Dynamic | Short-memory, rapidly updating | Track, predict, avoid with velocity-dependent margins |
5.2 Moving Object Segmentation
Before updating the occupancy grid, separate static from dynamic points:
"""
Simple moving object segmentation using map consistency.
Points that are occupied in the existing static map but shouldn't be
(or vice versa) are classified as dynamic.
"""
def segment_dynamic_points(points, static_grid, detection_boxes):
"""
Classify points as static or dynamic.
Method 1: Detection-based (primary)
Points inside tracked object bounding boxes → dynamic
All other points → static
Method 2: Map-consistency (secondary)
Points in cells that were recently free → likely dynamic
Points in cells that were occupied and still occupied → static
Method 3: Multi-scan difference (tertiary)
Compare current scan to ego-compensated previous scan
Points with no corresponding point in previous scan → possibly dynamic
Returns:
static_mask: (N,) boolean — True if point is static
dynamic_mask: (N,) boolean — True if point is dynamic
"""
N = len(points)
static_mask = np.ones(N, dtype=bool)
dynamic_mask = np.zeros(N, dtype=bool)
# Method 1: Detection-based
for box in detection_boxes:
inside = points_in_box(points, box)
dynamic_mask[inside] = True
static_mask[inside] = False
# Method 2: Map-consistency for remaining points
for i in range(N):
if dynamic_mask[i]:
continue
cell = point_to_cell(points[i])
if static_grid[cell] < -2.0: # Cell was confidently free
dynamic_mask[i] = True # New point in free space → dynamic
static_mask[i] = False
return static_mask, dynamic_mask
5.3 Detection-Occupancy Pipeline Coordination
The occupancy grid and the object detector (PointPillars/CenterPoint) should coordinate:
Point cloud arrives
│
├── PointPillars detection (6.84ms, parallel on GPU)
│ → Bounding boxes for known objects
│
├── Occupancy grid raycasting (3-15ms, parallel on GPU)
│ → Full volumetric occupancy
│
└── Coordination step (1-2ms):
1. Points inside detection boxes → excluded from static grid update
2. Detection boxes → immediate costmap inflation (even before occupancy converges)
3. Occupied cells NOT inside any detection → unknown obstacle (FOD, novel object)
→ Higher cost in planner, alert operator
The "unknown occupied" category is critical for airside. A cell that is occupied but does not match any detected object class is either:
- FOD (foreign object debris) -- requires alerting and possible stop
- Novel GSE type not in training set -- requires caution
- Sensor artifact -- recency-weighted, will fade
This is one of the key advantages of maintaining both detection AND occupancy: the occupancy grid catches what the detector misses.
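A sketch of that occupancy-vs-detection cross-check, assuming confidently occupied cells are given as world xy coordinates and detections as axis-aligned boxes (names and shapes are illustrative):

```python
import numpy as np

def flag_unknown_occupied(occupied_cells, detection_boxes, margin=0.5):
    """Flag occupied cells not explained by any detection (sketch).

    Args:
        occupied_cells:  (M, 2) world xy of confidently occupied cells
        detection_boxes: iterable of (cx, cy, half_x, half_y) axis-aligned boxes
        margin:          slack around each box, meters
    Returns:
        (M,) bool mask — True for cells matching no detection
        (FOD candidate, novel object, or sensor artifact).
    """
    unexplained = np.ones(len(occupied_cells), dtype=bool)
    for cx, cy, hx, hy in detection_boxes:
        inside = ((np.abs(occupied_cells[:, 0] - cx) <= hx + margin)
                  & (np.abs(occupied_cells[:, 1] - cy) <= hy + margin))
        unexplained &= ~inside
    return unexplained
```

Cells surviving this filter feed the higher-cost "unknown obstacle" path in the planner and the operator alert.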
5.4 Recency-Weighted Occupancy for Dynamic Environments
"""
Recency-weighted occupancy: recent observations count more.
Implemented as exponential decay of log-odds toward zero.
"""
def decay_occupancy_grid(grid, timestamps, current_time, half_life_seconds):
"""
Decay all occupied cells toward unknown (log-odds = 0).
For cells that haven't been observed recently:
l(t) = l(t_obs) * exp(-lambda * (t - t_obs))
where lambda = ln(2) / half_life
Args:
grid: occupancy hash table
timestamps: per-voxel last-observation timestamps
current_time: current frame time
half_life_seconds: time for log-odds to decay by 50%
"""
lam = 0.693 / half_life_seconds # ln(2) / half_life
for voxel in grid.active_voxels():
age = current_time - voxel.timestamp
if age > 0:
decay = exp(-lam * age)
voxel.log_odds *= decay
# If decayed below threshold, remove from hash table
if abs(voxel.log_odds) < 0.1:
grid.remove(voxel)
Recommended decay parameters for airside:
| Object category | Half-life | Rationale |
|---|---|---|
| Ground surface | Never (static grid) | Ground doesn't move |
| Buildings/infrastructure | Never (static grid) | Static structure |
| Parked aircraft | 5 minutes | May pushback at any time |
| Parked GSE | 30 seconds | May drive away without warning |
| Moving objects | 1.0 seconds | ~3-5 frames persistence (eliminates flicker) |
| Jet blast zone markers | 30 seconds | Persist until engines confirmed off |
| Personnel | 0.5 seconds | Fast decay -- safety-critical, must track actively |
6. Multi-Resolution Grids
6.1 The Resolution Tradeoff
| Resolution | Detects | Memory (200m x 200m x 8m, int16) | Voxels/ray (20m) | Use Case |
|---|---|---|---|---|
| 0.05m | 5cm FOD, bolt heads | 2.56B voxels (5.1 GB) | 400 | Impractical at scale |
| 0.1m | 10cm FOD, small tools | 320M voxels (640 MB) | 200 | Close range only |
| 0.2m | Personnel limbs, barriers | 40M voxels (80 MB) | 100 | Primary resolution |
| 0.4m | Personnel body, GSE | 5M voxels (10 MB) | 50 | Medium range |
| 0.8m | Vehicles, aircraft parts | 625K voxels (1.25 MB) | 25 | Long range |
| 1.6m | Large obstacles only | 78K voxels (156 KB) | 12 | Very long range / overview |
No single resolution works everywhere. 0.1m catches FOD but is too expensive. 0.8m is efficient but misses personnel.
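The memory column can be reproduced with a few lines, assuming a dense grid with 2-byte int16 log-odds per voxel:

```python
def grid_budget(extent_xy_m=200.0, extent_z_m=8.0, resolution_m=0.2,
                bytes_per_voxel=2):
    """Voxel count and memory for a dense grid (int16 log-odds assumed)."""
    nx = ny = round(extent_xy_m / resolution_m)
    nz = round(extent_z_m / resolution_m)
    voxels = nx * ny * nz
    return voxels, voxels * bytes_per_voxel

voxels, nbytes = grid_budget(resolution_m=0.2)  # 1000 x 1000 x 40 → 40M voxels, 80 MB
```

At 0.05m the same extent needs 2.56B voxels (~5 GB), which is why fine resolution is confined to the near field (Section 6.2).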
6.2 Multi-Resolution Architecture
Multi-resolution occupancy grid:
Zone 1: NEAR (0-15m from vehicle)
Resolution: 0.1m
Grid size: 30m x 30m x 6m = 300 x 300 x 60 = 5.4M voxels
Memory: ~11 MB (int16)
Purpose: FOD detection, precise clearance for docking
Update rate: 10 Hz (every frame)
Zone 2: MEDIUM (15-50m)
Resolution: 0.4m
Grid size: 100m x 100m x 8m = 250 x 250 x 20 = 1.25M voxels
Memory: ~2.5 MB (int16)
Purpose: GSE/personnel detection, path planning
Update rate: 10 Hz
Zone 3: FAR (50-150m)
Resolution: 0.8m
Grid size: 300m x 300m x 8m = 375 x 375 x 10 = 1.4M voxels
Memory: ~2.8 MB (int16)
Purpose: Large obstacle awareness, route planning
Update rate: 5 Hz (every other frame)
Total voxels: ~8M
Total memory: ~16 MB
Total update: ~5-10ms GPU (well within budget)
6.3 Implementation Approaches
Approach 1: Separate grids (simplest)
Three separate spatial hash tables, one per resolution.
Each point is assigned to a zone based on range from sensor.
Points are raycasted into the appropriate grid.
Pro: Simple, each grid is independent
Con: Zone boundaries need overlap to avoid gaps
Points near boundaries update two grids
Approach 2: Hierarchical octree (OctoMap-style)
Single octree with adaptive resolution.
Near surfaces: subdivide to fine resolution.
Empty space: coarse nodes.
Pro: Single data structure, adaptive
Con: CPU-bound (as discussed in Section 2.3)
Approach 3: Cascaded dense grids (recommended for GPU)
Three dense GPU arrays, each centered on the vehicle.
Each frame:
1. Ego-shift all grids (circular buffer pointer offset)
2. Raycast into appropriate grid based on point range
3. Merge for costmap: near grid overrides medium overrides far
GPU implementation:
Grid 1 (near): cudaMalloc(5.4M * sizeof(int16_t)) → 10.8 MB
Grid 2 (medium): cudaMalloc(1.25M * sizeof(int16_t)) → 2.5 MB
Grid 3 (far): cudaMalloc(1.4M * sizeof(int16_t)) → 2.8 MB
Total: 16.1 MB GPU — trivial on 32 GB Orin
Pro: Dense arrays → perfect GPU memory coalescing
No hash table overhead
Circular buffer means no data copy on ego-shift
Con: Fixed grid extent (but 300m is enough for airport)
No fine detail beyond 15m
6.4 Circular Buffer for Ego-Centric Grids
As the vehicle moves, the grid must shift to stay centered. Copying the entire grid each frame is wasteful. A circular buffer avoids this:
/**
* Circular buffer occupancy grid.
* Grid wraps around in x and y. Moving the vehicle just changes the offset.
* Cells that wrap to the "back" are cleared (new unknown territory).
*/
class CircularOccupancyGrid {
int16_t* grid; // Dense 3D array on GPU
int3 dimensions; // (nx, ny, nz)
int3 offset; // Current circular offset
float voxel_size;
float3 origin; // World position of grid center
// Convert world position to array index (with wrapping)
__device__ int3 world_to_index(float3 world_pos) {
int3 grid_coord = make_int3(
(int)((world_pos.x - origin.x) / voxel_size) + dimensions.x / 2,
(int)((world_pos.y - origin.y) / voxel_size) + dimensions.y / 2,
(int)((world_pos.z - origin.z) / voxel_size)
);
// Wrap with circular offset; keep indices non-negative (C++ % can return negative values)
return make_int3(
((grid_coord.x + offset.x) % dimensions.x + dimensions.x) % dimensions.x,
((grid_coord.y + offset.y) % dimensions.y + dimensions.y) % dimensions.y,
grid_coord.z // z does not wrap (vehicle doesn't fly)
);
}
// Shift grid when vehicle moves (just update offset + clear new cells)
void shift(float3 delta_position) {
int3 delta_cells = make_int3(
(int)(delta_position.x / voxel_size),
(int)(delta_position.y / voxel_size),
0
);
if (delta_cells.x == 0 && delta_cells.y == 0) return;
offset.x = ((offset.x + delta_cells.x) % dimensions.x + dimensions.x) % dimensions.x;
offset.y = ((offset.y + delta_cells.y) % dimensions.y + dimensions.y) % dimensions.y;
origin.x += delta_cells.x * voxel_size;
origin.y += delta_cells.y * voxel_size;
// Clear newly exposed cells (GPU kernel: set to 0)
clear_new_cells_kernel<<<...>>>(grid, offset, delta_cells, dimensions);
}
};
Performance: Shifting the grid takes <0.1ms (only clearing new edge cells). This is vastly more efficient than rebuilding or copying the entire grid.
7. TSDF vs Occupancy vs ESDF
7.1 Three Distance-Based Representations
| Representation | Cell stores | Used for | Sign convention |
|---|---|---|---|
| Occupancy | P(occupied) or log-odds | Collision detection, costmap | +occupied, -free |
| TSDF | Truncated signed distance to nearest surface | Surface reconstruction, mesh extraction | +outside, -inside |
| ESDF | Euclidean distance to nearest occupied cell | Path planning (safe distance) | Non-negative in the unsigned form; signed variants go negative inside obstacles |
Surface
─────────|████████|─────────
TSDF: +d +d 0 -d -d (truncated beyond ±τ)
Occ: free free OCC occ occ (behind surface is unknown or occupied)
ESDF: 3m 2m 1m 0 0 (distance to nearest surface)
7.2 When to Use Each
Occupancy (log-odds):
- Best for: binary "can I drive here?" decisions
- Best for: Bayesian evidence accumulation from multiple observations
- Best for: simple costmap generation (threshold → binary → inflate)
- nvblox can produce this alongside TSDF
- Mathematical foundation: Thrun et al. (2005)
TSDF (Truncated Signed Distance Function):
- Best for: surface reconstruction and mesh extraction
- Best for: incremental mapping (TSDF fusion is weighted averaging)
- nvblox default representation
- Cannot distinguish "free space" from "unobserved space" without occupancy layer
- Truncation (typically ±3 voxels) means no information far from surfaces
ESDF (Euclidean Signed Distance Function):
- Best for: gradient-based path planning (CHOMP, TrajOpt)
- Provides exact distance-to-obstacle at every point
- Expensive to compute from scratch but can be incrementally updated from TSDF
- nvblox computes ESDF from TSDF automatically
- The gradient of ESDF gives the "push direction" away from obstacles
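To make the "push direction" concrete, here is a finite-difference sketch on a toy 2D ESDF (a real system would sample a precomputed gradient field rather than differencing on the fly):

```python
import numpy as np

def esdf_gradient(esdf, ix, iy, resolution):
    """Central-difference gradient of a 2D ESDF at cell (ix, iy) — a sketch.

    The gradient points away from the nearest obstacle: the "push direction"
    used by CHOMP/TrajOpt-style planners and CBF constraints.
    """
    gx = (esdf[ix + 1, iy] - esdf[ix - 1, iy]) / (2 * resolution)
    gy = (esdf[ix, iy + 1] - esdf[ix, iy - 1]) / (2 * resolution)
    return np.array([gx, gy])

# Toy ESDF: a wall at x = 0, so distance grows linearly with x (1m cells)
esdf = np.tile(np.arange(5.0)[:, None], (1, 5))
g = esdf_gradient(esdf, 2, 2, resolution=1.0)  # [1.0, 0.0]: push is straight +x
```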
7.3 Comparison for Planning
| Feature | Log-Odds Occupancy | TSDF | ESDF |
|---|---|---|---|
| Costmap generation | Direct threshold | Must convert to occupancy first | Direct (distance = cost) |
| Inflation layers | Must compute separately | Must convert | Built-in (ESDF = distance) |
| Speed-dependent margins | Manual per-speed inflation | Manual | Natural (threshold ESDF at speed-dependent distance) |
| Gradient for optimization | Not available | Surface only | Full gradient field |
| Memory per voxel | 2 bytes (int16 log-odds) | 4 bytes (float16 distance + weight) | 2 bytes (float16 distance) |
| Update complexity | O(rays * voxels_per_ray) | O(points) | O(changed_voxels * neighborhood) |
| reference airside AV stack Frenet planner | Good (binary costmap) | Overkill | Excellent but not needed |
7.4 Recommended Architecture
LiDAR points
│
├── GPU Raycasting → Log-Odds Occupancy Grid
│ │ (primary representation)
│ │
│ ├── Threshold → Binary 2D Costmap → Frenet Planner (current)
│ │
│ └── Optionally: ESDF computation → CBF safety filter (future)
│
└── nvblox (parallel) → TSDF → ESDF → mesh
│ (visualization, future neural planner input)
└── ROS 2 bridge or C++ API
Phase 1: Log-odds occupancy only → costmap → Frenet
Phase 2: Add ESDF for CBF integration (see planning/cbf-safety-critical.md)
Phase 3: Add nvblox for mesh/TSDF if cameras are added
For the Frenet planner, log-odds occupancy is sufficient and simplest. ESDF becomes valuable when integrating CBF safety filters or optimization-based planners, which use the distance field gradient for constraint enforcement.
8. Costmap Generation for Planning
8.1 From 3D Occupancy to 2D Costmap
The Frenet planner operates in 2D (lateral/longitudinal along a reference path). The 3D occupancy grid must be projected to a 2D costmap:
"""
Project 3D occupancy grid to 2D costmap with height filtering.
Critical for airside: must filter by vehicle height to allow passage under wings.
"""
def occupancy_3d_to_costmap_2d(
grid_3d, # 3D occupancy grid (nx, ny, nz)
vehicle_height=2.5, # third-generation tug height in meters
ground_clearance=0.3, # Minimum ground clearance
z_min=-0.5, # Below ground (ramps, depressions)
z_max=None, # Will be set to vehicle_height + margin
safety_margin=0.3 # Extra height margin above vehicle
):
"""
For each (x, y) column in the 3D grid, check if any voxel
between z_min and z_max (vehicle envelope) is occupied.
z_max = vehicle_height + safety_margin = 2.5 + 0.3 = 2.8m
This means:
- Wing at z=5m → NOT projected (above vehicle)
- Wing at z=2.5m → PROJECTED (interferes with vehicle)
- Ground obstacle at z=0.1m → PROJECTED
- Depressed drain at z=-0.3m → PROJECTED (wheel hazard)
"""
if z_max is None:
z_max = vehicle_height + safety_margin
nz_min = int((z_min - grid_3d.z_origin) / grid_3d.voxel_size)
nz_max = int((z_max - grid_3d.z_origin) / grid_3d.voxel_size)
costmap_2d = np.zeros((grid_3d.nx, grid_3d.ny), dtype=np.uint8)
for ix in range(grid_3d.nx):
for iy in range(grid_3d.ny):
# Check all voxels in the vehicle height envelope
max_occ = 0
for iz in range(nz_min, nz_max + 1):
log_odds = grid_3d.get(ix, iy, iz)
if log_odds > 0.5: # Likely occupied
max_occ = max(max_occ, log_odds)
if max_occ > 2.0:
costmap_2d[ix, iy] = 254 # Lethal (hard obstacle)
elif max_occ > 0.5:
costmap_2d[ix, iy] = 200 # High cost (likely obstacle)
# else: 0 (free)
return costmap_2d
# GPU version: one thread per (x, y) column → trivially parallel
# Expected: <0.5ms for 1000x1000 grid on Orin
8.2 Inflation Layers
Raw occupancy gives a binary obstacle map. The planner needs graduated cost zones around obstacles -- inflation layers:
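Computationally, graduated inflation is a thresholded distance transform. A minimal sketch follows — brute force over occupied cells for clarity; a production version would use scipy.ndimage.distance_transform_edt or a GPU sweep. The function name and defaults are assumptions chosen to match the apply_distance_transform helper called by inflate_costmap in Section 8.3:

```python
import numpy as np

def apply_distance_transform(binary_costmap, inflation_radius, resolution=0.2,
                             lethal=254, inscribed_radius=1.5):
    """Graduated inflation via a distance transform (sketch)."""
    occ = np.argwhere(binary_costmap >= lethal)        # (K, 2) occupied cell indices
    cost = np.zeros(binary_costmap.shape, dtype=np.uint8)
    if len(occ) == 0:
        return cost
    H, W = binary_costmap.shape
    yy, xx = np.mgrid[0:H, 0:W]
    cells = np.stack([yy.ravel(), xx.ravel()], axis=1)         # (H*W, 2)
    d2 = ((cells[:, None, :] - occ[None, :, :]) ** 2).sum(-1)  # squared cell distances
    dist = np.sqrt(d2.min(axis=1)).reshape(H, W) * resolution  # meters to nearest obstacle
    cost[dist <= inscribed_radius] = lethal - 1                # collision if center here
    band = (dist > inscribed_radius) & (dist <= inflation_radius)
    cost[band] = ((lethal - 1) * (1.0 - (dist[band] - inscribed_radius)
                                  / (inflation_radius - inscribed_radius))).astype(np.uint8)
    cost[binary_costmap >= lethal] = lethal                    # restore lethal cells
    return cost
```

The linear falloff here matches Layer 3 below; swapping in an exponential falloff is a one-line change.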
Obstacle inflation for airside:
Layer 1: LETHAL (0m from obstacle)
Cost: 254 (impassable)
Layer 2: INSCRIBED (vehicle inscribed radius, ~1.5m for third-generation tug)
Cost: 253 (collision guaranteed if center is here)
Layer 3: INFLATION (variable, speed-dependent)
Cost: decays linearly or exponentially from 253 to 0
At 5 km/h: 1.5m inflation radius
At 15 km/h: 3.0m inflation radius
At 25 km/h: 5.0m inflation radius
Layer 4: AIRCRAFT SPECIAL (near detected aircraft)
Additional 3m buffer around fuselage
Additional 50m behind engines (jet blast zone)
Additional 5m around intake (FOD ingestion zone)
Cost: proportional to proximity, capped at 253
8.3 Speed-Dependent Costmaps
At different speeds, the planner needs different safety margins. Rather than a single costmap, maintain multiple inflation configurations or dynamically recompute:
"""
Speed-dependent costmap inflation.
Called every planning cycle with current vehicle speed.
"""
def inflate_costmap(binary_costmap, vehicle_speed_mps, vehicle_radius=1.5):
"""
Args:
binary_costmap: 2D grid (0=free, 254=occupied)
vehicle_speed_mps: current speed in m/s
vehicle_radius: inscribed radius in meters
Returns:
inflated_costmap: 2D grid with graduated costs
"""
# Stopping distance (dry tarmac, μ=0.7): d = v² / (2 * μ * g)
stopping_distance = vehicle_speed_mps**2 / (2 * 0.7 * 9.81)
# Reaction distance (at 10 Hz planning: 1 frame = 100ms)
reaction_distance = vehicle_speed_mps * 0.1
# Total inflation radius
inflation_radius = vehicle_radius + stopping_distance + reaction_distance + 0.5
# 0.5m additional safety margin
# Example at different speeds:
# 5 km/h (1.4 m/s): 1.5 + 0.14 + 0.14 + 0.5 = 2.28m
# 15 km/h (4.2 m/s): 1.5 + 1.28 + 0.42 + 0.5 = 3.70m
# 25 km/h (6.9 m/s): 1.5 + 3.46 + 0.69 + 0.5 = 6.15m
inflated = apply_distance_transform(binary_costmap, inflation_radius)
return inflated
8.4 Height-Specific Costmap Layers
Aircraft present a unique challenge: the wing overhangs at 4-6m, the fuselage is at 2-5m, and the engines/gear are at 0-3m. A single height-filtered costmap cannot capture this complexity. The solution is multiple height-band costmaps:
Height band costmaps:
Band 0: GROUND (-0.5m to 0.3m)
Captures: FOD, curbs, drainage channels, ramp lips, chains
Use: FOD detection, surface traversability
Band 1: VEHICLE BODY (0.3m to 2.8m)
Captures: GSE, personnel, aircraft gear, engine nacelles, belt loaders
Use: PRIMARY planning costmap — this is what the Frenet planner uses
Band 2: OVERHEAD (2.8m to 6.0m)
Captures: Aircraft wings, jet bridges, building overhangs
Use: Route planning for tall vehicles (POD is taller than third-generation tug)
Band 3: TALL (6.0m to 12.0m)
Captures: Aircraft tail, control tower, light masts
Use: Not relevant for ground vehicles, but useful for map building
Primary costmap = Band 1
Vehicle-specific: each vehicle type uses its own height envelope
third-generation tug: 2.5m → Band 1 up to 2.8m (2.5 + 0.3 margin)
POD: 3.0m → Band 1 up to 3.3m
8.5 Integration with Frenet Planner
The current reference airside AV stack Frenet planner generates 420 candidate trajectories per cycle. Each candidate is evaluated against the costmap:
Current reference airside AV stack obstacle check:
Point obstacles (from RANSAC clusters) → check if trajectory passes near
Proposed occupancy-based check:
For each candidate trajectory (420 per cycle):
Sample N points along trajectory (N=20-50)
For each point:
Look up 2D costmap value
Accumulate cost
Total cost = sum of costmap values along trajectory
Select trajectory with lowest (cost + smoothness + reference_deviation)
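The sampling loop above vectorizes naturally — a sketch with array shapes as assumptions; the smoothness and reference-deviation terms are omitted:

```python
import numpy as np

def trajectory_costs(costmap, trajectories, resolution, origin_xy):
    """Vectorized costmap lookup along candidate trajectories (sketch).

    Args:
        costmap:      (H, W) uint8 costmap; row index = x cell, col index = y cell
        trajectories: (T, N, 2) world-frame xy samples, T candidates x N points
        resolution:   meters per cell
        origin_xy:    world xy of cell (0, 0)
    Returns:
        (T,) accumulated cost per candidate.
    """
    idx = np.floor((trajectories - np.asarray(origin_xy)) / resolution).astype(int)
    ix = np.clip(idx[..., 0], 0, costmap.shape[0] - 1)
    iy = np.clip(idx[..., 1], 0, costmap.shape[1] - 1)
    return costmap[ix, iy].astype(np.int64).sum(axis=1)
```

For 420 candidates x 50 samples this is a single 21,000-element gather, which is negligible next to the raycasting itself.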
Advantages of costmap-based over point-cluster-based:
1. Continuous cost field (no gaps between sparse clusters)
2. Implicit inflation (already baked into costmap)
3. Unknown-occupied cells naturally avoided
4. Fleet-shared obstacles included automatically
5. Speed-dependent margins applied uniformly
9. Fleet-Shared Occupancy
9.1 Why Fleet Sharing Matters for Airside
Airport aprons are large, open spaces with significant occlusion (aircraft fuselage blocks 60m of width). A single vehicle cannot observe the far side of a parked aircraft. But another vehicle on the other side can. Fleet-shared occupancy creates cooperative situational awareness.
Vehicle A Vehicle B
●═══► ─── cannot see ───█████████ ─── sees fine ─── ◄═══●
Aircraft
(60m long)
Without sharing: Vehicle A blind to activity behind aircraft
With sharing: Vehicle B's occupancy fills Vehicle A's blind spot
9.2 What to Share
Sharing the raw occupancy grid is too expensive. Instead, share incremental updates:
| Data type | Size per update | Update rate | Bandwidth |
|---|---|---|---|
| Full 3D grid | 16-640 MB | 1 Hz | 16-640 MB/s (infeasible) |
| Changed voxels only | 10-100 KB | 5 Hz | 50-500 KB/s |
| Compressed changed voxels | 2-20 KB | 5 Hz | 10-100 KB/s |
| Detected obstacle list | 1-5 KB | 10 Hz | 10-50 KB/s |
| BEV occupancy image (compressed) | 5-20 KB | 2 Hz | 10-40 KB/s |
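The BEV-image row can be sanity-checked directly — a sketch using zlib as a stand-in for PNG-style encoding (both exploit the long free-space runs):

```python
import zlib
import numpy as np

def compress_bev(costmap):
    """Compress a mostly-empty BEV costmap for fleet sharing (sketch)."""
    raw = costmap.tobytes()
    packed = zlib.compress(raw, level=6)
    return packed, len(raw), len(packed)

grid = np.zeros((500, 500), dtype=np.uint8)  # 0.4m cells over 200m x 200m
grid[100:110, 200:210] = 254                 # a lone obstacle
packed, raw_bytes, packed_bytes = compress_bev(grid)
# raw_bytes = 250,000; packed_bytes lands well under 5 KB for a sparse apron
```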
Recommended: BEV occupancy image at 2 Hz. Compress the 2D height-filtered costmap to a PNG/JPEG-like format. At 0.4m resolution covering 200m x 200m = 500 x 500 pixels:
Raw: 500 x 500 x 1 byte = 250 KB
PNG compressed (mostly empty): ~5-15 KB
At 2 Hz: 10-30 KB/s per vehicle
20-vehicle fleet: 200-600 KB/s total bandwidth
Airport 5G capacity: 1+ Gbps → negligible load
9.3 Coordinate Transforms
Each vehicle maintains an ego-centric occupancy grid. Sharing requires transformation to a common frame:
"""
Fleet occupancy fusion.
Each vehicle publishes its occupancy in a shared world frame.
"""
class FleetOccupancyFusion:
def __init__(self, world_frame="airport_map", local_grid_shape=(500, 500)):
self.world_frame = world_frame
self.vehicle_grids = {} # vehicle_id → (grid, timestamp, pose)
self.local_occupancy = np.zeros(local_grid_shape, dtype=np.uint8) # ego vehicle's own BEV grid
def receive_update(self, vehicle_id, bev_grid, vehicle_pose, timestamp):
"""
Receive a BEV occupancy update from another vehicle.
Args:
vehicle_id: unique identifier
bev_grid: 2D occupancy grid in vehicle's frame (500x500, uint8)
vehicle_pose: SE(2) pose in world frame (x, y, theta)
timestamp: observation time
"""
# Transform grid to world frame
world_grid = transform_grid(bev_grid, vehicle_pose,
self.world_resolution)
self.vehicle_grids[vehicle_id] = (world_grid, timestamp, vehicle_pose)
def get_fused_grid(self, ego_pose, current_time, max_age_seconds=2.0):
"""
Fuse all vehicle grids into ego-centric costmap.
Staleness: discard observations older than max_age_seconds.
Conflict: take maximum occupancy (conservative fusion).
"""
ego_grid = self.local_occupancy.copy()
for vid, (grid, ts, pose) in self.vehicle_grids.items():
age = current_time - ts
if age > max_age_seconds:
continue # Too stale, discard
# Staleness weight: recent observations count more
weight = max(0, 1.0 - age / max_age_seconds)
# Transform to ego frame
ego_frame_grid = transform_to_ego(grid, pose, ego_pose)
# Conservative fusion: max of local and remote
ego_grid = np.maximum(ego_grid,
(ego_frame_grid * weight).astype(np.uint8))
return ego_grid
9.4 Temporal Staleness and Conflict Resolution
Fleet-shared occupancy data is always stale by the network latency + processing time (typically 50-200ms on airport 5G).
Staleness handling:
| Age | Confidence | Action |
|---|---|---|
| 0-200ms | High | Trust fully, use in planning |
| 200-500ms | Medium | Use with reduced weight (0.5-0.8) |
| 500-1000ms | Low | Use for awareness only, not planning |
| >1000ms | Expired | Discard (revert to local-only) |
| Network failure | Unknown | Degrade to local-only (safe default) |
Conflict resolution when two vehicles disagree:
Vehicle A says cell (50, 30) is OCCUPIED (log-odds +3.0)
Vehicle B says cell (50, 30) is FREE (log-odds -2.0)
Resolution strategies:
1. CONSERVATIVE (recommended for safety): max(+3.0, -2.0) = +3.0 → occupied
→ Never ignores occupied evidence, even if outnumbered
2. MAJORITY: if more vehicles say free than occupied → free
→ Dangerous: one vehicle with direct line of sight is more reliable
3. RECENCY: most recent observation wins
→ Reasonable for dynamic objects, but wrong for static (building)
4. VIEWPOINT-WEIGHTED: weight by angle to cell
→ Vehicle with direct line of sight gets higher weight
→ Best accuracy but most complex
Recommendation: Conservative (max) fusion for occupied cells, recency-weighted for free cells. This ensures that a detected obstacle is never ignored, while allowing cells to be cleared when the observing vehicle confirms they are free.
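The recommended asymmetric rule can be sketched on log-odds grids — conservative max for occupied evidence, recency-weighted remote evidence for free cells (the function name and the linear age weight are assumptions):

```python
import numpy as np

def fuse_fleet_grids(local_l, remote_l, remote_age_s, max_age_s=2.0):
    """Asymmetric fusion of log-odds grids (sketch).

    Occupied evidence (l > 0): take the max — an obstacle is never ignored.
    Free evidence (l < 0): the remote grid is recency-weighted before it is
    allowed to clear cells.
    """
    w = max(0.0, 1.0 - remote_age_s / max_age_s)  # linear staleness weight
    remote = remote_l * w
    return np.where((local_l > 0) | (remote > 0),
                    np.maximum(local_l, remote),  # conservative for occupied
                    np.minimum(local_l, remote))  # stronger free evidence wins
```

A fully stale remote grid (age ≥ max_age_s) is weighted to zero, so the result degrades gracefully to the local-only map.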
9.5 ROS Message Design
# FleetOccupancyUpdate.msg (custom ROS message)
Header header # Timestamp and frame_id
string vehicle_id # Source vehicle identifier
geometry_msgs/Pose2D vehicle_pose # Vehicle pose in world frame
float32 grid_resolution # Meters per cell
uint32 grid_width # Number of cells in x
uint32 grid_height # Number of cells in y
float32 origin_x # World x of grid origin
float32 origin_y # World y of grid origin
uint8[] data # Compressed occupancy data (PNG or run-length encoded)
float32 compression_ratio # For debugging/monitoring
uint32 num_changed_cells # For monitoring update density
10. Airside-Specific Considerations
10.1 Airport Geometry Challenges
Airport aprons are fundamentally different from urban streets:
| Challenge | Urban road | Airport apron | Impact on occupancy grid |
|---|---|---|---|
| Open space | 10-30m wide roads | 200m+ wide aprons | Need large grid extents |
| Obstacle scale | Cars (4m) | Aircraft (30-80m) | Single object spans hundreds of voxels |
| Height variation | Buildings (uniform edge) | Wings, engines, gear (complex 3D) | Must maintain true 3D, not just BEV |
| Ground objects | Curbs, barriers | FOD (bolts, rags, panel fragments) | Need 0.1m resolution near vehicle |
| Invisible hazards | None | Jet blast, intake suction zone | Cannot be detected by occupancy alone |
| Dynamic scale | Cars at 0-60 km/h | Pushback at 1-3 km/h, aircraft taxi at 20-30 km/h | Wide range of dynamic speeds |
| Reflective surfaces | Glass buildings | Wet tarmac, aircraft skin | LiDAR multipath → phantom voxels |
| Ground texture | Asphalt (uniform) | Paint markings, manhole covers, drainage | Ground clutter at fine resolution |
10.2 Aircraft as Occupancy Objects
A single aircraft occupies a massive, complex 3D volume:
A320 dimensions in occupancy grid (at 0.2m resolution):
Length: 37.6m → 188 voxels
Wingspan: 35.8m → 179 voxels
Height: 11.8m → 59 voxels (but most volume is empty)
Actual occupied voxels (fuselage + wings + gear + engines): ~50,000-80,000
Bounding box voxels: 188 * 179 * 59 = 1,985,468
Fill ratio: ~3-4% (extremely sparse)
Aircraft occupancy zones:
WING ZONE (z = 4-6m):
└── Overhang: GSE can pass under if height < 4m
└── Tip clearance: 3m minimum lateral
FUSELAGE ZONE (z = 1-5m):
└── Solid obstacle, impassable
└── Doors may open (dynamic zone change)
ENGINE ZONE (z = 1-3m):
└── Physical obstacle + intake suction zone (5m forward)
└── Jet blast zone (50m+ behind) -- NOT visible in occupancy
GEAR ZONE (z = 0-2m):
└── Tight clearance, risk of FOD damage to aircraft
└── Nose gear capture zone for pushback (±10cm tolerance)
10.3 Jet Blast and Invisible Hazards
Occupancy grids detect physical objects via LiDAR returns. Jet blast is invisible to LiDAR. The occupancy grid must be augmented with virtual obstacles:
"""
Virtual occupancy zones for invisible hazards.
These are injected into the occupancy grid based on aircraft state,
not based on sensor observations.
"""
def inject_virtual_obstacles(occupancy_grid, aircraft_states):
"""
Add virtual occupied zones for hazards that LiDAR cannot detect.
Sources:
- ADS-B / MLAT: aircraft position and heading
- A-CDM: engine start time (EOBT)
- Thermal camera: jet exhaust boundary (if available)
"""
for aircraft in aircraft_states:
if aircraft.engines_running:
# Jet blast zone: cone behind engines
# Length depends on thrust setting:
# Idle: 30-50m, breakaway: 100m+
# Width: expands at ~15 degrees from engine centerline
blast_zone = compute_jet_blast_cone(
engine_positions=aircraft.engine_positions,
heading=aircraft.heading,
thrust_level=aircraft.estimated_thrust,
safety_factor=1.5
)
# Mark blast zone as LETHAL in occupancy grid
for voxel in blast_zone:
occupancy_grid.set_occupied(voxel, log_odds=L_MAX)
# Intake suction zone: 5m hemisphere in front of engines
for engine_pos in aircraft.engine_positions:
intake_zone = compute_hemisphere(
center=engine_pos,
direction=aircraft.forward_direction,
radius=5.0
)
for voxel in intake_zone:
occupancy_grid.set_occupied(voxel, log_odds=L_MAX)
10.4 FOD Detection at Ground Level
Foreign Object Debris (FOD) on airport surfaces is a critical safety concern. Objects as small as a bolt can cause $250K+ in aircraft engine damage. The occupancy grid can detect FOD if the near-field resolution is fine enough:
FOD detection requirements:
- Object size: >5cm (ICAO minimum reportable)
- Detection range: 0-20m ahead of vehicle
- Detection rate: >90% for objects >10cm
Occupancy grid for FOD:
- Near-field grid at 0.05m resolution: detects objects >5cm (2+ voxels)
- Near-field grid at 0.1m resolution: detects objects >10cm (2+ voxels)
Problem: at 0.05m, a 20m x 20m x 0.5m near-field grid is:
400 x 400 x 10 = 1.6M voxels → only 3.2 MB → feasible!
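Step 5 of the pipeline below — multi-frame persistence filtering — might look like this (class name is illustrative; voxel indices are any hashable tuples):

```python
from collections import defaultdict

class FODPersistenceFilter:
    """Report a ground-level voxel as a FOD candidate only after it has been
    occupied for min_frames consecutive frames (sketch).

    Single-frame returns (water splash, blowing debris) never reach the
    threshold and are dropped.
    """
    def __init__(self, min_frames=3):
        self.min_frames = min_frames
        self.counts = defaultdict(int)

    def update(self, occupied_voxels):
        """occupied_voxels: iterable of hashable voxel indices for this frame."""
        current = set(occupied_voxels)
        for v in list(self.counts):  # reset anything that vanished this frame
            if v not in current:
                del self.counts[v]
        for v in current:
            self.counts[v] += 1
        return {v for v, c in self.counts.items() if c >= self.min_frames}
```

At 10 Hz, min_frames=3 means a candidate must survive 300ms of observation before it is reported.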
Enhanced FOD detection pipeline:
1. Height filter: retain only points at z = -0.1m to 0.3m (near ground)
2. Ground plane subtraction: remove ground surface
3. Remaining points in near-field → update 0.05m grid
4. Clusters of occupied voxels at ground level = FOD candidates
5. Filter: must persist for 3+ frames (eliminates splash, leaves, etc.)
10.5 Reflective Surface Handling
Wet tarmac and polished aircraft skin cause LiDAR multipath reflections, creating phantom points below the ground plane or behind aircraft:
Mitigation strategies:
1. Ground plane filter: discard points below z = -0.3m (below ground surface)
→ Catches most wet-tarmac reflections (mirror image below ground)
2. Intensity filter: multipath reflections have lower intensity
→ Set minimum intensity threshold for occupancy updates
3. Multi-frame consistency: phantom reflections appear at different positions
each frame (specular reflection angle changes with vehicle motion)
→ Require 3+ consistent observations before marking occupied
4. Known reflector map: map locations of highly reflective surfaces
→ Reduce L_OCC in these areas (require more evidence)
10.6 Occupancy Grid Parameters for Airside
| Parameter | Urban road value | Airport airside value | Rationale |
|---|---|---|---|
| Grid extent (x, y) | 100m x 100m | 200m x 200m | Larger open spaces, aircraft approach distance |
| Grid extent (z) | -3m to 3m | -0.5m to 8m | Aircraft tail height |
| Near resolution | 0.1m | 0.05-0.1m | FOD detection |
| Medium resolution | 0.2m | 0.2m | Standard for GSE/personnel |
| Far resolution | 0.5m | 0.8m | Large obstacles at range |
| L_OCC (general) | 0.85 | 0.85 | Standard |
| L_OCC (near aircraft) | 0.85 | 1.2 | Conservative: assume occupied faster |
| L_FREE | -0.4 | -0.3 | Slightly more conservative freeing |
| L_MAX | 4.6 | 3.5 | Prevent over-confidence (allows dynamic updates) |
| Decay half-life (dynamic) | 2.0s | 1.0s | Faster decay for slow-moving airport vehicles |
| Decay half-life (static) | 30min | 5min | Aircraft may pushback |
| Update rate | 10 Hz | 10 Hz | Matching LiDAR rate |
| Costmap publish rate | 10 Hz | 10 Hz | Matching planning rate |
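As a minimal sketch of how these parameters combine in the per-cell update (constants taken from the airside column above; function names are ours, not the production kernel):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Airside log-odds parameters from the table above. Asymmetric strengths
// (|L_OCC| > |L_FREE|) build occupancy faster than it clears; clamping at
// L_MAX prevents over-confidence so cells stay responsive to change.
constexpr double kLogOcc  = 0.85;   // hit update (general)
constexpr double kLogFree = -0.3;   // miss update (ray passed through)
constexpr double kLogMax  = 3.5;
constexpr double kLogMin  = -3.5;

// One Bayesian update step in log-odds form: addition plus clamp.
inline double updateCell(double log_odds, bool hit) {
    return std::clamp(log_odds + (hit ? kLogOcc : kLogFree), kLogMin, kLogMax);
}

// Exponential decay toward the unknown prior (log-odds 0) with a
// half-life, as used for the dynamic grid (1.0 s half-life airside).
inline double decayCell(double log_odds, double dt_s, double halflife_s) {
    return log_odds * std::exp2(-dt_s / halflife_s);
}

// Recover occupancy probability from log-odds when needed.
inline double occupancyProb(double log_odds) {
    return 1.0 / (1.0 + std::exp(-log_odds));
}
```

Note that repeated hits saturate at L_MAX after roughly five observations, while one half-life of decay halves the cell's log-odds toward unknown.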
11. Implementation with ROS Noetic
11.1 ROS Node Architecture
ROS Node Graph for Occupancy Grid:
/lidar_merger (existing)
Publishes: /merged_points (sensor_msgs/PointCloud2)
Rate: 10 Hz
Content: All LiDARs merged, ego-motion compensated
/gtsam_localization (existing)
Publishes: /vehicle_pose (geometry_msgs/PoseStamped)
Rate: 100 Hz (IMU-driven)
Content: SE(3) pose in map frame
/object_detector (existing or new: PointPillars)
Publishes: /detections (custom msg or jsk_recognition_msgs/BoundingBoxArray)
Rate: 10 Hz
Content: 3D bounding boxes of detected objects
/occupancy_grid_node (NEW — this document)
Subscribes:
/merged_points (10 Hz)
/vehicle_pose (100 Hz, buffered)
/detections (10 Hz, optional)
/fleet_occupancy (2 Hz, from fleet manager)
Publishes:
/occupancy_3d (custom OccupancyGrid3D msg, 10 Hz)
/costmap (nav_msgs/OccupancyGrid, 10 Hz)
/costmap_inflated (nav_msgs/OccupancyGrid, 10 Hz)
/fleet_occupancy_out (FleetOccupancyUpdate, 2 Hz)
/fod_candidates (PointCloud2, 10 Hz)
/occupancy_viz (visualization_msgs/MarkerArray, 2 Hz)
Parameters:
~voxel_size_near: 0.1 # meters
~voxel_size_medium: 0.2
~voxel_size_far: 0.8
~near_range: 15.0 # meters
~far_range: 50.0
~max_range: 150.0
~l_occ: 0.85
~l_free: -0.3
~l_max: 3.5
~l_min: -3.5
~decay_dynamic_halflife: 1.0 # seconds
~vehicle_height: 2.5
~inflation_inscribed: 1.5
~publish_rate: 10.0
~fleet_publish_rate: 2.0
11.2 Node Implementation Skeleton
/**
* GPU Occupancy Grid Node for ROS Noetic.
* Implemented as a nodelet for zero-copy point cloud subscription.
*/
#include <ros/ros.h>
#include <nodelet/nodelet.h>
#include <sensor_msgs/PointCloud2.h>
#include <nav_msgs/OccupancyGrid.h>
#include <geometry_msgs/PoseStamped.h>
#include <tf2_ros/buffer.h>
#include <tf2_ros/transform_listener.h>
// CUDA headers
#include <cuda_runtime.h>
namespace airside_occupancy {
class OccupancyGridNodelet : public nodelet::Nodelet {
public:
void onInit() override {
ros::NodeHandle& nh = getNodeHandle();
ros::NodeHandle& pnh = getPrivateNodeHandle();
// Load parameters
pnh.param("voxel_size_near", voxel_near_, 0.1);
pnh.param("voxel_size_medium", voxel_medium_, 0.2);
pnh.param("voxel_size_far", voxel_far_, 0.8);
pnh.param("near_range", near_range_, 15.0);
pnh.param("far_range", far_range_, 50.0);
pnh.param("vehicle_height", vehicle_height_, 2.5);
// Initialize GPU grids
initGPUGrids();
// Subscribers (zero-copy via nodelet)
sub_points_ = nh.subscribe("/merged_points", 1,
&OccupancyGridNodelet::pointCloudCallback, this);
sub_pose_ = nh.subscribe("/vehicle_pose", 100,
&OccupancyGridNodelet::poseCallback, this);
sub_detections_ = nh.subscribe("/detections", 1,
&OccupancyGridNodelet::detectionsCallback, this);
// Publishers
pub_costmap_ = nh.advertise<nav_msgs::OccupancyGrid>(
"/costmap", 1);
pub_costmap_inflated_ = nh.advertise<nav_msgs::OccupancyGrid>(
"/costmap_inflated", 1);
pub_fleet_ = nh.advertise<FleetOccupancyUpdate>(
"/fleet_occupancy_out", 1);
// Timer for periodic publishing
timer_ = nh.createTimer(ros::Duration(1.0 / 10.0),
&OccupancyGridNodelet::publishCallback, this);
NODELET_INFO("Occupancy grid initialized: near=%.2fm, med=%.2fm, far=%.2fm",
voxel_near_, voxel_medium_, voxel_far_);
}
void pointCloudCallback(const sensor_msgs::PointCloud2::ConstPtr& msg) {
// 1. Extract points from PointCloud2 (zero-copy if possible)
// 2. Upload to GPU
// 3. Separate into near/medium/far by range
// 4. GPU raycasting into each resolution grid
// 5. Decay dynamic grid
// 6. Project to 2D costmap
// 7. Inflate costmap
auto start = ros::WallTime::now();
// Upload points to GPU
uploadPointsToGPU(msg);
// GPU raycasting (see Section 3.3)
launchRaycastKernel(gpu_points_, num_points_,
sensor_origins_, num_sensors_,
gpu_grid_near_, gpu_grid_medium_, gpu_grid_far_);
// Decay dynamic cells
launchDecayKernel(gpu_grid_dynamic_, current_time_,
decay_halflife_);
// 3D→2D projection (see Section 8.1)
launchProjectionKernel(gpu_grid_near_, gpu_grid_medium_, gpu_grid_far_,
gpu_costmap_2d_, vehicle_height_);
// Inflate (see Section 8.2)
launchInflationKernel(gpu_costmap_2d_, gpu_costmap_inflated_,
inscribed_radius_, current_speed_);
// Download costmap from GPU
downloadCostmap(gpu_costmap_inflated_, costmap_msg_);
auto elapsed = (ros::WallTime::now() - start).toSec() * 1000;
NODELET_DEBUG("Occupancy update: %.1f ms (%d points)",
elapsed, num_points_);
}
private:
// poseCallback, detectionsCallback, publishCallback, initGPUGrids,
// uploadPointsToGPU, downloadCostmap, and the kernel launch wrappers
// are declared here as well (bodies elided in this skeleton).
// GPU resources
float3* gpu_points_;
int16_t* gpu_grid_near_;
int16_t* gpu_grid_medium_;
int16_t* gpu_grid_far_;
int16_t* gpu_grid_dynamic_;
uint8_t* gpu_costmap_2d_;
uint8_t* gpu_costmap_inflated_;
float3* sensor_origins_;
int num_points_ = 0;
int num_sensors_ = 0;
double current_time_ = 0.0;
nav_msgs::OccupancyGrid costmap_msg_;
// ROS
ros::Subscriber sub_points_, sub_pose_, sub_detections_;
ros::Publisher pub_costmap_, pub_costmap_inflated_, pub_fleet_;
ros::Timer timer_;
// Parameters
double voxel_near_, voxel_medium_, voxel_far_;
double near_range_, far_range_;
double vehicle_height_;
double inscribed_radius_;
double decay_halflife_;
double current_speed_;
};
} // namespace airside_occupancy
#include <pluginlib/class_list_macros.h>
PLUGINLIB_EXPORT_CLASS(airside_occupancy::OccupancyGridNodelet,
nodelet::Nodelet)
11.3 Existing ROS Packages
| Package | Description | Suitability for reference airside AV stack |
|---|---|---|
| octomap_server | OctoMap-based 3D occupancy | Prototype only (CPU-bound, too slow for 4-8 LiDARs) |
| costmap_2d | ROS Navigation stack 2D costmap | Good for 2D costmap layer management, inflation |
| grid_map | ETH multi-layer grid map | Good for multi-layer 2D representation |
| nvblox_ros | NVIDIA nvblox ROS wrapper | ROS 2 only (needs bridge or C++ API) |
| voxblox | ETH TSDF/ESDF mapping | Mature, but CPU-based (slower than nvblox) |
| gpu_voxels | GPU voxel grid library | Outdated, not maintained |
| spatio_temporal_voxel_layer | Costmap layer with 3D voxels + decay | Excellent for costmap_2d integration |
Recommended integration path:
Phase 1 (prototype, 2-3 weeks):
octomap_server for 3D occupancy (CPU, single LiDAR only)
+ costmap_2d for 2D planning costmap
+ spatio_temporal_voxel_layer for decay
→ Validates the pipeline, identifies parameter tuning needs
Phase 2 (production, 6-10 weeks):
Custom GPU occupancy nodelet (CUDA raycasting, spatial hash)
+ costmap_2d for 2D costmap management
+ custom fleet sharing node
→ Full throughput, multi-LiDAR, meets 10 Hz at 4-8 LiDARs
11.4 TF Frame Management
TF tree for occupancy:
map (GTSAM output, global frame)
└── vehicle_base (ego vehicle frame)
├── lidar_front_left
├── lidar_front_right
├── lidar_rear_left
├── lidar_rear_right
├── lidar_left
├── lidar_right
├── lidar_front_center
└── lidar_rear_center
Occupancy grid operates in:
- map frame (for fleet sharing, persistent map)
- vehicle_base frame (for ego-centric costmap, planning)
Each LiDAR→vehicle_base transform is static (from calibration).
vehicle_base→map transform is dynamic (from GTSAM at 100 Hz).
11.5 spatio_temporal_voxel_layer for Quick Start
The spatio_temporal_voxel_layer ROS package is a drop-in costmap layer that provides 3D voxel tracking with built-in temporal decay -- closely matching the requirements described in this document:
# costmap_common_params.yaml
plugins:
- {name: spatio_temporal_voxel_layer, type: "spatio_temporal_voxel_layer/SpatioTemporalVoxelLayer"}
spatio_temporal_voxel_layer:
enabled: true
voxel_decay: 1.0 # seconds (maps to dynamic decay)
decay_model: 0 # 0=linear, 1=exponential
voxel_size: 0.2 # meters
track_unknown_space: true
mark_threshold: 0 # 0 = mark on first observation
update_footprint_enabled: true
combination_method: 1 # 1 = max (conservative)
origin_z: 0.0
z_resolution: 0.2
z_voxels: 15 # 15 * 0.2 = 3.0m height
observation_sources: merged_lidar
merged_lidar:
topic: /merged_points
sensor_frame: vehicle_base
observation_persistence: 0.0 # Use voxel_decay instead
expected_update_rate: 10.0
data_type: PointCloud2
min_obstacle_height: 0.1
max_obstacle_height: 2.8 # Vehicle height + margin
marking: true
clearing: true
min_z: 0.1
max_z: 2.8
Limitations of spatio_temporal_voxel_layer:
- CPU-based (OpenVDB backend) -- marginal at 4-8 LiDARs
- No multi-resolution support
- No fleet sharing
- No GPU raycasting
- Suitable for Phase 1 prototype, not production
12. Performance on Orin
12.1 GPU Memory Budget
Orin AGX 64GB: 64 GB unified memory (shared CPU+GPU)
Orin AGX 32GB: 32 GB unified memory
Occupancy grid GPU memory allocation:
Near grid (0.1m, 30x30x6m):
300 x 300 x 60 = 5.4M voxels x 2 bytes = 10.8 MB
Medium grid (0.2m, 100x100x8m):
500 x 500 x 40 = 10M voxels x 2 bytes = 20 MB
(Dense. For sparse hash: ~2M occupied x 8 bytes = 16 MB)
Far grid (0.8m, 300x300x8m):
375 x 375 x 10 = 1.4M voxels x 2 bytes = 2.8 MB
Dynamic grid (0.2m, 100x100x4m):
500 x 500 x 20 = 5M voxels x 2 bytes = 10 MB
2D costmap (0.2m, 200x200m):
1000 x 1000 x 1 byte = 1 MB
2D costmap inflated:
1000 x 1000 x 1 byte = 1 MB
Point cloud buffer (GPU):
1.2M points x 16 bytes (xyzI) = 19.2 MB
Sensor origins:
8 x 12 bytes = 96 bytes
TOTAL GPU MEMORY: ~65 MB
(Out of 32 GB = 0.2% — negligible)
12.2 CUDA Stream Management
Run occupancy grid operations on a dedicated CUDA stream to avoid blocking other GPU tasks:
// CUDA stream for occupancy (separate from PointPillars, localization)
cudaStream_t occupancy_stream;
cudaStreamCreate(&occupancy_stream);
// All occupancy kernels run on this stream
raycast_kernel<<<grid, block, 0, occupancy_stream>>>(...);
decay_kernel<<<grid, block, 0, occupancy_stream>>>(...);
projection_kernel<<<grid, block, 0, occupancy_stream>>>(...);
inflation_kernel<<<grid, block, 0, occupancy_stream>>>(...);
// Copy costmap to CPU for ROS publishing (enqueued on the same stream)
cudaMemcpyAsync(cpu_costmap, gpu_costmap, size,
cudaMemcpyDeviceToHost, occupancy_stream);
// Synchronize only when the CPU actually needs the result
cudaStreamSynchronize(occupancy_stream);
12.3 Pipeline Timing Breakdown
Per-frame occupancy pipeline on Orin AGX (4 LiDARs, ~300K points):
Step | GPU Time | Memory | Notes
------------------------------|----------|------------|--------
Point upload to GPU | 0.3ms | 4.8 MB | Unified memory (no PCIe hop on Orin)
Static/dynamic separation | 0.5ms | negligible | Per-point classification
Near grid raycasting (0.1m) | 2-4ms | 10.8 MB | ~50K near points
Medium grid raycasting (0.2m) | 2-5ms | 20 MB | ~200K medium points
Far grid raycasting (0.8m) | 0.5-1ms | 2.8 MB | ~50K far points
Dynamic grid decay | 0.3ms | 10 MB | Parallel over all voxels
3D→2D projection | 0.3ms | 1 MB | Per-column max
Costmap inflation | 0.5ms | 1 MB | Distance transform
Fleet grid merge | 0.5ms | 1 MB | Max over remote grids
Download to CPU | 0.2ms | 1 MB | Async copy
------------------------------|----------|------------|--------
TOTAL | 7-13ms | ~65 MB | Well within 100ms budget
For 8 LiDARs (~900K points):
Raycasting steps roughly 2x: total ~12-22ms
Still within 100ms budget
12.4 Concurrent Execution with Other Tasks
Orin GPU timeline (one 100ms planning cycle):
Time 0ms: PointPillars starts (6.84ms)
Time 0ms: Occupancy raycasting starts (parallel CUDA stream)
Time 7ms: PointPillars complete → detections available
Time 7ms: Static/dynamic separation uses detections
Time 10ms: Occupancy raycasting complete
Time 10ms: 3D→2D projection + inflation starts
Time 12ms: Costmap published → Frenet planner starts
Time 12ms: Occupancy visualization computed (background)
Time 22ms: Frenet planner completes (420 candidates evaluated)
Time 22ms: GTSAM localization update (parallel, 15-25ms)
...
Time 100ms: Next LiDAR scan arrives
GPU utilization: ~30-40% for occupancy + detection
Leaves ~60% for localization, future neural perception, etc.
12.5 Performance Scaling
| Configuration | Points/frame | Raycasting (ms) | Total pipeline (ms) | Feasible? |
|---|---|---|---|---|
| 2 LiDARs | 150K | 2-4 | 4-7 | Easy |
| 4 LiDARs | 300K | 4-8 | 7-13 | Comfortable |
| 6 LiDARs | 600K | 7-14 | 12-20 | Good |
| 8 LiDARs | 900K | 10-20 | 15-28 | Sufficient (within 100ms) |
| 8 LiDARs + fleet | 900K + merge | 10-20 | 17-30 | Sufficient |
| 8 LiDARs + fleet + FOD | 900K + merge + fine | 10-20 | 20-35 | Sufficient |
Even the worst case (8 LiDARs + fleet + FOD detection) uses only 35ms of the 100ms budget, leaving ample room for other tasks.
12.6 Orin Power Modes
| Orin Power Mode | GPU Clock | Occupancy Latency | Use Case |
|---|---|---|---|
| MAXN (60W) | 1.3 GHz | 7-28ms | Full operation |
| 50W | 1.1 GHz | 9-35ms | Normal driving |
| 30W | 0.9 GHz | 12-45ms | Idle/low-speed |
| 15W | 0.6 GHz | 20-70ms | Parked, minimal awareness |
Even at 30W power mode, the occupancy grid runs within the 100ms budget. At 15W mode (parked/charging), occupancy can still maintain 10 Hz awareness for safety monitoring.
13. Key Takeaways
Classical occupancy grids are the mandatory foundation layer. Neural occupancy prediction (FlashOcc, SparseOcc) provides semantic understanding, but the geometric raycasting-based occupancy grid provides the certifiable, deterministic, zero-training-data safety baseline that every other perception module builds upon. The two are complementary, not competing.
Log-odds Bayesian update is the correct formulation. Addition instead of multiplication, numerically stable, trivially parallelizable on GPU, and O(1) per cell. Use asymmetric update strengths (L_OCC > |L_FREE|) to prevent phantom obstacles from noise while rapidly detecting real obstacles.
OctoMap is insufficient for production multi-LiDAR operation. Its CPU-bound octree traversal cannot process 400K-1.2M points at 10 Hz. Use it for prototyping only. Production requires GPU spatial hashing or nvblox.
GPU raycasting on Orin processes 8 LiDARs in 10-20ms. The 3D-DDA algorithm with spatial hashing achieves 7-28ms total pipeline latency, using only ~65 MB of GPU memory (0.2% of 32 GB Orin). This is comfortably within the 100ms planning cycle budget.
Multi-resolution is essential for airside. 0.1m near (FOD detection), 0.2m medium (personnel/GSE), 0.8m far (aircraft/building awareness). Total memory: ~16 MB. Cascaded dense grids with circular buffers provide optimal GPU performance.
Two-grid architecture solves the static/dynamic problem. A persistent static grid (slow decay) maintains infrastructure, while a fast-decay dynamic grid (1s half-life) tracks moving objects. Merge with max() for conservative planning.
Height-filtered costmap projection is critical for airside. Aircraft wings overhang at 4-6m; the third-generation tug at 2.5m can pass under. The 3D-to-2D projection must use the vehicle's height envelope, not a simple max-height projection.
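A minimal sketch of this height-envelope projection; the grid layout, names, and thresholds are illustrative, not the production CUDA kernel:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative dense 3D grid: 1 = occupied, row-major over (x, y, z).
struct Grid3D {
    int nx, ny, nz;
    double z0, dz;                 // z of layer 0, vertical resolution
    std::vector<int8_t> occ;
    int8_t at(int x, int y, int z) const { return occ[(z * ny + y) * nx + x]; }
};

// Mark a 2D cell occupied only if an occupied voxel lies inside the
// vehicle's height envelope [0, vehicle_height]. Voxels above it
// (e.g. a wing overhang at 4-6m) do not block the column.
std::vector<int8_t> projectHeightFiltered(const Grid3D& g, double vehicle_height) {
    std::vector<int8_t> costmap(g.nx * g.ny, 0);
    for (int z = 0; z < g.nz; ++z) {
        double z_m = g.z0 + z * g.dz;
        if (z_m < 0.0 || z_m > vehicle_height) continue;  // outside envelope
        for (int y = 0; y < g.ny; ++y)
            for (int x = 0; x < g.nx; ++x)
                if (g.at(x, y, z)) costmap[y * g.nx + x] = 100;  // lethal
    }
    return costmap;
}
```

A naive per-column max over all z would mark the entire footprint under a parked aircraft's wings as lethal; the envelope filter keeps that space drivable for a 2.5m-tall vehicle.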
Fleet-shared occupancy overcomes aircraft occlusion. A 60m aircraft fuselage blocks line-of-sight for a single vehicle. Sharing compressed BEV costmaps at 2 Hz (5-15 KB per update) over airport 5G fills blind spots with negligible bandwidth cost.
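To see why a mostly-free apron costmap compresses to a few kilobytes, here is a naive run-length encoding sketch; the actual wire format is not specified in this document:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Naive RLE over costmap cells. Apron BEV costmaps are dominated by long
// runs of free space, so even this simple scheme shrinks a 1 MB grid to
// the order of kilobytes; a production encoder would also quantize and
// delta-encode against the previous frame.
std::vector<std::pair<int8_t, uint16_t>>
rleEncode(const std::vector<int8_t>& cells) {
    std::vector<std::pair<int8_t, uint16_t>> runs;
    for (int8_t v : cells) {
        if (!runs.empty() && runs.back().first == v && runs.back().second < 65535)
            ++runs.back().second;           // extend current run
        else
            runs.push_back({v, 1});         // start a new run
    }
    return runs;
}
```

At roughly 3 bytes per run, a 1000x1000 costmap with a few hundred obstacle cells encodes well within the 5-15 KB per update cited above.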
Virtual obstacles are required for invisible hazards. Jet blast, engine intake suction, and fuel spill zones are invisible to LiDAR. The occupancy grid must be augmented with state-derived virtual zones based on A-CDM aircraft status data.
FOD detection is feasible with a dedicated near-field fine grid. A 0.05m resolution grid covering 20m x 20m x 0.5m uses only 3.2 MB and can detect objects >5cm (2+ voxels). Multi-frame persistence filtering eliminates transient noise.
Reflective surfaces (wet tarmac, aircraft skin) cause phantom points. Mitigate with ground plane filtering, intensity thresholding, multi-frame consistency requirements, and known-reflector maps.
ESDF (Euclidean Signed Distance Field) is the optimal representation for future CBF integration. While log-odds occupancy is sufficient for the current Frenet planner, the ESDF gradient field is required for CBF-QP safety filters (see 30-autonomy-stack/planning/safety-critical-planning-cbf.md). nvblox computes ESDF from TSDF automatically.
The spatio_temporal_voxel_layer ROS package provides a viable Phase 1 prototype in 2-3 weeks, but its CPU-based OpenVDB backend will not sustain 10 Hz with 4-8 LiDARs in production.
Conservative fusion is the only safe fleet merge strategy. When two vehicles disagree on whether a cell is occupied, always assume occupied. A missed obstacle is catastrophic; a phantom obstacle merely causes a detour.
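A per-cell sketch of this conservative merge, assuming the nav_msgs/OccupancyGrid cost convention (0 free, 100 lethal, -1 unknown); treating a remote unknown cell as "no evidence" is our assumption here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Conservative fleet merge: per cell, the maximum cost reported by any
// vehicle wins, so a cell that any vehicle considers occupied stays
// occupied. Remote unknown (-1) cells are skipped as carrying no evidence.
void mergeConservative(std::vector<int8_t>& local,
                       const std::vector<int8_t>& remote) {
    for (std::size_t i = 0; i < local.size() && i < remote.size(); ++i) {
        if (remote[i] < 0) continue;               // unknown: no evidence
        local[i] = std::max(local[i], remote[i]);  // occupied wins
    }
}
```

Note that remote free evidence still fills local unknown cells (max(-1, 0) = 0), which is the useful occlusion-filling behavior, while disagreement over occupancy always resolves to occupied.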
Occupancy grid adds the most value to the existing reference airside AV stack. The current RANSAC-cluster-to-Frenet pipeline has no explicit spatial representation between raw points and planning. Adding an occupancy grid provides: continuous cost fields (not sparse clusters), implicit inflation, unknown-object detection, fleet sharing, and the spatial backbone for every future capability.
The occupancy grid is fully deterministic and inspectable. Unlike neural perception, every cell value can be traced to specific LiDAR observations. This is a significant advantage for ISO 3691-4 certification and incident investigation.
Temporal decay must be class-aware. Personnel cells decay in 0.5s (must track actively), parked GSE in 30s (may move), static infrastructure never decays. Detection-gated decay prevents inappropriate forgetting of different object types.
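Sketched with the half-lives quoted above; the enum and the infinite-half-life convention for never-decaying infrastructure are illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Class-aware decay half-lives from the takeaway above: personnel 0.5 s,
// parked GSE 30 s, static infrastructure never decays.
enum class ObstacleClass { Personnel, ParkedGSE, Infrastructure };

inline double decayHalflifeSec(ObstacleClass c) {
    switch (c) {
        case ObstacleClass::Personnel:      return 0.5;
        case ObstacleClass::ParkedGSE:      return 30.0;
        case ObstacleClass::Infrastructure:
            return std::numeric_limits<double>::infinity();
    }
    return 0.5;  // unreachable; silences compiler warnings
}

// Decay log-odds toward 0 (unknown); infinite half-life means no decay.
inline double decayByClass(double log_odds, double dt_s, ObstacleClass c) {
    double hl = decayHalflifeSec(c);
    if (std::isinf(hl)) return log_odds;
    return log_odds * std::exp2(-dt_s / hl);
}
```

Detection-gating then simply means the class label comes from the detector's box overlapping the cell, falling back to the most conservative (fastest-decaying) class when no detection is available.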
14. Implementation Roadmap
14.1 Phased Plan
| Phase | Duration | Deliverable | Cost | Dependency |
|---|---|---|---|---|
| Phase 1: Prototype | 2-3 weeks | octomap_server + spatio_temporal_voxel_layer + costmap_2d integration. Single LiDAR, CPU. Validates pipeline and parameter tuning | $5-8K | None |
| Phase 2: GPU Core | 6-8 weeks | Custom CUDA raycasting nodelet. Multi-resolution grids. Spatial hashing. 4-8 LiDARs at 10 Hz. 2D costmap output for Frenet | $15-22K | Phase 1 (parameters) |
| Phase 3: Dynamic Handling | 3-4 weeks | Static/dynamic separation. Detection-gated decay. Recency weighting. Two-grid architecture | $8-12K | Phase 2, PointPillars detector |
| Phase 4: Fleet Sharing | 3-4 weeks | Compressed BEV publishing. Fleet fusion node. Staleness handling. Conservative merge | $8-12K | Phase 2, fleet communication infrastructure |
| Phase 5: Airside Tuning | 2-3 weeks | Virtual obstacle injection (jet blast, intake). FOD detection fine grid. Reflective surface handling. Height-band costmaps | $5-8K | Phase 3, A-CDM integration |
14.2 Total Investment
| Item | Cost |
|---|---|
| Phase 1 (prototype) | $5-8K |
| Phase 2 (GPU core) | $15-22K |
| Phase 3 (dynamic) | $8-12K |
| Phase 4 (fleet) | $8-12K |
| Phase 5 (airside) | $5-8K |
| Total | $41-62K |
| Timeline | 16-22 weeks |
14.3 Minimum Viable Product
Phase 1 + Phase 2 alone ($20-30K, 8-11 weeks) provide the core occupancy grid:
- Multi-LiDAR GPU raycasting at 10 Hz
- Multi-resolution grids (0.1m / 0.2m / 0.8m)
- 2D costmap generation for Frenet planner
- Height-filtered projection for aircraft wing clearance
- Basic temporal decay
This MVP immediately improves planning by replacing sparse RANSAC clusters with continuous occupancy-based costmaps, detecting unknown obstacles (FOD, novel objects), and providing the spatial backbone for all future perception capabilities.
15. References
Foundational Papers
- Moravec, H.P. and Elfes, A. "High resolution maps from wide angle sonar." ICRA, 1985. (The original occupancy grid paper)
- Thrun, S., Burgard, W., and Fox, D. Probabilistic Robotics. MIT Press, 2005. (Chapter 9: Occupancy Grid Mapping)
- Hornung, A. et al. "OctoMap: An Efficient Probabilistic 3D Mapping Framework Based on Octrees." Autonomous Robots, 2013. (OctoMap)
- Amanatides, J. and Woo, A. "A Fast Voxel Traversal Algorithm for Ray Tracing." Eurographics, 1987. (3D-DDA algorithm)
Modern Implementations
- Vizzo, I. et al. "VDBFusion: Flexible and Efficient TSDF Integration of Range Sensor Data." IEEE RA-L, 2022. (VDBFusion)
- Millane, A. et al. "nvblox: GPU-Accelerated Incremental Signed Distance Field Mapping." ICRA, 2024. (nvblox)
- Oleynikova, H. et al. "Voxblox: Incremental 3D Euclidean Signed Distance Fields for On-Board MAV Planning." IROS, 2017. (voxblox, CPU predecessor to nvblox)
- Niessner, M. et al. "Real-time 3D Reconstruction at Scale using Voxel Hashing." ACM TOG, 2013. (Spatial hashing for TSDF)
GPU Techniques
- Laine, S. and Karras, T. "Efficient Sparse Voxel Octrees." IEEE TVCG, 2011. (GPU sparse octrees)
- Steinbruecker, F. et al. "Volumetric 3D Mapping in Real-Time on a CPU." ICRA, 2014. (Real-time volumetric mapping)
ROS Packages
- octomap: https://github.com/OctoMap/octomap (BSD license)
- octomap_server: https://github.com/OctoMap/octomap_mapping (BSD license)
- spatio_temporal_voxel_layer: https://github.com/SteveMacenski/spatio_temporal_voxel_layer (LGPL 2.1)
- costmap_2d: Part of ROS Navigation stack (BSD license)
- grid_map: https://github.com/ANYbotics/grid_map (BSD license)
- nvblox: https://github.com/nvidia-isaac/nvblox (Apache 2.0)
- voxblox: https://github.com/ethz-asl/voxblox (BSD license)
- VDBFusion: https://github.com/PRBonn/vdbfusion (MIT license)
Internal Repository References
| Topic | Document |
|---|---|
| Neural occupancy on Orin (FlashOcc, nvblox) | 30-autonomy-stack/world-models/occupancy-deployment-orin.md |
| Occupancy prediction world models | 30-autonomy-stack/world-models/occupancy-world-models.md |
| Occupancy flow and 4D scene understanding | 30-autonomy-stack/world-models/occupancy-flow-4d-scenes.md |
| LiDAR SLAM algorithms | 30-autonomy-stack/localization-mapping/overview/lidar-slam-algorithms.md |
| 3DGS for perception and mapping | 30-autonomy-stack/perception/overview/gaussian-splatting-driving.md |
| Sensor fusion architectures | 30-autonomy-stack/perception/overview/sensor-fusion-architectures.md |
| CBF safety-critical planning | 30-autonomy-stack/planning/safety-critical-planning-cbf.md |
| Runtime verification and monitoring | 60-safety-validation/runtime-assurance/runtime-verification-monitoring.md |
| Multi-LiDAR extrinsic calibration | 20-av-platform/sensors/multi-lidar-calibration.md |
| HD map standards for airside | 30-autonomy-stack/localization-mapping/maps/hd-map-standards-airside.md |
| Fleet cooperative perception | 30-autonomy-stack/perception/overview/collaborative-fleet-perception.md |
| Active perception and sensor scheduling | 30-autonomy-stack/perception/overview/active-perception-sensor-scheduling.md |
| Autonomous docking and precision positioning | 30-autonomy-stack/planning/autonomous-docking-precision-positioning.md |
| Night operations and thermal fusion | 30-autonomy-stack/perception/overview/night-operations-thermal-fusion.md |
| RoboSense LiDAR specifications | 20-av-platform/sensors/robosense-lidar.md |
| NVIDIA Orin technical specifications | 20-av-platform/compute/nvidia-orin-technical.md |
| TensorRT deployment guide | 20-av-platform/compute/tensorrt-deployment-guide.md |