
Closed-Loop Data Flywheel for Airside Autonomous Operations

Executive Summary

The data flywheel is the core engine that transforms operational driving data into continuous model improvement. This document covers the ML-centric closed loop for the reference airside AV stack's airside operations: trigger-based data mining, auto-labeling pipelines, active learning selection, model training orchestration, deployment validation, and production monitoring — the intelligence layer that sits on top of the fleet data pipeline infrastructure (see 50-cloud-fleet/data-platform/fleet-data-pipeline.md). Tesla's data engine consumes on the order of 160 PF-days of compute per training run on 10,000+ GPUs, training on billions of auto-labeled clips drawn from 8.3B+ fleet miles. Waymo's content search system mines petabytes of logs for specific scenarios. comma.ai's open fleet of 10,000+ devices enables rapid iteration, with openpilot shipping a release roughly every two weeks. For airport airside — where no public datasets exist and every frame has proprietary value — a well-designed flywheel is the difference between a static system and one that improves with every mile driven. This document provides the complete flywheel architecture scaled to the reference airside AV stack's current fleet (5-20 vehicles), with a path to 100+ vehicles across multiple airports.


Table of Contents

  1. The Data Flywheel Concept
  2. Trigger-Based Data Collection
  3. Auto-Labeling Pipeline
  4. Active Learning and Data Selection
  5. Model Training Orchestration
  6. Deployment Validation and A/B Testing
  7. Production Monitoring and Feedback
  8. Scenario Mining and Long-Tail Discovery
  9. Synthetic Data Augmentation
  10. Multi-Airport Transfer Learning
  11. Metrics and KPIs
  12. Cost Model and Scaling
  13. Implementation Roadmap
  14. Key Takeaways

1. The Data Flywheel Concept

1.1 What Makes a Flywheel, Not Just a Pipeline

A data pipeline moves data from vehicles to storage to training. A data flywheel creates a self-reinforcing cycle where each component's output improves the next:

┌─────────────────────────────────────────────────────────────────┐
│                     DATA FLYWHEEL                                │
│                                                                  │
│   ┌──────────┐    ┌───────────┐    ┌───────────┐               │
│   │ COLLECT  │───→│   MINE    │───→│   LABEL   │               │
│   │ (Fleet)  │    │ (Triggers)│    │ (Auto+QA) │               │
│   └────▲─────┘    └───────────┘    └─────┬─────┘               │
│        │                                  │                      │
│        │                                  ▼                      │
│   ┌────┴─────┐    ┌───────────┐    ┌───────────┐               │
│   │ DEPLOY   │←───│ VALIDATE  │←───│   TRAIN   │               │
│   │ (OTA)    │    │ (Shadow)  │    │ (GPU/TPU) │               │
│   └────┬─────┘    └───────────┘    └───────────┘               │
│        │                                                         │
│        ▼                                                         │
│   ┌──────────┐                                                  │
│   │ MONITOR  │──→ New triggers, failure cases, edge cases        │
│   │(Prod KPIs)│   feed back into COLLECT and MINE               │
│   └──────────┘                                                  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

The flywheel accelerates: more vehicles → more data → better models → fewer interventions → higher customer confidence → more deployments → more vehicles.

1.2 Industry Data Flywheel Benchmarks

| Company | Fleet Size | Data Volume | Training Compute | Release Cadence | Key Metric |
|---|---|---|---|---|---|
| Tesla | 6M+ vehicles | — | 10,000+ H100s (Cortex), ~160 PF-days/training run | Bi-weekly | Miles between interventions |
| Waymo | 2,500+ robotaxis | PB/day | 100,000+ TPUv4 | Continuous | Miles per contact |
| comma.ai | 10,000+ devices | TB/day (uploaded subset) | ~500 GPUs | Bi-weekly | % engaged miles |
| Cruise (pre-pause) | 400+ vehicles | 50 TB/day | ~5,000 GPUs | Monthly | Trips between issues |
| reference airside AV stack (current) | 5-20 vehicles | 200 GB-1 TB/day | 0 (no ML training) | N/A | N/A |
| reference airside AV stack (target) | 50-100 vehicles | 5-10 TB/day | 8-64 GPUs | Monthly | Missions per intervention |

1.3 Why Airside Demands a Flywheel

The airside environment has characteristics that make a data flywheel especially critical:

  1. No public datasets: Cannot rely on academic benchmarks. Must build everything from operational data
  2. Long-tail safety events: Near-miss with aircraft, FOD encounter, jet blast exposure — rare but must be captured and trained on
  3. Environment diversity: Each airport is different (layout, aircraft types, ground equipment, weather patterns)
  4. Regulatory evidence: Safety cases require demonstrating continuous improvement from operational data
  5. High stakes per error: $250K average aircraft damage from GSE collision, potential $139M+ for structural damage
  6. Seasonal variation: Snow, de-icing operations, heat shimmer, different lighting — a summer model may fail in winter

2. Trigger-Based Data Collection

2.1 Why Not Upload Everything?

With 4-8 RoboSense LiDARs + cameras, each reference airside vehicle generates 200-400 GB/day. Uploading everything is:

  • Expensive: At $0.09/GB S3 storage, 10 vehicles × 300GB/day × 365 days = $98K/year in storage alone
  • Wasteful: 95%+ of driving is routine (straight taxiway, empty apron) — low information value
  • Bandwidth-constrained: realistic airport 5G upload is ~100 Mbps → ~1.1 TB/day max per vehicle

Solution: intelligent trigger-based collection that uploads only high-value data.

2.2 Trigger Taxonomy

| Trigger Category | Trigger | Upload Priority | Data Window | Estimated Frequency |
|---|---|---|---|---|
| Safety | Operator intervention (e-stop, takeover) | Critical | -30s to +10s | 1-5/day |
| Safety | Minimum clearance violation (<3m to aircraft) | Critical | -15s to +5s | 0-2/day |
| Safety | Emergency stop triggered | Critical | -30s to +10s | 0-1/day |
| Safety | Speed limit violation | High | -10s to +5s | 0-5/day |
| Perception | Detection confidence drop (<0.5) | High | -5s to +5s | 5-20/day |
| Perception | Tracking ID switch or loss | High | -10s to +5s | 10-30/day |
| Perception | Novel object (no class match >0.3) | High | -5s to +10s | 2-10/day |
| Perception | Localization uncertainty spike | Medium | -5s to +5s | 5-15/day |
| Planning | Path deviation >1m from planned | Medium | -10s to +5s | 2-10/day |
| Planning | Unplanned stop (not at waypoint) | Medium | -5s to +10s | 5-20/day |
| Environment | Weather change (rain onset, fog) | Medium | -30s to +60s | 0-3/day |
| Environment | Night/dawn/dusk transition | Low | -60s to +60s | 2/day |
| Random | Time-based sampling (every 30 min) | Low | 30s window | 16-32/day |
| Random | Distance-based sampling (every 5 km) | Low | 30s window | 10-20/day |
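
A taxonomy like this can be carried onto the vehicle as declarative configuration, loaded by `load_trigger_config()` in the engine sketched in Section 2.3. A minimal sketch — the schema and field names here are assumptions, not a fixed format:

yaml
# triggers.yaml — illustrative schema; names and fields are assumptions
triggers:
  - name: operator_intervention
    category: safety
    topic: /vehicle/takeover
    condition: "msg.engaged == false"     # evaluated by trigger.evaluate()
    priority: critical
    pre_seconds: 30
    post_seconds: 10

  - name: detection_confidence_drop
    category: perception
    topic: /perception/objects
    condition: "min_confidence(msg) < 0.5"
    priority: high
    pre_seconds: 5
    post_seconds: 5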

2.3 On-Vehicle Trigger Engine

python
import rospy
from queue import PriorityQueue

class DataTriggerEngine:
    """On-vehicle trigger engine for intelligent data collection.

    Runs as a ROS node, monitors topics, triggers bag recording.
    RingBuffer, UploadPackage, and Priority are in-house types (not shown).
    """

    def __init__(self):
        self.triggers = self.load_trigger_config()
        self.ring_buffer = RingBuffer(duration_sec=60)  # always buffering
        self.upload_queue = PriorityQueue()
        self.daily_budget_gb = 50  # max upload per day
        self.daily_uploaded_gb = 0
        
    def monitor(self, msg, topic):
        """Called for every subscribed message."""
        # Always write to ring buffer
        self.ring_buffer.write(msg, topic)
        
        # Check triggers
        for trigger in self.triggers:
            if trigger.topic == topic and trigger.evaluate(msg):
                self.fire_trigger(trigger, msg)
    
    def fire_trigger(self, trigger, msg):
        """Extract data window and queue for upload."""
        # Extract from ring buffer
        window = self.ring_buffer.extract(
            start=rospy.Time.now() - rospy.Duration(trigger.pre_seconds),
            end=rospy.Time.now() + rospy.Duration(trigger.post_seconds)
        )
        
        # Estimate data size
        size_gb = window.estimate_size_gb()
        
        # Budget check
        if self.daily_uploaded_gb + size_gb > self.daily_budget_gb:
            if trigger.priority < Priority.CRITICAL:
                rospy.logwarn(f"Budget exceeded, skipping {trigger.name}")
                return
        
        # Create upload package
        package = UploadPackage(
            trigger_name=trigger.name,
            trigger_type=trigger.category,
            priority=trigger.priority,
            timestamp=rospy.Time.now(),
            vehicle_id=self.vehicle_id,
            airport=self.airport_id,
            data=window,
            metadata={
                'ego_pose': self.current_pose,
                'weather': self.weather_state,
                'nearby_objects': self.perception_state.get_objects(),
                'model_version': self.model_version,
            }
        )
        
        self.upload_queue.put((-trigger.priority.value, package))
        self.daily_uploaded_gb += size_gb

2.4 Upload Budget Optimization

With a 50 GB/day budget per vehicle (realistic for airport 5G):

| Priority | Daily Allocation | Avg Clip Size | Clips/Day | Coverage |
|---|---|---|---|---|
| Critical (safety) | 15 GB | 3 GB | 5 | 100% capture |
| High (perception) | 20 GB | 1 GB | 20 | ~60% capture |
| Medium (planning) | 10 GB | 0.5 GB | 20 | ~40% capture |
| Low (sampling) | 5 GB | 0.1 GB | 50 | Systematic |

Expected data yield per vehicle per month:

  • ~150 critical safety events (all captured)
  • ~400 perception edge cases (subset captured)
  • ~300 planning anomalies (subset captured)
  • ~1,500 random samples (systematic coverage)
  • Total: ~1.5 TB/month of high-value data per vehicle

3. Auto-Labeling Pipeline

3.1 Why Auto-Labeling Is Essential

Manual 3D LiDAR labeling costs $8-15 per frame (3D bounding boxes) or $15-25 per frame (occupancy). At the volumes needed for ML training:

| Dataset Size | Manual Cost | Auto-Label + QA Cost | Savings |
|---|---|---|---|
| 10,000 frames | $80K-150K | $15K-25K | 70-85% |
| 50,000 frames | $400K-750K | $50K-80K | 80-88% |
| 200,000 frames | $1.6M-3M | $120K-200K | 88-93% |

Auto-labeling produces initial annotations using ML models, which are then reviewed and corrected by human annotators — dramatically reducing per-frame cost.

3.2 Auto-Labeling Architecture

Raw Sensor Data (LiDAR + Camera + IMU + GPS)

┌───────────────────────────────────────────────┐
│            AUTO-LABELING PIPELINE              │
│                                                │
│  ┌──────────────┐   ┌──────────────────────┐  │
│  │ Multi-Frame   │   │ Foundation Model     │  │
│  │ Accumulation  │   │ (DINOv2/SAM/CLIP)   │  │
│  │ (10-20 frames)│   │ Image-level labels   │  │
│  └──────┬───────┘   └──────────┬───────────┘  │
│         ↓                       ↓              │
│  ┌──────────────┐   ┌──────────────────────┐  │
│  │ Offline 3D   │   │ 2D→3D Label Lifting  │  │
│  │ Detection     │   │ (Project 2D labels   │  │
│  │ (Larger model)│   │  to 3D points)       │  │
│  └──────┬───────┘   └──────────┬───────────┘  │
│         ↓                       ↓              │
│  ┌────────────────────────────────────────┐   │
│  │     Label Fusion & Consensus           │   │
│  │  - Multi-model agreement               │   │
│  │  - Temporal consistency check           │   │
│  │  - Confidence scoring                   │   │
│  └──────────────┬─────────────────────────┘   │
│                  ↓                              │
│  ┌────────────────────────────────────────┐   │
│  │     Quality Gate                        │   │
│  │  - High confidence → auto-accept        │   │
│  │  - Medium → human review                │   │
│  │  - Low → discard or flag                │   │
│  └──────────────┬─────────────────────────┘   │
│                  ↓                              │
│         Auto-Labeled Dataset                   │
└───────────────────────────────────────────────┘

3.3 Offline Multi-Frame Detection

Unlike online (real-time) detection, offline auto-labeling can use:

  • Multi-frame accumulation: Stack 10-20 LiDAR sweeps for dense point clouds
  • Larger models: No latency constraint — use 200M+ parameter detectors
  • Bi-directional temporal context: Future frames inform past detections
  • SLAM-refined poses: Better alignment than real-time odometry
python
class OfflineAutoLabeler:
    """Offline auto-labeling with multi-frame accumulation."""
    
    def __init__(self):
        self.detector = load_model('centerpoint_voxelnet_large')  # larger than real-time
        self.tracker = ABCTracker(max_age=30)
        self.foundation_model = DINOv2Backbone()
        self.slam_poses = None  # loaded from SLAM output
    
    def label_sequence(self, bag_path, slam_trajectory):
        """Auto-label an entire sequence."""
        self.slam_poses = slam_trajectory
        frames = self.load_frames(bag_path)
        
        # Forward pass: detect and track
        forward_tracks = self.forward_pass(frames)
        
        # Backward pass: detect and track (reversed)
        backward_tracks = self.backward_pass(frames)
        
        # Merge: bi-directional consensus
        merged_tracks = self.merge_bidirectional(forward_tracks, backward_tracks)
        
        # Smooth: temporal interpolation for missed detections
        smoothed = self.smooth_tracks(merged_tracks)
        
        # Multi-frame refinement: refine boxes using accumulated points
        refined = self.multiframe_refine(smoothed, frames)
        
        # Score confidence
        for track in refined:
            track.confidence = self.compute_confidence(track)
        
        return refined
    
    def multiframe_refine(self, tracks, frames):
        """Refine bounding boxes using accumulated point clouds."""
        for track in tracks:
            for det in track.detections:
                # Accumulate points from nearby frames
                accumulated = self.accumulate_points(
                    frames, det.timestamp, 
                    window=10,  # ±10 frames
                    box=det.box3d.expanded(1.5)  # search region
                )
                
                # Fit tight box to accumulated points
                refined_box = fit_oriented_bbox(accumulated)
                det.box3d = refined_box
                det.point_count = len(accumulated)
        
        return tracks

3.4 Foundation Model Labels

For semantic labels and novel object discovery:

python
class FoundationModelLabeler:
    """Use foundation models for semantic auto-labeling."""
    
    def __init__(self):
        self.sam = SAM2()           # Segment Anything 2
        self.clip = CLIP()          # Language-image matching
        self.dinov2 = DINOv2()      # Visual features
        
        # Airside vocabulary
        self.airside_classes = [
            "aircraft", "baggage cart", "tug vehicle", "belt loader",
            "fuel truck", "catering truck", "ground crew person",
            "safety cone", "jet bridge", "fire truck", "ambulance",
            "pushback tractor", "GPU (ground power unit)", "air starter",
            "lavatory truck", "water truck", "de-icing vehicle",
            "cargo loader", "passenger stairs", "foreign object debris"
        ]
    
    def label_image(self, image, lidar_points_2d):
        """Generate semantic labels from camera images."""
        # SAM2: generate masks
        masks = self.sam.generate_masks(image)
        
        # CLIP: classify each mask
        labels = []
        for mask in masks:
            # Crop mask region
            crop = image * mask.unsqueeze(-1)
            
            # CLIP zero-shot classification
            similarities = self.clip.similarity(crop, self.airside_classes)
            best_class = self.airside_classes[similarities.argmax()]
            confidence = similarities.max().item()
            
            if confidence > 0.25:  # threshold
                labels.append(SemanticLabel(
                    mask=mask,
                    class_name=best_class,
                    confidence=confidence
                ))
        
        # Lift to 3D: project 2D labels to LiDAR points
        for label in labels:
            points_in_mask = lidar_points_2d[label.mask[lidar_points_2d[:, 1], 
                                                          lidar_points_2d[:, 0]] > 0.5]
            label.points_3d = points_in_mask
        
        return labels

3.5 Quality Gate and Human Review

python
class QualityGate:
    """Route auto-labels to accept, review, or discard."""
    
    # Confidence thresholds (tuned per class)
    THRESHOLDS = {
        'aircraft':      {'auto_accept': 0.95, 'review': 0.7, 'discard': 0.3},
        'baggage_cart':  {'auto_accept': 0.90, 'review': 0.6, 'discard': 0.3},
        'ground_crew':   {'auto_accept': 0.85, 'review': 0.5, 'discard': 0.2},
        'fod':           {'auto_accept': 0.99, 'review': 0.8, 'discard': 0.5},
        # FOD: very high threshold — false negatives are dangerous
    }
    
    def route(self, auto_labels):
        """Route each label to appropriate quality tier."""
        accepted, review, discarded = [], [], []
        
        for label in auto_labels:
            thresholds = self.THRESHOLDS.get(
                label.class_name, 
                {'auto_accept': 0.90, 'review': 0.6, 'discard': 0.3}
            )
            
            if label.confidence >= thresholds['auto_accept']:
                # Also check temporal consistency
                if self.is_temporally_consistent(label):
                    accepted.append(label)
                else:
                    review.append(label)
            elif label.confidence >= thresholds['review']:
                review.append(label)
            elif label.confidence >= thresholds['discard']:
                review.append(label)  # borderline → human decides
            else:
                discarded.append(label)
        
        return {
            'accepted': accepted,      # ~60-70% of labels
            'needs_review': review,    # ~20-30% of labels
            'discarded': discarded     # ~5-10% of labels
        }

Expected auto-labeling throughput and cost:

| Metric | Manual Only | Auto-Label + QA |
|---|---|---|
| Frames/hour/annotator | 15-25 | 100-200 (review only) |
| Cost per frame (3D boxes) | $8-15 | $1.50-3.00 |
| Cost per frame (occupancy) | $15-25 | $3-6 |
| Quality (mAP vs ground truth) | 95%+ | 90-93% (auto) → 95%+ (after QA) |
| Turnaround (1,000 frames) | 5-7 days | 1-2 days |

4. Active Learning and Data Selection

4.1 The Core Problem

Not all data is equally valuable for training. A frame showing an empty taxiway contributes almost nothing to model improvement. A frame showing a partially occluded tug behind an aircraft wing is extremely valuable. Active learning selects the most informative samples for labeling.

4.2 Active Learning Strategies

| Strategy | Method | Best For | Compute Cost |
|---|---|---|---|
| Uncertainty sampling | Select frames where model is least confident | General improvement | Low |
| Committee disagreement | Select frames where ensemble members disagree | Finding blind spots | Medium |
| Gradient-based | Select frames with highest expected gradient norm | Maximum learning signal | High |
| Diversity sampling | Select frames that maximize feature space coverage | Avoiding redundancy | Medium |
| Error-driven | Select frames where model produces errors | Fixing known failures | Low |
| Hybrid | Combine uncertainty + diversity | Balance exploitation + exploration | Medium |

4.3 Airside Active Learning Pipeline

python
import numpy as np

class AirsideActiveLearner:
    """Active learning pipeline for airside perception."""
    
    def __init__(self, model, unlabeled_pool, budget_frames=1000):
        self.model = model
        self.unlabeled_pool = unlabeled_pool
        self.budget = budget_frames
        self.feature_bank = FeatureBank()  # for diversity
    
    def select_batch(self):
        """Select most informative frames for labeling."""
        scores = {}
        
        for frame_id, frame in self.unlabeled_pool.items():
            # 1. Uncertainty score (epistemic)
            predictions = []
            for _ in range(5):  # MC Dropout
                pred = self.model.predict(frame, dropout=True)
                predictions.append(pred)
            uncertainty = self.compute_uncertainty(predictions)
            
            # 2. Novelty score (distance from labeled data)
            features = self.model.extract_features(frame)
            novelty = self.feature_bank.novelty_score(features)
            
            # 3. Safety relevance score
            safety_score = self.safety_relevance(frame, predictions[0])
            
            # Composite score (safety-weighted)
            scores[frame_id] = (
                0.3 * uncertainty + 
                0.3 * novelty + 
                0.4 * safety_score  # safety events get priority
            )
        
        # Select top-k by score, with diversity filtering
        selected = self.diverse_topk(scores, k=self.budget)
        
        return selected
    
    def safety_relevance(self, frame, prediction):
        """Prioritize frames with safety-relevant content."""
        score = 0.0
        
        # Frames near aircraft score higher
        for obj in prediction.objects:
            if obj.class_name == 'aircraft':
                distance = np.linalg.norm(obj.position[:2])
                score += max(0, 1.0 - distance / 50.0)  # higher when closer
        
        # Frames with operator intervention score maximum
        if frame.metadata.get('operator_intervention'):
            score = 1.0
        
        # Frames with novel objects
        for obj in prediction.objects:
            if obj.confidence < 0.5:
                score += 0.3
        
        return min(score, 1.0)
    
    def compute_uncertainty(self, mc_predictions):
        """Compute epistemic uncertainty from MC Dropout predictions."""
        # Per-object: variance of position predictions
        position_vars = []
        for obj_track in self.match_across_predictions(mc_predictions):
            positions = np.array([p.position for p in obj_track])
            position_vars.append(positions.var(axis=0).sum())
        
        if not position_vars:
            return 0.0
        
        # Also: entropy of class distributions
        class_probs = np.mean([p.class_distribution for p in mc_predictions], axis=0)
        entropy = -np.sum(class_probs * np.log(class_probs + 1e-8))
        
        return np.mean(position_vars) + 0.1 * entropy
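
The `diverse_topk` step above supplies the exploration half of the hybrid strategy. One common realization is greedy farthest-point selection over a score-ranked shortlist — a minimal sketch, assuming `scores` and `features` are dicts keyed by frame ID:

python
import numpy as np

def diverse_topk(scores, features, k, pool_factor=3):
    """Greedy diverse top-k: shortlist by score, then repeatedly pick the
    candidate farthest (in feature space) from everything already selected."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    shortlist = ranked[:k * pool_factor]
    if not shortlist:
        return []

    selected = [shortlist[0]]  # seed with the top-scoring frame
    while len(selected) < min(k, len(shortlist)):
        best_id, best_dist = None, -1.0
        for fid in shortlist:
            if fid in selected:
                continue
            # Distance to the nearest already-selected frame
            d = min(np.linalg.norm(features[fid] - features[s]) for s in selected)
            if d > best_dist:
                best_id, best_dist = fid, d
        selected.append(best_id)
    return selected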

4.4 Active Learning Effectiveness

Research shows active learning achieves target performance with 20-50% fewer labeled frames:

| Scenario | Random Selection | Active Learning | Reduction |
|---|---|---|---|
| Detection mAP = 50 | 5,000 frames | 2,500 frames | 50% |
| Detection mAP = 60 | 15,000 frames | 8,000 frames | 47% |
| Detection mAP = 70 | 50,000 frames | 30,000 frames | 40% |
| Occupancy mIoU = 30 | 10,000 frames | 6,000 frames | 40% |

At $3/frame (auto-labeled + QA), saving 20,000 frames = $60K saved per training iteration.

4.5 Curriculum Learning for Airside

Beyond active learning, curriculum learning orders training data from easy to hard:

| Phase | Duration | Data Focus | Expected Outcome |
|---|---|---|---|
| 1. Easy | Epochs 1-10 | Clear weather, few objects, straight paths | Base feature learning |
| 2. Medium | Epochs 11-25 | Moderate traffic, curves, parked aircraft | Object recognition |
| 3. Hard | Epochs 26-40 | Dense traffic, weather, night, occlusion | Robustness |
| 4. Critical | Epochs 41-50 | Safety events, edge cases, rare scenarios | Long-tail coverage |
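
A curriculum like this reduces to an epoch-conditioned sampling weight per difficulty tag. A minimal sketch, assuming each frame carries a precomputed `difficulty` tag (the weight values are illustrative):

python
import random

# Sampling weight per difficulty tag, keyed by epoch range (curriculum phase)
CURRICULUM = {
    'easy':     {(1, 10): 1.0, (11, 25): 0.5, (26, 40): 0.2, (41, 50): 0.1},
    'medium':   {(1, 10): 0.2, (11, 25): 1.0, (26, 40): 0.5, (41, 50): 0.3},
    'hard':     {(1, 10): 0.0, (11, 25): 0.3, (26, 40): 1.0, (41, 50): 0.6},
    'critical': {(1, 10): 0.0, (11, 25): 0.1, (26, 40): 0.5, (41, 50): 1.0},
}

def phase_weight(difficulty, epoch):
    """Look up the sampling weight for a difficulty tag at a given epoch."""
    for (start, end), w in CURRICULUM[difficulty].items():
        if start <= epoch <= end:
            return w
    return 1.0

def sample_epoch(frames, epoch, n):
    """Draw n frames for this epoch, weighted by curriculum phase."""
    weights = [phase_weight(f['difficulty'], epoch) for f in frames]
    return random.choices(frames, weights=weights, k=n)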

5. Model Training Orchestration

5.1 Training Pipeline Architecture

┌─────────────────────────────────────────────────┐
│              TRAINING ORCHESTRATOR                │
│                                                   │
│  ┌─────────────┐   ┌──────────────┐             │
│  │ Data Loader  │   │ Experiment   │             │
│  │ (versioned)  │   │ Tracker      │             │
│  │ DVC + S3     │   │ (W&B/MLflow) │             │
│  └──────┬──────┘   └──────┬───────┘             │
│         │                  │                      │
│         ▼                  ▼                      │
│  ┌─────────────────────────────────────────┐     │
│  │         Training Job (GPU Cluster)       │     │
│  │                                          │     │
│  │  Pre-train (nuScenes/Waymo)             │     │
│  │       ↓                                  │     │
│  │  Fine-tune (airport data + LoRA)         │     │
│  │       ↓                                  │     │
│  │  Evaluate (held-out airport test set)    │     │
│  │       ↓                                  │     │
│  │  Export (TensorRT for Orin)              │     │
│  └──────────────┬──────────────────────────┘     │
│                  ↓                                │
│  ┌─────────────────────────────────────────┐     │
│  │      Model Registry                      │     │
│  │  - Version tagged                        │     │
│  │  - Metrics attached                      │     │
│  │  - Lineage tracked (data → model)        │     │
│  │  - Approval workflow                     │     │
│  └─────────────────────────────────────────┘     │
└─────────────────────────────────────────────────┘

5.2 Training Configuration

yaml
# training_config.yaml — PointPillars fine-tune for airside
experiment:
  name: "pointpillars_airside_v12"
  base_model: "pointpillars_nuscenes_pretrained"
  
data:
  train:
    - source: "s3://airside-data/airport-a/train/"
      version: "dvc://v2.3"
      frames: 45000
    - source: "s3://airside-data/airport-b/train/"
      version: "dvc://v1.1"  
      frames: 12000
  val:
    - source: "s3://airside-data/airport-a/val/"
      frames: 5000
  test:
    - source: "s3://airside-data/airport-a/test/"
      frames: 5000
      
model:
  backbone: "pillar_vfe"
  neck: "second_fpn"
  head: "center_head"
  classes:
    - aircraft
    - baggage_cart
    - tug_vehicle
    - belt_loader
    - fuel_truck
    - ground_crew
    - safety_cone
    - fod
  fine_tune:
    method: "lora"
    rank: 32
    alpha: 64
    target_modules: ["backbone", "neck"]  # freeze head initially
    
training:
  epochs: 50
  batch_size: 16
  lr: 0.001
  lr_schedule: "cosine_warmup"
  warmup_epochs: 5
  optimizer: "adamw"
  weight_decay: 0.01
  
  # Class weighting (safety-critical classes weighted higher)
  class_weights:
    aircraft: 5.0       # must never miss
    ground_crew: 3.0    # safety critical
    fod: 10.0           # highest priority
    baggage_cart: 1.0   # common, well-represented
    
export:
  format: "tensorrt"
  precision: "fp16"  # int8 for PointPillars backbone
  target: "orin_agx"
  calibration_dataset: "s3://airside-data/calibration/int8_500frames/"

5.3 Experiment Tracking

Every training run must be fully reproducible:

| Tracked Artifact | Tool | Purpose |
|---|---|---|
| Dataset version | DVC | Exact data used for training |
| Code version | Git commit SHA | Exact code used |
| Config | YAML in git | Hyperparameters |
| Model weights | Model registry | Deployable artifacts |
| Training metrics | W&B / MLflow | Loss curves, mAP progression |
| Evaluation results | W&B / MLflow | Per-class mAP, latency, failure cases |
| Data lineage | DVC + metadata | Which bags → which frames → which model |
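
In practice this table collapses into a handful of logging calls per run. A minimal sketch using the MLflow API (experiment name and tag keys are illustrative; the config layout follows the YAML above):

python
import subprocess
import mlflow

def track_run(config, model_path, metrics):
    """Log everything needed to reproduce a training run."""
    mlflow.set_experiment("pointpillars_airside")
    with mlflow.start_run(run_name=config['experiment']['name']):
        # Code version: the exact commit the run was launched from
        sha = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode().strip()
        mlflow.set_tag('git_sha', sha)
        # Dataset versions: the DVC tags pinned in the training config
        for i, split in enumerate(config['data']['train']):
            mlflow.set_tag(f"train_data_{i}",
                           f"{split['source']} @ {split.get('version', 'unversioned')}")
        # Hyperparameters and results
        mlflow.log_params(config['training'])   # lr, epochs, class weights, ...
        mlflow.log_metrics(metrics)             # e.g. {'mAP': 0.68, 'latency_p99_ms': 6.2}
        mlflow.log_artifact(model_path)         # exported TensorRT engine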

5.4 Continuous Training Pipeline

python
from datetime import datetime

class ContinuousTrainingPipeline:
    """Automated retraining when new data meets trigger criteria."""

    def __init__(self):
        self.data_registry = DataRegistry()
        self.model_registry = ModelRegistry()
        self.retrain_triggers = {
            'new_frames': 5000,         # retrain when 5K new frames available
            'performance_drop': 0.03,   # retrain when mAP drops 3%+
            'new_airport': True,        # retrain for every new airport
            'max_age_days': 30,         # retrain at least monthly
        }
    
    def check_retrain_needed(self):
        """Check if retraining should be triggered."""
        current_model = self.model_registry.get_production()
        
        # Count new frames since last training
        new_frames = self.data_registry.count_new_since(
            current_model.training_date
        )
        if new_frames >= self.retrain_triggers['new_frames']:
            return True, f"new_data: {new_frames} frames"
        
        # Check production performance
        prod_metrics = self.get_production_metrics(days=7)
        training_metrics = current_model.metrics
        if training_metrics['mAP'] - prod_metrics['mAP'] > self.retrain_triggers['performance_drop']:
            return True, f"perf_drop: {training_metrics['mAP']:.3f} → {prod_metrics['mAP']:.3f}"
        
        # Check model age
        age_days = (datetime.now() - current_model.training_date).days
        if age_days >= self.retrain_triggers['max_age_days']:
            return True, f"age: {age_days} days"
        
        return False, "no trigger"

6. Deployment Validation and A/B Testing

6.1 Shadow Mode Evaluation

Before deploying a new model to production, it runs in shadow mode alongside the current production model (see 60-safety-validation/verification-validation/shadow-mode.md for infrastructure details). Here we focus on the ML-specific validation criteria.

Shadow Mode Metrics Gate:

| Metric | Threshold | Measurement Period | Rationale |
|---|---|---|---|
| mAP (overall) | ≥ production model | 1 week shadow | Must not regress |
| mAP (aircraft) | ≥ production + 0% | 1 week | Safety-critical, never regress |
| mAP (ground crew) | ≥ production + 0% | 1 week | Safety-critical, never regress |
| mAP (FOD) | ≥ production + 0% | 1 week | Safety-critical, never regress |
| False positive rate | ≤ production + 5% | 1 week | Avoid phantom braking |
| Latency p99 | ≤ production + 2 ms | 1 week | Must fit timing budget |
| Operator interventions (shadow) | ≤ production | 2 weeks | Would-have-intervened analysis |
| Edge case coverage | > production | 1 week | Measured on curated hard set |
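
These gates are mechanical enough to automate in CI. A minimal sketch of the pass/fail check — metric names mirror the table, and both inputs are assumed to be dicts of averages over the shadow window:

python
def check_shadow_gates(candidate, production):
    """Return (passed, failures) for the shadow-mode metrics gate."""
    failures = []

    # Safety-critical classes must never regress, even slightly
    for cls in ('aircraft', 'ground_crew', 'fod'):
        if candidate[f'mAP_{cls}'] < production[f'mAP_{cls}']:
            failures.append(f'mAP_{cls} regressed')

    if candidate['mAP'] < production['mAP']:
        failures.append('overall mAP regressed')
    if candidate['false_positive_rate'] > production['false_positive_rate'] * 1.05:
        failures.append('false positive rate up >5%')
    if candidate['latency_p99_ms'] > production['latency_p99_ms'] + 2.0:
        failures.append('p99 latency up >2 ms')
    if candidate['shadow_interventions'] > production['interventions']:
        failures.append('would-have-intervened rate regressed')

    return len(failures) == 0, failures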

6.2 A/B Testing on Fleet

For a fleet of 10+ vehicles, split into control and treatment groups:

python
import numpy as np
from scipy.stats import ttest_ind

class FleetABTest:
    """A/B test new model across fleet subset.

    stratified_sample is an assumed in-house helper (not shown).
    """
    
    def __init__(self, fleet_size, treatment_fraction=0.2):
        self.treatment_vehicles = self.select_treatment(
            fleet_size, treatment_fraction
        )
        self.metrics = ABMetrics()
    
    def select_treatment(self, fleet_size, fraction):
        """Select vehicles for treatment group.
        
        Stratify by: airport, vehicle type, shift pattern
        to ensure fair comparison.
        """
        vehicles = self.get_fleet()
        # Stratified random selection
        treatment = stratified_sample(
            vehicles,
            strata=['airport', 'vehicle_type', 'shift'],
            fraction=fraction
        )
        return treatment
    
    def analyze(self, duration_days=14):
        """Analyze A/B test results."""
        control_metrics = self.metrics.get_group('control', duration_days)
        treatment_metrics = self.metrics.get_group('treatment', duration_days)
        
        results = {}
        for metric in ['mAP', 'interventions_per_km', 'false_positive_rate',
                        'latency_p99', 'mission_completion_rate']:
            control_val = control_metrics[metric]
            treatment_val = treatment_metrics[metric]
            
            # Statistical significance (two-sample t-test)
            t_stat, p_value = ttest_ind(control_val, treatment_val)
            
            results[metric] = {
                'control': np.mean(control_val),
                'treatment': np.mean(treatment_val),
                'delta': np.mean(treatment_val) - np.mean(control_val),
                'p_value': p_value,
                'significant': p_value < 0.05
            }
        
        return results

6.3 Rollback Criteria

Automatic rollback if any of these triggers fire within 48 hours of deployment:

| Trigger | Threshold | Response |
|---|---|---|
| Safety-critical miss | Any aircraft/crew miss with <20 m range | Immediate rollback |
| Intervention spike | 2x baseline intervention rate | Rollback within 1 hour |
| Latency breach | p99 > timing budget for >5 min | Rollback within 1 hour |
| Crash/exception | Any model crash | Immediate rollback |
| Operator complaint | 2+ operators report issues | Pause, investigate |
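
A sketch of the corresponding watchdog — the thresholds come straight from the table, while the `deployer` and `metrics` interfaces are assumptions:

python
import time

class RollbackWatchdog:
    """Poll production metrics for 48h after deployment and roll back
    automatically when any trigger from the table fires."""

    def __init__(self, deployer, metrics, baseline_interventions):
        self.deployer = deployer                # OTA deployment interface
        self.metrics = metrics                  # fleet metrics source
        self.baseline = baseline_interventions  # pre-deploy interventions/100km

    def watch(self, deployed_at, window_hours=48, poll_sec=60):
        while time.time() - deployed_at < window_hours * 3600:
            m = self.metrics.latest()
            if m['safety_critical_miss'] or m['model_crash']:
                self.deployer.rollback(reason='immediate trigger')
                return
            if m['interventions_per_100km'] > 2 * self.baseline:
                self.deployer.rollback(reason='intervention spike')
                return
            if m['latency_budget_breach_min'] > 5:
                self.deployer.rollback(reason='latency breach')
                return
            time.sleep(poll_sec)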

7. Production Monitoring and Feedback

7.1 Monitoring Dashboard

Real-time metrics that close the loop back to data collection:

┌────────────────────────────────────────────────────────┐
│  PERCEPTION MODEL HEALTH DASHBOARD                      │
│                                                          │
│  Model: pointpillars_airside_v12   Deployed: 2026-04-01│
│  Fleet: 12 vehicles, 3 airports                         │
│                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │ mAP (7-day)  │  │ Interventions│  │ Latency p99  │ │
│  │   68.3%      │  │   0.8/100km  │  │   6.2ms      │ │
│  │   ↑ 2.1%     │  │   ↓ 15%     │  │   stable     │ │
│  └──────────────┘  └──────────────┘  └──────────────┘ │
│                                                          │
│  Per-Class Performance:                                  │
│  aircraft:     94.2% ████████████████████ ✓             │
│  baggage_cart: 78.5% ████████████████   ✓               │
│  ground_crew:  71.3% ██████████████    ✓                │
│  tug_vehicle:  75.8% ███████████████   ✓                │
│  belt_loader:  65.2% █████████████     ⚠ (below target)│
│  fod:          45.1% █████████         ⚠ (needs data)  │
│                                                          │
│  Alerts:                                                 │
│  ⚠ belt_loader mAP dropped 3.2% at Airport C (new type)│
│  ⚠ FOD detection below 50% target — active learning     │
│    requesting 500 more labeled FOD frames                │
│                                                          │
└────────────────────────────────────────────────────────┘

7.2 Automated Feedback Signals

Production monitoring generates signals that feed back into the flywheel:

| Signal | Detection Method | Flywheel Action |
|---|---|---|
| Class mAP drop | Rolling 7-day eval vs baseline | Trigger retraining with class-weighted sampling |
| Novel object type | Confidence <0.3 on detected object | Upload clip, route to labeling, add to training |
| Domain shift | Feature distribution drift (KL divergence) | Alert, collect more data from affected conditions |
| Seasonal performance | mAP vs weather/time-of-day correlation | Trigger seasonal retraining with recent data |
| Airport-specific gap | Per-airport metrics diverge | Collect airport-specific data, LoRA adapter |
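
The domain-shift row can be computed from binned feature statistics. A minimal sketch comparing production features against the training-time reference (the 0.5 alert threshold is an assumption to tune on historical data):

python
import numpy as np

def kl_divergence(p_hist, q_hist, eps=1e-8):
    """KL(P || Q) between two histograms over the same bins."""
    p = p_hist / (p_hist.sum() + eps)
    q = q_hist / (q_hist.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def detect_domain_shift(train_features, prod_features, bins=50, threshold=0.5):
    """Flag feature dimensions whose production distribution drifted
    from the training distribution; inputs are (N, D) arrays."""
    drifted_dims = []
    for d in range(train_features.shape[1]):
        lo = min(train_features[:, d].min(), prod_features[:, d].min())
        hi = max(train_features[:, d].max(), prod_features[:, d].max())
        p, _ = np.histogram(train_features[:, d], bins=bins, range=(lo, hi))
        q, _ = np.histogram(prod_features[:, d], bins=bins, range=(lo, hi))
        kl = kl_divergence(q.astype(float), p.astype(float))  # KL(prod || train)
        if kl > threshold:
            drifted_dims.append((d, kl))
    return drifted_dims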

7.3 Failure Case Analysis

Every operator intervention triggers a failure analysis pipeline:

python
class FailureCaseAnalyzer:
    """Analyze perception failures from interventions."""
    
    def analyze_intervention(self, event):
        """Root-cause an operator intervention."""
        # Load sensor data around intervention
        data = self.load_event_data(event, window=(-30, 10))
        
        # Re-run perception with debug logging
        debug_output = self.model.predict_debug(data)
        
        # Classify failure mode
        failure_mode = self.classify_failure(event, debug_output)
        
        # Store for training
        self.failure_database.add(FailureCase(
            event_id=event.id,
            timestamp=event.timestamp,
            airport=event.airport,
            vehicle=event.vehicle_id,
            failure_mode=failure_mode,
            root_cause=self.estimate_root_cause(failure_mode, debug_output),
            sensor_data_path=data.path,
            model_version=self.model.version
        ))
        
        return failure_mode
    
    def classify_failure(self, event, debug_output):
        """Classify failure into actionable categories."""
        modes = {
            'false_negative': 'Object present but not detected',
            'false_positive': 'Detection with no real object',
            'misclassification': 'Detected but wrong class',
            'localization_error': 'Detected but position >1m off',
            'tracking_failure': 'ID switch or track loss',
            'latency': 'Detection too late for planning',
            'sensor_failure': 'Sensor data quality issue',
            'ood_input': 'Input outside training distribution',
        }
        # Logic to classify based on ground truth reconstruction
        ...

8. Scenario Mining and Long-Tail Discovery

8.1 The Long-Tail Problem

Autonomous driving follows a power law: 95% of scenarios are routine, but the remaining 5% contains 95% of the safety-critical situations. For airside operations:

| Scenario Category | Frequency | Difficulty | Safety Impact |
|---|---|---|---|
| Empty taxiway driving | 40% | Low | Low |
| Single aircraft at gate | 25% | Low | Medium |
| Multiple GSE at gate | 15% | Medium | Medium |
| Dense turnaround traffic | 10% | High | High |
| Unusual equipment | 5% | High | High |
| Weather degradation | 3% | High | High |
| FOD on surface | 1% | Very high | Critical |
| Near-miss / emergency | 0.5% | Very high | Critical |
| Novel situation | 0.5% | Very high | Critical |

8.2 Scenario Mining Pipeline

python
import numpy as np
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3

class ScenarioMiner:
    """Mine fleet data for specific scenario types."""

    def __init__(self):
        self.embedding_model = SceneEmbedder()  # encodes scenes to vectors
        self.scenario_library = ScenarioLibrary()
        
    def mine_scenario(self, query, fleet_data):
        """Find clips matching a scenario description.
        
        Examples:
            "baggage cart crossing vehicle path within 10m"
            "aircraft pushback while vehicle is near gate"
            "FOD-like object on taxiway surface"
            "rain onset during active operation"
        """
        results = []
        
        for clip in fleet_data:
            # Structured query matching (exact metadata hit scores 1.0)
            if self.matches_structured_query(query, clip):
                results.append((clip, 1.0))
                continue
            
            # Embedding similarity (for fuzzy matching)
            clip_embedding = self.embedding_model.encode(clip)
            query_embedding = self.embedding_model.encode_text(query)
            similarity = cosine_similarity(clip_embedding, query_embedding)
            
            if similarity > 0.7:
                results.append((clip, similarity))
        
        return sorted(results, key=lambda x: x[1], reverse=True)
    
    def discover_novel_scenarios(self, fleet_data, known_scenarios):
        """Discover scenarios not in the known library."""
        # Embed all clips (as an array so the boolean masks below work)
        embeddings = np.array([self.embedding_model.encode(clip) for clip in fleet_data])
        
        # Cluster
        clusters = HDBSCAN(min_cluster_size=5).fit(embeddings)
        
        # Find clusters far from known scenarios
        known_embeddings = [self.embedding_model.encode(s) for s in known_scenarios]
        
        novel_clusters = []
        for cluster_id in set(clusters.labels_):
            if cluster_id == -1:
                continue  # noise
            cluster_center = np.mean(embeddings[clusters.labels_ == cluster_id], axis=0)
            min_distance = min(cosine_distance(cluster_center, k) for k in known_embeddings)
            if min_distance > 0.5:
                novel_clusters.append((cluster_id, min_distance))
        
        return novel_clusters

8.3 Scenario Balancing for Training

Training data should overrepresent rare but important scenarios:

| Scenario | Real Frequency | Training Frequency | Oversampling Factor |
|---|---|---|---|
| Routine driving | 40% | 15% | 0.4x |
| Single aircraft | 25% | 20% | 0.8x |
| Multiple GSE | 15% | 20% | 1.3x |
| Dense traffic | 10% | 20% | 2x |
| Unusual equipment | 5% | 10% | 2x |
| Weather | 3% | 8% | 2.7x |
| FOD | 1% | 4% | 4x |
| Near-miss | 0.5% | 2% | 4x |
| Novel | 0.5% | 1% | 2x |
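
With per-frame scenario tags, this rebalancing is one sampler object in most training frameworks. A sketch using PyTorch's `WeightedRandomSampler`, with weights taken from the oversampling factors above (the tag names are illustrative):

python
import torch
from torch.utils.data import WeightedRandomSampler

# Target training frequency / real frequency, from the table above
OVERSAMPLE = {
    'routine': 0.4, 'single_aircraft': 0.8, 'multiple_gse': 1.3,
    'dense_traffic': 2.0, 'unusual_equipment': 2.0, 'weather': 2.7,
    'fod': 4.0, 'near_miss': 4.0, 'novel': 2.0,
}

def make_sampler(dataset):
    """Build a sampler that draws rare scenarios 2-4x more often."""
    weights = torch.tensor(
        [OVERSAMPLE[frame['scenario']] for frame in dataset], dtype=torch.double
    )
    return WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)

Passing the returned sampler to a `DataLoader` reproduces the target training frequencies without duplicating any data on disk.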

9. Synthetic Data Augmentation

9.1 Filling Gaps with Synthetic Data

For scenarios too rare or too dangerous to collect naturally (FOD, near-misses, extreme weather), synthetic data fills the gap (see 50-cloud-fleet/data-platform/synthetic-data-generation.md for tools).

Synthetic data integration in the flywheel:

Real Data (Fleet) ──┐
                     ├──→ Training Data Mixer ──→ Training
Synthetic Data ─────┘     (ratio: 70-80% real, 20-30% synthetic)


  Gap Analysis ←── Active Learning identifies gaps
                   that synthetic data can fill
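
A minimal sketch of the mixer, assuming both pools are frame lists and the synthetic share is configured per training run:

python
import random

def mix_training_data(real_frames, synthetic_frames, synthetic_fraction=0.25):
    """Combine real and synthetic frames at a fixed ratio (default 75/25).
    Real data stays fully represented; synthetic is sampled to the target share."""
    n_synth = int(len(real_frames) * synthetic_fraction / (1 - synthetic_fraction))
    sampled_synth = random.sample(synthetic_frames, min(n_synth, len(synthetic_frames)))
    mixed = real_frames + sampled_synth
    random.shuffle(mixed)
    return mixed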

9.2 Synthetic Data Budget

| Gap Type | Synthetic Method | Volume | Cost |
|---|---|---|---|
| FOD variations | 3DGS insertion into real scenes | 5,000 frames | $2K compute |
| Night driving | Neural style transfer | 10,000 frames | $3K compute |
| Rain/fog | Weather augmentation on real data | 10,000 frames | $2K compute |
| Novel aircraft types | 3D model insertion | 2,000 frames | $5K (3D models) |
| Near-miss scenarios | Trajectory perturbation | 3,000 frames | $1K compute |
| New airport layout | Digital twin generation | 5,000 frames | $10K (mapping + gen) |
| Total | — | 35,000 frames | ~$23K |

9.3 Domain Randomization for Robustness

python
import random

class AirsideDomainRandomizer:
    """Apply domain randomization to increase robustness."""
    
    def randomize(self, scene):
        """Apply random augmentations to training data."""
        augmentations = []
        
        # Lighting variations (time of day, clouds)
        if random.random() < 0.3:
            scene = self.vary_lighting(scene, 
                intensity_range=(0.3, 3.0),  # dawn to midday
                color_temp_range=(3500, 6500)  # warm to cool
            )
            augmentations.append('lighting')
        
        # Weather effects
        if random.random() < 0.2:
            weather = random.choice(['rain', 'fog', 'snow', 'heat_shimmer'])
            scene = self.add_weather(scene, weather, 
                severity=random.uniform(0.2, 0.8))
            augmentations.append(f'weather_{weather}')
        
        # Ground surface variations
        if random.random() < 0.15:
            scene = self.vary_surface(scene,
                options=['dry', 'wet', 'puddles', 'oil_spill', 'deicing_fluid'])
            augmentations.append('surface')
        
        # Aircraft livery randomization (different airlines)
        if random.random() < 0.25:
            scene = self.randomize_livery(scene)
            augmentations.append('livery')
        
        # LiDAR noise model variations
        if random.random() < 0.2:
            scene = self.vary_lidar_noise(scene,
                dropout_rate=random.uniform(0, 0.15),
                range_noise_std=random.uniform(0.01, 0.05))
            augmentations.append('lidar_noise')
        
        return scene, augmentations

10. Multi-Airport Transfer Learning

10.1 The Multi-Airport Challenge

Each airport has unique characteristics:

| Property | Variation Across Airports |
|---|---|
| Layout | Completely different gate/taxiway geometry |
| Aircraft types | Regional vs international → different sizes |
| GSE fleet | Different manufacturers, models |
| Surface markings | Different standards (ICAO vs FAA) |
| Weather | Arctic (Helsinki) vs tropical (Singapore) |
| Lighting | High-mast (Europe) vs embedded (US) |
| Traffic density | 2 gates (regional) vs 200 gates (hub) |

10.2 LoRA Adapters Per Airport

Rather than training a separate model per airport, use LoRA adapters (see 50-cloud-fleet/mlops/transfer-learning.md):

Base Model (trained on all airports)

     ├── LoRA Airport A (rank 16, 2.1M params)
     ├── LoRA Airport B (rank 16, 2.1M params)  
     ├── LoRA Airport C (rank 16, 2.1M params)
     └── LoRA Airport D (rank 16, 2.1M params)
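
A minimal sketch of the adapter mechanics — a hand-rolled low-rank update on a frozen linear layer; in production a library such as PEFT would manage injection and checkpointing:

python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as identity
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

Only the rank-16 `lora_a`/`lora_b` matrices train per airport — on the order of 2M parameters per adapter, matching the diagram above — so each airport ships as one small adapter file on top of a shared base checkpoint.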

Training data requirements per airport:

| Data Level | Frames | Labeling Cost | Expected mAP | Timeline |
|---|---|---|---|---|
| Minimal (mapping only) | 100-500 | $500-2K | 45-55 | 1 week |
| Basic (LoRA fine-tune) | 500-2,000 | $2K-8K | 55-65 | 2-4 weeks |
| Standard (full fine-tune) | 5,000-10,000 | $15K-40K | 65-75 | 4-8 weeks |
| Production (+ active learning) | 20,000-50,000 | $30K-80K | 75-85 | 3-6 months |

10.3 Airport Onboarding Flywheel

When deploying to a new airport:

Week 1: Mapping drives (manual, record data)
         → Auto-label with existing model
         → Identify domain gaps (novel GSE types, layout features)

Week 2: Label critical frames (500-1000, focus on gaps)
         → Train LoRA adapter
         → Shadow mode testing

Week 3-4: Shadow mode validation
           → Active learning selects hard cases
           → Label additional 500-1000 frames
           → Retrain LoRA adapter

Month 2: Supervised autonomous operation
          → Continuous data collection
          → Monthly retraining cycle starts
          → Performance converges to production level

Month 3+: Full autonomous operation
           → Flywheel self-sustaining
           → Airport LoRA adapter stabilizes

11. Metrics and KPIs

11.1 Flywheel Health Metrics

| Metric | Target | Measurement | Current (est.) |
|---|---|---|---|
| Flywheel cycle time | <30 days | Time from data collection to model deployment | N/A (no ML) |
| Data yield rate | >5% of collected data used in training | Useful frames / total frames | N/A |
| Auto-label accuracy | >90% mAP vs human labels | Periodic human audit | N/A |
| Active learning efficiency | >1.5x random baseline | mAP gain per labeled frame | N/A |
| Model improvement rate | >2% mAP/quarter | Quarterly evaluation on fixed test set | N/A |
| Deployment success rate | >90% of candidates pass validation | Candidates deployed / candidates trained | N/A |
| Retrain trigger rate | 1-2/month | Automatic retraining triggers per month | N/A |

11.2 Perception Improvement Trajectory

Expected mAP progression with active flywheel:

| Timeline | Data Volume | Training | mAP (est.) | Interventions/100km |
|---|---|---|---|---|
| Month 0 | 0 (no ML) | N/A | N/A (rules only) | 5-10 |
| Month 3 | 5K frames (nuScenes transfer) | Pre-train + 500 labeled | 45-55 | 3-5 |
| Month 6 | 20K frames | Active learning + LoRA | 60-68 | 1-3 |
| Month 12 | 80K frames | Full fine-tune + curriculum | 70-78 | 0.5-1.5 |
| Month 18 | 200K frames | Continuous retraining | 75-82 | 0.2-0.8 |
| Month 24 | 500K frames | Multi-airport + synthetic | 80-85 | 0.1-0.5 |

11.3 ROI Model

| Cost Category | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| Compute (training) | $30K | $60K | $100K |
| Labeling | $40K | $50K | $60K |
| Storage | $15K | $30K | $50K |
| Engineering (1 ML engineer) | $120K | $130K | $140K |
| Total | $205K | $270K | $350K |

| Benefit Category | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| Reduced interventions | $50K | $150K | $300K |
| Faster airport onboarding | $0 | $100K | $200K |
| Avoided incidents | $100K | $250K | $500K |
| Competitive differentiation | Hard to quantify | — | — |
| Total | $150K | $500K | $1M |

Breakeven: ~Month 18. NPV positive by end of Year 2.


12. Cost Model and Scaling

12.1 Compute Requirements

| Fleet Size | Monthly Data | Training Compute | Storage | Total Monthly |
|---|---|---|---|---|
| 5 vehicles | 7.5 TB | 8 GPUs × 48h = $2K | $675 | $2.7K |
| 20 vehicles | 30 TB | 16 GPUs × 72h = $6K | $2.7K | $8.7K |
| 50 vehicles | 75 TB | 32 GPUs × 96h = $15K | $6.8K | $21.8K |
| 100 vehicles | 150 TB | 64 GPUs × 120h = $38K | $13.5K | $51.5K |

Assumes:

  • H100 spot instances at $2.50/GPU-hour
  • S3 standard storage at $0.023/GB/month (first year retention)
  • Monthly retraining cycle
  • 5% of raw data retained long-term

12.2 Scaling Strategy

| Scale | Infrastructure | Automation Level | Human Effort |
|---|---|---|---|
| 5 vehicles | Local GPU server (8× A5000) | Semi-manual triggers, manual labeling | 1 ML engineer (50%) |
| 20 vehicles | Cloud GPU (on-demand) | Automated triggers, auto-label + QA | 1 ML engineer + 2 annotators |
| 50 vehicles | Dedicated cloud cluster | Fully automated flywheel | 2 ML engineers + 4 annotators |
| 100 vehicles | Multi-region cloud | Self-optimizing flywheel | 3 ML engineers + 6 annotators |

13. Implementation Roadmap

Phase 1: Foundation (Months 1-3) — $25K

| Task | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Trigger engine deployment | 2 weeks | ROS node development | On-vehicle data collection |
| Bag→training data pipeline | 3 weeks | Fleet data pipeline | Automated frame extraction |
| Auto-labeling v1 (CenterPoint offline) | 3 weeks | Pipeline | 90%+ auto-label on common classes |
| nuScenes pre-training | 1 week | GPU access | Base model weights |
| First LoRA fine-tune | 2 weeks | 500 labeled frames | Airport-specific model v1 |
| Shadow mode evaluation | 2 weeks | Model v1 | Baseline metrics |

Phase 2: Active Flywheel (Months 4-6) — $35K

| Task | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Active learning selection | 3 weeks | Phase 1 complete | Intelligent data selection |
| Foundation model auto-labeling | 4 weeks | SAM + CLIP setup | Novel object labels |
| Quality gate + annotation UI | 3 weeks | Auto-labeling | Human review workflow |
| Experiment tracking (W&B) | 1 week | Training pipeline | Reproducible experiments |
| First retrained model (v2) | 2 weeks | 5K labeled frames | Improved model |
| A/B testing infrastructure | 2 weeks | Fleet > 5 vehicles | Split deployment |

Phase 3: Scaling (Months 7-12) — $75K

| Task | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Continuous retraining automation | 4 weeks | Phase 2 | Automated flywheel |
| Scenario mining | 4 weeks | 6+ months of fleet data | Long-tail discovery |
| Synthetic data integration | 6 weeks | Gap analysis | Augmented training set |
| Multi-airport LoRA system | 4 weeks | Second airport deployment | Scalable adaptation |
| Production monitoring dashboard | 3 weeks | Fleet telemetry | Real-time model health |
| Performance regression CI/CD | 3 weeks | Test set curation | Automated quality gates |

Phase 4: Optimization (Months 13-18) — $50K

| Task | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Curriculum learning | 3 weeks | Scenario library | Optimized training |
| Self-supervised pre-training | 6 weeks | Large unlabeled dataset | Reduced label needs |
| Model distillation (smaller models) | 4 weeks | Production model | Faster inference |
| Fleet-wide learning (V2V data sharing) | 6 weeks | Collaborative pipeline | Cross-vehicle learning |
| Flywheel KPI optimization | Ongoing | All phases | Self-improving system |

14. Key Takeaways

  1. A data flywheel is not a data pipeline — the flywheel creates a self-reinforcing cycle where more vehicles → more data → better models → fewer interventions → more deployments → more vehicles

  2. Trigger-based collection uploads ~5% of raw data — intelligent triggers capture 100% of safety events, ~60% of perception edge cases, while staying within 50 GB/day upload budget per vehicle

  3. Auto-labeling reduces cost by 70-85% — from $8-15/frame manual to $1.50-3.00/frame with auto-label + human QA, enabling 5-10x more labeled data for the same budget

  4. Active learning achieves target mAP with 40-50% fewer labeled frames — safety-weighted selection prioritizes aircraft, ground crew, and FOD frames, saving ~$60K per training iteration at scale

  5. Monthly retraining cycle is the target cadence — triggered by 5,000 new frames, 3% mAP drop, or 30-day age, whichever comes first

  6. Shadow mode validation requires 1-2 weeks with strict gates: safety-critical classes (aircraft, crew, FOD) must never regress, even by 0.1% mAP

  7. Per-airport LoRA adapters need only 500-2,000 labeled frames ($2K-8K) to reach initial deployment quality, vs 20,000-50,000 frames for full training — enabling rapid airport onboarding

  8. Expected mAP trajectory: 45% (month 3) → 70% (month 12) → 82% (month 24) with interventions dropping from 5-10/100km to 0.1-0.5/100km

  9. Flywheel breakeven at ~Month 18 with NPV positive by end of Year 2, driven by reduced interventions ($300K/yr), faster onboarding ($200K/yr), and avoided incidents ($500K/yr)

  10. Scenario mining discovers long-tail events that comprise 5% of driving but 95% of safety risk — oversampling these by 2-4x in training is critical for safety performance

  11. Synthetic data fills gaps at $23K for 35,000 frames covering FOD, night, weather, novel aircraft, and near-misses — scenarios too rare or dangerous to collect naturally

  12. Foundation models (SAM + CLIP) enable zero-shot labeling of novel airside objects — critical for the first deployment when no training data exists for airside-specific classes

  13. Quality gate routes 60-70% of auto-labels to auto-accept, 20-30% to human review, 5-10% to discard — FOD requires highest confidence (0.99) for auto-accept due to safety criticality

  14. Failure case analysis closes the loop — every operator intervention generates a classified failure case that informs what data to collect next, what scenarios to mine, and what to prioritize in active learning

  15. Fleet of 100 vehicles generates ~150 TB/month requiring ~$52K/month for compute + storage, but the flywheel efficiency (auto-labeling, active learning, scenario balancing) means each dollar of data investment yields 3-5x more model improvement than naive approaches

  16. New airport onboarding drops from months to weeks once the flywheel is running — week 1 mapping, week 2 LoRA training, weeks 3-4 shadow validation, month 2 supervised autonomy


References

  1. Tesla AI Day 2022, 2023 — Data engine and auto-labeling pipeline architecture
  2. Waymo, "Content Search: Mining Real-World Data for Autonomous Driving," 2023
  3. Ren et al., "A Survey on Active Learning for Object Detection," IJCV 2024
  4. Sener & Savarese, "Active Learning for Convolutional Neural Networks: A Core-Set Approach," ICLR 2018
  5. Yoo & Kweon, "Learning Loss for Active Learning," CVPR 2019
  6. Settles, "Active Learning Literature Survey," 2009
  7. Wang et al., "Auto-Labeling 3D Objects with Differentiable Rendering and LiDAR," NeurIPS 2023
  8. Caesar et al., "nuScenes: A Multimodal Dataset for Autonomous Driving," CVPR 2020
  9. comma.ai, "openpilot: An open source driver assistance system," 2024
  10. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," ICLR 2022
  11. Bengio et al., "Curriculum Learning," ICML 2009
  12. NVIDIA, "Auto-Labeling for Autonomous Driving," Drive Sim Documentation, 2025

Document generated for reference airside AV stack industry research, April 2026. Covers the ML-centric data flywheel — for infrastructure (storage, transfer, DVC), see 50-cloud-fleet/data-platform/fleet-data-pipeline.md. For bag processing specifics, see 50-cloud-fleet/data-platform/data-engine-from-bags.md. For public datasets, see 50-cloud-fleet/data-platform/data-engines-datasets.md.

Public research notes collected from public sources.