Active Labeling and Budget Operations

Last updated: 2026-05-09

Why It Matters

Fleet learning is constrained twice: vehicles cannot upload everything, and humans cannot label everything that reaches the cloud. Active labeling operations decide which samples earn bandwidth, which uploaded samples earn annotation spend, which predictions can be reviewed instead of labeled from scratch, and which labels are good enough to promote into training or safety evidence.

This page covers budgeted labeling operations for perception, prediction, planning replay, and data-quality review.

Operating Model

  1. Maintain separate budgets for upload, auto-label inference, human annotation, expert review, and label QA. Do not spend human review on clips that are blocked by privacy, corruption, missing calibration, or duplicate coverage.
  2. Score candidates in two stages. On vehicle or edge storage, select clips under bandwidth and retention constraints. In the cloud, select from uploaded data under a global annotation budget (see the selection sketch after this list).
  3. Balance uncertainty, diversity, coverage, and operational risk. DUAL frames this as distributed upload plus active labeling for resource-constrained fleets; the practical lesson is to avoid spending the global label budget on redundant local uploads.
  4. Use FiftyOne or equivalent dataset tooling to inspect embeddings, near-duplicates, hard examples, label mistakes, and model predictions before creating annotation tasks.
  5. Use Label Studio or equivalent annotation tooling for pre-annotations, ML backend predictions, interactive labeling, and human review. Predictions are not ground truth until reviewed and submitted.
  6. Promote labels by state: candidate, pre_labeled, human_labeled, qa_passed, approved_for_training, approved_for_safety_evidence, rejected (see the state-transition sketch after this list).
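
A minimal sketch of the two-stage selection under the budgets above, assuming per-clip score components (uncertainty, diversity, coverage) and dedupe clusters are computed upstream; the class name, field names, and weights are illustrative, not a fixed schema.

```python
from dataclasses import dataclass

@dataclass
class CandidateClip:
    clip_id: str
    dedupe_cluster: str     # near-duplicate cluster assigned upstream
    uncertainty: float      # e.g. model disagreement or entropy on the clip
    diversity: float        # e.g. embedding distance to already-selected data
    coverage: float         # how much the clip fills an under-represented ODD slice
    size_gb: float
    blocked: bool = False   # privacy hold, corruption, missing calibration, etc.

def score(clip: CandidateClip, w_unc: float = 0.4, w_div: float = 0.3, w_cov: float = 0.3) -> float:
    # Balance uncertainty, diversity, and coverage; weights are illustrative.
    return w_unc * clip.uncertainty + w_div * clip.diversity + w_cov * clip.coverage

def select_for_upload(clips: list[CandidateClip], bandwidth_gb: float) -> list[CandidateClip]:
    """Stage 1: on-vehicle / edge selection under a bandwidth budget."""
    spent, picked = 0.0, []
    for clip in sorted(clips, key=score, reverse=True):
        if clip.blocked or spent + clip.size_gb > bandwidth_gb:
            continue
        picked.append(clip)
        spent += clip.size_gb
    return picked

def select_for_annotation(uploaded: list[CandidateClip], label_budget_tasks: int) -> list[CandidateClip]:
    """Stage 2: global cloud selection under the annotation budget,
    spending at most one task per near-duplicate cluster."""
    seen, picked = set(), []
    for clip in sorted(uploaded, key=score, reverse=True):
        if clip.blocked or clip.dedupe_cluster in seen:
            continue
        picked.append(clip)
        seen.add(clip.dedupe_cluster)
        if len(picked) >= label_budget_tasks:
            break
    return picked
```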

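A sketch of the promotion states as an explicit transition table, so pre_labeled data cannot silently reach training or safety evidence. The allowed transitions and the approver check are assumptions about one reasonable policy, not a prescribed one.

```python
# Allowed label-state transitions; anything not listed here is rejected.
# State names follow the promotion states above; the table itself is a sketch.
ALLOWED_TRANSITIONS = {
    "candidate": {"pre_labeled", "rejected"},
    "pre_labeled": {"human_labeled", "rejected"},
    "human_labeled": {"qa_passed", "rejected"},
    "qa_passed": {"approved_for_training", "approved_for_safety_evidence", "rejected"},
    "approved_for_training": {"approved_for_safety_evidence", "rejected"},
    "approved_for_safety_evidence": {"rejected"},
    "rejected": set(),
}

def promote(current_state: str, target_state: str, approver: str | None = None) -> str:
    """Move a label to a new state; promotion into an approved_* state
    additionally requires a named data-steward approver."""
    if target_state not in ALLOWED_TRANSITIONS.get(current_state, set()):
        raise ValueError(f"{current_state} -> {target_state} is not an allowed transition")
    if target_state.startswith("approved_") and approver is None:
        raise ValueError("promotion to an approved_* state requires data steward approval")
    return target_state
```
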
Evidence Artifacts

| Artifact | Minimum contents | Owner |
| --- | --- | --- |
| Budget ledger | Budget type, allocation, spend, remaining quota, owner, period | Label operations |
| Candidate score record | Source clip, score components, selected/not-selected reason, dedupe cluster | Data platform |
| Upload selection manifest | Vehicle, local model version, storage constraint, selected sample IDs | Fleet data |
| Annotation batch | Task IDs, label schema, instructions, source data snapshots, pre-label model | Label operations |
| Pre-annotation record | Model version, prediction score, Label Studio prediction payload, review status | MLOps |
| QA report | Inter-annotator checks, reviewer decisions, defect taxonomy, rework rate | Label QA |
| Promotion record | Approved label snapshot, allowed use, expiry, downstream dataset IDs | Data steward |
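
The Label Studio prediction payload referenced in the pre-annotation record can be stored as the import JSON itself. Below is a hedged sketch that builds one image task with bounding-box pre-annotations in Label Studio's import format; the `from_name`/`to_name` values must match the project's labeling config, and the detection dict keys (`x_pct`, `confidence`, ...) are illustrative.

```python
def pre_annotation_task(image_url: str, detections: list[dict], model_version: str) -> dict:
    """Build a Label Studio import task whose prediction carries the pre-label
    model version and score; a reviewer must still submit an annotation before
    the label leaves the pre_labeled state."""
    results = [
        {
            "from_name": "label",   # must match the project's labeling config
            "to_name": "image",
            "type": "rectanglelabels",
            "value": {
                # Label Studio expects box coordinates as percentages of image size
                "x": det["x_pct"], "y": det["y_pct"],
                "width": det["w_pct"], "height": det["h_pct"],
                "rectanglelabels": [det["class"]],
            },
        }
        for det in detections
    ]
    return {
        "data": {"image": image_url},
        "predictions": [{
            "model_version": model_version,
            # conservative overall score: the weakest detection in the clip
            "score": min(d["confidence"] for d in detections) if detections else 0.0,
            "result": results,
        }],
    }
```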

Acceptance Checks

  • Selection decisions are reproducible from stored scores, budgets, and source snapshots.
  • The annotation batch has a fixed label schema, task instructions, and ODD scope.
  • Pre-labels are clearly distinguished from reviewed labels in storage and downstream manifests.
  • Label QA samples cover high-risk classes, rare classes, new airports, night/weather slices, and model-disagreement cases.
  • Duplicate and near-duplicate samples are controlled before spending annotation budget.
  • Labels promoted to safety evidence have stricter QA than labels used only for exploratory training.
  • Budget reports expose cost per accepted label, defect rate, rework rate, and downstream model or replay impact.
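
The last check reduces to a few ratios over the budget ledger and QA report; a sketch with assumed field names (downstream model or replay impact is tracked separately).

```python
def budget_report(spend: float, labels_submitted: int, labels_accepted: int,
                  labels_reworked: int, defects_found: int, qa_sampled: int) -> dict:
    """Headline ratios for a budget report, computed from ledger and QA counts."""
    return {
        "cost_per_accepted_label": spend / labels_accepted if labels_accepted else None,
        "defect_rate": defects_found / qa_sampled if qa_sampled else None,
        "rework_rate": labels_reworked / labels_submitted if labels_submitted else None,
        "acceptance_rate": labels_accepted / labels_submitted if labels_submitted else None,
    }

# Example: 12,000 USD spend, 5,000 labels submitted, 4,600 accepted, 300 reworked,
# 45 defects in a 900-label QA sample -> ~2.61 USD per accepted label,
# 5% defect rate, 6% rework rate, 92% acceptance rate.
print(budget_report(12_000, 5_000, 4_600, 300, 45, 900))
```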

Failure Modes

| Failure mode | Consequence | Control |
| --- | --- | --- |
| Label budget follows upload volume | Common routes consume all annotation spend | Global cloud selection with diversity and risk weighting |
| Unreviewed predictions enter training | Model reinforces its own errors | Separate pre_labeled from qa_passed states |
| Active learning chases only uncertainty | Dataset fills with outliers and corrupt samples | Combine uncertainty with quality, diversity, and ODD coverage |
| Label instructions drift | Annotators create incompatible labels | Version task instructions and schema with each batch |
| QA samples are random only | Rare safety classes are under-reviewed | Risk-weight QA sampling |
| Duplicate clips are labeled repeatedly | Budget waste and biased training distribution | Near-duplicate detection before task creation |
| Promotion has no allowed-use scope | Exploratory labels become safety evidence by accident | Require explicit promotion state and data steward approval |
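
As a control for the duplicate-labeling failure mode, near-duplicate detection can run in FiftyOne before any annotation task is created. A sketch using FiftyOne Brain's similarity and uniqueness workflows; the dataset name, embedding defaults, distance threshold, and batch size are assumptions.

```python
import fiftyone as fo
import fiftyone.brain as fob

# Assumes uploaded candidate frames are already registered as a FiftyOne dataset.
dataset = fo.load_dataset("uploaded-candidates")

# Embedding-based similarity index; find_duplicates() marks near-duplicate samples.
index = fob.compute_similarity(dataset, brain_key="candidate_sim")
index.find_duplicates(thresh=0.1)                 # distance threshold is illustrative
deduped = dataset.exclude(index.duplicate_ids)    # keep one representative per cluster

# Rank what survives so hard or rare samples reach the annotation batch first.
fob.compute_uniqueness(dataset)
batch = deduped.sort_by("uniqueness", reverse=True).limit(500)
```
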
Related Pages

  • 50-cloud-fleet/mlops/data-flywheel-airside.md
  • 50-cloud-fleet/data-platform/fleet-data-pipeline.md
  • 50-cloud-fleet/data-platform/3d-annotation-tools.md
  • 50-cloud-fleet/data-platform/perception-slam-fleet-data-contract.md
  • 30-autonomy-stack/perception/datasets-benchmarks/fod-and-airport-apron-detection-datasets.md
  • 60-safety-validation/verification-validation/evaluation-benchmarks.md
  • 60-safety-validation/verification-validation/knowledge-base-evaluation-protocol.md

Sources

Compiled from publicly available research notes and sources.