Open3DTrack

What It Is

  • Open3DTrack is a 2024-2025 open-vocabulary 3D multi-object tracking method.
  • The full title is "Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking."
  • It formulates the open-vocabulary 3D tracking task: track known and novel object categories in 3D space.
  • It introduces dataset splits for open-vocabulary tracking scenarios.
  • It adapts a 3D tracking framework with open-vocabulary 2D detections and tracking-specific scoring.
  • It fills the gap between open-vocabulary 3D detection and persistent 3D tracks.

Core Technical Idea

  • Use 2D open-vocabulary detections to provide category information for object classes not covered by a closed-set 3D detector.
  • Link those categories to 3D object proposals from existing 3D detectors.
  • Train the tracker to operate more class-agnostically so it can preserve trajectories for unseen classes.
  • Add confidence score prediction because 2D open-vocabulary confidence does not directly represent 3D proposal objectness.
  • Add track consistency scoring to stabilize labels and identities over time.
  • Evaluate base and novel classes separately so average tracking metrics do not hide novel-class collapse.
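As a rough illustration, the track consistency idea above can be sketched as a majority vote over a track's recent frame labels. This is a hypothetical simplification: the paper learns a consistency score rather than hard voting, and the function name and threshold below are assumptions.

```python
from collections import Counter

def consistent_label(frame_labels, min_support=0.5):
    """Pick the majority label across a track's recent frames.

    Return the majority label only if it appears in at least
    `min_support` of the frames; otherwise keep the most recent
    label. Illustrative sketch, not the paper's learned scorer.
    """
    label, count = Counter(frame_labels).most_common(1)[0]
    if count / len(frame_labels) >= min_support:
        return label
    return frame_labels[-1]
```

A vote like this suppresses single-frame label flicker (e.g. one "cart" frame inside a run of "forklift" frames) without freezing the label forever.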

Inputs and Outputs

  • Input: 3D object proposals from a detector such as CenterPoint, MEGVII, or BEVFusion.
  • Input: 2D open-vocabulary detections or class prompts, for example from a YOLO-World-style detector.
  • Input metadata: camera-LiDAR calibration, timestamps, ego pose, and frame sequence.
  • Training input: 3D tracking labels for base classes and pseudo labels from 2D open-vocabulary detections.
  • Output: 3D object tracks with positions, velocities, class labels, and confidence scores.
  • Output: tracks for both known and novel categories under the evaluation split.
  • It does not produce dense occupancy or freespace.
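A minimal sketch of one output track record, with illustrative field names that are not the paper's API:

```python
from dataclasses import dataclass

@dataclass
class Track3D:
    """Hypothetical container for one tracked object per frame."""
    track_id: int
    label: str            # may originate from an open-vocabulary prompt
    position: tuple       # (x, y, z) in the ego or global frame
    velocity: tuple       # (vx, vy) in the BEV plane
    objectness: float     # learned 3D proposal confidence
    semantic_score: float # 2D open-vocabulary classification score
    is_novel: bool = False  # True if outside the base-class taxonomy
```

Keeping `objectness` and `semantic_score` as separate fields mirrors the point above that 2D open-vocabulary confidence does not stand in for 3D objectness.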

Architecture or Pipeline

  • Generate 3D proposals from a standard 3D detector.
  • Run 2D open-vocabulary detection over camera frames for base and novel categories.
  • Associate 2D detections with 3D proposals through projection and matching.
  • Use a 3DMOTFormer-style tracking framework as the base tracker.
  • Remove class-specific assumptions where possible and apply class-agnostic ground-truth assignment.
  • Predict proposal confidence scores for 3D tracking rather than inheriting unreliable 2D scores.
  • Use track consistency scoring so unknown detections receive stable labels across frames.
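The 2D-3D association step can be sketched as projecting 3D proposal centers into the image and testing them against open-vocabulary 2D boxes. Function and variable names below are assumptions; a real pipeline would use the dataset's calibration API and IoU-based Hungarian matching rather than first-containment.

```python
import numpy as np

def project_points(K, T_cam_lidar, pts_lidar):
    """Project Nx3 LiDAR points into pixel coordinates.

    K: 3x3 camera intrinsics; T_cam_lidar: 4x4 LiDAR-to-camera
    extrinsics (hypothetical argument layout). Returns Nx2 pixel
    coordinates and a mask of points in front of the camera.
    """
    pts_h = np.hstack([pts_lidar, np.ones((len(pts_lidar), 1))])
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0.1
    px = (K @ cam.T).T
    return px[:, :2] / px[:, 2:3], in_front

def match_center_to_boxes(center_px, boxes_2d):
    """Return the index of the first 2D box (x1, y1, x2, y2)
    containing the projected center, or -1 if none does."""
    x, y = center_px
    for i, (x1, y1, x2, y2) in enumerate(boxes_2d):
        if x1 <= x <= x2 and y1 <= y <= y2:
            return i
    return -1
```

Even this toy version shows why the step is calibration-sensitive: a small error in `T_cam_lidar` shifts every projected center and breaks containment tests for distant objects.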

Training and Evaluation

  • Open3DTrack evaluates on nuScenes with open-vocabulary tracking splits.
  • The paper reports overall AMOTA values of 0.567, 0.590, and 0.536 across its three splits after adaptation.
  • It evaluates generalization across different 3D proposal sources, including CenterPoint, MEGVII, and BEVFusion.
  • Ablations identify confidence score prediction and track consistency scoring as important for novel-class tracking.
  • Novel-class AMOTA, AMOTP, identity switches, and class stability should be reported separately from base classes.
  • Performance can change depending on proposal quality and how 2D open-vocabulary detections are lifted to 3D.
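The separate base/novel reporting above can be sketched as splitting per-class AMOTA before averaging, so a strong base mean cannot mask novel-class collapse. The class names and values in the example are illustrative, not results from the paper.

```python
def split_metrics(per_class_amota, base_classes):
    """Average a per-class metric separately over base and novel
    classes. Returns (base_mean, novel_mean); NaN if a side is empty."""
    base = [v for k, v in per_class_amota.items() if k in base_classes]
    novel = [v for k, v in per_class_amota.items() if k not in base_classes]
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return mean(base), mean(novel)
```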

Strengths

  • Makes open-vocabulary 3D perception persistent over time instead of frame-local.
  • Compatible with mature 3D proposal detectors.
  • Separates the objectness/proposal problem from the open-vocabulary semantic labeling problem.
  • Track consistency helps reduce flicker for novel categories.
  • Useful for active learning because novel-category tracks are easier to review than isolated detections.
  • Provides evaluation splits that make closed-set overfitting visible.

Failure Modes

  • Novel objects still depend on the 3D proposal detector generating a usable box.
  • 2D-to-3D association is sensitive to calibration, occlusion, and sparse LiDAR returns.
  • Open-vocabulary 2D labels can be unstable across views and frames.
  • Class-agnostic tracking can improve continuity while increasing localization error for some categories.
  • The method tracks boxes, so irregular objects such as tow bars, hoses, chocks, and aircraft parts may be poorly represented.
  • It does not prove freespace or occupancy absence.

Airside AV Fit

  • Strong fit for long-tail GSE and temporary objects that are not in road-driving taxonomies.
  • Useful for tracking rare equipment once detected: lavatory trucks, GPUs, tow bars, belt loaders, dollies, cones, chocks, and maintenance stands.
  • Persistent open-vocabulary tracks can feed operator review and data labeling workflows.
  • The method should be paired with dense LiDAR/radar occupancy near aircraft and personnel.
  • Airside prompts and class names need a controlled synonym list so tracks do not change labels every frame.
  • Novel tracks should trigger conservative behavior only when their geometry intersects the path or no-go buffer.
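The controlled synonym list above can be sketched as a canonicalization table applied before labels reach the tracker. The airside label variants below are hypothetical examples, not a vetted vocabulary.

```python
# Hypothetical mapping from open-vocabulary label variants to one
# canonical airside class name.
SYNONYMS = {
    "gpu": "ground power unit",
    "ground power cart": "ground power unit",
    "tug": "tow tractor",
    "pushback tug": "tow tractor",
}

def canonicalize(label: str) -> str:
    """Map a raw open-vocabulary label to its canonical name so a
    track does not change labels from frame to frame."""
    key = label.strip().lower()
    return SYNONYMS.get(key, key)
```

Canonicalizing before track consistency scoring means the vote is over a small stable vocabulary rather than free-form detector strings.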

Implementation Notes

  • Maintain separate confidence fields for 3D proposal objectness, open-vocabulary semantic score, and track consistency.
  • Store the text prompt or vocabulary item that created each novel-class label.
  • Validate camera-LiDAR projection under vibration, thermal drift, and wide-FOV camera distortion.
  • Use airside-specific 3D size priors carefully; do not force unknown objects into road-vehicle dimensions.
  • Review false novel tracks around aircraft liveries, signage, reflections, and painted ramp markings.
  • Feed high-value novel tracks into the data flywheel for class promotion and retraining.
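The size-prior caution above can be sketched as a check that is simply skipped for unknown classes, so novel objects are flagged for human review instead of being forced into road-vehicle dimensions. The prior values are illustrative.

```python
def plausible_size(dims, priors, cls):
    """Check box dimensions (l, w, h) against a per-class size prior.

    priors maps class name -> ((l_min, w_min, h_min), (l_max, w_max,
    h_max)). Unknown classes pass the check by design: route them to
    review rather than rejecting or resizing their boxes.
    """
    if cls not in priors:
        return True
    lo, hi = priors[cls]
    return all(lo[i] <= dims[i] <= hi[i] for i in range(3))
```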

Sources

Research notes compiled from publicly available sources.