Pit30M Global Localization

Last updated: 2026-05-09

Pit30M, now released through Aurora as the Aurora Multi-Sensor Dataset, is a large-scale benchmark for retrieval-based global localization at city scale. It tests whether image and LiDAR retrieval can support sub-meter localization at the scale and diversity that self-driving vehicles require.

Scope

Item	Pit30M coverage
Primary domain	Pittsburgh metropolitan road driving
Scale	More than 30 million image/LiDAR frames
Time span	Captured between January 2017 and February 2018
Conditions	Seasons, weather, time of day, traffic, and occlusion variation
Main use	Sub-meter retrieval-based global localization at city scale
Data access	Public AWS Open Data S3 bucket under non-commercial academic terms

Pit30M is a relocalization benchmark, not just an odometry dataset. It is intended to test whether a system can retrieve the correct place under large-scale long-term variation.

Sensors And Metadata

Asset	Notes
Cameras	Seven 1920 x 1200 cameras, including a forward stereo pair and five wide-angle cameras for 360-degree coverage
LiDAR	64-beam Velodyne HDL-64E point clouds
Localization ground truth	Accurate vehicle localization ground truth from the collection platform
Weather metadata	Historical weather annotations
Astronomical metadata	Time-of-day and sun-related metadata
Semantic segmentation	Image and LiDAR semantic segmentation used as a proxy for occlusion
SDK	Python package and log-based readers

The semantic and weather metadata make Pit30M useful for stratified localization analysis: a failed retrieval can be attributed to occlusion, weather, season, or traffic rather than counted as an undifferentiated error.
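A stratified analysis of this kind can be sketched as a group-by over per-query results. The record layout below (`error_m`, `weather`, `season`) is hypothetical and is not the Pit30M SDK schema; the sample values are made up for illustration.

```python
from collections import defaultdict


def recall_by_condition(queries, key, threshold_m=1.0):
    """Fraction of queries localized within threshold_m, grouped by a metadata field.

    queries: list of dicts with an "error_m" field (top-1 position error in
    meters) plus condition tags. All field names here are illustrative.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for q in queries:
        group = q[key]
        totals[group] += 1
        if q["error_m"] <= threshold_m:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up example records, not real Pit30M results.
queries = [
    {"error_m": 0.4, "weather": "clear", "season": "summer"},
    {"error_m": 0.8, "weather": "clear", "season": "winter"},
    {"error_m": 6.2, "weather": "snow", "season": "winter"},
    {"error_m": 0.9, "weather": "snow", "season": "winter"},
]

by_weather = recall_by_condition(queries, "weather")
by_season = recall_by_condition(queries, "season")
print(by_weather)
print(by_season)
```

Reporting the per-group sample count alongside each recall value helps avoid over-reading small strata.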

Tasks And Metrics

Task	Practical metric
Image retrieval localization	Recall@K within 1 m, 5 m, and 10 m; median and P95 position error
LiDAR retrieval localization	Recall@K, false-place matches, distance-to-nearest true match
Cross-condition relocalization	Recall by season, weather, time of day, and occlusion proxy
Map age robustness	Performance as database/query dates separate
Startup recovery	Time or frames needed to accept a correct global pose
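
The retrieval metrics in the table can be computed from per-query candidate errors. This is a minimal sketch, assuming each query has a rank-ordered list of distances in meters between its ground-truth position and each retrieved candidate. The function names and sample numbers are illustrative, and the P95 uses the nearest-rank definition, which is one of several common conventions.

```python
import math
import statistics


def recall_at_k(ranked_errors, k, threshold_m):
    """Share of queries with at least one top-k candidate within threshold_m."""
    hits = sum(
        1 for errors in ranked_errors if any(e <= threshold_m for e in errors[:k])
    )
    return hits / len(ranked_errors)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up candidate errors (meters) for four queries, three candidates each.
ranked_errors = [
    [0.3, 12.0, 40.0],
    [7.5, 0.9, 3.0],
    [25.0, 18.0, 4.0],
    [0.6, 0.7, 90.0],
]

top1_errors = [errors[0] for errors in ranked_errors]
for threshold_m in (1.0, 5.0, 10.0):
    print(threshold_m, recall_at_k(ranked_errors, 1, threshold_m),
          recall_at_k(ranked_errors, 3, threshold_m))
print("median top-1 error:", statistics.median(top1_errors))
print("P95 top-1 error:", percentile_nearest_rank(top1_errors, 95))
```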

For production use, report false accepts separately from missed localizations. A missed match can trigger another relocalization attempt; a confident match to the wrong place can corrupt routing and geofence logic.
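
One way to keep the two failure modes apart is to classify each query by whether the system accepted a pose and whether that pose was correct. The sketch below assumes a scalar match score with a hypothetical acceptance threshold `accept_score`; neither is defined by Pit30M, and the sample values are made up.

```python
def accept_outcomes(results, accept_score, threshold_m=1.0):
    """Rates of correct accepts, false accepts, and missed localizations.

    results: list of (score, error_m) pairs for each query's top candidate.
    A query below accept_score yields no pose and counts as missed, whether
    or not its top candidate happened to be correct.
    """
    counts = {"correct_accept": 0, "false_accept": 0, "missed": 0}
    for score, error_m in results:
        if score < accept_score:
            counts["missed"] += 1  # no pose emitted; the system can retry
        elif error_m <= threshold_m:
            counts["correct_accept"] += 1
        else:
            counts["false_accept"] += 1  # confident wrong place
    total = len(results)
    return {name: count / total for name, count in counts.items()}


# Made-up (score, error in meters) pairs.
results = [(0.9, 0.4), (0.8, 35.0), (0.3, 0.5), (0.2, 60.0)]
outcomes = accept_outcomes(results, accept_score=0.5)
print(outcomes)
```

Sweeping `accept_score` and plotting the false-accept rate against the missed rate shows the trade-off behind any chosen operating point.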

Best Use

Use Pit30M to:

benchmark place-recognition descriptors at city scale;
compare image-only, LiDAR-only, and cross-modal retrieval;
test long-term map aging and appearance change;
evaluate relocalization under traffic occlusion and weather metadata;
separate global localization metrics from local odometry metrics.

It is a strong public baseline to run before building an airport-specific relocalization benchmark that covers repeated stands, service roads, depot routes, and terminal-side aliasing.

Airside Transfer

Airport autonomy has strong global-localization needs: startup at a depot, recovery after tow, route handoff between map tiles, and relocalization near visually similar gates. Pit30M can inform:

retrieval recall and false-accept thresholds;
map-age reporting by date and condition;
occlusion-aware evaluation where parked aircraft or GSE block landmarks;
database/query split design for repeated airport routes.
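
A date-based database/query split with map-age buckets can be sketched as follows. The frame records, the cutoff date, and the 90-day bucket width are all assumptions chosen for illustration; splitting by capture date rather than by individual frame keeps near-duplicate frames from the same drive out of both sides.

```python
from datetime import date


def split_by_date(frames, cutoff):
    """Frames before cutoff form the database (map); the rest are queries."""
    database = [f for f in frames if f["date"] < cutoff]
    queries = [f for f in frames if f["date"] >= cutoff]
    return database, queries


def map_age_bucket(query_date, database_date, bucket_days=90):
    """Index of the map-age bucket for a query/database date pair."""
    return (query_date - database_date).days // bucket_days


# Made-up frame records with hypothetical log identifiers.
frames = [
    {"log_id": "log_a", "date": date(2017, 1, 15)},
    {"log_id": "log_b", "date": date(2017, 5, 10)},
    {"log_id": "log_c", "date": date(2017, 8, 1)},
    {"log_id": "log_d", "date": date(2018, 2, 1)},
]

database, queries = split_by_date(frames, cutoff=date(2017, 7, 1))
for q in queries:
    for d in database:
        print(q["log_id"], d["log_id"], map_age_bucket(q["date"], d["date"]))
```

Reporting recall per bucket, instead of one pooled number, makes performance decay with map age visible.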

Airport transfer is not direct. Road-scale Pittsburgh data lacks repeated gate geometry, aircraft occlusion, apron markings, GSE clutter, terminal overhangs, and airport geofencing rules.

Limitations

It benchmarks retrieval-based localization more than full online SLAM.
The dataset is very large and requires a careful storage and streaming strategy.
It is road-domain data, not airside or industrial-yard data.
Sensor and calibration assumptions may differ from a production airport vehicle.
License terms are non-commercial academic; review terms before commercial use.
