SHIFT Continuous Domain Shift
Last updated: 2026-05-09
SHIFT is a CVPR 2022 synthetic autonomous-driving dataset for continuous multi-task domain adaptation. It is valuable because it exposes gradual changes in weather, illumination, and traffic variables instead of treating robustness as a binary clean-versus-corrupted test.
Related pages: sensor corruption robustness benchmarks, weather robustness datasets, test-time adaptation for airside
Scope
| Item | SHIFT coverage |
|---|---|
| Primary domain | Synthetic road driving generated for perception research |
| Shift variables | Cloudiness, rain intensity, fog intensity, time of day, vehicle density, pedestrian density |
| Shift shape | Discrete domains and continuous intra-sequence shifts |
| Main use | Domain adaptation, domain generalization, test-time adaptation, multi-task robustness |
| Release support | Public benchmark toolkit and devkit |
SHIFT should be used when the question is "where does performance start to bend or fall off" rather than "does the model pass in a named weather bucket."
Sensors And Labels
| Asset | Notes |
|---|---|
| RGB images/video | Downloadable at image and video frame rates through the devkit |
| LiDAR point clouds | Synthetic point-cloud modality with calibration support |
| Depth | Dense depth supervision for monocular or multi-task learning |
| Optical flow | Motion supervision for temporal perception |
| 2D and 3D boxes | Object detection labels through devkit data groups |
| Semantic segmentation | Dense semantic masks |
| Instance segmentation | Instance-level masks and IDs |
| Multi-view support | Devkit supports selecting views, data groups, splits, frame rate, and shift type |
The devkit exposes discrete shift data and continuous variants, including multiple continuous sampling rates.
Tasks And Metrics
| Task | Practical metric |
|---|---|
| Semantic segmentation | mIoU by shift intensity and task combination |
| Object detection | 2D/3D AP by domain variable and severity |
| Depth estimation | AbsRel/RMSE as weather and lighting vary |
| Optical flow | Endpoint error under lighting/weather change |
| Multi-task adaptation | Per-task degradation curves and negative-transfer analysis |
| Test-time adaptation | Online recovery speed, stability, and catastrophic adaptation failures |
The most useful metric is a degradation curve over the continuous variable, not a single aggregate validation score.
Best Use
Use SHIFT to:
- plot ODD cliff points as rain, fog, darkness, or traffic density increases;
- compare static training, domain adaptation, and test-time adaptation;
- test multi-task training tradeoffs between segmentation, depth, flow, and detection;
- stress temporal perception with slowly changing conditions;
- create synthetic-to-real hypotheses before spending labeling budget on real airport logs.
SHIFT is especially useful for release-gate design because it encourages thresholds such as "degrade mode starts at this confidence and visibility range" rather than an unstructured adverse-weather bucket.
Airside Transfer
Airside operations also contain continuous shifts: drizzle to heavy rain, dusk to night, low to high apron traffic, dry to wet reflective pavement, and light to heavy fog. SHIFT can prototype:
- degradation-curve reporting for camera/LiDAR perception;
- adaptation triggers for airport onboarding;
- multi-task models that share depth, segmentation, and detection heads;
- ODD boundary logic that gates autonomous mode before perception collapses.
The transfer is methodological. The imagery, object taxonomy, road layout, and sensor physics are synthetic road-domain data. Final evidence needs airport logs with aircraft, GSE, ramp workers, stand markings, cones, chocks, glycol film, and jet-bridge geometry.
Limitations
- Synthetic data does not reproduce all camera, LiDAR, radar, lens-contamination, or wet-surface artifacts.
- Road-domain labels do not include airside classes or airport surface rules.
- Continuous variables are controlled simulator parameters, not direct measurements such as MOR visibility, rainfall rate, or sensor blockage.
- Good adaptation on SHIFT can still fail on real sensor noise and operations-driven distribution shifts.
- Treat results as robustness-screening evidence, not production acceptance evidence.