S2M

What It Is

S2M is a framework for out-of-distribution object segmentation in semantic segmentation.
The paper title is "Segment Every Out-of-Distribution Object".
S2M expands to Score To segmentation Mask.
It converts anomaly score maps into object-level segmentation masks.
The method was published at CVPR 2024.
It is designed for safety-critical settings such as autonomous driving where unknown objects should not be treated as background.
S2M is a mask-generation method, not an open-vocabulary semantic naming method.

Existing OoD segmentation methods often output pixel-level anomaly scores.
Turning those scores into masks usually requires threshold selection and can fragment objects.
S2M uses anomaly scores to generate prompts for a promptable segmentation model.
The promptable model then segments the entire OoD object rather than thresholding individual pixels.
This removes the need for a manually selected dataset-specific mask threshold.
The framework keeps the anomaly scorer and mask generator modular.
It converts uncertainty evidence into object-shaped masks.

Inputs are RGB images and anomaly score maps from an existing semantic anomaly detector.
The anomaly map can come from methods such as energy, rejection, or reconstruction-based OoD scoring.
S2M transforms high-anomaly regions into prompts such as boxes or points.
A promptable segmentation model consumes those prompts.
Outputs are binary masks for OoD objects.
The output does not include a semantic label for the unknown object.

The pipeline starts with a semantic segmentation or anomaly scoring model.
Anomaly scores identify candidate object regions without committing to a global thresholded mask.
Candidate prompts are produced from the score distribution.
Segment Anything style promptable segmentation generates full object masks.
Mask selection ranks or filters generated masks using the anomaly evidence.
Evaluation measures object-level anomaly segmentation quality rather than only pixel score quality.
Metrics include IoU and mean F1 on anomaly segmentation benchmarks.

S2M is primarily a post-processing and prompting framework around existing anomaly scores.
The CVPR paper evaluates Fishyscapes, Segment-Me-If-You-Can, and RoadAnomaly.
The authors report average improvements of about 20 percent in IoU and 40 percent in mean F1 over prior state of the art.
The method compares against thresholded anomaly maps and other OoD segmentation approaches.
Its performance depends on the upstream anomaly scorer and the promptable segmentation model.
The official code is available from the authors' GitHub repository.

Avoids brittle global threshold selection for anomaly masks.
Produces coherent object masks, which are easier to inspect and track than scattered anomaly pixels.
Can be plugged into multiple anomaly scoring backbones.
Uses foundation segmentation without requiring a new large anomaly dataset.
Evaluation on driving anomaly benchmarks makes it relevant to safety perception.
The output masks are useful for downstream review, tracking, or dataset mining.

If the anomaly scorer misses an object, S2M has no evidence to prompt from.
Promptable segmentation can fail on tiny, distant, overlapping, or heavily occluded objects.
It does not name the anomaly, so operator-facing systems need another labeling module.
High anomaly scores on boundaries, shadows, or sensor artifacts can generate false masks.
It is 2D and does not provide distance, velocity, or occupancy.
Dense groups of anomalies can produce merged or fragmented prompts depending on score topology.

S2M is useful for detecting and masking unexpected apron objects in camera images.
It can convert weak anomaly heatmaps for FOD, debris, animals, or unusual equipment into reviewable masks.
Object masks are better than raw heatmaps for remote assistance and annotation workflows.
Airside use needs upstream anomaly scoring trained or calibrated on apron imagery.
Outputs should be projected into 3D using depth or LiDAR before affecting motion planning.
It is a good data-mining tool for unknown-object collection, not a complete real-time hazard classifier.

Keep the upstream anomaly scorer replaceable; benchmark multiple scorers with the same S2M wrapper.
Store anomaly score maps alongside final masks for audit and threshold-free debugging.
Validate mask quality separately for small FOD, cones, chocks, hoses, and personnel equipment.
Add temporal consistency checks to suppress one-frame mask artifacts.
Use conservative downstream behavior when S2M produces large masks near aircraft or service vehicles.
Combine with semantic naming methods such as Clipomaly if operator-readable labels are required.