
Robust Losses and M-Estimators: Huber, Cauchy, Tukey, and Geman-McClure


Visual: robust loss comparison plot showing squared, Huber, Cauchy, Tukey, and Geman-McClure loss, influence, IRLS weight, whitening scale, and outlier rejection behavior.

Robust losses are the small mathematical wrapper that keeps a large estimator from believing every residual equally. In SLAM, perception, calibration, and tracking, the residual model is often locally useful but globally imperfect: feature matches can be wrong, LiDAR points can belong to moving objects, radar returns can be ghosts, GNSS can jump under multipath, and labels can be noisy. Under squared loss, the influence of a residual grows without bound as the residual grows, so a few gross errors can dominate the fit. Robust losses reduce that influence once a residual becomes too large to look like an inlier.

Why it matters for SLAM and perception

Robust losses show up wherever a stack minimizes residuals:

  • ICP and point-to-plane registration downweight dynamic objects, mixed pixels, rain or snow speckles, and bad nearest-neighbor matches.
  • Visual odometry and bundle adjustment downweight wrong feature tracks, rolling-shutter artifacts, occlusions, and reprojection outliers.
  • Pose graph SLAM downweights false loop closures and weak GPS or map priors.
  • GNSS and radar fusion downweight multipath and ghost returns after gating.
  • Perception training uses Smooth L1 or Huber-style losses for box regression when labels or decoded boxes have heavy-tailed errors.

The loss is not a substitute for the sensor model. A robust kernel should be applied after residuals are in comparable statistical units.

Core math

There are two related conventions to keep separate:

  • A scalar teaching convention writes the loss as rho_r(r), with influence psi(r) = d rho_r(r) / d r and IRLS weight w(r) = psi(r) / r.
  • Many solver and library APIs take the squared whitened residual norm s = ||e||^2, write the robustifier as rho_s(s), and use derivatives with respect to s. Library constants and factors of 2 therefore vary.

For a whitened scalar residual r, ordinary least squares minimizes:

text
rho_r(r) = 0.5 * r^2

The influence function is:

text
psi(r) = d rho_r(r) / d r

For squared loss:

text
psi(r) = r

Influence grows without bound, so one very large residual can dominate the solution. An M-estimator replaces the quadratic penalty:

text
min_x sum_i rho_r(r_i(x))

Many solvers implement this through iteratively reweighted least squares (IRLS):

text
w(r) = psi(r) / r
min_dx 0.5 * sum_i w(r_i) * (r_i + J_i dx)^2
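
As a minimal sketch of the IRLS idea, the loop below fits a scalar linear model y ~ a * x with Huber weights. The helper names (`huber_weight`, `irls_slope`), the threshold k = 1.345, and the data are illustrative, not any library's API:

```python
def huber_weight(r, k=1.345):
    # IRLS weight w(r) = psi(r) / r for the Huber loss:
    # 1 inside the inlier band, k / |r| outside it.
    return 1.0 if abs(r) <= k else k / abs(r)

def irls_slope(xs, ys, k=1.345, iters=20):
    # Plain least-squares initialization.
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    for _ in range(iters):
        # Reweight by the current residuals, then solve the weighted
        # least-squares problem in closed form for the scalar slope.
        weights = [huber_weight(y - a * x, k) for x, y in zip(xs, ys)]
        a = (sum(w * x * y for w, x, y in zip(weights, xs, ys))
             / sum(w * x * x for w, x in zip(weights, xs)))
    return a

# Inliers follow y = 2x; the last point is a gross outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 50.0]
a_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
a_robust = irls_slope(xs, ys)
print(a_ls, a_robust)
```

The unweighted least-squares slope is pulled well above 2 by the outlier, while the Huber-reweighted estimate stays close to the inlier slope.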

For vector residual blocks in a solver, use the whitened squared norm:

text
e_i = L_i r_i
s_i = ||L_i r_i||^2
cost_i = rho_s(s_i)

where L_i is the square-root information matrix. Robust thresholds are then in whitened units, not raw meters, pixels, radians, or Doppler units.

Common robust losses

For residual r and scale k:

| Loss | Behavior | Typical use |
| --- | --- | --- |
| Squared | Influence grows linearly forever. | Clean Gaussian inliers with reliable associations. |
| Huber | Quadratic near zero, linear after k. | General-purpose SLAM, GNSS, visual reprojection, and mild outliers. |
| Cauchy | Logarithmic growth for large residuals. | More aggressive scan matching and map localization outlier rejection. |
| Tukey bisquare | Redescending; large residuals get near-zero influence. | Good initialization with clear gross outliers. |
| Geman-McClure | Bounded, nonconvex, strong rejection. | Robust PGO, GNC schedules, and high-outlier loop closure problems. |

Huber loss:

text
rho_r(r) = 0.5 * r^2                 if |r| <= k
rho_r(r) = k * (|r| - 0.5 * k)       otherwise

w(r) = 1                           if |r| <= k
w(r) = k / |r|                     otherwise

Cauchy-style loss:

text
rho_r(r) = 0.5 * k^2 * log(1 + (r / k)^2)
w(r) = 1 / (1 + (r / k)^2)

Tukey bisquare:

text
rho_r(r) = (k^2 / 6) * (1 - (1 - (r / k)^2)^3)   if |r| <= k
rho_r(r) = k^2 / 6                               otherwise

w(r) = (1 - (r / k)^2)^2                       if |r| <= k
w(r) = 0                                       otherwise

Geman-McClure-style loss:

text
rho_r(r) = r^2 / (r^2 + k^2)
w(r) proportional to k^2 / (r^2 + k^2)^2

For Geman-McClure, that weight expression holds only up to a constant factor, depending on whether the loss is written as a scalar residual loss rho_r(r) or as a squared-norm solver loss rho_s(s); the exact constants vary across libraries. The operational question is the same either way: how quickly should a residual lose influence once it leaves the inlier noise band?
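
The four weight functions above can be compared side by side in a few lines. This is an illustrative sketch of the formulas as written here (the Geman-McClure weight is up to a constant factor, and the shared scale k is arbitrary), not any particular library's implementation:

```python
def w_huber(r, k):
    # 1 inside the inlier band, k / |r| outside: bounded but non-redescending.
    return 1.0 if abs(r) <= k else k / abs(r)

def w_cauchy(r, k):
    # Decays smoothly toward zero but never reaches it.
    return 1.0 / (1.0 + (r / k) ** 2)

def w_tukey(r, k):
    # Redescending: exactly zero beyond the cutoff k.
    return (1.0 - (r / k) ** 2) ** 2 if abs(r) <= k else 0.0

def w_geman_mcclure(r, k):
    # Up to a constant normalization, as noted in the text.
    return k ** 2 / (r ** 2 + k ** 2) ** 2

k = 1.345
for r in [0.5, 2.0, 10.0]:
    print(r, w_huber(r, k), w_cauchy(r, k), w_tukey(r, k), w_geman_mcclure(r, k))
```

At r = 10 with k = 1.345, Huber still leaves a weight of k/10, Cauchy a small positive weight, while Tukey gives exactly zero: the redescending behavior the tables describe.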

Choosing a loss

| Situation | Prefer | Reason |
| --- | --- | --- |
| First robustification pass | Huber | Convex and easier to optimize. |
| Scan matching with dynamic clutter | Cauchy or Huber | Keeps useful medium residuals while weakening strong outliers. |
| False loop closures after verification | GNC, switchable constraints, or Geman-McClure-style losses | Direct nonconvex robust losses can need a schedule or switch model. |
| Good initialization and gross outliers | Tukey or Geman-McClure | Redescending behavior can fully suppress bad measurements. |
| Poor initialization | Least squares, Huber, or graduated robustification | Strong redescending losses can ignore residuals needed for recovery. |
| Safety evidence and diagnostics | Huber plus gates and residual histograms | Easier to explain, tune, and audit. |

Use robust losses only on factors that can plausibly be outliers. IMU preintegration, odometry continuity, and hard frame constraints usually need different treatment from camera feature tracks, loop closures, radar detections, or GNSS fixes.

Whitening and scale

Robust loss scale should usually be set after whitening:

text
e = L * r
s = ||e||^2
cost = rho_s(s)

This is why a Huber threshold such as k = 1.345 means roughly "number of sigmas" in many factor-graph APIs, not 1.345 meters or pixels. If residuals are not whitened first, a threshold that works for LiDAR meters will be meaningless for camera pixels or IMU bias units.

Practical checks:

  • Plot whitened residual histograms before and after robustification.
  • Plot robust weights against residual magnitude.
  • Separate residuals by sensor, range, class, weather, and map region.
  • Log how many residuals are nearly ignored by a redescending loss.
  • Compare pre-gate and post-gate innovation statistics to avoid selection bias.
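
The fourth check, counting residuals a redescending loss nearly ignores, can be sketched in a few lines. The Tukey cutoff k = 4.685, the weight-floor 1e-3, and the residual values are illustrative assumptions:

```python
def w_tukey(r, k=4.685):
    # Tukey bisquare IRLS weight; exactly zero beyond the cutoff k.
    return (1.0 - (r / k) ** 2) ** 2 if abs(r) <= k else 0.0

# Made-up whitened residuals from one solver iteration.
whitened = [0.2, -1.1, 0.7, 9.5, -0.4, 12.0, 1.8]
weights = [w_tukey(r) for r in whitened]

# Count residuals the redescending loss effectively ignores.
ignored = sum(w < 1e-3 for w in weights)
frac_ignored = ignored / len(whitened)
print(ignored, frac_ignored)
```

Logging this fraction over time makes it visible when a loss that was tuned for occasional outliers starts silently discarding a large share of the data.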

Failure modes

| Symptom | Likely cause | Diagnostic |
| --- | --- | --- |
| One sensor still dominates | Covariance is too tight before robustification. | Inspect per-factor whitened residuals and weights. |
| Good data is ignored | Robust threshold is too low. | Plot weights and the inlier residual distribution. |
| Solver converges to a bad local minimum | Nonconvex loss was applied too early. | Start with Huber or use graduated nonconvexity. |
| Map jumps after a loop closure | A robust loss alone cannot fix a plausible but wrong association. | Add geometric verification, switchable constraints, or loop quarantine. |
| Validation looks good only after rejection | Selection bias from gates and robust weights. | Report pre-gate and post-gate statistics separately. |
| Robust kernel hides model bias | Residual model is wrong by range, class, weather, or timing. | Bin residuals by operating condition and fix the model. |
