Gauss-Newton, Levenberg-Marquardt, and Dogleg

Visual: nonlinear step geometry comparing Gauss-Newton, LM damping, trust-region radius, dogleg path, accept/reject ratio, and failure modes.

Why it matters for AV, perception, SLAM, and mapping

AV backends repeatedly solve the same local question: "Given the residuals and Jacobians at the current state, what update should I apply?" Gauss-Newton, Levenberg-Marquardt, and dogleg are the standard answers for nonlinear least-squares problems in SLAM, calibration, bundle adjustment, scan matching, pose graph optimization, and map alignment.

Gauss-Newton is fast when the current estimate is already good and residuals are close to linear. Levenberg-Marquardt is more forgiving because it damps the step. Dogleg is a trust-region method that blends a cautious gradient step with the full Gauss-Newton step. Production systems often start with a robust method such as LM or dogleg and behave more like Gauss-Newton near convergence.

Core math and algorithm steps

For whitened residuals F(x), linearize at x:

text
F(x + Delta) ~= F + J * Delta

The local quadratic model is:

text
m(Delta) = 0.5 * ||F + J Delta||^2
         = 0.5 * F^T F + g^T Delta + 0.5 * Delta^T H Delta

where:

text
g = J^T F
H = J^T J

All three methods differ mainly in how they choose or constrain Delta.
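
As a minimal numerical sketch (assuming NumPy, with F a 1-D whitened residual vector and J its Jacobian; the names are illustrative, not from any particular library), the model quantities follow directly from the definitions above:

python
import numpy as np

def quadratic_model(F, J, delta):
    """Evaluate m(delta) = 0.5 * ||F + J delta||^2 for whitened residuals."""
    g = J.T @ F              # gradient term g = J^T F
    H = J.T @ J              # Gauss-Newton Hessian approximation H = J^T J
    return 0.5 * (F @ F) + g @ delta + 0.5 * delta @ (H @ delta)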

Gauss-Newton

Gauss-Newton minimizes the local linear least-squares model directly:

text
Delta_gn = argmin_Delta 0.5 * ||F + J Delta||^2

The normal equations are:

text
H * Delta_gn = -g

Algorithm:

  1. Evaluate residuals F and Jacobian J.
  2. Form or implicitly represent H = J^T J and g = J^T F.
  3. Solve H Delta = -g.
  4. Update x <- x boxplus Delta.
  5. Stop when gradient, step, or cost reduction is small.
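
A minimal sketch of one such iteration, assuming NumPy, dense linear algebra, a plain Euclidean state, and user-supplied residual and jacobian callables (illustrative names, not a specific solver API):

python
import numpy as np

def gauss_newton_step(residual, jacobian, x):
    """One Gauss-Newton update for a plain Euclidean state (no manifold boxplus)."""
    F = residual(x)                 # whitened residual vector at x
    J = jacobian(x)                 # Jacobian of F at x
    g = J.T @ F
    H = J.T @ J
    delta = np.linalg.solve(H, -g)  # normal equations: H delta = -g
    return x + delta, np.linalg.norm(g), np.linalg.norm(delta)

A production solver would exploit sparsity in H, apply the update with a manifold boxplus, and wrap this step in the stopping tests from step 5.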

Why it works: the exact Hessian of 0.5 * ||F||^2 is:

text
sum_i grad f_i grad f_i^T + sum_i f_i * Hessian(f_i)

Gauss-Newton keeps the first term J^T J and drops the residual-weighted second-derivative terms. This is accurate when residuals are small near the solution or when the model is close to linear.

Strengths:

  • Efficient for least-squares problems.
  • Natural fit for sparse factor graphs.
  • Fast local convergence when initialized well.
  • No extra damping parameter to tune.

Weaknesses:

  • Can diverge when initialized far from the solution.
  • Can fail when J is rank deficient or nearly rank deficient.
  • Sensitive to unmodeled outliers unless residuals are robustified.

Levenberg-Marquardt

Levenberg-Marquardt modifies the linear system with damping:

text
(H + lambda * D) * Delta_lm = -g

Common choices for D include I or diag(H). Ceres describes LM as a trust-region strategy that regularizes the step with a diagonal matrix. Nocedal and Wright describe LM as using the Gauss-Newton Hessian approximation with a trust-region strategy, which helps when the Jacobian is rank deficient or nearly so.

Behavior:

  • Small lambda: behaves like Gauss-Newton.
  • Large lambda: behaves more like a scaled gradient descent step.

A typical trust-region-style acceptance test compares predicted and actual cost reduction through the gain ratio:

text
predicted = m(0) - m(Delta)
actual    = cost(x) - cost(x boxplus Delta)
rho       = actual / predicted

Then:

  • If rho is good, accept the step and decrease lambda.
  • If rho is poor or negative, reject the step and increase lambda.

Algorithm:

  1. Evaluate F, J, H, and g.
  2. Choose an initial damping lambda.
  3. Solve (H + lambda D) Delta = -g.
  4. Evaluate trial cost at x boxplus Delta.
  5. Compare actual and predicted reduction.
  6. Accept or reject the trial state.
  7. Adjust lambda and repeat.
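
A minimal sketch of this loop, assuming NumPy, D = diag(H), a Euclidean state, and a simple multiplicative lambda policy; the callables and thresholds are illustrative, and library implementations such as Ceres use more refined update rules:

python
import numpy as np

def lm_iterate(residual, jacobian, x, lam=1e-4, max_iters=50, tol=1e-8):
    """Minimal Levenberg-Marquardt loop with D = diag(H) damping."""
    F = residual(x)
    cost = 0.5 * (F @ F)
    for _ in range(max_iters):
        J = jacobian(x)
        g = J.T @ F
        H = J.T @ J
        D = np.diag(np.diag(H))                    # scale-aware damping matrix
        delta = np.linalg.solve(H + lam * D, -g)
        F_trial = residual(x + delta)
        trial_cost = 0.5 * (F_trial @ F_trial)
        predicted = -(g @ delta) - 0.5 * delta @ (H @ delta)   # m(0) - m(delta)
        rho = (cost - trial_cost) / max(predicted, np.finfo(float).tiny)
        if rho > 0:                                # model agreed: accept, trust it more
            x, F, cost = x + delta, F_trial, trial_cost
            lam = max(lam / 3.0, 1e-12)
        else:                                      # model disagreed: reject, damp harder
            lam = min(lam * 10.0, 1e12)
        if np.linalg.norm(g) < tol or np.linalg.norm(delta) < tol:
            break
    return x, cost, lam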

Strengths:

  • More robust than pure Gauss-Newton for poor initialization.
  • Handles rank deficiency better through damping.
  • Widely available in Ceres, GTSAM, and g2o-style solvers.

Weaknesses:

  • Damping policy matters.
  • Too much damping causes slow gradient-descent-like progress.
  • Too little damping can still take unstable steps.
  • The same bad residual modeling that breaks Gauss-Newton can break LM.

Dogleg

Dogleg is a trust-region method that chooses a step inside:

text
||Delta|| <= radius

It uses two candidate directions:

text
Delta_sd = steepest descent / Cauchy step
Delta_gn = Gauss-Newton step

The classic dogleg path is:

text
0 -> Delta_sd -> Delta_gn

The selected step is:

  • Delta_gn if it is inside the trust region.
  • A scaled steepest descent step if the Cauchy point is outside.
  • A point along the segment from Delta_sd to Delta_gn if the full Gauss-Newton step is outside but the Cauchy point is inside.

Algorithm:

  1. Evaluate residuals and Jacobians.
  2. Compute the Gauss-Newton step.
  3. Compute the steepest descent or Cauchy step.
  4. Select the point on the dogleg path that fits the current trust-region radius.
  5. Evaluate actual versus predicted reduction.
  6. Accept or reject the step and update the radius.
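
A minimal sketch of the step selection alone, assuming NumPy, a well-posed dense system, and a precomputed trust-region radius; accept/reject and radius updates follow the same gain-ratio pattern shown for LM:

python
import numpy as np

def dogleg_step(g, H, radius):
    """Pick a step on the dogleg path 0 -> delta_sd -> delta_gn inside the radius."""
    alpha = (g @ g) / (g @ (H @ g))   # Cauchy step length along -g (assumes g^T H g > 0)
    delta_sd = -alpha * g
    delta_gn = np.linalg.solve(H, -g)

    if np.linalg.norm(delta_gn) <= radius:
        return delta_gn                                          # full Gauss-Newton step
    if np.linalg.norm(delta_sd) >= radius:
        return (radius / np.linalg.norm(delta_sd)) * delta_sd    # truncated steepest descent
    # Walk from the Cauchy point toward the Gauss-Newton step until hitting the boundary.
    d = delta_gn - delta_sd
    a, b = d @ d, 2.0 * (delta_sd @ d)
    c = delta_sd @ delta_sd - radius ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return delta_sd + t * d

Because delta_gn does not depend on the radius, it can be computed once and reused while the radius is adjusted, which is the source of the cost advantage noted above.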

Strengths:

  • Often cheaper than solving multiple damped LM systems.
  • Gives a clear trust-region interpretation.
  • Works well when the Gauss-Newton step is high quality near the solution.

Weaknesses:

  • Usually assumes a direct solve for the Gauss-Newton step.
  • Less flexible than LM in some sparse or highly rank-deficient systems.
  • Trust-region radius policy still matters.

Choosing among them

Use Gauss-Newton when:

  • Initialization is reliable.
  • Residual models are smooth and well scaled.
  • The graph is well constrained.
  • You need predictable low-latency iterations.

Use Levenberg-Marquardt when:

  • Initialization can be poor.
  • Loop closures or calibration parameters create difficult transients.
  • Rank deficiency is likely.
  • You want a robust default for batch optimization.

Use dogleg when:

  • You want trust-region robustness without repeated LM damping solves.
  • The problem has a stable Gauss-Newton direction.
  • Direct sparse factorization is available.

Implementation notes

  • Always compute the gain ratio using the same residual scaling used by the optimizer.
  • Never tune lambda, trust-region radius, or robust loss thresholds before residual whitening is correct.
  • Use manifold boxplus updates for rotations and poses; adding directly to quaternion coefficients creates invalid states (see the sketch after this list).
  • Keep a fixed ordering policy for reproducible linear solver behavior; use fill-reducing orderings for large sparse graphs.
  • In bundle adjustment, exploit the camera-landmark block structure with Schur complement solvers.
  • In sliding-window VIO, watch marginalization priors. Bad priors can make any method look unstable.
  • Log accepted and rejected step counts. Many rejected steps indicate poor initialization, bad scaling, wrong Jacobians, or a trust-region policy mismatch.
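
To illustrate the boxplus note above, a minimal sketch of a rotation-only update, assuming SciPy and a (w, x, y, z) quaternion convention for the state; the tangent-space increment is applied through the exponential map instead of being added to the coefficients:

python
import numpy as np
from scipy.spatial.transform import Rotation as R

def boxplus_rotation(q_wxyz, delta_rotvec):
    """Apply a 3-vector tangent increment to a unit quaternion on the manifold."""
    q_xyzw = np.roll(np.asarray(q_wxyz), -1)                   # SciPy expects (x, y, z, w)
    rot = R.from_quat(q_xyzw) * R.from_rotvec(delta_rotvec)    # right-multiplicative update
    return np.roll(rot.as_quat(), 1)                           # back to (w, x, y, z), unit norm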

Failure modes and diagnostics

  • Gauss-Newton cost increases: Add globalization, reduce step size, switch to LM, or improve initialization.
  • LM stuck with huge damping: Check residual scale, outliers, incorrect Jacobians, and whether the trial steps are outside the model validity region.
  • Dogleg repeatedly takes tiny Cauchy steps: The Gauss-Newton direction is not trusted. Inspect conditioning, rank, and covariance scaling.
  • Linear solver failure: Check gauge freedom, duplicate variables, disconnected graph components, and near-zero covariance values.
  • False convergence: Step norm is small because damping is huge, not because the optimum is reached. Inspect gradient norm and residual distribution.
  • Oscillation: Robust loss thresholds, poor measurement covariances, or inconsistent loop closures can make successive accepted steps fight each other.
  • Manifold update bug: Cost may change discontinuously near angle wrapping or quaternion normalization. Test residuals across small perturbations.

Concept Cards

Gauss-Newton step

  • What it means here: The update that minimizes the current linearized least-squares model.
  • Math object: Delta_gn = argmin_Delta 0.5 * ||F + J Delta||^2, often solved as J^T J Delta = -J^T F.
  • Effect on the solve: Gives a fast local step when the residual model is smooth and the current estimate is close.
  • What it solves: Efficient local least-squares progress for well-scaled, well-constrained problems.
  • What it does not solve: It does not protect against poor initialization, outliers, rank deficiency, or invalid local models.
  • Minimal example: One bundle-adjustment iteration from a visual-inertial initialization.
  • Failure symptoms: Cost increases, the step is too large, or the direction is dominated by weak or badly scaled variables.
  • Diagnostic artifact: Predicted-versus-actual reduction, step norm, gradient norm, and linear solver summary.
  • Normal vs abnormal artifact: Normal actual reduction tracks prediction for accepted steps; abnormal prediction is optimistic or has the wrong sign.
  • First debugging move: Evaluate the objective along scaled versions of the Gauss-Newton step.
  • Do not confuse with: Cholesky, QR, Schur, or any particular linear backend.
  • Read next: Solver Selection and Convergence Diagnosis.

Levenberg-Marquardt damping

  • What it means here: A numerical step-control term that makes the local solve more conservative when the Gauss-Newton model is unreliable.
  • Math object: (J^T J + lambda D) Delta = -J^T F.
  • Effect on the solve: Interpolates between Gauss-Newton-like behavior at low damping and scaled-gradient behavior at high damping.
  • What it solves: Stabilizes aggressive or ill-conditioned local steps and supports trust-region-style rejection and retry.
  • What it does not solve: It does not add real sensor information, replace a prior, fix gauge freedom, or repair a wrong residual.
  • Minimal example: Increase lambda after a rejected calibration trial step, then retry with a smaller update.
  • Failure symptoms: Damping grows, step norm becomes tiny, cost barely changes, and the solver reports convergence with a bad artifact.
  • Diagnostic artifact: Damping trace, gain ratio, accepted/rejected step counts, gradient norm, and final termination reason.
  • Normal vs abnormal artifact: Normal damping decreases after reliable accepted steps; abnormal damping remains huge or grows while progress stalls.
  • First debugging move: Check residual whitening, Jacobian consistency, and rank before tuning damping policy.
  • Do not confuse with: A measurement prior, a robust loss, or added sensor information; damping only controls the numerical step.
  • Read next: Nonlinear Solver Diagnostics Crosswalk.

Dogleg step

  • What it means here: A trust-region step selected along a piecewise path from steepest descent toward the Gauss-Newton step.
  • Math object: A point on 0 -> Delta_sd -> Delta_gn constrained by ||Delta|| <= radius.
  • Effect on the solve: Bounds the update while still using the Gauss-Newton direction when it is trusted.
  • What it solves: Provides trust-region step control without repeatedly solving damped LM systems.
  • What it does not solve: It does not make an indefinite, rank-broken, or badly modeled problem well posed.
  • Minimal example: Clip a pose-graph update to the trust-region boundary when the full Gauss-Newton step is too long.
  • Failure symptoms: The selected step stays near steepest descent, the radius shrinks repeatedly, or gain ratios remain poor.
  • Diagnostic artifact: Trust-region radius, selected dogleg segment, Gauss-Newton step norm, Cauchy step norm, and gain ratio.
  • Normal vs abnormal artifact: Normal segments move toward full Gauss-Newton near convergence; abnormal traces stay clipped with low actual reduction.
  • First debugging move: Compare the steepest-descent, dogleg, and Gauss-Newton candidate costs on the same trial-state path.
  • Do not confuse with: Line-search step length or linear solver choice.
  • Read next: Solver Selection and Convergence Diagnosis.

False convergence

  • What it means here: A solver stops because a numeric stopping rule is satisfied even though the optimized artifact is still wrong or unsupported.
  • Math object: A tolerance trigger on step norm, cost change, gradient norm, iteration count, or damping-limited update.
  • Effect on the solve: Ends iterations and can make a bad local state look like a successful optimization.
  • What it solves: It only satisfies a stopping criterion.
  • What it does not solve: It does not validate residual design, scale, gauge handling, observability, or downstream product quality.
  • Minimal example: LM stops on small step norm because damping is enormous, while residual histograms remain biased.
  • Failure symptoms: Good-looking solver summary with bad map, calibration, or trajectory; tiny steps with non-small gradient; high damping at termination.
  • Diagnostic artifact: Final termination reason, damping trace, gradient norm, residual histograms, and accepted-step history.
  • Normal vs abnormal artifact: Normal convergence has consistent cost, gradient, step, and residual artifacts; abnormal convergence satisfies one weak tolerance while diagnostics disagree.
  • First debugging move: Read the exact termination reason and compare it against residual distributions and physical validation checks.
  • Do not confuse with: Successful convergence, local minimum certification, or rank resolution.
  • Read next: Nonlinear Solver Diagnostics Crosswalk.

Sources

Research notes compiled from publicly available sources.