Gauss-Newton, Levenberg-Marquardt, and Dogleg
Visual: nonlinear step geometry comparing Gauss-Newton, LM damping, trust-region radius, dogleg path, accept/reject ratio, and failure modes.
Related docs
- Nonlinear Least Squares from First Principles
- Trust Region and Line Search Globalization
- Jacobians, Autodiff, and Manifold Linearization
- Factor Graph Solver Patterns: Ceres, GTSAM, and g2o
- Nonlinear Solver Diagnostics Crosswalk
- Solver Selection and Convergence Diagnosis
- Bundle Adjustment SLAM
Why it matters for AV, perception, SLAM, and mapping
AV backends repeatedly solve the same local question: "Given the residuals and Jacobians at the current state, what update should I apply?" Gauss-Newton, Levenberg-Marquardt, and dogleg are the standard answers for nonlinear least-squares problems in SLAM, calibration, bundle adjustment, scan matching, pose graph optimization, and map alignment.
Gauss-Newton is fast when the current estimate is already good and residuals are close to linear. Levenberg-Marquardt is more forgiving because it damps the step. Dogleg is a trust-region method that blends a cautious gradient step with the full Gauss-Newton step. Production systems often start with a robust method such as LM or dogleg and behave more like Gauss-Newton near convergence.
Core math and algorithm steps
For whitened residuals F(x), linearize at x:
F(x + Delta) ~= F + J * Delta
The local quadratic model is:
m(Delta) = 0.5 * ||F + J Delta||^2
         = 0.5 * F^T F + g^T Delta + 0.5 * Delta^T H Delta
where:
g = J^T F
H = J^T J
All three methods differ mainly in how they choose or constrain Delta.
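As a minimal numerical sketch of these quantities, assuming NumPy and made-up residual and Jacobian values (purely illustrative, not any particular solver's code):

```python
import numpy as np

def quadratic_model(F, J):
    """Return g, H, and the local model m(Delta) = 0.5 * ||F + J Delta||^2."""
    g = J.T @ F           # gradient of 0.5 * ||F||^2 at the linearization point
    H = J.T @ J           # Gauss-Newton Hessian approximation
    c = 0.5 * F @ F       # current cost, m(0)
    return g, H, lambda delta: c + g @ delta + 0.5 * delta @ (H @ delta)

# Hypothetical 3-residual, 2-parameter example.
F = np.array([0.5, -0.2, 0.1])
J = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
g, H, m = quadratic_model(F, J)
print(g, H, m(np.zeros(2)))
```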
Gauss-Newton
Gauss-Newton minimizes the local linear least-squares model directly:
Delta_gn = argmin_Delta 0.5 * ||F + J Delta||^2
The normal equations are:
H * Delta_gn = -g
Algorithm:
- Evaluate residuals F and Jacobian J.
- Form or implicitly represent H = J^T J and g = J^T F.
- Solve H Delta = -g.
- Update x <- x boxplus Delta.
- Stop when gradient, step, or cost reduction is small.
Why it works: the exact Hessian of 0.5 * ||F||^2 is:
sum_i grad f_i grad f_i^T + sum_i f_i * Hessian(f_i)
Gauss-Newton keeps the first term J^T J and drops the residual-weighted second-derivative terms. This is accurate when residuals are small near the solution or when the model is close to linear.
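A minimal sketch of Gauss-Newton iterations on a toy curve-fitting problem, assuming NumPy and a forward-difference Jacobian as a stand-in for analytic or automatic derivatives (a real backend would also exploit sparsity):

```python
import numpy as np

def numerical_jacobian(residual, x, eps=1e-7):
    """Forward-difference Jacobian; a stand-in for autodiff or analytic derivatives."""
    F0 = residual(x)
    J = np.zeros((F0.size, x.size))
    for k in range(x.size):
        xk = x.copy()
        xk[k] += eps
        J[:, k] = (residual(xk) - F0) / eps
    return J

def gauss_newton(residual, x, iters=20, tol=1e-10):
    for _ in range(iters):
        F = residual(x)
        J = numerical_jacobian(residual, x)
        # Solve the normal equations H Delta = -g with H = J^T J, g = J^T F.
        # lstsq on (J, -F) is equivalent and better conditioned than forming H explicitly.
        delta = np.linalg.lstsq(J, -F, rcond=None)[0]
        x = x + delta                      # Euclidean update; use boxplus on manifolds
        if np.linalg.norm(delta) < tol:
            break
    return x

# Hypothetical toy problem: fit y = a * exp(b * t) to samples.
t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
print(gauss_newton(residual, np.array([1.0, 0.0])))
```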
Strengths:
- Efficient for least-squares problems.
- Natural fit for sparse factor graphs.
- Fast local convergence when initialized well.
- No extra damping parameter to tune.
Weaknesses:
- Can diverge when initialized far from the solution.
- Can fail when J is rank deficient or nearly rank deficient.
- Sensitive to unmodeled outliers unless residuals are robustified.
Levenberg-Marquardt
Levenberg-Marquardt modifies the linear system with damping:
(H + lambda * D) * Delta_lm = -g
Common choices for D include I or diag(H). Ceres describes LM as a trust-region strategy that regularizes the step with a diagonal matrix. Nocedal and Wright describe LM as using the Gauss-Newton Hessian approximation with a trust-region strategy, which helps when the Jacobian is rank deficient or nearly so.
Behavior:
- Small lambda: behaves like Gauss-Newton.
- Large lambda: behaves more like a scaled gradient-descent step.
Typical trust-region update:
predicted = m(0) - m(Delta)
actual = cost(x) - cost(x boxplus Delta)
rho = actual / predicted
Then:
- If rho is good, accept the step and decrease lambda.
- If rho is poor or negative, reject the step and increase lambda.
Algorithm:
- Evaluate F, J, H, and g.
- Choose an initial damping lambda.
- Solve (H + lambda D) Delta = -g.
- Evaluate trial cost at x boxplus Delta.
- Compare actual and predicted reduction.
- Accept or reject the trial state.
- Adjust lambda and repeat.
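A minimal dense sketch of this loop, assuming NumPy, a user-supplied residual and Jacobian function, and a simple halve-or-quadruple damping policy (production solvers use more refined damping and radius update rules):

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x, lam=1e-3, iters=50, tol=1e-10):
    F = residual(x)
    cost = 0.5 * F @ F
    for _ in range(iters):
        J = jacobian(x)
        g = J.T @ F
        H = J.T @ J
        D = np.diag(np.diag(H)) + 1e-12 * np.eye(x.size)       # diag(H) scaling
        delta = np.linalg.solve(H + lam * D, -g)

        x_trial = x + delta                                     # boxplus on manifolds
        F_trial = residual(x_trial)
        trial_cost = 0.5 * F_trial @ F_trial

        predicted = -(g @ delta + 0.5 * delta @ (H @ delta))    # m(0) - m(Delta)
        rho = (cost - trial_cost) / max(predicted, 1e-18)       # gain ratio

        if rho > 0:            # accept: the model predicted a real reduction
            x, F, cost = x_trial, F_trial, trial_cost
            lam = max(lam * 0.5, 1e-12)
        else:                  # reject: shrink the step by raising damping
            lam = min(lam * 4.0, 1e12)

        if np.linalg.norm(delta) < tol:
            break
    return x
```

The residual and Jacobian callables can be the same toy functions used in the Gauss-Newton sketch above.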
Strengths:
- More robust than pure Gauss-Newton for poor initialization.
- Handles rank deficiency better through damping.
- Widely available in Ceres, GTSAM, and g2o-style solvers.
Weaknesses:
- Damping policy matters.
- Too much damping causes slow gradient-descent-like progress.
- Too little damping can still take unstable steps.
- The same bad residual modeling that breaks Gauss-Newton can break LM.
Dogleg
Dogleg is a trust-region method that chooses a step inside:
||Delta|| <= radius
It uses two candidate directions:
Delta_sd = steepest descent / Cauchy step
Delta_gn = Gauss-Newton step
The classic dogleg path is:
0 -> Delta_sd -> Delta_gn
The selected step is:
- Delta_gn if it is inside the trust region.
- A scaled steepest-descent step if the Cauchy point is outside the trust region.
- A point along the segment from Delta_sd to Delta_gn if the full Gauss-Newton step is outside but the Cauchy point is inside.
Algorithm:
- Evaluate residuals and Jacobians.
- Compute the Gauss-Newton step.
- Compute the steepest descent or Cauchy step.
- Select the point on the dogleg path that fits the current trust-region radius.
- Evaluate actual versus predicted reduction.
- Accept or reject the step and update the radius.
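A minimal sketch of the dogleg step selection for a small dense problem, assuming NumPy and a positive-definite H; step acceptance and radius updates would use the same gain-ratio logic shown for LM above:

```python
import numpy as np

def dogleg_step(g, H, radius):
    """Select the classic dogleg step for the model 0.5 * ||F + J Delta||^2."""
    # Cauchy (steepest-descent) step: minimize the model along -g.
    gHg = g @ (H @ g)
    delta_sd = -(g @ g) / gHg * g
    # Full Gauss-Newton step.
    delta_gn = np.linalg.solve(H, -g)

    if np.linalg.norm(delta_gn) <= radius:
        return delta_gn                                        # GN step is trusted as-is
    if np.linalg.norm(delta_sd) >= radius:
        return radius * delta_sd / np.linalg.norm(delta_sd)    # clipped Cauchy step

    # Otherwise walk from delta_sd toward delta_gn until the trust-region boundary.
    d = delta_gn - delta_sd
    a = d @ d
    b = 2.0 * delta_sd @ d
    c = delta_sd @ delta_sd - radius ** 2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return delta_sd + tau * d
```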
Strengths:
- Often cheaper than solving multiple damped LM systems.
- Gives a clear trust-region interpretation.
- Works well when the Gauss-Newton step is high quality near the solution.
Weaknesses:
- Usually assumes a direct solve for the Gauss-Newton step.
- Less flexible than LM in some sparse or highly rank-deficient systems.
- Trust-region radius policy still matters.
Choosing among them
Use Gauss-Newton when:
- Initialization is reliable.
- Residual models are smooth and well scaled.
- The graph is well constrained.
- You need predictable low-latency iterations.
Use Levenberg-Marquardt when:
- Initialization can be poor.
- Loop closures or calibration parameters create difficult transients.
- Rank deficiency is likely.
- You want a robust default for batch optimization.
Use dogleg when:
- You want trust-region robustness without repeated LM damping solves.
- The problem has a stable Gauss-Newton direction.
- Direct sparse factorization is available.
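As an illustration of how little changes when switching among the three, here is a hedged sketch using GTSAM's Python bindings on a toy 2D pose chain; class and parameter names follow the current bindings and may differ across versions:

```python
import numpy as np
import gtsam

# Tiny pose chain: a prior on pose 1 and one odometry factor to pose 2.
graph = gtsam.NonlinearFactorGraph()
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))
graph.add(gtsam.PriorFactorPose2(1, gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))

initial = gtsam.Values()
initial.insert(1, gtsam.Pose2(0.1, -0.1, 0.05))
initial.insert(2, gtsam.Pose2(1.2, 0.1, -0.05))

# The step strategy is selected by the optimizer class and its params.
gn = gtsam.GaussNewtonOptimizer(graph, initial, gtsam.GaussNewtonParams())
lm = gtsam.LevenbergMarquardtOptimizer(graph, initial, gtsam.LevenbergMarquardtParams())
dl = gtsam.DoglegOptimizer(graph, initial, gtsam.DoglegParams())
print(lm.optimize())
```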
Implementation notes
- Always compute the gain ratio using the same residual scaling used by the optimizer.
- Never tune lambda, trust-region radius, or robust loss thresholds before residual whitening is correct.
- Use manifold boxplus updates for rotations and poses; adding directly to quaternion coefficients creates invalid states (see the sketch after this list).
- Keep a fixed ordering policy for reproducible linear solver behavior; use fill-reducing orderings for large sparse graphs.
- In bundle adjustment, exploit the camera-landmark block structure with Schur complement solvers.
- In sliding-window VIO, watch marginalization priors. Bad priors can make any method look unstable.
- Log accepted and rejected step counts. Many rejected steps indicate poor initialization, bad scaling, wrong Jacobians, or a trust-region policy mismatch.
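A minimal sketch of the manifold boxplus point referenced above, assuming NumPy, SciPy's Rotation class, and a right-multiplicative SO(3) convention (conventions vary by library):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def boxplus_so3(q_xyzw, delta_rotvec):
    """Apply a tangent-space increment via the exponential map: q <- q * Exp(delta)."""
    return (R.from_quat(q_xyzw) * R.from_rotvec(delta_rotvec)).as_quat()

q = R.from_euler("z", 30, degrees=True).as_quat()
delta = np.array([0.0, 0.0, 0.01])              # small tangent-space increment

q_good = boxplus_so3(q, delta)                  # stays a unit quaternion
q_bad = q + np.array([0.0, 0.0, 0.01, 0.0])     # naive addition drifts off the manifold
print(np.linalg.norm(q_good), np.linalg.norm(q_bad))
```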
Failure modes and diagnostics
- Gauss-Newton cost increases: Add globalization, reduce step size, switch to LM, or improve initialization.
- LM stuck with huge damping: Check residual scale, outliers, incorrect Jacobians, and whether the trial steps are outside the model validity region.
- Dogleg repeatedly takes tiny Cauchy steps: The Gauss-Newton direction is not trusted. Inspect conditioning, rank, and covariance scaling.
- Linear solver failure: Check gauge freedom, duplicate variables, disconnected graph components, and near-zero covariance values.
- False convergence: Step norm is small because damping is huge, not because the optimum is reached. Inspect gradient norm and residual distribution.
- Oscillation: Robust loss thresholds, poor measurement covariances, or inconsistent loop closures can make successive accepted steps fight each other.
- Manifold update bug: Cost may change discontinuously near angle wrapping or quaternion normalization. Test residuals across small perturbations.
Concept Cards
Gauss-Newton step
- What it means here: The update that minimizes the current linearized least-squares model.
- Math object: Delta_gn = argmin_Delta 0.5 * ||F + J Delta||^2, often solved as J^T J Delta = -J^T F.
- Effect on the solve: Gives a fast local step when the residual model is smooth and the current estimate is close.
- What it solves: Efficient local least-squares progress for well-scaled, well-constrained problems.
- What it does not solve: It does not protect against poor initialization, outliers, rank deficiency, or invalid local models.
- Minimal example: One bundle-adjustment iteration from a visual-inertial initialization.
- Failure symptoms: Cost increases, the step is too large, or the direction is dominated by weak or badly scaled variables.
- Diagnostic artifact: Predicted-versus-actual reduction, step norm, gradient norm, and linear solver summary.
- Normal vs abnormal artifact: Normal actual reduction tracks prediction for accepted steps; abnormal prediction is optimistic or has the wrong sign.
- First debugging move: Evaluate the objective along scaled versions of the Gauss-Newton step.
- Do not confuse with: Cholesky, QR, Schur, or any particular linear backend.
- Read next: Solver Selection and Convergence Diagnosis.
Levenberg-Marquardt damping
- What it means here: A numerical step-control term that makes the local solve more conservative when the Gauss-Newton model is unreliable.
- Math object: (J^T J + lambda D) Delta = -J^T F.
- Effect on the solve: Interpolates between Gauss-Newton-like behavior at low damping and scaled-gradient behavior at high damping.
- What it solves: Stabilizes aggressive or ill-conditioned local steps and supports trust-region-style rejection and retry.
- What it does not solve: It does not add real sensor information, replace a prior, fix gauge freedom, or repair a wrong residual.
- Minimal example: Increase lambda after a rejected calibration trial step, then retry with a smaller update.
- Failure symptoms: Damping grows, step norm becomes tiny, cost barely changes, and the solver reports convergence with a bad artifact.
- Diagnostic artifact: Damping trace, gain ratio, accepted/rejected step counts, gradient norm, and final termination reason.
- Normal vs abnormal artifact: Normal damping decreases after reliable accepted steps; abnormal damping remains huge or grows while progress stalls.
- First debugging move: Check residual whitening, Jacobian consistency, and rank before tuning damping policy.
- Do not confuse with: A prior or extra measurement; damping stabilizes a numerical step but does not add real sensor information.
- Read next: Nonlinear Solver Diagnostics Crosswalk.
Dogleg step
- What it means here: A trust-region step selected along a piecewise path from steepest descent toward the Gauss-Newton step.
- Math object: A point on 0 -> Delta_sd -> Delta_gn constrained by ||Delta|| <= radius.
- Effect on the solve: Bounds the update while still using the Gauss-Newton direction when it is trusted.
- What it solves: Provides trust-region step control without repeatedly solving damped LM systems.
- What it does not solve: It does not make an indefinite, rank-broken, or badly modeled problem well posed.
- Minimal example: Clip a pose-graph update to the trust-region boundary when the full Gauss-Newton step is too long.
- Failure symptoms: The selected step stays near steepest descent, the radius shrinks repeatedly, or gain ratios remain poor.
- Diagnostic artifact: Trust-region radius, selected dogleg segment, Gauss-Newton step norm, Cauchy step norm, and gain ratio.
- Normal vs abnormal artifact: Normal segments move toward full Gauss-Newton near convergence; abnormal traces stay clipped with low actual reduction.
- First debugging move: Compare the steepest-descent, dogleg, and Gauss-Newton candidate costs on the same trial-state path.
- Do not confuse with: Line-search step length or linear solver choice.
- Read next: Solver Selection and Convergence Diagnosis.
False convergence
- What it means here: A solver stops because a numeric stopping rule is satisfied even though the optimized artifact is still wrong or unsupported.
- Math object: A tolerance trigger on step norm, cost change, gradient norm, iteration count, or damping-limited update.
- Effect on the solve: Ends iterations and can make a bad local state look like a successful optimization.
- What it solves: It only satisfies a stopping criterion.
- What it does not solve: It does not validate residual design, scale, gauge handling, observability, or downstream product quality.
- Minimal example: LM stops on small step norm because damping is enormous, while residual histograms remain biased.
- Failure symptoms: Good-looking solver summary with bad map, calibration, or trajectory; tiny steps with non-small gradient; high damping at termination.
- Diagnostic artifact: Final termination reason, damping trace, gradient norm, residual histograms, and accepted-step history.
- Normal vs abnormal artifact: Normal convergence has consistent cost, gradient, step, and residual artifacts; abnormal convergence satisfies one weak tolerance while diagnostics disagree.
- First debugging move: Read the exact termination reason and compare it against residual distributions and physical validation checks.
- Do not confuse with: Successful convergence, local minimum certification, or rank resolution.
- Read next: Nonlinear Solver Diagnostics Crosswalk.
Sources
- Ceres Solver, "Solving Non-linear Least Squares": https://ceres-solver.readthedocs.io/latest/nnls_solving.html
- Ceres Solver, "Why Ceres?": https://ceres-solver.readthedocs.io/latest/features.html
- GTSAM docs, nonlinear optimizers: https://borglab.github.io/gtsam/nonlinear
- GTSAM Doxygen, NonlinearOptimizer: https://gtsam.org/doxygen/a05103.html
- Nocedal and Wright, "Numerical Optimization": https://convexoptimization.com/TOOLS/nocedal.pdf
- Kuemmerle et al., "g2o: A General Framework for Graph Optimization": https://ais.informatik.uni-freiburg.de/publications/papers/kuemmerle11icra.pdf