Gauss-Newton, Levenberg-Marquardt, and Dogleg
Visual: nonlinear step geometry comparing Gauss-Newton, LM damping, trust-region radius, dogleg path, accept/reject ratio, and failure modes.
Related docs
- Nonlinear Least Squares from First Principles
- Trust Region and Line Search Globalization
- Jacobians, Autodiff, and Manifold Linearization
- Factor Graph Solver Patterns: Ceres, GTSAM, and g2o
- Nonlinear Solver Diagnostics Crosswalk
- Solver Selection and Convergence Diagnosis
- Bundle Adjustment SLAM
Why it matters for AV, perception, SLAM, and mapping
AV backends repeatedly solve the same local question: "Given the residuals and Jacobians at the current state, what update should I apply?" Gauss-Newton, Levenberg-Marquardt, and dogleg are the standard answers for nonlinear least-squares problems in SLAM, calibration, bundle adjustment, scan matching, pose graph optimization, and map alignment.
Gauss-Newton is fast when the current estimate is already good and residuals are close to linear. Levenberg-Marquardt is more forgiving because it damps the step. Dogleg is a trust-region method that blends a cautious gradient step with the full Gauss-Newton step. Production systems often start with a robust method such as LM or dogleg and behave more like Gauss-Newton near convergence.
Core math and algorithm steps
For whitened residuals F(x), linearize at x:
F(x + Delta) ~= F + J * Delta
The local quadratic model is:
m(Delta) = 0.5 * ||F + J Delta||^2
         = 0.5 * F^T F + g^T Delta + 0.5 * Delta^T H Delta
where:
g = J^T F
H = J^T J
All three methods differ mainly in how they choose or constrain Delta.
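As a minimal numerical sketch of these quantities, assuming NumPy and made-up residual and Jacobian values (purely illustrative, not any particular solver's code):

```python
import numpy as np

def quadratic_model(F, J):
    """Return g, H, and the local model m(Delta) = 0.5 * ||F + J Delta||^2."""
    g = J.T @ F           # gradient of 0.5 * ||F||^2 at the linearization point
    H = J.T @ J           # Gauss-Newton Hessian approximation
    c = 0.5 * F @ F       # current cost, m(0)
    return g, H, lambda delta: c + g @ delta + 0.5 * delta @ (H @ delta)

# Hypothetical 3-residual, 2-parameter example.
F = np.array([0.5, -0.2, 0.1])
J = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
g, H, m = quadratic_model(F, J)
print(g, H, m(np.zeros(2)))
```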
Gauss-Newton
Gauss-Newton minimizes the local linear least-squares model directly:
Delta_gn = argmin_Delta 0.5 * ||F + J Delta||^2
The normal equations are:
H * Delta_gn = -g
Algorithm:
- Evaluate residuals F and Jacobian J.
- Form or implicitly represent H = J^T J and g = J^T F.
- Solve H Delta = -g.
- Update x <- x boxplus Delta.
- Stop when gradient, step, or cost reduction is small.
Why it works: the exact Hessian of 0.5 * ||F||^2 is:
sum_i grad f_i grad f_i^T + sum_i f_i * Hessian(f_i)
Gauss-Newton keeps the first term J^T J and drops the residual-weighted second-derivative terms. This is accurate when residuals are small near the solution or when the model is close to linear.
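A minimal sketch of Gauss-Newton iterations on a toy curve-fitting problem, assuming NumPy and a forward-difference Jacobian as a stand-in for analytic or automatic derivatives (a real backend would also exploit sparsity):

```python
import numpy as np

def numerical_jacobian(residual, x, eps=1e-7):
    """Forward-difference Jacobian; a stand-in for autodiff or analytic derivatives."""
    F0 = residual(x)
    J = np.zeros((F0.size, x.size))
    for k in range(x.size):
        xk = x.copy()
        xk[k] += eps
        J[:, k] = (residual(xk) - F0) / eps
    return J

def gauss_newton(residual, x, iters=20, tol=1e-10):
    for _ in range(iters):
        F = residual(x)
        J = numerical_jacobian(residual, x)
        # Solve the normal equations H Delta = -g with H = J^T J, g = J^T F.
        # lstsq on (J, -F) is equivalent and better conditioned than forming H explicitly.
        delta = np.linalg.lstsq(J, -F, rcond=None)[0]
        x = x + delta                      # Euclidean update; use boxplus on manifolds
        if np.linalg.norm(delta) < tol:
            break
    return x

# Hypothetical toy problem: fit y = a * exp(b * t) to samples.
t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
print(gauss_newton(residual, np.array([1.0, 0.0])))
```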
Strengths:
- Efficient for least-squares problems.
- Natural fit for sparse factor graphs.
- Fast local convergence when initialized well.
- No extra damping parameter to tune.
Weaknesses:
- Can diverge when initialized far from the solution.
- Can fail when J is rank deficient or nearly rank deficient.
- Sensitive to unmodeled outliers unless residuals are robustified.
Levenberg-Marquardt
Levenberg-Marquardt modifies the linear system with damping:
(H + lambda * D) * Delta_lm = -g
Common choices for D include I or diag(H). Ceres describes LM as a trust-region strategy that regularizes the step with a diagonal matrix. Nocedal and Wright describe LM as using the Gauss-Newton Hessian approximation with a trust-region strategy, which helps when the Jacobian is rank deficient or nearly so.
Behavior:
- Small lambda: behaves like Gauss-Newton.
- Large lambda: behaves more like a scaled gradient-descent step.
Typical trust-region update:
predicted = m(0) - m(Delta)
actual = cost(x) - cost(x boxplus Delta)
rho = actual / predicted
Then:
- If rho is good, accept the step and decrease lambda.
- If rho is poor or negative, reject the step and increase lambda.
Algorithm:
- Evaluate F, J, H, and g.
- Choose an initial damping lambda.
- Solve (H + lambda D) Delta = -g.
- Evaluate trial cost at x boxplus Delta.
- Compare actual and predicted reduction.
- Accept or reject the trial state.
- Adjust lambda and repeat.
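A minimal dense sketch of this loop, assuming NumPy, a user-supplied residual and Jacobian function, and a simple halve-or-quadruple damping policy (production solvers use more refined damping and radius update rules):

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x, lam=1e-3, iters=50, tol=1e-10):
    F = residual(x)
    cost = 0.5 * F @ F
    for _ in range(iters):
        J = jacobian(x)
        g = J.T @ F
        H = J.T @ J
        D = np.diag(np.diag(H)) + 1e-12 * np.eye(x.size)       # diag(H) scaling
        delta = np.linalg.solve(H + lam * D, -g)

        x_trial = x + delta                                     # boxplus on manifolds
        F_trial = residual(x_trial)
        trial_cost = 0.5 * F_trial @ F_trial

        predicted = -(g @ delta + 0.5 * delta @ (H @ delta))    # m(0) - m(Delta)
        rho = (cost - trial_cost) / max(predicted, 1e-18)       # gain ratio

        if rho > 0:            # accept: the model predicted a real reduction
            x, F, cost = x_trial, F_trial, trial_cost
            lam = max(lam * 0.5, 1e-12)
        else:                  # reject: shrink the step by raising damping
            lam = min(lam * 4.0, 1e12)

        if np.linalg.norm(delta) < tol:
            break
    return x
```

The residual and Jacobian callables can be the same toy functions used in the Gauss-Newton sketch above.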
Strengths:
- More robust than pure Gauss-Newton for poor initialization.
- Handles rank deficiency better through damping.
- Widely available in Ceres, GTSAM, and g2o-style solvers.
Weaknesses:
- Damping policy matters.
- Too much damping causes slow gradient-descent-like progress.
- Too little damping can still take unstable steps.
- The same bad residual modeling that breaks Gauss-Newton can break LM.
Dogleg
Dogleg is a trust-region method that chooses a step inside:
||Delta|| <= radius
It uses two candidate directions:
Delta_sd = steepest descent / Cauchy step
Delta_gn = Gauss-Newton step
The classic dogleg path is:
0 -> Delta_sd -> Delta_gn
The selected step is:
- Delta_gn if it is inside the trust region.
- A scaled steepest-descent step if the Cauchy point is outside the trust region.
- A point along the segment from Delta_sd to Delta_gn if the full Gauss-Newton step is outside but the Cauchy point is inside.
Algorithm:
- Evaluate residuals and Jacobians.
- Compute the Gauss-Newton step.
- Compute the steepest descent or Cauchy step.
- Select the point on the dogleg path that fits the current trust-region radius.
- Evaluate actual versus predicted reduction.
- Accept or reject the step and update the radius.
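A minimal sketch of the dogleg step selection for a small dense problem, assuming NumPy and a positive-definite H; step acceptance and radius updates would use the same gain-ratio logic shown for LM above:

```python
import numpy as np

def dogleg_step(g, H, radius):
    """Select the classic dogleg step for the model 0.5 * ||F + J Delta||^2."""
    # Cauchy (steepest-descent) step: minimize the model along -g.
    gHg = g @ (H @ g)
    delta_sd = -(g @ g) / gHg * g
    # Full Gauss-Newton step.
    delta_gn = np.linalg.solve(H, -g)

    if np.linalg.norm(delta_gn) <= radius:
        return delta_gn                                        # GN step is trusted as-is
    if np.linalg.norm(delta_sd) >= radius:
        return radius * delta_sd / np.linalg.norm(delta_sd)    # clipped Cauchy step

    # Otherwise walk from delta_sd toward delta_gn until the trust-region boundary.
    d = delta_gn - delta_sd
    a = d @ d
    b = 2.0 * delta_sd @ d
    c = delta_sd @ delta_sd - radius ** 2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return delta_sd + tau * d
```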
Strengths:
- Often cheaper than solving multiple damped LM systems.
- Gives a clear trust-region interpretation.
- Works well when the Gauss-Newton step is high quality near the solution.
Weaknesses:
- Usually assumes a direct solve for the Gauss-Newton step.
- Less flexible than LM in some sparse or highly rank-deficient systems.
- Trust-region radius policy still matters.
Choosing among them
Use Gauss-Newton when:
- Initialization is reliable.
- Residual models are smooth and well scaled.
- The graph is well constrained.
- You need predictable low-latency iterations.
Use Levenberg-Marquardt when:
- Initialization can be poor.
- Loop closures or calibration parameters create difficult transients.
- Rank deficiency is likely.
- You want a robust default for batch optimization.
Use dogleg when:
- You want trust-region robustness without repeated LM damping solves.
- The problem has a stable Gauss-Newton direction.
- Direct sparse factorization is available.
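As an illustration of how little changes when switching among the three, here is a hedged sketch using GTSAM's Python bindings on a toy 2D pose chain; class and parameter names follow the current bindings and may differ across versions:

```python
import numpy as np
import gtsam

# Tiny pose chain: a prior on pose 1 and one odometry factor to pose 2.
graph = gtsam.NonlinearFactorGraph()
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))
graph.add(gtsam.PriorFactorPose2(1, gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))

initial = gtsam.Values()
initial.insert(1, gtsam.Pose2(0.1, -0.1, 0.05))
initial.insert(2, gtsam.Pose2(1.2, 0.1, -0.05))

# The step strategy is selected by the optimizer class and its params.
gn = gtsam.GaussNewtonOptimizer(graph, initial, gtsam.GaussNewtonParams())
lm = gtsam.LevenbergMarquardtOptimizer(graph, initial, gtsam.LevenbergMarquardtParams())
dl = gtsam.DoglegOptimizer(graph, initial, gtsam.DoglegParams())
print(lm.optimize())
```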
Implementation notes
- Always compute the gain ratio using the same residual scaling used by the optimizer.
- Never tune lambda, trust-region radius, or robust loss thresholds before residual whitening is correct.
- Use manifold boxplus updates for rotations and poses; adding directly to quaternion coefficients creates invalid states (see the sketch after this list).
- Keep a fixed ordering policy for reproducible linear solver behavior; use fill-reducing orderings for large sparse graphs.
- In bundle adjustment, exploit the camera-landmark block structure with Schur complement solvers.
- In sliding-window VIO, watch marginalization priors. Bad priors can make any method look unstable.
- Log accepted and rejected step counts. Many rejected steps indicate poor initialization, bad scaling, wrong Jacobians, or a trust-region policy mismatch.
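A minimal sketch of the manifold boxplus point referenced above, assuming NumPy, SciPy's Rotation class, and a right-multiplicative SO(3) convention (conventions vary by library):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def boxplus_so3(q_xyzw, delta_rotvec):
    """Apply a tangent-space increment via the exponential map: q <- q * Exp(delta)."""
    return (R.from_quat(q_xyzw) * R.from_rotvec(delta_rotvec)).as_quat()

q = R.from_euler("z", 30, degrees=True).as_quat()
delta = np.array([0.0, 0.0, 0.01])              # small tangent-space increment

q_good = boxplus_so3(q, delta)                  # stays a unit quaternion
q_bad = q + np.array([0.0, 0.0, 0.01, 0.0])     # naive addition drifts off the manifold
print(np.linalg.norm(q_good), np.linalg.norm(q_bad))
```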
Failure modes and diagnostics
- Gauss-Newton cost increases: Add globalization, reduce step size, switch to LM, or improve initialization.
- LM stuck with huge damping: Check residual scale, outliers, incorrect Jacobians, and whether the trial steps are outside the model validity region.
- Dogleg repeatedly takes tiny Cauchy steps: The Gauss-Newton direction is not trusted. Inspect conditioning, rank, and covariance scaling.
- Linear solver failure: Check gauge freedom, duplicate variables, disconnected graph components, and near-zero covariance values.
- False convergence: Step norm is small because damping is huge, not because the optimum is reached. Inspect gradient norm and residual distribution.
- Oscillation: Robust loss thresholds, poor measurement covariances, or inconsistent loop closures can make successive accepted steps fight each other.
- Manifold update bug: Cost may change discontinuously near angle wrapping or quaternion normalization. Test residuals across small perturbations.
Concept Cards
Gauss-Newton step
- What it means here: The update that minimizes the current linearized least-squares model.
- Math object: Delta_gn = argmin_Delta 0.5 * ||F + J Delta||^2, often solved as J^T J Delta = -J^T F.
- Effect on the solve: Gives a fast local step when the residual model is smooth and the current estimate is close.
- What it solves: Efficient local least-squares progress for well-scaled, well-constrained problems.
- What it does not solve: It does not protect against poor initialization, outliers, rank deficiency, or invalid local models.
- Minimal example: One bundle-adjustment iteration from a visual-inertial initialization.
- Failure symptoms: Cost increases, the step is too large, or the direction is dominated by weak or badly scaled variables.
- Diagnostic artifact: Predicted-versus-actual reduction, step norm, gradient norm, and linear solver summary.
- Normal vs abnormal artifact: Normal actual reduction tracks prediction for accepted steps; abnormal prediction is optimistic or has the wrong sign.
- First debugging move: Evaluate the objective along scaled versions of the Gauss-Newton step.
- Do not confuse with: Cholesky, QR, Schur, or any particular linear backend.
- Read next: Solver Selection and Convergence Diagnosis.
Levenberg-Marquardt damping
- What it means here: A numerical step-control term that makes the local solve more conservative when the Gauss-Newton model is unreliable.
- Math object: (J^T J + lambda D) Delta = -J^T F.
- Effect on the solve: Interpolates between Gauss-Newton-like behavior at low damping and scaled-gradient behavior at high damping.
- What it solves: Stabilizes aggressive or ill-conditioned local steps and supports trust-region-style rejection and retry.
- What it does not solve: It does not add real sensor information, replace a prior, fix gauge freedom, or repair a wrong residual.
- Minimal example: Increase lambda after a rejected calibration trial step, then retry with a smaller update.
- Failure symptoms: Damping grows, step norm becomes tiny, cost barely changes, and the solver reports convergence with a bad artifact.
- Diagnostic artifact: Damping trace, gain ratio, accepted/rejected step counts, gradient norm, and final termination reason.
- Normal vs abnormal artifact: Normal damping decreases after reliable accepted steps; abnormal damping remains huge or grows while progress stalls.
- First debugging move: Check residual whitening, Jacobian consistency, and rank before tuning damping policy.
- Do not confuse with: A prior or extra measurement; damping stabilizes a numerical step but does not add real sensor information.
- Read next: Nonlinear Solver Diagnostics Crosswalk.
Dogleg step
- What it means here: A trust-region step selected along a piecewise path from steepest descent toward the Gauss-Newton step.
- Math object: A point on 0 -> Delta_sd -> Delta_gn constrained by ||Delta|| <= radius.
- Effect on the solve: Bounds the update while still using the Gauss-Newton direction when it is trusted.
- What it solves: Provides trust-region step control without repeatedly solving damped LM systems.
- What it does not solve: It does not make an indefinite, rank-broken, or badly modeled problem well posed.
- Minimal example: Clip a pose-graph update to the trust-region boundary when the full Gauss-Newton step is too long.
- Failure symptoms: The selected step stays near steepest descent, the radius shrinks repeatedly, or gain ratios remain poor.
- Diagnostic artifact: Trust-region radius, selected dogleg segment, Gauss-Newton step norm, Cauchy step norm, and gain ratio.
- Normal vs abnormal artifact: Normal segments move toward full Gauss-Newton near convergence; abnormal traces stay clipped with low actual reduction.
- First debugging move: Compare the steepest-descent, dogleg, and Gauss-Newton candidate costs on the same trial-state path.
- Do not confuse with: Line-search step length or linear solver choice.
- Read next: Solver Selection and Convergence Diagnosis.
False convergence
- What it means here: A solver stops because a numeric stopping rule is satisfied even though the optimized artifact is still wrong or unsupported.
- Math object: A tolerance trigger on step norm, cost change, gradient norm, iteration count, or damping-limited update.
- Effect on the solve: Ends iterations and can make a bad local state look like a successful optimization.
- What it solves: It only satisfies a stopping criterion.
- What it does not solve: It does not validate residual design, scale, gauge handling, observability, or downstream product quality.
- Minimal example: LM stops on small step norm because damping is enormous, while residual histograms remain biased.
- Failure symptoms: Good-looking solver summary with bad map, calibration, or trajectory; tiny steps with non-small gradient; high damping at termination.
- Diagnostic artifact: Final termination reason, damping trace, gradient norm, residual histograms, and accepted-step history.
- Normal vs abnormal artifact: Normal convergence has consistent cost, gradient, step, and residual artifacts; abnormal convergence satisfies one weak tolerance while diagnostics disagree.
- First debugging move: Read the exact termination reason and compare it against residual distributions and physical validation checks.
- Do not confuse with: Successful convergence, local minimum certification, or rank resolution.
- Read next: Nonlinear Solver Diagnostics Crosswalk.
Sources
- Ceres Solver, "Solving Non-linear Least Squares": https://ceres-solver.readthedocs.io/latest/nnls_solving.html
- Ceres Solver, "Why Ceres?": https://ceres-solver.readthedocs.io/latest/features.html
- GTSAM docs, nonlinear optimizers: https://borglab.github.io/gtsam/nonlinear
- GTSAM Doxygen, NonlinearOptimizer: https://gtsam.org/doxygen/a05103.html
- Nocedal and Wright, "Numerical Optimization": https://convexoptimization.com/TOOLS/nocedal.pdf
- Kuemmerle et al., "g2o: A General Framework for Graph Optimization": https://ais.informatik.uni-freiburg.de/publications/papers/kuemmerle11icra.pdf