Solver Selection and Convergence Diagnosis

Visual: solver choice matrix and convergence symptom routing.

Trial-state lifecycle

Solver selection and convergence diagnosis both start by separating the committed state from the trial state. A nonlinear iteration forms a tangent step, retracts it to a trial state, evaluates the actual cost at that trial state, compares the actual reduction against the predicted reduction, and only then accepts or rejects the update. A rejected step leaves the committed state unchanged; only the damping, trust-region radius, or line-search step length changes.

This lifecycle explains why a log can show many linear solves without state progress. The solver is not necessarily unstable; it may be protecting the committed state because the local model is not predicting the true objective.
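
The lifecycle can be sketched as a small Levenberg-Marquardt loop. This is a minimal illustration under stated assumptions, not any library's implementation: the Rosenbrock-style `residual`, the Euclidean retraction (plain addition), and the damping update factors (divide by 3 on accept, multiply by 4 on reject) are all hypothetical choices made for the example.

```python
import numpy as np

def residual(x):
    # Toy residual (Rosenbrock written as least squares); stands in for a real problem.
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])

def cost(x):
    r = residual(x)
    return 0.5 * float(r @ r)

def lm_solve(x0, lam=1e-3, max_iters=200, grad_tol=1e-8):
    x = np.asarray(x0, dtype=float)           # committed state
    log = []                                  # (accepted, gain ratio, damping after update)
    for _ in range(max_iters):
        r, J = residual(x), jacobian(x)
        g, H = J.T @ r, J.T @ J
        if np.linalg.norm(g) < grad_tol:
            break
        delta = np.linalg.solve(H + lam * np.eye(x.size), -g)   # tangent step
        x_trial = x + delta                   # retraction; plain addition in this Euclidean toy
        actual = cost(x) - cost(x_trial)      # actual reduction, evaluated at the trial state
        predicted = -float(g @ delta) - 0.5 * float(delta @ H @ delta)
        rho = actual / predicted
        accepted = rho > 0.0
        if accepted:
            x, lam = x_trial, max(lam / 3.0, 1e-12)   # commit and relax damping
        else:
            lam *= 4.0                        # committed state untouched; only damping changes
        log.append((accepted, rho, lam))
    return x, log

x_final, log = lm_solve([-1.2, 1.0])
rejected = sum(1 for accepted, _, _ in log if not accepted)
print(f"final state {x_final}, iterations {len(log)}, rejected steps {rejected}")
```

From this starting point the first nearly undamped step overshoots and is rejected, so the log records a linear solve with no change in the committed state, which is the pattern described above.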

Solver selection matrix

| Problem condition | Expected symptoms | Nonlinear method | Linear backend | Avoid when | Confirming telemetry |
| --- | --- | --- | --- | --- | --- |
| Good initialization, smooth residuals, well-scaled problem. | Fast cost decrease, few rejected steps. | Gauss-Newton. | Sparse Cholesky or QR. | Far from basin or rank uncertain. | Gain ratio near 1, small final gradient. |
| Moderate nonlinearity or uncertain scale. | Occasional rejected steps, need step control. | Levenberg-Marquardt damping. | Cholesky, QR, Schur, or PCG depending on structure. | Damping hides missing gauge or bad residuals. | Damping decreases after accepted steps. |
| Trust-region behavior needed with GN and steepest-descent blend. | GN step too large but gradient direction useful. | Dogleg. | SPD direct backend. | Indefinite or rank-deficient system. | Trust-region radius expands after good ratio. |
| Discontinuous residual families or association changes. | Predicted reduction unreliable. | Trust region with conservative acceptance. | QR/SVD debug backend, then production sparse backend. | Objective jumps are front-end data changes. | Actual against predicted reduction explains accepts/rejects. |
| Large-scale bundle adjustment or landmark-heavy SLAM. | Direct full solve memory too high. | LM or trust region. | Schur complement direct or iterative Schur. | Eliminated blocks are singular or dense reduced system explodes. | Fill report, Schur block stats, accepted nonlinear progress. |
| Massive sparse SPD system with acceptable approximate linear solves. | Direct factorization too slow or memory-heavy. | LM/trust region with inexact linear solve. | PCG with preconditioner. | Operator is not SPD or residual norms stagnate. | PCG residual norms drop and nonlinear cost improves. |
| Rank uncertain, covariance suspicious, or gauge policy under review. | Cholesky fails, covariance nonsensical, weak modes. | Debug with damped GN/LM. | QR or SVD on a representative reduced case. | Full production graph is too large for dense rank tools. | Singular values and nullspace match expected gauge. |
| Production library integration decision. | Same method behaves differently across APIs. | Method selected separately from solver library choice. | Backend exposed by Ceres, GTSAM, g2o, or custom stack. | Library hides telemetry needed for safety review. | Summaries expose cost, step, damping, rank, and backend stats. |
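
For the rank-uncertain row, a dense SVD on a small representative case can confirm whether the nullspace matches the expected gauge. The sketch below assumes a hypothetical 1-D chain of four poses linked only by relative measurements, so the expected gauge freedom is a single common shift; `nullspace_report` and its relative tolerance are illustrative names and values, not part of any library.

```python
import numpy as np

def nullspace_report(J, rel_tol=1e-10):
    # Full SVD so that the rows of Vt beyond the numerical rank span the nullspace.
    _, s, Vt = np.linalg.svd(J, full_matrices=True)
    rank = int(np.sum(s > rel_tol * s[0]))
    null_basis = Vt[rank:].T
    return rank, s, null_basis

# Jacobian of residuals r_i = (x_{i+1} - x_i) - z_i for a 4-pose chain with no anchor.
J = np.array([[-1.0, 1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0, 0.0],
              [0.0, 0.0, -1.0, 1.0]])

rank, singular_values, null_basis = nullspace_report(J)
print("rank:", rank, "of", J.shape[1], "columns")
print("singular values:", singular_values)
print("nullspace basis:\n", null_basis)
```

Here the single null direction has equal entries for every pose, which is the expected translation gauge. A null direction with any other shape would point at a missing constraint rather than a gauge freedom.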

Convergence diagnostics

Convergence criteria should be read as stopping rules, not proof of correctness:

  • Cost change small: the objective no longer changes much, but the objective may still be wrong.
  • Gradient norm small: the local first-order condition is nearly satisfied, but the point can be a bad local minimum.
  • Step norm small: updates are tiny, but this can be false convergence if damping is huge or scaling is poor.
  • Iteration limit reached: the solver stopped because of budget, not because the result is valid.
  • Trust-region or line-search acceptance stable: step control is healthy, but residual design and covariance still need audit.

False convergence is especially common when damping grows large and the step norm becomes tiny. The log can look calm while the solver is stuck outside a valid local model or fighting poor scale.
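
A small numeric sketch makes the damping case concrete. The function `termination_report`, its tolerance names, and the `damping_warn` threshold are hypothetical, chosen only for illustration; the toy problem uses a linear residual so the new residual can be computed exactly.

```python
import numpy as np

def termination_report(cost_prev, cost_new, grad, step, x, damping,
                       cost_tol=1e-6, grad_tol=1e-8, step_tol=1e-8,
                       damping_warn=1e6):
    """List the stopping rules that fired and flag a likely false convergence."""
    fired = []
    if abs(cost_prev - cost_new) <= cost_tol * abs(cost_prev):
        fired.append("cost")
    if np.max(np.abs(grad)) <= grad_tol:
        fired.append("gradient")
    if np.linalg.norm(step) <= step_tol * (np.linalg.norm(x) + step_tol):
        fired.append("step")
    # Tiny step with a large gradient under heavy damping: the step is small
    # because it was suppressed, not because the state is near a stationary point.
    suspect = ("step" in fired and "gradient" not in fired
               and damping >= damping_warn)
    return {"fired": fired, "suspect_false_convergence": suspect}

J = np.eye(2)
r = np.array([3.0, 4.0])
x = np.array([1.0, 1.0])
damping = 1e12                       # runaway damping

g = J.T @ r
step = np.linalg.solve(J.T @ J + damping * np.eye(2), -g)
r_new = r + J @ step                 # exact for this linear toy residual
report = termination_report(0.5 * r @ r, 0.5 * r_new @ r_new, g, step, x, damping)
print(report)
```

Both the cost and step rules fire even though the gradient is far from zero, which is why the termination reason should be read together with the damping value and gradient norm.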

Failure modes

| Failure mode | Typical log | First interpretation | First action |
| --- | --- | --- | --- |
| Repeated rejected steps | Low or negative gain ratio, shrinking radius, shorter line-search length. | Predicted reduction is not matching actual cost. | Check Jacobians and local model by sweeping cost along the step. |
| Damping runaway | Large damping, tiny step, little cost change. | LM is suppressing unstable steps or hiding scale problems. | Inspect whitening and gradient norm before changing library. |
| False convergence | Step or cost tolerance reached with bad artifact. | Stopping rule was satisfied for the written objective. | Audit residual design and per-family cost share. |
| Backend failure | Cholesky/LDLT fails or PCG stagnates. | Rank, SPD assumption, or conditioning issue. | Switch to QR/SVD debug case and inspect nullspace. |
| Solver library choice masks method issue | Different APIs show different defaults or parameterizations. | Library defaults changed damping, loss ordering, local coordinates, or backend. | Normalize configuration and compare telemetry side by side. |
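
The Jacobian check named as the first action for repeated rejected steps can be sketched with central finite differences. The residual and both Jacobians below are hypothetical; `jacobian_buggy` carries a deliberate sign error to show what a failing check looks like.

```python
import numpy as np

def numeric_jacobian(f, x, eps=1e-6):
    # Central differences, one column per state coordinate.
    x = np.asarray(x, dtype=float)
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        cols.append((f(x + e) - f(x - e)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def residual(x):
    return np.array([x[0] * x[1] - 2.0, np.sin(x[0]) + x[1] ** 2])

def jacobian_ok(x):
    return np.array([[x[1], x[0]], [np.cos(x[0]), 2.0 * x[1]]])

def jacobian_buggy(x):
    # Deliberate sign error in d r_1 / d x_0.
    return np.array([[x[1], x[0]], [-np.cos(x[0]), 2.0 * x[1]]])

x = np.array([0.7, -1.3])
J_num = numeric_jacobian(residual, x)
err_ok = float(np.max(np.abs(jacobian_ok(x) - J_num)))
err_bad = float(np.max(np.abs(jacobian_buggy(x) - J_num)))
print(f"max abs error, correct Jacobian: {err_ok:.2e}")
print(f"max abs error, buggy Jacobian:   {err_bad:.2e}")
```

On a manifold state the perturbation would go through the retraction rather than plain addition; that detail is omitted here.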

Concept cards

Gauss-Newton

| Field | Explanation |
| --- | --- |
| What it means here | A nonlinear least-squares method that solves a local linear least-squares approximation. |
| Math object | J^T J delta = -J^T r. |
| Effect on the solve | Takes fast steps when residuals are smooth and initialization is close. |
| What it solves | Efficient local optimization for well-behaved least-squares problems. |
| What it does not solve | It does not globalize steps or handle poor initialization robustly. |
| Minimal example | Bundle adjustment near a good visual-inertial initialization. |
| Failure symptoms | Cost increases, rejected steps under trust region, unstable large updates. |
| Diagnostic artifact | Predicted versus actual reduction and step norm. |
| Normal vs abnormal artifact | Normal actual reduction tracks prediction; abnormal prediction is optimistic. |
| First debugging move | Plot cost along the GN step. |
| Do not confuse with | Cholesky or any particular linear backend. |
| Read next | Gauss-Newton, Levenberg-Marquardt, and Dogleg. |
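
As a sketch of the first debugging move, the code below computes a Gauss-Newton step and tabulates actual versus model cost along it instead of plotting. The Rosenbrock-style residual and the helper names are assumptions for illustration; the step is obtained with `np.linalg.lstsq`, which agrees with the normal equations J^T J delta = -J^T r when J has full column rank.

```python
import numpy as np

def residual(x):
    # Hypothetical toy residual (Rosenbrock in least-squares form).
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])

def gauss_newton_step(J, r):
    # Least-squares solve of J delta = -r.
    delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
    return delta

def sweep_cost(x, delta, alphas):
    # Actual cost and linearized-model cost at x + alpha * delta.
    r, J = residual(x), jacobian(x)
    rows = []
    for a in alphas:
        actual = 0.5 * float(np.sum(residual(x + a * delta) ** 2))
        model = 0.5 * float(np.sum((r + a * (J @ delta)) ** 2))
        rows.append((float(a), actual, model))
    return rows

x = np.array([-1.2, 1.0])
delta = gauss_newton_step(jacobian(x), residual(x))
rows = sweep_cost(x, delta, np.linspace(0.0, 1.0, 21))
for a, actual, model in rows[::4]:
    print(f"alpha={a:.2f}  actual={actual:10.3f}  model={model:8.3f}")
```

In this toy case the model cost falls steadily to zero at the full step while the actual cost rises far above its starting value, which is the "abnormal prediction is optimistic" signature from the card.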

Levenberg-Marquardt damping

| Field | Explanation |
| --- | --- |
| What it means here | A step-control mechanism that adds damping to make the local solve more conservative. |
| Math object | Damped system such as (J^T J + lambda D) delta = -J^T r. |
| Effect on the solve | Moves between GN-like and gradient-descent-like behavior. |
| What it solves | Helps when GN steps are too aggressive. |
| What it does not solve | It does not create a measurement prior or fix gauge freedom. |
| Minimal example | Increasing damping after a rejected calibration step. |
| Failure symptoms | Damping grows without accepted progress, tiny step norm, false convergence. |
| Diagnostic artifact | Damping value, gain ratio, accepted/rejected count. |
| Normal vs abnormal artifact | Normal damping falls after good steps; abnormal damping stays huge. |
| First debugging move | Compare damping trend with whitened residual scale and gradient norm. |
| Do not confuse with | Prior or gauge anchor. |
| Read next | Nonlinear Solver Diagnostics Crosswalk. |
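
The move between GN-like and gradient-descent-like behavior can be seen by solving the damped system for several damping values. This is a sketch with a made-up Jacobian and residual; `damped_step` and its `scaling` option (identity, or the diagonal of J^T J) are illustrative choices for D, not a specific library's convention.

```python
import numpy as np

def damped_step(J, r, lam, scaling="identity"):
    # Solve (J^T J + lam * D) delta = -J^T r for the chosen D.
    H = J.T @ J
    g = J.T @ r
    D = np.eye(H.shape[0]) if scaling == "identity" else np.diag(np.diag(H))
    return np.linalg.solve(H + lam * D, -g)

J = np.array([[2.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
r = np.array([1.0, -2.0, 0.5])
g = J.T @ r

rows = []
for lam in [0.0, 1.0, 100.0, 1e6]:
    delta = damped_step(J, r, lam)
    step_norm = float(np.linalg.norm(delta))
    # Cosine of the angle between the step and the steepest-descent direction -g.
    cosine = float(delta @ -g) / (step_norm * float(np.linalg.norm(g)))
    rows.append((lam, step_norm, cosine))
    print(f"lambda={lam:9.1e}  |delta|={step_norm:.3e}  cos(delta, -g)={cosine:.6f}")
```

With zero damping the step is the Gauss-Newton step. As damping grows the step shrinks and rotates toward the negative gradient, so a very small step norm under large damping says little about closeness to a minimum.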

Dogleg

| Field | Explanation |
| --- | --- |
| What it means here | A trust-region method that blends steepest-descent and Gauss-Newton steps. |
| Math object | Piecewise path inside a trust-region radius. |
| Effect on the solve | Provides a bounded step when GN is too long but gradient direction is useful. |
| What it solves | Step control for SPD least-squares subproblems. |
| What it does not solve | It does not handle indefinite or rank-broken backends automatically. |
| Minimal example | Pose calibration step clipped inside the trust region. |
| Failure symptoms | Radius shrinks repeatedly, dogleg path always near steepest descent. |
| Diagnostic artifact | Trust-region radius and selected dogleg segment. |
| Normal vs abnormal artifact | Normal segment changes as convergence improves; abnormal stays clipped with low gain ratio. |
| First debugging move | Compare dogleg step with GN and gradient steps. |
| Do not confuse with | Generic line search. |
| Read next | Gauss-Newton, Levenberg-Marquardt, and Dogleg. |
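
A minimal sketch of the classic (Powell) dogleg step selection follows, assuming J^T J is SPD so the Gauss-Newton step exists. The function name, the segment labels, and the toy Jacobian are hypothetical; production solvers add scaling and rank safeguards that are omitted here.

```python
import numpy as np

def dogleg_step(J, r, radius):
    g = J.T @ r
    H = J.T @ J
    p_gn = np.linalg.solve(H, -g)                 # Gauss-Newton step (needs SPD H)
    if np.linalg.norm(p_gn) <= radius:
        return p_gn, "gauss_newton"
    p_sd = -(g @ g) / (g @ H @ g) * g             # Cauchy point along steepest descent
    if np.linalg.norm(p_sd) >= radius:
        return -radius * g / np.linalg.norm(g), "steepest_descent"
    # Walk from the Cauchy point toward the GN step until the boundary is hit.
    d = p_gn - p_sd
    a, b, c = d @ d, p_sd @ d, p_sd @ p_sd - radius ** 2
    tau = (-b + np.sqrt(b * b - a * c)) / a
    return p_sd + tau * d, "blend"

J = np.array([[3.0, 0.0], [0.0, 1.0]])
r = np.array([3.0, 2.0])

results = {}
for radius in (3.0, 1.5, 0.5):
    step, segment = dogleg_step(J, r, radius)
    results[radius] = (step, segment)
    print(f"radius={radius:.1f}  segment={segment:16s}  |step|={np.linalg.norm(step):.3f}")
```

Logging the selected segment next to the radius gives the diagnostic artifact from the card: a solve that never leaves the clipped steepest-descent segment is worth comparing against the plain GN and gradient steps.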

Trust-region ratio

| Field | Explanation |
| --- | --- |
| What it means here | Ratio between actual cost decrease and predicted local-model decrease. |
| Math object | rho = actual_reduction / predicted_reduction. |
| Effect on the solve | Accepts or rejects trial steps and updates radius or damping. |
| What it solves | Tests local-model reliability. |
| What it does not solve | It does not identify the bad residual family by itself. |
| Minimal example | Rejecting a SLAM update when loop-closure association changes after retraction. |
| Failure symptoms | Negative ratio, repeated low ratios, radius collapse. |
| Diagnostic artifact | Actual and predicted reduction log. |
| Normal vs abnormal artifact | Normal ratio is positive and often near 1 near convergence; abnormal ratio is erratic or negative. |
| First debugging move | Re-evaluate cost at the trial state and compare to logged prediction. |
| Do not confuse with | Final convergence tolerance. |
| Read next | Trust Region and Line Search Globalization. |
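
The ratio can be sketched directly from its definition, with the predicted reduction taken from the linearized residual r + J delta. The helper names, both toy residuals, and the 0.25 / 0.75 radius thresholds are assumptions for illustration; those thresholds are common textbook values, and real solvers use their own.

```python
import numpy as np

def gain_ratio(residual, x, J, delta):
    r = residual(x)
    r_trial = residual(x + delta)               # re-evaluated at the trial state
    r_model = r + J @ delta                     # linearized prediction
    actual = 0.5 * float(r @ r) - 0.5 * float(r_trial @ r_trial)
    predicted = 0.5 * float(r @ r) - 0.5 * float(r_model @ r_model)
    return actual / predicted, actual, predicted

def update_radius(radius, rho):
    # Hypothetical thresholds: shrink on poor agreement, grow on good agreement.
    if rho < 0.25:
        return 0.25 * radius
    if rho > 0.75:
        return 2.0 * radius
    return radius

# Linear residual: the local model is exact, so the ratio is 1.
A = np.array([[2.0, 0.0], [1.0, 3.0]])
b = np.array([1.0, -1.0])
linear = lambda x: A @ x - b
x0 = np.array([0.5, 0.5])
delta_lin = np.linalg.solve(A.T @ A, -A.T @ linear(x0))
rho_lin, _, _ = gain_ratio(linear, x0, A, delta_lin)

# Strongly nonlinear residual with a full Gauss-Newton step: the model is optimistic.
curved = lambda x: np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])
x1 = np.array([-1.2, 1.0])
J1 = np.array([[-20.0 * x1[0], 10.0], [-1.0, 0.0]])
delta_gn = np.linalg.solve(J1.T @ J1, -J1.T @ curved(x1))
rho_bad, actual_bad, predicted_bad = gain_ratio(curved, x1, J1, delta_gn)

print(f"linear case: rho={rho_lin:.3f}, next radius={update_radius(1.0, rho_lin)}")
print(f"curved case: rho={rho_bad:.1f}, next radius={update_radius(1.0, rho_bad)}")
```

A negative ratio with a positive predicted reduction, as in the curved case, means the trial cost went up, so the step is rejected and the radius shrinks.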

Line-search step length

| Field | Explanation |
| --- | --- |
| What it means here | Scalar shortening of a candidate direction to satisfy decrease conditions. |
| Math object | x_new = x ⊞ (alpha delta), where ⊞ is the retraction. |
| Effect on the solve | Keeps the direction but reduces update magnitude. |
| What it solves | Prevents full steps that increase objective. |
| What it does not solve | It does not fix a bad direction from wrong Jacobians. |
| Minimal example | Backtracking a planning cost update near a nonsmooth obstacle penalty. |
| Failure symptoms | Step length alpha becomes tiny, many evaluations per iteration, little progress. |
| Diagnostic artifact | Step length, Armijo/Wolfe status, cost along direction. |
| Normal vs abnormal artifact | Normal step length recovers to larger values; abnormal remains tiny. |
| First debugging move | Plot objective along the search direction. |
| Do not confuse with | Trust-region radius. |
| Read next | Trust Region and Line Search Globalization. |
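
A backtracking search with the Armijo sufficient-decrease condition is one simple way to choose the step length. The sketch assumes a Euclidean state, so the retraction is plain addition, and uses a hypothetical Rosenbrock-style cost; the halving factor and `c1` value are conventional defaults rather than anything prescribed here.

```python
import numpy as np

def residual(x):
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])

def cost(x):
    r = residual(x)
    return 0.5 * float(r @ r)

def backtracking(cost, x, delta, slope, alpha0=1.0, shrink=0.5, c1=1e-4, max_trials=30):
    # slope is the directional derivative g^T delta; it must be negative for a descent direction.
    f0 = cost(x)
    alpha = alpha0
    for trial in range(1, max_trials + 1):
        if cost(x + alpha * delta) <= f0 + c1 * alpha * slope:
            return alpha, trial, True            # Armijo condition satisfied
        alpha *= shrink                          # same direction, shorter step
    return alpha, max_trials, False

x = np.array([-1.2, 1.0])
r, J = residual(x), jacobian(x)
g = J.T @ r
delta = np.linalg.solve(J.T @ J, -g)             # Gauss-Newton direction
alpha, trials, ok = backtracking(cost, x, delta, float(g @ delta))
x_new = x + alpha * delta
print(f"alpha={alpha}, cost evaluations={trials}, cost {cost(x):.3f} -> {cost(x_new):.3f}")
```

Logging `alpha` and the number of cost evaluations per iteration provides the step-length artifact from the card; a step length that stays tiny across iterations points at the direction, not the search.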

Step acceptance

| Field | Explanation |
| --- | --- |
| What it means here | Commit decision for a trial state after evaluating actual cost. |
| Math object | Acceptance predicate using gain ratio or sufficient decrease. |
| Effect on the solve | Determines whether the state changes this iteration. |
| What it solves | Protects the committed estimate from bad trial steps. |
| What it does not solve | It does not certify output quality. |
| Minimal example | Rejected LM step leaves pose graph state unchanged while damping increases. |
| Failure symptoms | Many rejected steps, stable committed state, changing damping or radius. |
| Diagnostic artifact | Accepted/rejected step log and committed/trial cost. |
| Normal vs abnormal artifact | Normal rejection is occasional; abnormal rejection dominates the solve. |
| First debugging move | Verify whether logged residuals are from committed or trial states. |
| Do not confuse with | Trial-state evaluation. |
| Read next | Nonlinear Solver Diagnostics Crosswalk. |

Convergence criterion

| Field | Explanation |
| --- | --- |
| What it means here | A stopping rule based on cost, gradient, step, time, or iteration count. |
| Math object | Threshold on scalar telemetry. |
| Effect on the solve | Ends optimization when progress appears small or budget is exhausted. |
| What it solves | Prevents endless iterations and defines production budgets. |
| What it does not solve | It does not prove the objective was right or the solution is safe. |
| Minimal example | Stop when relative cost decrease is below tolerance. |
| Failure symptoms | False convergence, budget stop, bad artifact with small step. |
| Diagnostic artifact | Final termination reason and all stopping metrics. |
| Normal vs abnormal artifact | Normal termination agrees across cost, gradient, and artifact checks; abnormal termination only satisfies one weak criterion. |
| First debugging move | Read the exact termination reason, not only final cost. |
| Do not confuse with | Step acceptance. |
| Read next | Nonlinear Least Squares from First Principles. |

Solver library choice

| Field | Explanation |
| --- | --- |
| What it means here | Choosing an implementation ecosystem such as Ceres, GTSAM, g2o, or a custom solver. |
| Math object | API contracts for residuals, local parameterizations, backend options, and telemetry. |
| Effect on the solve | Determines defaults, supported backends, update conventions, and diagnostic visibility. |
| What it solves | Provides production implementation and tested solver components. |
| What it does not solve | It does not replace method selection or objective design. |
| Minimal example | Ceres LM with local parameterization versus GTSAM factor graph optimization for the same pose graph. |
| Failure symptoms | Different results across libraries, missing telemetry, hidden loss or damping defaults. |
| Diagnostic artifact | Side-by-side config and solver summary. |
| Normal vs abnormal artifact | Normal comparison uses equivalent residuals and conventions; abnormal comparison changes multiple layers at once. |
| First debugging move | Freeze residuals and Jacobians, then compare one iteration across libraries if possible. |
| Do not confuse with | Nonlinear method choice. |
| Read next | Factor Graph Solver Patterns: Ceres, GTSAM, and g2o. |
