🔢 Numerical Analysis II Unit 9 Review

9.5 Broyden's method

Written by the Fiveable Content Team • Last updated August 2025

Fundamentals of Broyden's Method

Broyden's method solves nonlinear systems of equations without computing the full Jacobian at every step. Instead, it builds and updates an approximation to the Jacobian using information gathered from previous iterations. This makes it far cheaper per iteration than Newton's method, especially when the Jacobian is expensive to evaluate or the system is large.

The core idea: rather than recomputing all partial derivatives each time, you start with some initial Jacobian estimate and then apply a rank-one correction after each step. The result is superlinear convergence (faster than linear, slower than quadratic) with only one function evaluation per iteration.

Quasi-Newton Methods Overview

Quasi-Newton methods form a family of iterative techniques that approximate the Jacobian (or its inverse) rather than computing it exactly. The key properties:

  • They avoid explicit derivative calculations, using only function values to build successive approximations
  • They retain superlinear convergence in many practical cases, which is a significant step up from fixed-point iteration
  • Each iteration updates the current Jacobian approximation using a low-rank correction, keeping the cost per iteration low
  • Newton's method needs n^2 partial derivative evaluations per step for an n-dimensional system; quasi-Newton methods need zero

Connection to the Secant Method

In one dimension, the secant method replaces the derivative f'(x_k) with the finite difference (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1}). Broyden's method generalizes this idea to n dimensions.
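The 1-D case is easy to state in code. Here is a minimal secant-method sketch; the function name `secant` and the test problem x^2 - 2 = 0 are illustrative, not from the text:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """1-D secant method: f'(x_k) is replaced by the finite difference
    (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1})."""
    for _ in range(max_iter):
        fx0, fx1 = f(x0), f(x1)
        # Secant step: x_{k+1} = x_k - f(x_k) * (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1}))
        x2 = x1 - fx1 * (x1 - x0) / (fx1 - fx0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

root = secant(lambda x: x**2 - 2.0, 1.0, 2.0)  # converges toward sqrt(2)
```

Each iteration needs only one new function value once the previous one is reused, the same economy Broyden's method achieves in n dimensions.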

The challenge in multiple dimensions is that a single step gives you one piece of information (one function difference and one step vector), but the Jacobian has n^2 entries. You can't determine the full Jacobian from one step alone. Broyden's solution: change the Jacobian approximation only in the direction of the most recent step, and leave it unchanged in all other directions. This is the minimum-change principle.

Broyden's Update Formula

The update satisfies the secant equation (also called the quasi-Newton condition):

B_{k+1} s_k = y_k

where:

  • s_k = x_{k+1} - x_k is the step vector
  • y_k = f(x_{k+1}) - f(x_k) is the change in function values
  • B_k is the current Jacobian approximation

The rank-one update that satisfies this condition with the smallest change to B_k (in the Frobenius norm) is:

B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T}{s_k^T s_k}

Notice that the numerator (y_k - B_k s_k) measures how much the current approximation B_k fails to satisfy the secant equation. If B_k s_k = y_k already, no update occurs.
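Both properties are easy to check numerically. A minimal sketch, with hypothetical values for B_k, s_k, and y_k chosen only for illustration:

```python
import numpy as np

def broyden_update(B, s, y):
    """Rank-one Broyden update: the minimal Frobenius-norm change to B
    that makes the new matrix satisfy the secant equation B_new @ s = y."""
    return B + np.outer(y - B @ s, s) / (s @ s)

# Hypothetical data, chosen only to illustrate the update
B = np.eye(2)
s = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

B_new = broyden_update(B, s, y)
assert np.allclose(B_new @ s, y)                  # secant equation holds
assert np.allclose(broyden_update(B, s, B @ s), B)  # no update if it already held
```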

Algorithm Implementation

Step-by-Step Iteration

  1. Choose an initial guess x_0 and an initial Jacobian approximation B_0

  2. Solve the linear system B_k s_k = -f(x_k) for the step s_k

  3. Update the iterate: x_{k+1} = x_k + s_k

  4. Evaluate f(x_{k+1})

  5. Compute y_k = f(x_{k+1}) - f(x_k)

  6. Update the Jacobian approximation: B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T}{s_k^T s_k}

  7. Check stopping criteria; if not met, return to step 2
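The steps above can be sketched as follows. The test system (a circle of radius 2 intersected with the line x_0 = x_1) and the choice of B_0 as the exact Jacobian at the starting point are illustrative assumptions:

```python
import numpy as np

def broyden(f, x0, B0, tol=1e-10, max_iter=50):
    """Sketch of Broyden's method for f(x) = 0, following steps 1-7."""
    x = np.asarray(x0, dtype=float)
    B = np.array(B0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -fx)            # step 2: solve B_k s_k = -f(x_k)
        x = x + s                              # step 3: x_{k+1} = x_k + s_k
        fx_new = f(x)                          # step 4: one evaluation per iteration
        if np.linalg.norm(fx_new) < tol:       # step 7: residual stopping criterion
            break
        y = fx_new - fx                        # step 5
        B += np.outer(y - B @ s, s) / (s @ s)  # step 6: rank-one update
        fx = fx_new
    return x

# Hypothetical test system: x0^2 + x1^2 = 4 together with x0 = x1
f = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
x0 = [1.3, 1.5]
B0 = np.array([[2 * x0[0], 2 * x0[1]], [1.0, -1.0]])  # exact Jacobian at x0
root = broyden(f, x0, B0)
```

From this starting point the iterates converge quickly toward (sqrt(2), sqrt(2)), with only one evaluation of f per iteration after the first.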

Choosing the Initial Jacobian

The initial approximation B_0 matters a lot for convergence. Common strategies:

  • Finite differences: Compute the true Jacobian J(x_0) at the starting point using forward or central differences. This costs n extra function evaluations (forward) or 2n (central), but you only pay this cost once.
  • Identity matrix: Set B_0 = I. This is cheap but can lead to slow convergence if the true Jacobian is far from the identity.
  • Problem-specific knowledge: If you know the structure of the Jacobian (e.g., it's diagonally dominant or banded), use that to build a better B_0.

A good B_0 can mean the difference between convergence in 10 iterations and convergence in 100 (or not at all).
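A forward-difference B_0 might look like this; the test function and the step size h are illustrative assumptions:

```python
import numpy as np

def fd_jacobian(f, x, h=1e-7):
    """Forward-difference Jacobian: n extra function evaluations beyond
    f(x), paid once to build the initial approximation B_0."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h                  # perturb one coordinate at a time
        J[:, j] = (f(xp) - fx) / h  # j-th column of the Jacobian
    return J

# Hypothetical system with known Jacobian [[2*x0, -1], [1, 1]]
f = lambda x: np.array([x[0]**2 - x[1], x[0] + x[1]])
B0 = fd_jacobian(f, np.array([1.0, 2.0]))
```

The forward-difference error is O(h), so B0 here matches the analytic Jacobian to roughly seven digits.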

Convergence Criteria

You typically check two conditions and stop when either is satisfied:

  • Residual norm: ‖f(x_k)‖ < ε_f (the function is close enough to zero)
  • Relative step size: ‖x_{k+1} - x_k‖ / ‖x_k‖ < ε_x (the iterates have essentially stopped moving)

Always set a maximum iteration count as a safeguard. If neither criterion is met within that limit, the method may not be converging for your problem.

Advantages and Limitations

Computational Efficiency

The main selling point of Broyden's method is cost per iteration:

  • Only one function evaluation per iteration (Newton's method needs one function evaluation plus one full Jacobian evaluation)
  • The linear system solve in step 2 costs O(n^3) in general, but if you maintain the inverse approximation directly (see the inverse Broyden update below), you can reduce this to O(n^2)
  • For problems where a single function evaluation is expensive (e.g., it requires solving a PDE), the savings are substantial

Memory Requirements

Storing B_k requires O(n^2) memory since it's a dense n × n matrix. For moderate n this is fine, but for very large systems (n in the tens of thousands or more), this becomes a bottleneck. Limited-memory variants exist that store only a few vectors instead of the full matrix.

Convergence Rate

  • Under standard assumptions (f is continuously differentiable, the Jacobian at the root is nonsingular, x_0 is close enough to the solution), Broyden's method converges superlinearly
  • The convergence rate sits between linear and quadratic. You won't match Newton's quadratic rate, but you'll beat simple fixed-point iteration
  • If the initial Jacobian approximation is poor or the system is highly nonlinear, convergence can degrade or fail entirely

Variations and Extensions

Good Broyden Method (Broyden's First Method)

This is the standard version described above, which updates B_k directly. The name "good" is somewhat historical and doesn't mean it's always better.


Inverse Broyden Method (Broyden's Second Method)

Instead of updating B_k and solving a linear system each iteration, you can maintain an approximation to the inverse, H_k ≈ J^{-1}, directly. Applying the Sherman-Morrison formula to the standard update gives the inverse form of Broyden's first method:

H_{k+1} = H_k + \frac{(s_k - H_k y_k) s_k^T H_k}{s_k^T H_k y_k}

Broyden's second method ("bad Broyden") updates H_k with an even simpler rank-one correction, H_{k+1} = H_k + \frac{(s_k - H_k y_k) y_k^T}{y_k^T y_k}, which also satisfies the inverse secant equation H_{k+1} y_k = s_k. Either way, the iteration replaces the linear solve with a simple matrix-vector multiply: s_k = -H_k f(x_k). This reduces the per-iteration cost from O(n^3) to O(n^2), which is a significant gain for larger systems. The trade-off is that maintaining the inverse can be less numerically stable.

Terminology note: The literature sometimes calls the direct update "Broyden's first method" or "good Broyden," and the inverse update "Broyden's second method" or "bad Broyden." The names are misleading; neither is universally better.
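The inverse update can be sanity-checked numerically against the inverse secant equation H_{k+1} y_k = s_k. A minimal sketch with hypothetical H, s, and y:

```python
import numpy as np

def inverse_broyden_update(H, s, y):
    """Sherman-Morrison (inverse) form of the Broyden update.
    Maintains H ≈ J^{-1}, so each step is just s = -H @ f(x)."""
    Hy = H @ y
    return H + np.outer(s - Hy, s @ H) / (s @ Hy)

# Hypothetical data for the check
H = np.eye(2)
s = np.array([1.0, 0.5])
y = np.array([2.0, -1.0])

H_new = inverse_broyden_update(H, s, y)
assert np.allclose(H_new @ y, s)  # inverse secant equation holds
```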

Symmetric Broyden Updates

When the Jacobian is known to be symmetric (as in optimization, where the Hessian of a scalar objective is symmetric), you can enforce symmetry in the update. The symmetric rank-one (SR1) update preserves symmetry and satisfies the secant equation:

B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}

This is useful in unconstrained optimization where you're approximating the Hessian.
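A sketch of the SR1 update; the safeguard that skips the update when the denominator is tiny is a standard practical addition, not part of the formula above:

```python
import numpy as np

def sr1_update(B, s, y, skip_tol=1e-8):
    """Symmetric rank-one (SR1) update. If the denominator (y - B s)^T s
    is too small, skip the update to avoid a huge, unstable correction."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) < skip_tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B  # standard safeguard: keep the old approximation
    return B + np.outer(r, r) / denom

# Hypothetical data: the result stays symmetric and satisfies B_new @ s = y
B = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([2.0, 1.0])
B_new = sr1_update(B, s, y)
```

Unlike BFGS, SR1 does not guarantee positive definiteness, which is why the skip safeguard matters in practice.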

Numerical Stability Considerations

Ill-Conditioning

Over many iterations, the Jacobian approximation B_k can drift and become poorly conditioned. When this happens:

  • The linear system B_k s_k = -f(x_k) produces inaccurate steps
  • Convergence slows down or the method diverges
  • Monitoring the condition number of B_k (or at least its diagonal entries) helps you detect this early

A practical fix is to restart the method periodically: recompute the Jacobian from scratch using finite differences, then continue with Broyden updates from the fresh approximation.

Scaling Techniques

If the variables in your system have very different magnitudes (e.g., one variable is on the order of 10^6 and another is 10^{-3}), the Jacobian approximation will be poorly scaled from the start. Normalize your variables to similar magnitudes before applying Broyden's method, or apply diagonal scaling to B_k.

Regularization

When B_k is near-singular, you can add a small multiple of the identity: solve (B_k + \lambda I) s_k = -f(x_k) instead. This is analogous to Tikhonov regularization and prevents catastrophically large steps. Trust-region methods provide a more principled way to achieve the same effect.
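A sketch of the regularized solve on a deliberately near-singular matrix; the matrix, right-hand side, and λ below are hypothetical values chosen to show the effect:

```python
import numpy as np

def regularized_step(B, fx, lam=1e-6):
    """Solve (B + lam*I) s = -f(x): a Tikhonov-style shift that bounds
    the step size by roughly ||f|| / lam when B is near-singular."""
    return np.linalg.solve(B + lam * np.eye(B.shape[0]), -fx)

# Nearly singular B with an inconsistent right-hand side:
# the plain solve produces an enormous, useless step
B = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])
fx = np.array([1.0, -1.0])

s_raw = np.linalg.solve(B, -fx)  # blows up (order 1e12)
s_reg = regularized_step(B, fx)  # capped at roughly ||fx|| / lam
```

The regularized step is still large here (λ only bounds it, it does not make it good), which is why trust-region control is the more principled alternative.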

Comparison with Other Methods

| Feature | Broyden | Newton | BFGS |
| --- | --- | --- | --- |
| Jacobian/Hessian computation | Once (at B_0) | Every iteration | Never (builds from gradients) |
| Function evaluations per iteration | 1 | 1 + Jacobian cost | 1 (gradient) |
| Convergence rate | Superlinear | Quadratic | Superlinear |
| Primary use case | Nonlinear systems | Nonlinear systems | Unconstrained optimization |
| Symmetry assumption | No | No | Yes (positive definite Hessian) |

  • vs. Newton: Use Broyden when the Jacobian is expensive to compute or unavailable analytically. Use Newton when you can cheaply compute exact derivatives and want the fastest convergence.
  • vs. BFGS: Both are quasi-Newton methods, but BFGS is designed for optimization (minimizing a scalar function) and maintains positive definiteness of the Hessian approximation. Broyden's method is more general and applies to non-symmetric systems of equations.
  • vs. DFP: The Davidon-Fletcher-Powell method is an older quasi-Newton approach for optimization. BFGS has largely replaced it in practice due to better numerical behavior, but DFP remains historically important.

Applications

Broyden's method is widely used in situations where function evaluations dominate the computational cost:

  • Chemical engineering: Solving equilibrium equations for multicomponent systems where each function evaluation involves thermodynamic calculations
  • Circuit simulation: Finding steady-state operating points of nonlinear circuits (SPICE-type simulators)
  • Nonlinear least squares: Parameter estimation and curve fitting, where it can complement Gauss-Newton approaches
  • PDE-constrained problems: When the "function evaluation" involves solving a partial differential equation, avoiding repeated Jacobian computation is critical

Implementation Notes

Most scientific computing environments provide Broyden's method out of the box:

  • Python: scipy.optimize.broyden1 (good Broyden) and scipy.optimize.broyden2 (inverse Broyden) in SciPy; also available through the general scipy.optimize.root interface with method='broyden1'
  • MATLAB: The fsolve function in the Optimization Toolbox supports quasi-Newton methods including Broyden-type updates
  • C++: Libraries like Eigen (for matrix operations), the GNU Scientific Library (GSL), and Ceres Solver provide the building blocks or direct implementations
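For example, with SciPy's `broyden1` (the small linear test system below is an illustrative assumption, not from the text):

```python
import numpy as np
from scipy.optimize import broyden1

# Hypothetical test system: x0 + 0.5*x1 = 1,  0.5*x0 + x1 = 0
f = lambda x: np.array([x[0] + 0.5 * x[1] - 1.0, 0.5 * x[0] + x[1]])

sol = broyden1(f, [0.0, 0.0], f_tol=1e-10)  # good Broyden via SciPy
```

The same system can be passed to `scipy.optimize.root(f, [0.0, 0.0], method='broyden1')` if you prefer the unified interface.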

For custom implementations, pay attention to the linear solve in step 2. Using the Sherman-Morrison formula, you can update B_k^{-1} directly from B_{k-1}^{-1} in O(n^2) operations, avoiding the O(n^3) cost of solving a fresh linear system each iteration.

Advanced Topics

Inexact Broyden Methods

In large-scale problems, even solving the linear system B_k s_k = -f(x_k) exactly can be too expensive. Inexact variants allow an approximate solve (e.g., using a few steps of GMRES) and still maintain convergence, provided the solve accuracy improves as you approach the solution.

Limited-Memory Variants

For very high-dimensional problems where storing the full n × n matrix is impractical, limited-memory Broyden methods store only the last m update vectors (typically m ≪ n) and reconstruct the matrix-vector product implicitly. This mirrors the idea behind L-BFGS in optimization.

Parallel Implementations

Function evaluations can sometimes be parallelized (e.g., evaluating components of f independently). Asynchronous variants of Broyden's method allow updates to proceed before all function evaluations complete, improving throughput on multi-core or distributed systems.