🔢Numerical Analysis II Unit 9 Review


9.2 Newton's method for nonlinear equations


Written by the Fiveable Content Team • Last updated August 2025

Fundamentals of Newton's Method

Newton's method is an iterative algorithm for finding roots of nonlinear equations. It exploits the derivative of a function to steer each successive approximation toward a root, often achieving quadratic convergence. For a Numerical Analysis II course, this method is foundational because it connects Taylor series theory, convergence analysis, and practical computation in one clean framework.

Concept and Motivation

The core idea: if you have a guess $x_n$ that's near a root of $f(x) = 0$, you can get a better guess by linearizing $f$ at $x_n$ and solving the resulting linear equation. The derivative tells you which direction to move and how far, which is why the method converges so quickly when it works.

Newton's method is motivated by the need for fast, accurate root-finding in fields like engineering, physics, and economics, where nonlinear equations arise constantly and closed-form solutions rarely exist.

Derivation from Taylor Series

The derivation starts with the Taylor expansion of $f$ about the current iterate $x_n$:

  1. Expand $f(x)$ around $x_n$: $f(x) = f(x_n) + f'(x_n)(x - x_n) + \frac{1}{2}f''(x_n)(x - x_n)^2 + \cdots$

  2. Truncate after the first-order term, giving the tangent line approximation: $f(x) \approx f(x_n) + f'(x_n)(x - x_n)$

  3. Set this linear approximation to zero and solve for $x$: $0 = f(x_n) + f'(x_n)(x - x_n)$

  4. Rearrange to obtain the iteration formula: $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$

The higher-order terms you dropped in step 2 are what produce the approximation error at each step. Those terms also explain why the method converges quadratically: the error at step $n+1$ is proportional to the square of the error at step $n$.
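This relationship can be made precise with the Taylor remainder: subtracting the root $r$ from the iteration formula and expanding $f$ about $x_n$ gives, for some $\xi_n$ between $x_n$ and $r$,

```latex
e_{n+1} \;=\; \frac{f''(\xi_n)}{2\,f'(x_n)}\, e_n^{2},
\qquad e_n = x_n - r,
```

so as $x_n \to r$ the ratio $|e_{n+1}| / |e_n|^2$ approaches the constant $\left| f''(r) / (2 f'(r)) \right|$.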

Geometric Interpretation

Geometrically, each Newton step draws the tangent line to $y = f(x)$ at the point $(x_n, f(x_n))$, then takes the x-intercept of that tangent line as the next guess $x_{n+1}$.

  • When the function is smooth and the guess is close to the root, the tangent line is a good local model, so the x-intercept lands much closer to the true root.
  • The method can fail when the tangent line is nearly horizontal ($f'(x_n) \approx 0$), because the x-intercept shoots off to a distant point.
  • For functions with multiple roots, the tangent line may direct the iteration toward an unintended root depending on the starting point.

Sketching a few iterations by hand is one of the best ways to build intuition for when and why the method succeeds or fails.

Algorithm Implementation

Basic Iterative Formula

The algorithm repeatedly applies:

$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$

In pseudocode:

Input: f, f', x_0, tol, max_iter
for n = 0, 1, 2, ... , max_iter:
    x_{n+1} = x_n - f(x_n) / f'(x_n)
    if stopping criterion met:
        return x_{n+1}
return failure (did not converge)

You need both $f$ and $f'$ as callable functions. The derivative must be available either analytically, via numerical differentiation, or through automatic differentiation.
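The pseudocode above can be sketched in plain Python; the function name, default tolerances, and the combined step-plus-residual stopping test here are illustrative choices, not part of any standard library.

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton's method for f(x) = 0; returns (root, iterations used)."""
    x = x0
    for n in range(max_iter):
        fx = f(x)
        dfx = fprime(x)
        if dfx == 0.0:  # tangent is horizontal: no Newton step possible
            raise ZeroDivisionError("f'(x) vanished at x = %g" % x)
        x_new = x - fx / dfx
        # Combined stopping test: small (relative) step AND small residual
        if abs(x_new - x) < tol * (1 + abs(x_new)) and abs(f(x_new)) < tol:
            return x_new, n + 1
        x = x_new
    raise RuntimeError("did not converge in %d iterations" % max_iter)

# Example: the square root of 2 as the positive root of x^2 - 2 = 0
root, iters = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

The guard on `fprime` and the iteration cap correspond directly to the failure modes discussed later in this guide.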

Stopping Criteria

You need to decide when the iteration is "close enough." Common criteria include:

  • Absolute step size: $|x_{n+1} - x_n| < \epsilon$
  • Relative step size: $\frac{|x_{n+1} - x_n|}{|x_{n+1}|} < \epsilon$
  • Residual test: $|f(x_{n+1})| < \epsilon$

No single criterion is foolproof. A small step size doesn't guarantee a small residual (think of a nearly flat function), and a small residual doesn't guarantee you're close to the root (think of a steep function). In practice, combine at least two of these. Always include a maximum iteration count to prevent infinite loops if the method diverges.

Initial Guess Selection

The choice of $x_0$ can make or break Newton's method. Strategies include:

  • Domain knowledge: Physical constraints or problem context often suggest a reasonable range.
  • Graphical analysis: Plot $f(x)$ and pick $x_0$ near where the curve crosses zero.
  • Bracketing first: Run a few bisection steps to narrow the interval, then switch to Newton's method for fast finishing.

If the function has multiple roots, you may need to run the method with several different initial guesses to find all of them.

Convergence Analysis

Local vs. Global Convergence

Local convergence means the method converges when $x_0$ is already close to a root. Newton's method has excellent local convergence properties under mild smoothness assumptions.

Global convergence asks whether the method converges from any starting point. Newton's method does not guarantee global convergence. For arbitrary $x_0$, the iterates can oscillate, diverge, or converge to an unintended root. Globalization strategies (line search, trust regions) are modifications that improve the chance of convergence from distant starting points.

Rate of Convergence

The order of convergence $p$ is defined by:

$\lim_{n \to \infty} \frac{|e_{n+1}|}{|e_n|^p} = C$

where $e_n = x_n - r$ is the error at iteration $n$ and $C$ is a nonzero constant.

For a simple root (multiplicity 1), Newton's method achieves quadratic convergence ($p = 2$). This means the number of correct digits roughly doubles with each iteration. For example, if you have 3 correct digits at step $n$, you'll have about 6 at step $n+1$ and about 12 at step $n+2$.
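The digit-doubling is easy to check numerically; here the iteration is applied to $f(x) = x^2 - 2$, an arbitrary test problem with a simple root at $\sqrt{2}$:

```python
import math

f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
r = math.sqrt(2.0)

x = 1.5
errors = []
for _ in range(4):
    x = x - f(x) / df(x)
    errors.append(abs(x - r))
# Each error is roughly proportional to the square of the previous one,
# so the number of correct digits roughly doubles per step.
```

Printing `errors` shows the magnitudes dropping from about $10^{-3}$ to $10^{-6}$ to $10^{-12}$ in successive steps.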

For roots of multiplicity $m > 1$, convergence degrades to linear ($p = 1$) unless you use a modified formula (see Limitations section).

Conditions for Convergence

Sufficient conditions for Newton's method to converge to a root $r$:

  • $f$ is continuously differentiable in a neighborhood of $r$
  • $f'(r) \neq 0$ (the root is simple)
  • $x_0$ is sufficiently close to $r$
  • $f'$ satisfies a Lipschitz condition near $r$, which bounds how fast the derivative changes

The Kantorovich theorem provides a more rigorous, verifiable set of conditions: given bounds on $|f(x_0)|$, $|f'(x_0)^{-1}|$, and the Lipschitz constant of $f'$, it guarantees convergence and gives an error bound without requiring you to already know where the root is.


Variations and Extensions

Secant Method

When computing $f'(x)$ is expensive or the derivative isn't available in closed form, the secant method replaces the exact derivative with a finite difference approximation:

$x_{n+1} = x_n - \frac{f(x_n)(x_n - x_{n-1})}{f(x_n) - f(x_{n-1})}$

This requires two initial points ($x_0$ and $x_1$) but only one function evaluation per step (versus one $f$ and one $f'$ for Newton). The convergence order is superlinear at $p \approx 1.618$ (the golden ratio), which is slower than quadratic but often faster in wall-clock time because each step is cheaper.
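A minimal secant-method sketch (the function name and tolerance are illustrative); note that only one new $f$ evaluation happens per loop iteration:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant method: Newton with a finite-difference slope."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:  # flat secant line: cannot take a step
            raise ZeroDivisionError("secant slope vanished")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1          # reuse the previous evaluation
        x1, f1 = x2, f(x2)       # the only new f evaluation this step
    raise RuntimeError("secant method did not converge")

# Same example root as later in this guide: x^3 - 2x - 5 = 0 near x = 2
root = secant(lambda x: x**3 - 2.0 * x - 5.0, 2.0, 2.1)
```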

Quasi-Newton Methods

In higher dimensions, computing and inverting the full Jacobian (or Hessian for optimization) at every step is expensive. Quasi-Newton methods build and update an approximation to the Jacobian (or its inverse) using information gathered from successive iterates.

  • BFGS (Broyden-Fletcher-Goldfarb-Shanno): the most widely used quasi-Newton method for optimization; updates an approximation to the inverse Hessian.
  • Broyden's method: a rank-one update to the Jacobian approximation for systems of equations.

These methods trade some convergence speed for a large reduction in per-iteration cost, and they typically achieve superlinear convergence.

Multidimensional Newton's Method

For a system of $n$ nonlinear equations $\mathbf{f}(\mathbf{x}) = \mathbf{0}$, the iteration generalizes to:

$\mathbf{x}_{n+1} = \mathbf{x}_n - [J(\mathbf{x}_n)]^{-1} \mathbf{f}(\mathbf{x}_n)$

where $J(\mathbf{x}_n)$ is the $n \times n$ Jacobian matrix with entries $J_{ij} = \frac{\partial f_i}{\partial x_j}$. In practice, you never explicitly invert $J$. Instead, you solve the linear system:

$J(\mathbf{x}_n) \, \Delta\mathbf{x} = -\mathbf{f}(\mathbf{x}_n)$

then update $\mathbf{x}_{n+1} = \mathbf{x}_n + \Delta\mathbf{x}$. This is more efficient and numerically stable. The main challenges are the cost of forming $J$ (which has $n^2$ entries) and the possibility that $J$ becomes singular or nearly singular during iteration.
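For a $2 \times 2$ system the linear solve can be written out by hand (Cramer's rule); the example system $x^2 + y^2 = 4$, $xy = 1$ and the helper name are illustrative choices. A real implementation would use a linear algebra library instead of Cramer's rule:

```python
def newton_system_2x2(F, J, x, y, tol=1e-12, max_iter=50):
    """Newton's method for a 2x2 system, solving J @ [dx, dy] = -F directly."""
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        a, b, c, d = J(x, y)              # Jacobian rows: [a, b], [c, d]
        det = a * d - b * c
        if abs(det) < 1e-15:
            raise ZeroDivisionError("Jacobian is (nearly) singular")
        # Cramer's rule for [[a, b], [c, d]] @ [dx, dy] = [-f1, -f2]
        dx = (-f1 * d - b * (-f2)) / det
        dy = (a * (-f2) - (-f1) * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            return x, y
    raise RuntimeError("did not converge")

# Example: intersect the circle x^2 + y^2 = 4 with the hyperbola x*y = 1
F = lambda x, y: (x * x + y * y - 4.0, x * y - 1.0)
J = lambda x, y: (2.0 * x, 2.0 * y, y, x)
x, y = newton_system_2x2(F, J, 2.0, 0.5)
```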

Practical Considerations

Computational Cost

Each Newton iteration requires:

  • One evaluation of $f(x)$
  • One evaluation of $f'(x)$ (or an approximation)
  • For multidimensional problems: forming the Jacobian ($n^2$ partial derivatives) and solving an $n \times n$ linear system ($O(n^3)$ via direct methods)

Compare this to bisection, which needs only one $f$ evaluation per step and no derivatives, but converges linearly. Newton's method typically needs far fewer iterations to reach a given tolerance, so the total cost is often lower despite the higher per-iteration expense.

Handling of Derivatives

Three main approaches for obtaining $f'$:

  • Analytical derivatives: Exact and fast to evaluate once derived, but can be tedious or impractical for complicated functions.
  • Numerical differentiation: Approximate $f'(x_n) \approx \frac{f(x_n + h) - f(x_n)}{h}$ for small $h$. Simple to implement, but introduces truncation error (from finite $h$) and round-off error (from small $h$). Choosing $h$ well matters; $h \approx \sqrt{\epsilon_{\text{mach}}} \cdot |x_n|$ is a common heuristic.
  • Automatic differentiation (AD): Computes exact derivatives algorithmically by applying the chain rule to the program that evaluates $f$. Gives machine-precision derivatives without manual derivation. This is the preferred approach in modern scientific computing.
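The step-size heuristic packs into a few lines (forward differences; the helper name is hypothetical):

```python
import math
import sys

def fd_derivative(f, x):
    """Forward-difference derivative with h ~ sqrt(machine eps) * |x|,
    balancing truncation error (too-large h) against round-off (too-small h)."""
    h = math.sqrt(sys.float_info.epsilon) * max(abs(x), 1.0)
    return (f(x + h) - f(x)) / h
```

With this choice the approximation carries a relative error on the order of $\sqrt{\epsilon_{\text{mach}}}$, i.e. roughly half of the available digits.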

Numerical Stability Issues

Several things can go wrong in floating-point arithmetic:

  • Division by near-zero: If $f'(x_n) \approx 0$, the step $f(x_n)/f'(x_n)$ becomes huge, sending the iterate far away. Detect this and fall back to a safer method.
  • Overflow/underflow: Very large or very small intermediate values can exceed floating-point range.
  • Cancellation error: When $x_{n+1}$ and $x_n$ are nearly equal, computing their difference loses significant digits.

Mitigation techniques include scaling variables so that quantities are of moderate size, using regularization (adding a small term to prevent division by zero), and switching to extended precision for critical calculations.

Applications and Examples

Root-Finding Problems

Newton's method solves $f(x) = 0$ across many domains. A concrete example:

Find a root of $f(x) = x^3 - 2x - 5$. Here $f'(x) = 3x^2 - 2$. Starting from $x_0 = 2$:

  • $x_1 = 2 - \frac{8 - 4 - 5}{12 - 2} = 2 - \frac{-1}{10} = 2.1$
  • $x_2 = 2.1 - \frac{(2.1)^3 - 2(2.1) - 5}{3(2.1)^2 - 2} \approx 2.0946$

After just a few iterations, you converge to the root $r \approx 2.09455$. Transcendental equations like $e^x = \cos(x)$ and implicit curve intersections are handled the same way.
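The iterates above are easy to reproduce:

```python
f = lambda x: x**3 - 2.0 * x - 5.0
df = lambda x: 3.0 * x * x - 2.0

x = 2.0
iterates = []
for _ in range(5):
    x = x - f(x) / df(x)
    iterates.append(x)
# The sequence begins 2.1, 2.0946..., and settles near 2.09455148
```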

Optimization Techniques

To find a minimum or maximum of $g(x)$, set $f(x) = g'(x) = 0$ and apply Newton's method to $f$. The iteration becomes:

$x_{n+1} = x_n - \frac{g'(x_n)}{g''(x_n)}$

This is exactly Newton's method applied to the first-order optimality condition. In higher dimensions, this generalizes to the Newton step using the Hessian matrix, and it's often combined with line search or trust region strategies to ensure the step actually decreases the objective.
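For a concrete one-dimensional case, take the objective $g(x) = x - \ln x$ (an illustrative choice, with its minimum at $x = 1$) and iterate on $g'$:

```python
# Minimize g(x) = x - ln(x) by applying Newton's method to g'(x) = 0.
# The derivatives are computed by hand.
gprime = lambda x: 1.0 - 1.0 / x      # zero exactly at x = 1
gsecond = lambda x: 1.0 / (x * x)     # positive, so x = 1 is a minimum

x = 0.5
for _ in range(10):
    x = x - gprime(x) / gsecond(x)
```

For this problem the error obeys $e_{n+1} = e_n^2$ exactly, so ten iterations reach machine precision from $x_0 = 0.5$.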


Nonlinear Systems of Equations

Systems of coupled nonlinear equations arise in chemical equilibrium calculations, nonlinear circuit analysis, and robotics (inverse kinematics). These require the multidimensional Newton's method with the Jacobian. The main practical challenges are forming the Jacobian efficiently and dealing with the possibility of multiple solutions, which means different initial guesses may lead to different roots.

Limitations and Challenges

Multiple Roots

For a root of multiplicity $m > 1$ (where $f(r) = f'(r) = \cdots = f^{(m-1)}(r) = 0$), standard Newton's method degrades to linear convergence. Two common fixes:

  • Modified Newton's method: Use $x_{n+1} = x_n - m \cdot \frac{f(x_n)}{f'(x_n)}$ if you know the multiplicity $m$.
  • Deflation: After finding one root, divide it out (replace $f(x)$ with $f(x)/(x - r)$) and apply Newton's method again to find remaining roots.

The method can also converge to an unexpected root if the initial guess is closer to a different root's basin of attraction.
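To see the multiplicity effect, compare both iterations on the double root of $f(x) = (x-1)^2$, a deliberately simple test case:

```python
f = lambda x: (x - 1.0) ** 2
df = lambda x: 2.0 * (x - 1.0)

def run(step, x, iters):
    for _ in range(iters):
        x = step(x)
    return x

standard = lambda x: x - f(x) / df(x)        # halves the error each step
modified = lambda x: x - 2.0 * f(x) / df(x)  # m = 2 restores fast convergence

x_std = run(standard, 2.0, 10)  # error shrinks only to about 2^-10
x_mod = run(modified, 2.0, 1)   # lands on the root in one step for this f
```

The modified step is exact here because $f$ is a pure quadratic; for general functions it restores quadratic (not one-step) convergence.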

Slow Convergence Cases

Convergence slows down when:

  • The root has multiplicity greater than 1 (as above)
  • $f'$ is very small near the root, making the tangent line nearly horizontal
  • The problem is poorly scaled, so that variables differ by many orders of magnitude

Acceleration techniques like Aitken's $\Delta^2$ method can speed up a linearly converging sequence. Problem reformulation or variable scaling can also help restore fast convergence.

Divergence Scenarios

Newton's method can diverge entirely when:

  • The initial guess is too far from any root
  • The iterates enter an attracting cycle (e.g., bouncing between two points)
  • The function has vertical asymptotes or discontinuities that the iterates cross

Damped Newton's method addresses this by taking a fraction of the full Newton step: $x_{n+1} = x_n - \alpha \frac{f(x_n)}{f'(x_n)}$, where $0 < \alpha \leq 1$ is chosen (often via a line search) to ensure sufficient decrease in $|f|$. Trust region methods provide another framework for controlling step size.
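A minimal damped-Newton sketch with simple step-halving backtracking (the function name and the backtracking limits are illustrative choices):

```python
import math

def damped_newton(f, df, x0, tol=1e-12, max_iter=100):
    """Newton's method with backtracking: halve the step until |f| decreases."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        step = fx / df(x)
        alpha = 1.0
        while alpha > 1e-8 and abs(f(x - alpha * step)) >= abs(fx):
            alpha *= 0.5          # backtrack until the residual decreases
        x = x - alpha * step
    raise RuntimeError("did not converge")

# arctan is a classic case where full Newton steps overshoot and diverge
# from |x0| > ~1.39, but the damped version still converges to the root 0.
root = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 2.0)
```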

Newton's Method vs. Alternatives

Fixed-Point Iteration Comparison

Fixed-point iteration rewrites $f(x) = 0$ as $x = g(x)$ and iterates $x_{n+1} = g(x_n)$. It's simpler (no derivatives needed) but only converges linearly when $|g'(r)| < 1$. Newton's method is itself a fixed-point iteration with $g(x) = x - f(x)/f'(x)$, specifically engineered so that $g'(r) = 0$ at a simple root, which is exactly what produces quadratic convergence.
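The claim $g'(r) = 0$ is a one-line computation with the quotient rule:

```latex
g(x) = x - \frac{f(x)}{f'(x)}
\quad\Longrightarrow\quad
g'(x) = 1 - \frac{f'(x)^2 - f(x)\,f''(x)}{f'(x)^2}
      = \frac{f(x)\,f''(x)}{f'(x)^2},
```

which vanishes at a simple root because $f(r) = 0$ while $f'(r) \neq 0$.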

Bisection Method Comparison

Bisection is the most reliable root-finding method: given an interval $[a, b]$ where $f$ changes sign, it's guaranteed to converge. But it only converges linearly, gaining roughly one binary digit per step. Newton's method is far faster near a root but offers no convergence guarantee without a good starting point.

A common practical strategy is to use bisection to get within a reasonable neighborhood of the root, then switch to Newton's method for rapid finishing. This hybrid approach combines the reliability of bisection with the speed of Newton.

Hybrid Methods

Several hybrid strategies exist:

  • Newton-bisection: Maintain a bracket $[a, b]$ and take a Newton step if it stays inside the bracket; otherwise, take a bisection step. This guarantees convergence while usually achieving Newton-like speed.
  • Globally convergent Newton with line search: Accept the Newton direction but choose the step length to ensure $|f|$ decreases sufficiently.
  • Trust region Newton: Restrict the step to a region where the linear model is trusted, expanding or shrinking the region based on how well the model predicts actual function behavior.
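The Newton-bisection idea can be sketched as follows (a simplified cousin of the Dekker/Brent safeguarded methods; the details here are illustrative):

```python
def newton_bisection(f, df, a, b, tol=1e-12, max_iter=200):
    """Safeguarded Newton: keep a sign-changing bracket [a, b] and fall
    back to bisection whenever the Newton step leaves the bracket."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx, dfx = f(x), df(x)
        if abs(fx) < tol:
            return x
        # Try a Newton step; reject it if f' vanishes or it exits [a, b]
        x_new = x - fx / dfx if dfx != 0.0 else a - 1.0
        if not (a < x_new < b):
            x_new = 0.5 * (a + b)       # bisection fallback
        fn = f(x_new)
        if fa * fn < 0:                 # shrink bracket around the sign change
            b = x_new
        else:
            a, fa = x_new, fn
        x = x_new
    raise RuntimeError("did not converge")

root = newton_bisection(lambda x: x**3 - 2.0 * x - 5.0,
                        lambda x: 3.0 * x * x - 2.0, 2.0, 3.0)
```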

Advanced Topics

Complex Plane Applications

Newton's method extends naturally to functions of a complex variable $z$. The same formula $z_{n+1} = z_n - f(z_n)/f'(z_n)$ applies, and it can find complex roots of polynomials or solve systems involving complex-valued functions. The convergence theory carries over, though branch cuts and essential singularities introduce complications not present in the real case.

Fractals and Newton's Method

Applying Newton's method to a complex polynomial like $z^3 - 1 = 0$ and coloring each starting point $z_0$ by which root the iteration converges to produces a Newton fractal. The boundaries between basins of attraction are fractal curves with infinite detail. These fractals illustrate sensitive dependence on initial conditions: two starting points arbitrarily close together can converge to different roots. This connects Newton's method to chaos theory and dynamical systems.
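A tiny classifier for which basin a starting point belongs to (the grid/coloring code is omitted, and the helper name is hypothetical):

```python
import cmath

# The three cube roots of unity, i.e. the roots of z^3 - 1 = 0
roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def basin(z, iters=60):
    """Return the index of the root of z^3 - 1 that Newton's iteration reaches."""
    for _ in range(iters):
        if z == 0:
            return -1            # derivative 3z^2 vanishes: step undefined
        z = z - (z**3 - 1) / (3 * z**2)
    return min(range(3), key=lambda k: abs(z - roots[k]))
```

Coloring a grid of complex starting points by `basin(z0)` draws the Newton fractal; points adjacent to a root always map to that root's index, while points near basin boundaries behave chaotically.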

Parallel Implementations

For large-scale systems, parallelism can accelerate Newton's method at several levels:

  • Jacobian assembly: Each row (or block) of the Jacobian can be computed independently.
  • Linear solves: The Newton linear system can be solved using parallel direct or iterative solvers.
  • Multiple starting points: Independent Newton iterations from different $x_0$ values can run simultaneously to find multiple roots.

The main challenges are load balancing (especially when different starting points converge at different rates) and communication overhead in distributed-memory settings.