Fundamentals of Newton's Method
Newton's method is an iterative algorithm for finding roots of nonlinear equations. It exploits the derivative of a function to steer each successive approximation toward a root, often achieving quadratic convergence. For a Numerical Analysis II course, this method is foundational because it connects Taylor series theory, convergence analysis, and practical computation in one clean framework.
Concept and Motivation
The core idea: if you have a guess $x_n$ that's near a root of $f(x) = 0$, you can get a better guess by linearizing $f$ at $x_n$ and solving the resulting linear equation. The derivative tells you which direction to move and how far, which is why the method converges so quickly when it works.
Newton's method is motivated by the need for fast, accurate root-finding in fields like engineering, physics, and economics, where nonlinear equations arise constantly and closed-form solutions rarely exist.
Derivation from Taylor Series
The derivation starts with the Taylor expansion of $f$ about the current iterate $x_n$:

1. Expand around $x_n$:

   $$f(x) = f(x_n) + f'(x_n)(x - x_n) + \frac{f''(x_n)}{2}(x - x_n)^2 + \cdots$$

2. Truncate after the first-order term, giving the tangent line approximation:

   $$f(x) \approx f(x_n) + f'(x_n)(x - x_n)$$

3. Set this linear approximation to zero and solve for $x$:

   $$f(x_n) + f'(x_n)(x - x_n) = 0$$

4. Rearrange to obtain the iteration formula:

   $$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
The higher-order terms you dropped in step 2 are what produce the approximation error at each step. Those terms also explain why the method converges quadratically: the error at step $n+1$ is proportional to the square of the error at step $n$.
Geometric Interpretation
Geometrically, each Newton step draws the tangent line to $f$ at the point $(x_n, f(x_n))$, then takes the x-intercept of that tangent line as the next guess $x_{n+1}$.
- When the function is smooth and the guess is close to the root, the tangent line is a good local model, so the x-intercept lands much closer to the true root.
- The method can fail when the tangent line is nearly horizontal ($f'(x_n) \approx 0$), because the x-intercept shoots off to a distant point.
- For functions with multiple roots, the tangent line may direct the iteration toward an unintended root depending on the starting point.
Sketching a few iterations by hand is one of the best ways to build intuition for when and why the method succeeds or fails.
Algorithm Implementation
Basic Iterative Formula
The algorithm repeatedly applies:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
In pseudocode:
    Input: f, f', x_0, tol, max_iter
    for n = 0, 1, 2, ..., max_iter:
        x_{n+1} = x_n - f(x_n) / f'(x_n)
        if stopping criterion met:
            return x_{n+1}
    return failure (did not converge)
You need both $f$ and $f'$ as callable functions. The derivative must be available either analytically, via numerical differentiation, or through automatic differentiation.
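The pseudocode above translates directly into Python. This is a minimal sketch under the formula's own assumptions (the function and parameter names are my own, not from any library); it uses a relative step-size test plus an iteration cap:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for a scalar equation f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        dfx = fprime(x)
        if dfx == 0.0:
            raise ZeroDivisionError(f"f'({x}) is zero; cannot take a Newton step")
        x_new = x - fx / dfx
        # relative step-size test: stop when the update is negligible
        if abs(x_new - x) <= tol * max(1.0, abs(x_new)):
            return x_new
        x = x_new
    raise RuntimeError(f"no convergence within {max_iter} iterations")

# Example: the positive root of x^2 - 2 starting from x0 = 1
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Both `f` and `fprime` are passed in as callables, matching the requirement stated above.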
Stopping Criteria
You need to decide when the iteration is "close enough." Common criteria include:
- Absolute step size: $|x_{n+1} - x_n| < \varepsilon$
- Relative step size: $\dfrac{|x_{n+1} - x_n|}{|x_{n+1}|} < \varepsilon$
- Residual test: $|f(x_{n+1})| < \varepsilon$
No single criterion is foolproof. A small step size doesn't guarantee a small residual (think of a nearly flat function), and a small residual doesn't guarantee you're close to the root (think of a steep function). In practice, combine at least two of these. Always include a maximum iteration count to prevent infinite loops if the method diverges.
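Combining criteria can be as simple as the following sketch (the function name and default tolerances are illustrative, not from any particular library):

```python
def converged(x_new, x_old, f_new, step_tol=1e-12, resid_tol=1e-10):
    """Accept only when BOTH the step test and the residual test pass."""
    small_step = abs(x_new - x_old) <= step_tol * max(1.0, abs(x_new))
    small_residual = abs(f_new) <= resid_tol
    return small_step and small_residual
```

The `max(1.0, abs(x_new))` factor blends the absolute and relative step tests, so the criterion stays meaningful when the root is near zero.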
Initial Guess Selection
The choice of $x_0$ can make or break Newton's method. Strategies include:
- Domain knowledge: Physical constraints or problem context often suggest a reasonable range.
- Graphical analysis: Plot $f$ and pick $x_0$ near where the curve crosses zero.
- Bracketing first: Run a few bisection steps to narrow the interval, then switch to Newton's method for fast finishing.
If the function has multiple roots, you may need to run the method with several different initial guesses to find all of them.
Convergence Analysis
Local vs. Global Convergence
Local convergence means the method converges when $x_0$ is already close to a root. Newton's method has excellent local convergence properties under mild smoothness assumptions.
Global convergence asks whether the method converges from any starting point. Newton's method does not guarantee global convergence. For arbitrary $x_0$, the iterates can oscillate, diverge, or converge to an unintended root. Globalization strategies (line search, trust regions) are modifications that improve the chance of convergence from distant starting points.
Rate of Convergence
The order of convergence $p$ is defined by:

$$\lim_{n \to \infty} \frac{|e_{n+1}|}{|e_n|^p} = C$$

where $e_n = x_n - r$ is the error at iteration $n$ and $C$ is a nonzero constant.
For a simple root (multiplicity 1), Newton's method achieves quadratic convergence ($p = 2$). This means the number of correct digits roughly doubles with each iteration. For example, if you have 3 correct digits at step $n$, you'll have about 6 at step $n+1$ and about 12 at step $n+2$.
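The digit doubling is easy to observe numerically. This sketch uses $f(x) = x^2 - 2$ (my own choice of test function) and tracks the error against $\sqrt{2}$:

```python
import math

# Newton on f(x) = x^2 - 2; the error e_n = |x_n - sqrt(2)| should
# roughly square at each step, i.e. correct digits roughly double.
x = 1.0
errors = []
for _ in range(5):
    errors.append(abs(x - math.sqrt(2)))
    x = x - (x * x - 2.0) / (2.0 * x)
# The ratio errors[n+1] / errors[n]**2 settles near a constant,
# here |f''(r) / (2 f'(r))| = 1 / (2 * sqrt(2)).
```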
For roots of multiplicity $m > 1$, convergence degrades to linear ($p = 1$) unless you use a modified formula (see Limitations section).
Conditions for Convergence
Sufficient conditions for Newton's method to converge to a root $r$:
- $f$ is continuously differentiable in a neighborhood of $r$
- $f'(r) \neq 0$ (the root is simple)
- $x_0$ is sufficiently close to $r$
- $f'$ satisfies a Lipschitz condition near $r$, which bounds how fast the derivative changes
The Kantorovich theorem provides a more rigorous, verifiable set of conditions: given bounds on $1/|f'(x_0)|$, on the size of the first Newton step $|f(x_0)/f'(x_0)|$, and on the Lipschitz constant of $f'$, it guarantees convergence and gives an error bound without requiring you to already know where the root is.

Variations and Extensions
Secant Method
When computing $f'$ is expensive or the derivative isn't available in closed form, the secant method replaces the exact derivative with a finite difference approximation:

$$f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}, \qquad x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$$

This requires two initial points ($x_0$ and $x_1$) but only one function evaluation per step (versus one $f$ and one $f'$ evaluation for Newton). The convergence order is superlinear at $p = (1 + \sqrt{5})/2 \approx 1.618$ (the golden ratio), which is slower than quadratic but often faster in wall-clock time because each step is cheaper.
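A minimal sketch of the secant iteration (names are my own), reusing the square-root example:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant method: Newton with f' replaced by a finite difference."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            raise ZeroDivisionError("secant slope is zero")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) <= tol * max(1.0, abs(x2)):
            return x2
        # shift the two-point window forward by one step
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    raise RuntimeError("no convergence")

# Root of x^2 - 2 bracketed loosely by the two starting points 1 and 2
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

Note that each pass evaluates `f` once; the previous value is carried over rather than recomputed.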
Quasi-Newton Methods
In higher dimensions, computing and inverting the full Jacobian (or Hessian for optimization) at every step is expensive. Quasi-Newton methods build and update an approximation to the Jacobian (or its inverse) using information gathered from successive iterates.
- BFGS (Broyden-Fletcher-Goldfarb-Shanno): the most widely used quasi-Newton method for optimization; updates an approximation to the inverse Hessian.
- Broyden's method: a rank-one update to the Jacobian approximation for systems of equations.
These methods trade some convergence speed for a large reduction in per-iteration cost, and they typically achieve superlinear convergence.
Multidimensional Newton's Method
For a system of nonlinear equations $F(\mathbf{x}) = \mathbf{0}$, the iteration generalizes to:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - J_F(\mathbf{x}_k)^{-1} F(\mathbf{x}_k)$$

where $J_F$ is the Jacobian matrix with entries $(J_F)_{ij} = \partial F_i / \partial x_j$. In practice, you never explicitly invert $J_F$. Instead, you solve the linear system:

$$J_F(\mathbf{x}_k)\,\boldsymbol{\delta}_k = -F(\mathbf{x}_k)$$

then update $\mathbf{x}_{k+1} = \mathbf{x}_k + \boldsymbol{\delta}_k$. This is more efficient and numerically stable. The main challenges are the cost of forming $J_F$ (which has $n^2$ entries) and the possibility that $J_F$ becomes singular or nearly singular during iteration.
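A sketch for a $2 \times 2$ system; the system itself is a made-up demo, $x^2 + y^2 = 5$ and $xy = 2$, which has $(1, 2)$ among its exact solutions. Because the Jacobian is only $2 \times 2$, the linear solve $J\,\boldsymbol{\delta} = -F$ is done by Cramer's rule rather than a general solver:

```python
def newton_2x2(x, y, tol=1e-12, max_iter=50):
    """Newton for F(x, y) = (x^2 + y^2 - 5, x*y - 2) = (0, 0)."""
    for _ in range(max_iter):
        F1 = x * x + y * y - 5.0
        F2 = x * y - 2.0
        # Jacobian J = [[dF1/dx, dF1/dy], [dF2/dx, dF2/dy]]
        a, b = 2.0 * x, 2.0 * y
        c, d = y, x
        det = a * d - b * c
        # Solve J @ (dx, dy) = -F by Cramer's rule (fine for 2x2)
        dx = (-F1 * d + F2 * b) / det
        dy = (-F2 * a + F1 * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) <= tol:
            return x, y
    raise RuntimeError("no convergence")

x, y = newton_2x2(1.0, 1.5)
```

For larger systems, the Cramer's-rule line is exactly what you would replace with a factorization-based solve.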
Practical Considerations
Computational Cost
Each Newton iteration requires:
- One evaluation of $f(x_n)$
- One evaluation of $f'(x_n)$ (or an approximation)
- For multidimensional problems: forming the Jacobian ($n^2$ partial derivatives) and solving an $n \times n$ linear system ($O(n^3)$ via direct methods)
Compare this to bisection, which needs only one evaluation per step and no derivatives, but converges linearly. Newton's method typically needs far fewer iterations to reach a given tolerance, so the total cost is often lower despite the higher per-iteration expense.
Handling of Derivatives
Three main approaches for obtaining $f'$:
- Analytical derivatives: Exact and fast to evaluate once derived, but can be tedious or impractical for complicated functions.
- Numerical differentiation: Approximate $f'(x) \approx \dfrac{f(x+h) - f(x)}{h}$ for small $h$. Simple to implement, but introduces truncation error (from finite $h$) and round-off error (from small $h$). Choosing $h$ well matters; $h = \sqrt{\varepsilon_{\text{mach}}}\,\max(|x|, 1)$ is a common heuristic.
- Automatic differentiation (AD): Computes exact derivatives algorithmically by applying the chain rule to the program that evaluates $f$. Gives machine-precision derivatives without manual derivation. This is the preferred approach in modern scientific computing.
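The forward-difference approximation with that step-size heuristic looks like the following sketch, checked here against a derivative known in closed form ($\sin' = \cos$; the function name is my own):

```python
import math
import sys

def fd_derivative(f, x):
    """Forward difference with h = sqrt(machine eps) * max(|x|, 1)."""
    h = math.sqrt(sys.float_info.epsilon) * max(abs(x), 1.0)
    return (f(x + h) - f(x)) / h

approx = fd_derivative(math.sin, 1.0)
exact = math.cos(1.0)
# The error is roughly sqrt(eps) ~ 1e-8: far better than a careless h,
# but far worse than the machine-precision result AD would give.
```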
Numerical Stability Issues
Several things can go wrong in floating-point arithmetic:
- Division by near-zero: If $f'(x_n) \approx 0$, the step $f(x_n)/f'(x_n)$ becomes huge, sending the iterate far away. Detect this and fall back to a safer method.
- Overflow/underflow: Very large or very small intermediate values can exceed floating-point range.
- Cancellation error: When two nearly equal quantities are subtracted (as in finite-difference derivative approximations), computing their difference loses significant digits.
Mitigation techniques include scaling variables so that quantities are of moderate size, using regularization (adding a small term to prevent division by zero), and switching to extended precision for critical calculations.
Applications and Examples
Root-Finding Problems
Newton's method solves $f(x) = 0$ across many domains. A concrete example: find a root of $f(x) = x^2 - 2$. Here $f'(x) = 2x$. Starting from $x_0 = 1$:

$$x_1 = 1.5, \quad x_2 \approx 1.41667, \quad x_3 \approx 1.4142157$$

After just a few iterations, you converge to the root $\sqrt{2} \approx 1.4142136$. Transcendental equations like $\cos x = x$ and implicit curve intersections are handled the same way.
Optimization Techniques
To find a minimum or maximum of $g(x)$, set $g'(x) = 0$ and apply Newton's method to $g'$. The iteration becomes:

$$x_{n+1} = x_n - \frac{g'(x_n)}{g''(x_n)}$$
This is exactly Newton's method applied to the first-order optimality condition. In higher dimensions, this generalizes to the Newton step using the Hessian matrix, and it's often combined with line search or trust region strategies to ensure the step actually decreases the objective.
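As a sketch, take the made-up objective $g(x) = x - \ln x$, whose minimum is at $x = 1$, with $g'(x) = 1 - 1/x$ and $g''(x) = 1/x^2$:

```python
# Newton's method applied to g'(x) = 0 for g(x) = x - ln(x):
# the update x <- x - g'(x)/g''(x) simplifies here to x <- 2x - x^2.
x = 0.5
for _ in range(30):
    x = x - (1.0 - 1.0 / x) / (1.0 / (x * x))
# x now sits at the minimizer x = 1 to machine precision
```

Note that this stationary-point iteration cannot by itself tell a minimum from a maximum; that is one reason the line-search and trust-region safeguards mentioned above matter in practice.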

Nonlinear Systems of Equations
Systems of coupled nonlinear equations arise in chemical equilibrium calculations, nonlinear circuit analysis, and robotics (inverse kinematics). These require the multidimensional Newton's method with the Jacobian. The main practical challenges are forming the Jacobian efficiently and dealing with the possibility of multiple solutions, which means different initial guesses may lead to different roots.
Limitations and Challenges
Multiple Roots
For a root of multiplicity $m$ (where $m > 1$), standard Newton's method degrades to linear convergence. Two common fixes:
- Modified Newton's method: Use $x_{n+1} = x_n - m\,\dfrac{f(x_n)}{f'(x_n)}$ if you know the multiplicity $m$.
- Deflation: After finding one root $r$, divide it out (replace $f(x)$ with $f(x)/(x - r)$) and apply Newton's method again to find remaining roots.
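The contrast is easy to see numerically. This sketch uses the made-up function $f(x) = (x-1)^2(x+1)$, which has a double root at $x = 1$ (so $m = 2$):

```python
def f(x):
    return (x - 1.0) ** 2 * (x + 1.0)

def fp(x):
    return (x - 1.0) * (3.0 * x + 1.0)   # derivative of f, factored

def run(m, x0, iters):
    """m = 1 is standard Newton; m = 2 is the modified step for this root."""
    x = x0
    for _ in range(iters):
        d = fp(x)
        if d == 0.0:          # landed exactly on the root
            return x
        x = x - m * f(x) / d
    return x

plain = run(1.0, 2.0, 6)      # linear convergence: error still ~1e-2
modified = run(2.0, 2.0, 6)   # quadratic convergence: at the root
```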
The method can also converge to an unexpected root if the initial guess lies in a different root's basin of attraction.
Slow Convergence Cases
Convergence slows down when:
- The root has multiplicity greater than 1 (as above)
- $|f'(x)|$ is very small near the root, making the tangent line nearly horizontal
- The problem is poorly scaled, so that variables differ by many orders of magnitude
Acceleration techniques like Aitken's $\Delta^2$ method can speed up a linearly converging sequence. Problem reformulation or variable scaling can also help restore fast convergence.
Divergence Scenarios
Newton's method can diverge entirely when:
- The initial guess is too far from any root
- The iterates enter an attracting cycle (e.g., bouncing between two points)
- The function has vertical asymptotes or discontinuities that the iterates cross
Damped Newton's method addresses this by taking a fraction of the full Newton step:

$$x_{n+1} = x_n - \lambda_n \frac{f(x_n)}{f'(x_n)}, \qquad 0 < \lambda_n \le 1,$$

where $\lambda_n$ is chosen (often via a line search) to ensure sufficient decrease in $|f|$. Trust region methods provide another framework for controlling step size.
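A sketch of damping with a simple backtracking line search, on my own choice of test problem, $f(x) = \arctan x$, a classic case where the undamped step overshoots:

```python
import math

def damped_newton(f, fp, x, tol=1e-12, max_iter=100):
    """Newton with backtracking: halve the step until |f| decreases."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        step = -fx / fp(x)            # full Newton step
        lam = 1.0
        while abs(f(x + lam * step)) >= abs(fx):
            lam *= 0.5                # backtrack
            if lam < 1e-12:
                raise RuntimeError("line search failed")
        x += lam * step
    raise RuntimeError("no convergence")

# Plain Newton on arctan from x0 = 10 overshoots to about -139 on its
# first step and diverges; the damped version converges to the root 0.
root = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 10.0)
```

This accepts any step that strictly reduces $|f|$; production codes usually demand a stronger "sufficient decrease" (Armijo-type) condition.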
Newton's Method vs. Alternatives
Fixed-Point Iteration Comparison
Fixed-point iteration rewrites $f(x) = 0$ as $x = g(x)$ and iterates $x_{n+1} = g(x_n)$. It's simpler (no derivatives needed) but converges only linearly when $0 < |g'(r)| < 1$, and may diverge when $|g'(r)| > 1$. Newton's method is itself a fixed-point iteration with $g(x) = x - f(x)/f'(x)$, specifically engineered so that $g'(r) = 0$ at a simple root, which is exactly what produces quadratic convergence.
Bisection Method Comparison
Bisection is the most reliable root-finding method: given an interval $[a, b]$ where $f$ changes sign, it's guaranteed to converge. But it only converges linearly, gaining roughly one binary digit per step. Newton's method is far faster near a root but offers no convergence guarantee without a good starting point.
A common practical strategy is to use bisection to get within a reasonable neighborhood of the root, then switch to Newton's method for rapid finishing. This hybrid approach combines the reliability of bisection with the speed of Newton.
Hybrid Methods
Several hybrid strategies exist:
- Newton-bisection: Maintain a bracket and take a Newton step if it stays inside the bracket; otherwise, take a bisection step. This guarantees convergence while usually achieving Newton-like speed.
- Globally convergent Newton with line search: Accept the Newton direction but choose the step length to ensure $|f|$ decreases sufficiently.
- Trust region Newton: Restrict the step to a region where the linear model is trusted, expanding or shrinking the region based on how well the model predicts actual function behavior.
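The first of these strategies can be sketched in a few lines (names are my own; the cubic $x^3 - 2x - 5$ with bracket $[2, 3]$ is just a demo problem):

```python
def newton_bisect(f, fp, a, b, tol=1e-12, max_iter=100):
    """Safeguarded Newton: bisect whenever the Newton step leaves [a, b]."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # shrink the bracket around the sign change
        if fa * fx < 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        d = fp(x)
        x_newton = x - fx / d if d != 0.0 else None
        if x_newton is not None and a < x_newton < b:
            x = x_newton          # Newton step stays inside the bracket
        else:
            x = 0.5 * (a + b)     # fall back to a bisection step
    return x

root = newton_bisect(lambda x: x**3 - 2*x - 5, lambda x: 3*x*x - 2, 2.0, 3.0)
```

Because the bracket never grows, convergence is guaranteed; because most accepted steps are Newton steps, the finish is fast.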
Advanced Topics
Complex Plane Applications
Newton's method extends naturally to functions of a complex variable $z$. The same formula $z_{n+1} = z_n - f(z_n)/f'(z_n)$ applies, and it can find complex roots of polynomials or solve systems involving complex-valued functions. The convergence theory carries over, though branch cuts and essential singularities introduce complications not present in the real case.
Fractals and Newton's Method
Applying Newton's method to a complex polynomial like $z^3 - 1$ and coloring each starting point by which root the iteration converges to produces a Newton fractal. The boundaries between basins of attraction are fractal curves with infinite detail. These fractals illustrate sensitive dependence on initial conditions: two starting points arbitrarily close together can converge to different roots. This connects Newton's method to chaos theory and dynamical systems.
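The basin computation behind such a picture is tiny; this sketch (function name is my own) classifies a single starting point for $z^3 - 1$:

```python
import cmath

# The three cube roots of unity, i.e. the roots of z^3 - 1
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def basin(z0, iters=60):
    """Index of the root Newton's method reaches from z0 (None if stuck)."""
    z = z0
    for _ in range(iters):
        if z == 0:                # derivative 3z^2 vanishes here
            return None
        z = z - (z ** 3 - 1) / (3 * z ** 2)
    distances = [abs(z - r) for r in ROOTS]
    k = distances.index(min(distances))
    return k if distances[k] < 1e-6 else None
```

Coloring every pixel of a grid by `basin(x + 1j * y)` produces the fractal; points near basin boundaries are the ones that need many iterations or fail to settle.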
Parallel Implementations
For large-scale systems, parallelism can accelerate Newton's method at several levels:
- Jacobian assembly: Each row (or block) of the Jacobian can be computed independently.
- Linear solves: The Newton linear system can be solved using parallel direct or iterative solvers.
- Multiple starting points: Independent Newton iterations from different $x_0$ values can run simultaneously to find multiple roots.
The main challenges are load balancing (especially when different starting points converge at different rates) and communication overhead in distributed-memory settings.