Multistep Methods for IVPs
Concept and Derivation
Single-step methods like Runge-Kutta compute $y_{n+1}$ using only information at $t_n$. Multistep methods take a different approach: they reuse function evaluations from several previous steps to build the next approximation. This recycling of past work makes them computationally efficient, since each new step typically requires only one or two new function evaluations rather than the four that a standard RK4 method needs.
The general form of a linear multistep method (LMM) is:

$$\sum_{j=0}^{k} \alpha_j y_{n+j} = h \sum_{j=0}^{k} \beta_j f(t_{n+j}, y_{n+j})$$

where $k$ is the number of steps, $h$ is the step size, and the coefficients $\alpha_j$ and $\beta_j$ are chosen by matching terms in a Taylor series expansion to achieve a desired order of accuracy.
Two key distinctions:
- Explicit methods have $\beta_k = 0$, so $y_{n+k}$ can be computed directly from known values
- Implicit methods have $\beta_k \neq 0$, so $y_{n+k}$ appears inside $f$ on the right-hand side, requiring you to solve an equation at each step
Backward differentiation formulas (BDF) are a particularly important class of implicit multistep methods. They're the go-to choice for stiff differential equations because of their strong stability properties.
Three properties govern whether a multistep method actually works:
- Consistency: the method approximates the true ODE as $h \to 0$
- Zero-stability: small perturbations don't cause unbounded error growth
- Convergence: the numerical solution approaches the exact solution as $h \to 0$
Types and Characteristics
Explicit methods calculate the next value directly from previous values.
- Simpler to implement and generally less expensive per step
- May require smaller step sizes to remain stable
Implicit methods require solving an equation (often nonlinear) at each step.
- More complex to implement and costlier per step
- Generally far more stable, allowing larger step sizes that often more than compensate for the per-step cost
The order of accuracy tells you how fast the error shrinks when you reduce $h$. A method of order $p$ has a global error proportional to $h^p$. Higher-order methods give better accuracy for smooth solutions, but lower-order methods can be more robust for problems with discontinuities or rapid transients.
Implementation Considerations
Multistep methods need values at several previous time points before they can start. Since you only have the initial condition $y_0$ at the beginning, you need a starting procedure:
- Use a single-step method (typically RK4 or another Runge-Kutta variant of matching order) to compute $y_1, y_2, \dots, y_{k-1}$
- Once you have $k$ values, switch to the multistep method for all remaining steps
- The accuracy of these starting values matters: errors here propagate through the entire computation
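The starting procedure above can be sketched in Python. The function names (`rk4_step`, `starting_values`) are illustrative, not from any library; the example bootstraps a 4-step method on $y' = -y$:

```python
import math

def rk4_step(f, t, y, h):
    """One classical RK4 step (order 4), used to bootstrap a multistep method."""
    k1 = f(t, y)
    k2 = f(t + h/2, y + h/2 * k1)
    k3 = f(t + h/2, y + h/2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h/6 * (k1 + 2*k2 + 2*k3 + k4)

def starting_values(f, t0, y0, h, k):
    """Generate the k values y_0, ..., y_{k-1} so a k-step method can begin."""
    ts = [t0 + i*h for i in range(k)]
    ys = [y0]
    for i in range(k - 1):
        ys.append(rk4_step(f, ts[i], ys[i], h))
    return ts, ys

# Bootstrap a 4-step method for y' = -y, y(0) = 1
f = lambda t, y: -y
ts, ys = starting_values(f, 0.0, 1.0, 0.1, 4)
```

Because RK4 has order 4, these starting values are accurate enough for a 4-step method of order 4, consistent with the point above about starting errors propagating.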
Step size selection balances accuracy against computational cost. Smaller $h$ increases accuracy but requires more steps. Adaptive step size control automates this trade-off by estimating the local error at each step and adjusting accordingly: larger steps in smooth regions, smaller steps where the solution changes rapidly.
Error analysis distinguishes two types:
- Local truncation error (LTE): the error introduced in a single step, assuming all previous values are exact
- Global truncation error: the accumulated error over the entire integration interval
Adams-Bashforth and Adams-Moulton Methods
Adams-Bashforth Methods
These are the most widely used explicit multistep methods. The idea is to interpolate past derivative values with a polynomial, then integrate that polynomial forward one step.
The general form of a $k$-step Adams-Bashforth method:

$$y_{n+1} = y_n + h \sum_{j=0}^{k-1} b_j f_{n-j}$$

The coefficients $b_j$ are determined by requiring the interpolating polynomial to pass through the $k$ most recent derivative values $f_n, f_{n-1}, \dots, f_{n-k+1}$.
Common examples (using shorthand $f_i = f(t_i, y_i)$):
- 2-step (order 2): $y_{n+1} = y_n + \frac{h}{2}\left(3f_n - f_{n-1}\right)$
- 3-step (order 3): $y_{n+1} = y_n + \frac{h}{12}\left(23f_n - 16f_{n-1} + 5f_{n-2}\right)$
- 4-step (order 4): $y_{n+1} = y_n + \frac{h}{24}\left(55f_n - 59f_{n-1} + 37f_{n-2} - 9f_{n-3}\right)$
A $k$-step Adams-Bashforth method has order $k$, with local truncation error proportional to $h^{k+1}$.
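The 2-step formula can be implemented in a few lines, and its order verified empirically on $y' = -y$: halving $h$ should shrink the global error by about $2^2 = 4$. This is a minimal sketch; the Heun starter and helper names are assumptions, not a standard API.

```python
import math

def ab2_solve(f, t0, y0, h, n_steps):
    """Integrate y' = f(t, y) with 2-step Adams-Bashforth:
    y_{n+1} = y_n + h/2 * (3 f_n - f_{n-1}).
    The single extra starting value y_1 comes from one Heun (order-2 RK) step,
    matching the order of the multistep method."""
    k1 = f(t0, y0)
    k2 = f(t0 + h, y0 + h*k1)
    ys = [y0, y0 + h/2 * (k1 + k2)]
    fs = [k1, f(t0 + h, ys[1])]
    for n in range(1, n_steps):
        y_next = ys[n] + h/2 * (3*fs[n] - fs[n-1])
        ys.append(y_next)
        fs.append(f(t0 + (n + 1)*h, y_next))
    return ys

# Empirical order check: error at t = 1 for two step sizes
f = lambda t, y: -y
def err(h):
    return abs(ab2_solve(f, 0.0, 1.0, h, round(1.0/h))[-1] - math.exp(-1.0))
ratio = err(0.01) / err(0.005)   # expect roughly 4 for an order-2 method
```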

Adams-Moulton Methods
These are the implicit counterparts. They include the derivative at the new point in the interpolation, which buys an extra order of accuracy for the same number of steps.
The general form of a $k$-step Adams-Moulton method:

$$y_{n+1} = y_n + h \sum_{j=0}^{k} \beta_j f_{n+1-j}$$

where the $j = 0$ term corresponds to $f_{n+1} = f(t_{n+1}, y_{n+1})$, making the method implicit.
Common examples:
- 1-step (order 2, the trapezoidal rule): $y_{n+1} = y_n + \frac{h}{2}\left(f_{n+1} + f_n\right)$
- 2-step (order 3): $y_{n+1} = y_n + \frac{h}{12}\left(5f_{n+1} + 8f_n - f_{n-1}\right)$
- 3-step (order 4): $y_{n+1} = y_n + \frac{h}{24}\left(9f_{n+1} + 19f_n - 5f_{n-1} + f_{n-2}\right)$
Because $y_{n+1}$ appears on both sides (hidden inside $f_{n+1}$), you typically solve the implicit equation using an iterative method like Newton's method or fixed-point iteration.
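As a minimal sketch, here is the trapezoidal rule solved by fixed-point iteration for a scalar problem. The function name and tolerances are illustrative; fixed-point iteration converges only when $hL/2 < 1$, where $L$ is a Lipschitz constant of $f$ in $y$ (for stiff problems Newton's method is used instead).

```python
def trapezoid_step_fixed_point(f, t, y, h, tol=1e-12, max_iter=50):
    """One implicit trapezoidal step y_{n+1} = y_n + h/2 (f_{n+1} + f_n),
    solved by fixed-point iteration (assumes h*L/2 < 1 for convergence)."""
    fn = f(t, y)
    y_next = y + h * fn                      # explicit Euler predictor as initial guess
    for _ in range(max_iter):
        y_new = y + h/2 * (fn + f(t + h, y_next))
        if abs(y_new - y_next) < tol:
            return y_new
        y_next = y_new
    return y_next

# One step on y' = -y: the exact trapezoidal update is (1 - h/2)/(1 + h/2)
y1 = trapezoid_step_fixed_point(lambda t, y: -y, 0.0, 1.0, 0.1)
```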
Predictor-Corrector Methods
The standard way to handle the implicit equation efficiently is to pair an Adams-Bashforth predictor with an Adams-Moulton corrector. This avoids a full nonlinear solve while still capturing most of the accuracy benefit of the implicit method.
The procedure for each step:
- Predict $y_{n+1}^{(p)}$ using the explicit Adams-Bashforth formula
- Evaluate $f_{n+1}^{(p)} = f(t_{n+1}, y_{n+1}^{(p)})$
- Correct using the Adams-Moulton formula with $f_{n+1}^{(p)}$ substituted for the implicit term
- Optionally repeat steps 2-3 to iterate the corrector for improved accuracy
Example: 4th-order Adams-Bashforth-Moulton pair
Predictor (AB4): $y_{n+1}^{(p)} = y_n + \frac{h}{24}\left(55f_n - 59f_{n-1} + 37f_{n-2} - 9f_{n-3}\right)$
Corrector (AM3): $y_{n+1} = y_n + \frac{h}{24}\left(9f(t_{n+1}, y_{n+1}^{(p)}) + 19f_n - 5f_{n-1} + f_{n-2}\right)$
This PECE (Predict-Evaluate-Correct-Evaluate) approach gives you near-implicit accuracy at roughly the cost of an explicit method: just two function evaluations per step.
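The PECE step above can be sketched as follows (a minimal scalar implementation; the function name and history-list convention are assumptions). The demo drives $y' = -y$ from exact starting values, mimicking a high-order starter:

```python
import math

def abm4_pece(f, ts, ys, fs, h):
    """One PECE step of the 4th-order Adams-Bashforth-Moulton pair.
    ts, ys, fs hold (at least) the last four times, values, and f-values."""
    # Predict with AB4
    yp = ys[-1] + h/24 * (55*fs[-1] - 59*fs[-2] + 37*fs[-3] - 9*fs[-4])
    t_next = ts[-1] + h
    fp = f(t_next, yp)                       # Evaluate the prediction
    # Correct with AM3 (order 4), using fp in place of the implicit term
    yc = ys[-1] + h/24 * (9*fp + 19*fs[-1] - 5*fs[-2] + fs[-3])
    fc = f(t_next, yc)                       # Evaluate the correction
    return t_next, yc, fc

# Integrate y' = -y on [0, 1] with h = 0.1, starting from exact values
f = lambda t, y: -y
h = 0.1
ts = [i*h for i in range(4)]
ys = [math.exp(-t) for t in ts]
fs = [-y for y in ys]
while ts[-1] < 1.0 - 1e-12:
    t, y, fv = abm4_pece(f, ts, ys, fs, h)
    ts.append(t); ys.append(y); fs.append(fv)
err = abs(ys[-1] - math.exp(-1.0))
```

Note the two evaluations per step (`fp` and `fc`), matching the PECE cost claim above.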
Stability and Convergence of Multistep Methods
Stability Analysis
Stability analysis for multistep methods centers on the Dahlquist test equation $y' = \lambda y$, where $\lambda$ is a complex constant. Applying a linear multistep method to this equation yields a difference equation whose behavior depends on the roots of the characteristic polynomial:

$$\pi(r; h\lambda) = \rho(r) - h\lambda\,\sigma(r)$$

Here $\rho(r) = \sum_{j=0}^{k} \alpha_j r^j$ and $\sigma(r) = \sum_{j=0}^{k} \beta_j r^j$ encode the method's coefficients.
The root condition determines stability:
- All roots $r_i$ of the characteristic polynomial must satisfy $|r_i| \le 1$
- Any root with $|r_i| = 1$ must be a simple root (not repeated)
- If every root other than the principal root $r = 1$ satisfies $|r_i| < 1$ strictly, the method is strongly stable
The region of absolute stability is the set of complex values $h\lambda$ for which the root condition holds. You can visualize this by plotting the boundary locus $h\lambda = \rho(e^{i\theta})/\sigma(e^{i\theta})$, the curve where $|r| = 1$, in the complex $h\lambda$-plane. Values of $h\lambda$ inside this region produce bounded numerical solutions.
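The boundary locus is easy to trace numerically. This sketch (helper names are assumptions) evaluates $\rho(r)/\sigma(r)$ on the unit circle for the 2-step Adams-Bashforth method, whose stability region meets the real axis on $[-1, 0]$:

```python
import cmath, math

def boundary_locus(alpha, beta, n=720):
    """Trace the stability-region boundary of an LMM via the boundary locus
    method: h*lambda = rho(r)/sigma(r) sampled on |r| = 1, where
    rho(r) = sum_j alpha[j] r^j and sigma(r) = sum_j beta[j] r^j.
    Assumes sigma has no roots on the unit circle."""
    def poly(c, r):
        return sum(cj * r**j for j, cj in enumerate(c))
    pts = []
    for i in range(n):
        r = cmath.exp(1j * 2*math.pi * i / n)
        pts.append(poly(alpha, r) / poly(beta, r))
    return pts

# AB2 written as y_{n+2} - y_{n+1} = h/2 (3 f_{n+1} - f_n):
# rho(r) = r^2 - r, sigma(r) = (3r - 1)/2
ab2_alpha = [0.0, -1.0, 1.0]
ab2_beta  = [-0.5, 1.5, 0.0]
locus = boundary_locus(ab2_alpha, ab2_beta)
```

Plotting the real and imaginary parts of `locus` shows the familiar small lobe in the left half-plane; the sample at $r = 1$ gives $h\lambda = 0$ and the sample at $r = -1$ gives the real-axis intercept $h\lambda = -1$.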
Important stability classifications:
- A-stable: the stability region contains the entire left half of the complex plane ($\operatorname{Re}(h\lambda) < 0$). Any step size works for any decaying mode.
- L-stable: A-stable, and additionally the amplification factor tends to zero as $\operatorname{Re}(h\lambda) \to -\infty$. This provides extra damping for very stiff components.
- Stiff stability: the stability region covers a large portion of the negative real axis, even if it's not fully A-stable.
Convergence Properties
Dahlquist's equivalence theorem is the central result: a linear multistep method converges if and only if it is both consistent and zero-stable. This connects three concepts that might otherwise seem independent.
Dahlquist's second barrier places a hard limit on what's achievable:
An A-stable linear multistep method can be at most second-order accurate.
This is a fundamental constraint. If you want higher-order multistep methods (order 3, 4, etc.), you must give up A-stability. The BDF methods of order 3 through 5 do exactly this: they sacrifice full A-stability but retain stability regions large enough to handle most stiff problems.
The relationship between local and global errors follows a standard pattern:
- A method of order $p$ has local truncation error $O(h^{p+1})$
- The global error accumulates to $O(h^p)$ over the integration interval
Zero-stability (also called D-stability) is the requirement that the roots of $\rho(r)$ alone (setting $h\lambda = 0$) satisfy the root condition. This is a property of the method itself, independent of the problem or step size, and it's a necessary condition for convergence.
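Checking zero-stability amounts to checking the root condition on $\rho$, which can be sketched with `numpy.roots` (function name and tolerance are illustrative):

```python
import numpy as np

def is_zero_stable(alpha, tol=1e-10):
    """Root condition for rho(r) = sum_j alpha[j] r^j:
    all roots inside or on the unit circle, roots on the circle simple."""
    roots = np.roots(alpha[::-1])            # np.roots wants highest degree first
    for i, r in enumerate(roots):
        if abs(r) > 1 + tol:
            return False                     # root outside the unit circle
        if abs(abs(r) - 1) <= tol:
            # a root on the unit circle must not be repeated
            others = np.delete(roots, i)
            if np.any(np.abs(others - r) <= tol):
                return False
    return True

# AB4 has rho(r) = r^4 - r^3: roots 1 (simple) and 0 (inside) -> zero-stable.
# rho(r) = (r - 1)(r - 2) has a root outside the circle -> not zero-stable.
ok_ab4 = is_zero_stable([0.0, 0.0, 0.0, -1.0, 1.0])
ok_bad = is_zero_stable([2.0, -3.0, 1.0])
```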

Practical Considerations
Choosing a method involves navigating several trade-offs:
- Higher-order methods have better accuracy per step but often have smaller stability regions. For stiff problems, this can force you to use a step size so small that the accuracy advantage is wasted.
- Implicit methods cost more per step (due to the nonlinear solve) but their larger stability regions often allow step sizes that are orders of magnitude bigger than what explicit methods can handle for stiff systems.
- Adaptive strategies that vary both the step size and the method order can be very effective. Modern ODE solvers (like MATLAB's `ode15s` or SciPy's `solve_ivp` with the BDF method) do exactly this.
Starting values deserve careful attention. If you use a $k$-step method of order $p$, your starting procedure should also be order $p$ accurate. Using a lower-order starter introduces errors that can degrade the accuracy of the entire computation.
Method Selection for Differential Equations
Problem Characteristics
The single most important factor in method selection is stiffness. A problem is stiff when it has widely separated time scales, meaning the ratio of the largest to smallest eigenvalue magnitudes of the Jacobian is very large.
- Non-stiff problems are handled well by explicit methods. Adams-Bashforth or explicit Runge-Kutta methods are preferred for their simplicity and low cost per step. A simple harmonic oscillator with moderate frequency is a typical non-stiff example.
- Stiff problems generally require implicit methods. BDF methods (orders 1-5) or implicit Runge-Kutta methods are standard choices. Chemical kinetics systems with reaction rates spanning many orders of magnitude are classic stiff problems.
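A tiny experiment makes the stiffness point concrete: forward (explicit) Euler versus backward (implicit) Euler on $y' = -1000y$, $y(0) = 1$. The specific numbers are illustrative; the step size is chosen well outside forward Euler's stability region, which requires $|1 + h\lambda| \le 1$, i.e. $h \le 0.002$ here.

```python
# Stiff test problem y' = lam * y with lam = -1000; exact solution decays fast.
lam, h, n = -1000.0, 0.01, 100   # h*lam = -10: unstable for forward Euler

y_exp = 1.0                       # forward Euler iterate
y_imp = 1.0                       # backward Euler iterate
for _ in range(n):
    y_exp = y_exp * (1 + h*lam)   # forward Euler: |1 + h*lam| = 9 > 1, blows up
    y_imp = y_imp / (1 - h*lam)   # backward Euler: |1/(1 - h*lam)| = 1/11 < 1, decays
```

After 100 steps the explicit iterate has exploded while the implicit one has decayed toward zero, matching the exact solution's behavior.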
Solution smoothness also matters. Higher-order methods shine when the solution is smooth, but they can perform poorly near discontinuities. Adaptive strategies that reduce the step size (or the method order) near rapid changes are the practical solution.
For oscillatory or periodic problems, standard methods can introduce artificial damping or phase drift over long integrations. Specialized methods like symplectic integrators (Störmer-Verlet, implicit midpoint) preserve the geometric structure of Hamiltonian systems, making them essential for long-time simulations in celestial mechanics or molecular dynamics.
Accuracy and Efficiency Considerations
The accuracy-efficiency trade-off comes down to: do you want fewer expensive steps or many cheap steps?
- Implicit methods cost more per step (solving a nonlinear system) but allow much larger $h$ for stiff problems
- Explicit methods are cheap per step but may need $h$ so small for stability that the total cost is far higher
Adaptive step size control is standard in modern solvers. The idea is to take a step, estimate the local error (often by comparing two methods of different orders), and accept or reject the step based on a user-specified tolerance. The Runge-Kutta-Fehlberg (RKF45) method, for example, embeds a 4th-order and 5th-order method to get a free error estimate.
Error tolerances come in two flavors:
- Absolute tolerance: controls error when the solution is near zero
- Relative tolerance: controls error proportional to the solution magnitude
Most solvers use a mixed criterion: $|e_i| \le \text{atol} + \text{rtol} \cdot |y_i|$ for each component $i$.
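One common way solvers combine the two tolerances is an RMS-scaled error norm: each error component is divided by its mixed tolerance, and the step is accepted when the norm is at most 1. This is a sketch; the function name and default tolerances are illustrative, not any solver's exact formula.

```python
import math

def error_norm(err, y, atol=1e-6, rtol=1e-3):
    """RMS of error components scaled by atol + rtol*|y_i|.
    A value <= 1 means the error estimate satisfies the mixed criterion
    on average; adaptive solvers accept the step and may enlarge h."""
    n = len(err)
    return math.sqrt(sum((e / (atol + rtol * abs(yi)))**2
                         for e, yi in zip(err, y)) / n)
```

For example, an error of `1e-7` in a component near zero is judged against `atol` alone, while the same error in a component of size 1 is judged mostly against `rtol * 1`.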
Specialized Methods and Considerations
Some problem classes need methods designed to preserve specific mathematical structure:
- Hamiltonian systems (N-body problems, molecular dynamics) need symplectic integrators that conserve the symplectic 2-form, preventing artificial energy drift over long times
- Highly oscillatory problems (Schrödinger equation, circuit simulations) benefit from exponential integrators that build the oscillatory behavior directly into the numerical scheme
- Problems with discontinuities (bouncing ball, switching circuits) require event detection algorithms that locate the discontinuity precisely and restart the integration
- Conservation laws (incompressible flow, certain PDE discretizations) may need volume-preserving or mass-preserving methods to maintain physical fidelity
Parallel computing can also influence method choice. Runge-Kutta methods parallelize more naturally across stages, while multistep methods are inherently sequential in time but can exploit parallelism within each step's linear algebra.