Stochastic differential equations (SDEs) extend ordinary differential equations to systems driven by randomness. They combine a deterministic component describing average behavior with a stochastic component capturing random fluctuations, typically driven by a Wiener process (Brownian motion).

The general form of an SDE is:

$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$

$\mu(X_t, t)$ is the drift coefficient, governing the deterministic trend of the process
$\sigma(X_t, t)$ is the diffusion coefficient, controlling the intensity of random fluctuations
$W_t$ is a Wiener process: $W_0 = 0$, its paths are continuous, and its increments are independent and Gaussian, with $W_t - W_s \sim \mathcal{N}(0, t - s)$ for $s < t$

The drift term pulls the process in a predictable direction, while the diffusion term adds noise scaled by $\sigma$. Understanding how these two terms interact is central to everything that follows.

Solutions of stochastic differential equations

Existence of solutions

Existence theorems specify when an SDE actually has a solution. The standard sufficient conditions are:

Lipschitz continuity of the coefficients: there exists a constant $K$ such that $|\mu(x,t) - \mu(y,t)| + |\sigma(x,t) - \sigma(y,t)| \leq K|x - y|$ for all $x, y$
Linear growth condition: $|\mu(x,t)| + |\sigma(x,t)| \leq K(1 + |x|)$

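The Lipschitz condition keeps the coefficients from changing too abruptly, and the linear growth condition keeps the solution from exploding to infinity in finite time. Under both conditions the solution exists globally (defined for all $t \geq 0$). If the coefficients are only locally Lipschitz, existence is in general only local: the solution is defined up to a stopping time, such as an explosion time.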
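A deterministic example (with $\sigma = 0$) shows why linear growth matters. The drift $\mu(x) = x^2$ is locally Lipschitz but grows faster than linearly, and the solution started at $X_0 = 1$ explodes in finite time:

$dX_t = X_t^2\,dt,\quad X_0 = 1 \quad\Longrightarrow\quad X_t = \frac{1}{1 - t},\quad 0 \leq t < 1$

Differentiating confirms $\frac{d}{dt}\frac{1}{1-t} = \frac{1}{(1-t)^2} = X_t^2$, and $X_t \to \infty$ as $t \to 1$, so the solution exists only up to the explosion time $t = 1$.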

Uniqueness of solutions

Under the same Lipschitz condition, uniqueness holds in two senses:

Pathwise (strong) uniqueness: any two solutions built on the same Wiener process with the same initial condition agree almost surely for all $t$
Uniqueness in distribution (weak uniqueness): any two solutions with the same initial distribution share the same probability law, even if constructed on different probability spaces

Strong uniqueness implies weak uniqueness, but not the other way around.

Explicit solutions vs numerical methods

Closed-form solutions exist only in special cases. The most important example is geometric Brownian motion, which solves the linear SDE $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ with constant coefficients. For the vast majority of SDEs, you'll need numerical approximation. The general workflow is:

Discretize the time interval $[0, T]$ into steps of size $\Delta t$
Simulate increments $\Delta W_n \sim \mathcal{N}(0, \Delta t)$ of the Wiener process
Update the approximation step by step using a scheme like Euler-Maruyama or Milstein
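The three steps above can be sketched in Python with NumPy (the language, library, and the helper name `brownian_path` are assumptions, not from the text). This sketch covers only the first two steps: it builds a Wiener path on a uniform grid by summing independent $\mathcal{N}(0, \Delta t)$ increments, then checks across many paths that $W_T$ has mean close to 0 and variance close to $T$.

```python
import numpy as np

def brownian_path(T, n_steps, rng):
    """Simulate one Wiener path on [0, T] from independent N(0, dt) increments."""
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # Delta W_n ~ N(0, dt)
    W = np.concatenate(([0.0], np.cumsum(dW)))        # W_0 = 0, W_{n+1} = W_n + Delta W_n
    t = np.linspace(0.0, T, n_steps + 1)              # grid t_n = n * dt
    return t, W, dW

rng = np.random.default_rng(0)
t, W, dW = brownian_path(T=1.0, n_steps=1000, rng=rng)

# Sanity check on many paths: W_T should have mean ~0 and variance ~T.
T = 2.0
W_T = np.array([brownian_path(T, 100, rng)[1][-1] for _ in range(20_000)])
print(W_T.mean(), W_T.var())
```

The returned increments `dW` are what an update rule such as Euler-Maruyama consumes in the third step.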

Itô integral

Definition of Itô integral

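The Itô integral extends classical integration to allow integration with respect to a Wiener process. For an adapted process $f(t, \omega)$ with $E\left[\int_0^T f(t)^2\,dt\right] < \infty$, it is defined as the mean-square limit, over partitions $0 = t_0 < t_1 < \cdots < t_n = T$ whose mesh shrinks to zero: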

$\int_0^T f(t)\,dW_t = \lim_{n \to \infty} \sum_{i=0}^{n-1} f(t_i)\,(W_{t_{i+1}} - W_{t_i})$

The crucial feature: the integrand is evaluated at the left endpoint $t_i$ of each subinterval. This left-endpoint choice is what makes the Itô integral non-anticipating (it only uses information available at the current time).

For square-integrable $f$, the Itô integral, viewed as a process in its upper limit, is a martingale with zero mean: $E\left[\int_0^T f(t)\,dW_t\right] = 0$.
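A numerical check of the definition, as a Python/NumPy sketch with an arbitrarily chosen grid: for $f = W$, the left-endpoint sums approach the closed form $\int_0^T W_t\,dW_t = \frac{1}{2}(W_T^2 - T)$, which follows from Itô's lemma.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)      # Wiener increments on a uniform grid
W = np.concatenate(([0.0], np.cumsum(dW)))     # W_0 = 0, ..., W_T

# Ito sum: integrand evaluated at the LEFT endpoint t_i of each subinterval
ito_sum = np.sum(W[:-1] * dW)

# Closed form from Ito's lemma: int_0^T W dW = (W_T^2 - T) / 2
closed_form = 0.5 * (W[-1] ** 2 - T)
print(ito_sum, closed_form)
```

The gap between the two is exactly $\frac{1}{2}\left(T - \sum (\Delta W_i)^2\right)$, which shrinks as the grid is refined.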

Properties of Itô integral

Linearity: $\int_0^T [\alpha f + \beta g]\,dW_t = \alpha \int_0^T f\,dW_t + \beta \int_0^T g\,dW_t$
Adaptedness: the integrand must be adapted to the filtration generated by $W_t$ (no peeking into the future)
Continuity: the integral is a continuous function of the upper limit $T$
Quadratic variation: the quadratic variation of $\int_0^t f\,dW_s$ equals $\int_0^t f(s)^2\,ds$, which is generally nonzero

Note: An ordinary integral against a smooth (bounded-variation) function has zero quadratic variation, whereas the Itô integral does not. This key distinction is precisely what gives rise to the extra term in Itô's lemma.
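Quadratic variation can be estimated by summing squared increments over a fine grid. This Python/NumPy sketch (the grid size and the integrand $f(t) = t$ are arbitrary choices) checks both $\langle W \rangle_T = T$ and $\left\langle \int f\,dW \right\rangle_T = \int_0^T f(s)^2\,ds = T^3/3$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 1_000_000
dt = T / n
t_left = np.arange(n) * dt                  # left endpoints t_i
dW = rng.normal(0.0, np.sqrt(dt), size=n)

f = t_left                                  # deterministic (hence adapted) integrand f(t) = t
dI = f * dW                                 # increments of I_t = int_0^t f dW (left-endpoint rule)

qv_W = np.sum(dW ** 2)                      # quadratic variation of W on [0, T], close to T
qv_I = np.sum(dI ** 2)                      # close to int_0^T f(s)^2 ds = T^3 / 3
print(qv_W, qv_I)
```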

Itô isometry

The Itô isometry connects the second moment of a stochastic integral to a deterministic integral:

$E\left[\left(\int_0^T f(t)\,dW_t\right)^2\right] = E\left[\int_0^T f(t)^2\,dt\right]$

This identity is indispensable for computing variances and proving convergence results. It essentially lets you move between "stochastic world" and "deterministic world" when working with second moments.
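The isometry can be checked by Monte Carlo with $f = W$, where both sides equal $\int_0^T t\,dt = T^2/2$. A Python/NumPy sketch, with path and step counts chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW               # W at left endpoints t_0, ..., t_{n-1} (W_0 = 0)

ito = np.sum(W_left * dW, axis=1)                 # int_0^T W dW, one value per path
lhs = np.mean(ito ** 2)                           # E[(int_0^T W dW)^2]
rhs = np.mean(np.sum(W_left ** 2, axis=1) * dt)   # E[int_0^T W^2 dt]
print(lhs, rhs, T ** 2 / 2)                       # both close to T^2 / 2
```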

Itô's lemma

Statement of Itô's lemma

Itô's lemma is the stochastic chain rule. If $X_t$ satisfies $dX_t = \mu\,dt + \sigma\,dW_t$ and $f(t, x)$ is twice continuously differentiable, then:

$df(t, X_t) = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}\right)dt + \sigma \frac{\partial f}{\partial x}\,dW_t$

The term $\frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}$ has no analogue in ordinary calculus. It arises because the Wiener process has nonzero quadratic variation: $dW_t \cdot dW_t = dt$. This is the single most important thing to remember about Itô's lemma.
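As a worked example, take $X_t = W_t$ (so $\mu = 0$ and $\sigma = 1$) and $f(t, x) = x^2$, for which $\frac{\partial f}{\partial t} = 0$, $\frac{\partial f}{\partial x} = 2x$, and $\frac{\partial^2 f}{\partial x^2} = 2$. Itô's lemma then gives

$d(W_t^2) = dt + 2W_t\,dW_t \quad\Longrightarrow\quad \int_0^T W_t\,dW_t = \frac{1}{2}W_T^2 - \frac{1}{2}T$

using $W_0 = 0$. The $-\frac{1}{2}T$ comes entirely from the second-derivative term; the ordinary chain rule would predict $\frac{1}{2}W_T^2$.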


Applications of Itô's lemma

Deriving SDE dynamics for transformed processes: if you know $dX_t$ and want $d(\ln X_t)$ or $d(X_t^2)$, Itô's lemma gives the answer directly
Financial mathematics: the Black-Scholes PDE is derived by applying Itô's lemma to the option price $V(t, S_t)$ and then forming a portfolio of the option and the stock in which the random $dW_t$ terms cancel
Computing moments and distributions of stochastic processes
Stochastic optimal control: Itô's lemma underpins the Hamilton-Jacobi-Bellman equation
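For instance, take $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t$ with constant $\mu$ and $\sigma$, and $f(x) = \ln x$, so $f'(x) = 1/x$ and $f''(x) = -1/x^2$. Here the diffusion coefficient of $X_t$ is $\sigma X_t$, so the correction term is $\frac{1}{2}(\sigma X_t)^2 f''(X_t) = -\frac{1}{2}\sigma^2$:

$d(\ln X_t) = \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW_t$

The right-hand side has constant coefficients, so $\ln X_t$ is a Brownian motion with drift, and exponentiating its integral gives the geometric Brownian motion solution.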

Generalized Itô's lemma

For a function $f(t, X_t^1, \ldots, X_t^n)$ of multiple Itô processes, the generalized formula includes:

Partial derivatives with respect to each process and time
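Second-order terms $\frac{1}{2}\sum_{i,j}\frac{\partial^2 f}{\partial x^i\,\partial x^j}\,dX_t^i\,dX_t^j$, including the cross-variation terms $dX_t^i\,dX_t^j$ for $i \neq j$, which account for correlations between the driving Wiener processes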

This is essential for multi-dimensional SDEs and models with multiple sources of uncertainty (e.g., multi-asset option pricing).
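The simplest case is the product $f(x, y) = xy$ of two Itô processes. Because $\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial y^2} = 0$ and $\frac{\partial^2 f}{\partial x\,\partial y} = 1$, only the cross term survives, giving the stochastic product rule:

$d(X_t Y_t) = X_t\,dY_t + Y_t\,dX_t + dX_t\,dY_t$

If $dX_t = \mu_1\,dt + \sigma_1\,dW_t^1$ and $dY_t = \mu_2\,dt + \sigma_2\,dW_t^2$ with correlation $dW_t^1\,dW_t^2 = \rho\,dt$, the cross term is $dX_t\,dY_t = \rho\,\sigma_1\sigma_2\,dt$.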

Stochastic exponential

Definition of stochastic exponential

The stochastic exponential (Doléans-Dade exponential) of a semimartingale $M_t$ is the unique solution to:

$d\mathcal{E}(M)_t = \mathcal{E}(M)_t\,dM_t, \quad \mathcal{E}(M)_0 = 1$

For a continuous semimartingale $M_t$ with $M_0 = 0$, the explicit form is:

$\mathcal{E}(M)_t = \exp\left(M_t - \frac{1}{2}\langle M \rangle_t\right)$

where $\langle M \rangle_t$ is the quadratic variation of $M$. The subtraction of $\frac{1}{2}\langle M \rangle_t$ is a direct consequence of Itô's lemma applied to the exponential function.
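A short Python/NumPy sketch (the step count and $\sigma$ are arbitrary choices) makes the $-\frac{1}{2}\langle M \rangle_t$ term concrete for $M_t = \sigma W_t$: multiplying the Euler factors $1 + \Delta M_k$ from $d\mathcal{E} = \mathcal{E}\,dM$ reproduces $\exp(M_T - \frac{1}{2}\sigma^2 T)$, not the naive $\exp(M_T)$.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, T, n = 1.0, 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
M_T = sigma * dW.sum()                        # M_t = sigma * W_t, so <M>_T = sigma^2 * T

# Euler discretization of dE = E dM with E_0 = 1: E_{k+1} = E_k * (1 + dM_k)
euler_product = np.prod(1.0 + sigma * dW)

exact = np.exp(M_T - 0.5 * sigma ** 2 * T)    # exp(M_T - <M>_T / 2)
naive = np.exp(M_T)                           # drops the quadratic-variation correction
print(euler_product, exact, naive)
```

The naive value is off by the deterministic factor $e^{\sigma^2 T/2}$, while the Euler product matches the explicit form up to discretization error.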

Properties of stochastic exponential

Strictly positive: $\mathcal{E}(M)_t > 0$ for all $t$, since it's an exponential
Multiplicative: $\mathcal{E}(M) \cdot \mathcal{E}(N) = \mathcal{E}(M + N + \langle M, N \rangle)$
Inverse: $1/\mathcal{E}(M)$ is itself a stochastic exponential, namely $1/\mathcal{E}(M) = \mathcal{E}(-M + \langle M \rangle)$
Local martingale: if $M$ is a continuous local martingale, then $\mathcal{E}(M)_t$ is a local martingale; it's a true martingale if Novikov's condition $E\left[\exp\left(\frac{1}{2}\langle M \rangle_T\right)\right] < \infty$ holds
Reduces to the ordinary exponential: if $M_t = at$, then $\langle M \rangle_t = 0$ and $\mathcal{E}(M)_t = e^{at}$
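The multiplicative property can be checked from the explicit form, assuming $M$ and $N$ are continuous with $M_0 = N_0 = 0$. Adding the finite-variation term $\langle M, N \rangle$ does not change quadratic variation, so $\langle M + N + \langle M, N \rangle \rangle_t = \langle M \rangle_t + 2\langle M, N \rangle_t + \langle N \rangle_t$, and therefore

$\mathcal{E}(M + N + \langle M, N \rangle)_t = \exp\left(M_t + N_t + \langle M, N \rangle_t - \frac{1}{2}\langle M \rangle_t - \langle M, N \rangle_t - \frac{1}{2}\langle N \rangle_t\right) = \mathcal{E}(M)_t\,\mathcal{E}(N)_t$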

Relationship to geometric Brownian motion

Geometric Brownian motion (GBM) is the most common SDE in finance:

$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$

Its solution is $S_0$ times the stochastic exponential of $\mu t + \sigma W_t$:

$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right)$

The $-\frac{\sigma^2}{2}$ correction ensures $E[S_t] = S_0 e^{\mu t}$, which follows from the Gaussian moment generating function $E[e^{\sigma W_t}] = e^{\sigma^2 t/2}$. GBM stays strictly positive, making it suitable for asset price modeling. The Black-Scholes option pricing model is built on this foundation.
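The mean can be checked by Monte Carlo. This Python/NumPy sketch uses hypothetical parameter values and samples $S_T$ directly from the closed-form solution, so no time stepping is needed: only $W_T \sim \mathcal{N}(0, T)$ is drawn.

```python
import numpy as np

rng = np.random.default_rng(5)
S0, mu, sigma, T, n_paths = 100.0, 0.05, 0.2, 1.0, 200_000   # hypothetical values

# Exact GBM sampling at time T: only W_T ~ N(0, T) is needed
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
S_T = S0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * W_T)

print(S_T.mean(), S0 * np.exp(mu * T))   # sample mean vs. S0 * e^{mu T}
print(S_T.min() > 0)                     # GBM stays strictly positive
```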

Stratonovich integral

Definition of Stratonovich integral

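The Stratonovich integral evaluates the integrand symmetrically, averaging its values at the two endpoints of each subinterval (in the limit this is equivalent to evaluating at the midpoint), instead of using the left endpoint: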

$\int_0^T f(t) \circ dW_t = \lim_{n \to \infty} \sum_{i=0}^{n-1} \frac{f(t_i) + f(t_{i+1})}{2}(W_{t_{i+1}} - W_{t_i})$

The circle notation $\circ\,dW_t$ distinguishes it from the Itô integral. Because of the midpoint evaluation, the Stratonovich integral obeys the ordinary chain rule without the extra second-derivative correction term.
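A Python/NumPy sketch (grid size arbitrary) contrasts the two rules for $\int_0^T W_t \circ dW_t$. The endpoint-average sum telescopes to exactly $\frac{1}{2}W_T^2$, as the ordinary chain rule predicts, while the left-endpoint Itô sum falls short by about $\frac{1}{2}T$.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 1.0, 200_000
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

ito = np.sum(W[:-1] * dW)                          # left endpoint
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)        # average of the two endpoints

print(strat, 0.5 * W[-1] ** 2)   # Stratonovich matches the ordinary chain rule: W_T^2 / 2
print(strat - ito)               # equals 0.5 * sum(dW**2), close to T / 2
```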

Comparison to Itô integral

Feature	Itô	Stratonovich
Evaluation point	Left endpoint	Midpoint
Chain rule	Modified (extra $\frac{1}{2}\sigma^2 f''$ term)	Ordinary chain rule
Martingale property	Yes (integral is a martingale)	Not in general
Non-anticipating	Yes	No (uses future values in construction)

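Because its approximating sums use the integrand's value at the right endpoint of each subinterval, that is, information from slightly in the future, the Stratonovich integral is less natural for filtering and prediction problems. However, it's often preferred in physics: when white noise is idealized as the limit of smooth, rapidly fluctuating noise, the limiting equations are Stratonovich SDEs (the Wong-Zakai theorem).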

Stratonovich calculus

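To convert between the two frameworks, use the Itô-Stratonovich correction. If $X_t$ satisfies $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$ and $f$ is continuously differentiable, then: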

$\int_0^T f(X_t) \circ dW_t = \int_0^T f(X_t)\,dW_t + \frac{1}{2}\int_0^T f'(X_t)\sigma(X_t)\,dt$

This means any Stratonovich SDE can be rewritten as an Itô SDE (and vice versa) by adding or subtracting the correction term. The choice between the two depends on your application: Itô is standard in finance and probability theory; Stratonovich is common in physics and engineering.
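Applied to a whole equation, the correction moves into the drift. Taking $f = \sigma$ in the conversion formula, the Stratonovich SDE on the left and the Itô SDE on the right describe the same process:

$dX_t = b(X_t)\,dt + \sigma(X_t) \circ dW_t \quad\Longleftrightarrow\quad dX_t = \left[b(X_t) + \frac{1}{2}\sigma(X_t)\sigma'(X_t)\right]dt + \sigma(X_t)\,dW_t$

For example, $dX_t = c\,X_t \circ dW_t$ with a constant $c$ has $\sigma(x)\sigma'(x) = c^2 x$, so it equals the Itô SDE $dX_t = \frac{1}{2}c^2 X_t\,dt + c\,X_t\,dW_t$. Its solution is $X_t = X_0 e^{c W_t}$, exactly what the ordinary chain rule predicts.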


Linear stochastic differential equations

Homogeneous linear equations

A homogeneous linear SDE has the form:

$dX_t = a(t)X_t\,dt + b(t)X_t\,dW_t$

The solution is given by the stochastic exponential:

$X_t = X_0 \exp\left(\int_0^t \left(a(s) - \frac{1}{2}b(s)^2\right)ds + \int_0^t b(s)\,dW_s\right)$

This generalizes the deterministic exponential solution $x(t) = x_0 e^{\int a\,ds}$ to the stochastic setting. These equations model phenomena like population dynamics with random growth rates.
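The closed form can be compared with a direct simulation. In this Python/NumPy sketch the coefficient functions `a` and `b` are hypothetical, and both integrals in the exponent are approximated by left-endpoint sums on the same Brownian path that drives an Euler-Maruyama run of the equation; on a fine grid the two results should agree closely.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
def a(t):
    return 0.3 * np.cos(t)

def b(t):
    return 0.2 + 0.1 * t

rng = np.random.default_rng(7)
X0, T, n = 1.0, 2.0, 200_000
dt = T / n
t = np.arange(n) * dt                         # left endpoints t_i
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# Closed-form solution, with both integrals replaced by left-endpoint sums on this path
exponent = np.sum((a(t) - 0.5 * b(t) ** 2) * dt) + np.sum(b(t) * dW)
X_formula = X0 * np.exp(exponent)

# Euler-Maruyama on the same increments: X_{k+1} = X_k * (1 + a dt + b dW)
X_em = X0 * np.prod(1.0 + a(t) * dt + b(t) * dW)
print(X_formula, X_em)
```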

Inhomogeneous linear equations

Inhomogeneous linear SDEs add an external forcing term:

$dX_t = [a(t)X_t + f(t)]\,dt + [b(t)X_t + g(t)]\,dW_t$

The solution combines a homogeneous solution with a particular solution found via variation of parameters.

Variation of parameters formula

This is the stochastic analogue of the classical ODE technique:

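Solve the corresponding homogeneous equation $d\Phi_t = a(t)\Phi_t\,dt + b(t)\Phi_t\,dW_t$ with $\Phi_0 = 1$ to get the fundamental solution $\Phi_t$
Write the particular solution as $\Phi_t\left(\int_0^t \Phi_s^{-1}[f(s) - b(s)g(s)]\,ds + \int_0^t \Phi_s^{-1}g(s)\,dW_s\right)$; the $-b(s)g(s)$ term is an Itô correction with no ODE counterpart, and it vanishes when $b = 0$ or $g = 0$
The full solution is the sum of the homogeneous solution $\Phi_t X_0$ (from the initial condition) and this particular integral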

The formula reduces the inhomogeneous problem to computing stochastic integrals, which can then be handled analytically or numerically.
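As a worked example, the Ornstein-Uhlenbeck equation $dX_t = -\theta X_t\,dt + g\,dW_t$ with constants $\theta > 0$ and $g$ is the case $a(t) = -\theta$, $b(t) = 0$, $f(t) = 0$, $g(t) = g$. The fundamental solution is $\Phi_t = e^{-\theta t}$, so

$X_t = e^{-\theta t}X_0 + g\int_0^t e^{-\theta(t-s)}\,dW_s$

For a deterministic $X_0$, the zero-mean property and the Itô isometry then give $E[X_t] = e^{-\theta t}X_0$ and $\operatorname{Var}(X_t) = g^2\int_0^t e^{-2\theta(t-s)}\,ds = \frac{g^2}{2\theta}\left(1 - e^{-2\theta t}\right)$.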

Numerical methods for SDEs

Euler-Maruyama method

The simplest and most widely used scheme. Given $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$ :

$X_{n+1} = X_n + \mu(X_n)\Delta t + \sigma(X_n)\Delta W_n$

where $\Delta W_n \sim \mathcal{N}(0, \Delta t)$.

Strong convergence order: 0.5 (error in individual paths scales as $(\Delta t)^{0.5}$)
Weak convergence order: 1.0 (error in expectations scales as $\Delta t$ )

Strong convergence matters when you care about pathwise accuracy. Weak convergence matters when you only need accurate expectations (e.g., option pricing).
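A generic Euler-Maruyama routine takes only a few lines. The Python/NumPy sketch below uses a hypothetical helper `euler_maruyama` and measures the strong error on geometric Brownian motion (chosen only because its exact solution is known), reusing the same Brownian paths at several step sizes so the comparison is pathwise.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, dW):
    """Euler-Maruyama for dX = mu(X) dt + sigma(X) dW.
    dW holds Brownian increments with shape (n_paths, n_steps); returns X_T per path."""
    n_paths, n_steps = dW.shape
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        x = x + mu(x) * dt + sigma(x) * dW[:, k]
    return x

# Test problem (an assumption): GBM, whose exact solution lets us measure pathwise error
m, s, x0, T = 0.05, 0.5, 1.0, 1.0
rng = np.random.default_rng(8)
n_paths, n_fine = 5_000, 1024
dW_fine = rng.normal(0.0, np.sqrt(T / n_fine), size=(n_paths, n_fine))
exact = x0 * np.exp((m - 0.5 * s ** 2) * T + s * dW_fine.sum(axis=1))

strong_err = {}
for n in (16, 64, 256):
    # Coarsen the same Brownian paths by summing blocks of fine increments
    dW = dW_fine.reshape(n_paths, n, n_fine // n).sum(axis=2)
    approx = euler_maruyama(lambda x: m * x, lambda x: s * x, x0, T, dW)
    strong_err[n] = np.mean(np.abs(approx - exact))
print(strong_err)
```

Each fourfold refinement of the grid should roughly halve the error, consistent with strong order 0.5.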

Milstein method

The Milstein method adds a correction term to Euler-Maruyama:

$X_{n+1} = X_n + \mu(X_n)\Delta t + \sigma(X_n)\Delta W_n + \frac{1}{2}\sigma(X_n)\sigma'(X_n)\left[(\Delta W_n)^2 - \Delta t\right]$

This achieves strong convergence order 1.0, a significant improvement over Euler-Maruyama's 0.5, while the weak order stays at 1.0. The extra term involves $\sigma'$, the derivative of the diffusion coefficient. In multiple dimensions with non-commutative noise, implementing Milstein requires simulating iterated stochastic integrals (Lévy areas), which adds complexity.
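To compare the two schemes, this Python/NumPy sketch applies Euler-Maruyama and Milstein to geometric Brownian motion with hypothetical parameters, where $\sigma(x) = s x$ gives $\sigma(x)\sigma'(x) = s^2 x$, and measures the mean absolute error against the exact solution at $T$ using the same increments for both.

```python
import numpy as np

m, s, x0, T = 0.05, 0.5, 1.0, 1.0      # GBM: mu(x) = m x, sigma(x) = s x, sigma'(x) = s
rng = np.random.default_rng(9)
n_paths, n_steps = 10_000, 64
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
exact = x0 * np.exp((m - 0.5 * s ** 2) * T + s * dW.sum(axis=1))

x_em = np.full(n_paths, x0)
x_mil = np.full(n_paths, x0)
for k in range(n_steps):
    dWk = dW[:, k]
    x_em = x_em + m * x_em * dt + s * x_em * dWk
    # Milstein correction: 0.5 * sigma * sigma' * (dW^2 - dt), with sigma * sigma' = s^2 x
    x_mil = (x_mil + m * x_mil * dt + s * x_mil * dWk
             + 0.5 * (s * x_mil) * s * (dWk ** 2 - dt))

err_em = np.mean(np.abs(x_em - exact))
err_mil = np.mean(np.abs(x_mil - exact))
print(err_em, err_mil)   # Milstein's strong error is much smaller at the same dt
```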

Runge-Kutta methods for SDEs

Stochastic Runge-Kutta methods adapt the classical ODE approach by using multiple evaluations of $\mu$ and $\sigma$ within each time step, which lets them avoid computing derivatives such as $\sigma'$. Higher-order methods (e.g., strong order 1.5) provide better accuracy but at greater computational cost. They're most useful when high precision is needed and the SDE has smooth coefficients.
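Full order-1.5 stochastic Runge-Kutta schemes are too long for a short example. As a smaller illustration of the idea, this Python/NumPy sketch implements a derivative-free order-1.0 strong scheme usually attributed to Platen: it evaluates $\sigma$ at an extra supporting value and uses the finite difference in place of $\sigma\sigma'$ in the Milstein correction. The test problem (GBM) and its parameters are assumptions.

```python
import numpy as np

def platen_step(x, mu, sigma, dt, dW):
    """One step of Platen's explicit order-1.0 strong scheme (derivative-free).
    The finite difference of sigma at a supporting value replaces sigma * sigma'
    in the Milstein correction."""
    sqrt_dt = np.sqrt(dt)
    support = x + mu(x) * dt + sigma(x) * sqrt_dt
    correction = (sigma(support) - sigma(x)) * (dW ** 2 - dt) / (2.0 * sqrt_dt)
    return x + mu(x) * dt + sigma(x) * dW + correction

# Test problem (an assumption): GBM with mu(x) = m x and sigma(x) = s x
m, s, x0, T = 0.05, 0.5, 1.0, 1.0

def mu(x):
    return m * x

def sigma(x):
    return s * x

rng = np.random.default_rng(10)
n_paths, n_steps = 10_000, 64
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
exact = x0 * np.exp((m - 0.5 * s ** 2) * T + s * dW.sum(axis=1))

x_rk = np.full(n_paths, x0)
x_em = np.full(n_paths, x0)
for k in range(n_steps):
    x_rk = platen_step(x_rk, mu, sigma, dt, dW[:, k])
    x_em = x_em + mu(x_em) * dt + sigma(x_em) * dW[:, k]   # Euler-Maruyama for comparison

err_rk = np.mean(np.abs(x_rk - exact))
err_em = np.mean(np.abs(x_em - exact))
print(err_rk, err_em)
```

The scheme needs two evaluations of $\sigma$ per step but never its derivative, which is the trade-off the text describes.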

Applications of stochastic differential equations

Financial mathematics

The Black-Scholes model uses GBM ($dS = \mu S\,dt + \sigma S\,dW$) to price European options
Stochastic volatility models like the Heston model couple an SDE for the asset price with a separate SDE for the variance: $dv_t = \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^v$
Interest rate models (Vasicek, Cox-Ingersoll-Ross) use mean-reverting SDEs to capture the dynamics of short rates
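The Heston variance equation has $\sqrt{v_t}$ in its diffusion term, so a plain Euler-Maruyama step can push $v_t$ slightly negative. One common workaround, sketched here in Python/NumPy as an assumption rather than the method of any particular library, is a "full truncation" Euler scheme that replaces $v_t$ by $\max(v_t, 0)$ inside the coefficients. The parameter values are hypothetical; the check uses the mean $E[v_T] = \theta + (v_0 - \theta)e^{-\kappa T}$, obtained by taking expectations in the SDE.

```python
import numpy as np

# Hypothetical parameters, chosen so that 2 * kappa * theta > xi^2
kappa, theta, xi, v0 = 2.0, 0.04, 0.3, 0.09
T, n_steps, n_paths = 1.0, 250, 20_000
dt = T / n_steps
rng = np.random.default_rng(11)

v = np.full(n_paths, v0)
for _ in range(n_steps):
    v_plus = np.maximum(v, 0.0)                      # truncate inside drift and diffusion only
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    v = v + kappa * (theta - v_plus) * dt + xi * np.sqrt(v_plus) * dW

v_T = np.maximum(v, 0.0)
print(v_T.mean(), theta + (v0 - theta) * np.exp(-kappa * T))
```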

Physics and engineering

The Langevin equation $m\,dv = -\gamma v\,dt + \sigma\,dW_t$ describes Brownian particle motion in a fluid, balancing friction against thermal noise
In control engineering, SDEs model systems with stochastic disturbances and form the basis for stochastic control and estimation, such as linear-quadratic-Gaussian (LQG) control and Kalman-Bucy filtering
Stochastic PDEs (extensions of SDEs to spatial domains) appear in turbulence modeling and quantum field theory
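Dividing the Langevin equation by $m$ shows that the velocity is an Ornstein-Uhlenbeck process, $dv = -(\gamma/m)\,v\,dt + (\sigma/m)\,dW_t$, whose stationary variance is $\sigma^2/(2\gamma m)$. This Python/NumPy sketch (all parameter values hypothetical) simulates many independent particles with Euler-Maruyama for a time much longer than the relaxation time $m/\gamma$ and compares the sample variance of the velocity with that value.

```python
import numpy as np

# Hypothetical parameters: mass, friction coefficient, noise intensity
m, gamma, sigma = 2.0, 1.0, 0.5
T, dt, n_particles = 20.0, 0.01, 10_000
rng = np.random.default_rng(12)

v = np.zeros(n_particles)                   # all particles start at rest
for _ in range(int(round(T / dt))):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_particles)
    # Euler-Maruyama step for m dv = -gamma v dt + sigma dW
    v = v + (-gamma * v * dt + sigma * dW) / m

print(v.var(), sigma ** 2 / (2 * gamma * m))  # sample vs. stationary velocity variance
```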

Biology and ecology

Stochastic Lotka-Volterra equations add noise to predator-prey dynamics, capturing environmental variability and demographic stochasticity
Epidemic models (stochastic SIR) use SDEs to account for randomness in disease transmission
In neuroscience, SDEs model the stochastic firing patterns of neurons, where membrane potential fluctuates due to random synaptic inputs