Stochastic integrals extend the concept of integration to stochastic processes, allowing you to integrate one random process with respect to another. In classical calculus, you integrate a function against a smooth, deterministic variable. Stochastic integrals do something analogous, but the integrator is a random process like Brownian motion, which behaves far less predictably.

This machinery is essential for modeling systems driven by noise: asset prices in finance, particle diffusion in physics, noisy signals in engineering. Without stochastic integrals, you can't rigorously write down or solve stochastic differential equations.

Intuition behind stochastic integration

The basic idea mirrors Riemann integration: approximate the integral by summing products of the integrand and small increments of the integrator. You partition the time interval, evaluate the integrand somewhere in each subinterval, multiply by the increment of the integrator (say, Brownian motion), and take a limit as the partition gets finer.

The core difficulty is that Brownian motion has unbounded variation and is nowhere differentiable. You can't treat $dB(t)$ like an ordinary differential. The paths are too rough for classical integration theory to apply, so you need a purpose-built framework to make the limit well-defined.
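A short numerical sketch makes this roughness concrete. The Python below uses only the standard library. The helpers `brownian_path` and `variations`, the seed, and the grid sizes are illustrative choices. It simulates one Brownian path on a fine grid, then evaluates two quantities on coarser and finer partitions of that same path: the total variation $\sum |\Delta B|$ and the sum of squared increments $\sum (\Delta B)^2$.

```python
import math
import random

def brownian_path(n_steps, t_end, rng):
    """Simulate B at n_steps + 1 equally spaced times on [0, t_end], with B(0) = 0."""
    dt = t_end / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def variations(path, stride):
    """Total variation and sum of squared increments of the path sampled every `stride` points."""
    coarse = path[::stride]
    increments = [b - a for a, b in zip(coarse, coarse[1:])]
    total_var = sum(abs(d) for d in increments)
    quad_var = sum(d * d for d in increments)
    return total_var, quad_var

rng = random.Random(0)
t_end = 1.0
path = brownian_path(2**16, t_end, rng)

# Refine the partition by using more and more of the simulated points.
results = {}
for stride in (2**12, 2**8, 2**4, 1):
    n = len(path[::stride]) - 1
    results[n] = variations(path, stride)
    print(f"n={n:6d}  total variation={results[n][0]:8.3f}  sum of squares={results[n][1]:.3f}")
```

On a typical run, the total variation grows roughly like $\sqrt{n}$. The sum of squares, by contrast, settles near $t = 1$. That second quantity is the quadratic variation discussed below.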

Formal definition

Stochastic integrals are defined as limits of Riemann-Stieltjes-type sums. The integrand is a predictable process (for example, an adapted, left-continuous process), and the integrator is a semimartingale.

The definition depends on the concept of quadratic variation: the limit of the sum of squared increments of a process as the partition becomes finer. For Brownian motion $B(t)$, the quadratic variation over $[0, t]$ is simply $\langle B \rangle_t = t$.

A critical subtlety: where you evaluate the integrand within each subinterval changes the result. Evaluating at the left endpoint gives the Itô integral; evaluating at the midpoint gives the Stratonovich integral. Unlike in classical calculus, these two choices produce different answers because of the nonzero quadratic variation of the integrator.
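The effect of the evaluation point can be checked with the integrand $X = B$ itself. This sketch computes both sums for $\int_0^1 B(s)\,dB(s)$ along one simulated path. One assumption: in place of $B$ at the midpoint time, it uses the average of the endpoint values, $\frac{1}{2}(B(t_{i-1}) + B(t_i))$. This endpoint-average form is a common way to write Stratonovich sums and has the same limit here.

```python
import math
import random

rng = random.Random(1)
t_end, n = 1.0, 100_000
dt = t_end / n
b = [0.0]
for _ in range(n):
    b.append(b[-1] + rng.gauss(0.0, math.sqrt(dt)))

increments = [b[i] - b[i - 1] for i in range(1, n + 1)]

# Left endpoint (Ito): the integrand value is fixed before the increment happens.
left_sum = sum(b[i - 1] * increments[i - 1] for i in range(1, n + 1))
# Endpoint average (Stratonovich-type): uses the value at the end of each step too.
mid_sum = sum(0.5 * (b[i - 1] + b[i]) * increments[i - 1] for i in range(1, n + 1))

b_t = b[-1]
print(f"left sum     {left_sum:.4f}  vs  B(t)^2/2 - t/2 = {0.5 * b_t**2 - 0.5 * t_end:.4f}")
print(f"midpoint sum {mid_sum:.4f}  vs  B(t)^2/2       = {0.5 * b_t**2:.4f}")
```

Algebraically, the left sum equals $\frac{1}{2}B(t)^2 - \frac{1}{2}\sum (\Delta B)^2$, and the endpoint-average sum telescopes to $\frac{1}{2}B(t)^2$. The two therefore differ by half the sum of squared increments, which tends to $t/2$.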

Differences vs Riemann-Stieltjes integrals

Integrator type: In Riemann-Stieltjes integrals, the integrator is a deterministic function of bounded variation. In stochastic integrals, it's a stochastic process that typically has unbounded variation (like Brownian motion).
Predictability requirement: Stochastic integrals require the integrand to be predictable, meaning its value at time $t$ depends only on information available just before $t$. Riemann-Stieltjes integrals have no such constraint.
Quadratic variation matters: The quadratic variation of the integrator directly affects the value of a stochastic integral and appears in key results like Itô's lemma. In the classical setting, quadratic variation is zero for continuous functions of bounded variation, so it never enters the picture.
Evaluation point matters: Choosing different evaluation points within subintervals yields the same limit for Riemann-Stieltjes integrals but different limits for stochastic integrals.

Stochastic integrals generalize Riemann-Stieltjes integrals to handle integrators that are too irregular for the classical theory.
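The role of the evaluation point can be seen by comparing left- and right-endpoint sums on the same grid. In this sketch, the helper `riemann_stieltjes_sums` and the test integrands are illustrative choices. With the smooth integrator $g(s) = s^2$, both sums approach $\int_0^1 s \, d(s^2) = 2/3$. For the Brownian integral $\int_0^1 B \, dB$, the gap between the two sums is $\sum (\Delta B)^2$, and this gap does not vanish as the grid is refined.

```python
import math
import random

def riemann_stieltjes_sums(f_vals, g_vals):
    """Left- and right-endpoint sums of f dg over the grid on which both were sampled."""
    left = sum(f_vals[i - 1] * (g_vals[i] - g_vals[i - 1]) for i in range(1, len(g_vals)))
    right = sum(f_vals[i] * (g_vals[i] - g_vals[i - 1]) for i in range(1, len(g_vals)))
    return left, right

n = 100_000
grid = [i / n for i in range(n + 1)]

# Smooth, bounded-variation integrator: the integral of s d(s^2) over [0, 1] equals 2/3.
smooth_left, smooth_right = riemann_stieltjes_sums(grid, [s * s for s in grid])

# Brownian integrator: the integral of B dB with the same two evaluation rules.
rng = random.Random(2)
b = [0.0]
for _ in range(n):
    b.append(b[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))
bm_left, bm_right = riemann_stieltjes_sums(b, b)

print(f"smooth:   left={smooth_left:.5f} right={smooth_right:.5f} gap={smooth_right - smooth_left:.2e}")
print(f"Brownian: left={bm_left:.5f} right={bm_right:.5f} gap={bm_right - bm_left:.5f}")
```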

Itô integrals

Itô integrals, introduced by Kiyoshi Itô, are the most widely used stochastic integrals, especially in mathematical finance. They use a non-anticipative (left-endpoint) evaluation rule, which means the integrand at each step only uses information available before the next increment of noise. This makes them natural for modeling situations where decisions are made without knowledge of future randomness.

Definition of Itô integrals

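Given an adapted process $X(t)$ satisfying $\mathbb{E}\left[\int_0^t X(s)^2 \, ds\right] < \infty$ and a Brownian motion $B(t)$, the Itô integral is:

$\int_0^t X(s) \, dB(s) = \lim_{n \to \infty} \sum_{i=1}^n X(t_{i-1}) \left(B(t_i) - B(t_{i-1})\right)$

where $0 = t_0 < t_1 < \cdots < t_n = t$ is a partition with mesh size going to zero. The limit is taken in the $L^2$ (mean-square) sense. For integrands that are continuous in mean square, these Riemann sums converge as written. For general square-integrable adapted integrands, the integral is constructed by approximating $X$ with simple (piecewise-constant, adapted) processes and passing to the limit via the Itô isometry.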

Notice that $X$ is evaluated at the left endpoint $t_{i-1}$, not at $t_i$ or the midpoint. This is what makes it "non-anticipative." A key consequence: the Itô integral $\int_0^t X(s) \, dB(s)$ is a martingale with respect to the filtration generated by $B(t)$, provided the integrability condition holds.

Itô processes

An Itô process is a stochastic process that can be written as the sum of a drift integral with respect to time and an Itô stochastic integral. Formally, $X(t)$ is an Itô process if:

$dX(t) = \mu(t, X(t)) \, dt + \sigma(t, X(t)) \, dB(t)$

or equivalently in integral form:

$X(t) = X(0) + \int_0^t \mu(s, X(s)) \, ds + \int_0^t \sigma(s, X(s)) \, dB(s)$

Here $\mu$ is the drift coefficient (the deterministic trend) and $\sigma$ is the diffusion coefficient (the intensity of random fluctuations). The classic example is geometric Brownian motion, used to model stock prices, where $\mu$ and $\sigma$ are proportional to $X(t)$ itself.

Itô's lemma

Itô's lemma is the stochastic calculus analog of the chain rule. It tells you how to compute the differential of a smooth function applied to an Itô process, and it's arguably the single most important tool in stochastic calculus.

Let $X(t)$ satisfy $dX(t) = \mu \, dt + \sigma \, dB(t)$, and let $f(t, x)$ be twice continuously differentiable. Then:

$df(t, X(t)) = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{1}{2} \sigma^2 \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x} \, dB(t)$

The term $\frac{1}{2} \sigma^2 \frac{\partial^2 f}{\partial x^2}$ is the Itô correction term. It has no analog in ordinary calculus and arises precisely because $dB(t)$ has nonzero quadratic variation ($(dB(t))^2 = dt$). This correction is what makes Itô calculus different from classical calculus. For example, applying the lemma to $f(x) = x^2$ with $X = B$ ($\mu = 0$, $\sigma = 1$) gives $d(B(t)^2) = dt + 2B(t) \, dB(t)$. This is why $\int_0^t B(s) \, dB(s) = \frac{1}{2}B(t)^2 - \frac{1}{2}t$ rather than $\frac{1}{2}B(t)^2$.

Itô's lemma is the key to deriving the Black-Scholes PDE for option pricing.
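Itô's lemma is mechanical enough to check numerically. The sketch below evaluates the $dt$ and $dB(t)$ coefficients from the lemma. It replaces the derivatives of $f$ with central finite differences and takes $\mu$, $\sigma$ as the coefficient values at the current state. The function name `ito_drift_and_diffusion`, the step `h`, and the parameter values are assumptions made for illustration. Apply it to $f(t, x) = \ln x$ for geometric Brownian motion, with $\mu(t, x) = m x$ and $\sigma(t, x) = s x$. The result should be drift $m - \frac{1}{2}s^2$ and diffusion $s$, where the $-\frac{1}{2}s^2$ is exactly the Itô correction.

```python
import math

def ito_drift_and_diffusion(f, mu, sigma, t, x, h=1e-4):
    """dt and dB coefficients of df(t, X(t)) from Ito's lemma, with the partial
    derivatives of f approximated by central finite differences of step h."""
    f_t = (f(t + h, x) - f(t - h, x)) / (2 * h)
    f_x = (f(t, x + h) - f(t, x - h)) / (2 * h)
    f_xx = (f(t, x + h) - 2 * f(t, x) + f(t, x - h)) / (h * h)
    drift = f_t + mu * f_x + 0.5 * sigma**2 * f_xx  # includes the Ito correction term
    diffusion = sigma * f_x
    return drift, diffusion

# Geometric Brownian motion: mu(t, x) = m * x, sigma(t, x) = s * x (illustrative values).
m, s, x0 = 0.05, 0.2, 1.5
drift, diffusion = ito_drift_and_diffusion(lambda t, x: math.log(x), m * x0, s * x0, 0.0, x0)
print(f"drift {drift:.6f} (m - s^2/2 = {m - 0.5 * s**2:.6f}), diffusion {diffusion:.6f} (s = {s})")
```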

Applications of Itô calculus

Mathematical finance: Modeling asset prices (geometric Brownian motion), deriving option pricing formulas (Black-Scholes), interest rate models, and portfolio optimization.
Physics: Describing Brownian motion of particles, diffusion processes, and Langevin dynamics in statistical mechanics.
Engineering: Filtering and estimation in noisy systems (the Kalman-Bucy filter, i.e. the continuous-time Kalman filter), stochastic control, and signal processing.

Stratonovich integrals

Stratonovich integrals, introduced by Ruslan Stratonovich, offer an alternative to Itô integrals. They evaluate the integrand at the midpoint of each subinterval, which changes the resulting value of the integral but preserves the classical chain rule.

Definition of Stratonovich integrals

Given a process $X(t)$ and Brownian motion $B(t)$, the Stratonovich integral is defined as:

$\int_0^t X(s) \circ dB(s) = \lim_{n \to \infty} \sum_{i=1}^n X\!\left(\frac{t_{i-1} + t_i}{2}\right) \left(B(t_i) - B(t_{i-1})\right)$

The $\circ$ notation distinguishes it from the Itô integral. Because the midpoint evaluation "peeks" at the future increment, the Stratonovich integral is not, in general, a martingale. This can be a disadvantage in probability-based arguments, but the midpoint rule is sometimes more natural in physical modeling.
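A Monte Carlo sketch shows the difference in mean. Along each simulated path, it forms the Itô (left-endpoint) and Stratonovich sums for $\int_0^1 B(s)\,dB(s)$. As a stand-in for midpoint evaluation, it uses the endpoint average (an assumption). The Itô integral is a martingale starting at $0$, so its sample mean should be near $0$. The Stratonovich version equals $\frac{1}{2}B(1)^2$, whose mean is $\frac{1}{2}$. Path counts, step counts, and the seed are arbitrary.

```python
import math
import random

def ito_and_stratonovich(n_steps, t_end, rng):
    """Left-endpoint and endpoint-average sums for the integral of B dB along one simulated path."""
    dt = t_end / n_steps
    b_prev, ito, strat = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        b_next = b_prev + rng.gauss(0.0, math.sqrt(dt))
        ito += b_prev * (b_next - b_prev)
        strat += 0.5 * (b_prev + b_next) * (b_next - b_prev)
        b_prev = b_next
    return ito, strat

rng = random.Random(3)
n_paths, n_steps, t_end = 2000, 200, 1.0
samples = [ito_and_stratonovich(n_steps, t_end, rng) for _ in range(n_paths)]
ito_mean = sum(i for i, _ in samples) / n_paths
strat_mean = sum(s for _, s in samples) / n_paths
print(f"Ito mean {ito_mean:.3f} (martingale started at 0, so expected 0)")
print(f"Stratonovich mean {strat_mean:.3f} (E[B(t)^2]/2 = {t_end / 2})")
```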

Comparison vs Itô integrals

The two integrals are related by a correction term. When $X(t)$ is a continuous semimartingale (for example, a smooth function of $B(t)$):

$\int_0^t X(s) \circ dB(s) = \int_0^t X(s) \, dB(s) + \frac{1}{2} \langle X, B \rangle_t$

where $\langle X, B \rangle_t$ is the quadratic covariation of $X$ and $B$. For the specific case where $X(s) = g(B(s))$ for some smooth function $g$, this becomes:

$\int_0^t g(B(s)) \circ dB(s) = \int_0^t g(B(s)) \, dB(s) + \frac{1}{2} \int_0^t g'(B(s)) \, ds$

The correction term is exactly what compensates for the Itô correction in Itô's lemma. Under mild regularity conditions on the integrand, you can convert between the two formulations, so the choice is largely one of convenience:

Itô is preferred when martingale properties matter (finance, probability theory).
Stratonovich is preferred when you want classical calculus rules to hold (physics, systems derived from smooth approximations of noise).
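The conversion formula for $X(s) = g(B(s))$ can be checked on one simulated path. This sketch makes three illustrative choices. It takes $g(x) = x^2$. The Stratonovich sum uses the endpoint average in place of midpoint evaluation (an assumption). The correction $\frac{1}{2}\int_0^t g'(B(s))\,ds$ is approximated by a left Riemann sum. The difference between the two stochastic sums should match the correction up to discretization error.

```python
import math
import random

rng = random.Random(4)
t_end, n = 1.0, 100_000
dt = t_end / n
b = [0.0]
for _ in range(n):
    b.append(b[-1] + rng.gauss(0.0, math.sqrt(dt)))

def g(x):
    return x * x

def g_prime(x):
    return 2.0 * x

ito = sum(g(b[i - 1]) * (b[i] - b[i - 1]) for i in range(1, n + 1))
strat = sum(0.5 * (g(b[i - 1]) + g(b[i])) * (b[i] - b[i - 1]) for i in range(1, n + 1))
# Correction term (1/2) * integral of g'(B(s)) ds, via a left Riemann sum.
correction = 0.5 * sum(g_prime(b[i - 1]) * dt for i in range(1, n + 1))

print(f"Stratonovich - Ito = {strat - ito:.4f},  correction = {correction:.4f}")
```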

Stratonovich calculus

Stratonovich calculus preserves the ordinary chain rule, which makes it more intuitive if you're coming from a classical calculus background. However, the computations can become more involved because you lose the martingale property and the clean Itô isometry.


Chain rule for Stratonovich integrals

Let $X(t)$ satisfy the Stratonovich SDE $dX(t) = \mu \, dt + \sigma \circ dB(t)$, and let $f(t, x)$ be twice continuously differentiable. The chain rule is:

$df(t, X(t)) = \frac{\partial f}{\partial t} \, dt + \frac{\partial f}{\partial x} \left(\mu \, dt + \sigma \circ dB(t)\right)$

$= \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x}\right) dt + \sigma \frac{\partial f}{\partial x} \circ dB(t)$

There is no second-order correction term. This looks exactly like the ordinary chain rule, which is the main appeal of the Stratonovich formulation.
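As a numerical check of the chain rule, take $f(x) = \sin x$ and $X = B$ (so $\mu = 0$, $\sigma = 1$). The Stratonovich chain rule then predicts $\sin B(t) = \int_0^t \cos B(s) \circ dB(s)$, with no extra term. The sketch below compares that sum with $\sin B(t)$. It also shows that the Itô sum of the same integrand only matches once the second-order term $-\frac{1}{2}\int_0^t \sin B(s)\,ds$ from Itô's lemma is added. The test function, seed, and grid are illustrative choices, and the Stratonovich sum again uses the endpoint average.

```python
import math
import random

rng = random.Random(5)
t_end, n = 1.0, 100_000
dt = t_end / n
b = [0.0]
for _ in range(n):
    b.append(b[-1] + rng.gauss(0.0, math.sqrt(dt)))

# Stratonovich sum of cos(B) o dB (endpoint average): the classical chain rule predicts sin(B(t)).
strat = sum(0.5 * (math.cos(b[i - 1]) + math.cos(b[i])) * (b[i] - b[i - 1]) for i in range(1, n + 1))
# Ito sum of cos(B) dB misses the second-order term -(1/2) * integral of sin(B(s)) ds.
ito = sum(math.cos(b[i - 1]) * (b[i] - b[i - 1]) for i in range(1, n + 1))
ito_correction = -0.5 * sum(math.sin(b[i - 1]) * dt for i in range(1, n + 1))

target = math.sin(b[-1])
print(f"sin(B(t))                = {target:.4f}")
print(f"Stratonovich sum         = {strat:.4f}")
print(f"Ito sum + Ito correction = {ito + ito_correction:.4f}")
```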

Properties of stochastic integrals

Both Itô and Stratonovich integrals share some fundamental properties, though they differ in others. These properties are the workhorses you'll use repeatedly when manipulating stochastic integrals.

Linearity of integration

Stochastic integrals are linear. For adapted processes $X(t)$ and $Y(t)$ for which both integrals exist, and constants $a$ and $b$:

$\int_0^t (aX(s) + bY(s)) \, dB(s) = a \int_0^t X(s) \, dB(s) + b \int_0^t Y(s) \, dB(s)$

This holds for both Itô and Stratonovich integrals and lets you break complex integrands into simpler pieces.

Isometry property

The Itô isometry connects the variance of a stochastic integral (whose mean is zero) to an ordinary time integral of the integrand. For an adapted process $X(t)$ with $\mathbb{E}\left[\int_0^t X(s)^2 \, ds\right] < \infty$:

$\mathbb{E}\left[\left(\int_0^t X(s) \, dB(s)\right)^2\right] = \mathbb{E}\left[\int_0^t X(s)^2 \, ds\right]$

This is extremely useful for computing second moments and proving convergence results. It says that the $L^2(\Omega)$ norm of the stochastic integral equals the $L^2(\Omega \times [0, t])$ norm of the integrand. In other words, the map $X \mapsto \int_0^t X(s) \, dB(s)$ is an isometry between these spaces.

Note: this property holds specifically for Itô integrals. Stratonovich integrals don't satisfy an isometry of this form because they aren't martingales.
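Here is a Monte Carlo sketch of the isometry with $X = B$ on $[0, 1]$. The right-hand side is $\int_0^1 \mathbb{E}[B(s)^2]\,ds = \int_0^1 s\,ds = \frac{1}{2}$. Both sides are estimated from the same simulated paths. The path count, step count, and seed are arbitrary. With $n$ steps, the discretized expectations equal $\frac{1}{2}(1 - 1/n)$, slightly below $\frac{1}{2}$.

```python
import math
import random

rng = random.Random(6)
n_paths, n_steps, t_end = 4000, 200, 1.0
dt = t_end / n_steps

lhs_samples, rhs_samples = [], []
for _ in range(n_paths):
    b, integral, energy = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        integral += b * db          # Ito sum: integrand taken at the left endpoint
        energy += b * b * dt        # Riemann sum for the integral of X(s)^2 ds
        b += db
    lhs_samples.append(integral ** 2)
    rhs_samples.append(energy)

lhs = sum(lhs_samples) / n_paths
rhs = sum(rhs_samples) / n_paths
print(f"E[(int B dB)^2] ~ {lhs:.3f},  E[int B^2 ds] ~ {rhs:.3f},  exact t^2/2 = {t_end**2 / 2}")
```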

Martingale property

If $X(t)$ is adapted and satisfies $\mathbb{E}\left[\int_0^t X(s)^2 \, ds\right] < \infty$, then the Itô integral $M(t) = \int_0^t X(s) \, dB(s)$ is a martingale:

$\mathbb{E}[M(t) \mid \mathcal{F}_s] = M(s) \quad \text{for } s \leq t$

This means the expected future value of the integral, given current information, equals its current value. The martingale property is central to mathematical finance, where it underpins the theory of fair pricing and hedging.

Stratonovich integrals are not martingales in general, which is why Itô integrals are preferred in probabilistic and financial applications.

Quadratic variation

The quadratic variation of Brownian motion is:

$\langle B \rangle_t = t$

This is a deterministic result despite $B(t)$ being random, and it's the fundamental reason stochastic calculus differs from ordinary calculus. For a stochastic integral $M(t) = \int_0^t X(s) \, dB(s)$, the quadratic variation is:

$\langle M \rangle_t = \int_0^t X(s)^2 \, ds$

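This is consistent with the Itô isometry. Since $M(t)^2 - \langle M \rangle_t$ is a martingale starting at zero, taking expectations gives $\mathbb{E}[M(t)^2] = \mathbb{E}[\langle M \rangle_t] = \mathbb{E}\left[\int_0^t X(s)^2 \, ds\right]$. Quadratic variation appears throughout stochastic calculus: in Itô's lemma, in the conversion between Itô and Stratonovich integrals, and in the definition of the integrals themselves.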
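Along a single simulated path, the sum of squared increments of $M(t) = \int_0^t B(s)\,dB(s)$ (that is, $X = B$) should track $\int_0^t B(s)^2\,ds$. This sketch accumulates both running sums on one fine grid. The step count and seed are arbitrary choices.

```python
import math
import random

rng = random.Random(7)
t_end, n = 1.0, 200_000
dt = t_end / n

b, qv_m, integral_x_sq = 0.0, 0.0, 0.0
for _ in range(n):
    db = rng.gauss(0.0, math.sqrt(dt))
    dm = b * db                  # increment of M over one step, with X = B at the left endpoint
    qv_m += dm * dm              # running sum of squared increments of M
    integral_x_sq += b * b * dt  # running Riemann sum for the integral of X(s)^2 ds
    b += db

print(f"sum of (dM)^2 = {qv_m:.4f},  integral of B^2 ds = {integral_x_sq:.4f}")
```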

Stochastic differential equations (SDEs)

SDEs combine everything above into a framework for modeling systems driven by noise. An SDE specifies how a process evolves through both a deterministic drift and a random diffusion term.

Definition of SDEs

An SDE takes the form:

$dX(t) = \mu(t, X(t)) \, dt + \sigma(t, X(t)) \, dB(t)$

This is shorthand for the integral equation:

$X(t) = X(0) + \int_0^t \mu(s, X(s)) \, ds + \int_0^t \sigma(s, X(s)) \, dB(s)$

$\mu(t, x)$ is the drift coefficient, governing the deterministic trend.
$\sigma(t, x)$ is the diffusion coefficient, governing the intensity of noise.
The stochastic integral can be interpreted in either the Itô or Stratonovich sense, yielding different equations with different solutions.

Strong vs weak solutions

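A strong solution is a process $X(t)$ defined on the same probability space as the given Brownian motion $B(t)$. It is adapted to the filtration generated by $B$ (and the initial condition) and satisfies the integral equation pathwise (almost surely). The matching uniqueness notion is pathwise uniqueness: any two solutions driven by the same Brownian motion and starting at the same point agree for all time with probability 1. Pathwise uniqueness is a separate property that does not follow from existence alone. It does hold, for example, under the Lipschitz condition below.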
A weak solution requires only that there exists some probability space carrying both a Brownian motion and a process $X(t)$ satisfying the SDE. The Brownian motion may not be the one you started with. The matching uniqueness notion is uniqueness in law: any two weak solutions with the same initial distribution have the same probability law, even though their individual paths may differ.

Strong solutions are more constructive and easier to work with numerically. Weak solutions are sufficient when you only care about distributional properties.

Existence and uniqueness of solutions

The standard sufficient conditions for a unique strong solution are:

Lipschitz continuity: There exists a constant $K > 0$ such that for all $t$, $x$, $y$: $|\mu(t, x) - \mu(t, y)| + |\sigma(t, x) - \sigma(t, y)| \leq K|x - y|$
Linear growth: There exists a constant $K > 0$ such that for all $t$, $x$: $|\mu(t, x)|^2 + |\sigma(t, x)|^2 \leq K(1 + |x|^2)$

The Lipschitz condition prevents the coefficients from changing too abruptly, which is what drives uniqueness (and the Picard iteration used to construct the solution). The linear growth condition prevents the solution from exploding to infinity in finite time, so the solution exists on the whole time interval. Together, they guarantee a unique strong solution that exists for all time.
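The role of linear growth is already visible in the deterministic special case $\sigma = 0$. Consider the drift $\mu(x) = x^2$, which violates the linear growth condition (it is only locally Lipschitz). The ODE $x' = x^2$ with $x(0) = 1$ has solution $1/(1 - t)$, which explodes at $t = 1$. By contrast, the drift $\mu(x) = x$ satisfies both conditions, and its solution $e^t$ stays finite. This sketch runs explicit Euler steps for both and reports when the iterate first exceeds a large threshold. The helper name, step size, and threshold are illustrative.

```python
def euler_blowup_time(drift, x0, dt, t_max, threshold=1e8):
    """Euler steps for dx/dt = drift(x); return the first time |x| exceeds threshold, or None."""
    x, t = x0, 0.0
    while t < t_max:
        x += drift(x) * dt
        t += dt
        if abs(x) > threshold:
            return t
    return None

dt = 1e-4
quadratic = euler_blowup_time(lambda x: x * x, 1.0, dt, t_max=2.0)  # exact solution 1/(1 - t) explodes at t = 1
linear = euler_blowup_time(lambda x: x, 1.0, dt, t_max=2.0)         # exact solution e^t stays finite
print(f"x' = x^2 from x0 = 1 exceeds 1e8 at t ~ {quadratic:.4f}")
print(f"x' = x   from x0 = 1 exceeds 1e8 before t = 2: {linear is not None}")
```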

Numerical methods for SDEs

When analytical solutions aren't available, you approximate using numerical schemes:

Euler-Maruyama method: The simplest approach. Discretize time into steps of size $\Delta t$ and iterate: $X_{n+1} = X_n + \mu(t_n, X_n) \Delta t + \sigma(t_n, X_n) \Delta B_n$, where the $\Delta B_n \sim \mathcal{N}(0, \Delta t)$ are independent increments. This achieves strong convergence of order $0.5$ (and weak convergence of order $1$).
Milstein method: Adds a correction term from the Itô-Taylor expansion: $X_{n+1} = X_n + \mu \Delta t + \sigma \Delta B_n + \frac{1}{2} \sigma \sigma' \left((\Delta B_n)^2 - \Delta t\right)$, where $\mu$, $\sigma$, and $\sigma' = \frac{\partial \sigma}{\partial x}$ are evaluated at $(t_n, X_n)$. This achieves strong convergence of order $1.0$.
Higher-order methods: Include additional terms from the Itô-Taylor expansion for better accuracy, at the cost of greater complexity.

The Euler-Maruyama method is the go-to starting point. The Milstein method is worth the extra effort when you need better pathwise accuracy.
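To compare the two schemes, this sketch applies them to geometric Brownian motion, with $\mu(x) = \mu x$ and $\sigma(x) = \sigma x$ (so $\sigma' = \sigma$). For this SDE, the exact solution $X(t) = X(0)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma B(t)\right)$ can be evaluated with the same Brownian increments. The sketch reports the mean absolute error at the final time over many paths. Parameter values, path counts, and the seed are illustrative assumptions.

```python
import math
import random

def simulate_endpoint_errors(mu, sigma, x0, t_end, n_steps, n_paths, rng):
    """Mean absolute error at t_end of Euler-Maruyama and Milstein against the exact
    GBM solution driven by the same Brownian increments."""
    dt = t_end / n_steps
    em_err = mil_err = 0.0
    for _ in range(n_paths):
        x_em = x_mil = x0
        b = 0.0
        for _ in range(n_steps):
            db = rng.gauss(0.0, math.sqrt(dt))
            # Euler-Maruyama step with mu(x) = mu * x and sigma(x) = sigma * x.
            x_em += mu * x_em * dt + sigma * x_em * db
            # Milstein step: extra term 0.5 * sigma(x) * sigma'(x) * (db^2 - dt), with sigma'(x) = sigma.
            x_mil += (mu * x_mil * dt + sigma * x_mil * db
                      + 0.5 * sigma * x_mil * sigma * (db * db - dt))
            b += db
        exact = x0 * math.exp((mu - 0.5 * sigma**2) * t_end + sigma * b)
        em_err += abs(x_em - exact)
        mil_err += abs(x_mil - exact)
    return em_err / n_paths, mil_err / n_paths

rng = random.Random(8)
errors = {}
for n_steps in (16, 64, 256):
    errors[n_steps] = simulate_endpoint_errors(0.05, 0.5, 1.0, 1.0, n_steps, 500, rng)
    em, mil = errors[n_steps]
    print(f"steps={n_steps:4d}  Euler-Maruyama error={em:.5f}  Milstein error={mil:.5f}")
```

The Milstein error should shrink roughly in proportion to $\Delta t$, and the Euler-Maruyama error roughly with $\sqrt{\Delta t}$. The gap between them should therefore widen as steps are added.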

Applications of stochastic integrals

Mathematical finance

Stochastic integrals are the language of modern quantitative finance. The Black-Scholes model assumes stock prices follow geometric Brownian motion:

$dS(t) = \mu S(t) \, dt + \sigma S(t) \, dB(t)$

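Applying Itô's lemma to $\ln S(t)$ gives $d \ln S(t) = \left(\mu - \frac{1}{2}\sigma^2\right) dt + \sigma \, dB(t)$. Hence $S(t) = S(0) \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma B(t)\right)$, and log-prices are normally distributed. The Black-Scholes PDE comes from a different application: applying Itô's lemma to the option value $V(t, S(t))$ and forming a hedged portfolio. Its solution gives closed-form prices for European options. More advanced models like the Heston stochastic volatility model use coupled SDEs to capture the empirical observation that volatility itself is random, with the variance process satisfying its own SDE.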
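The explicit solution of geometric Brownian motion, $S(t) = S(0)\exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma B(t)\right)$, can be sampled directly without time stepping. This sketch draws $S(T)$ this way and checks two consequences. First, $\mathbb{E}[S(T)] = S(0)e^{\mu T}$. Second, the mean log-return is $\left(\mu - \frac{1}{2}\sigma^2\right)T$, which is where the Itô correction shows up in practice. Parameter values, sample count, and seed are illustrative.

```python
import math
import random

def gbm_terminal_samples(s0, mu, sigma, t_end, n_samples, rng):
    """Draw S(t_end) from the exact GBM solution S0 * exp((mu - sigma^2/2) t + sigma * B(t))."""
    return [s0 * math.exp((mu - 0.5 * sigma**2) * t_end + sigma * rng.gauss(0.0, math.sqrt(t_end)))
            for _ in range(n_samples)]

rng = random.Random(9)
s0, mu, sigma, t_end = 100.0, 0.07, 0.2, 1.0  # illustrative parameters
samples = gbm_terminal_samples(s0, mu, sigma, t_end, 20_000, rng)
sample_mean = sum(samples) / len(samples)
log_returns = [math.log(s / s0) for s in samples]
log_mean = sum(log_returns) / len(log_returns)
print(f"mean S(T) {sample_mean:.2f}  vs  S0*exp(mu*T) = {s0 * math.exp(mu * t_end):.2f}")
print(f"mean ln(S(T)/S0) {log_mean:.4f}  vs  (mu - sigma^2/2)*T = {(mu - 0.5 * sigma**2) * t_end:.4f}")
```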