9.3 Itô integral and Itô's lemma

Written by the Fiveable Content Team • Last updated August 2025

Definition of Itô integral

The Itô integral extends the idea of integration to stochastic processes, specifically allowing you to integrate with respect to Brownian motion. Classical pathwise integrals (Riemann-Stieltjes or Lebesgue-Stieltjes) can't handle this because Brownian motion paths are nowhere differentiable and have infinite variation on every interval. The Itô integral solves this problem and provides the foundation for all of stochastic calculus.

Itô integral vs Riemann integral

Three key differences separate the Itô integral from the ordinary Riemann integral:

  • Random integrands and integrators. The Itô integral is built for stochastic processes with random fluctuations, not smooth deterministic functions.
  • Quadratic variation matters. Brownian motion has nonzero quadratic variation ($[W](t) = t$), which means the accumulated "roughness" of the path contributes to the integral. Riemann integration never encounters this because smooth functions have zero quadratic variation.
  • Different limiting procedure. The Riemann integral is a limit of sums where you evaluate the integrand at any point in each subinterval. The Itô integral specifically evaluates the integrand at the left endpoint of each subinterval. This left-endpoint (non-anticipating) choice is what makes the integral adapted to the filtration and gives it the martingale property. Choosing the midpoint instead leads to the Stratonovich integral, which has different properties.
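
To see that the evaluation point really changes the answer, here is a minimal Python sketch that approximates $\int_0^T W\, dW$ on one simulated path in two ways. It is only an illustration under stated assumptions: the helper names, seed, and step count are made up for this example, and the average of the two endpoint values stands in for the midpoint rule (for this integrand both lead to the Stratonovich value $W(T)^2/2$, while the left-endpoint sum approaches the Itô value $(W(T)^2 - T)/2$).

```python
import math
import random

def brownian_increments(n_steps, T, rng):
    """Return n_steps independent N(0, T/n_steps) increments of a Brownian path."""
    sd = math.sqrt(T / n_steps)
    return [rng.gauss(0.0, sd) for _ in range(n_steps)]

def left_and_average_sums(increments):
    """Approximate the integral of W dW using two evaluation rules."""
    w = 0.0          # W(t_i), starting from W(0) = 0
    left = 0.0       # left-endpoint (Ito-style) sum
    average = 0.0    # endpoint-average (Stratonovich-style) sum
    for dw in increments:
        left += w * dw
        average += (w + 0.5 * dw) * dw
        w += dw
    return left, average, w

T = 1.0
rng = random.Random(0)  # fixed seed so the run is reproducible
ito_sum, strat_sum, w_T = left_and_average_sums(brownian_increments(10_000, T, rng))

print("left-endpoint sum:", ito_sum, "vs (W_T^2 - T)/2 =", (w_T**2 - T) / 2)
print("endpoint-average :", strat_sum, "vs W_T^2/2 =", w_T**2 / 2)
```

The two sums differ by half the sum of squared increments, which is close to $T/2$. That gap is the quadratic variation showing up directly in the integral.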

Itô integral for simple processes

The construction starts with simple (step) processes, which are piecewise constant and adapted to the filtration. A simple process has the form:

$$X(t) = \sum_{i=0}^{n-1} X_{t_i} \mathbf{1}_{(t_i, t_{i+1}]}(t)$$

where each $X_{t_i}$ is $\mathcal{F}_{t_i}$-measurable (known at time $t_i$). For such a process, the Itô integral with respect to Brownian motion $W(t)$ is defined as:

$$\int_0^T X(t)\, dW(t) = \sum_{i=0}^{n-1} X_{t_i} \big(W(t_{i+1}) - W(t_i)\big)$$

Notice that $X_{t_i}$ is evaluated at the left endpoint of each interval. Because the increments $W(t_{i+1}) - W(t_i)$ are independent of $\mathcal{F}_{t_i}$, this construction guarantees two things:

  • The integral has zero mean: $\mathbb{E}\!\left[\int_0^T X(t)\, dW(t)\right] = 0$.
  • The integral is a martingale with respect to the Brownian filtration.

Itô isometry

The Itô isometry is the key tool for extending the integral beyond simple processes. For a square-integrable, adapted process $X(t)$:

$$\mathbb{E}\!\left[\left(\int_0^T X(t)\, dW(t)\right)^2\right] = \mathbb{E}\!\left[\int_0^T X(t)^2\, dt\right]$$

This says the $L^2$-norm of the stochastic integral equals the $L^2$-norm of the integrand computed with ordinary (Lebesgue) integration. The isometry is what lets you control the "size" of the stochastic integral using deterministic-style estimates, and it's the engine behind the extension to general integrands.
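
A quick Monte Carlo check can make the isometry concrete. The sketch below takes the integrand $X(t) = W(t)$ (an illustrative choice, as are the function name, seed, and sample sizes) and estimates both sides with left-endpoint sums. In the continuum both sides equal $\int_0^T t\, dt = T^2/2$, so the two estimates should roughly agree and the sample mean of the integral should be near zero.

```python
import math
import random

def ito_sum_and_energy(n_steps, T, rng):
    """Simulate one Brownian path and return
    (sum of W(t_i) * dW_i, sum of W(t_i)^2 * dt) using left endpoints."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    w = 0.0
    integral = 0.0
    energy = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        integral += w * dw      # integrand W(t_i) uses only past information
        energy += w * w * dt
        w += dw
    return integral, energy

T, n_steps, n_paths = 1.0, 50, 4000
rng = random.Random(1)
samples = [ito_sum_and_energy(n_steps, T, rng) for _ in range(n_paths)]

mean_integral = sum(i for i, _ in samples) / n_paths   # estimates E[int W dW]
lhs = sum(i * i for i, _ in samples) / n_paths         # estimates E[(int W dW)^2]
rhs = sum(e for _, e in samples) / n_paths             # estimates E[int W^2 dt]
print(mean_integral, lhs, rhs)
```

With only 50 steps the discrete sums target $T^2(1 - 1/n)/2$ rather than exactly $T^2/2$, and the left-hand estimate is noisier than the right because it averages a heavier-tailed quantity.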

Extension to square-integrable processes

With the Itô isometry in hand, you extend the integral to all adapted, square-integrable processes, meaning processes satisfying:

$$\mathbb{E}\!\left[\int_0^T X(t)^2\, dt\right] < \infty$$

The extension works in three steps:

  1. Approximate. Given a general square-integrable adapted process $X(t)$, construct a sequence of simple processes $X_n(t)$ that converge to $X(t)$ in the $L^2$ sense.
  2. Integrate the approximations. Each $\int_0^T X_n(t)\, dW(t)$ is already well-defined by the simple-process definition.
  3. Take the limit. The Itô isometry guarantees that these integrals form a Cauchy sequence in $L^2$, so they converge to a unique limit. That limit is defined to be $\int_0^T X(t)\, dW(t)$.

The result is independent of which approximating sequence you choose, so the integral is well-defined.
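
The three steps can be watched numerically. The sketch below is an illustration, not a proof: it takes $X(t) = W(t)$, uses as simple processes the path frozen at the left grid point of coarser and coarser partitions, and measures the root-mean-square distance to the known limit $\int_0^T W\, dW = (W(T)^2 - T)/2$. The function name, grid sizes, and seed are assumptions made for this example.

```python
import math
import random

def rms_error_by_resolution(strides, n_fine, T, n_paths, rng):
    """RMS distance between coarse left-endpoint sums of W dW and (W_T^2 - T)/2."""
    sd = math.sqrt(T / n_fine)
    sq_err = {s: 0.0 for s in strides}
    for _ in range(n_paths):
        # Brownian path on the fine grid, with W(0) = 0
        path = [0.0]
        for _ in range(n_fine):
            path.append(path[-1] + rng.gauss(0.0, sd))
        limit = (path[-1] ** 2 - T) / 2
        for s in strides:
            coarse = path[::s]   # keep every s-th grid point
            approx = sum(a * (b - a) for a, b in zip(coarse, coarse[1:]))
            sq_err[s] += (approx - limit) ** 2
    # key the result by the number of steps in each coarse partition
    return {n_fine // s: math.sqrt(total / n_paths) for s, total in sq_err.items()}

rng = random.Random(2)
rms = rms_error_by_resolution(strides=(100, 10, 1), n_fine=1000, T=1.0, n_paths=300, rng=rng)
for n_steps in sorted(rms):
    print(n_steps, "steps -> RMS error", rms[n_steps])
```

The error shrinks roughly like $T/\sqrt{2n}$ as the number of steps $n$ grows, which is the $L^2$ convergence that step 3 relies on.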

Properties of Itô integral

The Itô integral inherits several structural properties from its construction. These properties are not just theoretical niceties; you'll use them constantly when manipulating SDEs and applying Itô's lemma.

Linearity

For adapted, square-integrable processes $X(t)$ and $Y(t)$ and constants $a, b$:

$$\int_0^T \big(aX(t) + bY(t)\big)\, dW(t) = a\int_0^T X(t)\, dW(t) + b\int_0^T Y(t)\, dW(t)$$

This follows directly from the linearity of the sum in the simple-process definition and carries through the $L^2$ extension.

Continuity

The Itô integral is continuous in its integrand in the $L^2$ sense. If $X_n(t) \to X(t)$ in $L^2$, meaning:

$$\mathbb{E}\!\left[\int_0^T (X_n(t) - X(t))^2\, dt\right] \to 0 \quad \text{as } n \to \infty$$

then:

$$\int_0^T X_n(t)\, dW(t) \to \int_0^T X(t)\, dW(t) \quad \text{in } L^2$$

This is a direct consequence of the Itô isometry. It also turns out that the sample paths $t \mapsto \int_0^t X(s)\, dW(s)$ are almost surely continuous, which is important for the theory of SDEs.

Martingale property

If $X(t)$ is adapted and square-integrable, then the process:

$$M(t) = \int_0^t X(s)\, dW(s)$$

is a martingale with respect to the Brownian filtration $\{\mathcal{F}_t\}$. Concretely, this means:

  • $\mathbb{E}[|M(t)|] < \infty$ for all $t \geq 0$
  • $\mathbb{E}[M(t) \mid \mathcal{F}_s] = M(s)$ for all $s \leq t$

The second condition says that the best prediction of the future value of the integral, given all information up to time $s$, is just its current value. This is a direct consequence of evaluating the integrand at the left endpoint. Since $M(0) = 0$, the martingale property also implies $\mathbb{E}[M(t)] = 0$ for all $t$, which you'll use repeatedly.

Itô processes

An Itô process combines a deterministic drift with a stochastic diffusion driven by Brownian motion. These processes are the central objects you'll work with in stochastic calculus.

Definition and examples

An Itô process $X(t)$ satisfies a stochastic differential equation (SDE) of the form:

$$dX(t) = \mu(t, X(t))\, dt + \sigma(t, X(t))\, dW(t)$$

The term $\mu(t, X(t))$ is the drift coefficient (the deterministic trend), and $\sigma(t, X(t))$ is the diffusion coefficient (the intensity of random fluctuations). In integral form, this reads:

$$X(t) = X(0) + \int_0^t \mu(s, X(s))\, ds + \int_0^t \sigma(s, X(s))\, dW(s)$$

where the first integral is an ordinary Lebesgue integral and the second is an Itô integral.

Three important examples:

  • Geometric Brownian motion (GBM): $dX(t) = \mu X(t)\, dt + \sigma X(t)\, dW(t)$. Both drift and diffusion scale with the current value, so the process stays positive whenever $X(0) > 0$. This is the standard model for stock prices in the Black-Scholes framework.
  • Ornstein-Uhlenbeck (OU) process: $dX(t) = \theta(\mu - X(t))\, dt + \sigma\, dW(t)$. The drift pulls $X(t)$ back toward the long-run mean $\mu$ at rate $\theta$, making it mean-reverting. Used for interest rates and physical systems with a restoring force.
  • Cox-Ingersoll-Ross (CIR) process: $dX(t) = \theta(\mu - X(t))\, dt + \sigma\sqrt{X(t)}\, dW(t)$. Like OU but with diffusion proportional to $\sqrt{X(t)}$, which keeps the process nonnegative (and strictly positive when the Feller condition $2\theta\mu \geq \sigma^2$ holds). Widely used for interest rate modeling.

Quadratic variation of Itô processes

The quadratic variation of an Itô process captures the cumulative "roughness" contributed by the diffusion term. For $dX(t) = \mu\, dt + \sigma\, dW(t)$:

$$[X](t) = \int_0^t \sigma(s, X(s))^2\, ds$$

Only the diffusion coefficient contributes. The drift term has zero quadratic variation because it behaves like a smooth function. This is why the extra second-derivative term appears in Itô's lemma: the nonzero quadratic variation of Brownian motion ($[W](t) = t$, or informally $dW \cdot dW = dt$) generates a correction that doesn't exist in ordinary calculus.

The informal multiplication rules that follow from this are:

  • $dW \cdot dW = dt$
  • $dW \cdot dt = 0$
  • $dt \cdot dt = 0$

You'll use these constantly when applying Itô's lemma.
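
These rules are statements about sums over a fine partition, and a short simulation shows the orders of magnitude involved. The sketch below (seed, horizon, and step count are arbitrary choices for illustration) adds up the three kinds of products along one discretized Brownian path.

```python
import math
import random

T, n_steps = 2.0, 10_000
dt = T / n_steps
rng = random.Random(3)
increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

sum_dw_dw = sum(dw * dw for dw in increments)   # should be close to T
sum_dw_dt = sum(dw * dt for dw in increments)   # should be close to 0
sum_dt_dt = n_steps * dt * dt                   # equals T^2 / n_steps, close to 0

print(sum_dw_dw, sum_dw_dt, sum_dt_dt)
```

Only the first sum survives as the partition is refined. The other two shrink like $1/n$, which is why they are dropped when expanding $(dX)^2$.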

Stochastic differential equations

SDEs of the form $dX(t) = \mu(t, X(t))\, dt + \sigma(t, X(t))\, dW(t)$ describe the evolution of systems with randomness. Existence and uniqueness of solutions typically require:

  • Lipschitz continuity of $\mu$ and $\sigma$ in $x$ (uniformly in $t$)
  • Linear growth bounds on $\mu$ and $\sigma$

Under these conditions, a unique strong solution exists for any initial condition with finite second moment. Solving SDEs analytically is only possible in special cases (GBM, OU, etc.). In general, you rely on Itô's lemma for transformations or numerical methods like the Euler-Maruyama scheme.
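
As a rough illustration of the Euler-Maruyama scheme, the sketch below replaces $dt$ and $dW$ by a small time step and a Gaussian increment, then applies it to the OU process from the examples above. The function names and parameter values are hypothetical choices for this example. The sample mean of $X(T)$ is compared with the known OU mean $\mu + (X(0) - \mu)e^{-\theta T}$.

```python
import math
import random

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = mu(t, X) dt + sigma(t, X) dW and return X(T)."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    x, t = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)                 # Brownian increment over [t, t + dt]
        x += mu(t, x) * dt + sigma(t, x) * dw   # coefficients frozen at the left endpoint
        t += dt
    return x

# Ornstein-Uhlenbeck coefficients (illustrative parameter values)
theta, long_run_mean, vol = 2.0, 1.0, 0.5

def ou_drift(t, x):
    return theta * (long_run_mean - x)

def ou_diffusion(t, x):
    return vol

rng = random.Random(4)
T, x0 = 1.0, 0.0
endpoints = [euler_maruyama(ou_drift, ou_diffusion, x0, T, 200, rng) for _ in range(2000)]
sample_mean = sum(endpoints) / len(endpoints)
exact_mean = long_run_mean + (x0 - long_run_mean) * math.exp(-theta * T)
print(sample_mean, exact_mean)
```

Freezing the coefficients at the start of each step mirrors the left-endpoint rule in the Itô integral. The agreement is only approximate: it carries both Monte Carlo noise and a small time-discretization bias.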

Itô's lemma

Itô's lemma is the stochastic version of the chain rule. It tells you how to compute the differential of a smooth function applied to an Itô process. The critical difference from ordinary calculus is an extra second-derivative term arising from the quadratic variation of Brownian motion.

Statement of Itô's lemma

Let $X(t)$ satisfy $dX(t) = \mu(t, X(t))\, dt + \sigma(t, X(t))\, dW(t)$, and let $f(t, x)$ be twice continuously differentiable in $x$ and once in $t$ (i.e., $f \in C^{1,2}$). Then $Y(t) = f(t, X(t))$ is also an Itô process with:

$$dY = \frac{\partial f}{\partial t}\, dt + \frac{\partial f}{\partial x}\, dX + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\, (dX)^2$$

Expanding $(dX)^2$ using the multiplication rules ($dW \cdot dW = dt$, all other products vanish), this becomes:

$$dY = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x}\, dW$$

where all partial derivatives are evaluated at $(t, X(t))$.

Comparison with deterministic chain rule

In ordinary calculus, for a smooth function $x(t)$:

$$\frac{d}{dt}f(t, x(t)) = \frac{\partial f}{\partial t} + \frac{dx}{dt}\frac{\partial f}{\partial x}$$

Itô's lemma has the same two terms, plus the correction:

$$\frac{1}{2}\sigma(t, X(t))^2 \frac{\partial^2 f}{\partial x^2}(t, X(t))$$

This term exists because Brownian motion has nonzero quadratic variation. In the deterministic case, $\sigma = 0$ and the correction vanishes, recovering the ordinary chain rule. The correction is sometimes called the Itô correction and is the single most important thing to remember about stochastic calculus.

Applying Itô's lemma: step-by-step

Here's how to apply Itô's lemma in practice:

  1. Identify the Itô process. Write down $dX = \mu\, dt + \sigma\, dW$ and read off $\mu$ and $\sigma$.
  2. Identify the function. Determine $f(t, x)$ such that your target process is $Y(t) = f(t, X(t))$.
  3. Compute partial derivatives. Calculate $\frac{\partial f}{\partial t}$, $\frac{\partial f}{\partial x}$, and $\frac{\partial^2 f}{\partial x^2}$.
  4. Substitute into the formula. Plug everything into: $dY = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x}\, dW$
  5. Simplify. Collect terms and identify the new drift and diffusion coefficients of $Y(t)$.

Classic example: Solve GBM $dS = \mu S\, dt + \sigma S\, dW$ by applying Itô's lemma to $f(x) = \ln x$. Here the drift is $\mu S$ and the diffusion is $\sigma S$. You get $f' = 1/x$, $f'' = -1/x^2$, so the new drift is $\mu S \cdot \frac{1}{S} + \frac{1}{2}\sigma^2 S^2 \cdot \left(-\frac{1}{S^2}\right) = \mu - \frac{\sigma^2}{2}$ and the new diffusion is $\sigma S \cdot \frac{1}{S} = \sigma$: $d(\ln S) = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\, dW$

This shows $\ln S(t)$ is a Brownian motion with drift, giving the explicit solution $S(t) = S(0)\exp\!\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)\right]$.
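
One way to convince yourself that the $-\sigma^2/2$ term belongs there is to simulate the GBM equation directly and compare the result with the closed form driven by the same noise. The sketch below does this with an Euler-Maruyama discretization. The function name and parameter values are assumptions for illustration, and the "naive" value is what the ordinary chain rule would predict without the Itô correction.

```python
import math
import random

def gbm_euler_vs_exact(s0, mu, sigma, T, n_steps, rng):
    """Drive an Euler-Maruyama path and the closed-form solution with the same noise."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    s, w = s0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        s += mu * s * dt + sigma * s * dw   # dS = mu S dt + sigma S dW
        w += dw                             # accumulate W(T)
    with_correction = s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    without_correction = s0 * math.exp(mu * T + sigma * w)
    return s, with_correction, without_correction

rng = random.Random(5)
euler, exact, naive = gbm_euler_vs_exact(1.0, 0.1, 0.4, 1.0, 10_000, rng)
print("Euler-Maruyama     :", euler)
print("with -sigma^2/2    :", exact)
print("ordinary chain rule:", naive)
```

The simulated endpoint tracks the corrected formula closely, while the uncorrected one is off by the factor $e^{\sigma^2 T/2}$.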

Applications of Itô's lemma

  • Black-Scholes equation. Apply Itô's lemma to an option price $V(t, S)$ where $S$ follows GBM. Combined with a hedging argument, this yields the Black-Scholes PDE.
  • Deriving moment equations. Apply Itô's lemma to $f(x) = x^2$ (or higher powers) to derive ODEs for the moments of an SDE solution.
  • Change of variables for SDEs. Transform a complicated SDE into a simpler one. The GBM-to-log transform above is the prototypical example.
  • Physics and engineering. Analyze dynamics of particles subject to thermal noise (Langevin equations), study diffusion processes, and derive Fokker-Planck equations for probability densities.
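
As a small worked case of the moment-equation idea, take GBM and $f(x) = x^2$. Itô's lemma gives $d(S^2) = (2\mu + \sigma^2) S^2\, dt + 2\sigma S^2\, dW$, and taking expectations (assuming the stochastic integral is a true martingale, so its mean is zero) yields the ODE $m_2'(t) = (2\mu + \sigma^2)\, m_2(t)$ for $m_2(t) = \mathbb{E}[S(t)^2]$. The sketch below compares the ODE solution with a Monte Carlo estimate drawn from the closed-form GBM solution. Parameter values, seed, and names are illustrative assumptions.

```python
import math
import random

s0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0
rng = random.Random(6)

def sample_gbm_endpoint():
    """Draw S(T) from the closed-form GBM solution."""
    w_T = rng.gauss(0.0, math.sqrt(T))
    return s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)

n_samples = 20_000
mc_second_moment = sum(sample_gbm_endpoint() ** 2 for _ in range(n_samples)) / n_samples
ode_second_moment = s0**2 * math.exp((2 * mu + sigma**2) * T)   # solution of the moment ODE
print(mc_second_moment, ode_second_moment)
```

The extra $\sigma^2$ in the growth rate comes entirely from the Itô correction; the ordinary chain rule would give $2\mu$ alone.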

Stochastic calculus

Beyond the Itô integral and Itô's lemma, stochastic calculus includes several additional tools that round out the theory and enable more advanced applications.

Stochastic integration by parts

For two Itô processes $X(t)$ and $Y(t)$, the product rule takes the form:

$$d(XY) = X\, dY + Y\, dX + dX \cdot dY$$

In integral form:

$$X(t)Y(t) = X(0)Y(0) + \int_0^t X(s)\, dY(s) + \int_0^t Y(s)\, dX(s) + [X, Y](t)$$

When both processes are driven by the same Brownian motion, the quadratic covariation $[X, Y](t)$ is given by:

$$[X, Y](t) = \int_0^t \sigma_X(s)\, \sigma_Y(s)\, ds$$

where $\sigma_X$ and $\sigma_Y$ are the diffusion coefficients of $X$ and $Y$. The extra $[X, Y]$ term is the product-rule analogue of the Itô correction in Itô's lemma. In deterministic calculus, $d(xy) = x\, dy + y\, dx$; here you get an additional cross-variation term.
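
The product rule has an exact discrete counterpart: over each step, $X_{i+1}Y_{i+1} - X_iY_i = X_i\,\Delta Y + Y_i\,\Delta X + \Delta X\,\Delta Y$. The sketch below checks this on one simulated path for the illustrative pair $X = W$ and $Y = 2W + t$ (so $\sigma_X = 1$, $\sigma_Y = 2$, and the cross-variation should be close to $2T$). The choice of processes, seed, and step count are assumptions for this example.

```python
import math
import random

T, n_steps = 1.0, 10_000
dt = T / n_steps
rng = random.Random(7)

x, y = 0.0, 0.0                  # X = W,  Y = 2W + t  (illustrative choice)
int_x_dy = int_y_dx = cross = 0.0
for _ in range(n_steps):
    dw = rng.gauss(0.0, math.sqrt(dt))
    dx = dw                      # sigma_X = 1, no drift
    dy = 2.0 * dw + dt           # sigma_Y = 2, drift 1
    int_x_dy += x * dy           # left-endpoint sums, as in the Ito integral
    int_y_dx += y * dx
    cross += dx * dy             # discrete version of [X, Y]
    x += dx
    y += dy

print("X(T) Y(T)             :", x * y)
print("sum of the three terms:", int_x_dy + int_y_dx + cross)
print("cross term vs 2T      :", cross, 2.0 * T)
```

For smooth paths the cross sum would vanish as the step shrinks. Here it converges to a nonzero limit, which is exactly the $[X, Y](t)$ term in the formula.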

Integration with respect to martingales

The Itô integral generalizes beyond Brownian motion to integration with respect to continuous square-integrable martingales. A martingale $M(t)$ satisfies:

  • $\mathbb{E}[|M(t)|] < \infty$ for all $t$
  • $\mathbb{E}[M(t) \mid \mathcal{F}_s] = M(s)$ for $s \leq t$

The construction mirrors the Brownian case, with the quadratic variation $[M](t)$ replacing $t$. The isometry becomes:

$$\mathbb{E}\!\left[\left(\int_0^T X\, dM\right)^2\right] = \mathbb{E}\!\left[\int_0^T X^2\, d[M]\right]$$

A central result here is the martingale representation theorem: every square-integrable martingale adapted to the Brownian filtration can be written as a stochastic integral with respect to $W(t)$. This is the theoretical backbone of hedging arguments in finance.

Girsanov's theorem

Girsanov's theorem lets you change the probability measure so that an Itô process with drift loses its drift and becomes a Brownian motion (and hence a martingale) under the new measure.

Suppose $X(t) = W(t) + \int_0^t \theta(s)\, ds$ under the original measure $\mathbb{P}$, where $\theta(t)$ is an adapted process satisfying the Novikov condition:

$$\mathbb{E}\!\left[\exp\!\left(\frac{1}{2}\int_0^T \theta(s)^2\, ds\right)\right] < \infty$$

Define the Radon-Nikodým derivative (the terminal value of the associated exponential martingale):

$$Z(T) = \exp\!\left(-\int_0^T \theta(s)\, dW(s) - \frac{1}{2}\int_0^T \theta(s)^2\, ds\right)$$

Then under the new measure $\mathbb{Q}$ defined by $d\mathbb{Q} = Z(T)\, d\mathbb{P}$, the process $X(t)$ is a standard Brownian motion on $[0, T]$.

In finance, Girsanov's theorem is how you move from the "real-world" measure $\mathbb{P}$ to the risk-neutral measure $\mathbb{Q}$, under which discounted asset prices are martingales. This simplifies derivative pricing to computing expected values under $\mathbb{Q}$.
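
For a constant $\theta$ the change of measure can be checked by reweighting samples, since $\mathbb{E}^{\mathbb{Q}}[g(X(T))] = \mathbb{E}^{\mathbb{P}}[Z(T)\, g(X(T))]$. The sketch below is a minimal illustration under that constant-drift assumption. The value of $\theta$, the seed, and the sample size are arbitrary. Under $\mathbb{P}$ the endpoint $X(T)$ has mean $\theta T$; after weighting by $Z(T)$ it should look like a centered Gaussian with variance $T$.

```python
import math
import random

theta, T, n_samples = 0.5, 1.0, 20_000
rng = random.Random(8)

sum_z = sum_x = sum_zx = sum_zx2 = 0.0
for _ in range(n_samples):
    w_T = rng.gauss(0.0, math.sqrt(T))                 # W(T) under P
    x_T = w_T + theta * T                              # X(T) = W(T) + theta * T
    z_T = math.exp(-theta * w_T - 0.5 * theta**2 * T)  # Z(T) for constant theta
    sum_z += z_T
    sum_x += x_T
    sum_zx += z_T * x_T
    sum_zx2 += z_T * x_T**2

mean_z = sum_z / n_samples        # E_P[Z(T)], should be near 1
mean_x_p = sum_x / n_samples      # E_P[X(T)], should be near theta * T
mean_x_q = sum_zx / n_samples     # E_Q[X(T)] = E_P[Z X], should be near 0
var_x_q = sum_zx2 / n_samples     # E_Q[X(T)^2], should be near T
print(mean_z, mean_x_p, mean_x_q, var_x_q)
```

The weights average to one, as a probability density must, and they remove the drift without changing the variance.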

Applications of Itô calculus

Financial mathematics and the Black-Scholes model

The Black-Scholes model assumes the stock price follows GBM:

$$dS = \mu S\, dt + \sigma S\, dW$$

To price a European option with payoff $V(T, S(T))$:

  1. Apply Itô's lemma to $V(t, S)$ to get $dV$.
  2. Construct a self-financing portfolio that hedges the option (the "delta hedge").
  3. Since the hedged portfolio is riskless, it must earn the risk-free rate $r$, yielding the Black-Scholes PDE: $\frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV$
  4. Solve with the appropriate boundary condition to get the Black-Scholes formula.

Alternatively, use Girsanov's theorem to switch to the risk-neutral measure and compute $V(t, S) = e^{-r(T-t)}\mathbb{E}^{\mathbb{Q}}[V(T, S(T)) \mid \mathcal{F}_t]$.
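
The two routes can be compared numerically. The sketch below assumes a European call with payoff $\max(S(T) - K, 0)$ (the text leaves the payoff general) and prices it at $t = 0$ in two ways: with the closed-form Black-Scholes call formula, and by averaging discounted payoffs with $S(T)$ sampled from GBM whose drift is $r$, as it is under $\mathbb{Q}$. Function names, parameter values, and the seed are illustrative assumptions.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s0, strike, r, sigma, T):
    """Black-Scholes price of a European call at time 0."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - strike * math.exp(-r * T) * norm_cdf(d2)

def mc_call(s0, strike, r, sigma, T, n_samples, rng):
    """Discounted average payoff with S(T) sampled under the risk-neutral measure."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
        total += max(s_T - strike, 0.0)
    return math.exp(-r * T) * total / n_samples

params = dict(s0=100.0, strike=100.0, r=0.05, sigma=0.2, T=1.0)
closed_form = bs_call(**params)
monte_carlo = mc_call(**params, n_samples=50_000, rng=random.Random(9))
print(closed_form, monte_carlo)
```

Note that the real-world drift $\mu$ never appears in either calculation. That is the practical content of pricing under $\mathbb{Q}$.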

Stochastic differential equations in physics

SDEs appear throughout physics whenever thermal or quantum noise is present:

  • Langevin equation: $m\, dv = -\gamma v\, dt + \sigma\, dW$, describing a particle subject to friction and random thermal kicks. Itô calculus lets you compute velocity distributions and diffusion coefficients.
  • Fokker-Planck equation: Given an SDE for $X(t)$, Itô's lemma (applied to test functions) yields a PDE for the probability density $p(t, x)$, connecting the stochastic and PDE perspectives.
  • Stochastic Schrödinger equations: Model quantum systems coupled to noisy environments, with Itô calculus providing the rigorous framework for their analysis.

Filtering theory and stochastic control

Filtering is the problem of estimating a hidden state from noisy observations. The observation process is typically modeled as an Itô process, and the optimal filter satisfies a stochastic PDE:

  • For linear Gaussian systems, this reduces to the Kalman-Bucy filter: a linear SDE for the conditional mean, driven by the observations, together with a deterministic Riccati ODE for the conditional covariance.
  • For nonlinear systems, the Kushner-Stratonovich equation gives the evolution of the conditional distribution.

Stochastic control seeks optimal decisions in the presence of randomness. The value function satisfies the Hamilton-Jacobi-Bellman (HJB) equation, a nonlinear PDE derived using Itô's lemma and the dynamic programming principle. Applications range from portfolio optimization in finance to robotic path planning and resource management.