Fiveable

📡Advanced Signal Processing Unit 7 Review


7.2 Stochastic processes

Written by the Fiveable Content Team • Last updated August 2025

Stochastic processes are mathematical models for random phenomena that evolve over time or space. In signal processing, they provide the foundation for analyzing signals corrupted by noise, designing optimal filters, and building models of communication channels. This topic covers the core definitions and properties, the major types of stochastic processes, how they're transformed and estimated, and some advanced extensions you'll encounter in research and practice.

Stochastic process fundamentals

A stochastic process is a collection of random variables indexed by time (or space), and it's the natural framework for any signal that contains randomness. Before diving in, a quick review of the probabilistic building blocks is useful.

Random variables and probability

A random variable maps the outcome of a random experiment to a numerical value. Probability theory gives us the tools to describe how those values are distributed:

  • Continuous random variables are characterized by a probability density function (PDF) f_X(x) and a cumulative distribution function (CDF) F_X(x) = P(X \le x).
  • Discrete random variables are described by a probability mass function (PMF) p_X(x) = P(X = x).

These distributions carry over directly when we move from a single random variable to an entire process.

Stochastic process definition

A stochastic process \{X(t), t \in T\} assigns a random variable to every point in an index set T (usually time). For a fixed t, X(t) is a random variable. For a fixed outcome \omega of the underlying experiment, X(t, \omega) traces out a deterministic function of t.

The state space is the set of all values the random variables can take (continuous for voltage signals, discrete for digital symbol sequences, etc.). The index set T can be continuous (t \in \mathbb{R}) or discrete (t \in \mathbb{Z}), giving rise to continuous-time or discrete-time processes.

Sample paths and realizations

A sample path (or realization) is what you get when you fix one outcome of the random experiment and observe the resulting function of time. Think of recording a single stretch of thermal noise on an oscilloscope: that trace is one realization.

Different realizations of the same process can look very different from each other. Statistical signal processing works by characterizing the ensemble of all possible realizations through moments, correlation functions, and distributions.

Types of stochastic processes

Choosing the right model depends on which structural properties the data exhibits. The distinctions below show up constantly in system design and analysis.

Stationary vs. non-stationary processes

  • A stationary process has statistical properties that don't change over time. Its mean, variance, and autocorrelation stay constant regardless of when you observe the process.
  • A non-stationary process has time-varying statistics. A signal with a drifting mean (trend) or time-dependent variance (heteroscedasticity) is non-stationary.

Many signal processing algorithms assume stationarity because it dramatically simplifies analysis. When that assumption fails, you typically segment the signal into short windows where approximate stationarity holds.

Ergodic processes

An ergodic process is a stationary process where time averages computed from a single, sufficiently long realization converge to the true ensemble averages. This is a powerful property: it means you can estimate \mu, R(\tau), and S(f) from one observed waveform instead of needing many independent realizations.

Not every stationary process is ergodic. A simple counterexample: let X(t) = A for all t, where A is a random variable drawn once. The process is stationary, but every time average from a single realization just returns A, not \mathbb{E}[A].

Gaussian processes

A Gaussian process is one where every finite collection of samples [X(t_1), X(t_2), \ldots, X(t_n)] is jointly Gaussian. Two functions completely specify a Gaussian process:

  • The mean function \mu(t) = \mathbb{E}[X(t)]
  • The covariance function C(t_1, t_2) = \text{Cov}(X(t_1), X(t_2))

Gaussian processes appear everywhere in signal processing because the central limit theorem drives many aggregated noise sources toward Gaussian statistics, and because linear operations on Gaussian processes produce Gaussian processes.
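To make the definition concrete, here is a minimal sketch of drawing one realization of a zero-mean Gaussian process on a small time grid. The squared-exponential covariance and the hand-rolled Cholesky factorization are illustrative choices, not prescribed by the text:

```python
import math
import random

random.seed(1)

def sq_exp_cov(t1, t2, length=1.0):
    """Squared-exponential covariance C(t1, t2) = exp(-(t1 - t2)^2 / (2 l^2))."""
    return math.exp(-((t1 - t2) ** 2) / (2.0 * length ** 2))

def cholesky(M):
    """Lower-triangular L with L L^T = M, for symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

# Evaluate the covariance on a small time grid; the tiny diagonal "jitter"
# keeps the matrix numerically positive definite.
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
C = [[sq_exp_cov(a, b) for b in ts] for a in ts]
for i in range(len(ts)):
    C[i][i] += 1e-9

# One realization: X = mu + L z with mu(t) = 0 and z standard normal.
L = cholesky(C)
z = [random.gauss(0, 1) for _ in ts]
sample = [sum(L[i][k] * z[k] for k in range(len(ts))) for i in range(len(ts))]
print(sample)
```

Because the covariance decays smoothly with |t1 - t2|, nearby samples in the realization are strongly correlated.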

Markov processes

A Markov process satisfies the memoryless (Markov) property: the conditional distribution of the future state, given the entire past and present, depends only on the present state.

P(X(t_{n+1}) | X(t_n), X(t_{n-1}), \ldots, X(t_1)) = P(X(t_{n+1}) | X(t_n))

This "one-step memory" structure makes Markov processes computationally tractable. They're used to model state transitions in communication channels, hidden Markov models for speech recognition, and many queueing systems.
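A minimal sketch of the idea, using a hypothetical two-state "good/bad" channel with made-up transition probabilities: the next state is drawn from the current state alone, and the long-run fraction of time in each state converges to the chain's stationary distribution:

```python
import random

random.seed(2)

# Two-state Markov channel model; the transition probabilities below are
# illustrative values, not taken from any specific standard.
P = {"good": {"good": 0.95, "bad": 0.05},
     "bad":  {"good": 0.40, "bad": 0.60}}

def step(state):
    """Draw the next state -- it depends only on the current state."""
    return "good" if random.random() < P[state]["good"] else "bad"

state = "good"
counts = {"good": 0, "bad": 0}
for _ in range(200_000):
    state = step(state)
    counts[state] += 1

frac_good = counts["good"] / sum(counts.values())
# Balance equation: pi_good * 0.05 = pi_bad * 0.40, so pi_good = 8/9 ~ 0.889.
print(frac_good)
```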

Poisson processes

A Poisson process models the occurrence of discrete events in continuous time. Its defining properties are:

  • Events in non-overlapping intervals are independent.
  • The number of events in an interval of length \Delta t follows a Poisson distribution with parameter \lambda \Delta t, where \lambda is the rate.
  • Inter-arrival times are exponentially distributed with mean 1/\lambda.

Poisson processes model photon arrivals in optical detectors, packet arrivals in network traffic, and failure events in reliability engineering.
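The exponential inter-arrival property gives a direct way to simulate a Poisson process: accumulate exponential gaps until the horizon is passed. A sketch with illustrative rate and horizon:

```python
import random

random.seed(3)

def poisson_arrivals(rate, horizon):
    """Event times in [0, horizon], built from exponential inter-arrivals."""
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate)   # mean inter-arrival time 1/rate
        if t > horizon:
            return times
        times.append(t)

rate, horizon, trials = 2.0, 10.0, 5000
counts = [len(poisson_arrivals(rate, horizon)) for _ in range(trials)]
mean_count = sum(counts) / trials
print(mean_count)   # close to rate * horizon = 20
```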

Stochastic process properties

The statistical descriptors below are the primary tools for characterizing a process and designing systems that operate on it.

Mean and autocorrelation functions

The mean function gives the expected value at each time instant:

\mu(t) = \mathbb{E}[X(t)]

The autocorrelation function captures the second-order temporal structure:

R_X(t_1, t_2) = \mathbb{E}[X(t_1) X(t_2)]

For a zero-mean process, the autocorrelation equals the covariance. A rapidly decaying autocorrelation indicates a process whose values become uncorrelated quickly, while a slowly decaying one signals long-range dependence.

Power spectral density

The power spectral density (PSD) describes how the process's power is distributed across frequency. For a WSS process with autocorrelation R_X(\tau), the PSD is its Fourier transform (the Wiener-Khinchin theorem):

S_X(f) = \int_{-\infty}^{\infty} R_X(\tau) e^{-j2\pi f\tau} \, d\tau

The inverse relationship also holds: R_X(\tau) = \int_{-\infty}^{\infty} S_X(f) e^{j2\pi f\tau} \, df. White noise, for instance, has a flat PSD S(f) = N_0/2, meaning equal power at all frequencies.
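The white-noise claim can be checked empirically: discrete-time white noise with unit variance has R(0) = 1 and R(k) ≈ 0 for k ≠ 0, which is the time-domain counterpart of a flat PSD. A quick sketch:

```python
import random

random.seed(4)

def autocorr(x, lag):
    """Biased sample estimate of R(lag) = E[x[n] * x[n + lag]]."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

# Discrete-time white Gaussian noise with variance 1.
x = [random.gauss(0, 1) for _ in range(100_000)]

print(autocorr(x, 0))   # ~1.0 (the variance)
print(autocorr(x, 1))   # ~0.0
print(autocorr(x, 5))   # ~0.0
```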

Wide-sense stationarity

A process is wide-sense stationary (WSS) if two conditions hold:

  1. The mean is constant: \mu(t) = \mu for all t.
  2. The autocorrelation depends only on the time lag: R_X(t_1, t_2) = R_X(t_2 - t_1) = R_X(\tau).

WSS is the assumption behind most spectral analysis and Wiener filtering. It's weaker than strict-sense stationarity but sufficient for the vast majority of signal processing tasks.

Strict-sense stationarity

A process is strict-sense stationary (SSS) if its joint PDF is invariant to arbitrary time shifts:

f_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n) = f_{X(t_1+\Delta), \ldots, X(t_n+\Delta)}(x_1, \ldots, x_n)

for all n, all time instants, and all shifts \Delta. SSS implies WSS (assuming finite second moments), but not the other way around. The one exception: for Gaussian processes, WSS does imply SSS, because the Gaussian distribution is fully determined by its first two moments.

Stochastic process transformations

Signals are routinely passed through systems that modify their statistical properties. Understanding how transformations affect a stochastic process is essential for filter design and system analysis.

Linear transformations

Linear operations on stochastic processes include scaling, shifting, and addition:

  • Y(t) = aX(t) + b: the mean becomes a\mu_X(t) + b and the autocovariance scales by a^2.
  • Z(t) = X(t) + Y(t): requires knowledge of the cross-correlation between X and Y.

A key property: linear transformations of Gaussian processes remain Gaussian. This is why Gaussian models are so convenient in linear systems theory.

Nonlinear transformations

Nonlinear operations change the distribution of the process in ways that are generally harder to analyze:

  • Y(t) = e^{X(t)}: if X(t) is Gaussian, Y(t) is log-normal.
  • Z(t) = \max(X(t), c): clipping (here of the lower tail), as in limiters and amplifier saturation.

After a nonlinear transformation, the output is typically no longer Gaussian even if the input was. You often need to compute the output PDF directly or resort to simulation.

Filtering of stochastic processes

Passing a WSS process X(t) through an LTI system with impulse response h(t) produces an output:

Y(t) = \int_{-\infty}^{\infty} h(\tau) X(t - \tau) \, d\tau

The output is also WSS, and its PSD relates to the input PSD by:

S_Y(f) = |H(f)|^2 S_X(f)

where H(f) is the system's frequency response. This relationship is the basis for Wiener filter design, noise shaping, and matched filtering.
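One directly testable consequence of S_Y(f) = |H(f)|^2 S_X(f): for white input with variance \sigma^2 through an FIR filter with taps h_k, integrating the output PSD over frequency gives an output variance of \sigma^2 \sum_k h_k^2. A sketch with an arbitrary (illustrative) FIR filter:

```python
import random

random.seed(5)

# White noise (variance 1) through an FIR filter h. For white input,
# output variance = sum(h_k^2) * input variance -- the time-domain
# consequence of S_Y(f) = |H(f)|^2 S_X(f) integrated over frequency.
h = [0.5, 0.3, 0.2]
x = [random.gauss(0, 1) for _ in range(200_000)]
y = [sum(h[k] * x[n - k] for k in range(len(h)))
     for n in range(len(h), len(x))]

var_y = sum(v * v for v in y) / len(y)
print(var_y)   # close to 0.5**2 + 0.3**2 + 0.2**2 = 0.38
```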


Stochastic process estimation

Estimating unknown parameters or signals from noisy observations is at the heart of statistical signal processing. The three major frameworks differ in what they optimize and what prior information they use.

Minimum mean square error estimation

MMSE estimation minimizes the expected squared error \mathbb{E}[(X - \hat{X})^2]. The solution is the conditional mean:

\hat{X}_{\text{MMSE}} = \mathbb{E}[X | Y]

For jointly Gaussian X and Y, the MMSE estimator is linear, which leads directly to the Wiener filter. When the joint distribution is non-Gaussian, the conditional mean can be nonlinear and harder to compute.
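For a scalar jointly Gaussian case, the linear MMSE estimator has a closed form that a simulation can verify. The sketch below (illustrative variances) observes Y = X + N and compares the MMSE gain against using Y directly:

```python
import random

random.seed(6)

# Scalar Gaussian example: observe Y = X + N with X ~ N(0, sx2), N ~ N(0, sn2).
# The MMSE estimator is linear: E[X | Y] = sx2 / (sx2 + sn2) * Y.
sx2, sn2 = 4.0, 1.0
gain = sx2 / (sx2 + sn2)

trials = 100_000
err_mmse = err_naive = 0.0
for _ in range(trials):
    X = random.gauss(0, sx2 ** 0.5)
    Y = X + random.gauss(0, sn2 ** 0.5)
    err_mmse += (X - gain * Y) ** 2
    err_naive += (X - Y) ** 2     # naive estimate: just use Y

print(err_mmse / trials)   # close to sx2*sn2/(sx2+sn2) = 0.8
print(err_naive / trials)  # close to sn2 = 1.0
```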

Maximum likelihood estimation

MLE treats the unknown as a deterministic (non-random) parameter \theta and finds the value that maximizes the likelihood:

\hat{\theta}_{\text{ML}} = \arg\max_{\theta} \, p(Y | \theta)

MLE doesn't require a prior distribution. It's consistent (converges to the true value as sample size grows) and asymptotically efficient (achieves the Cramér-Rao lower bound for large samples). In practice, you often maximize the log-likelihood instead, since it's more numerically stable.
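As a concrete instance, the MLE for the rate of exponentially distributed data has a closed form: maximizing the log-likelihood n \log \lambda - \lambda \sum_i x_i gives \hat{\lambda} = n / \sum_i x_i, i.e., one over the sample mean. A quick check:

```python
import random

random.seed(7)

# Exponential data with a known true rate; the MLE is 1 / (sample mean).
true_rate = 3.0
data = [random.expovariate(true_rate) for _ in range(50_000)]

lam_hat = len(data) / sum(data)
print(lam_hat)   # close to 3.0, consistent with MLE's consistency property
```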

Bayesian estimation

Bayesian estimation models the unknown parameter as a random variable with a prior distribution p(\theta). Bayes' theorem gives the posterior:

p(\theta | Y) = \frac{p(Y | \theta) \, p(\theta)}{p(Y)}

From the posterior, you can derive different point estimates:

  • MAP estimator: \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \, p(\theta | Y)
  • MMSE estimator: \hat{\theta}_{\text{MMSE}} = \mathbb{E}[\theta | Y]

The Bayesian framework naturally handles uncertainty and allows you to incorporate domain knowledge through the prior. As data increases, the influence of the prior diminishes and Bayesian estimates converge toward MLE.
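A standard worked example is the conjugate Beta-Bernoulli model (the numbers below are illustrative): with a Beta(a, b) prior on a success probability \theta and k successes in n trials, the posterior is Beta(a + k, b + n - k), and the MAP, MMSE, and ML point estimates can all be read off in closed form:

```python
# Conjugate Beta-Bernoulli example with illustrative numbers.
a, b = 2.0, 2.0      # Beta(a, b) prior on theta
k, n = 30, 40        # observed: k successes in n Bernoulli trials

# Posterior is Beta(a + k, b + n - k).
a_post, b_post = a + k, b + (n - k)

theta_mmse = a_post / (a_post + b_post)             # posterior mean
theta_map = (a_post - 1) / (a_post + b_post - 2)    # posterior mode
theta_mle = k / n                                   # no-prior estimate

print(theta_mmse, theta_map, theta_mle)
```

Note how the prior pulls both Bayesian estimates slightly toward 0.5 relative to the MLE, and that this pull would shrink as n grows.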

Stochastic process applications

Signal detection in noise

The classic detection problem asks: is a signal present or not? You observe Y(t) = s(t) + N(t) (signal plus noise) under hypothesis H_1, or Y(t) = N(t) under H_0. The likelihood ratio test compares \frac{p(Y|H_1)}{p(Y|H_0)} to a threshold set by the desired false alarm rate. Stochastic process models for the noise determine the test's structure and performance (probability of detection vs. probability of false alarm).

Channel modeling and characterization

Wireless channels introduce fading, multipath, and additive noise, all of which are modeled as stochastic processes. The Rayleigh and Rician fading models describe the envelope statistics, while the Doppler spectrum characterizes the rate of channel variation. Accurate stochastic channel models are essential for designing equalizers, coding schemes, and MIMO systems.

Queuing theory and network analysis

Poisson arrivals and exponential service times form the basis of the M/M/1 queue and its extensions. Markov chain models describe state transitions in network protocols. These stochastic models let you predict quantities like average delay, queue length, and throughput under varying traffic loads.

Financial modeling and forecasting

Geometric Brownian motion (GBM) underlies the Black-Scholes option pricing model, where the log-returns of a stock price follow a diffusion process. Stochastic volatility models (e.g., Heston model) extend GBM by allowing the variance itself to be a stochastic process. These models are used for derivative pricing, risk management, and portfolio optimization.

Stochastic process simulation

Simulation lets you generate synthetic data, validate analytical results, and study systems too complex for closed-form analysis.

Monte Carlo methods

Monte Carlo simulation generates many independent realizations of a stochastic process by repeatedly sampling from the underlying distributions. You then compute sample statistics (means, variances, probabilities) from the ensemble of realizations. The accuracy improves as 1/\sqrt{N}, where N is the number of trials, so halving the error requires four times as many samples.
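The 1/\sqrt{N} behavior is easy to observe. The sketch below estimates the tail probability P(Z > 1) for a standard normal Z at increasing trial counts and compares against the exact value from the complementary error function:

```python
import math
import random

random.seed(8)

def mc_prob(n):
    """Monte Carlo estimate of P(Z > 1) for standard normal Z, from n trials."""
    hits = sum(1 for _ in range(n) if random.gauss(0, 1) > 1.0)
    return hits / n

# Exact value: P(Z > 1) = 0.5 * erfc(1 / sqrt(2)) ~ 0.1587
true_p = 0.5 * math.erfc(1.0 / math.sqrt(2.0))

for n in (100, 10_000, 1_000_000):
    print(n, mc_prob(n))   # error shrinks roughly like 1/sqrt(n)
```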

Generating random variables

Simulation starts with a source of uniform random numbers, which are then transformed to the desired distribution:

  1. Inverse transform sampling: compute X = F^{-1}(U) where U \sim \text{Uniform}(0,1) and F^{-1} is the inverse CDF.
  2. Box-Muller transform: converts two independent uniform samples into two independent Gaussian samples.
  3. Acceptance-rejection sampling: generates samples from complex distributions by proposing from a simpler one and accepting with an appropriate probability.

Modern implementations rely on pseudo-random number generators (e.g., Mersenne Twister) that produce deterministic but statistically high-quality sequences.
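Methods 1 and 2 are short enough to sketch directly (acceptance-rejection is omitted). For the exponential distribution, the inverse CDF is F^{-1}(u) = -\ln(1-u)/\lambda:

```python
import math
import random

random.seed(9)

def exp_inverse_transform(rate):
    """Inverse transform sampling for Exp(rate): X = -ln(1 - U) / rate."""
    u = random.random()            # U in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / rate

def box_muller():
    """Two independent uniforms -> two independent standard Gaussians."""
    u1 = 1.0 - random.random()     # shift to (0, 1] so log(u1) is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

n = 100_000
exp_mean = sum(exp_inverse_transform(2.0) for _ in range(n)) / n
gauss = [v for _ in range(n // 2) for v in box_muller()]
gauss_var = sum(g * g for g in gauss) / len(gauss)

print(exp_mean)    # close to 1/rate = 0.5
print(gauss_var)   # close to 1.0
```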

Simulating stochastic differential equations

A stochastic differential equation (SDE) of the form dX(t) = a(X,t)\,dt + b(X,t)\,dW(t) (where W(t) is a Wiener process) is simulated by discretizing time and stepping forward:

  1. Euler-Maruyama scheme: X_{n+1} = X_n + a(X_n, t_n)\Delta t + b(X_n, t_n)\sqrt{\Delta t}\,Z_n, where Z_n \sim \mathcal{N}(0,1). This has strong order of convergence 0.5.
  2. Milstein scheme: adds a correction term involving \partial b / \partial x, achieving strong order 1.0 and better accuracy for the same step size.

Choosing the step size \Delta t involves a tradeoff between accuracy and computational cost.
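A sketch of the Euler-Maruyama scheme applied to geometric Brownian motion dX = \mu X\,dt + \sigma X\,dW, for which the exact mean \mathbb{E}[X(T)] = X_0 e^{\mu T} gives a built-in sanity check (parameters are illustrative):

```python
import math
import random

random.seed(10)

def euler_maruyama_gbm(x0, mu, sigma, T, steps):
    """One GBM path for dX = mu*X dt + sigma*X dW via Euler-Maruyama."""
    dt, x = T / steps, x0
    for _ in range(steps):
        z = random.gauss(0, 1)
        x += mu * x * dt + sigma * x * math.sqrt(dt) * z
    return x

x0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0
paths = [euler_maruyama_gbm(x0, mu, sigma, T, 100) for _ in range(20_000)]
mean_xT = sum(paths) / len(paths)

print(mean_xT)   # close to x0 * exp(mu * T) ~ 1.0513
```

Halving \Delta t (doubling `steps`) reduces the discretization error but doubles the work per path, which is the accuracy/cost tradeoff noted above.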

Advanced topics in stochastic processes

Martingales and stopping times

A martingale is a process where the best prediction of the future value, given all past and present information, is the current value:

\mathbb{E}[X(t_{n+1}) | X(t_1), \ldots, X(t_n)] = X(t_n)

Martingales model fair games and appear in the pricing of financial derivatives (the discounted price of a derivative is a martingale under the risk-neutral measure).

A stopping time \tau is a random time whose occurrence can be determined from information available up to that time (you don't need to look into the future to know it happened). The optional stopping theorem connects martingales and stopping times, giving conditions under which \mathbb{E}[X(\tau)] = \mathbb{E}[X(0)].

Stochastic calculus and Itô's lemma

Standard calculus rules break down for processes with non-differentiable sample paths (like Brownian motion). Itô's lemma provides the correct chain rule. For a twice-differentiable function g(X(t), t), where dX = a\,dt + b\,dW:

dg = \left(\frac{\partial g}{\partial t} + a\frac{\partial g}{\partial x} + \frac{1}{2}b^2 \frac{\partial^2 g}{\partial x^2}\right)dt + b\frac{\partial g}{\partial x}\,dW

The extra \frac{1}{2}b^2 \frac{\partial^2 g}{\partial x^2} term is the Itô correction, arising because (dW)^2 = dt in the mean-square sense. This result is the foundation for deriving the Black-Scholes equation and for analyzing nonlinear transformations of diffusion processes.
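A standard worked example: applying Itô's lemma to g(x) = \ln x for geometric Brownian motion dX = \mu X\,dt + \sigma X\,dW, so a = \mu x, b = \sigma x, and the partials are g_t = 0, g_x = 1/x, g_{xx} = -1/x^2:

```latex
d(\ln X)
  = \left( 0 + \mu X \cdot \frac{1}{X}
      + \frac{1}{2}\,\sigma^2 X^2 \cdot \left(-\frac{1}{X^2}\right) \right) dt
    + \sigma X \cdot \frac{1}{X}\, dW
  = \left( \mu - \tfrac{1}{2}\sigma^2 \right) dt + \sigma\, dW
```

So the log of a GBM is Brownian motion with drift \mu - \sigma^2/2, which is exactly why GBM log-returns are Gaussian in the Black-Scholes setting mentioned earlier.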

Lévy processes and jump processes

Lévy processes generalize Brownian motion by allowing independent, stationary increments that can include jumps. Every Lévy process decomposes (via the Lévy-Khintchine representation) into a drift, a Brownian component, and a pure-jump component.

  • Compound Poisson process: jumps arrive at a Poisson rate, with random jump sizes drawn from some distribution.
  • Jump-diffusion models (e.g., Merton's model): combine continuous Brownian diffusion with discrete Poisson jumps.

These processes capture heavy tails and sudden discontinuities that pure diffusion models miss, making them valuable for modeling financial crashes, network anomalies, and impulsive noise.

Fractional Brownian motion

Fractional Brownian motion (fBm) B_H(t) is a Gaussian process parameterized by the Hurst parameter H \in (0,1):

  • H = 0.5: standard Brownian motion (independent increments).
  • H > 0.5: positively correlated increments (long-range dependence, persistent behavior).
  • H < 0.5: negatively correlated increments (anti-persistent, rougher paths).

fBm is self-similar: B_H(at) has the same distribution as a^H B_H(t). It's used to model network traffic with long-range dependence, turbulence, and certain financial time series. Because fBm is neither Markov (for H \neq 0.5) nor a semimartingale, standard Itô calculus doesn't apply directly, and specialized tools like fractional calculus or the Mandelbrot-Van Ness integral representation are needed for analysis and simulation.
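The covariance function of (standard, zero-mean) fBm, \text{Cov}(B_H(s), B_H(t)) = \tfrac{1}{2}(s^{2H} + t^{2H} - |t-s|^{2H}), makes both the self-similarity claim and the increment-correlation claims above checkable in a few lines, with no simulation needed (the particular values of H, a, s, t below are arbitrary):

```python
def fbm_cov(s, t, H):
    """Covariance of standard fBm: 0.5 * (s^{2H} + t^{2H} - |t-s|^{2H})."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

H, a = 0.7, 3.0
s, t = 1.0, 2.5

# Self-similarity: Cov(B_H(a s), B_H(a t)) = a^{2H} * Cov(B_H(s), B_H(t)).
lhs = fbm_cov(a * s, a * t, H)
rhs = a ** (2 * H) * fbm_cov(s, t, H)
print(lhs, rhs)   # equal up to floating-point rounding

def incr_cov(H):
    """Covariance of the increments B_H(1)-B_H(0) and B_H(2)-B_H(1)."""
    return fbm_cov(1, 2, H) - fbm_cov(1, 1, H)

# Sign of the increment correlation tracks H, matching the bullets above.
print(incr_cov(0.7) > 0, incr_cov(0.5) == 0, incr_cov(0.3) < 0)
```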