🔀Stochastic Processes Unit 3 Review


3.1 Definition and classification of stochastic processes

Written by the Fiveable Content Team • Last updated August 2025

Definition of stochastic processes

A stochastic process is a collection of random variables indexed by time (or sometimes space), used to model systems that evolve unpredictably. These processes give you a mathematical framework for analyzing random phenomena in fields like physics, biology, finance, and engineering.

Random variables over time

A stochastic process $\{X_t\}$ assigns a random variable to each point in time $t$. Each random variable represents the system's state at that moment, and the set of all possible states is called the state space.

  • Stock prices over time have a continuous state space (the price can be any positive real number).
  • The number of customers in a queue has a discrete state space (0, 1, 2, 3, ...).

Probabilistic models

Stochastic processes assign probabilities to different outcomes or trajectories of the system.

  • Probability distributions describe the likelihood of the system being in a particular state at a given time.
  • Joint probability distributions capture dependencies between random variables at different time points. For instance, tomorrow's stock price isn't independent of today's.
  • Transition probabilities specify the likelihood of moving from one state to another.

Dynamical systems with randomness

Stochastic processes incorporate randomness into how a system evolves. That randomness can come from inherent uncertainty, external noise, or unpredictable events. The future state depends on both the current state and random factors.

For continuous-time systems, stochastic differential equations (SDEs) are the standard modeling tool. These generalize ordinary differential equations by adding a noise term, typically driven by Brownian motion.

Classification by state space

The state space is the set of all possible values the random variables can take. Its structure determines which mathematical tools you'll use.

Discrete state space

Here, the random variables take on a countable number of distinct values (often integers).

  • Number of defective items on a production line
  • Number of customers waiting in a queue
  • Discrete-time Markov chains are the most common process type in this category

Continuous state space

The random variables can take any value within a continuous range (typically subsets of $\mathbb{R}$).

  • Stock prices, temperature measurements, particle positions in a fluid
  • Brownian motion and diffusion processes are classic examples

Finite vs infinite state space

  • A finite state space has a fixed number of possible states. Finite-state Markov chains are easier to analyze and always admit a stationary distribution, which is unique under mild conditions (irreducibility).
  • An infinite state space has unboundedly many possible states. Random walks on $\mathbb{Z}$ and Poisson processes are examples. These generally require more advanced techniques, such as generating functions or transform methods.

Classification by time index

Stochastic processes are also classified by whether time is treated as discrete or continuous.

Discrete-time processes

The time index takes integer values: $t = 0, 1, 2, \ldots$. The system's state is observed at fixed intervals.

  • Daily stock closing prices, monthly sales figures, annual population counts
  • Common models: discrete-time Markov chains, autoregressive (AR) models

Continuous-time processes

The time index takes real values: $t \in [0, \infty)$. The system can change state at any instant.

  • Particle motion, chemical reaction kinetics, high-frequency financial data
  • Common models: Poisson processes, Brownian motion, solutions to SDEs

Classification by memory

How much of the process's history influences its future behavior is another key distinction.

Memoryless processes

The future state depends only on the present, not on how the system got there. The probability distribution of the next state is independent of the process's history.

  • Poisson processes (memoryless inter-arrival times)
  • Continuous-time Markov chains with exponential holding times

The exponential distribution is the only continuous distribution with the memoryless property: $P(T > t + s \mid T > t) = P(T > s)$.
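The memoryless identity is easy to verify numerically. The sketch below draws exponential samples and compares the conditional survival probability with the unconditional one; the rate and thresholds are arbitrary illustrative choices, not values from the text.

```python
import random

# Empirical check of P(T > t + s | T > t) = P(T > s) for an
# exponential distribution (rate, t, s are illustrative choices).
random.seed(0)
rate, t, s = 1.5, 0.4, 0.7
samples = [random.expovariate(rate) for _ in range(200_000)]

# Conditional probability, estimated among samples that survived past t
p_cond = sum(x > t + s for x in samples) / sum(x > t for x in samples)
# Unconditional probability, estimated over all samples
p_uncond = sum(x > s for x in samples) / len(samples)

print(abs(p_cond - p_uncond) < 0.01)
```

Running the same check with a non-exponential distribution (say, uniform) makes the two estimates diverge, which is exactly the point: only the exponential has this property among continuous distributions.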

Processes with memory

The future state depends on both the current state and some or all past states.

  • Autoregressive (AR) models use a fixed number of past values to predict the next.
  • Moving average (MA) models depend on past random shocks.
  • Hidden Markov models have an underlying Markov structure, but the observed process itself is not Markov.

Markov vs non-Markov

Markov processes satisfy the Markov property: the future depends on the past only through the present state.

$$P(X_{t+1} = x \mid X_t, X_{t-1}, \ldots, X_0) = P(X_{t+1} = x \mid X_t)$$

This dramatically simplifies analysis because you only need to track the current state, not the full history.

Non-Markov processes have more complex dependence structures. Examples include long-memory processes and fractional Brownian motion, where correlations decay slowly and the entire history matters.

Examples of stochastic processes

Several fundamental stochastic processes serve as building blocks for more complex models.


Random walks

A random walk models an object taking random steps in some space. In the simplest version on $\mathbb{Z}$, at each time step the position increases or decreases by 1 with equal probability:

$$S_n = S_0 + \sum_{i=1}^{n} Z_i, \quad Z_i \in \{-1, +1\}$$

Random walks appear in physics (as discrete approximations to Brownian motion), finance (simple models of price changes), and biology (animal foraging). Variations include biased random walks (unequal step probabilities), correlated random walks, and random walks with absorbing or reflecting barriers.
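The definition above translates directly into code. This is a minimal sketch of the symmetric walk on $\mathbb{Z}$; the function name and parameters are illustrative.

```python
import random

def simple_random_walk(n_steps, s0=0, seed=0):
    """S_n = S_0 + sum of i.i.d. +/-1 steps (symmetric walk on Z)."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice([-1, 1]))
    return path

path = simple_random_walk(1_000)
# Every step moves the walker by exactly one unit, as the model requires.
print(all(abs(b - a) == 1 for a, b in zip(path, path[1:])))
```

Swapping `rng.choice([-1, 1])` for an unequal-probability draw gives a biased walk; clamping the position at a boundary gives reflecting or absorbing barriers.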

Poisson processes

A Poisson process counts the number of events occurring over time, where events happen independently at a constant average rate $\lambda$.

  • The number of events in any interval of length $t$ follows a Poisson distribution: $P(N(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$
  • Events in disjoint time intervals are independent.
  • Inter-arrival times are exponentially distributed with mean $1/\lambda$.

Applications: customer arrivals at a service counter, radioactive decay, website traffic.
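Because inter-arrival times are exponential, a Poisson process can be simulated by summing exponential gaps. The sketch below does this and checks the two consequences listed above: the empirical rate matches $\lambda$ and the mean gap matches $1/\lambda$ (the rate and horizon are illustrative choices).

```python
import random

def poisson_event_times(rate, horizon, seed=0):
    """Event times of a Poisson process on [0, horizon], built by
    summing exponential inter-arrival times with mean 1/rate."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

events = poisson_event_times(rate=2.0, horizon=10_000.0)
gaps = [events[0]] + [b - a for a, b in zip(events, events[1:])]

# Empirical event rate ~ lambda, mean inter-arrival time ~ 1/lambda
print(abs(len(events) / 10_000 - 2.0) < 0.1)
print(abs(sum(gaps) / len(gaps) - 0.5) < 0.02)
```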

Brownian motion

Brownian motion (the Wiener process) $\{W_t\}_{t \geq 0}$ is a continuous-time, continuous-state process with three defining properties:

  1. $W_0 = 0$

  2. Increments are independent: $W_t - W_s$ is independent of $\{W_u : u \leq s\}$ for $s < t$

  3. Increments are normally distributed: $W_t - W_s \sim N(0, t - s)$

Its sample paths are continuous but extremely jagged (nowhere differentiable, almost surely). Brownian motion is central to stochastic calculus and financial modeling. Geometric Brownian motion, where $S_t = S_0 \exp((\mu - \sigma^2/2)t + \sigma W_t)$, is the process underlying the Black-Scholes option pricing model.
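The three defining properties suggest the standard discretization: start at 0 and cumulate independent Gaussian increments with variance equal to the time step. This sketch (function name and step sizes are illustrative) verifies that the simulated increments have mean $\approx 0$ and variance $\approx dt$.

```python
import math
import random

def brownian_path(n_steps, dt, seed=0):
    """Discretized Brownian motion: W_0 = 0, then independent
    Gaussian increments with mean 0 and variance dt."""
    rng = random.Random(seed)
    w, path = 0.0, [0.0]
    for _ in range(n_steps):
        w += rng.gauss(0.0, math.sqrt(dt))
        path.append(w)
    return path

path = brownian_path(n_steps=100_000, dt=0.01)
increments = [b - a for a, b in zip(path, path[1:])]
m = sum(increments) / len(increments)
v = sum((x - m) ** 2 for x in increments) / len(increments)

# Increments should have mean ~0 and variance ~dt = 0.01
print(abs(m) < 0.002, abs(v - 0.01) < 0.001)
```

Exponentiating a drifted version of this path gives a geometric Brownian motion sample, the building block of Black-Scholes simulations.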

Markov chains

Markov chains are discrete-time processes satisfying the Markov property. The state space can be discrete or continuous (in the continuous case, transitions are described by a Markov kernel rather than a transition matrix).

  • Transition probabilities $p_{ij} = P(X_{t+1} = j \mid X_t = i)$ govern movement between states and can be organized into a transition matrix.
  • Applications: weather modeling, PageRank algorithm, MCMC methods in Bayesian statistics, queueing systems.
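A minimal sketch of a two-state chain makes the transition-matrix machinery concrete. The states and matrix below are illustrative (a toy sunny/rainy model, not from the text); the long-run occupancy frequencies should approach the stationary distribution $\pi$ solving $\pi = \pi P$.

```python
import random

# Illustrative two-state chain (0 = sunny, 1 = rainy); row i of P
# gives the distribution of the next state given current state i.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def occupancy(P, n_steps, seed=0):
    """Fraction of time spent in each state over one long run."""
    rng = random.Random(seed)
    state, visits = 0, [0, 0]
    for _ in range(n_steps):
        visits[state] += 1
        state = 0 if rng.random() < P[state][0] else 1
    return [v / n_steps for v in visits]

freq = occupancy(P, 200_000)
# For this matrix, pi = (5/6, 1/6) solves pi = pi P.
print(abs(freq[0] - 5 / 6) < 0.01)
```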

Stationarity of stochastic processes

Stationarity describes whether the statistical properties of a process stay constant over time. This matters because many analytical tools and theorems only apply to stationary processes.

Strict vs wide-sense stationarity

Strict (strong) stationarity means the entire joint distribution is invariant under time shifts:

$$P(X_{t_1}, X_{t_2}, \ldots, X_{t_n}) = P(X_{t_1+\tau}, X_{t_2+\tau}, \ldots, X_{t_n+\tau})$$

for any time points $t_1, t_2, \ldots, t_n$ and any shift $\tau$.

Wide-sense (weak) stationarity is a less demanding condition requiring only:

  • Constant mean: $E[X_t] = \mu$ for all $t$
  • Covariance depends only on the lag: $\text{Cov}(X_t, X_{t+\tau}) = R(\tau)$

Strict stationarity implies wide-sense stationarity (provided second moments exist), but the converse is not generally true. Gaussian processes are a notable special case: for them, wide-sense stationarity does imply strict stationarity, because a Gaussian distribution is fully determined by its mean and covariance.
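The two wide-sense conditions can be checked empirically on a standard stationary example, the AR(1) process $X_{t+1} = \phi X_t + \varepsilon_t$ (the values of $\phi$ and $\sigma$ below are illustrative). In steady state its mean is 0, its variance is $\sigma^2/(1-\phi^2)$, and its lag-1 autocovariance is $\phi$ times the variance, independent of $t$.

```python
import random

# Simulate a (near-)stationary AR(1) process and check the wide-sense
# stationarity conditions against theory (phi, sigma illustrative).
phi, sigma = 0.6, 1.0
rng = random.Random(0)
x, xs = 0.0, []
for _ in range(500_000):
    x = phi * x + rng.gauss(0.0, sigma)
    xs.append(x)
xs = xs[1_000:]  # discard burn-in so the remaining sample is stationary

mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
cov1 = sum((a - mean) * (b - mean)
           for a, b in zip(xs, xs[1:])) / (len(xs) - 1)

# Theory: E[X_t] = 0, Var = sigma^2/(1 - phi^2) = 1.5625,
# R(1) = phi * Var = 0.9375 -- all independent of t.
print(abs(mean) < 0.02, abs(var - 1.5625) < 0.05, abs(cov1 - 0.9375) < 0.05)
```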

Stationary increments

A process has stationary increments if the distribution of $X_{t+\tau} - X_t$ depends only on the lag $\tau$, not on the starting time $t$. Both Brownian motion and Poisson processes have stationary increments.

A process with stationary increments is not necessarily stationary itself. For example, Brownian motion has stationary increments, but $\text{Var}(W_t) = t$ grows with time, so it's not stationary.

Ergodicity

Ergodicity is a stronger property than stationarity. An ergodic process is one where the time average of a single, sufficiently long realization converges to the ensemble average (the expected value across all possible realizations).

This is practically important: it means you can estimate statistical properties like the mean and variance from just one long observation of the process, rather than needing many independent realizations. Many stationary Markov chains (specifically, irreducible and aperiodic ones) are ergodic.
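The defining equality can be seen in a small simulation. For an irreducible, aperiodic two-state chain (the transition matrix below is an illustrative choice), the time average of the state along one long path should match the ensemble average over many independent paths observed at a fixed late time.

```python
import random

# Illustrative irreducible, aperiodic two-state chain.
P = [[0.8, 0.2],
     [0.3, 0.7]]

def step(state, rng):
    return 0 if rng.random() < P[state][0] else 1

# Time average of the state along a single long realization.
rng = random.Random(1)
state, total, n = 0, 0, 200_000
for _ in range(n):
    state = step(state, rng)
    total += state
time_avg = total / n

# Ensemble average: many independent realizations at a fixed late time.
ens_total, n_paths = 0, 20_000
for seed in range(n_paths):
    r, s = random.Random(1_000 + seed), 0
    for _ in range(50):
        s = step(s, r)
    ens_total += s
ensemble_avg = ens_total / n_paths

# Both converge to the stationary mean (here pi_1 = 0.4).
print(abs(time_avg - ensemble_avg) < 0.02)
```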

Sample paths of stochastic processes

A sample path (also called a realization or trajectory) is a single instance of the process over time. Think of it as one possible "story" the random system could tell.

Realizations and trajectories

Each realization represents one possible outcome. Formally, a sample path is a function of time for a fixed outcome $\omega$: $t \mapsto X(\omega, t)$. Different realizations can look very different from each other, depending on the underlying probability distribution.

Continuity of sample paths

  • Continuous sample paths have no jumps. Examples: Brownian motion, Ornstein-Uhlenbeck process.
  • Discontinuous sample paths exhibit jumps. Examples: Poisson processes (which jump by 1 at each event), compound Poisson processes (which can jump by random amounts).

Whether paths are continuous or not affects which mathematical tools apply. For instance, Itô calculus is built for processes with continuous paths, while jump-diffusion models require extensions.

Differentiability of sample paths

Continuity does not guarantee differentiability. Brownian motion is the classic example: its paths are continuous everywhere but differentiable nowhere (almost surely). This is why you can't write $dW_t/dt$ in the ordinary sense, and it's a key reason stochastic calculus (Itô's lemma) is needed instead of standard calculus.

The Ornstein-Uhlenbeck process, despite its mean-reverting behavior, inherits the same roughness: because it is driven by Brownian noise, its sample paths are also continuous but nowhere differentiable. Differentiability properties determine which calculus rules you can apply.

Filtrations and adapted processes

These concepts formalize the idea of "what information is available at each point in time." They're essential for martingale theory and stochastic calculus.

Information accumulation over time

A filtration $\{\mathcal{F}_t\}_{t \geq 0}$ is an increasing family of $\sigma$-algebras. Each $\mathcal{F}_t$ represents all the information available up to time $t$. "Increasing" means information is never lost:

$$\mathcal{F}_s \subseteq \mathcal{F}_t \quad \text{for all } s \leq t$$

The most common example is the natural filtration, generated by the process itself: $\mathcal{F}_t = \sigma(X_s : s \leq t)$. This contains exactly the information you'd have from observing the process up to time $t$.

Adapted vs predictable processes

  • A process $\{X_t\}$ is adapted to a filtration if $X_t$ is $\mathcal{F}_t$-measurable for every $t$. In plain terms, you can determine the value of $X_t$ using only information available at time $t$. Most processes you encounter (Brownian motion, Poisson processes, Itô processes) are adapted to their natural filtration.
  • A process is predictable if $X_t$ is measurable with respect to $\mathcal{F}_{t-}$, the information available strictly before time $t$. Predictability is a stronger condition and becomes important when defining stochastic integrals.

Martingales

A martingale is an adapted process $\{M_t\}_{t \geq 0}$ satisfying:

$$E[M_t \mid \mathcal{F}_s] = M_s \quad \text{for all } s \leq t$$

The interpretation: the best forecast of a martingale's future value, given everything you know now, is its current value. There's no systematic drift up or down.

  • Brownian motion $W_t$ is a martingale.
  • A compensated Poisson process $N_t - \lambda t$ is a martingale.
  • In finance, discounted asset prices are martingales under the risk-neutral measure (this is the foundation of no-arbitrage pricing).

Martingales also come in two related flavors: a submartingale has $E[M_t \mid \mathcal{F}_s] \geq M_s$ (a tendency to increase), and a supermartingale has $E[M_t \mid \mathcal{F}_s] \leq M_s$ (a tendency to decrease).
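A full check of the conditional-expectation property requires conditioning machinery, but one implication is easy to test: a martingale has constant expectation, so $E[N_t - \lambda t] = 0$ for all $t$. The sketch below verifies this for the compensated Poisson process ($\lambda$ and the checkpoint time are illustrative choices).

```python
import random

# Verify the constant-mean consequence of the martingale property for
# M_t = N_t - lambda*t (lambda and t_end are illustrative choices).
lam, t_end = 3.0, 5.0
rng = random.Random(0)

def n_events(t, rate, rng):
    """N_t: number of Poisson(rate) events in [0, t]."""
    count, clock = 0, 0.0
    while True:
        clock += rng.expovariate(rate)
        if clock > t:
            return count
        count += 1

m_vals = [n_events(t_end, lam, rng) - lam * t_end for _ in range(50_000)]
# Sample mean of M_t should be near 0 (E[N_t] = lambda*t exactly).
print(abs(sum(m_vals) / len(m_vals)) < 0.1)
```

The uncompensated count $N_t$ fails this check (its mean grows like $\lambda t$), which is why the drift term $-\lambda t$ is needed to make it a martingale.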