Fiveable

🔀Stochastic Processes Unit 5 Review


5.1 Definition and properties of Markov chains


Written by the Fiveable Content Team • Last updated August 2025

Markov chains are stochastic models that describe systems transitioning between states over time. They're defined by a state space, transition probabilities, and the memoryless property, making them powerful tools for analyzing complex systems with probabilistic behavior.

This topic covers the key components and properties of Markov chains, including state space, transition probabilities, and classifications. It also covers long-term behavior, first passage times, and applications in fields like physics, biology, and computer science.

Definition of Markov chains

A Markov chain is a stochastic process where a system transitions between states over time, and the probability of moving to any future state depends only on the current state. The entire history of how the system arrived at its current state is irrelevant. This single constraint is what makes Markov chains both tractable and widely applicable across physics, biology, economics, and computer science.

The key components of a Markov chain are the state space, transition probabilities, initial state distribution, and the memoryless (Markov) property.

State space in Markov chains

The state space is the set of all possible states the system can occupy at any given time. Each state represents a unique configuration or condition of the system.

  • The state space can be discrete (finite or countably infinite) or continuous (uncountably infinite). In this unit, we focus primarily on discrete state spaces.
  • Example: In a weather model, the state space could be {sunny, cloudy, rainy}. In a queueing model, the state space might be {0, 1, 2, ..., n} representing the number of customers waiting.

Transition probabilities

Transition probabilities describe the likelihood of moving from one state to another in a single time step. The transition probability from state $i$ to state $j$ is written as:

$$P(X_{t+1} = j \mid X_t = i)$$

where $X_t$ represents the state at time $t$.

These probabilities are organized into a transition matrix $P$, where entry $P_{ij}$ gives the probability of going from state $i$ to state $j$. Two properties must hold for every row of this matrix:

  • Each entry is non-negative: $P_{ij} \geq 0$
  • Each row sums to 1: $\sum_j P_{ij} = 1$ (the system must go somewhere)

Example: For a two-state chain with states {0, 1}:

$$P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$$

This means if you're in state 0, there's a 70% chance of staying and a 30% chance of moving to state 1.
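A quick numerical check of these row properties on the two-state matrix above (a minimal sketch with NumPy):

```python
import numpy as np

# Transition matrix for the two-state chain {0, 1} from the example
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# The two defining properties of a transition matrix:
assert (P >= 0).all()                   # every entry is non-negative
assert np.allclose(P.sum(axis=1), 1.0)  # each row sums to 1

# Probability of moving from state 0 to state 1 in one step
print(P[0, 1])  # 0.3
```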

Initial state distribution

The initial state distribution, denoted $\pi_0$, is a probability vector specifying the likelihood of starting in each state at time $t = 0$. Each entry $\pi_0(i)$ gives the probability of starting in state $i$, and the entries must sum to 1.

Once you have $\pi_0$ and the transition matrix $P$, you can compute the state distribution at any future time step $t$ by:

$$\pi_t = \pi_0 P^t$$

Example: For a three-state chain, $\pi_0 = [0.2, 0.5, 0.3]$ means a 20% chance of starting in state 1, 50% in state 2, and 30% in state 3.
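Using the two-state matrix from the transition-probability example, and assuming for illustration that the chain starts in state 0 with certainty, the formula $\pi_t = \pi_0 P^t$ can be sketched as:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])  # assumed start: state 0 with probability 1

# Distribution after t steps: pi_t = pi_0 P^t
t = 3
pi_t = pi0 @ np.linalg.matrix_power(P, t)

assert np.isclose(pi_t.sum(), 1.0)  # still a valid probability vector
print(pi_t)  # ≈ [0.583, 0.417]
```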

Memoryless property of Markov chains

The Markov property (also called the memoryless property) is the defining characteristic of a Markov chain. It states that the future evolution of the process depends only on the current state, not on the sequence of states that preceded it.

Mathematically:

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = k, \ldots) = P(X_{t+1} = j \mid X_t = i)$$

This is what makes Markov chains computationally manageable. Instead of tracking the full history of the process, you only need to know where the system is right now to determine where it goes next.

Classifications of Markov chains

Markov chains can be classified along several dimensions: the nature of time, whether transition probabilities change, the size of the state space, and structural properties like irreducibility and aperiodicity. These classifications determine which analytical tools apply and what long-term behavior to expect.

Discrete-time vs continuous-time

  • Discrete-time Markov chains (DTMCs) evolve at fixed time steps (e.g., daily, weekly). Transitions are governed by transition probabilities.
  • Continuous-time Markov chains (CTMCs) allow transitions at any point in time. Instead of transition probabilities, they use transition rates that describe how quickly transitions occur.

Example: A DTMC could model daily weather changes, while a CTMC could model the random arrival of customers at a service counter.

Time-homogeneous vs time-inhomogeneous

  • Time-homogeneous chains have transition probabilities that stay constant over time: $P(X_{t+1} = j \mid X_t = i)$ does not depend on $t$. Most of the classical theory (stationary distributions, ergodicity) applies to this case.
  • Time-inhomogeneous chains have transition probabilities that vary with $t$. These arise when the system's dynamics change over time, such as seasonal effects on consumer behavior.

Unless stated otherwise, you can generally assume a Markov chain is time-homogeneous in this course.

Finite vs infinite state space

  • Finite state space chains have a fixed number of states {1, 2, ..., n}. These are the most common in applications and have the most developed theory.
  • Countably infinite state space chains (e.g., a random walk on the integers) require more care, as convergence results that hold for finite chains don't always carry over.

Note: A chain with an infinite state space in this course typically means countably infinite (like the non-negative integers), not uncountably infinite. Uncountable state spaces fall under more general Markov process theory.

Irreducible Markov chains

A Markov chain is irreducible if every state can be reached from every other state in some finite number of steps. Formally, for all states $i$ and $j$, there exists some $n \geq 1$ such that $P^n_{ij} > 0$.

  • An irreducible chain has no absorbing states and no isolated subsets that trap the process.
  • Irreducibility ensures that the long-term behavior doesn't depend on which state you start in.

Example: A random walk on a connected graph is irreducible because you can get from any node to any other node.

Aperiodic Markov chains

The period of a state $i$ is the greatest common divisor (gcd) of all return times to that state: $d(i) = \gcd\{n \geq 1 : P^n_{ii} > 0\}$. A state is aperiodic if $d(i) = 1$, and a Markov chain is aperiodic if all its states are aperiodic.

  • Periodicity means the chain cycles through states in a predictable pattern, preventing convergence to a single stationary distribution.
  • A simple way to guarantee aperiodicity: if any state has a positive self-loop probability ($P_{ii} > 0$), that state is aperiodic.

Example: A chain alternating deterministically between two states has period 2 (periodic). A fair coin flip model where you can also stay in the same state is aperiodic.
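The period definition can be checked numerically by collecting the return times with positive probability and taking their gcd. A sketch for the two chains just described (the cutoff `max_n` is an arbitrary choice):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, max_n=50):
    """Period of state i: gcd of all n <= max_n with (P^n)_{ii} > 0."""
    returns = [n for n in range(1, max_n + 1)
               if np.linalg.matrix_power(P, n)[i, i] > 1e-12]
    return reduce(gcd, returns)

# Deterministic alternation between two states: period 2
P_alt = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
# Same two states but with a chance of staying put: aperiodic
P_lazy = np.array([[0.5, 0.5],
                   [0.5, 0.5]])

print(period(P_alt, 0))   # 2
print(period(P_lazy, 0))  # 1
```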

Long-term behavior of Markov chains

The long-term behavior of a Markov chain describes what happens to the state distribution as the number of time steps grows without bound. This is where the classifications above pay off: irreducibility and aperiodicity together determine whether the chain settles into a predictable steady state.

Stationary distribution

A stationary distribution $\pi$ is a probability distribution over the states that satisfies:

$$\pi P = \pi$$

with $\sum_i \pi_i = 1$ and $\pi_i \geq 0$ for all $i$. If the chain's state distribution ever equals $\pi$, it stays at $\pi$ forever. That's why it's called "stationary."

Example: For the transition matrix $P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$, you can solve $\pi P = \pi$ to get $\pi = \left[\frac{4}{7}, \frac{3}{7}\right] \approx [0.571, 0.429]$.

To find $\pi$, set up the system of equations from $\pi P = \pi$ along with the constraint $\pi_1 + \pi_2 + \cdots = 1$, then solve.
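One way to carry out that solve for the two-state example: rearrange $\pi P = \pi$ into the homogeneous system $\pi(P - I) = 0$, append the normalization constraint as an extra equation, and solve by least squares (a sketch with NumPy):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# pi P = pi  <=>  (P - I)^T pi^T = 0; stack the normalization sum(pi) = 1
n = P.shape[0]
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)  # ≈ [0.571, 0.429] = [4/7, 3/7]
assert np.allclose(pi @ P, pi)  # pi really is stationary
```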

Existence and uniqueness of stationary distribution

Not every Markov chain has a stationary distribution, and some have more than one. The key result:

  • A finite-state, irreducible, aperiodic Markov chain has exactly one stationary distribution.
  • This follows from the Perron-Frobenius theorem, which guarantees that a primitive non-negative matrix (one that is irreducible and aperiodic) has a unique dominant left eigenvector with eigenvalue 1.

If the chain is irreducible but periodic, a unique stationary distribution still exists, but the chain won't converge to it in the usual pointwise sense (it oscillates instead).

Convergence to stationary distribution

For a finite-state, irreducible, aperiodic Markov chain with stationary distribution $\pi$:

$$\lim_{n \to \infty} P^n_{ij} = \pi_j \quad \text{for all } i, j$$

This means no matter where you start, the distribution of states converges to $\pi$.

The rate of convergence is governed by the second-largest eigenvalue (in absolute value) of $P$. If this eigenvalue is $\lambda_2$, convergence is roughly geometric at rate $|\lambda_2|$. A smaller $|\lambda_2|$ means faster convergence.
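Both claims can be illustrated on the two-state example: the rows of $P^n$ flatten to the stationary distribution, and the eigenvalues of $P$ reveal the geometric rate (a sketch):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# After many steps, every row of P^n equals the stationary distribution
Pn = np.linalg.matrix_power(P, 50)
pi = np.array([4/7, 3/7])
assert np.allclose(Pn, np.vstack([pi, pi]))

# Eigenvalues sorted by absolute value: the second one sets the rate
eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
print(eigvals)  # [1.0, 0.3] — convergence is geometric at rate 0.3
```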

Ergodicity in Markov chains

A Markov chain is ergodic if it is irreducible, aperiodic, and positive recurrent. For finite-state chains, irreducibility and aperiodicity automatically imply positive recurrence, so any finite-state irreducible aperiodic chain is ergodic.

Ergodicity gives you a powerful result: time averages equal space averages. Specifically, for any function $f$ of the states:

$$\frac{1}{N} \sum_{t=1}^{N} f(X_t) \to \sum_i \pi_i f(i) \quad \text{as } N \to \infty$$

This means you can estimate the stationary distribution by running the chain for a long time and counting how often each state is visited.
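A minimal simulation sketch of this idea, counting visit frequencies for the two-state example chain (the seed and step count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Run the chain for many steps and record how often each state is visited
N = 100_000
state = 0
counts = np.zeros(2)
for _ in range(N):
    state = rng.choice(2, p=P[state])
    counts[state] += 1

print(counts / N)  # ≈ [0.571, 0.429], the stationary distribution
```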

First passage times

First passage times describe how long it takes a Markov chain to reach a particular state for the first time. These quantities capture the transient behavior of the chain, complementing the long-term analysis above.

Definition of first passage time

The first passage time from state $i$ to state $j$, denoted $T_{ij}$, is the number of steps until the chain first reaches state $j$, starting from state $i$:

$$T_{ij} = \min\{n \geq 1 : X_n = j \mid X_0 = i\}$$

When $i = j$, this is called the first return time to state $i$. A state is recurrent if the chain returns to it with probability 1, and transient if there's a positive probability of never returning.

Example: In a board game modeled as a Markov chain, $T_{ij}$ represents how many turns it takes to reach the winning square for the first time from your current position.
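A simulation sketch for the two-state example chain. From state 0, each step reaches state 1 with probability 0.3, so $T_{01}$ is geometric and $E[T_{01}] = 1/0.3 \approx 3.33$; averaging simulated passage times should land near that value (the routine name and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def first_passage_time(start, target):
    """Simulate one realization of T_{start, target}."""
    state, steps = start, 0
    while True:
        state = rng.choice(2, p=P[state])
        steps += 1
        if state == target:
            return steps

# Estimate E[T_01] by averaging many simulated passage times
samples = [first_passage_time(0, 1) for _ in range(20_000)]
print(np.mean(samples))  # ≈ 10/3
```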

Expected first passage times

The expected first passage time $E[T_{ij}]$ is the average number of steps to reach state $j$ from state $i$. For an ergodic chain, the expected return time to state $i$ is:

$$E[T_{ii}] = \frac{1}{\pi_i}$$

This connects first passage times directly to the stationary distribution: states visited more frequently in the long run have shorter expected return times.

For chains with absorbing states, expected first passage times can be computed using the fundamental matrix:

  1. Identify the transient states and form the submatrix $Q$ of $P$ (transitions among transient states only).

  2. Compute the fundamental matrix $N = (I - Q)^{-1}$.

  3. The entry $N_{ij}$ gives the expected number of times the chain visits transient state $j$ before absorption, starting from transient state $i$.

  4. The expected number of steps to absorption from state $i$ is the sum of row $i$ of $N$.
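A sketch of these steps on a small hypothetical chain (the matrix below is invented for illustration): states 0 and 1 are transient, state 2 is absorbing.

```python
import numpy as np

# Hypothetical chain: states 0, 1 transient; state 2 absorbing
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

Q = P[:2, :2]                      # transitions among transient states
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix N = (I - Q)^{-1}

# Row sums of N = expected steps to absorption from each transient state
steps_to_absorption = N.sum(axis=1)
print(steps_to_absorption)  # ≈ [4.21, 3.68]
```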

Absorption probabilities

In chains with absorbing states (states where $P_{jj} = 1$), absorption probabilities tell you the likelihood of ending up in each absorbing state.

To compute them:

  1. Form $Q$ (transient-to-transient transitions) and $R$ (transient-to-absorbing transitions) from the transition matrix.

  2. Compute the fundamental matrix $N = (I - Q)^{-1}$.

  3. The absorption probability matrix is $A = NR$, where $A_{ij}$ gives the probability of being absorbed in absorbing state $j$ starting from transient state $i$.

Example: In a model of disease progression with stages {healthy, mild, severe, terminal}, if "terminal" is absorbing, the absorption probability from "mild" tells you the likelihood that a patient in the mild stage eventually reaches the terminal stage.
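The $A = NR$ computation can be sketched on a small hypothetical chain (the matrix is invented for illustration) with two transient and two absorbing states:

```python
import numpy as np

# Hypothetical chain: states 0, 1 transient; states 2, 3 absorbing
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.2, 0.4, 0.1, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

Q = P[:2, :2]                      # transient -> transient block
R = P[:2, 2:]                      # transient -> absorbing block
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
A = N @ R                          # absorption probabilities

print(A)
assert np.allclose(A.sum(axis=1), 1.0)  # absorbed somewhere with prob. 1
```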

Applications of Markov chains

Markov chains show up across many disciplines because any system with probabilistic state transitions and the memoryless property can be modeled this way. Here are three major application areas.

Modeling with Markov chains

Markov chains provide a compact way to represent systems where the next state depends probabilistically on the current state. You define the states, estimate transition probabilities from data or domain knowledge, and then use the chain for simulation, prediction, or sensitivity analysis.

Example: An e-commerce website can be modeled with states {home, product page, cart, checkout, exit}. Transition probabilities capture how users navigate between pages. The stationary distribution reveals where users spend most of their time, and absorption probabilities (with "checkout" and "exit" as absorbing states) reveal conversion rates.

Markov decision processes

Markov decision processes (MDPs) extend Markov chains by adding actions and rewards. At each state, a decision-maker chooses an action that influences both the transition probabilities and the reward received.

  • The goal is to find a policy (a mapping from states to actions) that maximizes expected cumulative reward over time.
  • MDPs are the mathematical foundation of reinforcement learning and are widely used in robotics, operations research, and resource allocation.

Example: A robot navigating a grid world chooses movement directions (actions) at each cell (state). The transition probabilities account for stochastic movement (e.g., slipping), and rewards guide the robot toward a goal while avoiding obstacles.
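A minimal value-iteration sketch for a tiny hypothetical MDP (all numbers below are invented for illustration). It repeatedly applies the Bellman optimality update and reads off a greedy policy:

```python
import numpy as np

# Hypothetical MDP: 2 states, 2 actions.
# P[a][s, s'] = transition probability; R[s, a] = immediate reward.
P = np.array([[[0.9, 0.1],    # dynamics under action 0
               [0.4, 0.6]],
              [[0.2, 0.8],    # dynamics under action 1
               [0.1, 0.9]]])
R = np.array([[1.0, 0.0],     # reward for each (state, action) pair
              [0.0, 2.0]])
gamma = 0.9                   # discount factor

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
V = np.zeros(2)
for _ in range(500):
    Qsa = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Qsa.max(axis=1)

policy = Qsa.argmax(axis=1)   # greedy policy: best action in each state
print(V, policy)
```

At convergence, `V` satisfies the Bellman optimality equation, so it is a fixed point of the update above.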

Hidden Markov models

Hidden Markov models (HMMs) are used when the underlying Markov chain is not directly observable. Instead, each hidden state emits an observation according to some probability distribution, and you only see the observations.

The three central problems for HMMs are:

  1. Evaluation: Given a model and an observation sequence, what's the probability of that sequence? (solved by the forward algorithm)
  2. Decoding: Given observations, what's the most likely sequence of hidden states? (solved by the Viterbi algorithm)
  3. Learning: Given observations, how do you estimate the model parameters? (solved by the Baum-Welch algorithm, a form of EM)
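A minimal Viterbi sketch for the decoding problem, on a hypothetical two-state HMM with two observation symbols (all parameters invented for illustration):

```python
import numpy as np

A = np.array([[0.7, 0.3],      # hidden-state transition matrix
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # B[s, o] = P(observe o | hidden state s)
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])     # initial hidden-state distribution

def viterbi(obs):
    """Most likely hidden-state sequence for an observation sequence."""
    T, S = len(obs), len(pi0)
    delta = np.zeros((T, S))            # best path probability ending in s
    psi = np.zeros((T, S), dtype=int)   # backpointers
    delta[0] = pi0 * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1, :, None] * A   # (prev state, next state)
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    # Backtrack from the most probable final state
    path = [delta[-1].argmax()]
    for t in range(T - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return path[::-1]

print(viterbi([0, 0, 1, 1]))  # [0, 0, 1, 1]
```

For long sequences a real implementation would work in log-probabilities to avoid underflow.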

Example: In speech recognition, hidden states represent phonemes (sound units) and observations are acoustic features. The HMM infers which phonemes were spoken based on the audio signal.