State space and transition probabilities
State spaces and transition probabilities are the two building blocks of any Markov chain. The state space tells you where the system can be, and the transition probabilities tell you how likely it is to move from one place to another. Together, they fully specify the chain's dynamics and set the stage for everything else: long-term behavior, stationary distributions, and convergence analysis.
State space
The state space of a stochastic process is the set of all possible values the process can take. You can think of it as the "menu" of outcomes at any given time step. How you classify the state space matters because it determines which mathematical tools you can use.
Discrete vs continuous
A discrete state space consists of a countable set of distinct values. A continuous state space is uncountable and spans a range of real numbers.
- Discrete: Number of customers in a queue
- Continuous: Temperature readings from a sensor, taking any value in an interval of $\mathbb{R}$
In this unit on Markov chains, you'll almost always be working with discrete state spaces. Continuous state spaces show up in continuous-state models like diffusion processes.
Countable vs uncountable
- A countable state space can be put into one-to-one correspondence with the natural numbers. It can be finite or countably infinite.
- An uncountable state space cannot. The classic example is any interval of real numbers.
Examples:
- Countable: Number of emails received per day
- Uncountable: Time between arrivals in a continuous-time process (any positive real number)
Finite vs infinite
- A finite state space has a fixed, limited number of states. Example: outcomes of a single die roll, $\{1, 2, 3, 4, 5, 6\}$.
- An infinite state space has unboundedly many states. Example: total number of customers ever served, $\{0, 1, 2, \dots\}$.
Finite state spaces are especially nice because the transition probability matrix has a fixed size and you can do exact linear algebra on it.
Transition probabilities
Transition probabilities quantify how likely the process is to jump from one state to another. They encode all the randomness in a Markov chain.
Definition and notation
The transition probability from state $i$ to state $j$ is written $p_{ij}$ and defined as:

$$p_{ij} = P(X_{n+1} = j \mid X_n = i)$$

where $X_n$ is the state at time $n$. Two properties must always hold:
- Non-negativity: $p_{ij} \ge 0$ for all $i, j$
- Row sums equal 1: $\sum_j p_{ij} = 1$ for every state $i$
The second condition just says that from any state, the process must go somewhere (including possibly staying put).
One-step transition probabilities
One-step transition probabilities describe movement in a single time step. They're the most basic unit of the chain's behavior.
Example: In a two-state weather model (sunny, rainy), you might have $p_{\text{sunny},\,\text{rainy}} = 0.2$, meaning there's a 20% chance tomorrow is rainy given today is sunny.
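As a minimal sketch, the two-state weather model can be written directly as a nested list; the 0.2 entry follows the example above, and the rainy-row values are illustrative assumptions:

```python
# Two-state weather chain: index 0 = sunny, 1 = rainy.
# Rows are "today", columns are "tomorrow"; each row must sum to 1.
P = [
    [0.8, 0.2],  # sunny -> sunny 0.8, sunny -> rainy 0.2 (from the example)
    [0.4, 0.6],  # rainy row: illustrative values
]

# P[i][j] is the one-step transition probability p_ij.
p_sunny_to_rainy = P[0][1]

# Stochastic-matrix check: every row is a probability distribution.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12
```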
n-step transition probabilities
The n-step transition probability $p_{ij}^{(n)}$ is the probability of going from state $i$ to state $j$ in exactly $n$ time steps:

$$p_{ij}^{(n)} = P(X_{m+n} = j \mid X_m = i)$$

You can compute these by raising the one-step transition matrix $P$ to the $n$-th power, or by using the Chapman-Kolmogorov equations (covered below).
Transition probability matrix
The transition probability matrix $P$ collects all one-step probabilities into a single square matrix. The entry in row $i$, column $j$ is $p_{ij}$.
For a three-state chain, $P$ looks like:

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}$$

This matrix is a stochastic matrix: every entry is non-negative and every row sums to 1. The n-step transition matrix is simply $P^n$.
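A quick sketch of the matrix-power computation on nested lists; the three-state probabilities below are illustrative, not taken from the text:

```python
# An illustrative 3-state stochastic matrix.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """Compute P^n by repeated multiplication (identity matrix for n = 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

P5 = mat_pow(P, 5)
# P^n is still stochastic: non-negative entries, rows summing to 1.
for row in P5:
    assert all(x >= 0 for x in row)
    assert abs(sum(row) - 1.0) < 1e-9
```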
Properties of transition probabilities
- Each row of $P$ is a valid probability distribution.
- $P^n$ gives the n-step transition probabilities directly.
- As $n \to \infty$, the rows of $P^n$ may converge to the stationary distribution (under the right conditions).
Chapman-Kolmogorov equations
The Chapman-Kolmogorov equations connect n-step transition probabilities to shorter-step probabilities. They're the key recursive tool for computing multi-step probabilities without brute-force enumeration.
Derivation
The idea comes from the law of total probability combined with the Markov property. To go from state $i$ to state $j$ in $m + n$ steps, the process must pass through some intermediate state $k$ after $m$ steps. Summing over all possible intermediates:

$$p_{ij}^{(m+n)} = \sum_k p_{ik}^{(m)} \, p_{kj}^{(n)}$$

This holds for any non-negative integers $m$ and $n$, and any states $i$ and $j$.

Matrix form
In matrix notation, the Chapman-Kolmogorov equations become:

$$P^{(m+n)} = P^{(m)} P^{(n)}$$

This is just matrix multiplication. It confirms that computing $P^n$ by repeated matrix multiplication is mathematically justified.
Solving for n-step probabilities
A practical recursive approach: set $m = n - 1$ and take a single extra step to get

$$P^{(n)} = P^{(n-1)} P$$

Starting from the known one-step matrix $P$, you can build up $P^{(n)}$ iteratively. For large $n$, matrix diagonalization or eigenvalue methods are more efficient than repeated multiplication.
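The matrix identity can be checked numerically; here is a sketch on an illustrative 2x2 chain, verifying that $P^{m+n}$ equals $P^m$ times $P^n$ entry by entry:

```python
# Numerical check of Chapman-Kolmogorov on an illustrative 2-state chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    result = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity
    for _ in range(n):
        result = mat_mul(result, P)
    return result

m, n = 2, 3
lhs = mat_pow(P, m + n)                       # P^(m+n)
rhs = mat_mul(mat_pow(P, m), mat_pow(P, n))   # P^m . P^n
for i in range(2):
    for j in range(2):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-12
```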
Classification of states
Classifying states reveals the structural skeleton of a Markov chain. States that "talk to each other" get grouped together, and the relationships between groups determine the chain's long-term behavior.
Communicating states
Two states $i$ and $j$ communicate (written $i \leftrightarrow j$) if each is reachable from the other with positive probability in some finite number of steps. Formally, there exist integers $m, n \ge 0$ such that:

$$p_{ij}^{(m)} > 0 \quad \text{and} \quad p_{ji}^{(n)} > 0$$
Communication is an equivalence relation: it's reflexive (every state communicates with itself), symmetric, and transitive.
Accessibility and communication
State $j$ is accessible from state $i$ if there exists some $n \ge 0$ with $p_{ij}^{(n)} > 0$. Accessibility is one-directional. Communication requires accessibility in both directions.
Equivalence classes
Because communication is an equivalence relation, it partitions the state space into disjoint communicating classes. Within each class, every state can reach every other state. Between classes, transitions can only go one way (or not at all).
Reducibility and irreducibility
- A Markov chain is irreducible if the entire state space forms a single communicating class. Every state can reach every other state.
- A Markov chain is reducible if there are two or more communicating classes.
Irreducibility is a crucial assumption for many convergence theorems. If a chain is reducible, you often analyze each closed communicating class separately.
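The class decomposition can be computed mechanically from the matrix: find which states each state can reach, then group mutually reachable states. A sketch, with an illustrative 3-state reducible chain:

```python
from collections import deque

def reachable(P, start):
    """States reachable from `start` with positive probability (incl. start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def communicating_classes(P):
    """Partition the state space: i and j communicate iff each reaches the other."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}
        classes.append(cls)
        assigned |= cls
    return classes

# Reducible example: state 2 is absorbing, so {0, 1} and {2} are separate classes.
P = [[0.5, 0.5, 0.0],
     [0.3, 0.3, 0.4],
     [0.0, 0.0, 1.0]]
classes = communicating_classes(P)
```

A chain is irreducible exactly when this function returns a single class containing every state.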
Periodicity of states
Periodicity captures whether a state exhibits cyclic return patterns. It affects whether the chain's distribution settles down to a steady state or keeps oscillating.
Definition and examples
The period of state $i$ is:

$$d(i) = \gcd\{n \ge 1 : p_{ii}^{(n)} > 0\}$$

If $d(i) = 1$, the state is aperiodic. If $d(i) > 1$, the state is periodic with period $d(i)$.
- Aperiodic example: A state with a self-loop ($p_{ii} > 0$) always has period 1, since the process can return in 1 step.
- Periodic example: A random walk on a bipartite graph where returns are only possible in an even number of steps has period 2.
Note: A simple random walk on the integers is actually periodic with period 2 (you can only return to the origin in an even number of steps), not aperiodic.
Aperiodic vs periodic states
- An aperiodic state has $\gcd\{n \ge 1 : p_{ii}^{(n)} > 0\} = 1$, so returns can happen at various step counts without a rigid cycle.
- A periodic state with period $d > 1$ can only return to itself at times that are multiples of $d$.
- In an irreducible chain, all states share the same period. So you can speak of the period of the chain itself.
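The gcd definition translates directly into code: scan the diagonal of $P^n$ for successive $n$ and take the gcd of the return times found. A sketch, with a hypothetical cutoff `max_steps` (enough for small examples, not a general-purpose bound):

```python
from math import gcd

def period(P, i, max_steps=50):
    """Period of state i: gcd of all n <= max_steps with p_ii^(n) > 0."""
    size = len(P)
    def mat_mul(A, B):
        return [[sum(A[r][k] * B[k][c] for k in range(size)) for c in range(size)]
                for r in range(size)]
    d = 0
    Pn = P  # current power P^n, starting at n = 1
    for n in range(1, max_steps + 1):
        if Pn[i][i] > 0:
            d = gcd(d, n)  # gcd(0, n) == n, so the first return time seeds d
        Pn = mat_mul(Pn, P)
    return d

# Deterministic two-state flip: returns only at even times, so period 2.
flip = [[0.0, 1.0],
        [1.0, 0.0]]
# Adding a self-loop breaks the cycle: period 1 (aperiodic).
lazy = [[0.5, 0.5],
        [1.0, 0.0]]
```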
Cyclic behavior
Periodic chains cycle through groups of states in a predictable pattern. The n-step return probabilities $p_{ii}^{(n)}$ are zero whenever $n$ is not a multiple of the period.
Aperiodicity (together with irreducibility and positive recurrence) is one of the three conditions guaranteeing convergence to a unique stationary distribution.
Recurrence and transience
These properties describe whether the chain keeps revisiting a state or eventually leaves it forever.

Recurrent vs transient states
- State $i$ is recurrent if, starting from $i$, the probability of eventually returning to $i$ is exactly 1.
- State $i$ is transient if there's a positive probability of never returning to $i$.
Equivalently, a recurrent state is visited infinitely often (with probability 1), while a transient state is visited only finitely many times.
In an irreducible chain, all states are the same type: either all recurrent or all transient.
Positive and null recurrence
Recurrent states split further:
- Positive recurrent: The expected return time is finite.
- Null recurrent: The state is recurrent, but the expected return time is infinite.
Null recurrence only occurs in chains with countably infinite state spaces. Every recurrent state in a finite irreducible chain is positive recurrent.
Positive recurrence is required for a stationary distribution to exist.
Absorption probabilities
An absorbing state $i$ is one that, once entered, is never left ($p_{ii} = 1$). The absorption probability from state $i$ into a set $A$ of absorbing states is the probability of eventually reaching $A$ starting from $i$.
To compute absorption probabilities:
- Identify the absorbing states and the transient states.
- Set up equations using first-step analysis: for each transient state $i$, write $h_i = \sum_j p_{ij} h_j$, where $h_j = 1$ for $j \in A$ (and $h_j = 0$ for absorbing states outside $A$).
- Solve the resulting system of linear equations.
These calculations appear in gambler's ruin problems and reliability models.
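A sketch of first-step analysis for a gambler's ruin chain, solving the equations by fixed-point iteration rather than a linear-algebra library (the fortune range 0..4 and fair coin are illustrative assumptions). For a fair game the known closed form is $h_i = i/N$, which the code can be checked against:

```python
def absorption_probs(P, target, n_iter=2000):
    """First-step analysis by fixed-point iteration:
    h_i = sum_j p_ij h_j for transient i; h is pinned at absorbing states
    (1 inside `target`, 0 at other absorbing states)."""
    n = len(P)
    absorbing = {i for i in range(n) if P[i][i] == 1.0}
    h = [1.0 if i in target else 0.0 for i in range(n)]
    for _ in range(n_iter):
        h = [h[i] if i in absorbing
             else sum(P[i][j] * h[j] for j in range(n))
             for i in range(n)]
    return h

# Gambler's ruin with fortunes 0..4 and a fair coin; 0 and 4 are absorbing.
N = 4
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = P[N][N] = 1.0
for i in range(1, N):
    P[i][i - 1] = P[i][i + 1] = 0.5

h = absorption_probs(P, target={N})  # probability of reaching fortune N
```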
Expected return times
The mean recurrence time of a positive recurrent state $i$ is $m_i = E[T_i \mid X_0 = i]$, where $T_i = \min\{n \ge 1 : X_n = i\}$ is the first return time to $i$.
A key connection to stationary distributions: for an irreducible, positive recurrent chain with stationary distribution $\pi$,

$$\pi_i = \frac{1}{m_i}$$
States visited more frequently have larger stationary probabilities and shorter mean return times.
Stationary distributions
A stationary distribution is a probability distribution that, once achieved, persists forever. It describes the chain's long-run behavior.
Definition and existence
A probability vector $\pi$ is a stationary distribution if:

$$\pi P = \pi, \qquad \sum_i \pi_i = 1, \qquad \pi_i \ge 0$$

The equation $\pi P = \pi$ says that if the chain's state distribution is $\pi$ at time $n$, it's still $\pi$ at time $n+1$.
Existence depends on the chain's structure. An irreducible, positive recurrent chain always has a stationary distribution.
Unique vs multiple stationary distributions
- An irreducible, positive recurrent chain has exactly one stationary distribution.
- A reducible chain can have multiple stationary distributions, because each closed communicating class can have its own, and you can take convex combinations.
Uniqueness matters because it means the long-run behavior doesn't depend on where the chain starts.
Calculating stationary distributions
To find for a finite irreducible chain:
- Write out the system $\pi P = \pi$, which gives one equation per state.
- Notice that these equations are linearly dependent (they have a one-dimensional solution space).
- Replace one equation with the normalization condition $\sum_i \pi_i = 1$.
- Solve the resulting system.
For larger chains, numerical methods like the power method (repeatedly multiplying an arbitrary distribution by $P$) or eigenvalue decomposition are more practical.
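The power method is short enough to sketch in full; the two-state matrix is illustrative, chosen so the exact answer $\pi = (2/3, 1/3)$ can be verified by hand from $\pi P = \pi$:

```python
def stationary_power_method(P, n_iter=10000, tol=1e-12):
    """Repeatedly multiply a distribution by P until it stops changing.
    Converges for irreducible, aperiodic finite chains."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Illustrative two-state chain; solving pi P = pi by hand gives (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary_power_method(P)
```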
Limiting behavior and convergence
For an irreducible, aperiodic, positive recurrent Markov chain, the convergence theorem guarantees:

$$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$$

regardless of the starting state $i$. The rows of $P^n$ all converge to the same stationary distribution $\pi$.
If the chain is periodic, this pointwise limit doesn't exist (the probabilities oscillate), but the time-averaged probabilities still converge to $\pi$.
The mixing time measures how many steps the chain needs to get close to stationarity, and the spectral gap (the difference between the two largest eigenvalue magnitudes of $P$) controls the rate of convergence.
Applications and examples
Random walks on graphs
A random walk on a graph places an agent on a node, and at each step the agent moves to a neighbor chosen according to some probability rule (often uniformly at random). The state space is the node set, and the transition probabilities come from the graph structure. Applications include network analysis, community detection, and recommender systems.
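Building the transition matrix from a graph is mechanical: each row spreads probability uniformly over a node's neighbors. A sketch, with a hypothetical four-node graph (a triangle plus a pendant node):

```python
def random_walk_matrix(adj):
    """Transition matrix for a uniform random walk on a graph,
    given as an adjacency list {node: [neighbors]}."""
    nodes = sorted(adj)
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for v, neighbors in adj.items():
        for w in neighbors:
            P[index[v]][index[w]] = 1.0 / len(neighbors)
    return nodes, P

# Illustrative graph: triangle a-b-c with pendant node d attached to c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
nodes, P = random_walk_matrix(adj)
# Each row is a probability distribution over the node's neighbors.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12
```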
Markov chains in biology
Biological applications include modeling DNA sequence evolution (each state is a nucleotide: A, C, G, T), protein folding (states are conformations), and population dynamics. The transition probabilities encode mutation rates, folding kinetics, or birth-death rates, and the stationary distribution often represents an equilibrium the system tends toward.
Queueing systems
In a queueing model, the state is typically the number of customers in the system. Arrivals and departures drive the transitions. For example, in an M/M/1 queue, the arrival rate $\lambda$ and service rate $\mu$ determine the transition probabilities, and the stationary distribution (which exists when $\lambda < \mu$) gives the long-run probability of each queue length.
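For the M/M/1 queue the stationary distribution has a known closed form: with utilization $\rho = \lambda/\mu < 1$, the probability of $k$ customers is $\pi_k = (1 - \rho)\rho^k$, a geometric distribution. A sketch with illustrative rates:

```python
# M/M/1 stationary distribution: pi_k = (1 - rho) * rho**k for rho = lam/mu < 1.
lam, mu = 2.0, 5.0  # illustrative arrival and service rates
rho = lam / mu
assert rho < 1, "the queue is unstable (no stationary distribution) otherwise"

def mm1_pi(k):
    """Long-run probability of k customers in the system."""
    return (1 - rho) * rho ** k

# Truncated sums: the tail beyond k = 200 is negligible for rho = 0.4.
total = sum(mm1_pi(k) for k in range(200))        # should be ~1
mean_len = sum(k * mm1_pi(k) for k in range(200)) # should be rho / (1 - rho)
```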
PageRank algorithm
Google's PageRank models a "random surfer" clicking links on the web. The state space is the set of all web pages, and the transition probability from page $i$ to page $j$ is $1/\text{outdeg}(i)$ if $i$ links to $j$ (the surfer picks an outgoing link uniformly at random), and 0 otherwise. A damping factor $d$ (typically around 0.85) adds a small probability of jumping to any page at random, ensuring the chain is irreducible and aperiodic. The stationary distribution then ranks pages by importance.
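A minimal power-iteration sketch of this idea, on a hypothetical three-page web; the dangling-page handling (pages with no outlinks spread probability everywhere) is one common convention, stated here as an assumption:

```python
def pagerank(links, d=0.85, n_iter=100):
    """Basic PageRank by power iteration on the damped random-surfer chain.
    `links` maps each page to the pages it links to; dangling pages
    (no outlinks) are treated as linking to every page."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        new = {p: (1 - d) / n for p in pages}  # random-jump share
        for p in pages:
            out = links[p] if links[p] else pages  # dangling-page fix
            share = d * rank[p] / len(out)
            for q in out:
                new[q] += share  # link-following share
        rank = new
    return rank

# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
rank = pagerank(links)
# The ranks form a probability distribution (the stationary distribution).
assert abs(sum(rank.values()) - 1.0) < 1e-9
```

Page c receives links from both a and b, so it ends up with the highest rank in this toy graph.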