State space and transition probabilities
State spaces and transition probabilities are the two building blocks of any Markov chain. The state space tells you where the system can be, and the transition probabilities tell you how likely it is to move from one place to another. Together, they fully specify the chain's dynamics and set the stage for everything else: long-term behavior, stationary distributions, and convergence analysis.
State space
The state space of a stochastic process is the set of all possible values the process can take. You can think of it as the "menu" of outcomes at any given time step. How you classify the state space matters because it determines which mathematical tools you can use.
Discrete vs continuous
A discrete state space consists of a countable set of distinct values. A continuous state space is uncountable and spans a range of real numbers.
- Discrete: Number of customers in a queue
- Continuous: Temperature readings from a sensor, taking any value in an interval of $\mathbb{R}$
In this unit on Markov chains, you'll almost always be working with discrete state spaces. Continuous state spaces show up in continuous-state models like diffusion processes.
Countable vs uncountable
- A countable state space can be put into one-to-one correspondence with the natural numbers. It can be finite or countably infinite.
- An uncountable state space cannot. The classic example is any interval of real numbers.
Examples:
- Countable: Number of emails received per day
- Uncountable: Time between arrivals in a continuous-time process (any positive real number)
Finite vs infinite
- A finite state space has a fixed, limited number of states. Example: outcomes of a single die roll, $\{1, 2, 3, 4, 5, 6\}$.
- An infinite state space has unboundedly many states. Example: total number of customers ever served, $\{0, 1, 2, \dots\}$.
Finite state spaces are especially nice because the transition probability matrix has a fixed size and you can do exact linear algebra on it.
Transition probabilities
Transition probabilities quantify how likely the process is to jump from one state to another. They encode all the randomness in a Markov chain.
Definition and notation
The transition probability from state $i$ to state $j$ is written $p_{ij}$ and defined as:

$$p_{ij} = P(X_{n+1} = j \mid X_n = i)$$

where $X_n$ is the state at time $n$. Two properties must always hold:
- Non-negativity: $p_{ij} \ge 0$ for all $i, j$
- Row sums equal 1: $\sum_j p_{ij} = 1$ for every state $i$
The second condition just says that from any state, the process must go somewhere (including possibly staying put).
One-step transition probabilities
One-step transition probabilities describe movement in a single time step. They're the most basic unit of the chain's behavior.
Example: In a two-state weather model (sunny, rainy), you might have $p_{\text{sunny},\,\text{rainy}} = 0.2$, meaning there's a 20% chance tomorrow is rainy given today is sunny.
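As a minimal sketch, the two-state weather model can be written directly as a nested list; the 0.2 entry follows the example above, and the rainy-row values are illustrative assumptions:

```python
# Two-state weather chain: index 0 = sunny, 1 = rainy.
# Rows are "today", columns are "tomorrow"; each row must sum to 1.
P = [
    [0.8, 0.2],  # sunny -> sunny 0.8, sunny -> rainy 0.2 (from the example)
    [0.4, 0.6],  # rainy row: illustrative values
]

# P[i][j] is the one-step transition probability p_ij.
p_sunny_to_rainy = P[0][1]

# Stochastic-matrix check: every row is a probability distribution.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12
```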
n-step transition probabilities
The n-step transition probability $p_{ij}^{(n)}$ is the probability of going from state $i$ to state $j$ in exactly $n$ time steps:

$$p_{ij}^{(n)} = P(X_{m+n} = j \mid X_m = i)$$

You can compute these by raising the one-step transition matrix $P$ to the $n$-th power, or by using the Chapman-Kolmogorov equations (covered below).
Transition probability matrix
The transition probability matrix $P$ collects all one-step probabilities into a single square matrix. The entry in row $i$, column $j$ is $p_{ij}$.
For a three-state chain, $P$ looks like:

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}$$

This matrix is a stochastic matrix: every entry is non-negative and every row sums to 1. The n-step transition matrix is simply $P^n$.
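A quick sketch of the matrix-power computation on nested lists; the three-state probabilities below are illustrative, not taken from the text:

```python
# An illustrative 3-state stochastic matrix.
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """Compute P^n by repeated multiplication (identity matrix for n = 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

P5 = mat_pow(P, 5)
# P^n is still stochastic: non-negative entries, rows summing to 1.
for row in P5:
    assert all(x >= 0 for x in row)
    assert abs(sum(row) - 1.0) < 1e-9
```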
Properties of transition probabilities
- Each row of $P$ is a valid probability distribution.
- $P^n$ gives the n-step transition probabilities directly.
- As $n \to \infty$, the rows of $P^n$ may converge to the stationary distribution (under the right conditions).
Chapman-Kolmogorov equations
The Chapman-Kolmogorov equations connect n-step transition probabilities to shorter-step probabilities. They're the key recursive tool for computing multi-step probabilities without brute-force enumeration.
Derivation
The idea comes from the law of total probability combined with the Markov property. To go from state $i$ to state $j$ in $m + n$ steps, the process must pass through some intermediate state $k$ after $m$ steps. Summing over all possible intermediates:

$$p_{ij}^{(m+n)} = \sum_k p_{ik}^{(m)} \, p_{kj}^{(n)}$$

This holds for any non-negative integers $m$ and $n$, and any states $i$ and $j$.

Matrix form
In matrix notation, the Chapman-Kolmogorov equations become:

$$P^{(m+n)} = P^{(m)} P^{(n)}$$

This is just matrix multiplication. It confirms that computing $P^n$ by repeated matrix multiplication is mathematically justified.
Solving for n-step probabilities
A practical recursive approach: set $m = n - 1$ and take a single extra step to get

$$P^{(n)} = P^{(n-1)} P$$

Starting from the known one-step matrix $P$, you can build up $P^{(n)}$ iteratively. For large $n$, matrix diagonalization or eigenvalue methods are more efficient than repeated multiplication.
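The matrix identity can be checked numerically; here is a sketch on an illustrative 2x2 chain, verifying that $P^{m+n}$ equals $P^m$ times $P^n$ entry by entry:

```python
# Numerical check of Chapman-Kolmogorov on an illustrative 2-state chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    result = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 identity
    for _ in range(n):
        result = mat_mul(result, P)
    return result

m, n = 2, 3
lhs = mat_pow(P, m + n)                       # P^(m+n)
rhs = mat_mul(mat_pow(P, m), mat_pow(P, n))   # P^m . P^n
for i in range(2):
    for j in range(2):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-12
```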
Classification of states
Classifying states reveals the structural skeleton of a Markov chain. States that "talk to each other" get grouped together, and the relationships between groups determine the chain's long-term behavior.
Communicating states
Two states $i$ and $j$ communicate (written $i \leftrightarrow j$) if each is reachable from the other with positive probability in some finite number of steps. Formally, there exist integers $m, n \ge 0$ such that:

$$p_{ij}^{(m)} > 0 \quad \text{and} \quad p_{ji}^{(n)} > 0$$
Communication is an equivalence relation: it's reflexive (every state communicates with itself), symmetric, and transitive.
Accessibility and communication
State $j$ is accessible from state $i$ if there exists some $n \ge 0$ with $p_{ij}^{(n)} > 0$. Accessibility is one-directional. Communication requires accessibility in both directions.
Equivalence classes
Because communication is an equivalence relation, it partitions the state space into disjoint communicating classes. Within each class, every state can reach every other state. Between classes, transitions can only go one way (or not at all).
Reducibility and irreducibility
- A Markov chain is irreducible if the entire state space forms a single communicating class. Every state can reach every other state.
- A Markov chain is reducible if there are two or more communicating classes.
Irreducibility is a crucial assumption for many convergence theorems. If a chain is reducible, you often analyze each closed communicating class separately.
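The class decomposition can be computed mechanically from the matrix: find which states each state can reach, then group mutually reachable states. A sketch, with an illustrative 3-state reducible chain:

```python
from collections import deque

def reachable(P, start):
    """States reachable from `start` with positive probability (incl. start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def communicating_classes(P):
    """Partition the state space: i and j communicate iff each reaches the other."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}
        classes.append(cls)
        assigned |= cls
    return classes

# Reducible example: state 2 is absorbing, so {0, 1} and {2} are separate classes.
P = [[0.5, 0.5, 0.0],
     [0.3, 0.3, 0.4],
     [0.0, 0.0, 1.0]]
classes = communicating_classes(P)
```

A chain is irreducible exactly when this function returns a single class containing every state.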
Periodicity of states
Periodicity captures whether a state exhibits cyclic return patterns. It affects whether the chain's distribution settles down to a steady state or keeps oscillating.
Definition and examples
The period of state $i$ is:

$$d(i) = \gcd\{n \ge 1 : p_{ii}^{(n)} > 0\}$$

If $d(i) = 1$, the state is aperiodic. If $d(i) > 1$, the state is periodic with period $d(i)$.
- Aperiodic example: A state with a self-loop ($p_{ii} > 0$) always has period 1, since the process can return in 1 step.
- Periodic example: A random walk on a bipartite graph where returns are only possible in an even number of steps has period 2.
Note: A simple random walk on the integers is actually periodic with period 2 (you can only return to the origin in an even number of steps), not aperiodic.
Aperiodic vs periodic states
- An aperiodic state has $\gcd\{n \ge 1 : p_{ii}^{(n)} > 0\} = 1$, so returns can happen at various step counts without a rigid cycle.
- A periodic state with period $d > 1$ can only return to itself at times that are multiples of $d$.
- In an irreducible chain, all states share the same period. So you can speak of the period of the chain itself.
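The gcd definition translates directly into code: scan the diagonal of $P^n$ for successive $n$ and take the gcd of the return times found. A sketch, with a hypothetical cutoff `max_steps` (enough for small examples, not a general-purpose bound):

```python
from math import gcd

def period(P, i, max_steps=50):
    """Period of state i: gcd of all n <= max_steps with p_ii^(n) > 0."""
    size = len(P)
    def mat_mul(A, B):
        return [[sum(A[r][k] * B[k][c] for k in range(size)) for c in range(size)]
                for r in range(size)]
    d = 0
    Pn = P  # current power P^n, starting at n = 1
    for n in range(1, max_steps + 1):
        if Pn[i][i] > 0:
            d = gcd(d, n)  # gcd(0, n) == n, so the first return time seeds d
        Pn = mat_mul(Pn, P)
    return d

# Deterministic two-state flip: returns only at even times, so period 2.
flip = [[0.0, 1.0],
        [1.0, 0.0]]
# Adding a self-loop breaks the cycle: period 1 (aperiodic).
lazy = [[0.5, 0.5],
        [1.0, 0.0]]
```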
Cyclic behavior
Periodic chains cycle through groups of states in a predictable pattern. The n-step return probabilities $p_{ii}^{(n)}$ are zero whenever $n$ is not a multiple of the period.
Aperiodicity (together with irreducibility and positive recurrence) is one of the three conditions guaranteeing convergence to a unique stationary distribution.
Recurrence and transience
These properties describe whether the chain keeps revisiting a state or eventually leaves it forever.

Recurrent vs transient states
- State $i$ is recurrent if, starting from $i$, the probability of eventually returning to $i$ is exactly 1.
- State $i$ is transient if there's a positive probability of never returning to $i$.
Equivalently, a recurrent state is visited infinitely often (with probability 1), while a transient state is visited only finitely many times.
In an irreducible chain, all states are the same type: either all recurrent or all transient.
Positive and null recurrence
Recurrent states split further:
- Positive recurrent: The expected return time is finite.
- Null recurrent: The state is recurrent, but the expected return time is infinite.
Null recurrence only occurs in chains with countably infinite state spaces. Every recurrent state in a finite irreducible chain is positive recurrent.
Positive recurrence is required for a stationary distribution to exist.
Absorption probabilities
An absorbing state $i$ is one that, once entered, is never left ($p_{ii} = 1$). The absorption probability from state $i$ into a set $A$ of absorbing states is the probability of eventually reaching $A$ starting from $i$.
To compute absorption probabilities:
- Identify the absorbing states and the transient states.
- Set up equations using first-step analysis: for each transient state $i$, write $h_i = \sum_j p_{ij} h_j$, where $h_j = 1$ for $j \in A$ (and $h_j = 0$ for absorbing states outside $A$).
- Solve the resulting system of linear equations.
These calculations appear in gambler's ruin problems and reliability models.
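A sketch of first-step analysis for a gambler's ruin chain, solving the equations by fixed-point iteration rather than a linear-algebra library (the fortune range 0..4 and fair coin are illustrative assumptions). For a fair game the known closed form is $h_i = i/N$, which the code can be checked against:

```python
def absorption_probs(P, target, n_iter=2000):
    """First-step analysis by fixed-point iteration:
    h_i = sum_j p_ij h_j for transient i; h is pinned at absorbing states
    (1 inside `target`, 0 at other absorbing states)."""
    n = len(P)
    absorbing = {i for i in range(n) if P[i][i] == 1.0}
    h = [1.0 if i in target else 0.0 for i in range(n)]
    for _ in range(n_iter):
        h = [h[i] if i in absorbing
             else sum(P[i][j] * h[j] for j in range(n))
             for i in range(n)]
    return h

# Gambler's ruin with fortunes 0..4 and a fair coin; 0 and 4 are absorbing.
N = 4
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = P[N][N] = 1.0
for i in range(1, N):
    P[i][i - 1] = P[i][i + 1] = 0.5

h = absorption_probs(P, target={N})  # probability of reaching fortune N
```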
Expected return times
The mean recurrence time of a positive recurrent state $i$ is $m_i = E[T_i \mid X_0 = i]$, where $T_i = \min\{n \ge 1 : X_n = i\}$ is the first return time to $i$.
A key connection to stationary distributions: for an irreducible, positive recurrent chain with stationary distribution $\pi$,

$$\pi_i = \frac{1}{m_i}$$
States visited more frequently have larger stationary probabilities and shorter mean return times.
Stationary distributions
A stationary distribution is a probability distribution that, once achieved, persists forever. It describes the chain's long-run behavior.
Definition and existence
A probability vector $\pi$ is a stationary distribution if:

$$\pi P = \pi, \qquad \sum_i \pi_i = 1, \qquad \pi_i \ge 0$$

The equation $\pi P = \pi$ says that if the chain's state distribution is $\pi$ at time $n$, it's still $\pi$ at time $n+1$.
Existence depends on the chain's structure. An irreducible, positive recurrent chain always has a stationary distribution.
Unique vs multiple stationary distributions
- An irreducible, positive recurrent chain has exactly one stationary distribution.
- A reducible chain can have multiple stationary distributions, because each closed communicating class can have its own, and you can take convex combinations.
Uniqueness matters because it means the long-run behavior doesn't depend on where the chain starts.
Calculating stationary distributions
To find for a finite irreducible chain:
- Write out the system $\pi P = \pi$, which gives one equation per state.
- Notice that these equations are linearly dependent (they have a one-dimensional solution space).
- Replace one equation with the normalization condition $\sum_i \pi_i = 1$.
- Solve the resulting system.
For larger chains, numerical methods like the power method (repeatedly multiplying an arbitrary distribution by $P$) or eigenvalue decomposition are more practical.
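The power method is short enough to sketch in full; the two-state matrix is illustrative, chosen so the exact answer $\pi = (2/3, 1/3)$ can be verified by hand from $\pi P = \pi$:

```python
def stationary_power_method(P, n_iter=10000, tol=1e-12):
    """Repeatedly multiply a distribution by P until it stops changing.
    Converges for irreducible, aperiodic finite chains."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Illustrative two-state chain; solving pi P = pi by hand gives (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary_power_method(P)
```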
Limiting behavior and convergence
For an irreducible, aperiodic, positive recurrent Markov chain, the convergence theorem guarantees:

$$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$$

regardless of the starting state $i$. The rows of $P^n$ all converge to the same stationary distribution $\pi$.
If the chain is periodic, this pointwise limit doesn't exist (the probabilities oscillate), but the time-averaged probabilities still converge to $\pi$.
The mixing time measures how many steps the chain needs to get close to stationarity, and the spectral gap (the difference between the two largest eigenvalue magnitudes of $P$) controls the rate of convergence.
Applications and examples
Random walks on graphs
A random walk on a graph places an agent on a node, and at each step the agent moves to a neighbor chosen according to some probability rule (often uniformly at random). The state space is the node set, and the transition probabilities come from the graph structure. Applications include network analysis, community detection, and recommender systems.
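Building the transition matrix from a graph is mechanical: each row spreads probability uniformly over a node's neighbors. A sketch, with a hypothetical four-node graph (a triangle plus a pendant node):

```python
def random_walk_matrix(adj):
    """Transition matrix for a uniform random walk on a graph,
    given as an adjacency list {node: [neighbors]}."""
    nodes = sorted(adj)
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for v, neighbors in adj.items():
        for w in neighbors:
            P[index[v]][index[w]] = 1.0 / len(neighbors)
    return nodes, P

# Illustrative graph: triangle a-b-c with pendant node d attached to c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
nodes, P = random_walk_matrix(adj)
# Each row is a probability distribution over the node's neighbors.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12
```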
Markov chains in biology
Biological applications include modeling DNA sequence evolution (each state is a nucleotide: A, C, G, T), protein folding (states are conformations), and population dynamics. The transition probabilities encode mutation rates, folding kinetics, or birth-death rates, and the stationary distribution often represents an equilibrium the system tends toward.
Queueing systems
In a queueing model, the state is typically the number of customers in the system. Arrivals and departures drive the transitions. For example, in an M/M/1 queue, the arrival rate $\lambda$ and service rate $\mu$ determine the transition probabilities, and the stationary distribution (which exists when $\lambda < \mu$) gives the long-run probability of each queue length.
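For the M/M/1 queue the stationary distribution has a known closed form: with utilization $\rho = \lambda/\mu < 1$, the probability of $k$ customers is $\pi_k = (1 - \rho)\rho^k$, a geometric distribution. A sketch with illustrative rates:

```python
# M/M/1 stationary distribution: pi_k = (1 - rho) * rho**k for rho = lam/mu < 1.
lam, mu = 2.0, 5.0  # illustrative arrival and service rates
rho = lam / mu
assert rho < 1, "the queue is unstable (no stationary distribution) otherwise"

def mm1_pi(k):
    """Long-run probability of k customers in the system."""
    return (1 - rho) * rho ** k

# Truncated sums: the tail beyond k = 200 is negligible for rho = 0.4.
total = sum(mm1_pi(k) for k in range(200))        # should be ~1
mean_len = sum(k * mm1_pi(k) for k in range(200)) # should be rho / (1 - rho)
```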
PageRank algorithm
Google's PageRank models a "random surfer" clicking links on the web. The state space is the set of all web pages, and the transition probability from page $i$ to page $j$ is $1/\text{outdeg}(i)$ if $i$ links to $j$ (the surfer picks an outgoing link uniformly at random), and 0 otherwise. A damping factor $d$ (typically around 0.85) adds a small probability of jumping to any page at random, ensuring the chain is irreducible and aperiodic. The stationary distribution then ranks pages by importance.
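A minimal power-iteration sketch of this idea, on a hypothetical three-page web; the dangling-page handling (pages with no outlinks spread probability everywhere) is one common convention, stated here as an assumption:

```python
def pagerank(links, d=0.85, n_iter=100):
    """Basic PageRank by power iteration on the damped random-surfer chain.
    `links` maps each page to the pages it links to; dangling pages
    (no outlinks) are treated as linking to every page."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        new = {p: (1 - d) / n for p in pages}  # random-jump share
        for p in pages:
            out = links[p] if links[p] else pages  # dangling-page fix
            share = d * rank[p] / len(out)
            for q in out:
                new[q] += share  # link-following share
        rank = new
    return rank

# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
rank = pagerank(links)
# The ranks form a probability distribution (the stationary distribution).
assert abs(sum(rank.values()) - 1.0) < 1e-9
```

Page c receives links from both a and b, so it ends up with the highest rank in this toy graph.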