5.2 State space and transition probabilities

Written by the Fiveable Content Team • Last updated August 2025

State space and transition probabilities

State spaces and transition probabilities are the two building blocks of any Markov chain. The state space tells you where the system can be, and the transition probabilities tell you how likely it is to move from one place to another. Together, they fully specify the chain's dynamics and set the stage for everything else: long-term behavior, stationary distributions, and convergence analysis.

State space

The state space of a stochastic process is the set of all possible values the process can take. You can think of it as the "menu" of outcomes at any given time step. How you classify the state space matters because it determines which mathematical tools you can use.

Discrete vs continuous

A discrete state space consists of a countable set of distinct values. A continuous state space is uncountable and spans a range of real numbers.

  • Discrete: Number of customers in a queue \{0, 1, 2, \ldots\}
  • Continuous: Temperature readings from a sensor, taking any value in [0, 100]

In this unit on Markov chains, you'll almost always be working with discrete state spaces. Continuous state spaces show up in continuous-state models like diffusion processes.

Countable vs uncountable

  • A countable state space can be put into one-to-one correspondence with the natural numbers. It can be finite or countably infinite.
  • An uncountable state space cannot. The classic example is any interval of real numbers.

Examples:

  • Countable: Number of emails received per day \{0, 1, 2, \ldots\}
  • Uncountable: Time between arrivals in a continuous-time process (any positive real number)

Finite vs infinite

  • A finite state space has a fixed, limited number of states. Example: outcomes of a single die roll, \{1, 2, 3, 4, 5, 6\}.
  • An infinite state space has unboundedly many states. Example: total number of customers ever served, \{0, 1, 2, \ldots\}.

Finite state spaces are especially nice because the transition probability matrix has a fixed size and you can do exact linear algebra on it.

Transition probabilities

Transition probabilities quantify how likely the process is to jump from one state to another. They encode all the randomness in a Markov chain.

Definition and notation

The transition probability from state i to state j is written P_{ij} and defined as:

P_{ij} = P(X_{n+1} = j \mid X_n = i)

where X_n is the state at time n. Two properties must always hold:

  • Non-negativity: 0 \leq P_{ij} \leq 1 for all i, j
  • Row sums equal 1: \sum_{j} P_{ij} = 1 for every state i

The second condition just says that from any state, the process must go somewhere (including possibly staying put).

One-step transition probabilities

One-step transition probabilities describe movement in a single time step. They're the most basic unit of the chain's behavior.

Example: In a two-state weather model (sunny, rainy), you might have P_{\text{sunny, rainy}} = 0.2, meaning there's a 20% chance tomorrow is rainy given today is sunny.

n-step transition probabilities

The n-step transition probability P_{ij}^{(n)} is the probability of going from state i to state j in exactly n time steps:

P_{ij}^{(n)} = P(X_{n+m} = j \mid X_m = i)

You can compute these by raising the one-step transition matrix to the n-th power, or by using the Chapman-Kolmogorov equations (covered below).

Transition probability matrix

The transition probability matrix P collects all one-step probabilities into a single square matrix. The entry in row i, column j is P_{ij}.

For a three-state chain, P looks like:

P = \begin{pmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{pmatrix}

This matrix is a stochastic matrix: every entry is non-negative and every row sums to 1. The n-step transition matrix is simply P^n.

Properties of transition probabilities

  • Each row of P is a valid probability distribution.
  • P^n gives the n-step transition probabilities directly.
  • As n \to \infty, the rows of P^n may converge to the stationary distribution (under the right conditions).
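These properties are easy to check numerically. A minimal Python sketch (numpy assumed) using the sunny/rainy weather example from above — the 0.3 rainy-to-sunny probability is an assumed value filled in for illustration:

```python
import numpy as np

# Two-state weather chain: state 0 = sunny, state 1 = rainy.
# P[0, 1] = 0.2 matches the example above; the rainy row
# (0.3 back to sunny) is an assumed value for illustration.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

# Sanity checks: entries lie in [0, 1] and rows sum to 1 (stochastic matrix).
assert np.all((P >= 0) & (P <= 1))
assert np.allclose(P.sum(axis=1), 1.0)

# n-step transition probabilities are entries of P^n.
P3 = np.linalg.matrix_power(P, 3)
print(round(float(P3[0, 1]), 6))   # P(rainy in 3 days | sunny today) → 0.35
```

Note that P^3 is itself a stochastic matrix: matrix powers preserve the row-sum property.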

Chapman-Kolmogorov equations

The Chapman-Kolmogorov equations connect n-step transition probabilities to shorter-step probabilities. They're the key recursive tool for computing multi-step probabilities without brute-force enumeration.

Derivation

The idea comes from the law of total probability combined with the Markov property. To go from state i to state j in m + n steps, the process must pass through some intermediate state k after m steps. Summing over all possible intermediates:

P_{ij}^{(m+n)} = \sum_{k} P_{ik}^{(m)} \, P_{kj}^{(n)}

This holds for any non-negative integers m and n, and any states i and j.

Matrix form

In matrix notation, the Chapman-Kolmogorov equations become:

P^{(m+n)} = P^{(m)} \cdot P^{(n)}

This is just matrix multiplication. It confirms that computing P^n by repeated matrix multiplication is mathematically justified.

Solving for n-step probabilities

A practical recursive approach: decompose n steps into one step followed by n - 1 steps (i.e., take m = 1 in the Chapman-Kolmogorov equations) to get

P_{ij}^{(n)} = \sum_{k} P_{ik} \, P_{kj}^{(n-1)}

Starting from the known one-step matrix P, you can build up P^{(2)}, P^{(3)}, \ldots iteratively. For large n, matrix diagonalization or eigenvalue methods are more efficient than repeated multiplication.
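The recursion and the matrix form can be verified side by side in a short Python sketch (numpy assumed; the 3-state matrix is an arbitrary example chosen for illustration):

```python
import numpy as np

# An arbitrary 3-state stochastic matrix for illustration.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Build P^(5) iteratively via the recursion P^(n) = P @ P^(n-1).
Pn = np.eye(3)
for _ in range(5):
    Pn = P @ Pn            # after the loop, Pn = P^(5)

# Chapman-Kolmogorov in matrix form: P^(2+3) = P^(2) @ P^(3).
P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
print(np.allclose(Pn, P2 @ P3))   # → True
```

Any split m + n = 5 would give the same product, which is exactly what the matrix form of Chapman-Kolmogorov asserts.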

Classification of states

Classifying states reveals the structural skeleton of a Markov chain. States that "talk to each other" get grouped together, and the relationships between groups determine the chain's long-term behavior.

Communicating states

Two states i and j communicate (written i \leftrightarrow j) if each is reachable from the other with positive probability in some finite number of steps. Formally, there exist integers m, n \geq 1 such that:

P_{ij}^{(m)} > 0 \quad \text{and} \quad P_{ji}^{(n)} > 0

Communication is an equivalence relation: it's reflexive (every state communicates with itself), symmetric, and transitive.

Accessibility and communication

State j is accessible from state i if there exists some n \geq 0 with P_{ij}^{(n)} > 0. Accessibility is one-directional. Communication requires accessibility in both directions.

Equivalence classes

Because communication is an equivalence relation, it partitions the state space into disjoint communicating classes. Within each class, every state can reach every other state. Between classes, transitions can only go one way (or not at all).

Reducibility and irreducibility

  • A Markov chain is irreducible if the entire state space forms a single communicating class. Every state can reach every other state.
  • A Markov chain is reducible if there are two or more communicating classes.

Irreducibility is a crucial assumption for many convergence theorems. If a chain is reducible, you often analyze each closed communicating class separately.
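Communicating classes can be found mechanically from the pattern of positive entries in P. A minimal Python sketch (numpy assumed; the reducible 4-state chain is made up for illustration, with {0, 1} leaking into the closed class {2, 3}):

```python
import numpy as np

# A reducible 4-state chain: states 0 and 1 communicate, but state 1
# can leak into the closed class {2, 3}, which never returns.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.4, 0.4, 0.2, 0.0],
              [0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.6, 0.4]])

n = P.shape[0]
# reach[i, j] is True when j is accessible from i in some number of steps.
reach = np.eye(n, dtype=bool) | (P > 0)
for _ in range(n):                          # transitive closure
    reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)

# i and j communicate when each is accessible from the other;
# the communicating classes are the distinct rows of the symmetric part.
communicate = reach & reach.T
classes = {frozenset(np.flatnonzero(row)) for row in communicate}
print(sorted(sorted(int(s) for s in c) for c in classes))   # → [[0, 1], [2, 3]]
```

Since there are two classes, this chain is reducible; an irreducible chain would produce a single class containing every state.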

Periodicity of states

Periodicity captures whether a state exhibits cyclic return patterns. It affects whether the chain's distribution settles down to a steady state or keeps oscillating.

Definition and examples

The period of state i is:

d(i) = \gcd\{n \geq 1 : P_{ii}^{(n)} > 0\}

If d(i) = 1, the state is aperiodic. If d(i) > 1, the state is periodic with period d(i).

  • Aperiodic example: A state with a self-loop (P_{ii} > 0) always has period 1, since the process can return in 1 step.
  • Periodic example: A random walk on a bipartite graph, where returns are only possible in an even number of steps, has period 2.

Note: A simple random walk on the integers is actually periodic with period 2 (you can only return to the origin in an even number of steps), not aperiodic.

Aperiodic vs periodic states

  • An aperiodic state has \gcd = 1 for its return times, so returns can happen at various step counts without a rigid cycle.
  • A periodic state with period d can only return to itself at times that are multiples of d.
  • In an irreducible chain, all states share the same period, so you can speak of the period of the chain itself.

Cyclic behavior

Periodic chains cycle through groups of states in a predictable pattern. The n-step transition probabilities P_{ii}^{(n)} are zero whenever n is not a multiple of the period.

Aperiodicity (together with irreducibility and positive recurrence) is one of the three conditions guaranteeing convergence to a unique stationary distribution.
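The gcd definition of the period translates directly into code. A minimal Python sketch (numpy assumed; it only inspects return times up to max_steps, which is enough for small examples):

```python
import math
import numpy as np

def period(P, i, max_steps=50):
    """gcd of return times n <= max_steps with P_ii^(n) > 0 (0 if none seen)."""
    d = 0
    Pn = np.eye(P.shape[0])
    for n in range(1, max_steps + 1):
        Pn = Pn @ P
        if Pn[i, i] > 1e-12:        # P_ii^(n) > 0: a return at time n is possible
            d = math.gcd(d, n)
    return d

# Deterministic 2-cycle 0 -> 1 -> 0: returns happen only at even times.
cycle = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
print(period(cycle, 0))   # → 2

# Adding a self-loop at state 0 makes it aperiodic.
lazy = np.array([[0.5, 0.5],
                 [1.0, 0.0]])
print(period(lazy, 0))    # → 1
```

The second chain illustrates the self-loop rule from the aperiodic example above: once a 1-step return is possible, the gcd collapses to 1.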

Recurrence and transience

These properties describe whether the chain keeps revisiting a state or eventually leaves it forever.

Recurrent vs transient states

  • State i is recurrent if, starting from i, the probability of eventually returning to i is exactly 1.
  • State i is transient if there is a positive probability of never returning to i.

Equivalently, a recurrent state is visited infinitely often (with probability 1), while a transient state is visited only finitely many times.

In an irreducible chain, all states are the same type: either all recurrent or all transient.

Positive and null recurrence

Recurrent states split further:

  • Positive recurrent: The expected return time E[T_i \mid X_0 = i] is finite.
  • Null recurrent: The state is recurrent, but the expected return time is infinite.

Null recurrence only occurs in chains with countably infinite state spaces. Every recurrent state in a finite irreducible chain is positive recurrent.

Positive recurrence is required for a stationary distribution to exist.

Absorption probabilities

An absorbing state is one that, once entered, is never left (P_{ii} = 1). The absorption probability from state i into a set of absorbing states A is the probability of eventually reaching A starting from i.

To compute absorption probabilities:

  1. Identify the absorbing states and the transient states.
  2. Set up equations using first-step analysis: for each transient state i, write h_i = \sum_j P_{ij} h_j, with boundary conditions h_j = 1 for j \in A and h_j = 0 for absorbing states outside A.
  3. Solve the resulting system of linear equations.

These calculations appear in gambler's ruin problems and reliability models.
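The three steps above can be carried out in matrix form: collect the transient-to-transient block Q and transient-to-absorbing block R, and solve (I - Q)B = R. A minimal Python sketch for a small gambler's ruin chain (numpy assumed; the fortune range and p = 0.5 are example choices):

```python
import numpy as np

# Gambler's ruin on {0, ..., 4}: win a round with probability p, lose
# with probability 1 - p. States 0 and 4 are absorbing; 1, 2, 3 are transient.
p = 0.5
n = 5
P = np.zeros((n, n))
P[0, 0] = P[n - 1, n - 1] = 1.0
for i in range(1, n - 1):
    P[i, i + 1] = p
    P[i, i - 1] = 1 - p

transient = [1, 2, 3]
absorbing = [0, 4]
Q = P[np.ix_(transient, transient)]    # transient -> transient block
R = P[np.ix_(transient, absorbing)]    # transient -> absorbing block

# First-step analysis in matrix form: B = (I - Q)^{-1} R, where
# B[i, a] is the probability of absorption in state a starting from i.
B = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(B[transient.index(2)])   # from state 2: probabilities of ruin (0) vs success (4)
```

For the fair game starting at 2, symmetry forces both absorption probabilities to be 0.5, which the solve confirms.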

Expected return times

The mean recurrence time of a positive recurrent state i is \mu_i = E[T_i \mid X_0 = i], where T_i is the first return time to i.

A key connection to stationary distributions: for an irreducible, positive recurrent chain with stationary distribution \pi,

\pi_i = \frac{1}{\mu_i}

States visited more frequently have larger stationary probabilities and shorter mean return times.

Stationary distributions

A stationary distribution is a probability distribution that, once achieved, persists forever. It describes the chain's long-run behavior.

Definition and existence

A probability vector \pi = (\pi_1, \pi_2, \ldots) is a stationary distribution if:

\pi P = \pi \quad \text{and} \quad \sum_i \pi_i = 1, \quad \pi_i \geq 0

The equation \pi P = \pi says that if the chain's state distribution is \pi at time n, it is still \pi at time n + 1.

Existence depends on the chain's structure. An irreducible, positive recurrent chain always has a stationary distribution.

Unique vs multiple stationary distributions

  • An irreducible, positive recurrent chain has exactly one stationary distribution.
  • A reducible chain can have multiple stationary distributions, because each closed communicating class can have its own, and you can take convex combinations.

Uniqueness matters because it means the long-run behavior doesn't depend on where the chain starts.

Calculating stationary distributions

To find \pi for a finite irreducible chain:

  1. Write out the system \pi P = \pi, which gives one equation per state.
  2. Notice that these equations are linearly dependent (the solution space of \pi P = \pi is one-dimensional).
  3. Replace one equation with the normalization condition \sum_i \pi_i = 1.
  4. Solve the resulting system.

For larger chains, numerical methods like the power method (repeatedly multiplying an arbitrary distribution by P) or eigenvalue decomposition are more practical.
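The four-step recipe above, sketched in Python for the two-state weather chain (numpy assumed; the 0.3 rainy-to-sunny probability is an assumed value, as before):

```python
import numpy as np

# Two-state weather chain (rainy -> sunny probability 0.3 is assumed).
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
n = P.shape[0]

# pi P = pi  is equivalent to  (P^T - I) pi^T = 0.  Following the recipe,
# replace the last (redundant) equation with the normalization sum_i pi_i = 1.
A = P.T - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)

assert np.allclose(pi @ P, pi)   # stationarity: the distribution persists
assert abs(pi.sum() - 1.0) < 1e-12
print(np.round(pi, 6))           # long-run fractions of sunny and rainy days
```

For this chain the balance equation 0.2 \pi_sunny = 0.3 \pi_rainy gives \pi = (0.6, 0.4), which the solve reproduces.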

Limiting behavior and convergence

For an irreducible, aperiodic, positive recurrent Markov chain, the convergence theorem guarantees:

\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j

regardless of the starting state i. The rows of P^n all converge to the same stationary distribution \pi.

If the chain is periodic, this pointwise limit doesn't exist (the probabilities oscillate), but the time-averaged probabilities still converge to \pi.

The mixing time measures how many steps the chain needs to get close to stationarity, and the spectral gap of P (the gap between the largest eigenvalue magnitude, which is always 1 for a stochastic matrix, and the second largest) controls the rate of convergence.
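The convergence theorem and the role of the second eigenvalue can both be observed numerically. A minimal Python sketch (numpy assumed; the weather chain reused here is irreducible and aperiodic, with an assumed 0.3 rainy-to-sunny probability):

```python
import numpy as np

# Irreducible, aperiodic two-state chain (0.3 entry assumed for illustration).
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

# After many steps, every row of P^n is close to the same distribution:
# the starting state has been forgotten.
Pn = np.linalg.matrix_power(P, 50)
assert np.allclose(Pn[0], Pn[1])
assert np.allclose(Pn[0], [0.6, 0.4])   # the stationary distribution

# The second-largest eigenvalue magnitude sets the geometric convergence rate.
eigs = sorted(abs(np.linalg.eigvals(P)), reverse=True)
print(round(float(eigs[1]), 3))   # → 0.5, so the error shrinks by ~0.5 per step
```

The spectral gap here is 1 - 0.5 = 0.5, which is why 50 steps are far more than enough for the rows to agree to machine precision.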

Applications and examples

Random walks on graphs

A random walk on a graph places an agent on a node, and at each step the agent moves to a neighbor chosen according to some probability rule (often uniformly at random). The state space is the node set, and the transition probabilities come from the graph structure. Applications include network analysis, community detection, and recommender systems.

Markov chains in biology

Biological applications include modeling DNA sequence evolution (each state is a nucleotide: A, C, G, T), protein folding (states are conformations), and population dynamics. The transition probabilities encode mutation rates, folding kinetics, or birth-death rates, and the stationary distribution often represents an equilibrium the system tends toward.

Queueing systems

In a queueing model, the state is typically the number of customers in the system. Arrivals and departures drive the transitions. For example, in an M/M/1 queue, the arrival rate \lambda and service rate \mu determine the transition probabilities, and the stationary distribution (which exists when \lambda < \mu) gives the long-run probability of each queue length.
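For the M/M/1 queue, the stationary distribution has the standard closed form \pi_n = (1 - \rho)\rho^n with \rho = \lambda/\mu. A minimal Python sketch (the rates \lambda = 2 and \mu = 3 are example values chosen for illustration):

```python
# M/M/1 queue: with utilization rho = lambda/mu < 1, the stationary
# probability of seeing n customers is pi_n = (1 - rho) * rho**n.
lam, mu = 2.0, 3.0            # example arrival and service rates
rho = lam / mu                # utilization, must be < 1 for stability

def pi(n):
    return (1 - rho) * rho**n

# Probabilities sum to 1 (geometric series), and the mean number in
# the system is rho / (1 - rho).
total = sum(pi(n) for n in range(200))
mean_len = sum(n * pi(n) for n in range(200))
print(round(total, 6), round(mean_len, 6))   # → 1.0 2.0
```

With \rho = 2/3 the queue holds \rho/(1 - \rho) = 2 customers on average, and the truncated sums confirm both identities to high precision.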

PageRank algorithm

Google's PageRank models a "random surfer" clicking links on the web. The state space is the set of all web pages, and the transition probability from page i is spread uniformly over i's outgoing links (each linked page gets probability 1 divided by i's number of outgoing links). A damping factor (typically around 0.85) adds a small probability of jumping to any page at random, ensuring the chain is irreducible and aperiodic. The stationary distribution \pi then ranks pages by importance.
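A toy version of this construction, using the power method from the stationary-distribution section. A minimal Python sketch (numpy assumed; the 4-page link structure is made up for illustration, and every page is assumed to have at least one outgoing link so no dangling-node handling is needed):

```python
import numpy as np

# Tiny 4-page web: links[i][j] = 1 if page i links to page j (made up).
links = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]])

d = 0.85                                     # damping factor
n = links.shape[0]
out = links.sum(axis=1, keepdims=True)       # out-degree of each page
# Random surfer: follow a uniform outgoing link with probability d,
# otherwise jump to a uniformly random page.
P = d * links / out + (1 - d) / n

# Power method: repeatedly multiply an arbitrary distribution by P.
pi = np.full(n, 1 / n)
for _ in range(100):
    pi = pi @ P

print(int(pi.argmax()))   # → 2, the page that every other page links to
```

The damping term makes every entry of P strictly positive, so the chain is irreducible and aperiodic and the iteration converges to a unique ranking.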