📊Actuarial Mathematics Unit 2 Review

2.1 Markov chains and transition probabilities

Written by the Fiveable Content Team • Last updated August 2025

Markov chains are a core modeling tool in actuarial mathematics for systems that evolve over time through discrete states. By capturing how a system moves between states using transition probabilities, they let actuaries forecast outcomes for insurance portfolios, claims processes, and financial risk models.

This topic covers the formal structure of Markov chains, how to compute and interpret transition probabilities, state classification, long-term behavior, first passage and absorption analysis, estimation methods, and Markov decision processes.

Definition of Markov chains

A Markov chain is a stochastic process that transitions between a set of states over time, where the probability of each transition depends only on the current state. This "memoryless" behavior is what makes Markov chains both tractable and widely applicable.

In actuarial work, Markov chains model phenomena like policyholder behavior, claim frequency patterns, and credit rating migrations.

States in Markov chains

A state represents a possible condition the system can occupy at a given time. The collection of all possible states is the state space, which can be finite or countably infinite.

Common actuarial examples:

  • Policyholder status: active, lapsed, or surrendered
  • Claim status: no claim, minor claim, or major claim
  • Credit ratings: AAA, AA, A, BBB, and so on down to default

Discrete vs continuous time

  • Discrete-time Markov chains (DTMCs) have transitions occurring at fixed intervals (e.g., monthly or annually). These are the primary focus of this topic.
  • Continuous-time Markov chains (CTMCs) allow transitions at any moment. The time spent in each state before transitioning follows an exponential distribution, which itself has the memoryless property.

Memoryless property

The Markov property (memoryless property) is the defining feature of a Markov chain. It says that the future state depends only on the present state, not on how the system arrived there.

Formally, for states i, j, k, …, l and time steps n and m:

P(X_{n+m} = j \mid X_n = i, X_{n-1} = k, \ldots, X_0 = l) = P(X_{n+m} = j \mid X_n = i)

This is what makes the math manageable: you only need to know where the system is now to compute where it goes next.

Transition probabilities

Transition probabilities quantify how likely the system is to move from one state to another. They're the building blocks for every calculation you'll do with Markov chains.

One-step transition probabilities

The one-step transition probability p_{ij} is the probability of moving from state i to state j in a single time step. For a DTMC with N states, these probabilities fill an N \times N transition probability matrix \mathbf{P}.

Each entry satisfies p_{ij} \geq 0, and each row sums to 1 (since the system must go somewhere).

Multi-step transition probabilities

The multi-step transition probability p_{ij}^{(n)} is the probability of going from state i to state j in exactly n steps. You compute it by raising the transition matrix to the n-th power:

\mathbf{P}^{(n)} = \mathbf{P}^n

The Chapman-Kolmogorov equations give a recursive way to break this computation apart. For any intermediate point m:

p_{ij}^{(n+m)} = \sum_{k=1}^{N} p_{ik}^{(n)} \, p_{kj}^{(m)}

This says: to get from i to j in n+m steps, sum over all possible intermediate states k that the chain could pass through at step n.
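As a quick numerical sketch (the two-state matrix below is a made-up example, not from the text), the n-step matrix is just a matrix power, and the Chapman-Kolmogorov identity can be verified directly:

```python
import numpy as np

# Hypothetical 2-state chain: state 0 = active, state 1 = lapsed
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Multi-step transition probabilities: P^(3) = P^3
P3 = np.linalg.matrix_power(P, 3)

# Chapman-Kolmogorov check: P^(2+1) = P^(2) P^(1)
assert np.allclose(P3, np.linalg.matrix_power(P, 2) @ P)

print(P3[0, 1])  # P(active -> lapsed in exactly 3 steps) = 0.219
```

Note that each power of P is itself a stochastic matrix, so its rows still sum to 1.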

Transition probability matrix

The transition probability matrix \mathbf{P} is a square matrix where row i, column j contains p_{ij}. Two properties always hold:

  • Every entry is non-negative: p_{ij} \geq 0
  • Every row sums to 1: \sum_{j=1}^{N} p_{ij} = 1

A matrix satisfying these conditions is called a stochastic matrix. When you see a transition matrix on an exam, check these two properties first to verify it's valid.
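That validity check is one line of linear algebra; a minimal sketch (the matrix and the `is_stochastic` helper are illustrative, not from the text):

```python
import numpy as np

def is_stochastic(P, tol=1e-9):
    """Check the two defining properties of a transition matrix."""
    P = np.asarray(P, dtype=float)
    nonneg = np.all(P >= -tol)                               # every entry non-negative
    rows_sum_to_one = np.allclose(P.sum(axis=1), 1.0, atol=tol)  # every row sums to 1
    return bool(nonneg and rows_sum_to_one)

# Hypothetical policyholder-status matrix (active, lapsed, surrendered)
P = [[0.85, 0.10, 0.05],
     [0.30, 0.60, 0.10],
     [0.00, 0.00, 1.00]]
print(is_stochastic(P))  # True
```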

Classification of states

How states behave in the long run depends on their structural relationships within the chain. Classifying states tells you whether the chain settles down, cycles, or gets trapped.

Communicating states

States i and j communicate (written i \leftrightarrow j) if you can reach j from i and reach i from j, each in a finite number of steps.

Communication is an equivalence relation, so it partitions the state space into communicating classes. If the entire chain forms a single communicating class (every state can reach every other state), the chain is called irreducible.

Absorbing states

An absorbing state is one you can never leave: p_{ii} = 1. Once the chain enters an absorbing state, it stays there forever.

A chain with at least one absorbing state and the property that every transient state can eventually reach some absorbing state is called an absorbing Markov chain. Actuarial examples include policy termination or death of a policyholder, where the state is permanent.

Transient vs recurrent states

  • A state is transient if there's a positive probability of never returning to it after leaving.
  • A state is recurrent if the chain is guaranteed (probability 1) to return to it eventually.

Recurrent states break down further:

  • Positive recurrent: The expected return time is finite. These are the states that matter for long-run analysis.
  • Null recurrent: The chain returns with probability 1, but the expected return time is infinite. This only occurs in chains with infinitely many states.

In a finite, irreducible chain, all states are positive recurrent.
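Irreducibility can be checked mechanically from the pattern of positive entries; a sketch using boolean reachability (both matrices are hypothetical examples):

```python
import numpy as np

def is_irreducible(P):
    """True if every state can reach every other state.

    Builds a reachability matrix from the pattern of positive
    one-step transition probabilities.
    """
    A = (np.asarray(P) > 0).astype(int)   # adjacency: 1 where p_ij > 0
    n = A.shape[0]
    R = np.eye(n, dtype=int)              # paths of length 0
    for _ in range(n - 1):
        R = ((R + R @ A) > 0).astype(int)  # extend paths by one step
    return bool(R.all())

# A chain with an absorbing state plus other states is not irreducible...
P_absorbing = [[0.7, 0.2, 0.1],
               [0.3, 0.6, 0.1],
               [0.0, 0.0, 1.0]]
# ...while this fully communicating chain is
P_irreducible = [[0.5, 0.5],
                 [0.4, 0.6]]
print(is_irreducible(P_absorbing), is_irreducible(P_irreducible))  # False True
```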

Long-term behavior

Understanding where a Markov chain "settles" over time is critical for actuarial forecasting. The key question: does the chain converge to a stable distribution?

Stationary distribution

A stationary distribution \boldsymbol{\pi} is a probability vector that satisfies:

\boldsymbol{\pi} = \boldsymbol{\pi} \mathbf{P}

If the chain starts in the distribution \boldsymbol{\pi}, it stays in that distribution after every transition. Think of it as an equilibrium: the inflows and outflows for each state are perfectly balanced.

Existence and uniqueness depend on the chain's properties. An irreducible, positive recurrent chain always has a unique stationary distribution.

Limiting distribution

The limiting distribution \boldsymbol{\pi}^* describes the long-run proportion of time spent in each state, regardless of where the chain started. For an irreducible and aperiodic chain, the limiting distribution exists, is unique, and equals the stationary distribution.

To find it, solve:

\boldsymbol{\pi}^* = \boldsymbol{\pi}^* \mathbf{P}

\sum_{i=1}^{N} \pi_i^* = 1

The first equation gives you N equations (one per state, though one is redundant), and the second ensures the result is a valid probability distribution.

Aperiodicity matters. If a chain is periodic (e.g., it cycles between states with a fixed period), the limiting distribution doesn't exist even though a stationary distribution does. The chain keeps cycling instead of settling down.
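A sketch of solving that system numerically: replace the redundant balance equation with the normalization constraint and solve (the two-state matrix is a hypothetical example):

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1.

    Replaces one redundant balance equation with the normalization
    constraint, then solves the resulting linear system.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = P.T - np.eye(n)       # balance equations (P^T - I) pi = 0
    A[-1, :] = 1.0            # replace last equation with sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Hypothetical two-state chain: active <-> lapsed
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary_distribution(P)
print(pi)  # [2/3, 1/3]: balance 0.1*pi_0 = 0.2*pi_1 plus normalization
```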

Convergence of Markov chains

For an irreducible, aperiodic chain, the state probabilities converge to the limiting distribution over time, no matter the starting state. The rate of convergence is governed by the second-largest eigenvalue (in absolute value) of \mathbf{P}. The closer this eigenvalue is to 0, the faster the chain converges.

This matters in practice: slow convergence means you need more time steps before the long-run distribution is a reliable approximation.
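You can inspect that eigenvalue directly; a sketch on the same kind of made-up two-state matrix:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Eigenvalue magnitudes, largest first; a stochastic matrix always has 1
mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
second = mags[1]   # second-largest magnitude governs convergence speed
print(second)      # 0.7 here: the distance to the limit shrinks roughly like 0.7^n
```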

First passage times

First passage times measure how long it takes the chain to reach a target state for the first time. In actuarial contexts, this could represent the time until a first claim, or the number of periods until a policyholder lapses.

Definition of first passage time

The first passage time from state i to state j, denoted T_{ij}, is:

T_{ij} = \min\{n \geq 1 : X_n = j \mid X_0 = i\}

This is a random variable. When i = j, it's called the first return time to state i.


Expected first passage times

The expected first passage time m_{ij} is the average number of steps to first reach state j from state i. You find it by solving:

m_{ij} = 1 + \sum_{k \neq j} p_{ik} \, m_{kj}

The logic: from state i, you take one step (the "1"), and if you don't land on j, you end up in some state k and still need m_{kj} more steps on average. This gives a system of linear equations you solve simultaneously for all starting states.
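For a fixed target state j, that system can be solved in one shot by deleting row and column j; a sketch (the chain is a hypothetical two-state example):

```python
import numpy as np

def expected_first_passage(P, j):
    """Expected first passage times m_{ij} into target state j, for all i != j.

    Deleting row/column j from P and solving (I - Q) m = 1 is equivalent
    to the recursion m_{ij} = 1 + sum_{k != j} p_{ik} m_{kj}.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    others = [i for i in range(n) if i != j]
    Q = P[np.ix_(others, others)]                    # transitions that avoid j
    m = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return dict(zip(others, m))

P = [[0.9, 0.1],
     [0.2, 0.8]]
print(expected_first_passage(P, 1))  # {0: 10.0}: geometric with p = 0.1, mean 1/0.1
```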

Variance of first passage times

The variance v_{ij} measures how spread out the first passage time is around its mean. It comes from the second moment s_{ij} = E[T_{ij}^2], which satisfies its own first-step recursion:

s_{ij} = 1 + 2 \sum_{k \neq j} p_{ik} \, m_{kj} + \sum_{k \neq j} p_{ik} \, s_{kj}

v_{ij} = s_{ij} - m_{ij}^2

You need the expected first passage times m_{kj} before you can solve for the second moments. Together, the mean and variance give a much more complete picture of the first passage time distribution.

Absorption probabilities

For absorbing Markov chains, a central question is: starting from a transient state, what's the probability of ending up in each absorbing state?

Calculation of absorption probabilities

The absorption probability a_{ij} is the probability that the chain, starting in transient state i, eventually gets absorbed into absorbing state j. Solve:

a_{ij} = p_{ij} + \sum_{k \in T} p_{ik} \, a_{kj}

where T is the set of transient states. This is another system of linear equations: one equation per transient state, for each absorbing state.

Fundamental matrix

The fundamental matrix \mathbf{N} encodes the expected number of times the chain visits each transient state before being absorbed:

\mathbf{N} = (\mathbf{I} - \mathbf{Q})^{-1}

Here, \mathbf{Q} is the submatrix of \mathbf{P} containing only transitions among transient states, and \mathbf{I} is the identity matrix.

The entry n_{ij} gives the expected number of visits to transient state j before absorption, starting from transient state i. This matrix is the workhorse for absorption calculations.

Time to absorption

The expected absorption time from transient state i is simply the sum of the i-th row of the fundamental matrix:

t_i = \sum_{j \in T} n_{ij}

This makes intuitive sense: the total expected time before absorption equals the sum of expected visits to every transient state. The variance of absorption time can also be derived from \mathbf{N}.
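The whole absorption toolkit fits in a few lines; a sketch on a hypothetical chain with two transient states (0, 1) and one absorbing state (2):

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

Q = P[:2, :2]                      # transient-to-transient block
R = P[:2, 2:]                      # transient-to-absorbing block
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix

t = N.sum(axis=1)                  # expected time to absorption, per starting state
B = N @ R                          # absorption probabilities

print(t)   # expected steps to absorption from states 0 and 1
print(B)   # each row sums to 1: absorption is certain in this chain
```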

Applications of Markov chains

Markov chains appear throughout actuarial practice. Three classical applications illustrate different aspects of the theory.

Queueing theory

Queueing models describe systems with arrivals and service (e.g., claims arriving at a processing center). The states represent the number of items in the queue, and transition probabilities depend on arrival and service rates.

By finding the steady-state distribution, actuaries can determine average queue lengths, waiting times, and the probability of exceeding capacity. This informs staffing and resource allocation decisions.

Gambler's ruin problem

A gambler starts with capital k and plays repeated games, winning or losing one unit each round with fixed probabilities p and q = 1 - p. The states are the gambler's current capital (0 through some upper bound N), with 0 and N as absorbing states.

This is a classic absorbing chain problem. The absorption probabilities give the probability of ruin (reaching 0) versus reaching the target N, and the expected absorption time gives the expected duration of play. The same framework applies to surplus models in ruin theory.
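A sketch that builds the gambler's-ruin chain numerically and checks it against the classical closed form (the parameter values are illustrative):

```python
import numpy as np

def ruin_probability(k, N, p):
    """Probability the gambler hits 0 before N, starting with capital k.

    Uses the absorbing-chain machinery: transient states are capital
    levels 1..N-1; the absorbing states 0 and N become columns of R.
    """
    q = 1.0 - p
    n = N - 1                              # number of transient states
    Q = np.zeros((n, n))
    R = np.zeros((n, 2))                   # columns: absorbed at 0, at N
    for i in range(1, N):                  # capital level i
        if i - 1 == 0:
            R[i - 1, 0] = q                # lose one unit -> ruin
        else:
            Q[i - 1, i - 2] = q
        if i + 1 == N:
            R[i - 1, 1] = p                # win one unit -> target reached
        else:
            Q[i - 1, i] = p
    B = np.linalg.solve(np.eye(n) - Q, R)  # absorption probabilities N R
    return B[k - 1, 0]

# Closed form ((q/p)^k - (q/p)^N) / (1 - (q/p)^N), valid for p != 1/2
k, N, p = 3, 10, 0.45
r = 0.55 / 0.45
closed = (r**k - r**N) / (1 - r**N)
print(ruin_probability(k, N, p), closed)  # the two agree
```

For the fair game p = 1/2 the closed form above degenerates, but the numerical version still works and gives the familiar answer 1 - k/N.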

Birth-death processes

Birth-death processes are CTMCs where transitions only occur between neighboring states: the population can increase by one (birth) or decrease by one (death) at any time.

In actuarial work, these model portfolio sizes (new policies vs. lapses), claim counts over time, or disease spread in health insurance populations. The balance between birth and death rates determines whether the system has a stable steady-state distribution or grows/shrinks without bound.

Estimation of transition probabilities

In practice, you rarely know the true transition probabilities. You estimate them from observed data, such as historical records of policyholder transitions or credit rating changes.

Maximum likelihood estimation

Maximum likelihood estimation (MLE) is the standard approach. Given observed transitions, the MLE for each transition probability is:

\hat{p}_{ij} = \frac{n_{ij}}{n_i}

where n_{ij} is the number of observed transitions from state i to state j, and n_i is the total number of transitions out of state i. This is simply the observed proportion, and it automatically satisfies the row-sum constraint.
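In code, the MLE is a single row-wise division; a sketch with a hypothetical count matrix:

```python
import numpy as np

# Hypothetical observed transition counts n_ij (rows: from-state)
counts = np.array([[80, 15,  5],
                   [10, 70, 20],
                   [ 0,  0, 50]])

# MLE: divide each row by its total number of transitions out of that state
P_hat = counts / counts.sum(axis=1, keepdims=True)

print(P_hat[0])           # [0.80, 0.15, 0.05]
print(P_hat.sum(axis=1))  # each row sums to 1 automatically
```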

Bayesian estimation

Bayesian estimation incorporates prior beliefs about the transition probabilities before seeing data. A common choice is a Dirichlet prior for each row of the transition matrix, since it's the conjugate prior for multinomial data.

The posterior distribution combines the prior with the observed transition counts via Bayes' theorem. This approach is especially useful when data is sparse, as the prior prevents extreme estimates, and it naturally quantifies parameter uncertainty through the full posterior distribution.
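Conjugacy makes the update purely arithmetic: add the observed counts to the prior parameters. A sketch for a single row (the prior and counts are made-up numbers):

```python
import numpy as np

# Hypothetical uniform Dirichlet prior for one row of the matrix
alpha_prior = np.array([1.0, 1.0, 1.0])

# Observed transition counts out of that state
counts = np.array([8.0, 1.0, 0.0])

# Conjugate update: posterior is Dirichlet(alpha_prior + counts)
alpha_post = alpha_prior + counts

# Posterior mean smooths the MLE away from extreme estimates
post_mean = alpha_post / alpha_post.sum()
print(post_mean)  # [0.75, 0.1667, 0.0833]: no entry collapses to exactly 0
```

Compare with the MLE [8/9, 1/9, 0]: the prior keeps the unobserved transition at a small positive probability instead of ruling it out.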

Confidence intervals for probabilities

Confidence intervals quantify uncertainty in the estimated transition probabilities. Two main approaches:

  • Wald intervals: Use the asymptotic normality of the MLE. The standard error for \hat{p}_{ij} is approximately \sqrt{\hat{p}_{ij}(1 - \hat{p}_{ij}) / n_i}, and a 95% interval is \hat{p}_{ij} \pm 1.96 \times SE.
  • Bootstrap methods: Resample from the observed data, re-estimate the transition matrix each time, and use the empirical distribution of estimates to construct intervals.

Bootstrap methods tend to perform better when sample sizes are small or when \hat{p}_{ij} is near 0 or 1, where the normal approximation breaks down.
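The Wald interval is simple enough to sketch directly (the counts are a made-up example):

```python
import numpy as np

def wald_interval(n_ij, n_i, z=1.96):
    """95% Wald interval for an estimated transition probability."""
    p_hat = n_ij / n_i
    se = np.sqrt(p_hat * (1 - p_hat) / n_i)   # asymptotic standard error
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_interval(15, 100)
print(lo, hi)  # roughly (0.080, 0.220) around p_hat = 0.15
```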

Markov decision processes

Markov decision processes (MDPs) extend Markov chains by adding a decision-maker who can choose actions that influence both the transitions and the rewards received. They're the foundation for optimal sequential decision-making under uncertainty.

States, actions, and rewards

An MDP is defined by four components:

  • States: The possible conditions of the system (same as in a Markov chain)
  • Actions: At each state, the decision-maker selects from a set of available actions
  • Transition probabilities: Now depend on both the current state and the chosen action: p_{ij}(a)
  • Rewards: Each state-action pair yields an immediate reward r(i, a), representing costs or benefits

The objective is to find a strategy (policy) that maximizes the expected total discounted reward over time.

Optimal policies

A policy maps each state to an action. The optimal policy maximizes the expected total reward, accounting for both immediate and future consequences.

The Bellman equation expresses this recursively. For the optimal value function V^*(i):

V^*(i) = \max_{a} \left[ r(i, a) + \gamma \sum_{j} p_{ij}(a) \, V^*(j) \right]

where \gamma is the discount factor (0 \leq \gamma < 1). The optimal action at each state is whichever action achieves this maximum.

Value iteration vs policy iteration

Two standard algorithms solve for the optimal policy:

Value iteration:

  1. Start with an initial guess for V(i) (e.g., all zeros)
  2. Update each state's value using the Bellman equation
  3. Repeat until the values converge (changes fall below a threshold)

Policy iteration:

  1. Start with an arbitrary policy
  2. Policy evaluation: Compute the value function for the current policy by solving a system of linear equations
  3. Policy improvement: Update the policy by choosing the action that maximizes value at each state
  4. Repeat steps 2-3 until the policy stops changing

Policy iteration typically converges in fewer iterations but each iteration is more expensive (it requires solving a linear system). Value iteration does simpler updates but may need more iterations. For large state spaces with sparse transitions, value iteration is often preferred.
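The value iteration steps above can be sketched for a tiny MDP; everything here (the two-state, two-action setup, rewards, and discount factor) is a hypothetical example:

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-8):
    """Value iteration for a small finite MDP.

    P[a] is the transition matrix under action a; r[a, i] is the
    immediate reward for taking action a in state i.
    """
    n = P.shape[1]
    V = np.zeros(n)                        # step 1: initial guess, all zeros
    while True:
        Qsa = r + gamma * (P @ V)          # step 2: Bellman update, shape (actions, states)
        V_new = Qsa.max(axis=0)            # best action value at every state
        if np.max(np.abs(V_new - V)) < tol:
            break                          # step 3: stop once changes are tiny
        V = V_new
    policy = Qsa.argmax(axis=0)            # greedy action at each state
    return V_new, policy

# Hypothetical 2-state, 2-action MDP
P = np.array([[[0.9, 0.1],                 # transitions under action 0
               [0.4, 0.6]],
              [[0.2, 0.8],                 # transitions under action 1
               [0.1, 0.9]]])
r = np.array([[1.0, 0.0],                  # rewards for action 0 in each state
              [0.5, 2.0]])                 # rewards for action 1 in each state

V, policy = value_iteration(P, r)
print(V, policy)
```

At convergence V satisfies the Bellman equation to within the tolerance, and `policy` is the maximizing action per state; policy iteration would instead alternate between evaluating a fixed policy exactly and improving it greedily.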