
📊Actuarial Mathematics Unit 1 Review


1.4 Expectation, variance, and moments


Written by the Fiveable Content Team • Last updated August 2025

Definition of expectation

Expectation represents the average value a random variable takes over many trials. It collapses an entire probability distribution into a single number, which makes comparing random variables much easier. In actuarial work, expectation is the starting point for pricing insurance products, estimating reserves, and quantifying risk.

Discrete random variables

For a discrete random variable X with probability mass function p(x), the expectation is:

E(X) = \sum_{x} x \cdot p(x)

You multiply each possible outcome by its probability, then sum everything up.

Example: For a fair six-sided die, each face has probability \frac{1}{6}, so:

E(X) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5

Notice the expected value doesn't have to be a value the die can actually land on. It's the long-run average.
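The die calculation above takes only a few lines of Python to reproduce; `fractions.Fraction` keeps the arithmetic exact (an illustrative sketch, not part of the original guide):

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die: each face has p = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum over x of x * p(x)
expectation = sum(x * p for x, p in pmf.items())

print(expectation)  # 7/2, i.e. 3.5
```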

Continuous random variables

For a continuous random variable X with probability density function f(x), the sum becomes an integral:

E(X) = \int_{-\infty}^{\infty} x \cdot f(x) \, dx

The logic is the same: weight each value by how likely it is, but now you integrate over a continuous range instead of summing over discrete outcomes.

Example: For a standard normal distribution (\mu = 0, \sigma = 1), the density is symmetric about zero, so E(X) = 0.

Linearity of expectation

This is one of the most useful properties in all of probability:

E(X + Y) = E(X) + E(Y)

This holds regardless of whether X and Y are independent, dependent, or correlated. More generally, for constants a and b:

E(aX + b) = aE(X) + b

Linearity lets you break complex random variables into simpler pieces, compute each expectation separately, and add the results.
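Both identities are easy to verify by direct enumeration. This sketch checks E(aX + b) = aE(X) + b for a die, and E(X + Y) = E(X) + E(Y) for an independent pair (the property holds for dependent pairs too; independence just makes the joint pmf easy to write down):

```python
from fractions import Fraction
from itertools import product

# Fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
E = lambda f: sum(f(x) * p for x, p in pmf.items())

# E(aX + b) = a*E(X) + b, checked for a = 2, b = 5
a, b = 2, 5
assert E(lambda x: a * x + b) == a * E(lambda x: x) + b

# E(X + Y) = E(X) + E(Y) for two independent dice, via the joint pmf
joint = {(x, y): px * py for (x, px), (y, py) in product(pmf.items(), pmf.items())}
E_sum = sum((x + y) * p for (x, y), p in joint.items())
assert E_sum == 2 * E(lambda x: x)  # 3.5 + 3.5 = 7
```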

Moments of distributions

Moments characterize the shape and properties of a distribution. Think of them as a sequence of numerical summaries: the first moment tells you where the distribution is centered, the second tells you how spread out it is, and higher moments capture asymmetry and tail behavior.

Raw moments

The k-th raw moment (or moment about the origin) of a random variable X is:

\mu_k' = E(X^k) = \int_{-\infty}^{\infty} x^k \cdot f(x) \, dx

For discrete variables, replace the integral with a sum. The first raw moment (k = 1) is just the expectation E(X).

Central moments

The k-th central moment measures deviation from the mean:

\mu_k = E\left((X - \mu)^k\right), \quad \text{where } \mu = E(X)

  • The first central moment is always zero (deviations above and below the mean cancel).
  • The second central moment is the variance: \text{Var}(X) = E\left((X - \mu)^2\right).
  • Higher central moments (k = 3, 4) give skewness and kurtosis information.
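Central moments are simple to compute for a discrete distribution. This sketch does it for the fair die, confirming that the first central moment vanishes and the second is the variance:

```python
from fractions import Fraction

# Fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())  # first raw moment = 7/2

def central_moment(k):
    """k-th central moment E((X - mu)^k) of the die, computed exactly."""
    return sum((x - mu) ** k * p for x, p in pmf.items())

print(central_moment(1))  # 0: deviations above and below the mean cancel
print(central_moment(2))  # 35/12: the variance of a fair die
```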

Moment generating functions

The moment generating function (MGF) of X is:

M_X(t) = E(e^{tX})

Why "moment generating"? Because the k-th derivative evaluated at t = 0 gives the k-th raw moment:

M_X^{(k)}(0) = E(X^k)

Two key facts about MGFs:

  • Uniqueness: If two random variables have the same MGF (where it exists in a neighborhood of zero), they have the same distribution.
  • Sums of independent variables: If X and Y are independent, M_{X+Y}(t) = M_X(t) \cdot M_Y(t). This makes MGFs especially powerful for finding the distribution of sums.

Note that the MGF doesn't exist for all distributions (the integral may diverge). When it doesn't, the characteristic function E(e^{itX}) serves a similar role.
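The derivative property can be checked numerically. For an exponential distribution with rate \lambda, the MGF is M(t) = \lambda / (\lambda - t) for t < \lambda, so M'(0) = 1/\lambda and M''(0) = 2/\lambda^2. This sketch differentiates the MGF with finite differences (the rate \lambda = 2 and step size are arbitrary choices):

```python
# MGF of an Exponential(rate=2) variable; known raw moments are
# E(X) = 1/lam = 0.5 and E(X^2) = 2/lam^2 = 0.5.
lam = 2.0
M = lambda t: lam / (lam - t)  # valid for t < lam

h = 1e-4
first = (M(h) - M(-h)) / (2 * h)           # central difference ~ M'(0)
second = (M(h) - 2 * M(0) + M(-h)) / h**2  # second difference ~ M''(0)

print(round(first, 4))   # 0.5  -> E(X)
print(round(second, 4))  # 0.5  -> E(X^2)
```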

Variance and standard deviation

Variance and standard deviation measure how spread out a distribution is around its mean. A small variance means outcomes cluster tightly; a large variance means they're more dispersed.

Definition of variance

\text{Var}(X) = E\left((X - \mu)^2\right) = E(X^2) - \left(E(X)\right)^2

The second form (often called the "computational formula") is usually easier to work with. To use it:

  1. Compute E(X^2) (the second raw moment).
  2. Compute E(X) and square it.
  3. Subtract: \text{Var}(X) = E(X^2) - [E(X)]^2.

Because variance involves squaring, its units are the square of the original variable's units, which can make direct interpretation awkward.
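The three steps above look like this for the fair die (exact arithmetic again, via `Fraction`):

```python
from fractions import Fraction

# Fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

EX  = sum(x * p for x, p in pmf.items())     # step 1 input: E(X) = 7/2
EX2 = sum(x**2 * p for x, p in pmf.items())  # step 1: E(X^2) = 91/6

var = EX2 - EX**2                            # steps 2-3: subtract the squared mean
print(var)  # 35/12, about 2.9167
```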


Variance of linear combinations

For random variables X and Y with constants a and b:

\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)

If X and Y are independent, then \text{Cov}(X,Y) = 0 and this simplifies to:

\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y)

Watch the constants carefully: they get squared when pulled out of variance, unlike expectation where they come out linearly.
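The squared constants can be verified by brute force. This sketch builds the exact distribution of aX + bY for two independent dice and compares its variance with a^2 Var(X) + b^2 Var(Y) (the choice a = 3, b = -2 is arbitrary):

```python
from fractions import Fraction
from itertools import product

# Fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def var_from(dist):
    """Exact variance of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** 2 * p for v, p in dist.items())

a, b = 3, -2
vX = var_from(pmf)  # 35/12 for one die

# Distribution of aX + bY for two independent dice
combo = {}
for (x, px), (y, py) in product(pmf.items(), pmf.items()):
    v = a * x + b * y
    combo[v] = combo.get(v, Fraction(0)) + px * py

# Constants get squared: Var(3X - 2Y) = (9 + 4) * Var(die)
assert var_from(combo) == (a**2 + b**2) * vX
```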

Standard deviation vs variance

\sigma_X = \sqrt{\text{Var}(X)}

Standard deviation is in the same units as XX, so it's more interpretable. If claim sizes have a mean of $500 and a standard deviation of $120, you can immediately see that typical deviations from the mean are on the order of $120. Variance would give you $14,400, which is harder to contextualize.

Skewness and kurtosis

These higher-order moments describe the shape of a distribution beyond just its center and spread.

Measuring asymmetry with skewness

Skewness quantifies how asymmetric a distribution is:

\text{Skew}(X) = \frac{E\left((X - \mu)^3\right)}{\sigma^3}

  • Positive skewness: The right tail is longer. Most values cluster to the left of the mean. Common in insurance loss distributions where most claims are small but a few are very large.
  • Negative skewness: The left tail is longer.
  • Zero skewness: The distribution is symmetric (e.g., the normal distribution).

Dividing by \sigma^3 makes skewness dimensionless, so you can compare skewness across distributions with different scales.

Measuring tail behavior with kurtosis

Kurtosis captures how heavy the tails are:

\text{Kurt}(X) = \frac{E\left((X - \mu)^4\right)}{\sigma^4}

The normal distribution has a kurtosis of 3. Excess kurtosis subtracts 3 so that the normal serves as the baseline:

\text{Excess Kurt}(X) = \text{Kurt}(X) - 3

  • Excess kurtosis > 0 (leptokurtic): Heavier tails than normal, more prone to extreme values.
  • Excess kurtosis < 0 (platykurtic): Lighter tails than normal.

For actuaries, high kurtosis is a red flag: it signals that extreme losses are more likely than a normal model would predict.
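Both formulas are direct to compute for a discrete distribution. This sketch uses a hypothetical two-point "loss" distribution chosen to mimic the insurance pattern above (most claims small, a few large), and shows it has strong positive skew and positive excess kurtosis:

```python
# Toy loss distribution: 90% chance of a claim of 1, 10% chance of a claim
# of 10. The numbers are hypothetical, picked to produce a long right tail.
pmf = {1: 0.9, 10: 0.1}

mu  = sum(x * p for x, p in pmf.items())              # 1.9
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # 7.29
sd  = var ** 0.5                                      # 2.7

skew = sum((x - mu) ** 3 * p for x, p in pmf.items()) / sd ** 3
kurt = sum((x - mu) ** 4 * p for x, p in pmf.items()) / sd ** 4

print(round(skew, 3))      # positive: the right tail is longer
print(round(kurt - 3, 3))  # positive excess kurtosis: heavier tails than normal
```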

Comparing distributions using moments

Two distributions can share the same mean and variance but differ in higher moments. For example, the normal and Laplace distributions can be parameterized to have identical means and variances, but the Laplace distribution has excess kurtosis of 3 (kurtosis of 6), meaning it produces more extreme values. Moments give you a systematic way to detect these differences.

Covariance and correlation

These measure the linear relationship between two random variables. They're essential for understanding how risks interact in a portfolio.

Definition of covariance

\text{Cov}(X,Y) = E\left((X - \mu_X)(Y - \mu_Y)\right) = E(XY) - E(X)E(Y)

The computational form E(XY) - E(X)E(Y) is typically easier to calculate. Covariance can be:

  • Positive: X and Y tend to increase together.
  • Negative: When one increases, the other tends to decrease.
  • Zero: No linear association (but there could still be a nonlinear relationship).

Correlation coefficient

Correlation normalizes covariance to a [-1, 1] scale:

\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \, \sigma_Y}

  • \rho = 1: Perfect positive linear relationship.
  • \rho = -1: Perfect negative linear relationship.
  • \rho = 0: No linear relationship.

A common pitfall: zero correlation does not imply independence. Two variables can be uncorrelated yet strongly dependent in a nonlinear way (e.g., Y = X^2 where X is symmetric about zero).
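The Y = X^2 pitfall can be verified exactly. Take X uniform on {-1, 0, 1}: Y is a deterministic function of X (maximal dependence), yet Cov(X, Y) = E(X^3) - E(X)E(X^2) = 0:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is completely determined by X.
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

EX  = sum(x * p for x, p in pmf_x.items())         # E(X)   = 0 by symmetry
EY  = sum(x**2 * p for x, p in pmf_x.items())      # E(Y)   = E(X^2) = 2/3
EXY = sum(x * x**2 * p for x, p in pmf_x.items())  # E(XY)  = E(X^3) = 0

cov = EXY - EX * EY
print(cov)  # 0: uncorrelated, despite total dependence
```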


Covariance matrices

For a random vector \mathbf{X} = (X_1, \ldots, X_n), the covariance matrix is:

\Sigma = E\left((\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right)

  • Diagonal entries: \Sigma_{ii} = \text{Var}(X_i)
  • Off-diagonal entries: \Sigma_{ij} = \text{Cov}(X_i, X_j)

The matrix is always symmetric and positive semi-definite. In actuarial work, covariance matrices are used to model the joint behavior of multiple lines of business or risk factors simultaneously.
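For a small data set, the matrix can be assembled entry by entry. This sketch builds the 2x2 population covariance matrix of a hypothetical paired sample (dividing by n, not n - 1) and shows the structure described above:

```python
# Hypothetical paired observations, e.g. claims on two lines of business.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n

def cov(a, ma, b, mb):
    """Population covariance of two equal-length samples."""
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

# Diagonal: variances. Off-diagonal: Cov(X, Y). Note the symmetry.
sigma = [[cov(xs, mx, xs, mx), cov(xs, mx, ys, my)],
         [cov(ys, my, xs, mx), cov(ys, my, ys, my)]]

print(sigma)  # [[1.25, 0.75], [0.75, 1.25]]
```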

Conditional expectation

Conditional expectation is the expected value of a random variable given that you know something about another variable. It refines your estimate by incorporating additional information.

Definition of conditional expectation

For continuous random variables, the conditional expectation of Y given X = x is:

E(Y \mid X = x) = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y \mid x) \, dy

Here f_{Y|X}(y \mid x) is the conditional density of Y given X = x. You can think of this as: "If I fix X at a particular value, what's the average of Y?"

Viewed as a function of X, E(Y \mid X) is itself a random variable.

Law of total expectation

E(Y) = E\left(E(Y \mid X)\right)

This says you can compute E(Y) in two stages:

  1. First compute E(Y \mid X) for each possible value of X.
  2. Then average those conditional expectations over the distribution of X.

This is sometimes called the "tower property" or "iterated expectation." It's extremely useful when direct computation of E(Y) is hard but conditioning on X simplifies things.
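The two-stage recipe is concrete in a discrete setting. This sketch uses a hypothetical two-class risk model (all numbers invented for illustration): condition on risk class, then average the conditional means over the class distribution:

```python
from fractions import Fraction

# Hypothetical model: a policyholder is "low" risk with probability 0.7
# (mean claim 100) or "high" risk with probability 0.3 (mean claim 400).
p_class = {"low": Fraction(7, 10), "high": Fraction(3, 10)}
mean_given_class = {"low": 100, "high": 400}

# E(Y) = E(E(Y | X)): weight each conditional mean by its class probability
EY = sum(p_class[c] * mean_given_class[c] for c in p_class)
print(EY)  # 70 + 120 = 190
```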

Conditional variance

\text{Var}(Y \mid X = x) = E\left((Y - E(Y \mid X = x))^2 \mid X = x\right)

The law of total variance (also called "Eve's law") decomposes unconditional variance into two components:

\text{Var}(Y) = E\left(\text{Var}(Y \mid X)\right) + \text{Var}\left(E(Y \mid X)\right)

  • E\left(\text{Var}(Y \mid X)\right): Average variability within each group defined by X.
  • \text{Var}\left(E(Y \mid X)\right): Variability between group means.

In actuarial contexts, this decomposition appears frequently. For instance, if X represents a policyholder's risk class, the first term captures randomness within each class, and the second captures how much average claims differ across classes.
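Eve's law can be computed directly for a hypothetical two-class model like the one just described (class probabilities, conditional means, and conditional variances are all invented for illustration):

```python
from fractions import Fraction

# Hypothetical risk classes with conditional means and variances.
p = {"low": Fraction(7, 10), "high": Fraction(3, 10)}
m = {"low": 100, "high": 400}        # E(Y | class)
v = {"low": 50**2, "high": 200**2}   # Var(Y | class)

EY = sum(p[c] * m[c] for c in p)  # overall mean, 190

within  = sum(p[c] * v[c] for c in p)             # E(Var(Y | X)): within-class noise
between = sum(p[c] * (m[c] - EY) ** 2 for c in p) # Var(E(Y | X)): spread of class means

print(within + between)  # total Var(Y) by the law of total variance
```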

Applications in actuarial science

Pricing insurance contracts

The expected payout E(L) of a loss random variable L is the foundation of any premium calculation. A simple premium principle is:

\text{Premium} = E(L) + \theta \cdot \text{Var}(L)

where \theta is a risk loading factor. Higher variance (more uncertain losses) leads to a higher premium. Skewness and kurtosis further inform how much extra loading is needed for distributions with heavy tails or asymmetry.
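This premium principle is a one-liner once the moments are in hand. The sketch below prices a hypothetical loss distribution (most policies have no claim); both the pmf and the loading factor \theta = 0.001 are invented for illustration:

```python
# Toy loss distribution: 90% no claim, 8% small claim, 2% large claim.
pmf = {0: 0.90, 1000: 0.08, 10000: 0.02}

EL  = sum(x * p for x, p in pmf.items())     # expected loss = 280
EL2 = sum(x**2 * p for x, p in pmf.items())  # second raw moment
var = EL2 - EL**2                            # computational formula for Var(L)

theta = 0.001                                # hypothetical risk loading factor
premium = EL + theta * var                   # Premium = E(L) + theta * Var(L)
print(round(premium, 2))  # 2281.6: loading dominates because losses are volatile
```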

Calculating reserves

Reserves estimate future liabilities. Actuaries use E(L) to set the central estimate, then examine the variance and higher moments to determine how much additional margin is needed. The law of total expectation is especially useful here: you can condition on claim type, policy year, or development period to build up the overall reserve estimate in stages.

Risk management using moments

  • Variance and standard deviation quantify overall portfolio risk.
  • Skewness flags portfolios where large losses are more likely than large gains.
  • Kurtosis identifies exposure to extreme events that standard deviation alone would understate.
  • Covariance and correlation reveal how different lines of business or risk factors move together, which is critical for diversification. A portfolio of negatively correlated risks has lower total variance than the sum of individual variances.