Definition of expectation
Expectation represents the average value a random variable takes over many trials. It collapses an entire probability distribution into a single number, which makes comparing random variables much easier. In actuarial work, expectation is the starting point for pricing insurance products, estimating reserves, and quantifying risk.
Discrete random variables
For a discrete random variable $X$ with probability mass function $p(x)$, the expectation is:

$$E[X] = \sum_x x \, p(x)$$
You multiply each possible outcome by its probability, then sum everything up.
Example: For a fair six-sided die, each face has probability $1/6$, so:

$$E[X] = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + \cdots + 6 \cdot \tfrac{1}{6} = \frac{21}{6} = 3.5$$
Notice the expected value doesn't have to be a value the die can actually land on. It's the long-run average.
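The "multiply each outcome by its probability, then sum" recipe translates directly into code. A minimal sketch for the die example:

```python
# Expectation of a discrete random variable: sum of outcome * probability.
faces = [1, 2, 3, 4, 5, 6]
pmf = {x: 1 / 6 for x in faces}  # fair die: each face has probability 1/6

expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 3.5
```

Note the result, 3.5, is not a face the die can show, illustrating the long-run-average interpretation.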
Continuous random variables
For a continuous random variable $X$ with probability density function $f(x)$, the sum becomes an integral:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
The logic is the same: weight each value by how likely it is, but now you integrate over a continuous range instead of summing over discrete outcomes.
Example: For a standard normal distribution ($X \sim N(0, 1)$), the density is symmetric about zero, so $E[X] = 0$.
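The integral can be approximated numerically to see the symmetry argument in action. A sketch using a midpoint Riemann sum over a wide interval (the normal's tails beyond $\pm 8$ are negligible):

```python
import math

# Approximate E[X] = ∫ x f(x) dx for the standard normal density
# with a midpoint Riemann sum over [-8, 8].
def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

n, lo, hi = 100_000, -8.0, 8.0
dx = (hi - lo) / n
mean = sum((lo + (i + 0.5) * dx) * normal_pdf(lo + (i + 0.5) * dx) * dx
           for i in range(n))
print(round(mean, 6))  # ≈ 0.0: positive and negative contributions cancel
```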
Linearity of expectation
This is one of the most useful properties in all of probability:

$$E[X + Y] = E[X] + E[Y]$$
This holds regardless of whether $X$ and $Y$ are independent, dependent, or correlated. More generally, for constants $a$ and $b$:

$$E[aX + b] = a\,E[X] + b$$
Linearity lets you break complex random variables into simpler pieces, compute each expectation separately, and add the results.
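Because linearity needs no independence assumption, it can be verified exactly on a joint distribution where $Y$ clearly depends on $X$. A sketch with made-up toy probabilities:

```python
# Linearity of expectation holds even for dependent variables.
# Joint pmf over (x, y) pairs; here Y depends on X (toy numbers).
joint = {(0, 0): 0.5, (1, 1): 0.3, (1, 2): 0.2}

E_X = sum(p * x for (x, y), p in joint.items())
E_Y = sum(p * y for (x, y), p in joint.items())
E_sum = sum(p * (x + y) for (x, y), p in joint.items())
assert abs(E_sum - (E_X + E_Y)) < 1e-12  # E[X + Y] = E[X] + E[Y]
```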
Moments of distributions
Moments characterize the shape and properties of a distribution. Think of them as a sequence of numerical summaries: the first moment tells you where the distribution is centered, the second tells you how spread out it is, and higher moments capture asymmetry and tail behavior.
Raw moments
The $k$-th raw moment (or moment about the origin) of a random variable $X$ is:

$$\mu'_k = E[X^k] = \int_{-\infty}^{\infty} x^k f(x) \, dx$$
For discrete variables, replace the integral with a sum. The first raw moment ($k = 1$) is just the expectation $E[X]$.
Central moments
The $k$-th central moment measures deviation from the mean $\mu = E[X]$:

$$\mu_k = E[(X - \mu)^k]$$
- The first central moment is always zero: $\mu_1 = E[X - \mu] = 0$ (deviations above and below the mean cancel).
- The second central moment is the variance: $\mu_2 = E[(X - \mu)^2] = \mathrm{Var}(X)$.
- Higher central moments ($k = 3, 4$) give skewness and kurtosis information.
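Raw and central moments follow the same weight-and-sum recipe as expectation. A sketch computing both for a small made-up pmf:

```python
# Raw and central moments of a discrete distribution from its pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # toy distribution

def raw_moment(pmf, k):
    return sum(p * x**k for x, p in pmf.items())

def central_moment(pmf, k):
    mu = raw_moment(pmf, 1)  # center at the mean
    return sum(p * (x - mu)**k for x, p in pmf.items())

mean = raw_moment(pmf, 1)          # first raw moment = E[X]
variance = central_moment(pmf, 2)  # second central moment = Var(X)
```

Note `central_moment(pmf, 1)` comes out as zero, matching the first bullet above.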
Moment generating functions
The moment generating function (MGF) of $X$ is:

$$M_X(t) = E[e^{tX}]$$
Why "moment generating"? Because the $k$-th derivative evaluated at $t = 0$ gives the $k$-th raw moment:

$$M_X^{(k)}(0) = E[X^k]$$
Two key facts about MGFs:
- Uniqueness: If two random variables have the same MGF (where it exists in a neighborhood of zero), they have the same distribution.
- Sums of independent variables: If $X$ and $Y$ are independent, $M_{X+Y}(t) = M_X(t)\,M_Y(t)$. This makes MGFs especially powerful for finding the distribution of sums.
Note that the MGF doesn't exist for all distributions (the integral may diverge). When it doesn't, the characteristic function serves a similar role.
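The derivative-at-zero property can be checked numerically. A sketch using the fair die's MGF, $M(t) = \frac{1}{6}\sum_{k=1}^{6} e^{tk}$, and a central finite difference in place of the exact derivative:

```python
import math

# MGF of a fair die: M(t) = (1/6) * sum over k of e^{t k}.
# Its derivative at t = 0 should recover E[X] = 3.5.
def mgf(t):
    return sum(math.exp(t * k) for k in range(1, 7)) / 6

h = 1e-5
first_moment = (mgf(h) - mgf(-h)) / (2 * h)  # central difference ≈ M'(0)
print(round(first_moment, 6))  # ≈ 3.5
```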
Variance and standard deviation
Variance and standard deviation measure how spread out a distribution is around its mean. A small variance means outcomes cluster tightly; a large variance means they're more dispersed.
Definition of variance

$$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$

The second form (often called the "computational formula") is usually easier to work with. To use it:
- Compute $E[X^2]$ (the second raw moment).
- Compute $E[X]$ and square it.
- Subtract: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.
Because variance involves squaring, its units are the square of the original variable's units, which can make direct interpretation awkward.
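The three computational-formula steps can be sketched directly, using a made-up pmf:

```python
# Variance via the computational formula Var(X) = E[X^2] - (E[X])^2.
pmf = {1: 0.25, 2: 0.5, 3: 0.25}  # toy distribution

E_X = sum(p * x for x, p in pmf.items())        # step 1 of the mean
E_X2 = sum(p * x**2 for x, p in pmf.items())    # second raw moment
variance = E_X2 - E_X**2                        # subtract the squared mean
print(variance)  # 0.5
```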

Variance of linear combinations
For random variables $X$ and $Y$ with constants $a$ and $b$:

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$
If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$ and this simplifies to:

$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$$
Watch the constants carefully: they get squared when pulled out of variance, unlike expectation where they come out linearly.
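The full formula, covariance term included, can be verified exactly on a small joint pmf with dependent variables. A sketch with made-up numbers:

```python
# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
# exactly on a toy joint pmf where X and Y are dependent.
joint = {(0, 1): 0.3, (1, 0): 0.2, (1, 2): 0.5}
a, b = 2.0, -3.0

def E(g):
    return sum(p * g(x, y) for (x, y), p in joint.items())

var_x = E(lambda x, y: x**2) - E(lambda x, y: x)**2
var_y = E(lambda x, y: y**2) - E(lambda x, y: y)**2
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

lhs = E(lambda x, y: (a*x + b*y)**2) - E(lambda x, y: a*x + b*y)**2
rhs = a**2 * var_x + b**2 * var_y + 2 * a * b * cov
assert abs(lhs - rhs) < 1e-12  # both sides agree
```

Changing `a` or `b` shows the squaring: doubling `a` quadruples its contribution.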
Standard deviation vs variance
Standard deviation $\sigma = \sqrt{\mathrm{Var}(X)}$ is in the same units as $X$, so it's more interpretable. If claim sizes have a mean of $500 and a standard deviation of $120, you can immediately see that typical deviations from the mean are on the order of $120. Variance would give you 14,400 squared dollars, which is harder to contextualize.
Skewness and kurtosis
These higher-order moments describe the shape of a distribution beyond just its center and spread.
Measuring asymmetry with skewness
Skewness quantifies how asymmetric a distribution is:

$$\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3}$$
- Positive skewness: The right tail is longer. Most values cluster to the left of the mean. Common in insurance loss distributions where most claims are small but a few are very large.
- Negative skewness: The left tail is longer.
- Zero skewness: The distribution is balanced; symmetric distributions (e.g., the normal) have zero skewness.
Dividing by $\sigma^3$ makes skewness dimensionless, so you can compare skewness across distributions with different scales.
Measuring tail behavior with kurtosis
Kurtosis captures how heavy the tails are:

$$\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4}$$
The normal distribution has a kurtosis of 3. Excess kurtosis subtracts 3 so that the normal serves as the baseline:
- Excess kurtosis > 0 (leptokurtic): Heavier tails than normal, more prone to extreme values.
- Excess kurtosis < 0 (platykurtic): Lighter tails than normal.
For actuaries, high kurtosis is a red flag: it signals that extreme losses are more likely than a normal model would predict.
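Both shape measures come straight from the central moments. A sketch on a made-up right-skewed "loss" distribution (most losses small, one large):

```python
# Skewness and excess kurtosis from central moments of a discrete pmf.
# Toy right-skewed loss distribution: mostly small losses, a rare big one.
pmf = {1: 0.7, 2: 0.2, 10: 0.1}

mu = sum(p * x for x, p in pmf.items())

def central(k):
    return sum(p * (x - mu)**k for x, p in pmf.items())

sigma = central(2) ** 0.5
skewness = central(3) / sigma**3          # gamma_1
excess_kurtosis = central(4) / sigma**4 - 3  # gamma_2 - 3
print(skewness > 0)  # True: the long right tail drives positive skew
```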
Comparing distributions using moments
Two distributions can share the same mean and variance but differ in higher moments. For example, the normal and Laplace distributions can be parameterized to have identical means and variances, but the Laplace distribution has excess kurtosis of 3 (kurtosis of 6), meaning it produces more extreme values. Moments give you a systematic way to detect these differences.
Covariance and correlation
These measure the linear relationship between two random variables. They're essential for understanding how risks interact in a portfolio.
Definition of covariance

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]\,E[Y]$$

The computational form is typically easier to calculate. Covariance can be:
- Positive: $X$ and $Y$ tend to increase together.
- Negative: When one increases, the other tends to decrease.
- Zero: No linear association (but there could still be a nonlinear relationship).
Correlation coefficient
Correlation normalizes covariance to the $[-1, 1]$ scale:

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

- $\rho_{XY} = 1$: Perfect positive linear relationship.
- $\rho_{XY} = -1$: Perfect negative linear relationship.
- $\rho_{XY} = 0$: No linear relationship.
A common pitfall: zero correlation does not imply independence. Two variables can be uncorrelated yet strongly dependent in a nonlinear way (e.g., $Y = X^2$ where $X$ is symmetric about zero).
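That pitfall can be made concrete: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is completely determined by $X$ yet their covariance is zero.

```python
# Uncorrelated but dependent: X uniform on {-1, 0, 1}, Y = X^2.
pmf_x = {-1: 1/3, 0: 1/3, 1: 1/3}

E_X = sum(p * x for x, p in pmf_x.items())          # 0 by symmetry
E_Y = sum(p * x**2 for x, p in pmf_x.items())       # E[X^2]
E_XY = sum(p * x * x**2 for x, p in pmf_x.items())  # E[X^3] = 0 by symmetry
cov = E_XY - E_X * E_Y
print(cov)  # 0.0, yet Y is a deterministic function of X
```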

Covariance matrices
For a random vector $\mathbf{X} = (X_1, \ldots, X_n)$, the covariance matrix $\Sigma$ has entries:
- Diagonal entries: $\Sigma_{ii} = \mathrm{Var}(X_i)$
- Off-diagonal entries: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$
The matrix is always symmetric and positive semi-definite. In actuarial work, covariance matrices are used to model the joint behavior of multiple lines of business or risk factors simultaneously.
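A $2 \times 2$ covariance matrix can be assembled entry by entry from a joint pmf. A sketch with made-up numbers, checking the symmetry property:

```python
# Build a 2x2 covariance matrix from a joint pmf: diagonal = variances,
# off-diagonal = covariance. Toy joint distribution for (X1, X2).
joint = {(0, 0): 0.4, (1, 1): 0.4, (1, 3): 0.2}

def E(g):
    return sum(p * g(x1, x2) for (x1, x2), p in joint.items())

m1, m2 = E(lambda a, b: a), E(lambda a, b: b)
cov12 = E(lambda a, b: a * b) - m1 * m2
Sigma = [
    [E(lambda a, b: a**2) - m1**2, cov12],
    [cov12, E(lambda a, b: b**2) - m2**2],
]
assert Sigma[0][1] == Sigma[1][0]  # symmetric by construction
```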
Conditional expectation
Conditional expectation is the expected value of a random variable given that you know something about another variable. It refines your estimate by incorporating additional information.
Definition of conditional expectation
For continuous random variables, the conditional expectation of $Y$ given $X = x$ is:

$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y \, f_{Y|X}(y \mid x) \, dy$$

Here $f_{Y|X}(y \mid x)$ is the conditional density of $Y$ given $X = x$. You can think of this as: "If I fix $X$ at a particular value, what's the average of $Y$?"
Viewed as a function of $X$, $E[Y \mid X]$ is itself a random variable.
Law of total expectation
$$E[Y] = E\big[\,E[Y \mid X]\,\big]$$

This says you can compute $E[Y]$ in two stages:
- First compute $E[Y \mid X = x]$ for each possible value $x$ of $X$.
- Then average those conditional expectations over the distribution of $X$.
This is sometimes called the "tower property" or "iterated expectation." It's extremely useful when direct computation of $E[Y]$ is hard but conditioning on $X$ simplifies things.
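The two-stage computation can be sketched with two hypothetical risk classes; the class probabilities and conditional means are made-up numbers for illustration:

```python
# Law of total expectation: E[Y] = sum over classes of P(class) * E[Y | class].
# Hypothetical risk classes with illustrative numbers.
p_class = {"low": 0.8, "high": 0.2}
mean_given_class = {"low": 100.0, "high": 900.0}  # E[claim | class]

E_Y = sum(p_class[c] * mean_given_class[c] for c in p_class)
print(E_Y)  # 0.8*100 + 0.2*900 = 260.0
```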
Conditional variance
The law of total variance (also called "Eve's law") decomposes unconditional variance into two components:

$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid X)] + \mathrm{Var}(E[Y \mid X])$$
- $E[\mathrm{Var}(Y \mid X)]$: Average variability within each group defined by $X$.
- $\mathrm{Var}(E[Y \mid X])$: Variability between group means.
In actuarial contexts, this decomposition appears frequently. For instance, if $X$ represents a policyholder's risk class, the first term captures randomness within each class, and the second captures how much average claims differ across classes.
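Continuing the risk-class picture, the decomposition can be computed directly; the class probabilities, conditional means, and conditional variances below are made-up illustrative numbers:

```python
# Law of total variance: Var(Y) = E[Var(Y|X)] + Var(E[Y|X]).
# Two hypothetical risk classes (illustrative numbers only).
p = {"low": 0.8, "high": 0.2}
mean_ = {"low": 100.0, "high": 900.0}      # E[Y | X = class]
var_ = {"low": 50.0**2, "high": 400.0**2}  # Var(Y | X = class)

within = sum(p[c] * var_[c] for c in p)                      # E[Var(Y|X)]
grand_mean = sum(p[c] * mean_[c] for c in p)                 # E[Y]
between = sum(p[c] * (mean_[c] - grand_mean)**2 for c in p)  # Var(E[Y|X])
total_var = within + between
```

Here the between-class term dominates: most of the variability comes from the gap between the class means, not from noise within a class.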
Applications in actuarial science
Pricing insurance contracts
The expected payout $E[L]$ of a loss random variable $L$ is the foundation of any premium calculation. A simple premium principle is:

$$P = E[L] + \theta \, \mathrm{Var}(L)$$

where $\theta$ is a risk loading factor. Higher variance (more uncertain losses) leads to a higher premium. Skewness and kurtosis further inform how much extra loading is needed for distributions with heavy tails or asymmetry.
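A sketch of the variance premium principle on a made-up loss distribution (the pmf and loading factor are illustrative, not calibrated values):

```python
# Variance premium principle P = E[L] + theta * Var(L).
# Toy loss distribution and loading factor, illustrative only.
pmf = {0: 0.9, 1000: 0.09, 10000: 0.01}  # loss amount: probability
theta = 0.0001

E_L = sum(p * x for x, p in pmf.items())
Var_L = sum(p * x**2 for x, p in pmf.items()) - E_L**2
premium = E_L + theta * Var_L  # pure premium plus variance loading
```

The loading pushes the premium above the expected loss, compensating the insurer for bearing the uncertainty.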
Calculating reserves
Reserves estimate future liabilities. Actuaries use the expected value of future claims to set the central estimate, then examine the variance and higher moments to determine how much additional margin is needed. The law of total expectation is especially useful here: you can condition on claim type, policy year, or development period to build up the overall reserve estimate in stages.
Risk management using moments
- Variance and standard deviation quantify overall portfolio risk.
- Skewness flags portfolios where large losses are more likely than large gains.
- Kurtosis identifies exposure to extreme events that standard deviation alone would understate.
- Covariance and correlation reveal how different lines of business or risk factors move together, which is critical for diversification. A portfolio of negatively correlated risks has lower total variance than the sum of individual variances.