📊Actuarial Mathematics Unit 1 Review

1.7 Joint distributions and covariance


Written by the Fiveable Content Team • Last updated August 2025

Joint distributions and covariance describe how multiple random variables interact. They let you model relationships between variables and calculate probabilities for complex events involving multiple outcomes.

For actuarial work, these tools are essential. Risks rarely depend on a single factor, so you need joint distributions to capture how variables move together. Covariance and correlation then quantify those relationships, which matters directly when assessing portfolio risk or pricing products that depend on correlated claims.

Joint probability distributions

A joint probability distribution gives you the probability of two or more random variables taking on specific combinations of values simultaneously. Rather than looking at each variable in isolation, you're capturing their combined behavior.

Joint distributions can be discrete (when the random variables take countable values) or continuous (when they take values over a continuous range).

Discrete joint distributions

When both random variables are discrete, you describe their joint behavior with a joint probability mass function (PMF). This function assigns a probability to each possible pair of values.

  • The joint PMF is written as p(x,y) = P(X = x, Y = y)
  • Every probability must be non-negative: p(x,y) \geq 0
  • The sum over all possible pairs must equal 1: \sum_x \sum_y p(x,y) = 1

As an example, consider two policyholders filing claims in a given year. The joint PMF would tell you the probability that Policyholder A files exactly 2 claims and Policyholder B files exactly 1 claim.
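
As a concrete sketch of these properties, here is a small Python check using a hypothetical joint PMF for the two policyholders' claim counts (the probabilities are made up for illustration):

```python
# Hypothetical joint PMF for two policyholders' annual claim counts.
# Keys are (a, b) = (claims by A, claims by B); values are probabilities.
joint_pmf = {
    (0, 0): 0.40, (0, 1): 0.15, (0, 2): 0.05,
    (1, 0): 0.15, (1, 1): 0.10, (1, 2): 0.03,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.02,
}

# Property 1: every probability is non-negative.
assert all(p >= 0 for p in joint_pmf.values())

# Property 2: probabilities over all pairs sum to 1.
assert abs(sum(joint_pmf.values()) - 1.0) < 1e-12

# P(A files exactly 2 claims and B files exactly 1 claim)
print(joint_pmf[(2, 1)])  # 0.05
```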

Continuous joint distributions

When both random variables are continuous, you use a joint probability density function (PDF) instead. The joint PDF describes the relative likelihood of different combinations of values, but individual point probabilities are zero (just like in the univariate case).

  • The double integral over the entire domain must equal 1
  • You get probabilities by integrating over a region, not by evaluating at a point

For instance, the joint distribution of claim amounts for two types of insurance policies would be modeled with a continuous joint PDF.

Joint probability density functions

The joint PDF for random variables X and Y is denoted f(x,y) and must satisfy two properties:

  • Non-negativity: f(x,y) \geq 0 for all x and y
  • Total probability equals 1: \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) \, dy \, dx = 1

To find the probability that (X,Y) falls in some region A, you integrate the joint PDF over that region:

P((X,Y) \in A) = \iint_A f(x,y) \, dy \, dx
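
A quick numerical sanity check of this region-integration idea, using an assumed joint PDF f(x, y) = 2 e^{-x - 2y} on x, y > 0 (the product of an Exp(1) and an Exp(2) density) and a midpoint Riemann sum:

```python
import math

# Assumed joint PDF: f(x, y) = 2 * exp(-x - 2y) for x, y >= 0.
def f(x, y):
    return 2.0 * math.exp(-x - 2.0 * y)

# Approximate P(X <= 1, Y <= 1) with a midpoint Riemann sum over [0,1] x [0,1].
n = 400
h = 1.0 / n
prob = sum(
    f((i + 0.5) * h, (j + 0.5) * h) * h * h
    for i in range(n) for j in range(n)
)

# Closed form for comparison: (1 - e^{-1}) * (1 - e^{-2}), since the
# density factors into independent exponential marginals.
exact = (1 - math.exp(-1)) * (1 - math.exp(-2))
assert abs(prob - exact) < 1e-4
```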

Joint cumulative distribution functions

The joint CDF gives the probability that both random variables are simultaneously at or below specified values:

F(x,y) = P(X \leq x, Y \leq y)

You can obtain it by integrating the joint PDF:

F(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u,v) \, dv \, du

Key properties of the joint CDF:

  • Non-decreasing in both arguments
  • Right-continuous in each variable
  • \lim_{x,y \to \infty} F(x,y) = 1
  • F(x,y) \to 0 as either x \to -\infty or y \to -\infty

Marginal distributions

A marginal distribution extracts the probability distribution of a single variable from a joint distribution. You're effectively "collapsing" the joint distribution along one dimension to focus on one variable at a time.

Marginal probability functions

Discrete case: Sum the joint PMF over all values of the other variable:

p_X(x) = \sum_y p(x,y) \qquad \text{and} \qquad p_Y(y) = \sum_x p(x,y)

Continuous case: Integrate the joint PDF over all values of the other variable:

f_X(x) = \int_{-\infty}^{\infty} f(x,y) \, dy \qquad \text{and} \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y) \, dx

Once you have the marginal distributions, you can compute expectations, variances, and probabilities for each variable individually.
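
The summing-out step can be sketched in a few lines of Python; the joint PMF here is hypothetical:

```python
from collections import defaultdict

# Hypothetical discrete joint PMF over (x, y) pairs.
joint = {
    (0, 0): 0.2, (0, 1): 0.3,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal of X: sum the joint PMF over y. Marginal of Y: sum over x.
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# Each marginal is itself a valid PMF.
assert abs(sum(p_x.values()) - 1.0) < 1e-12
assert abs(sum(p_y.values()) - 1.0) < 1e-12

# Spot checks: p_X(0) = 0.2 + 0.3 = 0.5, p_Y(1) = 0.3 + 0.4 = 0.7.
assert abs(p_x[0] - 0.5) < 1e-12
assert abs(p_y[1] - 0.7) < 1e-12
```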

Sums of random variables

The distribution of a sum Z = X + Y can be derived from the joint distribution.

Discrete case:

p_Z(z) = \sum_{x+y=z} p(x,y)

Continuous case (convolution):

f_Z(z) = \int_{-\infty}^{\infty} f(x, z-x) \, dx

The expectation and variance of the sum connect back to the individual moments and the covariance:

  • E[X + Y] = E[X] + E[Y] (always true)
  • Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X,Y)
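
Both facts can be verified numerically on a small hypothetical joint PMF: build the PMF of Z = X + Y by collecting probability over the lines x + y = z, then compare Var(Z) against Var(X) + Var(Y) + 2 Cov(X,Y):

```python
from collections import defaultdict

# Hypothetical joint PMF of (X, Y); deliberately not independent,
# so the covariance term matters.
joint = {
    (0, 0): 0.3, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.5,
}

# PMF of Z = X + Y: collect p(x, y) over all pairs with x + y = z.
p_z = defaultdict(float)
for (x, y), p in joint.items():
    p_z[x + y] += p

def E(g):
    """Expectation of g(x, y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
cov = E(lambda x, y: (x - ex) * (y - ey))

# Var(X + Y) computed directly from the PMF of Z ...
ez = sum(z * p for z, p in p_z.items())
var_z = sum((z - ez) ** 2 * p for z, p in p_z.items())

# ... matches Var(X) + Var(Y) + 2 Cov(X, Y).
assert abs(var_z - (var_x + var_y + 2 * cov)) < 1e-9
```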

Conditional distributions

A conditional distribution describes the behavior of one random variable given that you know the value of another. This is where you see how information about one variable changes your beliefs about the other.

Conditional probability functions

Discrete case: The conditional PMF of Y given X = x is:

p_{Y|X}(y|x) = \frac{p(x,y)}{p_X(x)}, \quad \text{provided } p_X(x) > 0

Continuous case: The conditional PDF of Y given X = x is:

f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}, \quad \text{provided } f_X(x) > 0

Notice the structure: you take the joint distribution and divide by the marginal of the variable you're conditioning on. This is a direct application of the definition of conditional probability.

Expectations of conditional distributions

The conditional expectation of Y given X = x is computed using the conditional distribution:

  • Discrete: E[Y|X=x] = \sum_y y \cdot p_{Y|X}(y|x)
  • Continuous: E[Y|X=x] = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x) \, dy

A powerful result here is the law of total expectation (also called the tower property):

E[Y] = E[E[Y|X]]

This says you can compute the overall expectation of Y by first finding the conditional expectation E[Y|X] (which is a function of X), and then taking its expectation over the distribution of X. This technique shows up frequently in actuarial pricing and reserving.
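
A minimal check of the tower property on a hypothetical discrete joint PMF:

```python
from collections import defaultdict

# Hypothetical joint PMF of (X, Y).
joint = {
    (0, 0): 0.25, (0, 2): 0.25,
    (1, 1): 0.30, (1, 3): 0.20,
}

# Marginal of X.
p_x = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p

# Conditional expectation E[Y | X = x] = sum_y y * p(x, y) / p_X(x).
def cond_exp_y(x):
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]

# Tower property: E[Y] = E[E[Y | X]].
e_y_direct = sum(y * p for (_, y), p in joint.items())
e_y_tower = sum(cond_exp_y(x) * px for x, px in p_x.items())
assert abs(e_y_direct - e_y_tower) < 1e-12
```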

Independent random variables

Two random variables are independent if knowing the value of one gives you no information about the other. Formally, their joint distribution factors into the product of their marginals.

Definition of independence

Discrete: X and Y are independent if and only if:

p(x,y) = p_X(x) \cdot p_Y(y) \quad \text{for all } x, y

Continuous: X and Y are independent if and only if:

f(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y

Equivalently, independence means the conditional distribution equals the marginal:

  • p_{Y|X}(y|x) = p_Y(y) (discrete)
  • f_{Y|X}(y|x) = f_Y(y) (continuous)

This factorization must hold for every pair of values, not just some of them.

Properties of independent variables

If X and Y are independent, the following hold:

  • E[XY] = E[X] \cdot E[Y]
  • Var(X+Y) = Var(X) + Var(Y)
  • Cov(X,Y) = 0

A critical warning: the converses are not generally true. Two variables can have zero covariance and still be dependent. Covariance only captures linear association. For example, if X \sim N(0,1) and Y = X^2, then Cov(X,Y) = 0 but X and Y are clearly dependent.
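
A discrete analogue of that example can be checked with exact arithmetic: take X uniform on {-1, 0, 1} and Y = X^2. The covariance is exactly zero even though Y is a deterministic function of X:

```python
from fractions import Fraction

# Discrete analogue of the "Y = X^2" example: X is uniform on
# {-1, 0, 1}, and Y = X^2 is a deterministic function of X.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}

def E(g):
    """Exact expectation of g(x, y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

# Covariance is exactly zero ...
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
assert cov == 0

# ... yet X and Y are dependent: p(1, 1) != p_X(1) * p_Y(1).
p_x1 = sum(p for (x, _), p in joint.items() if x == 1)   # 1/3
p_y1 = sum(p for (_, y), p in joint.items() if y == 1)   # 2/3
assert joint[(1, 1)] != p_x1 * p_y1
```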

Independence is a common modeling assumption in actuarial work (e.g., assuming claim frequencies for unrelated policyholders are independent), but you should always check whether it's reasonable.

Covariance and correlation

Covariance and correlation measure the linear relationship between two random variables. They tell you whether the variables tend to increase together, move in opposite directions, or show no linear pattern.

Definition of covariance

The covariance between X and Y is:

Cov(X,Y) = E[(X - E[X])(Y - E[Y])]

The computationally easier form is:

Cov(X,Y) = E[XY] - E[X] \cdot E[Y]

Interpreting the sign:

  • Positive covariance: X and Y tend to be above (or below) their means at the same time
  • Negative covariance: When one is above its mean, the other tends to be below
  • Zero covariance: No linear tendency to move together

Note that the magnitude of covariance depends on the units and scales of X and Y, which makes it hard to compare across different pairs of variables. That's where correlation comes in.
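
Both forms of the covariance can be computed side by side on a hypothetical joint PMF to confirm they agree:

```python
# Hypothetical joint PMF of claim indicators (X, Y) for two related policies.
joint = {
    (0, 0): 0.5, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.3,
}

def E(g):
    """Expectation of g(x, y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)

# Definitional form: E[(X - E[X])(Y - E[Y])]
cov_def = E(lambda x, y: (x - ex) * (y - ey))

# Shortcut form: E[XY] - E[X] * E[Y]
cov_short = E(lambda x, y: x * y) - ex * ey

assert abs(cov_def - cov_short) < 1e-12
# Positive here: both policies tend to have claims in the same years.
assert cov_short > 0
```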


Covariance for independent variables

If X and Y are independent, then Cov(X,Y) = 0. This follows directly from the fact that E[XY] = E[X] \cdot E[Y] under independence.

But again: Cov(X,Y) = 0 does not imply independence. Zero covariance only rules out a linear relationship; nonlinear dependence can still exist.

Variance of sums of random variables

This formula comes up constantly in portfolio risk calculations:

Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X,Y)

For n random variables:

Var\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n Var(X_i) + 2\sum_{i < j} Cov(X_i, X_j)

When variables are independent, all covariance terms vanish and the variance of the sum equals the sum of the variances. When variables are positively correlated, the total variance is larger than the sum of individual variances, which is exactly why diversification matters in portfolio management.

Correlation coefficient

The correlation coefficient standardizes covariance to a unitless measure:

\rho(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}

Properties:

  • Always bounded: -1 \leq \rho(X,Y) \leq 1
  • \rho = 1: perfect positive linear relationship (Y = aX + b with a > 0)
  • \rho = -1: perfect negative linear relationship (Y = aX + b with a < 0)
  • \rho = 0: no linear relationship (but possibly nonlinear dependence)

Because correlation is dimensionless, you can meaningfully compare the strength of linear association across different pairs of variables.
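
The following sketch (with simulated data from an assumed toy model Y = X + noise) illustrates that rescaling a variable changes the covariance but leaves the correlation alone:

```python
import math
import random

random.seed(0)

# Simulated positively related pair under an assumed toy model:
# Y = X + independent noise, both standard normal.
xs = [random.gauss(0, 1) for _ in range(20000)]
ys = [x + random.gauss(0, 1) for x in xs]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

rho = corr(xs, ys)  # theoretically 1/sqrt(2) for this model

# Rescaling X (say, dollars -> thousands of dollars) divides the
# covariance by 1000 but leaves the correlation untouched.
xs_scaled = [x / 1000 for x in xs]
assert abs(cov(xs_scaled, ys) - cov(xs, ys) / 1000) < 1e-9
assert abs(corr(xs_scaled, ys) - rho) < 1e-9
```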

Correlation vs causation

A high correlation between two variables does not mean one causes the other. Common reasons for misleading correlations include:

  • Confounding variables: A third variable drives both. Ice cream sales and drowning incidents are correlated because both increase in summer, not because one causes the other.
  • Spurious correlations: With enough variables, some will appear correlated purely by chance.
  • Reverse causation: The direction of influence may be opposite to what you assume.

In actuarial modeling, always think carefully about whether an observed correlation reflects a genuine structural relationship or a statistical artifact.

Bivariate normal distribution

The bivariate normal distribution is the most commonly used continuous joint distribution for two variables. It's fully characterized by five parameters: the two means, the two variances (or standard deviations), and the correlation coefficient.

Joint probability density function

The joint PDF for (X, Y) following a bivariate normal distribution is:

f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)

where \mu_X, \mu_Y are the means, \sigma_X, \sigma_Y are the standard deviations, and \rho is the correlation coefficient.

The density surface is bell-shaped. When \rho = 0, the contours of equal density are circles (if \sigma_X = \sigma_Y) or axis-aligned ellipses. When \rho \neq 0, the ellipses are tilted, reflecting the linear association between the variables.

Marginal distributions

The marginal distributions of a bivariate normal are themselves normal:

  • X \sim N(\mu_X, \sigma_X^2)
  • Y \sim N(\mu_Y, \sigma_Y^2)

This is a nice property, but be careful with the converse: two individually normal random variables do not necessarily have a joint bivariate normal distribution.

Conditional distributions

The conditional distributions are also normal, which makes the bivariate normal especially tractable:

Y \mid X = x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1 - \rho^2)\sigma_Y^2\right)

Two things to notice here:

  • The conditional mean is a linear function of x. The slope \rho \frac{\sigma_Y}{\sigma_X} is the regression coefficient of Y on X.
  • The conditional variance (1 - \rho^2)\sigma_Y^2 does not depend on x. It's always smaller than the marginal variance \sigma_Y^2 (unless \rho = 0), reflecting the fact that knowing X reduces your uncertainty about Y.
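
A simulation sketch of these two facts, using the standard construction of a correlated pair from two independent N(0,1) draws and conditioning on a thin slice around x_0 = 1 (the parameters are illustrative):

```python
import math
import random

random.seed(1)

# Standard bivariate normal with correlation rho, built from two
# independent N(0,1) draws: X = Z1, Y = rho*Z1 + sqrt(1-rho^2)*Z2.
rho = 0.6
n = 200000
pairs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((z1, rho * z1 + math.sqrt(1 - rho * rho) * z2))

# Condition on X falling in a thin slice around x0 = 1.
x0, eps = 1.0, 0.05
slice_y = [y for x, y in pairs if abs(x - x0) < eps]

m = sum(slice_y) / len(slice_y)
v = sum((y - m) ** 2 for y in slice_y) / len(slice_y)

# Theory: E[Y | X = x0] = rho * x0 and Var(Y | X = x0) = 1 - rho^2.
assert abs(m - rho * x0) < 0.05
assert abs(v - (1 - rho * rho)) < 0.06
```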

Linear combinations of normal variables

If (X, Y) follows a bivariate normal distribution, then any linear combination Z = aX + bY is also normally distributed:

  • E[Z] = a\,E[X] + b\,E[Y]
  • Var(Z) = a^2\,Var(X) + b^2\,Var(Y) + 2ab\,Cov(X,Y)

This property extends to the multivariate normal and is heavily used in portfolio theory. If individual claim amounts are jointly normal, you can immediately characterize the distribution of the total.

Multivariate distributions

Multivariate distributions generalize joint distributions to three or more random variables. The same concepts (marginals, conditionals, independence, covariance) all extend naturally, though the notation gets heavier.

Joint probability functions

For discrete random variables X_1, X_2, \ldots, X_n:

p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)

For continuous random variables, the joint PDF f(x_1, x_2, \ldots, x_n) must satisfy non-negativity and integrate to 1 over the entire domain, just as in the bivariate case.

Marginal and conditional distributions

  • Marginals are obtained by summing or integrating out the variables you don't care about. For example, to get the marginal of X_1 from a trivariate distribution, integrate the joint PDF over x_2 and x_3.
  • Conditionals are obtained by dividing the joint by the appropriate marginal, exactly as in the bivariate case.

The law of total expectation and iterated conditioning extend to multiple variables as well.

Moments of multivariate distributions

The expectation of any function g(X_1, X_2, \ldots, X_n) is computed by weighting g against the joint distribution:

  • Discrete: E[g(X_1, \ldots, X_n)] = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n) \cdot p(x_1, \ldots, x_n)
  • Continuous: E[g(X_1, \ldots, X_n)] = \int \cdots \int g(x_1, \ldots, x_n) \cdot f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n

For n variables, the covariance structure is captured by the covariance matrix \Sigma, where the (i,j) entry is Cov(X_i, X_j). The diagonal entries are the variances. This matrix is symmetric and positive semi-definite.
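
A stdlib-only sketch that estimates a 3x3 covariance matrix from simulated data (the data-generating model is assumed for illustration) and checks the two matrix properties:

```python
import random

random.seed(3)

# Simulated observations of three hypothetical claim-related variables;
# X3 depends on X1 and X2, so the off-diagonal covariances are nonzero.
n = 20000
rows = []
for _ in range(n):
    x1 = random.gauss(0, 1)
    x2 = random.gauss(0, 2)
    x3 = 0.5 * x1 + 0.25 * x2 + random.gauss(0, 1)
    rows.append((x1, x2, x3))

means = [sum(r[i] for r in rows) / n for i in range(3)]

# Sample covariance matrix: entry (i, j) is Cov(X_i, X_j);
# the diagonal holds the variances.
sigma = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
          for j in range(3)] for i in range(3)]

# Symmetric ...
assert all(sigma[i][j] == sigma[j][i] for i in range(3) for j in range(3))

# ... and positive definite here: all leading principal minors are
# positive (Sylvester's criterion).
m1 = sigma[0][0]
m2 = sigma[0][0] * sigma[1][1] - sigma[0][1] ** 2
m3 = (sigma[0][0] * (sigma[1][1] * sigma[2][2] - sigma[1][2] ** 2)
      - sigma[0][1] * (sigma[0][1] * sigma[2][2] - sigma[1][2] * sigma[0][2])
      + sigma[0][2] * (sigma[0][1] * sigma[1][2] - sigma[1][1] * sigma[0][2]))
assert m1 > 0 and m2 > 0 and m3 > 0
```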

Transformations of random variables

Transformations create new random variables as functions of existing ones. The goal is to derive the distribution of the transformed variables from the original joint distribution.

Jacobian method

The Jacobian method applies to one-to-one transformations of continuous random variables. Suppose you transform (X, Y) \to (U, V) via U = g_1(X,Y) and V = g_2(X,Y).

Steps:

  1. Write the inverse transformation: express x and y in terms of u and v.
  2. Compute the Jacobian determinant of the inverse transformation:

J = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}

  3. The joint PDF of (U, V) is:

f_{U,V}(u,v) = f_{X,Y}(x(u,v),\; y(u,v)) \cdot |J|

  4. Determine the support (range of valid values) for (U, V) based on the transformation and the original support of (X, Y).

The absolute value of the Jacobian acts as a scaling factor that accounts for how the transformation stretches or compresses area in the (x,y) plane. If the transformation is not one-to-one, you need to partition the domain into regions where it is, apply the method to each region, and sum the results.
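
As a worked illustration (this example is added here, not from the text above): take X, Y independent Exp(1) and set U = X + Y, V = X/(X + Y). The inverse is x = uv, y = u(1 - v), with |J| = u, so f_{U,V}(u,v) = u e^{-u} on u > 0, 0 < v < 1. That is, U \sim Gamma(2, 1) independent of V \sim Uniform(0, 1). A Monte Carlo check of that derived density:

```python
import math
import random

random.seed(2)

# X, Y independent Exp(1); U = X + Y, V = X / (X + Y).
# Jacobian method gives f_{U,V}(u, v) = u * exp(-u) for u > 0, 0 < v < 1,
# i.e. U ~ Gamma(2, 1) and V ~ Uniform(0, 1), independent.
n = 200000
us, vs = [], []
for _ in range(n):
    x, y = random.expovariate(1.0), random.expovariate(1.0)
    us.append(x + y)
    vs.append(x / (x + y))

# P(U <= 1) under the derived density is the integral of u * exp(-u)
# from 0 to 1, which equals 1 - 2/e.
p_hat = sum(u <= 1 for u in us) / n
assert abs(p_hat - (1 - 2 * math.exp(-1))) < 0.01

# V should be uniform on (0, 1): its mean is 1/2.
assert abs(sum(vs) / n - 0.5) < 0.01
```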