📊Actuarial Mathematics Unit 1 Review

1.7 Joint distributions and covariance


Written by the Fiveable Content Team • Last updated August 2025

Joint distributions and covariance describe how multiple random variables interact. They let you model relationships between variables and calculate probabilities for complex events involving multiple outcomes.

For actuarial work, these tools are essential. Risks rarely depend on a single factor, so you need joint distributions to capture how variables move together. Covariance and correlation then quantify those relationships, which matters directly when assessing portfolio risk or pricing products that depend on correlated claims.

Joint probability distributions

A joint probability distribution gives you the probability of two or more random variables taking on specific combinations of values simultaneously. Rather than looking at each variable in isolation, you're capturing their combined behavior.

Joint distributions can be discrete (when the random variables take countable values) or continuous (when they take values over a continuous range).

Discrete joint distributions

When both random variables are discrete, you describe their joint behavior with a joint probability mass function (PMF). This function assigns a probability to each possible pair of values.

  • The joint PMF is written as p(x,y) = P(X = x, Y = y)
  • Every probability must be non-negative: p(x,y) \geq 0
  • The sum over all possible pairs must equal 1: \sum_x \sum_y p(x,y) = 1

As an example, consider two policyholders filing claims in a given year. The joint PMF would tell you the probability that Policyholder A files exactly 2 claims and Policyholder B files exactly 1 claim.
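
As a concrete sketch of these properties, here is a small Python check using a hypothetical joint PMF for the two policyholders' claim counts (the probabilities are made up for illustration):

```python
# Hypothetical joint PMF for two policyholders' annual claim counts.
# Keys are (a, b) = (claims by A, claims by B); values are probabilities.
joint_pmf = {
    (0, 0): 0.40, (0, 1): 0.15, (0, 2): 0.05,
    (1, 0): 0.15, (1, 1): 0.10, (1, 2): 0.03,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.02,
}

# Property 1: every probability is non-negative.
assert all(p >= 0 for p in joint_pmf.values())

# Property 2: probabilities over all pairs sum to 1.
assert abs(sum(joint_pmf.values()) - 1.0) < 1e-12

# P(A files exactly 2 claims and B files exactly 1 claim)
print(joint_pmf[(2, 1)])  # 0.05
```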

Continuous joint distributions

When both random variables are continuous, you use a joint probability density function (PDF) instead. The joint PDF describes the relative likelihood of different combinations of values, but individual point probabilities are zero (just like in the univariate case).

  • The double integral over the entire domain must equal 1
  • You get probabilities by integrating over a region, not by evaluating at a point

For instance, the joint distribution of claim amounts for two types of insurance policies would be modeled with a continuous joint PDF.

Joint probability density functions

The joint PDF for random variables X and Y is denoted f(x,y) and must satisfy two properties:

  • Non-negativity: f(x,y) \geq 0 for all x and y
  • Total probability equals 1: \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) \, dy \, dx = 1

To find the probability that (X,Y) falls in some region A, you integrate the joint PDF over that region:

P((X,Y) \in A) = \iint_A f(x,y) \, dy \, dx
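
A quick numerical sanity check of this region-integration idea, using an assumed joint PDF f(x, y) = 2 e^{-x - 2y} on x, y > 0 (the product of an Exp(1) and an Exp(2) density) and a midpoint Riemann sum:

```python
import math

# Assumed joint PDF: f(x, y) = 2 * exp(-x - 2y) for x, y >= 0.
def f(x, y):
    return 2.0 * math.exp(-x - 2.0 * y)

# Approximate P(X <= 1, Y <= 1) with a midpoint Riemann sum over [0,1] x [0,1].
n = 400
h = 1.0 / n
prob = sum(
    f((i + 0.5) * h, (j + 0.5) * h) * h * h
    for i in range(n) for j in range(n)
)

# Closed form for comparison: (1 - e^{-1}) * (1 - e^{-2}), since the
# density factors into independent exponential marginals.
exact = (1 - math.exp(-1)) * (1 - math.exp(-2))
assert abs(prob - exact) < 1e-4
```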

Joint cumulative distribution functions

The joint CDF gives the probability that both random variables are simultaneously at or below specified values:

F(x,y) = P(X \leq x, Y \leq y)

You can obtain it by integrating the joint PDF:

F(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u,v) \, dv \, du

Key properties of the joint CDF:

  • Non-decreasing in both arguments
  • Right-continuous in each variable
  • \lim_{x,y \to \infty} F(x,y) = 1
  • F(x,y) \to 0 as either x \to -\infty or y \to -\infty

Marginal distributions

A marginal distribution extracts the probability distribution of a single variable from a joint distribution. You're effectively "collapsing" the joint distribution along one dimension to focus on one variable at a time.

Marginal probability functions

Discrete case: Sum the joint PMF over all values of the other variable:

p_X(x) = \sum_y p(x,y) \qquad \text{and} \qquad p_Y(y) = \sum_x p(x,y)

Continuous case: Integrate the joint PDF over all values of the other variable:

f_X(x) = \int_{-\infty}^{\infty} f(x,y) \, dy \qquad \text{and} \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y) \, dx

Once you have the marginal distributions, you can compute expectations, variances, and probabilities for each variable individually.
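
The summing-out step can be sketched in a few lines of Python; the joint PMF here is hypothetical:

```python
from collections import defaultdict

# Hypothetical discrete joint PMF over (x, y) pairs.
joint = {
    (0, 0): 0.2, (0, 1): 0.3,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal of X: sum the joint PMF over y. Marginal of Y: sum over x.
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# Each marginal is itself a valid PMF.
assert abs(sum(p_x.values()) - 1.0) < 1e-12
assert abs(sum(p_y.values()) - 1.0) < 1e-12

# Spot checks: p_X(0) = 0.2 + 0.3 = 0.5, p_Y(1) = 0.3 + 0.4 = 0.7.
assert abs(p_x[0] - 0.5) < 1e-12
assert abs(p_y[1] - 0.7) < 1e-12
```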

Sums of random variables

The distribution of a sum Z = X + Y can be derived from the joint distribution.

Discrete case:

p_Z(z) = \sum_{x+y=z} p(x,y)

Continuous case (convolution):

f_Z(z) = \int_{-\infty}^{\infty} f(x, z-x) \, dx

The expectation and variance of the sum connect back to the individual moments and the covariance:

  • E[X + Y] = E[X] + E[Y] (always true)
  • Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X,Y)
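
Both facts can be verified numerically on a small hypothetical joint PMF: build the PMF of Z = X + Y by collecting probability over the lines x + y = z, then compare Var(Z) against Var(X) + Var(Y) + 2 Cov(X,Y):

```python
from collections import defaultdict

# Hypothetical joint PMF of (X, Y); deliberately not independent,
# so the covariance term matters.
joint = {
    (0, 0): 0.3, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.5,
}

# PMF of Z = X + Y: collect p(x, y) over all pairs with x + y = z.
p_z = defaultdict(float)
for (x, y), p in joint.items():
    p_z[x + y] += p

def E(g):
    """Expectation of g(x, y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
cov = E(lambda x, y: (x - ex) * (y - ey))

# Var(X + Y) computed directly from the PMF of Z ...
ez = sum(z * p for z, p in p_z.items())
var_z = sum((z - ez) ** 2 * p for z, p in p_z.items())

# ... matches Var(X) + Var(Y) + 2 Cov(X, Y).
assert abs(var_z - (var_x + var_y + 2 * cov)) < 1e-9
```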

Conditional distributions

A conditional distribution describes the behavior of one random variable given that you know the value of another. This is where you see how information about one variable changes your beliefs about the other.

Conditional probability functions

Discrete case: The conditional PMF of Y given X = x is:

p_{Y|X}(y|x) = \frac{p(x,y)}{p_X(x)}, \quad \text{provided } p_X(x) > 0

Continuous case: The conditional PDF of Y given X = x is:

f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}, \quad \text{provided } f_X(x) > 0

Notice the structure: you take the joint distribution and divide by the marginal of the variable you're conditioning on. This is a direct application of the definition of conditional probability.

Expectations of conditional distributions

The conditional expectation of Y given X = x is computed using the conditional distribution:

  • Discrete: E[Y|X=x] = \sum_y y \cdot p_{Y|X}(y|x)
  • Continuous: E[Y|X=x] = \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x) \, dy

A powerful result here is the law of total expectation (also called the tower property):

E[Y] = E[E[Y|X]]

This says you can compute the overall expectation of Y by first finding the conditional expectation E[Y|X] (which is a function of X), and then taking its expectation over the distribution of X. This technique shows up frequently in actuarial pricing and reserving.
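
A minimal check of the tower property on a hypothetical discrete joint PMF:

```python
from collections import defaultdict

# Hypothetical joint PMF of (X, Y).
joint = {
    (0, 0): 0.25, (0, 2): 0.25,
    (1, 1): 0.30, (1, 3): 0.20,
}

# Marginal of X.
p_x = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p

# Conditional expectation E[Y | X = x] = sum_y y * p(x, y) / p_X(x).
def cond_exp_y(x):
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]

# Tower property: E[Y] = E[E[Y | X]].
e_y_direct = sum(y * p for (_, y), p in joint.items())
e_y_tower = sum(cond_exp_y(x) * px for x, px in p_x.items())
assert abs(e_y_direct - e_y_tower) < 1e-12
```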

Independent random variables

Two random variables are independent if knowing the value of one gives you no information about the other. Formally, their joint distribution factors into the product of their marginals.

Definition of independence

Discrete: X and Y are independent if and only if:

p(x,y) = p_X(x) \cdot p_Y(y) \quad \text{for all } x, y

Continuous: X and Y are independent if and only if:

f(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y

Equivalently, independence means the conditional distribution equals the marginal:

  • p_{Y|X}(y|x) = p_Y(y) (discrete)
  • f_{Y|X}(y|x) = f_Y(y) (continuous)

This factorization must hold for every pair of values, not just some of them.

Properties of independent variables

If X and Y are independent, the following hold:

  • E[XY] = E[X] \cdot E[Y]
  • Var(X+Y) = Var(X) + Var(Y)
  • Cov(X,Y) = 0

A critical warning: the converses are not generally true. Two variables can have zero covariance and still be dependent. Covariance only captures linear association. For example, if X \sim N(0,1) and Y = X^2, then Cov(X,Y) = 0 but X and Y are clearly dependent.
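
A discrete analogue of that example can be checked with exact arithmetic: take X uniform on {-1, 0, 1} and Y = X^2. The covariance is exactly zero even though Y is a deterministic function of X:

```python
from fractions import Fraction

# Discrete analogue of the "Y = X^2" example: X is uniform on
# {-1, 0, 1}, and Y = X^2 is a deterministic function of X.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}

def E(g):
    """Exact expectation of g(x, y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

# Covariance is exactly zero ...
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
assert cov == 0

# ... yet X and Y are dependent: p(1, 1) != p_X(1) * p_Y(1).
p_x1 = sum(p for (x, _), p in joint.items() if x == 1)   # 1/3
p_y1 = sum(p for (_, y), p in joint.items() if y == 1)   # 2/3
assert joint[(1, 1)] != p_x1 * p_y1
```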

Independence is a common modeling assumption in actuarial work (e.g., assuming claim frequencies for unrelated policyholders are independent), but you should always check whether it's reasonable.

Covariance and correlation

Covariance and correlation measure the linear relationship between two random variables. They tell you whether the variables tend to increase together, move in opposite directions, or show no linear pattern.

Definition of covariance

The covariance between X and Y is:

Cov(X,Y) = E[(X - E[X])(Y - E[Y])]

The computationally easier form is:

Cov(X,Y) = E[XY] - E[X] \cdot E[Y]

Interpreting the sign:

  • Positive covariance: X and Y tend to be above (or below) their means at the same time
  • Negative covariance: When one is above its mean, the other tends to be below
  • Zero covariance: No linear tendency to move together

Note that the magnitude of covariance depends on the units and scales of X and Y, which makes it hard to compare across different pairs of variables. That's where correlation comes in.
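
Both forms of the covariance can be computed side by side on a hypothetical joint PMF to confirm they agree:

```python
# Hypothetical joint PMF of claim indicators (X, Y) for two related policies.
joint = {
    (0, 0): 0.5, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.3,
}

def E(g):
    """Expectation of g(x, y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)

# Definitional form: E[(X - E[X])(Y - E[Y])]
cov_def = E(lambda x, y: (x - ex) * (y - ey))

# Shortcut form: E[XY] - E[X] * E[Y]
cov_short = E(lambda x, y: x * y) - ex * ey

assert abs(cov_def - cov_short) < 1e-12
# Positive here: both policies tend to have claims in the same years.
assert cov_short > 0
```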


Covariance for independent variables

If X and Y are independent, then Cov(X,Y) = 0. This follows directly from the fact that E[XY] = E[X] \cdot E[Y] under independence.

But again: Cov(X,Y) = 0 does not imply independence. Zero covariance only rules out a linear relationship; nonlinear dependence can still exist.

Variance of sums of random variables

This formula comes up constantly in portfolio risk calculations:

Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X,Y)

For n random variables:

Var\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n Var(X_i) + 2\sum_{i < j} Cov(X_i, X_j)

When variables are independent, all covariance terms vanish and the variance of the sum equals the sum of the variances. When variables are positively correlated, the total variance is larger than the sum of individual variances, which is exactly why diversification matters in portfolio management.

Correlation coefficient

The correlation coefficient standardizes covariance to a unitless measure:

\rho(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}

Properties:

  • Always bounded: -1 \leq \rho(X,Y) \leq 1
  • \rho = 1: perfect positive linear relationship (Y = aX + b with a > 0)
  • \rho = -1: perfect negative linear relationship (Y = aX + b with a < 0)
  • \rho = 0: no linear relationship (but possibly nonlinear dependence)

Because correlation is dimensionless, you can meaningfully compare the strength of linear association across different pairs of variables.
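
The following sketch (with simulated data from an assumed toy model Y = X + noise) illustrates that rescaling a variable changes the covariance but leaves the correlation alone:

```python
import math
import random

random.seed(0)

# Simulated positively related pair under an assumed toy model:
# Y = X + independent noise, both standard normal.
xs = [random.gauss(0, 1) for _ in range(20000)]
ys = [x + random.gauss(0, 1) for x in xs]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

rho = corr(xs, ys)  # theoretically 1/sqrt(2) for this model

# Rescaling X (say, dollars -> thousands of dollars) divides the
# covariance by 1000 but leaves the correlation untouched.
xs_scaled = [x / 1000 for x in xs]
assert abs(cov(xs_scaled, ys) - cov(xs, ys) / 1000) < 1e-9
assert abs(corr(xs_scaled, ys) - rho) < 1e-9
```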

Correlation vs causation

A high correlation between two variables does not mean one causes the other. Common reasons for misleading correlations include:

  • Confounding variables: A third variable drives both. Ice cream sales and drowning incidents are correlated because both increase in summer, not because one causes the other.
  • Spurious correlations: With enough variables, some will appear correlated purely by chance.
  • Reverse causation: The direction of influence may be opposite to what you assume.

In actuarial modeling, always think carefully about whether an observed correlation reflects a genuine structural relationship or a statistical artifact.

Bivariate normal distribution

The bivariate normal distribution is the most commonly used continuous joint distribution for two variables. It's fully characterized by five parameters: the two means, the two variances (or standard deviations), and the correlation coefficient.

Joint probability density function

The joint PDF for (X, Y) following a bivariate normal distribution is:

f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)

where \mu_X, \mu_Y are the means, \sigma_X, \sigma_Y are the standard deviations, and \rho is the correlation coefficient.

The density surface is bell-shaped. When \rho = 0, the contours of equal density are circles (if \sigma_X = \sigma_Y) or axis-aligned ellipses. When \rho \neq 0, the ellipses are tilted, reflecting the linear association between the variables.

Marginal distributions

The marginal distributions of a bivariate normal are themselves normal:

  • X \sim N(\mu_X, \sigma_X^2)
  • Y \sim N(\mu_Y, \sigma_Y^2)

This is a nice property, but be careful with the converse: two individually normal random variables do not necessarily have a joint bivariate normal distribution.

Conditional distributions

The conditional distributions are also normal, which makes the bivariate normal especially tractable:

Y \mid X = x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1 - \rho^2)\sigma_Y^2\right)

Two things to notice here:

  • The conditional mean is a linear function of x. The slope \rho \frac{\sigma_Y}{\sigma_X} is the regression coefficient of Y on X.
  • The conditional variance (1 - \rho^2)\sigma_Y^2 does not depend on x. It's always smaller than the marginal variance \sigma_Y^2 (unless \rho = 0), reflecting the fact that knowing X reduces your uncertainty about Y.
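
A simulation sketch of these two facts, using the standard construction of a correlated pair from two independent N(0,1) draws and conditioning on a thin slice around x_0 = 1 (the parameters are illustrative):

```python
import math
import random

random.seed(1)

# Standard bivariate normal with correlation rho, built from two
# independent N(0,1) draws: X = Z1, Y = rho*Z1 + sqrt(1-rho^2)*Z2.
rho = 0.6
n = 200000
pairs = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((z1, rho * z1 + math.sqrt(1 - rho * rho) * z2))

# Condition on X falling in a thin slice around x0 = 1.
x0, eps = 1.0, 0.05
slice_y = [y for x, y in pairs if abs(x - x0) < eps]

m = sum(slice_y) / len(slice_y)
v = sum((y - m) ** 2 for y in slice_y) / len(slice_y)

# Theory: E[Y | X = x0] = rho * x0 and Var(Y | X = x0) = 1 - rho^2.
assert abs(m - rho * x0) < 0.05
assert abs(v - (1 - rho * rho)) < 0.06
```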

Linear combinations of normal variables

If (X, Y) follows a bivariate normal distribution, then any linear combination Z = aX + bY is also normally distributed:

  • E[Z] = a\,E[X] + b\,E[Y]
  • Var(Z) = a^2\,Var(X) + b^2\,Var(Y) + 2ab\,Cov(X,Y)

This property extends to the multivariate normal and is heavily used in portfolio theory. If individual claim amounts are jointly normal, you can immediately characterize the distribution of the total.

Multivariate distributions

Multivariate distributions generalize joint distributions to three or more random variables. The same concepts (marginals, conditionals, independence, covariance) all extend naturally, though the notation gets heavier.

Joint probability functions

For discrete random variables X_1, X_2, \ldots, X_n:

p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)

For continuous random variables, the joint PDF f(x_1, x_2, \ldots, x_n) must satisfy non-negativity and integrate to 1 over the entire domain, just as in the bivariate case.

Marginal and conditional distributions

  • Marginals are obtained by summing or integrating out the variables you don't care about. For example, to get the marginal of X_1 from a trivariate distribution, integrate the joint PDF over x_2 and x_3.
  • Conditionals are obtained by dividing the joint by the appropriate marginal, exactly as in the bivariate case.

The law of total expectation and iterated conditioning extend to multiple variables as well.

Moments of multivariate distributions

The expectation of any function g(X_1, X_2, \ldots, X_n) is computed by weighting g against the joint distribution:

  • Discrete: E[g(X_1, \ldots, X_n)] = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n) \cdot p(x_1, \ldots, x_n)
  • Continuous: E[g(X_1, \ldots, X_n)] = \int \cdots \int g(x_1, \ldots, x_n) \cdot f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n

For n variables, the covariance structure is captured by the covariance matrix \Sigma, where the (i,j) entry is Cov(X_i, X_j). The diagonal entries are the variances. This matrix is symmetric and positive semi-definite.
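
A stdlib-only sketch that estimates a 3x3 covariance matrix from simulated data (the data-generating model is assumed for illustration) and checks the two matrix properties:

```python
import random

random.seed(3)

# Simulated observations of three hypothetical claim-related variables;
# X3 depends on X1 and X2, so the off-diagonal covariances are nonzero.
n = 20000
rows = []
for _ in range(n):
    x1 = random.gauss(0, 1)
    x2 = random.gauss(0, 2)
    x3 = 0.5 * x1 + 0.25 * x2 + random.gauss(0, 1)
    rows.append((x1, x2, x3))

means = [sum(r[i] for r in rows) / n for i in range(3)]

# Sample covariance matrix: entry (i, j) is Cov(X_i, X_j);
# the diagonal holds the variances.
sigma = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
          for j in range(3)] for i in range(3)]

# Symmetric ...
assert all(sigma[i][j] == sigma[j][i] for i in range(3) for j in range(3))

# ... and positive definite here: all leading principal minors are
# positive (Sylvester's criterion).
m1 = sigma[0][0]
m2 = sigma[0][0] * sigma[1][1] - sigma[0][1] ** 2
m3 = (sigma[0][0] * (sigma[1][1] * sigma[2][2] - sigma[1][2] ** 2)
      - sigma[0][1] * (sigma[0][1] * sigma[2][2] - sigma[1][2] * sigma[0][2])
      + sigma[0][2] * (sigma[0][1] * sigma[1][2] - sigma[1][1] * sigma[0][2]))
assert m1 > 0 and m2 > 0 and m3 > 0
```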

Transformations of random variables

Transformations create new random variables as functions of existing ones. The goal is to derive the distribution of the transformed variables from the original joint distribution.

Jacobian method

The Jacobian method applies to one-to-one transformations of continuous random variables. Suppose you transform (X, Y) \to (U, V) via U = g_1(X,Y) and V = g_2(X,Y).

Steps:

  1. Write the inverse transformation: express x and y in terms of u and v.
  2. Compute the Jacobian determinant of the inverse transformation:

J = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}

  3. The joint PDF of (U, V) is:

f_{U,V}(u,v) = f_{X,Y}(x(u,v),\; y(u,v)) \cdot |J|

  4. Determine the support (range of valid values) for (U, V) based on the transformation and the original support of (X, Y).

The absolute value of the Jacobian acts as a scaling factor that accounts for how the transformation stretches or compresses area in the (x,y) plane. If the transformation is not one-to-one, you need to partition the domain into regions where it is, apply the method to each region, and sum the results.
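
As a worked illustration (this example is added here, not from the text above): take X, Y independent Exp(1) and set U = X + Y, V = X/(X + Y). The inverse is x = uv, y = u(1 - v), with |J| = u, so f_{U,V}(u,v) = u e^{-u} on u > 0, 0 < v < 1. That is, U \sim Gamma(2, 1) independent of V \sim Uniform(0, 1). A Monte Carlo check of that derived density:

```python
import math
import random

random.seed(2)

# X, Y independent Exp(1); U = X + Y, V = X / (X + Y).
# Jacobian method gives f_{U,V}(u, v) = u * exp(-u) for u > 0, 0 < v < 1,
# i.e. U ~ Gamma(2, 1) and V ~ Uniform(0, 1), independent.
n = 200000
us, vs = [], []
for _ in range(n):
    x, y = random.expovariate(1.0), random.expovariate(1.0)
    us.append(x + y)
    vs.append(x / (x + y))

# P(U <= 1) under the derived density is the integral of u * exp(-u)
# from 0 to 1, which equals 1 - 2/e.
p_hat = sum(u <= 1 for u in us) / n
assert abs(p_hat - (1 - 2 * math.exp(-1))) < 0.01

# V should be uniform on (0, 1): its mean is 1/2.
assert abs(sum(vs) / n - 0.5) < 0.01
```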