Joint distributions and covariance describe how multiple random variables interact. They let you model relationships between variables and calculate probabilities for complex events involving multiple outcomes.
For actuarial work, these tools are essential. Risks rarely depend on a single factor, so you need joint distributions to capture how variables move together. Covariance and correlation then quantify those relationships, which matters directly when assessing portfolio risk or pricing products that depend on correlated claims.
Joint probability distributions
A joint probability distribution gives you the probability of two or more random variables taking on specific combinations of values simultaneously. Rather than looking at each variable in isolation, you're capturing their combined behavior.
Joint distributions can be discrete (when the random variables take countable values) or continuous (when they take values over a continuous range).
Discrete joint distributions
When both random variables are discrete, you describe their joint behavior with a joint probability mass function (PMF). This function assigns a probability to each possible pair of values.
- The joint PMF is written as $p_{X,Y}(x, y) = P(X = x, Y = y)$
- Every probability must be non-negative: $p_{X,Y}(x, y) \geq 0$ for all $(x, y)$
- The sum over all possible pairs must equal 1: $\sum_x \sum_y p_{X,Y}(x, y) = 1$
As an example, consider two policyholders filing claims in a given year. The joint PMF would tell you the probability that Policyholder A files exactly 2 claims and Policyholder B files exactly 1 claim.
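This two-policyholder setup can be sketched as a small lookup table of probabilities. The numbers below are purely illustrative, not from any real portfolio:

```python
# Hypothetical joint PMF for two policyholders' annual claim counts.
# Keys are (a, b) = (claims by A, claims by B); values are probabilities.
joint_pmf = {
    (0, 0): 0.50, (0, 1): 0.15,
    (1, 0): 0.15, (1, 1): 0.10,
    (2, 0): 0.05, (2, 1): 0.05,
}

# A valid joint PMF must sum to 1 over all pairs.
total = sum(joint_pmf.values())

# P(A files exactly 2 claims and B files exactly 1 claim)
p_2_and_1 = joint_pmf[(2, 1)]
```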
Continuous joint distributions
When both random variables are continuous, you use a joint probability density function (PDF) instead. The joint PDF describes the relative likelihood of different combinations of values, but individual point probabilities are zero (just like in the univariate case).
- The double integral over the entire domain must equal 1
- You get probabilities by integrating over a region, not by evaluating at a point
For instance, the joint distribution of claim amounts for two types of insurance policies would be modeled with a continuous joint PDF.
Joint probability density functions
The joint PDF for random variables $X$ and $Y$ is denoted $f_{X,Y}(x, y)$ and must satisfy two properties:
- Non-negativity: $f_{X,Y}(x, y) \geq 0$ for all $x$ and $y$
- Total probability equals 1: $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx \, dy = 1$
To find the probability that $(X, Y)$ falls in some region $A$, you integrate the joint PDF over that region:
$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y) \, dx \, dy$$
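As a numerical illustration of integrating over a region, here is a midpoint Riemann-sum approximation of $P(X + Y < 1)$ for two independent Exponential(1) variables — an assumed example chosen because the exact answer, $1 - 2/e$, is easy to verify:

```python
import math

# Joint PDF of two independent Exponential(1) variables (assumed example).
def f(x, y):
    return math.exp(-x - y) if x >= 0 and y >= 0 else 0.0

# Region A = {(x, y) : x + y < 1}; exact probability is 1 - 2/e ≈ 0.2642.
n = 400
h = 1.0 / n
prob = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h   # midpoint of each grid cell
        y = (j + 0.5) * h
        if x + y < 1:
            prob += f(x, y) * h * h
```

The accuracy is limited mainly by the jagged approximation of the boundary $x + y = 1$; a finer grid (or an adaptive integrator) tightens it.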
Joint cumulative distribution functions
The joint CDF gives the probability that both random variables are simultaneously at or below specified values:
$$F_{X,Y}(x, y) = P(X \leq x, Y \leq y)$$
You can obtain it by integrating the joint PDF:
$$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v) \, du \, dv$$
Key properties of the joint CDF:
- Non-decreasing in both arguments
- Right-continuous in each variable
- $F_{X,Y}(x, y) \to 0$ as either $x \to -\infty$ or $y \to -\infty$, and $F_{X,Y}(x, y) \to 1$ as both $x \to \infty$ and $y \to \infty$
Marginal distributions
A marginal distribution extracts the probability distribution of a single variable from a joint distribution. You're effectively "collapsing" the joint distribution along one dimension to focus on one variable at a time.
Marginal probability functions
Discrete case: Sum the joint PMF over all values of the other variable: $p_X(x) = \sum_y p_{X,Y}(x, y)$
Continuous case: Integrate the joint PDF over all values of the other variable: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy$
Once you have the marginal distributions, you can compute expectations, variances, and probabilities for each variable individually.
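A minimal sketch of extracting marginals, assuming the joint PMF is stored as a matrix (the probabilities are illustrative):

```python
import numpy as np

# Joint PMF as a matrix: rows index values of X, columns index values of Y.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.15, 0.25, 0.20],
])

p_x = joint.sum(axis=1)  # marginal of X: sum out Y (across each row)
p_y = joint.sum(axis=0)  # marginal of Y: sum out X (down each column)
# Each marginal is itself a valid PMF: non-negative and summing to 1.
```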
Sums of random variables
The distribution of a sum can be derived from the joint distribution.
Discrete case: $P(X + Y = s) = \sum_x p_{X,Y}(x, s - x)$
Continuous case (convolution): $f_{X+Y}(s) = \int_{-\infty}^{\infty} f_{X,Y}(x, s - x) \, dx$
The expectation and variance of the sum connect back to the individual moments and the covariance:
- $E[X + Y] = E[X] + E[Y]$ (always true)
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
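These identities can be checked numerically on a small illustrative joint PMF (the probabilities are assumed for the example):

```python
import numpy as np

# Illustrative joint PMF: rows are x in {0, 1}, columns are y in {0, 1, 2}.
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.15, 0.25, 0.20],
])
xs = np.array([0, 1])
ys = np.array([0, 1, 2])

# P(S = s): sum p(x, y) over all pairs with x + y = s
p_s = np.zeros(4)
for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        p_s[x + y] += joint[i, j]

# Moments computed straight from the joint distribution
p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
ex, ey = xs @ p_x, ys @ p_y
exy = sum(x * y * joint[i, j] for i, x in enumerate(xs) for j, y in enumerate(ys))
cov = exy - ex * ey
var_x = xs**2 @ p_x - ex**2
var_y = ys**2 @ p_y - ey**2

s_vals = np.arange(4)
es = s_vals @ p_s
var_s = s_vals**2 @ p_s - es**2
# E[S] should equal E[X] + E[Y]; Var(S) should equal Var(X) + Var(Y) + 2 Cov(X, Y)
```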
Conditional distributions
A conditional distribution describes the behavior of one random variable given that you know the value of another. This is where you see how information about one variable changes your beliefs about the other.
Conditional probability functions
Discrete case: The conditional PMF of $Y$ given $X = x$ is: $p_{Y|X}(y \mid x) = \dfrac{p_{X,Y}(x, y)}{p_X(x)}$, defined wherever $p_X(x) > 0$
Continuous case: The conditional PDF of $Y$ given $X = x$ is: $f_{Y|X}(y \mid x) = \dfrac{f_{X,Y}(x, y)}{f_X(x)}$, defined wherever $f_X(x) > 0$
Notice the structure: you take the joint distribution and divide by the marginal of the variable you're conditioning on. This is a direct application of the definition of conditional probability.
Expectations of conditional distributions
The conditional expectation of $Y$ given $X = x$ is computed using the conditional distribution:
- Discrete: $E[Y \mid X = x] = \sum_y y \, p_{Y|X}(y \mid x)$
- Continuous: $E[Y \mid X = x] = \int_{-\infty}^{\infty} y \, f_{Y|X}(y \mid x) \, dy$
A powerful result here is the law of total expectation (also called the tower property):
$$E[Y] = E[E[Y \mid X]]$$
This says you can compute the overall expectation of $Y$ by first finding the conditional expectation $E[Y \mid X]$ (which is a function of $X$), and then taking its expectation over the distribution of $X$. This technique shows up frequently in actuarial pricing and reserving.
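The tower property can be verified directly on an illustrative discrete joint PMF:

```python
import numpy as np

# Illustrative joint PMF: rows are x in {0, 1}, columns are y in {0, 1, 2}.
joint = np.array([
    [0.10, 0.20, 0.10],   # x = 0
    [0.15, 0.25, 0.20],   # x = 1
])
ys = np.array([0, 1, 2])

p_x = joint.sum(axis=1)
# E[Y | X = x] for each x: divide each row by the marginal of X, then average y.
e_y_given_x = (joint @ ys) / p_x

# Tower property: average the conditional expectations over the distribution of X...
e_y_tower = e_y_given_x @ p_x
# ...and compare with E[Y] computed directly from the marginal of Y.
e_y_direct = ys @ joint.sum(axis=0)
```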
Independent random variables
Two random variables are independent if knowing the value of one gives you no information about the other. Formally, their joint distribution factors into the product of their marginals.
Definition of independence
Discrete: $X$ and $Y$ are independent if and only if: $p_{X,Y}(x, y) = p_X(x) \, p_Y(y)$ for all $x, y$
Continuous: $X$ and $Y$ are independent if and only if: $f_{X,Y}(x, y) = f_X(x) \, f_Y(y)$ for all $x, y$
Equivalently, independence means the conditional distribution equals the marginal:
- $p_{Y|X}(y \mid x) = p_Y(y)$ (discrete)
- $f_{Y|X}(y \mid x) = f_Y(y)$ (continuous)
This factorization must hold for every pair of values, not just some of them.
Properties of independent variables
If $X$ and $Y$ are independent, the following hold:
- $E[XY] = E[X] \, E[Y]$
- $\mathrm{Cov}(X, Y) = 0$
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
A critical warning: the converses are not generally true. Two variables can have zero covariance and still be dependent. Covariance only captures linear association. For example, if $X$ is symmetric about 0 and $Y = X^2$, then $\mathrm{Cov}(X, Y) = 0$ but $X$ and $Y$ are clearly dependent.
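A concrete numerical check of zero covariance without independence, using $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$ (a standard illustrative choice):

```python
# X is uniform on {-1, 0, 1} and Y = X^2, so Y is a deterministic function of X.
xs = [-1, 0, 1]
p = 1 / 3

e_x = sum(x * p for x in xs)            # E[X]  = 0 by symmetry
e_y = sum(x**2 * p for x in xs)         # E[Y]  = 2/3
e_xy = sum(x * x**2 * p for x in xs)    # E[XY] = E[X^3] = 0 by symmetry
cov = e_xy - e_x * e_y                  # exactly 0, yet Y is determined by X

# Dependence is obvious: P(Y = 1 | X = 1) = 1, but P(Y = 1) = 2/3.
```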
Independence is a common modeling assumption in actuarial work (e.g., assuming claim frequencies for unrelated policyholders are independent), but you should always check whether it's reasonable.
Covariance and correlation
Covariance and correlation measure the linear relationship between two random variables. They tell you whether the variables tend to increase together, move in opposite directions, or show no linear pattern.
Definition of covariance
The covariance between $X$ and $Y$ is:
$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$
The computationally easier form is:
$$\mathrm{Cov}(X, Y) = E[XY] - E[X] \, E[Y]$$
Interpreting the sign:
- Positive covariance: $X$ and $Y$ tend to be above (or below) their means at the same time
- Negative covariance: When one is above its mean, the other tends to be below
- Zero covariance: No linear tendency to move together
Note that the magnitude of covariance depends on the units and scales of $X$ and $Y$, which makes it hard to compare across different pairs of variables. That's where correlation comes in.
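Both covariance formulas give the same answer; here is a quick check on illustrative paired data, treated as an equally weighted empirical distribution:

```python
# Illustrative paired observations, each with probability 1/n.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 5.0]
n = len(xs)

mx = sum(xs) / n
my = sum(ys) / n

# Definition form: E[(X - E[X])(Y - E[Y])]
cov_def = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
# Shortcut form: E[XY] - E[X]E[Y]
cov_short = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
```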

Covariance for independent variables
If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$. This follows directly from the fact that $E[XY] = E[X] \, E[Y]$ under independence.
But again: $\mathrm{Cov}(X, Y) = 0$ does not imply independence. Zero covariance only rules out a linear relationship; nonlinear dependence can still exist.
Variance of sums of random variables
This formula comes up constantly in portfolio risk calculations:
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$$
For $n$ random variables:
$$\mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{i < j} \mathrm{Cov}(X_i, X_j)$$
When variables are independent, all covariance terms vanish and the variance of the sum equals the sum of the variances. When variables are positively correlated, the total variance is larger than the sum of individual variances, which is exactly why diversification matters in portfolio management.
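A sketch of the portfolio calculation, using an assumed $3 \times 3$ covariance matrix for three correlated risks (the numbers are illustrative):

```python
import numpy as np

# Assumed covariance matrix of three correlated risks.
sigma = np.array([
    [4.0, 1.0, 0.5],
    [1.0, 9.0, 2.0],
    [0.5, 2.0, 1.0],
])

# Var(X1 + X2 + X3) = 1' Σ 1, i.e. the sum of all entries of Σ.
ones = np.ones(3)
var_total = ones @ sigma @ ones

# Equivalent decomposition: sum of variances plus twice the pairwise covariances.
var_diag = np.trace(sigma)
cov_sum = (sigma.sum() - var_diag) / 2
```

Setting the off-diagonal entries to zero recovers the independent case, where the total variance is just the trace.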
Correlation coefficient
The correlation coefficient standardizes covariance to a unitless measure:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Properties:
- Always bounded: $-1 \leq \rho_{X,Y} \leq 1$
- $\rho_{X,Y} = 1$: perfect positive linear relationship ($Y = aX + b$ with $a > 0$)
- $\rho_{X,Y} = -1$: perfect negative linear relationship ($Y = aX + b$ with $a < 0$)
- $\rho_{X,Y} = 0$: no linear relationship (but possibly nonlinear dependence)
Because correlation is dimensionless, you can meaningfully compare the strength of linear association across different pairs of variables.
Correlation vs causation
A high correlation between two variables does not mean one causes the other. Common reasons for misleading correlations include:
- Confounding variables: A third variable drives both. Ice cream sales and drowning incidents are correlated because both increase in summer, not because one causes the other.
- Spurious correlations: With enough variables, some will appear correlated purely by chance.
- Reverse causation: The direction of influence may be opposite to what you assume.
In actuarial modeling, always think carefully about whether an observed correlation reflects a genuine structural relationship or a statistical artifact.
Bivariate normal distribution
The bivariate normal distribution is the most commonly used continuous joint distribution for two variables. It's fully characterized by five parameters: the two means, the two variances (or standard deviations), and the correlation coefficient.
Joint probability density function
The joint PDF for $(X, Y)$ following a bivariate normal distribution is:
$$f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right] \right)$$
where $\mu_X, \mu_Y$ are the means, $\sigma_X, \sigma_Y$ are the standard deviations, and $\rho$ is the correlation coefficient.
The density surface is bell-shaped. When $\rho = 0$, the contours of equal density are circles (if $\sigma_X = \sigma_Y$) or axis-aligned ellipses. When $\rho \neq 0$, the ellipses are tilted, reflecting the linear association between the variables.
Marginal distributions
The marginal distributions of a bivariate normal are themselves normal:
$$X \sim N(\mu_X, \sigma_X^2), \qquad Y \sim N(\mu_Y, \sigma_Y^2)$$
This is a nice property, but be careful with the converse: two individually normal random variables do not necessarily have a joint bivariate normal distribution.
Conditional distributions
The conditional distributions are also normal, which makes the bivariate normal especially tractable:
$$Y \mid X = x \sim N\left( \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X), \; \sigma_Y^2 (1 - \rho^2) \right)$$
Two things to notice here:
- The conditional mean is a linear function of $x$. The slope $\rho \, \sigma_Y / \sigma_X$ is the regression coefficient of $Y$ on $X$.
- The conditional variance $\sigma_Y^2 (1 - \rho^2)$ does not depend on $x$. It's always smaller than the marginal variance (unless $\rho = 0$), reflecting the fact that knowing $X$ reduces your uncertainty about $Y$.
Linear combinations of normal variables
If $(X, Y)$ follows a bivariate normal distribution, then any linear combination $aX + bY$ is also normally distributed:
$$aX + bY \sim N\left( a\mu_X + b\mu_Y, \; a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab\,\rho\,\sigma_X \sigma_Y \right)$$
This property extends to the multivariate normal and is heavily used in portfolio theory. If individual claim amounts are jointly normal, you can immediately characterize the distribution of the total.
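A simulation sketch of the linear-combination result, with all parameter values assumed for illustration:

```python
import numpy as np

# Assumed bivariate normal parameters.
mu_x, mu_y = 1.0, 2.0
sig_x, sig_y, rho = 2.0, 3.0, 0.5
cov = np.array([
    [sig_x**2,            rho * sig_x * sig_y],
    [rho * sig_x * sig_y, sig_y**2],
])

rng = np.random.default_rng(0)
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)

# Linear combination Z = 2X - Y
a, b = 2.0, -1.0
z = a * xy[:, 0] + b * xy[:, 1]

# Analytic mean and variance of aX + bY
mean_theory = a * mu_x + b * mu_y
var_theory = a**2 * sig_x**2 + b**2 * sig_y**2 + 2 * a * b * rho * sig_x * sig_y
# The sample mean and variance of z should be close to these values.
```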
Multivariate distributions
Multivariate distributions generalize joint distributions to three or more random variables. The same concepts (marginals, conditionals, independence, covariance) all extend naturally, though the notation gets heavier.
Joint probability functions
For discrete random variables $X_1, \dots, X_n$, the joint PMF is:
$$p(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n)$$
For continuous random variables, the joint PDF must satisfy non-negativity and integrate to 1 over the entire domain, just as in the bivariate case.
Marginal and conditional distributions
- Marginals are obtained by summing or integrating out the variables you don't care about. For example, to get the marginal of $X_1$ from a trivariate distribution, integrate the joint PDF over $x_2$ and $x_3$.
- Conditionals are obtained by dividing the joint by the appropriate marginal, exactly as in the bivariate case.
The law of total expectation and iterated conditioning extend to multiple variables as well.
Moments of multivariate distributions
The expectation of any function $g(X_1, \dots, X_n)$ is computed by weighting $g$ against the joint distribution:
- Discrete: $E[g(X_1, \dots, X_n)] = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \dots, x_n) \, p(x_1, \dots, x_n)$
- Continuous: $E[g(X_1, \dots, X_n)] = \int \cdots \int g(x_1, \dots, x_n) \, f(x_1, \dots, x_n) \, dx_1 \cdots dx_n$
For $n$ variables, the covariance structure is captured by the covariance matrix $\Sigma$, where the $(i, j)$ entry is $\mathrm{Cov}(X_i, X_j)$. The diagonal entries are the variances. This matrix is symmetric and positive semi-definite.
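The symmetry and positive semi-definiteness can be checked on an empirical covariance matrix built from illustrative data:

```python
import numpy as np

# Illustrative data: 5 observations of 3 variables (rows are observations).
data = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 2.5],
    [4.0, 3.0, 2.0],
    [5.0, 6.0, 4.0],
])

# (i, j) entry estimates Cov(X_i, X_j); diagonal entries are sample variances.
sigma = np.cov(data, rowvar=False)

symmetric = np.allclose(sigma, sigma.T)
# All eigenvalues of a PSD matrix are >= 0 (up to floating-point rounding).
eigvals = np.linalg.eigvalsh(sigma)
```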
Transformations of random variables
Transformations create new random variables as functions of existing ones. The goal is to derive the distribution of the transformed variables from the original joint distribution.
Jacobian method
The Jacobian method applies to one-to-one transformations of continuous random variables. Suppose you transform $(X, Y)$ via $U = g_1(X, Y)$ and $V = g_2(X, Y)$.
Steps:
- Write the inverse transformation: express $X$ and $Y$ in terms of $U$ and $V$, say $x = h_1(u, v)$ and $y = h_2(u, v)$.
- Compute the Jacobian determinant of the inverse transformation: $J = \det \begin{pmatrix} \partial x / \partial u & \partial x / \partial v \\ \partial y / \partial u & \partial y / \partial v \end{pmatrix}$
- The joint PDF of $(U, V)$ is: $f_{U,V}(u, v) = f_{X,Y}(h_1(u, v), h_2(u, v)) \, |J|$
- Determine the support (range of valid values) for $(U, V)$ based on the transformation and the original support of $(X, Y)$.
The absolute value of the Jacobian acts as a scaling factor that accounts for how the transformation stretches or compresses area in the $(x, y)$ plane. If the transformation is not one-to-one, you need to partition the domain into regions where it is, apply the method to each region, and sum the results.
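A sketch of the method for the classic transformation $U = X + Y$, $V = X - Y$ with $X, Y$ independent standard normals (an assumed example). The inverse is $x = (u + v)/2$, $y = (u - v)/2$, so $|J| = 1/2$, and the known result is that $U$ and $V$ are independent $N(0, 2)$:

```python
import math

def f_xy(x, y):
    # joint PDF of two independent standard normals
    return math.exp(-(x**2 + y**2) / 2) / (2 * math.pi)

def f_uv(u, v):
    # Jacobian method: plug the inverse transformation into f_xy, times |J| = 1/2
    x, y = (u + v) / 2, (u - v) / 2
    return f_xy(x, y) * 0.5

def n02(t):
    # density of N(0, 2)
    return math.exp(-t**2 / 4) / math.sqrt(4 * math.pi)

# Check at an arbitrary point: f_uv should factor as the product of two N(0, 2) densities.
u, v = 0.7, -1.3
lhs = f_uv(u, v)
rhs = n02(u) * n02(v)
```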