Joint probability distributions
A joint probability distribution describes the probability of two or more random variables taking on values simultaneously. It gives you the complete picture of how multiple random variables relate to each other.
- For discrete random variables, the joint distribution is a probability mass function (PMF) that assigns probabilities to each combination of values.
- For continuous random variables, it's a probability density function (PDF) that describes the density over a region of possible value pairs.
Everything in this topic flows from the joint distribution. Marginal and conditional distributions are both derived from it.
Marginal distributions
A marginal distribution is the probability distribution of a single random variable, extracted from a joint distribution by "removing" the other variables. You're collapsing a multi-dimensional distribution down to one dimension.
The key idea: you recover the behavior of one variable alone, ignoring whatever the other variable does.
Marginal PMF
For discrete random variables $X$ and $Y$, you obtain the marginal PMF of $X$ by summing the joint PMF over all possible values of $Y$:

$$p_X(x) = \sum_y p_{X,Y}(x, y)$$

Think of it as adding up an entire row (or column) in a joint probability table. If $X$ and $Y$ represent the outcomes of two dice, the marginal PMF of $X$ gives you the probability of each outcome on the first die, regardless of what the second die shows. Since the dice are fair and independent, each marginal probability is $1/6$.
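The dice example can be sketched in a few lines of Python. This is a minimal illustration, not a library routine; the `marginal_x` helper and the dict-of-tuples representation of the joint PMF are choices made for this sketch:

```python
from fractions import Fraction

# Joint PMF of two fair, independent dice: p(x, y) = 1/36 for every pair.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def marginal_x(joint_pmf):
    """Sum the joint PMF over all values of y to get the marginal PMF of x."""
    pmf = {}
    for (x, _y), p in joint_pmf.items():
        pmf[x] = pmf.get(x, Fraction(0)) + p
    return pmf

p_x = marginal_x(joint)
print(p_x[1])  # 1/6: each face of the first die, regardless of the second
```

Using exact `Fraction` arithmetic makes it easy to confirm that the marginal probabilities sum to 1, as any PMF must.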
Marginal PDF
For continuous random variables $X$ and $Y$, you integrate out $Y$ instead of summing:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$

The logic is identical to the discrete case. You're accumulating all the density along the $y$-axis for each fixed value of $x$, which leaves you with a function of $x$ alone.
Obtaining marginal distributions
To find a marginal distribution from a joint distribution:
- Start with the joint PMF or PDF.
- Identify which variable you want to keep and which you want to eliminate.
- Sum (discrete) or integrate (continuous) the joint distribution over all values of the variable you're eliminating.
This is useful whenever you need to analyze one variable in isolation, especially when the joint distribution is complex.
Conditional distributions
A conditional distribution describes the probability distribution of one random variable given that another variable has taken a specific value. It answers the question: how does knowing $Y$ change what we expect about $X$?
Conditional PMF
For discrete random variables $X$ and $Y$, the conditional PMF of $X$ given $Y = y$ is:

$$p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}, \qquad p_Y(y) > 0$$

You're taking the joint probability and normalizing by the marginal probability of the observed value of $Y$. This rescales the probabilities so they sum to 1 over $x$, given the constraint on $Y$.
For two dice, the conditional PMF of $X$ given $Y = 6$ is the distribution of the first die given that the second die landed on 6. Since dice rolls are independent, this conditional distribution turns out to be the same as the marginal: uniform over $\{1, 2, 3, 4, 5, 6\}$. That won't always be the case for dependent variables.
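A short sketch of the divide-by-the-marginal step for the dice example (the `conditional_x_given_y` helper is illustrative):

```python
from fractions import Fraction

# Joint PMF of two fair, independent dice.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def conditional_x_given_y(joint_pmf, y):
    """p(x | y) = p(x, y) / p(y), where p(y) is obtained by summing over x."""
    p_y = sum(p for (_x, yy), p in joint_pmf.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint_pmf.items() if yy == y}

cond = conditional_x_given_y(joint, 6)
print(cond[3])  # 1/6: same as the marginal, because the dice are independent
```

The normalization guarantees the conditional probabilities sum to 1 over $x$.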
Conditional PDF
For continuous random variables, the same logic applies:

$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_Y(y) > 0$$

Note that $f_Y(y)$ here is the marginal PDF of $Y$, which you'd compute by integrating the joint PDF over $x$. The conditional PDF is a valid density in $x$ (it integrates to 1 over $x$) for each fixed $y$.
Calculating conditional distributions
The steps are straightforward:
- Write down the joint PMF or PDF, $p_{X,Y}(x, y)$ or $f_{X,Y}(x, y)$.
- Compute the marginal distribution of the conditioning variable ($p_Y(y)$ or $f_Y(y)$).
- Divide the joint by the marginal: that ratio is the conditional distribution.
The conditional distribution reveals how the relationship between variables works directionally. If $X$ and $Y$ are dependent, conditioning on $Y$ will shift or reshape the distribution of $X$.
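To see that reshaping concretely, here is a small assumed joint PMF for two dependent binary variables that tend to match. Before observing $Y$, $X$ is 50/50; after observing $Y = 1$, the distribution of $X$ shifts sharply:

```python
from fractions import Fraction

# Assumed dependent pair: X and Y agree with probability 8/10.
joint = {(0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

# Marginal of X: sum the joint over y.
p_x = {x: sum(p for (xx, _y), p in joint.items() if xx == x) for x in (0, 1)}

# Conditional of X given Y = 1: divide the joint by the marginal p_Y(1).
p_y1 = sum(p for (_x, y), p in joint.items() if y == 1)
cond = {x: joint[(x, 1)] / p_y1 for x in (0, 1)}

print(p_x[1])   # 1/2 before observing Y
print(cond[1])  # 4/5 after observing Y = 1: the distribution has shifted
```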
Independence of random variables
Two random variables are independent if knowing the value of one tells you nothing about the other. Formally, their joint distribution factors into the product of their marginals.
Definition of independence
- Discrete case: $X$ and $Y$ are independent if and only if $p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$ for all $x, y$.
- Continuous case: $X$ and $Y$ are independent if and only if $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ for all $x, y$.
This condition must hold for every pair of values, not just some. A single violation means the variables are dependent.
An equivalent way to check: $X$ and $Y$ are independent if and only if the conditional distribution of $X$ given $Y = y$ equals the marginal distribution of $X$ for all $y$. That is, conditioning on $Y$ doesn't change the distribution of $X$ at all.
Two fair dice are independent because the outcome of one roll has no effect on the other. The joint probability of any pair is $1/36$, which matches the product of the marginals: $\frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$.
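The "must hold for every pair" condition is easy to check exhaustively for a finite joint PMF. A minimal sketch for the dice example:

```python
from fractions import Fraction

# Joint PMF of two fair dice, and the two marginals derived from it.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}
p_x = {x: sum(p for (xx, _y), p in joint.items() if xx == x) for x in range(1, 7)}
p_y = {y: sum(p for (_x, yy), p in joint.items() if yy == y) for y in range(1, 7)}

# Independence requires joint == product of marginals for EVERY pair;
# a single violation would make the variables dependent.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in joint)
print(independent)  # True
```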

Properties of independent variables
Independence gives you powerful computational shortcuts:
- Expectation of a product: $E[XY] = E[X]\,E[Y]$
- Variance of a sum: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
Both of these can fail for dependent variables. In particular, the variance formula for dependent variables includes a covariance term: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$. Independence makes $\mathrm{Cov}(X, Y) = 0$, which eliminates that term.
Be careful with the converse: $E[XY] = E[X]\,E[Y]$ does not guarantee independence. Uncorrelated and independent are different things.
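Both shortcuts can be verified exactly for the two-dice example with `Fraction` arithmetic; this is a sketch of that check, not a general proof:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die

E_X = sum(x * p for x in faces)  # 7/2, same for both dice
E_XY = sum(x * y * p * p for x in faces for y in faces)
print(E_XY == E_X * E_X)  # True: E[XY] = E[X]E[Y] under independence

Var_X = sum((x - E_X) ** 2 * p for x in faces)  # 35/12
Var_sum = sum(((x + y) - 2 * E_X) ** 2 * p * p for x in faces for y in faces)
print(Var_sum == 2 * Var_X)  # True: Var(X + Y) = Var(X) + Var(Y)
```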
Relationship between joint, marginal, and conditional distributions
Joint, marginal, and conditional distributions are three views of the same underlying probability structure. You can move between them using the product rule (also called the chain rule of probability).
Product rule for discrete variables
$$p_{X,Y}(x, y) = p_X(x)\,p_{Y \mid X}(y \mid x) = p_Y(y)\,p_{X \mid Y}(x \mid y)$$

This says: the joint probability of $(x, y)$ equals the marginal probability of one variable times the conditional probability of the other given it. You can factor it either way.
Suppose $X$ is the type of car (sedan or SUV) and $Y$ is the color (red, blue, green). If you know the marginal $p_X(\text{sedan})$ and the conditional $p_{Y \mid X}(\text{red} \mid \text{sedan})$, then the joint probability $p_{X,Y}(\text{sedan}, \text{red})$ is their product.
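The car/color factorization can be sketched with assumed example numbers (the specific probabilities below are hypothetical, chosen only to make the product rule concrete):

```python
from fractions import Fraction

# Hypothetical marginal over car type and conditional over color given type.
p_type = {"sedan": Fraction(3, 5), "SUV": Fraction(2, 5)}
p_color_given_type = {
    "sedan": {"red": Fraction(1, 2), "blue": Fraction(3, 10), "green": Fraction(1, 5)},
    "SUV":   {"red": Fraction(1, 5), "blue": Fraction(2, 5),  "green": Fraction(2, 5)},
}

# Product rule: p(type, color) = p(type) * p(color | type).
joint = {(t, c): p_type[t] * p_color_given_type[t][c]
         for t in p_type for c in p_color_given_type[t]}

print(joint[("sedan", "red")])  # 3/5 * 1/2 = 3/10
print(sum(joint.values()))      # 1: the factorization yields a valid joint PMF
```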
Product rule for continuous variables
$$f_{X,Y}(x, y) = f_X(x)\,f_{Y \mid X}(y \mid x)$$

The structure is identical. This factorization is especially useful when you can model one variable's marginal distribution easily and then specify how the second variable depends on the first through a conditional density.
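One way this factorization gets used in practice is ancestral sampling: draw $x$ from its marginal, then draw $y$ from the conditional given that $x$. The specific distributions below (uniform marginal, Gaussian conditional whose mean depends on $x$) are illustrative assumptions:

```python
import random

random.seed(0)  # deterministic for reproducibility

def sample_pair():
    """Sample (x, y) via the product rule: x from its marginal,
    then y from a conditional density whose mean depends on x."""
    x = random.uniform(0.0, 1.0)    # marginal: X ~ Uniform(0, 1)
    y = random.gauss(2.0 * x, 0.5)  # conditional: Y | X = x ~ Normal(2x, 0.5)
    return x, y

pairs = [sample_pair() for _ in range(100_000)]
mean_y = sum(y for _x, y in pairs) / len(pairs)
print(mean_y)  # close to E[Y] = E[2X] = 1.0 by the law of total expectation
```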
Applications of marginal and conditional distributions
Bayesian inference
Bayes' theorem connects prior beliefs with observed evidence to produce updated (posterior) beliefs. It's built directly from conditional and marginal distributions:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

- $P(H)$ is the prior: your marginal belief about hypothesis $H$ before seeing evidence.
- $P(E \mid H)$ is the likelihood: the conditional probability of the evidence given the hypothesis.
- $P(E)$ is the marginal likelihood (or evidence): often computed by summing $P(E \mid H)\,P(H)$ over all possible hypotheses.
- $P(H \mid E)$ is the posterior: your updated belief after observing the evidence.
In medical diagnosis, for example, if a disease has a prior prevalence of 1% and a test has a 95% detection rate (sensitivity), Bayes' theorem tells you the actual probability of having the disease given a positive test. That posterior probability depends critically on the prior prevalence and the test's false positive rate.
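A sketch of that calculation with the numbers from the example. The prevalence and sensitivity come from the text; the 5% false positive rate is an assumed value for illustration:

```python
p_disease = 0.01             # prior prevalence (from the example)
p_pos_given_disease = 0.95   # sensitivity (from the example)
p_pos_given_healthy = 0.05   # false positive rate (assumed for illustration)

# Marginal probability of a positive test: sum over both hypotheses.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # about 0.161: far below the 95% sensitivity
```

Despite the accurate-sounding test, the low prior prevalence keeps the posterior around 16%, which is the point of the example.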
Decision making under uncertainty
When outcomes are uncertain, you can use marginal and conditional distributions to compute expected payoffs. For a business investment:
- Specify the marginal distribution of market conditions (e.g., 30% chance of recession, 70% chance of growth).
- Specify the conditional distribution of profit given each market condition.
- Compute the expected profit by weighting the conditional expected profits by the marginal probabilities.
This framework generalizes to any setting where you need to make choices before uncertainty resolves.
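The three investment steps above can be sketched directly. The market probabilities come from the example; the conditional expected profits are assumed values for illustration:

```python
from fractions import Fraction

# Step 1: marginal distribution of market conditions (from the example).
p_market = {"recession": Fraction(3, 10), "growth": Fraction(7, 10)}

# Step 2: expected profit conditional on each condition (assumed values).
expected_profit_given = {"recession": -200_000, "growth": 500_000}

# Step 3: law of total expectation -- weight conditional expectations
# by the marginal probabilities of the conditions.
expected_profit = sum(p * expected_profit_given[m] for m, p in p_market.items())
print(expected_profit)  # 0.3 * (-200000) + 0.7 * 500000 = 290000
```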