Joint probability distributions
A joint probability distribution describes the probability of two or more random variables taking on values simultaneously. It gives you the complete picture of how multiple random variables relate to each other.
- For discrete random variables, the joint distribution is a probability mass function (PMF) that assigns probabilities to each combination of values.
- For continuous random variables, it's a probability density function (PDF) that describes the density over a region of possible value pairs.
Everything in this topic flows from the joint distribution. Marginal and conditional distributions are both derived from it.
Marginal distributions
A marginal distribution is the probability distribution of a single random variable, extracted from a joint distribution by "removing" the other variables. You're collapsing a multi-dimensional distribution down to one dimension.
The key idea: you recover the behavior of one variable alone, ignoring whatever the other variable does.
Marginal PMF
For discrete random variables $X$ and $Y$, you obtain the marginal PMF of $X$ by summing the joint PMF over all possible values of $Y$:

$$p_X(x) = \sum_y p_{X,Y}(x, y)$$

Think of it as adding up an entire row (or column) in a joint probability table. If $X$ and $Y$ represent the outcomes of two dice, the marginal PMF of $X$ gives you the probability of each outcome on the first die, regardless of what the second die shows. Since the dice are fair and independent, each marginal probability is $1/6$.
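The dice example can be sketched in a few lines of Python. This is a minimal illustration, not a library routine; the `marginal_x` helper and the dict-of-tuples representation of the joint PMF are choices made for this sketch:

```python
from fractions import Fraction

# Joint PMF of two fair, independent dice: p(x, y) = 1/36 for every pair.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def marginal_x(joint_pmf):
    """Sum the joint PMF over all values of y to get the marginal PMF of x."""
    pmf = {}
    for (x, _y), p in joint_pmf.items():
        pmf[x] = pmf.get(x, Fraction(0)) + p
    return pmf

p_x = marginal_x(joint)
print(p_x[1])  # 1/6: each face of the first die, regardless of the second
```

Using exact `Fraction` arithmetic makes it easy to confirm that the marginal probabilities sum to 1, as any PMF must.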
Marginal PDF
For continuous random variables $X$ and $Y$, you integrate out $Y$ instead of summing:

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$

The logic is identical to the discrete case. You're accumulating all the density along the $y$-axis for each fixed value of $x$, which leaves you with a function of $x$ alone.
Obtaining marginal distributions
To find a marginal distribution from a joint distribution:
- Start with the joint PMF or PDF.
- Identify which variable you want to keep and which you want to eliminate.
- Sum (discrete) or integrate (continuous) the joint distribution over all values of the variable you're eliminating.
This is useful whenever you need to analyze one variable in isolation, especially when the joint distribution is complex.
Conditional distributions
A conditional distribution describes the probability distribution of one random variable given that another variable has taken a specific value. It answers the question: how does knowing $Y$ change what we expect about $X$?
Conditional PMF
For discrete random variables $X$ and $Y$, the conditional PMF of $X$ given $Y = y$ is:

$$p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}, \qquad p_Y(y) > 0$$

You're taking the joint probability and normalizing by the marginal probability of the observed value of $Y$. This rescales the probabilities so they sum to 1 over $x$, given the constraint on $Y$.
For two dice, the conditional PMF of $X$ given $Y = 6$ is the distribution of the first die given that the second die landed on 6. Since dice rolls are independent, this conditional distribution turns out to be the same as the marginal: uniform over $\{1, 2, 3, 4, 5, 6\}$. That won't always be the case for dependent variables.
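A short sketch of the divide-by-the-marginal step for the dice example (the `conditional_x_given_y` helper is illustrative):

```python
from fractions import Fraction

# Joint PMF of two fair, independent dice.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def conditional_x_given_y(joint_pmf, y):
    """p(x | y) = p(x, y) / p(y), where p(y) is obtained by summing over x."""
    p_y = sum(p for (_x, yy), p in joint_pmf.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint_pmf.items() if yy == y}

cond = conditional_x_given_y(joint, 6)
print(cond[3])  # 1/6: same as the marginal, because the dice are independent
```

The normalization guarantees the conditional probabilities sum to 1 over $x$.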
Conditional PDF
For continuous random variables, the same logic applies:

$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_Y(y) > 0$$

Note that $f_Y(y)$ here is the marginal PDF of $Y$, which you'd compute by integrating the joint PDF over $x$. The conditional PDF is a valid density in $x$ (it integrates to 1 over $x$) for each fixed $y$.
Calculating conditional distributions
The steps are straightforward:
- Write down the joint PMF or PDF, $p_{X,Y}(x, y)$ or $f_{X,Y}(x, y)$.
- Compute the marginal distribution of the conditioning variable ($p_Y(y)$ or $f_Y(y)$).
- Divide the joint by the marginal: that ratio is the conditional distribution.
The conditional distribution reveals how the relationship between variables works directionally. If $X$ and $Y$ are dependent, conditioning on $Y$ will shift or reshape the distribution of $X$.
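To see that reshaping concretely, here is a small assumed joint PMF for two dependent binary variables that tend to match. Before observing $Y$, $X$ is 50/50; after observing $Y = 1$, the distribution of $X$ shifts sharply:

```python
from fractions import Fraction

# Assumed dependent pair: X and Y agree with probability 8/10.
joint = {(0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}

# Marginal of X: sum the joint over y.
p_x = {x: sum(p for (xx, _y), p in joint.items() if xx == x) for x in (0, 1)}

# Conditional of X given Y = 1: divide the joint by the marginal p_Y(1).
p_y1 = sum(p for (_x, y), p in joint.items() if y == 1)
cond = {x: joint[(x, 1)] / p_y1 for x in (0, 1)}

print(p_x[1])   # 1/2 before observing Y
print(cond[1])  # 4/5 after observing Y = 1: the distribution has shifted
```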
Independence of random variables
Two random variables are independent if knowing the value of one tells you nothing about the other. Formally, their joint distribution factors into the product of their marginals.
Definition of independence
- Discrete case: $X$ and $Y$ are independent if and only if $p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$ for all $x, y$.
- Continuous case: $X$ and $Y$ are independent if and only if $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ for all $x, y$.
This condition must hold for every pair of values, not just some. A single violation means the variables are dependent.
An equivalent way to check: $X$ and $Y$ are independent if and only if the conditional distribution of $X$ given $Y = y$ equals the marginal distribution of $X$ for all $y$. That is, conditioning on $Y$ doesn't change the distribution of $X$ at all.
Two fair dice are independent because the outcome of one roll has no effect on the other. The joint probability of any pair is $1/36$, which matches the product of the marginals: $\frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$.
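The "must hold for every pair" condition is easy to check exhaustively for a finite joint PMF. A minimal sketch for the dice example:

```python
from fractions import Fraction

# Joint PMF of two fair dice, and the two marginals derived from it.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}
p_x = {x: sum(p for (xx, _y), p in joint.items() if xx == x) for x in range(1, 7)}
p_y = {y: sum(p for (_x, yy), p in joint.items() if yy == y) for y in range(1, 7)}

# Independence requires joint == product of marginals for EVERY pair;
# a single violation would make the variables dependent.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in joint)
print(independent)  # True
```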

Properties of independent variables
Independence gives you powerful computational shortcuts:
- Expectation of a product: $E[XY] = E[X]\,E[Y]$
- Variance of a sum: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
Both of these can fail for dependent variables. In particular, the variance formula for dependent variables includes a covariance term: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$. Independence makes $\mathrm{Cov}(X, Y) = 0$, which eliminates that term.
Be careful with the converse: $E[XY] = E[X]\,E[Y]$ does not guarantee independence. Uncorrelated and independent are different things.
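Both shortcuts can be verified exactly for the two-dice example with `Fraction` arithmetic; this is a sketch of that check, not a general proof:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die

E_X = sum(x * p for x in faces)  # 7/2, same for both dice
E_XY = sum(x * y * p * p for x in faces for y in faces)
print(E_XY == E_X * E_X)  # True: E[XY] = E[X]E[Y] under independence

Var_X = sum((x - E_X) ** 2 * p for x in faces)  # 35/12
Var_sum = sum(((x + y) - 2 * E_X) ** 2 * p * p for x in faces for y in faces)
print(Var_sum == 2 * Var_X)  # True: Var(X + Y) = Var(X) + Var(Y)
```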
Relationship between joint, marginal, and conditional distributions
Joint, marginal, and conditional distributions are three views of the same underlying probability structure. You can move between them using the product rule (also called the chain rule of probability).
Product rule for discrete variables
$$p_{X,Y}(x, y) = p_X(x)\,p_{Y \mid X}(y \mid x) = p_Y(y)\,p_{X \mid Y}(x \mid y)$$

This says: the joint probability of $(x, y)$ equals the marginal probability of one variable times the conditional probability of the other given it. You can factor it either way.
Suppose $X$ is the type of car (sedan or SUV) and $Y$ is the color (red, blue, green). If you know the marginal $p_X(\text{sedan})$ and the conditional $p_{Y \mid X}(\text{red} \mid \text{sedan})$, then the joint probability $p_{X,Y}(\text{sedan}, \text{red})$ is their product.
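The car/color factorization can be sketched with assumed example numbers (the specific probabilities below are hypothetical, chosen only to make the product rule concrete):

```python
from fractions import Fraction

# Hypothetical marginal over car type and conditional over color given type.
p_type = {"sedan": Fraction(3, 5), "SUV": Fraction(2, 5)}
p_color_given_type = {
    "sedan": {"red": Fraction(1, 2), "blue": Fraction(3, 10), "green": Fraction(1, 5)},
    "SUV":   {"red": Fraction(1, 5), "blue": Fraction(2, 5),  "green": Fraction(2, 5)},
}

# Product rule: p(type, color) = p(type) * p(color | type).
joint = {(t, c): p_type[t] * p_color_given_type[t][c]
         for t in p_type for c in p_color_given_type[t]}

print(joint[("sedan", "red")])  # 3/5 * 1/2 = 3/10
print(sum(joint.values()))      # 1: the factorization yields a valid joint PMF
```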
Product rule for continuous variables
$$f_{X,Y}(x, y) = f_X(x)\,f_{Y \mid X}(y \mid x)$$

The structure is identical. This factorization is especially useful when you can model one variable's marginal distribution easily and then specify how the second variable depends on the first through a conditional density.
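One way this factorization gets used in practice is ancestral sampling: draw $x$ from its marginal, then draw $y$ from the conditional given that $x$. The specific distributions below (uniform marginal, Gaussian conditional whose mean depends on $x$) are illustrative assumptions:

```python
import random

random.seed(0)  # deterministic for reproducibility

def sample_pair():
    """Sample (x, y) via the product rule: x from its marginal,
    then y from a conditional density whose mean depends on x."""
    x = random.uniform(0.0, 1.0)    # marginal: X ~ Uniform(0, 1)
    y = random.gauss(2.0 * x, 0.5)  # conditional: Y | X = x ~ Normal(2x, 0.5)
    return x, y

pairs = [sample_pair() for _ in range(100_000)]
mean_y = sum(y for _x, y in pairs) / len(pairs)
print(mean_y)  # close to E[Y] = E[2X] = 1.0 by the law of total expectation
```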
Applications of marginal and conditional distributions
Bayesian inference
Bayes' theorem connects prior beliefs with observed evidence to produce updated (posterior) beliefs. It's built directly from conditional and marginal distributions:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

- $P(H)$ is the prior: your marginal belief about hypothesis $H$ before seeing evidence.
- $P(E \mid H)$ is the likelihood: the conditional probability of the evidence given the hypothesis.
- $P(E)$ is the marginal likelihood (or evidence): often computed by summing $P(E \mid H)\,P(H)$ over all possible hypotheses.
- $P(H \mid E)$ is the posterior: your updated belief after observing the evidence.
In medical diagnosis, for example, if a disease has a prior prevalence of 1% and a test has a 95% detection rate (sensitivity), Bayes' theorem tells you the actual probability of having the disease given a positive test. That posterior probability depends critically on the prior prevalence and the test's false positive rate.
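A sketch of that calculation with the numbers from the example. The prevalence and sensitivity come from the text; the 5% false positive rate is an assumed value for illustration:

```python
p_disease = 0.01             # prior prevalence (from the example)
p_pos_given_disease = 0.95   # sensitivity (from the example)
p_pos_given_healthy = 0.05   # false positive rate (assumed for illustration)

# Marginal probability of a positive test: sum over both hypotheses.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # about 0.161: far below the 95% sensitivity
```

Despite the accurate-sounding test, the low prior prevalence keeps the posterior around 16%, which is the point of the example.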
Decision making under uncertainty
When outcomes are uncertain, you can use marginal and conditional distributions to compute expected payoffs. For a business investment:
- Specify the marginal distribution of market conditions (e.g., 30% chance of recession, 70% chance of growth).
- Specify the conditional distribution of profit given each market condition.
- Compute the expected profit by weighting the conditional expected profits by the marginal probabilities.
This framework generalizes to any setting where you need to make choices before uncertainty resolves.
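The three investment steps above can be sketched directly. The market probabilities come from the example; the conditional expected profits are assumed values for illustration:

```python
from fractions import Fraction

# Step 1: marginal distribution of market conditions (from the example).
p_market = {"recession": Fraction(3, 10), "growth": Fraction(7, 10)}

# Step 2: expected profit conditional on each condition (assumed values).
expected_profit_given = {"recession": -200_000, "growth": 500_000}

# Step 3: law of total expectation -- weight conditional expectations
# by the marginal probabilities of the conditions.
expected_profit = sum(p * expected_profit_given[m] for m, p in p_market.items())
print(expected_profit)  # 0.3 * (-200000) + 0.7 * 500000 = 290000
```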