Stochastic Processes Unit 2 Review

2.4 Marginal and conditional distributions

Written by the Fiveable Content Team • Last updated August 2025

Joint probability distributions

A joint probability distribution describes the probability of two or more random variables taking on values simultaneously. It gives you the complete picture of how multiple random variables relate to each other.

  • For discrete random variables, the joint distribution is a probability mass function (PMF) that assigns probabilities to each combination of values.
  • For continuous random variables, it's a probability density function (PDF) that describes the density over a region of possible value pairs.

Everything in this topic flows from the joint distribution. Marginal and conditional distributions are both derived from it.

Marginal distributions

A marginal distribution is the probability distribution of a single random variable, extracted from a joint distribution by "removing" the other variables. You're collapsing a multi-dimensional distribution down to one dimension.

The key idea: you recover the behavior of one variable alone, ignoring whatever the other variable does.

Marginal PMF

For discrete random variables X and Y, you obtain the marginal PMF of X by summing the joint PMF over all possible values of Y:

P_X(x) = \sum_y P_{X,Y}(x,y)

Think of it as adding up an entire row (or column) in a joint probability table. If X and Y represent the outcomes of two dice, the marginal PMF of X gives you the probability of each outcome on the first die, regardless of what the second die shows. Since the dice are fair and independent, each marginal probability is 1/6.
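The row-summing idea can be sketched in a few lines of Python, using the two-dice joint PMF from the text (exact fractions avoid floating-point noise):

```python
from fractions import Fraction

# Joint PMF of two fair, independent dice: P(X=x, Y=y) = 1/36 for each pair.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def marginal_pmf_x(joint_pmf):
    """Sum the joint PMF over all values of y for each fixed x."""
    marginal = {}
    for (x, y), p in joint_pmf.items():
        marginal[x] = marginal.get(x, Fraction(0)) + p
    return marginal

p_x = marginal_pmf_x(joint)
print(p_x[1])  # 1/6 -- each face of the first die has marginal probability 1/6
```

The same helper works for any discrete joint PMF stored as a dictionary, not just this uniform one.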

Marginal PDF

For continuous random variables X and Y, you integrate out Y instead of summing:

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy

The logic is identical to the discrete case. You're accumulating all the density along the y-axis for each fixed value of x, which leaves you with a function of x alone.
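A minimal numeric sketch of integrating out y, using an illustrative joint density that is an assumption (not from the text): f(x, y) = x + y on the unit square, whose marginal works out analytically to f_X(x) = x + 1/2.

```python
# Assumed joint density for illustration: f(x, y) = x + y on [0, 1] x [0, 1].
def f_xy(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_fx(x, n=100_000):
    """Approximate f_X(x) = integral of f(x, y) dy via a midpoint Riemann sum."""
    dy = 1.0 / n
    return sum(f_xy(x, (k + 0.5) * dy) for k in range(n)) * dy

# Analytically f_X(x) = x + 1/2, so the estimate at x = 0.3 should be close to 0.8.
print(round(marginal_fx(0.3), 4))  # 0.8
```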

Obtaining marginal distributions

To find a marginal distribution from a joint distribution:

  1. Start with the joint PMF or PDF.
  2. Identify which variable you want to keep and which you want to eliminate.
  3. Sum (discrete) or integrate (continuous) the joint distribution over all values of the variable you're eliminating.

This is useful whenever you need to analyze one variable in isolation, especially when the joint distribution is complex.

Conditional distributions

A conditional distribution describes the probability distribution of one random variable given that another variable has taken a specific value. It answers the question: how does knowing Y = y change what we expect about X?

Conditional PMF

For discrete random variables X and Y, the conditional PMF of X given Y = y is:

P_{X|Y}(x|y) = \frac{P_{X,Y}(x,y)}{P_Y(y)}, \quad \text{provided } P_Y(y) > 0

You're taking the joint probability and normalizing by the marginal probability of the observed value of Y. This rescales the probabilities so they sum to 1 over x, given the constraint on Y.

For two dice, the conditional PMF of X given Y = 6 is the distribution of the first die given that the second die landed on 6. Since dice rolls are independent, this conditional distribution turns out to be the same as the marginal: uniform over {1, 2, 3, 4, 5, 6}. That won't always be the case for dependent variables.
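To see conditioning actually reshape a distribution, here is a sketch with a hypothetical dependent joint PMF (the numbers are assumptions for illustration, not from the text):

```python
from fractions import Fraction

F = Fraction
# Hypothetical dependent joint PMF over X, Y in {0, 1}: X and Y tend to match.
joint = {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)}

def conditional_pmf_x_given_y(joint_pmf, y):
    """P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y), provided P(Y=y) > 0."""
    p_y = sum(p for (_, yy), p in joint_pmf.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning event has probability zero")
    return {x: p / p_y for (x, yy), p in joint_pmf.items() if yy == y}

cond = conditional_pmf_x_given_y(joint, 0)
print(cond[0], cond[1])  # 4/5 1/5 -- not the 1/2, 1/2 marginal of X
```

Here the marginal of X is uniform (1/2, 1/2), but observing Y = 0 shifts most of the probability onto X = 0.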

Conditional PDF

For continuous random variables, the same logic applies:

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}, \quad \text{provided } f_Y(y) > 0

Note that f_Y(y) here is the marginal PDF of Y, which you'd compute by integrating the joint PDF over x. The conditional PDF f_{X|Y}(x|y) is a valid density in x (it integrates to 1 over x) for each fixed y.

Calculating conditional distributions

The steps are straightforward:

  1. Write down the joint PMF/PDF P_{X,Y}(x,y) or f_{X,Y}(x,y).
  2. Compute the marginal distribution of the conditioning variable (P_Y(y) or f_Y(y)).
  3. Divide the joint by the marginal: that ratio is the conditional distribution.

The conditional distribution reveals how the relationship between variables works directionally. If X and Y are dependent, conditioning on Y will shift or reshape the distribution of X.

Independence of random variables

Two random variables are independent if knowing the value of one tells you nothing about the other. Formally, their joint distribution factors into the product of their marginals.

Definition of independence

  • Discrete case: X and Y are independent if and only if P_{X,Y}(x,y) = P_X(x) \cdot P_Y(y) for all x, y.
  • Continuous case: X and Y are independent if and only if f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) for all x, y.

This condition must hold for every pair of values, not just some. A single violation means the variables are dependent.

An equivalent way to check: X and Y are independent if and only if the conditional distribution of X given Y = y equals the marginal distribution of X for all y. That is, conditioning on Y doesn't change X at all.

Two fair dice are independent because the outcome of one roll has no effect on the other. The joint probability of any pair (x, y) is 1/6 × 1/6 = 1/36, which matches the product of the marginals.
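The "every pair must factor" condition translates directly into code. A minimal sketch checking the dice joint PMF against the product of its marginals:

```python
from fractions import Fraction
from itertools import product

# Two fair, independent dice: every pair has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def is_independent(joint_pmf):
    """True iff P(x, y) == P_X(x) * P_Y(y) for EVERY pair (x, y)."""
    xs = {x for x, _ in joint_pmf}
    ys = {y for _, y in joint_pmf}
    p_x = {x: sum(joint_pmf.get((x, y), Fraction(0)) for y in ys) for x in xs}
    p_y = {y: sum(joint_pmf.get((x, y), Fraction(0)) for x in xs) for y in ys}
    return all(joint_pmf.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
               for x in xs for y in ys)

print(is_independent(joint))  # True
```

Note the check covers missing pairs too (treated as probability 0), since a single violating pair is enough to break independence.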

Properties of independent variables

Independence gives you powerful computational shortcuts:

  • Expectation of a product: E[XY] = E[X] \cdot E[Y]
  • Variance of a sum: \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)

Both of these can fail for dependent variables. In particular, the variance formula for dependent variables includes a covariance term: \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y). Independence makes \text{Cov}(X, Y) = 0, which eliminates that term.

Be careful with the converse: \text{Cov}(X, Y) = 0 does not guarantee independence. Uncorrelated and independent are different things.
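A standard counterexample (not from the text, but a classic) makes this concrete: X uniform on {-1, 0, 1} with Y = X², which gives zero covariance yet obvious dependence.

```python
from fractions import Fraction

F = Fraction
# X uniform on {-1, 0, 1}, Y = X^2: the joint puts 1/3 on each possible pair.
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

e_x  = sum(x * p for (x, y), p in joint.items())      # E[X]  = 0
e_y  = sum(y * p for (x, y), p in joint.items())      # E[Y]  = 2/3
e_xy = sum(x * y * p for (x, y), p in joint.items())  # E[XY] = 0
cov = e_xy - e_x * e_y
print(cov)  # 0 -- uncorrelated

# Yet dependent: P(X=0, Y=0) = 1/3, while P(X=0) * P(Y=0) = 1/3 * 1/3 = 1/9.
```

Knowing Y = 0 forces X = 0, so conditioning on Y clearly changes X even though the covariance vanishes.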

Relationship between joint, marginal, and conditional distributions

Joint, marginal, and conditional distributions are three views of the same underlying probability structure. You can move between them using the product rule (also called the chain rule of probability).

Product rule for discrete variables

P_{X,Y}(x,y) = P_X(x) \cdot P_{Y|X}(y|x) = P_Y(y) \cdot P_{X|Y}(x|y)

This says: the joint probability of (x, y) equals the probability of x times the conditional probability of y given x. You can factor it either way.

Suppose X is the type of car (sedan or SUV) and Y is the color (red, blue, green). If you know P_X(\text{sedan}) = 0.6 and P_{Y|X}(\text{red} | \text{sedan}) = 0.3, then P_{X,Y}(\text{sedan}, \text{red}) = 0.6 \times 0.3 = 0.18.
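The car example can be sketched as building the full joint from a marginal and a conditional; only P(sedan) = 0.6 and P(red | sedan) = 0.3 come from the text, the remaining conditional probabilities are assumed values for illustration:

```python
# Marginal over car type (P(sedan) = 0.6 is from the text).
p_x = {"sedan": 0.6, "suv": 0.4}
# Conditional color distributions; only P(red | sedan) = 0.3 is from the text.
p_y_given_x = {
    "sedan": {"red": 0.3, "blue": 0.5, "green": 0.2},
    "suv":   {"red": 0.2, "blue": 0.4, "green": 0.4},
}

# Product rule: P(x, y) = P(x) * P(y | x)
joint = {(x, y): p_x[x] * p for x in p_x for y, p in p_y_given_x[x].items()}
print(round(joint[("sedan", "red")], 4))  # 0.18
```

Because each conditional distribution sums to 1, the resulting joint automatically sums to 1 as well.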

Product rule for continuous variables

f_{X,Y}(x,y) = f_X(x) \cdot f_{Y|X}(y|x) = f_Y(y) \cdot f_{X|Y}(x|y)

The structure is identical. This factorization is especially useful when you can model one variable's marginal distribution easily and then specify how the second variable depends on the first through a conditional density.
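A quick numeric check of the continuous factorization, again using the assumed density f(x, y) = x + y on the unit square (an illustration, not from the text):

```python
# Assumed joint density for illustration: f(x, y) = x + y on [0, 1] x [0, 1].
def f_xy(x, y):
    return x + y

def f_x(x):
    # Marginal: integral of (x + y) dy over [0, 1] = x + 1/2.
    return x + 0.5

def f_y_given_x(y, x):
    # Conditional density: joint divided by the marginal of the conditioning variable.
    return f_xy(x, y) / f_x(x)

# Product rule: f_X(x) * f_{Y|X}(y|x) recovers the joint at any point.
x, y = 0.3, 0.7
print(f_x(x) * f_y_given_x(y, x), f_xy(x, y))
```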

Applications of marginal and conditional distributions

Bayesian inference

Bayes' theorem connects prior beliefs with observed evidence to produce updated (posterior) beliefs. It's built directly from conditional and marginal distributions:

P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}

  • P(H) is the prior: your marginal belief about hypothesis H before seeing evidence.
  • P(E|H) is the likelihood: the conditional probability of the evidence given the hypothesis.
  • P(E) is the marginal likelihood (or evidence): often computed by summing P(E|H) \cdot P(H) over all possible hypotheses.
  • P(H|E) is the posterior: your updated belief after observing the evidence.

In medical diagnosis, for example, if a disease has a prior prevalence of 1% and a test has a 95% detection rate (sensitivity), Bayes' theorem tells you the actual probability of having the disease given a positive test. That posterior probability depends critically on the prior prevalence and the test's false positive rate.
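Working that example through numerically (the 1% prior and 95% sensitivity are from the text; the 5% false positive rate is an assumed value for illustration):

```python
# Prior prevalence and sensitivity are from the text.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05  # assumed false positive rate for illustration

# Marginal likelihood of a positive test (law of total probability):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior via Bayes' theorem:
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even with a sensitive test, the posterior is only about 16% because the low prior prevalence means most positive results come from the large healthy population.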

Decision making under uncertainty

When outcomes are uncertain, you can use marginal and conditional distributions to compute expected payoffs. For a business investment:

  1. Specify the marginal distribution of market conditions (e.g., 30% chance of recession, 70% chance of growth).
  2. Specify the conditional distribution of profit given each market condition.
  3. Compute the expected profit by weighting the conditional expected profits by the marginal probabilities.
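The three steps above can be sketched directly; the 30%/70% market probabilities are from the text, while the conditional expected profits are assumed numbers for illustration:

```python
# Step 1: marginal distribution of market conditions (from the text).
p_market = {"recession": 0.3, "growth": 0.7}

# Step 2: conditional expected profit given each condition
# (assumed values, in millions of dollars).
expected_profit_given = {"recession": -2.0, "growth": 5.0}

# Step 3: weight the conditional expectations by the marginal probabilities.
expected_profit = sum(p * expected_profit_given[m] for m, p in p_market.items())
print(round(expected_profit, 2))  # 2.9
```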

This framework generalizes to any setting where you need to make choices before uncertainty resolves.