Fiveable

📊Honors Statistics Unit 4 Review

4.5 Hypergeometric Distribution (Optional)

Written by the Fiveable Content Team • Last updated August 2025

Hypergeometric Distribution

The hypergeometric distribution models the probability of drawing a specific number of "successes" from a population when you're sampling without replacement. It matters because many real-world sampling situations (quality control inspections, card games, ecological surveys) don't let you put items back after selecting them, so the probability shifts with every draw.

Characteristics of Hypergeometric Experiments

A hypergeometric experiment has a specific structure that distinguishes it from other probability setups:

  • Fixed sample size: You decide in advance how many items you'll draw ($n$).
  • Finite population split into two groups: The population of size $N$ contains $K$ items of interest ("successes") and $N - K$ other items ("failures"). For example, a jar holds 8 red marbles and 12 blue marbles.
  • Sampling without replacement: Once an item is drawn, it's gone. This is the defining feature.

Because items aren't replaced, trials are not independent. Drawing a red marble from that jar means fewer red marbles remain, which changes the probability of drawing another red marble on the next trial. After every draw, the overall population shrinks by one, and so does the group the drawn item came from; the other group keeps its size.

This contrasts sharply with binomial experiments, where the probability of success stays constant from trial to trial.
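
The shifting probabilities can be checked directly with exact fractions. The sketch below is illustrative only: it uses the jar of 8 red and 12 blue marbles from the bullet list, and the variable names are made up for this example.

```python
from fractions import Fraction

# Jar from the example above: 8 red and 12 blue marbles.
red, blue = 8, 12

# Probability that the first draw is red.
p_first_red = Fraction(red, red + blue)

# Probability that the second draw is red, given the first was red:
# one red marble and one marble overall are gone.
p_second_red_after_red = Fraction(red - 1, red + blue - 1)

# Probability that the second draw is red, given the first was blue:
# the red count is unchanged, but the population still shrank by one.
p_second_red_after_blue = Fraction(red, red + blue - 1)

print(p_first_red, p_second_red_after_red, p_second_red_after_blue)
```

The three values differ, which is exactly what "dependent trials" means here: the chance of red on the second draw depends on what the first draw was.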

Hypergeometric Distribution Calculations

The Formula

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where:

  • $N$ = total population size
  • $K$ = number of success items in the population
  • $n$ = number of items drawn (sample size)
  • $k$ = number of successes you want in your sample

What the Formula Is Doing

The formula counts favorable outcomes over total outcomes, just like classical probability, but uses combinations to do the counting.

  • Numerator: $\binom{K}{k}$ counts the ways to choose $k$ successes from the $K$ available, and $\binom{N-K}{n-k}$ counts the ways to choose the remaining $n - k$ items from the non-success group. Multiply these together to get the number of favorable samples.
  • Denominator: $\binom{N}{n}$ counts the total number of ways to choose any $n$ items from the population.
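
The counting argument above translates almost line for line into code. This is a minimal sketch, not a library implementation: the function name `hypergeom_pmf` is hypothetical, and it relies only on Python's standard `math.comb` for the combinations.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws without replacement.

    N = population size, K = successes in the population,
    n = sample size, k = successes wanted in the sample.
    """
    # Impossible counts have probability 0 (math.comb rejects negatives).
    if k < 0 or n - k < 0:
        return 0.0
    favorable = comb(K, k) * comb(N - K, n - k)  # numerator
    total = comb(N, n)                           # denominator
    return favorable / total


# Jar with 8 red and 12 blue marbles, drawing 5, asking for 2 red.
p_two_red = hypergeom_pmf(2, N=20, K=8, n=5)
print(p_two_red)
```

Because `math.comb(a, b)` returns 0 whenever `b > a`, asking for more successes than exist in the population also comes out as probability 0 without extra checks.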

Worked Example

Suppose a jar contains 8 red marbles and 12 blue marbles ($N = 20$, $K = 8$). You draw 5 marbles without replacement ($n = 5$). What's the probability of getting exactly 2 red marbles?

  1. Identify the values: $N = 20$, $K = 8$, $n = 5$, $k = 2$.
  2. Count ways to choose 2 red from 8: $\binom{8}{2} = 28$.
  3. Count ways to choose 3 blue from 12: $\binom{12}{3} = 220$.
  4. Count total ways to choose 5 from 20: $\binom{20}{5} = 15{,}504$.
  5. Compute: $P(X = 2) = \frac{28 \times 220}{15{,}504} = \frac{6{,}160}{15{,}504} \approx 0.3973$.

So there's roughly a 39.7% chance of drawing exactly 2 red marbles.
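
One way to sanity-check a result like this is to simulate the experiment many times and see how often exactly 2 red marbles appear. The sketch below is an illustration under stated assumptions: the function name, the trial count, and the fixed seed are arbitrary choices, and a simulation only approximates the exact answer.

```python
import random


def simulate_exactly_two_red(trials, seed=0):
    """Estimate P(exactly 2 red in 5 draws) by repeated sampling."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    jar = ["red"] * 8 + ["blue"] * 12    # the 20-marble jar
    hits = 0
    for _ in range(trials):
        # random.sample draws without replacement, matching the setup.
        drawn = rng.sample(jar, 5)
        if drawn.count("red") == 2:
            hits += 1
    return hits / trials


estimate = simulate_exactly_two_red(50_000)
print(round(estimate, 3))
```

The estimate should land close to the computed probability, and it tends to get closer as the number of trials grows.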

To find the probability of a range of outcomes (say, 2 to 4 red marbles), calculate $P(X = 2)$, $P(X = 3)$, and $P(X = 4)$ separately, then add them.
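
For the jar example, that sum can be sketched as follows. The helper name `pmf` is hypothetical, and exact fractions are used so no rounding enters until the final conversion.

```python
from fractions import Fraction
from math import comb

N, K, n = 20, 8, 5  # jar: 20 marbles, 8 red, draw 5


def pmf(k):
    """Exact P(X = k) for the jar, as a fraction."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


# P(2 <= X <= 4) = P(X = 2) + P(X = 3) + P(X = 4)
p_range = pmf(2) + pmf(3) + pmf(4)
print(p_range, float(p_range))
```

The three numerators are 6,160, 3,696, and 840, so the sum is 10,696 out of 15,504, or about 0.6899.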

Hypergeometric vs. Binomial Distributions

These two distributions look similar on the surface, but the sampling method creates a key difference.

| Feature | Binomial | Hypergeometric |
| --- | --- | --- |
| Sampling | With replacement | Without replacement |
| Trials | Independent | Dependent |
| $P(\text{success})$ | Constant across trials | Changes after each draw |
| Population | Can be infinite | Must be finite |

When does it matter? If the population is large relative to the sample size, removing one item barely changes the probabilities, so the binomial serves as a good approximation. A common rule of thumb: if the sample is less than 5–10% of the population, the binomial approximation works well. When the sample is a larger fraction of the population, you need the hypergeometric distribution for accurate results. Drawing 5 cards from a 52-card deck, for instance, uses about 9.6% of the deck, which is already at the edge of that rule.
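
To see the approximation at work, the sketch below compares the two distributions for the same question (exactly 2 successes in 5 draws, with 40% successes in the population) at two population sizes. The function names and the larger population of 2,000 are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when sampling n items without replacement."""
    if k < 0 or n - k < 0:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with constant success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, k = 5, 2
binom = binom_pmf(k, n, p=8 / 20)

# Sample is 25% of the population: the jar of 20 marbles.
small = hypergeom_pmf(k, N=20, K=8, n=n)
# Sample is 0.25% of the population: same 40% success share, 2,000 items.
large = hypergeom_pmf(k, N=2000, K=800, n=n)

gap_small = abs(small - binom)
gap_large = abs(large - binom)
print(f"binomial={binom:.4f} small={small:.4f} large={large:.4f}")
```

With the 20-marble jar the binomial answer is off by several percentage points, while with 2,000 items the two answers nearly coincide.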

Additional Concepts

  • Conditional probability is built into every hypergeometric calculation. Each draw's probability depends on what was drawn before, even though the formula handles this for you through the combinatorics.
  • The hypergeometric distribution is discrete: $X$ can only take whole-number values (you can't draw 2.5 red marbles).
  • The mean of a hypergeometric random variable is $\mu = \frac{nK}{N}$: the sample size times the proportion of successes in the population, which matches the intuitive "expected proportion" of successes in your sample.
  • When a population contains more than two groups, the model extends to the multivariate hypergeometric distribution, though that's beyond the scope of most honors-level courses.
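
The mean formula can be checked against the definition of expected value, $\sum k \cdot P(X = k)$. This is a small sketch for the jar of 8 red and 12 blue marbles with 5 draws; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K, n = 20, 8, 5  # jar: 20 marbles, 8 red, draw 5

# X is discrete: its possible values are the whole numbers 0 through min(n, K).
pmf = {
    k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
    for k in range(0, min(n, K) + 1)
}

# Expected value from the definition: sum of k * P(X = k).
mean_from_pmf = sum(k * p for k, p in pmf.items())

# Expected value from the shortcut formula nK/N.
mean_from_formula = Fraction(n * K, N)

print(mean_from_pmf, mean_from_formula)
```

Both routes give a mean of 2 red marbles, which is 40% of the 5 marbles drawn.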