Fiveable

🎲Intro to Statistics Unit 4 Review

4.5 Hypergeometric Distribution

Written by the Fiveable Content Team • Last updated August 2025

Hypergeometric Distribution

The hypergeometric distribution models what happens when you draw items without replacement from a finite population. It's the go-to distribution for scenarios like quality control sampling or lottery calculations, where each selection changes the probability of the next one.

This distribution differs from the binomial in one critical way: because items aren't replaced, the probability of success shifts with every draw. Getting this distinction right matters for accurately calculating probabilities in real sampling situations.

Characteristics of hypergeometric experiments

A hypergeometric experiment starts with a finite population split into exactly two groups: a group of interest ("successes") and everything else ("failures"). Think of a jar with 12 red marbles and 8 blue marbles. You want to know the probability of drawing a certain number of red ones.

Here's what defines the setup:

  • Sampling without replacement: Once you draw an item, it stays out. This means each draw changes the composition of what's left, so your picks are not independent.
  • Fixed sample size: You decide in advance how many items to draw (say, 5 marbles from the jar).
  • Changing probabilities: Because items aren't returned, the chance of drawing a success shifts after every pick. If you pull a red marble first, there are now fewer reds left, so the probability of red on the next draw drops.
  • Counting successes in the sample: The random variable $X$ represents how many successes end up in your sample (e.g., how many of your 5 drawn marbles are red).
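
To make the "changing probabilities" point concrete, here is a small sketch using the 12-red, 8-blue jar from the text. The variable names are illustrative only; the code just compares the chance of red on the first draw with the chance of red on the second draw, depending on what came out first.

```python
from fractions import Fraction

# Jar from the text: 12 red and 8 blue marbles.
red, blue = 12, 8

# Probability that the first draw is red.
p_first_red = Fraction(red, red + blue)

# Probability that the second draw is red, given the colour of the first draw.
# One marble is gone either way, so the denominator shrinks by 1.
p_red_after_red = Fraction(red - 1, red + blue - 1)
p_red_after_blue = Fraction(red, red + blue - 1)

print(p_first_red)       # 3/5
print(p_red_after_red)   # 11/19
print(p_red_after_blue)  # 12/19
```

Since 11/19 is below 3/5 and 12/19 is above it, the outcome of the first draw changes the odds for the second, which is exactly why the draws are not independent.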

Hypergeometric distribution calculations

The probability of getting exactly $k$ successes in your sample is:

$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$

Where:

  • $N$ = total population size (e.g., 20 marbles in the jar)
  • $K$ = number of successes in the population (e.g., 12 red marbles)
  • $n$ = sample size (e.g., 5 marbles drawn)
  • $k$ = number of successes you want in the sample (e.g., 3 red marbles)

The formula works by counting favorable outcomes over total outcomes. The numerator, $\binom{K}{k} \binom{N-K}{n-k}$, counts the number of ways to choose $k$ successes from the $K$ available and $n-k$ failures from the $N-K$ available. The denominator, $\binom{N}{n}$, counts the total number of ways to choose any $n$ items from the population.

Steps to calculate:

  1. Identify $N$, $K$, $n$, and $k$ from the problem.
  2. Plug these values into the formula.
  3. Compute each combination separately, then multiply and divide. For large values, use a calculator or statistical software.
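
The steps above can be sketched in a few lines of Python. The function name `hypergeom_pmf` and its argument order are choices made for this illustration, not a standard API; it relies only on `math.comb` from the standard library.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in a sample of n, drawn without
    replacement from a population of N that contains K successes."""
    # Values of k that cannot occur get probability zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Favorable outcomes divided by total outcomes.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Values from the definitions above: N = 20, K = 12, n = 5, k = 3.
print(round(hypergeom_pmf(3, N=20, K=12, n=5), 3))  # 0.397
```

Summing the function over every possible `k` should give 1, which is a quick sanity check that the inputs were identified correctly.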

Quick example: A jar has 20 marbles total (12 red, 8 blue). You draw 5. What's the probability of getting exactly 3 red?

$$P(X = 3) = \frac{\binom{12}{3} \binom{8}{2}}{\binom{20}{5}} = \frac{220 \times 28}{15504} = \frac{6160}{15504} \approx 0.3973$$

So there's about a 39.7% chance of drawing exactly 3 red marbles.
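
One way to check a result like this without the formula is to simulate the experiment many times. The sketch below is only an illustration: the seed and the number of trials are arbitrary choices, and `random.sample` is used because it draws without replacement.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

jar = ["red"] * 12 + ["blue"] * 8
trials = 100_000

hits = 0
for _ in range(trials):
    sample = random.sample(jar, 5)   # 5 marbles drawn without replacement
    if sample.count("red") == 3:
        hits += 1

# Share of simulated samples that contained exactly 3 red marbles.
estimate = hits / trials
print(estimate)
```

With this many trials the estimate should land close to the exact value of 6160/15504, though it will not match it digit for digit.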

Hypergeometric vs binomial distributions

These two distributions look similar but apply to different situations. The key difference comes down to replacement.

Hypergeometric: Sampling without replacement from a finite population. Probabilities change with each draw. Draws are not independent.

Binomial: Sampling with replacement (or from a population so large it doesn't matter). Probability stays constant. Trials are independent.

| Feature | Hypergeometric | Binomial |
| --- | --- | --- |
| Replacement | Without | With (or effectively infinite population) |
| Independence | Draws are dependent | Trials are independent |
| Probability of success | Changes each draw | Constant ($p$) |
| Population size | Finite and known | Infinite or irrelevant |
| Example | Drawing 5 marbles from a jar of 20 | Flipping a coin 10 times |

Practical rule of thumb: If the sample size is small relative to the population (typically less than 5% of the population), the binomial distribution approximates the hypergeometric well, even without replacement. This is because removing a few items barely changes the probabilities. But when your sample is a large fraction of the population, you need the hypergeometric.
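
A rough numerical sketch of this rule of thumb: keep the sample size at 5 and the share of successes at 60%, then compare the two distributions for a population of 20 (the sample is 25% of it) and a population of 2,000 (the sample is 0.25% of it). The helper names and the population of 2,000 are assumptions made for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def max_gap(N, n=5):
    """Largest difference between the two pmfs over k = 0..n,
    with 60% of the population counted as successes."""
    K = N * 3 // 5
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

gap_small_population = max_gap(20)     # sample is 25% of the population
gap_large_population = max_gap(2000)   # sample is 0.25% of the population

print(gap_small_population, gap_large_population)
```

The gap for the population of 20 comes out far larger than the gap for the population of 2,000, which matches the idea that the binomial is a reasonable stand-in only when the sample is a small fraction of the population.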

Statistical measures and functions

  • Expected value: $E(X) = \frac{nK}{N}$. This gives the average number of successes you'd expect. In the marble example: $E(X) = \frac{5 \times 12}{20} = 3$.
  • Variance: $\text{Var}(X) = n \cdot \frac{K}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1}$. The last factor, $\frac{N-n}{N-1}$, is called the finite population correction factor. It accounts for the fact that sampling without replacement reduces variability compared to sampling with replacement.
  • Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$, which tells you how much the number of successes typically deviates from the expected value.
  • Cumulative distribution function (CDF): $P(X \leq k)$ gives the probability of obtaining at most $k$ successes. You calculate it by summing $P(X = 0) + P(X = 1) + \cdots + P(X = k)$. For anything beyond small numbers, use a calculator or software.
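
As a sketch, the measures above can be computed for the marble example (20 marbles, 12 red, 5 drawn) with the standard library alone. The helper names here are illustrative, not part of any particular package.

```python
from math import comb, sqrt

N, K, n = 20, 12, 5   # population, successes in population, sample size

def hypergeom_pmf(k):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k):
    # P(X <= k): add up the pmf from 0 through k.
    return sum(hypergeom_pmf(i) for i in range(k + 1))

mean = n * K / N
variance = n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1))
std_dev = sqrt(variance)

print(mean)                        # 3.0
print(round(variance, 4))          # 0.9474
print(round(std_dev, 4))           # 0.9733
print(round(hypergeom_cdf(3), 4))  # 0.6935
```

So in this example there is roughly a 69% chance of drawing at most 3 red marbles, and the count of red marbles typically sits within about one marble of the expected value of 3.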