Hypergeometric Distribution
The hypergeometric distribution models what happens when you draw items without replacement from a finite population. It's the go-to distribution for scenarios like quality control sampling or lottery calculations, where each selection changes the probability of the next one.
This distribution differs from the binomial in one critical way: because items aren't replaced, the probability of success shifts with every draw. Getting this distinction right matters for accurately calculating probabilities in real sampling situations.

Characteristics of hypergeometric experiments
A hypergeometric experiment starts with a finite population split into exactly two groups: a group of interest ("successes") and everything else ("failures"). Think of a jar with 12 red marbles and 8 blue marbles. You want to know the probability of drawing a certain number of red ones.
Here's what defines the setup:
- Sampling without replacement: Once you draw an item, it stays out. This means each draw changes the composition of what's left, so your picks are not independent.
- Fixed sample size: You decide in advance how many items to draw (say, 5 marbles from the jar).
- Changing probabilities: Because items aren't returned, the chance of drawing a success shifts after every pick. If you pull a red marble first, only 11 of the remaining 19 marbles are red, so the probability of red on the next draw drops from 12/20 = 0.60 to 11/19 ≈ 0.58.
- Counting successes in the sample: The random variable $X$ represents how many successes end up in your sample (e.g., how many of your 5 drawn marbles are red).
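To make the "changing probabilities" point concrete, here is a minimal sketch that tracks the chance of red on each draw for the jar above (12 red, 8 blue). The helper name `next_draw_success_prob` is made up for illustration; exact fractions are used so no rounding gets in the way.

```python
from fractions import Fraction


def next_draw_success_prob(successes_left, total_left):
    """Chance that the next draw is a success, given what is still in the jar."""
    return Fraction(successes_left, total_left)


# Jar from the text: 12 red (successes) out of 20 marbles.
red, total = 12, 20

# First draw: nothing has been removed yet.
p_first = next_draw_success_prob(red, total)

# Second draw, if the first marble was red: one fewer red, one fewer marble.
p_second_after_red = next_draw_success_prob(red - 1, total - 1)

# Second draw, if the first marble was blue: same reds, one fewer marble.
p_second_after_blue = next_draw_success_prob(red, total - 1)

print(p_first, p_second_after_red, p_second_after_blue)  # 3/5 11/19 12/19
```

The second-draw probability depends on what the first draw was, which is exactly why the draws are not independent.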

Hypergeometric distribution calculations
The probability of getting exactly $k$ successes in your sample is:
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
Where:
- $N$ = total population size (e.g., 20 marbles in the jar)
- $K$ = number of successes in the population (e.g., 12 red marbles)
- $n$ = sample size (e.g., 5 marbles drawn)
- $k$ = number of successes you want in the sample (e.g., 3 red marbles)
The formula works by counting favorable outcomes over total outcomes. The numerator, $\binom{K}{k}\binom{N-K}{n-k}$, counts the number of ways to choose $k$ successes from the $K$ available and $n-k$ failures from the $N-K$ available. The denominator, $\binom{N}{n}$, counts the total number of ways to choose any $n$ items from the population.
Steps to calculate:
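- Identify the population size $N$, the number of successes in the population $K$, the sample size $n$, and the target number of successes $k$ from the problem.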
- Plug these values into the formula.
- Compute each combination separately, then multiply and divide. For large values, use a calculator or statistical software.
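The steps above can be sketched in a few lines of Python using the standard library's `math.comb`. The function name `hypergeom_pmf` and its argument order are choices made here for illustration: `N` is the population size, `K` the number of successes in the population, `n` the sample size, and `k` the number of successes wanted in the sample.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in n draws without replacement
    from a population of N items that contains K successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Outside the feasible range of k the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Favorable outcomes divided by total outcomes.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Jar with 20 marbles, 12 of them red; draw 5 and ask for exactly 3 red.
print(round(hypergeom_pmf(3, N=20, K=12, n=5), 4))  # 0.3973
```

Python integers do not overflow, so computing the three combinations exactly and dividing once at the end stays accurate even for fairly large populations.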
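Quick example: A jar has 20 marbles total (12 red, 8 blue). You draw 5. What's the probability of getting exactly 3 red?
$$P(X = 3) = \frac{\binom{12}{3}\binom{8}{2}}{\binom{20}{5}} = \frac{220 \times 28}{15504} = \frac{6160}{15504} \approx 0.397$$
So there's about a 39.7% chance of drawing exactly 3 red marbles.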
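One way to sanity-check the 39.7% figure is to simulate the experiment many times. The sketch below is only a rough check, not part of the formula: the function name, the trial count, and the fixed seed are all arbitrary choices, and `random.sample` is used because it draws without replacement.

```python
import random


def simulate_exact_red(trials, seed=0):
    """Estimate P(exactly 3 red in 5 draws) by repeated sampling."""
    rng = random.Random(seed)             # fixed seed so runs are repeatable
    jar = ["red"] * 12 + ["blue"] * 8     # the 20-marble jar from the example
    hits = 0
    for _ in range(trials):
        sample = rng.sample(jar, 5)       # 5 marbles, without replacement
        if sample.count("red") == 3:
            hits += 1
    return hits / trials


estimate = simulate_exact_red(100_000)
print(f"simulated P(exactly 3 red) = {estimate:.3f}")
```

With this many trials the estimate should land close to the exact value, though it will wobble slightly from seed to seed.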
Hypergeometric vs binomial distributions
These two distributions look similar but apply to different situations. The key difference comes down to replacement.
Hypergeometric: Sampling without replacement from a finite population. Probabilities change with each draw. Draws are not independent.
Binomial: Sampling with replacement (or from a population so large it doesn't matter). Probability stays constant. Trials are independent.
| Feature | Hypergeometric | Binomial |
|---|---|---|
| Replacement | Without | With (or effectively infinite population) |
| Independence | Draws are dependent | Trials are independent |
| Probability of success | Changes each draw | Constant ($p$) |
| Population size | Finite and known | Infinite or irrelevant |
| Example | Drawing 5 marbles from a jar of 20 | Flipping a coin 10 times |
Practical rule of thumb: If the sample size is small relative to the population (typically less than 5% of the population), the binomial distribution approximates the hypergeometric well, even without replacement. This is because removing a few items barely changes the probabilities. But when your sample is a large fraction of the population, you need the hypergeometric.
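The rule of thumb can be checked numerically. The sketch below, with illustrative helper names, keeps the success share at 60% and the sample size at 5, then compares the two distributions when the sample is 0.25% of the population versus 25% of it. The population of 2000 is an invented example; the jar of 20 is the one used throughout.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Exactly k successes in n draws without replacement (K successes among N)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """Exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_abs_gap(N, K, n):
    """Largest pointwise difference between the two PMFs over k = 0..n."""
    p = K / N
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1)
    )


small_fraction = max_abs_gap(N=2000, K=1200, n=5)  # sample is 0.25% of population
large_fraction = max_abs_gap(N=20, K=12, n=5)      # sample is 25% of population

print(f"gap at 0.25% sampled: {small_fraction:.4f}")
print(f"gap at 25% sampled:   {large_fraction:.4f}")
```

The gap for the large population is tiny, while for the 20-marble jar the binomial is off by several percentage points at some values of the count.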
Statistical measures and functions
- Expected value: $E(X) = n \cdot \frac{K}{N}$. This gives the average number of successes you'd expect. In the marble example: $E(X) = 5 \cdot \frac{12}{20} = 3$.
- Variance: $\operatorname{Var}(X) = n \cdot \frac{K}{N} \cdot \left(1 - \frac{K}{N}\right) \cdot \frac{N-n}{N-1}$. The last factor, $\frac{N-n}{N-1}$, is called the finite population correction factor. It accounts for the fact that sampling without replacement reduces variability compared to sampling with replacement.
- Standard deviation: $\sigma = \sqrt{\operatorname{Var}(X)}$, which tells you how much the number of successes typically deviates from the expected value.
- Cumulative distribution function (CDF): $P(X \le k)$ gives the probability of obtaining at most $k$ successes. You calculate it by summing $P(X = i)$ for $i = 0, 1, \dots, k$. For anything beyond small numbers, use a calculator or software.
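As a rough sketch, these measures can be computed directly for the marble example (population 20, 12 red, 5 drawn). The function names below are illustrative, with `N` the population size, `K` the successes in the population, and `n` the sample size.

```python
from math import comb, sqrt


def hypergeom_pmf(k, N, K, n):
    """P(X = k) for n draws without replacement, K successes among N items."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_mean(N, K, n):
    """Expected number of successes in the sample."""
    return n * K / N


def hypergeom_var(N, K, n):
    """Variance, including the finite population correction (N - n) / (N - 1)."""
    p = K / N
    return n * p * (1 - p) * (N - n) / (N - 1)


def hypergeom_cdf(k, N, K, n):
    """P(X <= k): sum the PMF from 0 up to k (capped at the sample size)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, min(k, n) + 1))


N, K, n = 20, 12, 5
print("mean:", hypergeom_mean(N, K, n))                    # 3.0
print("variance:", round(hypergeom_var(N, K, n), 4))       # 0.9474
print("std dev:", round(sqrt(hypergeom_var(N, K, n)), 4))  # 0.9733
print("P(X <= 3):", round(hypergeom_cdf(3, N, K, n), 4))   # 0.6935
```

Without the correction factor the variance would be 1.2, the binomial value for the same sample size and success share, so drawing a quarter of the jar trims the spread noticeably.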