📊 AP Statistics

Types of Probability Distributions


Why This Matters

Probability distributions are mathematical models that let statisticians predict outcomes, quantify uncertainty, and make inferences about populations. On the AP Statistics exam, you need to recognize which distribution applies to a given scenario, understand the conditions required for each model, and calculate probabilities using their specific parameters. These distributions connect directly to Units 4, 5, and beyond, from basic probability calculations to sampling distributions to inference procedures like confidence intervals and hypothesis tests.

Don't just memorize formulas and shapes. Know why each distribution exists: What real-world process does it model? What conditions must be met? How does it connect to the Central Limit Theorem or chi-square tests? When you understand the underlying mechanism, you can tackle any FRQ scenario the exam throws at you, whether it's identifying the right distribution, checking conditions, or interpreting results in context.


Discrete Distributions: Counting Successes and Events

These distributions model situations where you're counting discrete outcomes: how many successes occur, how many trials until success, or how many events happen in a given interval. Each has specific conditions that determine when it's the appropriate model.

Bernoulli Distribution

A Bernoulli distribution models a single trial with exactly two outcomes: success (1) or failure (0). It's the simplest probability distribution and the building block for several others.

  • Single parameter p represents the probability of success; the probability of failure is 1 − p
  • Mean is μ = p and variance is p(1 − p)
  • A binomial distribution is simply the sum of n independent Bernoulli trials, so understanding this one first makes the binomial click

Binomial Distribution

The binomial distribution counts the number of successes in a fixed number of independent trials, where each trial has the same probability of success. A helpful mnemonic for checking conditions is BINS: Binary outcomes, Independent trials, Number of trials fixed, Same probability on each trial.

  • Parameters n and p fully define the distribution
  • Mean: μ = np, Standard deviation: σ = √(np(1 − p))
  • The large counts condition (np ≥ 10 and n(1 − p) ≥ 10) allows you to approximate the binomial with a normal distribution. This is critical for building confidence intervals for proportions.

For example, if you flip a fair coin 40 times and count heads, that's binomial with n = 40 and p = 0.5. The expected number of heads is 40 × 0.5 = 20, and the standard deviation is √(40 × 0.5 × 0.5) ≈ 3.16.
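The coin-flip arithmetic above is easy to verify with a few lines of Python; `math.comb` also gives the binomial coefficient, so you can compute exact probabilities without a calculator (a sketch, using the same n = 40, p = 0.5 values):

```python
import math

def binomial_pmf(k, n, p):
    """Exact P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 40, 0.5                 # 40 fair-coin flips
mean = n * p                   # expected number of heads
sd = math.sqrt(n * p * (1 - p))

print(mean)                                  # 20.0
print(round(sd, 2))                          # 3.16
print(round(binomial_pmf(20, n, p), 4))      # P(exactly 20 heads) ≈ 0.1254
```

Note that even the single most likely outcome (exactly 20 heads) has probability only about 0.125, which is why probability questions about binomial variables usually ask about ranges rather than exact counts.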

Poisson Distribution

The Poisson distribution models the count of events occurring in a fixed interval of time or space, when those events happen independently at a constant average rate.

  • Single parameter λ represents both the mean and the variance. This "mean equals variance" property is unique to the Poisson and can help you identify it on the exam.
  • Works well as an approximation to the binomial when n is large and p is small (a common rule of thumb: n ≥ 20 and p ≤ 0.05). In that case, set λ = np.
  • Classic Poisson scenarios: number of typos per page, number of calls to a hotline per hour, number of accidents at an intersection per month.
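The approximation rule above is worth seeing numerically: for large n and small p, the binomial and Poisson probabilities nearly coincide. A minimal sketch (the n = 200, p = 0.02 values are just illustrative):

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) random variable."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """Exact P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Large n, small p: Binomial(200, 0.02) vs. Poisson(lambda = np = 4)
n, p = 200, 0.02
lam = n * p
for k in range(5):
    # The two columns nearly match for every k
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The practical payoff: the Poisson pmf needs only λ, while the exact binomial calculation involves large binomial coefficients, which is why the approximation was historically useful.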

Compare: Binomial vs. Poisson: both count discrete events, but binomial requires a fixed number of trials while Poisson models events in continuous time or space with no fixed upper limit on the count. If an FRQ describes "the number of customers arriving per hour," think Poisson. If it's "the number of defective items in a sample of 50," think binomial.

Geometric Distribution

The geometric distribution counts how many trials it takes to get the first success in repeated independent Bernoulli trials. Where the binomial asks "how many successes in n trials?", the geometric asks "how many trials until the first success?"

  • Single parameter p (probability of success on each trial); mean number of trials until success is μ = 1/p
  • Memoryless property: the probability of success on the next trial doesn't depend on how many failures came before. Each trial is a fresh start. If p = 0.1, the chance you succeed on the next trial is always 0.1, whether you've failed 2 times or 200 times.
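The memoryless property can be checked directly from the geometric tail formula P(X > k) = (1 − p)^k; a short sketch using the p = 0.1 value from the example above:

```python
# Geometric: X counts trials until the first success, p = success probability.
p = 0.1

def geom_tail(k, p):
    """P(first success takes more than k trials) = (1 - p)^k."""
    return (1 - p) ** k

mean_trials = 1 / p            # expected trials until first success
print(mean_trials)             # 10.0

# Memoryless check: P(X > k + 1 | X > k) is the same for every k
cond_after_2   = geom_tail(3, p) / geom_tail(2, p)
cond_after_200 = geom_tail(201, p) / geom_tail(200, p)
print(round(cond_after_2, 6), round(cond_after_200, 6))   # both ≈ 0.9
```

Both conditional probabilities come out to 1 − p = 0.9: having already failed 2 times or 200 times makes no difference to the next trial.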

Compare: Binomial vs. Geometric: binomial fixes the number of trials and counts successes; geometric fixes the number of successes (at one) and counts trials. Both require independent trials with constant p.


Continuous Distributions: Modeling Measurements

Continuous distributions model variables that can take any value within an interval: time, height, test scores, or any measurement on a continuous scale. A key distinction from discrete distributions: probability is found as area under the density curve, not at individual points. The probability that a continuous variable equals any single exact value is 0.

Uniform Distribution

The uniform distribution applies when all outcomes are equally likely within a defined range from a to b.

  • Constant probability density of 1/(b − a) across the entire interval
  • Mean: (a + b)/2, Variance: (b − a)²/12
  • There's also a discrete uniform version for equally likely categorical outcomes (like rolling a fair die, where each face has probability 1/6)

Finding probabilities with a continuous uniform is straightforward: just calculate the proportion of the interval. If wait times are uniformly distributed between 0 and 10 minutes, the probability of waiting between 3 and 7 minutes is (7 − 3)/(10 − 0) = 0.4.
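A minimal sketch of the same calculation, with the interval clipping a general-purpose helper would need (the wait-time numbers come from the example above):

```python
# Uniform(a, b): P(c <= X <= d) is just the fraction of the interval covered.
a, b = 0.0, 10.0               # wait times uniform on [0, 10] minutes

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b), clipping [c, d] to the support."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)

print(uniform_prob(3, 7, a, b))   # 0.4
print((a + b) / 2)                # mean: 5.0
print((b - a) ** 2 / 12)          # variance ≈ 8.33
```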

Normal Distribution

The normal distribution is the most important distribution in AP Statistics. It's a symmetric, bell-shaped curve defined by two parameters: mean μ (center) and standard deviation σ (spread).

  • Empirical Rule (68-95-99.7): approximately 68% of data falls within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ
  • To find probabilities, convert to a z-score: z = (x − μ)/σ, then use the standard normal table or calculator
  • Central to inference procedures: the Central Limit Theorem guarantees that sampling distributions of means and proportions approach normality as sample size grows, which is what enables z-based confidence intervals and hypothesis tests

The reason the normal distribution shows up everywhere is the Central Limit Theorem. Even if the underlying population isn't normal, the distribution of sample means will be approximately normal for large enough samples (typically n ≥ 30 as a rough guideline).
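You can verify the Empirical Rule yourself using the standard normal CDF, which Python's `math.erf` provides without any third-party libraries:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Empirical Rule check: area within 1, 2, and 3 SDs of the mean
for k in (1, 2, 3):
    area = normal_cdf(k) - normal_cdf(-k)
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same `normal_cdf` helper answers z-score questions: for any x, standardize with z = (x − μ)/σ and read off the area to the left.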

Exponential Distribution

The exponential distribution models waiting time between events when those events occur at a constant rate λ.

  • Mean waiting time: 1/λ, Standard deviation: 1/λ (mean and SD are equal)
  • Has the memoryless property: the probability of waiting another t minutes is the same regardless of how long you've already waited. This is the continuous analog of the geometric distribution's memoryless property.
  • Directly connected to Poisson: if events occur according to a Poisson process with rate λ, the time between consecutive events follows an exponential distribution with the same λ
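A short sketch of the exponential tail probability P(T > t) = e^(−λt) and its memoryless property (the rate λ = 2 events per hour is just an illustrative value):

```python
import math

lam = 2.0                       # hypothetical rate: 2 events per hour

def expon_tail(t, lam):
    """P(waiting time > t) for T ~ Exponential(lam)."""
    return math.exp(-lam * t)

print(1 / lam)                  # mean wait: 0.5 hours

# Memoryless: P(T > s + t | T > s) equals P(T > t) for any s
s, t = 1.5, 0.25
lhs = expon_tail(s + t, lam) / expon_tail(s, lam)
rhs = expon_tail(t, lam)
print(math.isclose(lhs, rhs))   # True
```

The Poisson link is visible in the formula itself: "wait longer than t" is the same event as "zero Poisson events in [0, t]", and the Poisson pmf at k = 0 is exactly e^(−λt).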

Compare: Poisson vs. Exponential: Poisson counts how many events occur in a fixed time period; exponential measures how long between events. They're two sides of the same process, both using rate parameter λ.


Sampling and Inference Distributions

These distributions arise specifically in statistical inference. They describe how test statistics behave under the null hypothesis or how estimators vary across samples. You won't typically model raw data with these; instead, they tell you what to expect from your calculated statistics.

Student's t-Distribution

The t-distribution looks like the normal distribution but has heavier tails, meaning extreme values are more likely. Those heavier tails account for the extra uncertainty introduced when you estimate the population standard deviation from sample data.

  • Degrees of freedom (df) control the shape. For a one-sample t-test, df = n − 1. As df increases, the t-distribution gets closer and closer to the standard normal.
  • Used when σ is unknown, which is almost always the case in practice. This makes it essential for confidence intervals and hypothesis tests about means.
  • With very large samples (say n > 100), the t and z distributions are nearly identical, which is why you sometimes see z-procedures used for large samples even when σ is unknown.

Chi-Square Distribution

The chi-square (χ²) distribution models the sum of squared standardized values, so it can only take positive values and is right-skewed.

  • Degrees of freedom determine the shape: small df means heavily skewed; larger df means the distribution becomes more symmetric
  • Powers two major AP Statistics tests:
    • Goodness-of-fit test: does observed data match an expected distribution? (df = number of categories − 1)
    • Test of independence/homogeneity: are two categorical variables related? (df = (r − 1)(c − 1))
  • The test statistic formula is χ² = Σ (O − E)²/E, where O is observed count and E is expected count
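The statistic is simple to compute by hand or in code; here's a goodness-of-fit sketch for a fair-die experiment (the observed counts are hypothetical, made up for illustration):

```python
# Goodness-of-fit: is a die fair? Observed counts from 60 (hypothetical) rolls.
observed = [12, 8, 9, 11, 6, 14]
expected = [60 * (1 / 6)] * 6          # fair die: 10 expected per face

# chi-square = sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                 # number of categories - 1

print(round(chi_sq, 2), df)            # 4.2 5
```

To finish the test you would compare 4.2 against a chi-square distribution with 5 degrees of freedom (table or calculator); a value this small is consistent with a fair die.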

Compare: t-distribution vs. Chi-square: both depend on degrees of freedom, but t is symmetric around zero (for testing means) while chi-square is always positive and skewed (for testing categorical relationships). The t approaches normal quickly; chi-square approaches normal only with very large df.

F-Distribution

The F-distribution is the ratio of two independent chi-square distributions, each divided by their degrees of freedom. Like chi-square, it's always positive and right-skewed.

  • Two degrees of freedom parameters (numerator df and denominator df). Order matters: swapping them gives a different distribution.
  • Central to ANOVA (Analysis of Variance), which tests whether the means of three or more groups differ. The F-statistic compares variation between groups to variation within groups.
  • As both df increase, the F-distribution approaches normality.

Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Counting successes in fixed trials | Binomial, Bernoulli |
| Counting events in continuous interval | Poisson |
| Waiting time/trials until success | Geometric, Exponential |
| Symmetric continuous data | Normal, Uniform |
| Inference with unknown σ | Student's t |
| Categorical data analysis | Chi-Square |
| Comparing variances/ANOVA | F-Distribution |
| Memoryless property | Geometric (discrete), Exponential (continuous) |

Self-Check Questions

  1. A quality control inspector examines 100 items and records how many are defective. Which distribution models this scenario, and what conditions must be verified?

  2. Compare the geometric and exponential distributions: What do they have in common, and how do their applications differ?

  3. Why does the AP Statistics curriculum emphasize the normal distribution so heavily? Connect your answer to the Central Limit Theorem and inference procedures.

  4. An FRQ presents a chi-square test for independence. Explain why the chi-square distribution (rather than the normal or t) is the appropriate sampling distribution for the test statistic.

  5. Both the Poisson and binomial distributions count discrete events. Under what conditions can you use the Poisson as an approximation for the binomial, and why might you want to?