In AP Statistics, a randomization distribution is a collection of statistics generated by simulation assuming the null hypothesis parameter values are true, created by repeatedly re-randomizing or reassigning response values. It serves as a null distribution for finding p-values without a theoretical model.
A randomization distribution is what you get when you build a null distribution by hand (well, by computer) instead of pulling one off the shelf. You assume the null hypothesis is true, then repeatedly shuffle or reassign the response values, most often reallocating them to treatment groups in a randomized experiment, and record the statistic each time. After thousands of repetitions, the pile of simulated statistics shows you what values are plausible if H₀ is true.
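The shuffle-and-record loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response values are made up, the statistic is a difference in group means, and the function name `randomization_distribution` is hypothetical rather than anything from the CED.

```python
import random

def randomization_distribution(treatment, control, reps=10_000, seed=1):
    """Simulate differences in group means, assuming H0 (no treatment effect)."""
    rng = random.Random(seed)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)              # reassign responses to groups at random
        new_treat = pooled[:n_treat]
        new_ctrl = pooled[n_treat:]
        diffs.append(sum(new_treat) / len(new_treat) - sum(new_ctrl) / len(new_ctrl))
    return diffs

# Hypothetical responses from a small randomized experiment (made-up data)
treatment = [12, 15, 14, 18, 16, 17]
control = [11, 13, 10, 14, 12, 13]

observed = sum(treatment) / len(treatment) - sum(control) / len(control)
diffs = randomization_distribution(treatment, control)

# One-sided p-value: proportion of shuffles at least as large as the observed difference
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference = {observed:.2f}, simulated p-value = {p_value:.4f}")
```

The list `diffs` is the randomization distribution. A dotplot of it would be centered near 0, because shuffling erases any real treatment effect.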
The CED treats this as one of two ways to get a null distribution. You can use a theoretical distribution (z, t, or chi-square) when a probability model is assumed to be true and conditions are met, or you can use a randomization distribution built from simulation. Both answer the same question. The p-value is just the proportion of values in the null distribution as extreme or more extreme than what you actually observed. A randomization distribution counts that proportion directly from the simulated statistics instead of reading it off a curve.
The randomization distribution shows up in the essential knowledge for several inference topics. In Topic 5.3 (AP Stats 5.3.A), it's introduced alongside the Central Limit Theorem as a way to estimate sampling distributions using simulation. In Topic 6.5 (AP Stats 6.5.A), the CED explicitly says the null distribution for a proportion test 'can be either a randomization distribution or, when a probability model is assumed to be true, a theoretical distribution (z).' The same language appears in Topic 8.3 (AP Stats 8.3.A) for the chi-square goodness-of-fit test. The big conceptual payoff is that a randomization distribution makes the p-value definition concrete. Instead of an abstract area under a curve, the p-value becomes a literal count: the fraction of your simulated statistics (say, 10,000 of them) that are as extreme or more extreme than the observed one. Understanding this makes AP Stats 6.5.B-style p-value interpretations much easier to write correctly.
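For the one-proportion setting in Topic 6.5, the "literal count" idea might look like the sketch below. The null value, sample size, and observed result are all hypothetical, and `simulate_null_proportions` is an illustrative name, not a standard function.

```python
import random

def simulate_null_proportions(p0, n, reps=10_000, seed=2):
    """Simulate sample proportions from samples of size n, assuming H0: p = p0."""
    rng = random.Random(seed)
    return [sum(rng.random() < p0 for _ in range(n)) / n for _ in range(reps)]

p0, n = 0.5, 40               # hypothetical null value and sample size
observed_p_hat = 26 / 40      # hypothetical observed sample proportion

null_p_hats = simulate_null_proportions(p0, n)

# Ha: p > p0, so "as extreme or more extreme" means at or above the observed value
count_extreme = sum(p_hat >= observed_p_hat for p_hat in null_p_hats)
p_value = count_extreme / len(null_p_hats)
print(f"{count_extreme} of {len(null_p_hats)} simulated proportions were at least "
      f"{observed_p_hat}, so the simulated p-value is {p_value:.4f}")
```

No curve is involved anywhere: the p-value is just `count_extreme` divided by the number of simulated samples.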
Null distribution (Units 6, 8, 9)
The randomization distribution IS a null distribution, just built by simulation instead of theory. Every test you run in Units 6-9 needs a null distribution, and the CED gives you two ways to get one. Randomization is the empirical route; z, t, and chi-square are the theoretical shortcuts.
Central Limit Theorem (Unit 5)
Topic 5.3 introduces both ideas side by side because they solve the same problem. The CLT tells you the shape of a sampling distribution mathematically when n is large; a randomization distribution shows you that shape by brute-force simulation. When CLT conditions fail, simulation still works.
Chi-square goodness-of-fit test (Unit 8)
This is where the AP exam most often forces the choice. The theoretical chi-square distribution needs all expected counts to be at least 5. When small expected counts break that condition, a randomization distribution is the appropriate null distribution for the χ² statistic.
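Here is a rough sketch of building that randomization distribution for the χ² statistic when expected counts are small. The null proportions and observed counts are invented for illustration, and the helper names are hypothetical.

```python
import random

def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_gof_null(null_probs, n, reps=10_000, seed=3):
    """Simulate chi-square statistics for samples of size n drawn under H0."""
    rng = random.Random(seed)
    categories = range(len(null_probs))
    expected = [n * p for p in null_probs]
    stats = []
    for _ in range(reps):
        draws = rng.choices(categories, weights=null_probs, k=n)
        counts = [draws.count(c) for c in categories]
        stats.append(chi_square_stat(counts, expected))
    return stats

null_probs = [0.7, 0.2, 0.1]     # hypothetical H0 proportions
observed_counts = [10, 6, 4]     # hypothetical sample, n = 20
n = sum(observed_counts)
expected = [n * p for p in null_probs]   # about 14, 4, 2: two are below 5

observed_stat = chi_square_stat(observed_counts, expected)
sim_stats = simulate_gof_null(null_probs, n)
p_value = sum(s >= observed_stat for s in sim_stats) / len(sim_stats)
print(f"observed chi-square = {observed_stat:.3f}, simulated p-value = {p_value:.4f}")
```

Two of the three expected counts here fall below 5, so the simulated p-value is the one to report, not a value read from the theoretical chi-square curve.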
Interpreting p-values (Unit 6)
Topic 6.5 defines the p-value as the proportion of the null distribution as extreme or more extreme than the observed statistic. With a randomization distribution, that's literally counting simulated dots, which is the most intuitive version of what a p-value means.
Multiple-choice questions tend to test the WHEN, not the how. A classic stem describes a goodness-of-fit test and asks when a randomization distribution would be more appropriate than the theoretical chi-square distribution. The answer hinges on conditions, especially expected counts below 5, where the chi-square approximation isn't trustworthy. Another common angle gives you both a theoretical p-value and a simulated one and asks you to compare them or interpret the simulated p-value as a proportion of the 10,000 trials. No released FRQ has used this term verbatim, but simulation-based reasoning appears in investigative task questions, and you should be ready to read a dotplot of simulated statistics, count the dots that are as extreme or more extreme than the observed value, and turn that count into a p-value and a conclusion.
Randomization distributions and theoretical distributions are both null distributions, meaning distributions of the test statistic assuming H₀ is true. A theoretical distribution comes from a probability model and requires conditions like large expected counts or normality. A randomization distribution is generated empirically by simulation, so it works even when those conditions fail. Theory gives you a smooth curve; randomization gives you a pile of simulated statistics that approximates that curve. If conditions are met, they lead to nearly the same p-value.
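To see the "nearly the same p-value" claim in action, the sketch below runs both routes on one made-up goodness-of-fit example where every expected count is large. It assumes three categories, so the test has 2 degrees of freedom; for that case the chi-square upper-tail area has the closed form exp(-x/2), which avoids needing a statistics library. All data values are hypothetical.

```python
import math
import random

def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

null_probs = [0.5, 0.3, 0.2]         # hypothetical H0 proportions
observed_counts = [135, 100, 65]     # hypothetical sample, n = 300
n = sum(observed_counts)
expected = [n * p for p in null_probs]   # about 150, 90, 60: all well above 5
observed_stat = chi_square_stat(observed_counts, expected)

# Theoretical route: with 2 degrees of freedom, P(chi-square >= x) = exp(-x / 2)
theoretical_p = math.exp(-observed_stat / 2)

# Simulation route: draw samples of size n from the H0 proportions and count
rng = random.Random(4)
categories = range(len(null_probs))
reps = 10_000
count_extreme = 0
for _ in range(reps):
    draws = rng.choices(categories, weights=null_probs, k=n)
    counts = [draws.count(c) for c in categories]
    if chi_square_stat(counts, expected) >= observed_stat:
        count_extreme += 1
simulated_p = count_extreme / reps

print(f"theoretical p-value = {theoretical_p:.4f}")
print(f"simulated p-value   = {simulated_p:.4f}")
```

The two printed values should agree closely, with the small gap coming from simulation randomness. Changing the seed or the number of repetitions shifts the simulated value slightly but not the theoretical one.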
A randomization distribution is a collection of statistics generated by simulation assuming the null hypothesis parameter values are true, often by re-randomizing response values to treatment groups.
It is one of two valid null distributions in the CED; the other option is a theoretical distribution like z, t, or chi-square when a probability model is assumed to be true.
With a randomization distribution, the p-value is the proportion of simulated statistics as extreme or more extreme than the observed statistic, counted directly from the simulation.
Randomization distributions are especially appropriate when theoretical conditions fail, such as expected counts below 5 in a chi-square goodness-of-fit test.
When conditions for the theoretical distribution are satisfied, the randomization p-value and the theoretical p-value should be very close to each other.
Topic 5.3 introduces randomization distributions alongside the CLT as a simulation-based way to estimate sampling distributions.
A randomization distribution is a collection of statistics generated by simulation assuming the null hypothesis values are true, typically built by repeatedly reassigning response values to treatment groups. It acts as the null distribution, so the p-value is just the proportion of simulated statistics as extreme or more extreme than the observed one.
A randomization distribution is not quite the same thing as a sampling distribution. A sampling distribution is the distribution of a statistic across all possible samples of a given size from a population, while a randomization distribution is a simulated approximation of the null distribution, built by re-randomizing the data you already have under the assumption that H₀ is true.
Use a randomization distribution instead of the theoretical chi-square distribution when the conditions for the chi-square distribution aren't met, most commonly when one or more expected counts fall below 5. If all expected counts are at least 5, the theoretical chi-square distribution is appropriate and standard.
Count the simulated statistics that are as extreme or more extreme than your observed statistic, then divide by the total number of simulations. For example, if 320 of 10,000 simulated χ² values are at or above your observed χ², the p-value is about 0.032.
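The count-then-divide rule is short enough to write as a one-line helper. In this minimal sketch, the function name `simulated_p_value` is hypothetical and the list of "simulated" values is a placeholder rigged to match the 320-of-10,000 example rather than real simulation output.

```python
def simulated_p_value(simulated_stats, observed_stat):
    """Proportion of simulated statistics at or above the observed statistic."""
    count_extreme = sum(stat >= observed_stat for stat in simulated_stats)
    return count_extreme / len(simulated_stats)

# Placeholder "simulation output" rigged to match the example above:
# 320 values at or above the observed statistic, 9,680 below it.
observed_stat = 7.0
simulated_stats = [8.0] * 320 + [2.0] * 9_680

print(simulated_p_value(simulated_stats, observed_stat))  # 0.032
```

This version counts only the upper tail, which suits χ² because larger values are always more extreme. A two-sided test on a difference or a proportion would need to count both tails.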
The randomization p-value and the theoretical p-value usually differ only slightly. When conditions for the theoretical distribution hold, the two p-values are very close, with small differences from simulation randomness. They can diverge meaningfully when theoretical conditions are violated, which is exactly when you should trust the randomization approach.