Statistical significance tests are the backbone of inferential statistics—they're how you move from "here's what I observed in my sample" to "here's what I can conclude about the population." On the AP exam, you're being tested on your ability to choose the right test for a given scenario, interpret results correctly, and understand the assumptions that must hold for each test to be valid. These aren't just formulas to memorize; they represent different tools for different jobs.
The key to mastering this topic is understanding when and why each test applies. Does your data meet normality assumptions? Are you comparing means or examining relationships? Are your samples independent or paired? These questions determine which test you'll use. Don't just memorize test names—know what conditions each test requires and what type of question it answers.
When you need to determine whether a sample mean differs significantly from a population mean—or whether two means differ from each other—and you have either a large sample or a known population variance, these parametric tests are your go-to tools. The central limit theorem ensures that the sampling distribution of the sample mean approaches normality as sample size grows, making these tests reliable.
Compare: Z-test vs. T-test—both test hypotheses about means, but Z-tests require a known population standard deviation (σ), while T-tests estimate it from the sample. If an FRQ doesn't give you the population standard deviation, default to the T-test.
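To make the difference concrete, here's a minimal sketch in Python using scipy.stats; the sample data, the hypothesized mean of 100, and the "known" σ = 5 are all made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; H0: population mean is 100.
sample = np.array([102, 98, 105, 110, 95, 101, 99, 108, 97, 103])

# T-test: sigma is unknown, so it's estimated from the sample.
t_stat, p_t = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.3f}, p = {p_t:.3f}")

# Z-test: only valid when sigma is known; sigma = 5 is assumed here.
sigma = 5
z_stat = (sample.mean() - 100) / (sigma / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value
print(f"z = {z_stat:.3f}, p = {p_z:.3f}")
```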
When you have three or more groups to compare, running multiple T-tests inflates your Type I error rate. ANOVA solves this by testing all groups simultaneously. The F-statistic compares variation between groups to variation within groups—if the between-group variation is large relative to within-group variation, at least one mean differs.
Compare: T-test vs. ANOVA—T-tests compare two means, while ANOVA handles three or more groups. Using multiple T-tests instead of ANOVA increases your chance of a Type I error (false positive).
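Here's a minimal sketch of a one-way ANOVA with scipy.stats; the scores for the three hypothetical teaching methods are made up.

```python
from scipy import stats

# Hypothetical test scores under three teaching methods.
method_a = [85, 90, 88, 92, 87]
method_b = [78, 82, 80, 85, 79]
method_c = [91, 94, 89, 96, 93]

# One test of H0: all three group means are equal.
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A significant F says at least one mean differs, not which one;
# post hoc comparisons are needed to identify the specific pairs.
```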
Not all research questions are about comparing means. Sometimes you need to assess whether variables are associated or whether one variable predicts another. These tests examine relationships rather than differences.
Compare: Correlation vs. Regression—correlation measures association strength, while regression provides a predictive equation. Correlation is symmetric (the correlation of x with y equals the correlation of y with x), but regression has a clear dependent/independent variable distinction.
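A minimal sketch contrasting the two with scipy.stats; the study-hours data is hypothetical.

```python
from scipy import stats

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 60, 57, 68, 74, 71, 83, 90]

# Correlation: a single symmetric measure of association strength.
r, p = stats.pearsonr(hours, scores)  # pearsonr(scores, hours) gives the same r
print(f"r = {r:.3f}, p = {p:.4f}")

# Regression: direction matters, because x predicts y.
fit = stats.linregress(hours, scores)
print(f"predicted score = {fit.slope:.2f} * hours + {fit.intercept:.2f}")
```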
When your data consists of categories rather than continuous measurements, you need tests designed for frequency counts. These tests compare observed frequencies to what you'd expect under the null hypothesis.
Compare: Chi-square test of independence vs. goodness of fit—independence tests examine relationships between two categorical variables in a contingency table, while goodness of fit tests whether one variable's distribution matches a hypothesized distribution.
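A minimal sketch of both chi-square tests with scipy.stats; all of the counts below are invented for illustration.

```python
from scipy import stats

# Test of independence: two categorical variables in a contingency table
# (hypothetical counts: rows = study method, columns = pass/fail).
table = [[30, 10],
         [20, 25]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, p = {p:.4f}")

# Goodness of fit: does one variable match a hypothesized distribution?
# (hypothetical counts: are 120 die rolls consistent with a fair die?)
observed = [18, 22, 16, 25, 19, 20]
chi2, p = stats.chisquare(observed)  # default expectation: equal frequencies
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.4f}")
```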
When your data violates normality assumptions or consists of ordinal (ranked) data, non-parametric tests provide valid alternatives. These tests make fewer assumptions about the underlying distribution and work with ranks rather than raw values.
Compare: Parametric vs. Non-parametric tests—parametric tests (T-test, ANOVA) are more powerful when assumptions are met, but non-parametric tests (Mann-Whitney, Kruskal-Wallis) are more robust when they're not. Know both versions for the exam.
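A minimal sketch of the rank-based alternatives with scipy.stats; the skewed data is invented for illustration.

```python
from scipy import stats

# Hypothetical right-skewed data where a T-test's normality assumption fails.
group1 = [3, 5, 4, 12, 6, 45, 7]
group2 = [8, 15, 9, 60, 11, 10, 14]
group3 = [2, 4, 3, 5, 7, 6, 90]

# Mann-Whitney U: rank-based alternative to the two-sample T-test.
u_stat, p = stats.mannwhitneyu(group1, group2)
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p:.4f}")

# Kruskal-Wallis: rank-based alternative to one-way ANOVA (3+ groups).
h_stat, p = stats.kruskal(group1, group2, group3)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p:.4f}")
```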
| Scenario | Appropriate Test(s) |
|---|---|
| Comparing one sample mean to a value | Z-test (known σ), One-sample T-test (unknown σ) |
| Comparing two independent means | Two-sample T-test, Mann-Whitney U test |
| Comparing paired/matched data | Paired T-test, Wilcoxon signed-rank test |
| Comparing three+ group means | One-way ANOVA, Kruskal-Wallis test |
| Examining two factors simultaneously | Two-way ANOVA |
| Testing categorical associations | Chi-square test of independence |
| Measuring linear relationships | Pearson correlation, Regression analysis |
| Comparing variances | F-test |
You're comparing test scores between three teaching methods, and your data is normally distributed with equal variances. Which test should you use, and what would you use instead if normality were violated?
A researcher wants to know if a new drug affects blood pressure by measuring the same patients before and after treatment. The difference scores are heavily skewed. Which test is most appropriate—paired T-test or Wilcoxon signed-rank test—and why?
Compare and contrast the Chi-square goodness of fit test and the Chi-square test of independence. What type of research question does each address?
An FRQ presents a scenario where you're testing whether the average height of students at a school differs from the national average of 67 inches. The population standard deviation is unknown, and you have a sample of 25 students. Which test applies, and what conditions must you verify?
What distinguishes correlation analysis from regression analysis? If you found a strong positive correlation between study hours and exam scores, what can you conclude—and what can you not conclude?