

Key Concepts in Statistical Hypothesis Tests


Why This Matters

Hypothesis testing is the backbone of statistical inference—it's how we move from "I think there's a pattern here" to "I can demonstrate this effect is real." Every time you're asked to determine whether results are statistically significant, compare group means, or assess whether data fits an expected distribution, you're applying these tests. The AP exam will test your ability to choose the right test for a given scenario, interpret results correctly, and understand the assumptions that make each test valid.

Don't just memorize test names and formulas. You're being tested on when to use each test, what assumptions must hold, and how to interpret the results. Know the difference between parametric and non-parametric approaches, understand why sample size matters, and recognize the relationship between tests that serve similar purposes under different conditions. Master these distinctions, and you'll handle any hypothesis testing question with confidence.


Comparing Means: The Core Parametric Tests

These tests answer the fundamental question: Is the difference between group means large enough to be meaningful, or just random noise? They assume your data follows a normal distribution and use that structure to calculate precise probabilities.

Z-Test

  • Used when population variance is known—compares a sample mean to a population mean using the standard normal distribution
  • Requires large samples ($n > 30$) or normally distributed populations; the Central Limit Theorem justifies its use with larger samples
  • Test statistic formula: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, where $\sigma$ is the known population standard deviation (see the sketch after this list)
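
Here is a minimal sketch of a one-sample Z-test in Python with SciPy; the data, hypothesized mean, and population standard deviation are all invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 36 measurements, population SD assumed known
sample = np.array([52.1, 49.8, 51.5, 50.9, 53.2, 48.7] * 6)  # toy data, n = 36
mu0, sigma = 50.0, 4.0  # hypothesized mean and known population SD (assumptions)

z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_two_sided = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the standard normal

print(f"z = {z:.3f}, p = {p_two_sided:.4f}")
```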

T-Test (One-Sample, Two-Sample, Paired)

  • One-sample T-test compares a sample mean to a known value; two-sample compares means of independent groups; paired compares the same subjects at different times (all three appear in the sketch after this list)
  • Used when population variance is unknown—estimates variance from sample data, making it more practical for real-world applications
  • Degrees of freedom affect the shape of the t-distribution; smaller samples produce wider, flatter curves with more area in the tails
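
A quick sketch of all three variants using SciPy, with made-up data; each call returns the test statistic and a two-sided p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(100, 10, size=15)       # hypothetical pre-treatment scores
after = before + rng.normal(3, 5, size=15)  # hypothetical post-treatment scores
group_b = rng.normal(105, 10, size=15)      # hypothetical independent group

print(stats.ttest_1samp(before, popmean=100))  # one-sample: mean vs. known value
print(stats.ttest_ind(before, group_b))        # two-sample: independent groups
print(stats.ttest_rel(before, after))          # paired: same subjects twice
```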

Compare: Z-test vs. T-test—both compare means to detect significant differences, but Z-tests require known population variance while T-tests estimate it from sample data. On the AP exam, if you're given $\sigma$, think Z-test; if you're given $s$, think T-test.

ANOVA (Analysis of Variance)

  • Compares means across three or more groups simultaneously—avoids the inflated Type I error rate from running multiple T-tests
  • One-way ANOVA uses one independent variable; two-way ANOVA examines two factors and their interaction effect
  • Assumptions: normality within groups, independence of observations, and homogeneity of variances (equal variances across groups); both the variance check and the ANOVA itself appear in the sketch after this list
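
A minimal SciPy sketch with invented scores for three hypothetical treatment groups; Levene's test checks the equal-variance assumption before the one-way ANOVA runs.

```python
from scipy import stats

# Hypothetical scores for three treatment conditions
g1 = [23, 25, 21, 22, 24]
g2 = [30, 28, 31, 29, 27]
g3 = [26, 24, 27, 25, 28]

# Check homogeneity of variances first, then run the one-way ANOVA
print(stats.levene(g1, g2, g3))    # Levene's test for equal variances
print(stats.f_oneway(g1, g2, g3))  # F-statistic and p-value
```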

Compare: T-test vs. ANOVA—T-tests handle two-group comparisons, while ANOVA extends this to three or more groups. If an FRQ asks you to compare multiple treatment conditions, ANOVA is your answer; follow up with post-hoc tests to identify which specific groups differ.


Comparing Variances and Relationships

Sometimes the question isn't about means—it's about spread or association. These tests examine whether variability differs between groups or whether variables move together in predictable ways.

F-Test

  • Compares variances of two populations to determine if they're significantly different; the test statistic is the ratio $F = \frac{s_1^2}{s_2^2}$
  • Critical assumption check for ANOVA—used to verify homogeneity of variances before running analysis of variance
  • Always produces positive values since it's a ratio of squared terms; the F-distribution is right-skewed (see the sketch after this list)
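
SciPy has no single-call two-sample variance F-test, so this sketch computes the ratio directly and looks up the p-value from the F-distribution; the data are invented.

```python
import numpy as np
from scipy import stats

a = np.array([12.1, 11.8, 13.0, 12.4, 11.5, 12.9])  # hypothetical sample 1
b = np.array([12.3, 12.2, 12.5, 12.1, 12.4, 12.6])  # hypothetical sample 2

F = a.var(ddof=1) / b.var(ddof=1)  # ratio of sample variances
dfn, dfd = len(a) - 1, len(b) - 1
p = 2 * min(stats.f.sf(F, dfn, dfd), stats.f.cdf(F, dfn, dfd))  # two-sided

print(f"F = {F:.3f}, p = {p:.4f}")
```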

Regression Analysis

  • Models the relationship between variables—simple regression uses one predictor ($y = a + bx$), multiple regression uses several ($y = a + b_1x_1 + b_2x_2 + \dots$)
  • Coefficient of determination ($R^2$) indicates how much variance in the dependent variable is explained by the model
  • Residual analysis checks assumptions; look for random scatter in residual plots to confirm the model fits appropriately (see the sketch after this list)
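
A minimal simple-regression sketch with made-up data using `scipy.stats.linregress`, which returns the slope, intercept, correlation, and p-value; $R^2$ is the squared correlation.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])  # hypothetical response

fit = stats.linregress(x, y)
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}x")
print(f"R^2 = {fit.rvalue**2:.3f}, p = {fit.pvalue:.2e}")

residuals = y - (fit.intercept + fit.slope * x)  # plot these: look for random scatter
```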

Compare: F-test vs. ANOVA—both use the F-distribution, but F-tests compare two variances directly while ANOVA uses F-statistics to compare variation between groups to variation within groups. Understanding this distinction helps you interpret ANOVA output correctly.


Categorical Data Analysis

When your data consists of counts or categories rather than continuous measurements, you need tests designed for frequencies. These tests compare what you observed to what you'd expect under the null hypothesis.

Chi-Square Test

  • Tests association between categorical variables—compares observed frequencies to expected frequencies under independence
  • Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is the observed count and $E$ is the expected count
  • Minimum expected frequency of 5 in each cell is the standard rule; violations compromise the test's validity (see the sketch after this list)
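
A sketch of a test of independence on an invented 2×3 contingency table; `chi2_contingency` also returns the expected counts so you can verify the minimum-of-5 rule.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table: group (rows) x category (columns)
observed = np.array([[30, 45, 25],
                     [35, 30, 35]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print(expected)  # check that every expected count is at least 5
```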

Compare: Chi-square test vs. T-test—Chi-square handles categorical data (counts in categories), while T-tests handle continuous data (measurements). If your data involves frequencies or proportions in a contingency table, Chi-square is your tool.


Non-Parametric Alternatives

When your data violates normality assumptions or uses ordinal scales, non-parametric tests save the day. They work with ranks rather than raw values, making fewer assumptions about the underlying distribution.

Wilcoxon Rank-Sum Test

  • Non-parametric alternative to the two-sample T-test—compares distributions of two independent groups using ranked data
  • Works with ordinal data or non-normal continuous data—ranks observations from both groups combined, then compares rank sums
  • Also called the Mann-Whitney U test—same procedure, different name; recognize both on the exam (see the sketch after this list)
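
In SciPy the procedure is exposed under its Mann-Whitney name; here is a minimal sketch on invented ordinal ratings from two independent groups.

```python
from scipy import stats

# Hypothetical ordinal ratings (1-5 scale) from two independent groups
group_a = [3, 5, 4, 2, 5, 4, 3]
group_b = [1, 2, 3, 2, 1, 3, 2]

# Rank-sum comparison; returns the U statistic and p-value
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))
```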

Kruskal-Wallis Test

  • Non-parametric alternative to one-way ANOVA—compares distributions across three or more independent groups
  • Uses ranks instead of raw values—tests whether samples come from the same distribution without assuming normality
  • Follow-up tests needed to identify which specific groups differ, similar to post-hoc procedures in ANOVA (see the sketch after this list)
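
A minimal sketch on invented, skewed response-time data, the kind of situation where ANOVA's normality assumption would be doubtful.

```python
from scipy import stats

# Hypothetical skewed response times (ms) for three independent groups
g1 = [310, 290, 450, 300, 980]
g2 = [250, 270, 260, 300, 240]
g3 = [400, 420, 390, 1500, 410]

# Rank-based comparison across all three groups at once
print(stats.kruskal(g1, g2, g3))
```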

Compare: Wilcoxon vs. Kruskal-Wallis—Wilcoxon handles two-group comparisons (like the T-test), while Kruskal-Wallis extends to three or more groups (like ANOVA). Both are rank-based and make no assumption about the shape of the underlying distribution.


Testing Assumptions: Normality Checks

Before applying parametric tests, you need to verify their assumptions hold. These tests specifically assess whether your data comes from a normal distribution.

Shapiro-Wilk Test

  • Tests the null hypothesis that data is normally distributed—a significant result (low p-value) indicates non-normality
  • Preferred for small to moderate samples ($n < 50$)—generally considered more powerful than alternatives for detecting departures from normality
  • Sensitive to sample size—very large samples may show significant non-normality even when the departure is trivially small (see the sketch after this list)
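
A sketch on deliberately non-normal toy data; the low p-value should lead you to reject the normality hypothesis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=40)  # deliberately non-normal toy data

stat, p = stats.shapiro(data)
print(f"W = {stat:.3f}, p = {p:.4f}")  # low p-value -> reject normality
```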

Kolmogorov-Smirnov Test

  • Compares sample distribution to a reference distribution—measures the maximum distance between empirical and theoretical cumulative distribution functions
  • More versatile than Shapiro-Wilk—can test against any specified distribution, not just normal; also works for two-sample comparisons
  • Less powerful for normality testing specifically—Shapiro-Wilk is generally preferred when normality is the specific question (see the sketch after this list)
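
A sketch testing toy data against a normal reference distribution; note that estimating the reference parameters from the same data, as done here, makes the standard K-S p-value conservative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(10, 2, size=100)  # toy data

# Pass the fitted mean and SD as the parameters of the "norm" reference;
# estimating them from the data itself makes this p-value conservative
print(stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1))))
```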

Compare: Shapiro-Wilk vs. Kolmogorov-Smirnov—both test normality, but Shapiro-Wilk is more powerful for this specific purpose while K-S is more flexible for testing against other distributions. Choose Shapiro-Wilk for normality checks unless you need to test a different distribution.


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Comparing means (2 groups) | Z-test, T-test (two-sample), Paired T-test |
| Comparing means (3+ groups) | ANOVA (one-way, two-way) |
| Comparing variances | F-test |
| Modeling relationships | Regression analysis (simple, multiple) |
| Categorical data | Chi-square test |
| Non-parametric (2 groups) | Wilcoxon rank-sum test |
| Non-parametric (3+ groups) | Kruskal-Wallis test |
| Testing normality | Shapiro-Wilk test, Kolmogorov-Smirnov test |

Self-Check Questions

  1. You have three treatment groups and want to compare their means, but a Shapiro-Wilk test shows significant non-normality. Which test should you use instead of ANOVA, and why?

  2. Compare and contrast the Z-test and T-test: What assumption distinguishes when you'd use each, and how does sample size factor into this decision?

  3. A researcher wants to determine if gender and voting preference are independent. Which test is appropriate, and what would the null hypothesis state?

  4. Which two tests serve as non-parametric alternatives to the T-test and ANOVA, respectively? What do they have in common in terms of methodology?

  5. An FRQ presents a before-and-after study design measuring the same subjects twice. Which specific type of T-test applies, and why would a two-sample T-test be inappropriate here?