
Key Concepts of Statistical Significance Tests


Why This Matters

Statistical significance tests are the backbone of inferential statistics—they're how you move from "here's what I observed in my sample" to "here's what I can conclude about the population." On the AP exam, you're being tested on your ability to choose the right test for a given scenario, interpret results correctly, and understand the assumptions that must hold for each test to be valid. These aren't just formulas to memorize; they represent different tools for different jobs.

The key to mastering this topic is understanding when and why each test applies. Does your data meet normality assumptions? Are you comparing means or examining relationships? Are your samples independent or paired? These questions determine which test you'll use. Don't just memorize test names—know what conditions each test requires and what type of question it answers.


Comparing Means with Known or Large-Sample Conditions

When you need to determine whether a sample mean differs significantly from a population mean—or whether two means differ from each other—and you have either a large sample or known population variance, these parametric tests are your go-to tools. The central limit theorem ensures that sampling distributions approach normality with sufficient sample size, making these tests reliable.

Z-Test

  • Used when population variance (σ²) is known—this is the key distinguishing feature from T-tests and appears frequently in exam scenarios
  • Requires large samples (n > 30) or a normally distributed population; the test statistic follows a standard normal distribution
  • Common in hypothesis testing for population means—if an FRQ gives you σ, you're almost certainly using a Z-test
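
The mechanics are easy to sketch in Python. Here is a minimal one-sample Z-test using scipy.stats—the sample mean, sample size, σ, and hypothesized mean are all made up for illustration:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical scenario: sample mean 52 from n = 40, known sigma = 6,
# testing H0: mu = 50 against a two-sided alternative
x_bar, mu_0, sigma, n = 52.0, 50.0, 6.0, 40

# Z statistic: (x_bar - mu_0) / (sigma / sqrt(n))
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z))
```

Because σ is known, the standard normal distribution (not t) supplies the p-value.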

T-Test (One-Sample, Two-Sample, Paired)

  • One-sample T-test compares a sample mean to a known value; two-sample T-test compares means of two independent groups; paired T-test handles matched or repeated measurements
  • Used when population variance is unknown—you estimate it using the sample standard deviation s, which adds uncertainty reflected in the t-distribution's wider tails
  • Appropriate for smaller samples (n < 30) when normality holds; degrees of freedom affect the critical values
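
For comparison with the Z-test, here is a one-sample T-test sketch on hypothetical data—scipy estimates s from the sample and uses n − 1 degrees of freedom automatically:

```python
from scipy.stats import ttest_1samp

# Hypothetical sample of 10 measurements; H0: population mean = 50
sample = [51, 49, 52, 53, 50, 48, 54, 52, 51, 50]

# Sigma is unknown, so the test estimates it from the sample (df = 9)
t_stat, p_value = ttest_1samp(sample, popmean=50)
```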

Compare: Z-test vs. T-test—both test hypotheses about means, but Z-tests require known σ while T-tests estimate variance from the sample. If an FRQ doesn't give you the population standard deviation, default to the T-test.


Comparing Means Across Multiple Groups

When you have three or more groups to compare, running multiple T-tests inflates your Type I error rate. ANOVA solves this by testing all groups simultaneously. The F-statistic compares variation between groups to variation within groups—if the between-group variation is large relative to within-group variation, at least one mean differs.

ANOVA (One-Way, Two-Way)

  • One-way ANOVA tests for differences among three or more groups based on a single factor; two-way ANOVA examines two factors and their potential interaction effect
  • Assumes normality, equal variances (homoscedasticity), and independence—check these conditions before applying
  • Determines if at least one group differs but doesn't tell you which one—post-hoc tests (like Tukey's) identify specific differences
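
A one-way ANOVA is a one-liner with scipy.stats; the three groups of scores below are hypothetical:

```python
from scipy.stats import f_oneway

# Hypothetical scores under three teaching methods
method_a = [85, 88, 90, 86, 87]
method_b = [78, 80, 82, 79, 81]
method_c = [90, 92, 94, 91, 93]

# F = (between-group variation) / (within-group variation);
# a large F with a small p-value means at least one mean differs
f_stat, p_value = f_oneway(method_a, method_b, method_c)
```

Remember that a significant result here says only that *some* mean differs—a post-hoc test is needed to say which.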

F-Test

  • Compares two variances to test if they're significantly different—the ratio of variances follows an F-distribution
  • Foundation of ANOVA and regression significance testing; you're essentially asking whether the model explains more variance than expected by chance
  • Assumes normality and independent samples—sensitive to violations of normality, especially with small samples
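
A bare-bones variance-ratio F-test can be built directly from the F distribution; the two samples below are hypothetical:

```python
from statistics import variance
from scipy.stats import f

# Hypothetical samples; H0: the two population variances are equal
sample_1 = [12, 15, 14, 10, 13, 16, 11, 14]
sample_2 = [22, 30, 18, 26, 34, 20, 28, 24]

s1_sq, s2_sq = variance(sample_1), variance(sample_2)

# Convention: put the larger sample variance on top so F >= 1
f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
df1 = df2 = len(sample_1) - 1

# Two-sided p-value: double the upper-tail area of the F distribution
p_value = 2 * f.sf(f_stat, df1, df2)
```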

Compare: T-test vs. ANOVA—T-tests compare two means, while ANOVA handles three or more groups. Using multiple T-tests instead of ANOVA increases your chance of a Type I error (false positive).


Testing Relationships Between Variables

Not all research questions are about comparing means. Sometimes you need to assess whether variables are associated or whether one variable predicts another. These tests examine relationships rather than differences.

Pearson Correlation Test

  • Measures strength and direction of linear relationships between two continuous variables; produces r ranging from −1 to +1
  • Assumes bivariate normality and linearity—always check a scatterplot first; correlation does not imply causation
  • r near ±1 indicates strong association; r near 0 suggests weak or no linear relationship—but nonlinear patterns may still exist
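
Computing r and its significance test takes one call; the study-hours and exam-score numbers here are invented:

```python
from scipy.stats import pearsonr

# Hypothetical study-hours vs. exam-score data
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [55, 60, 58, 65, 70, 68, 75, 80]

# r measures the linear association; the p-value tests H0: no correlation
r, p_value = pearsonr(hours, scores)
```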

Regression Analysis

  • Models the relationship between a dependent variable and one or more independent variables; simple regression uses one predictor, multiple regression uses several
  • Enables prediction and quantifies how much the dependent variable changes per unit change in the predictor (slope interpretation)
  • Assumes linearity, independence, homoscedasticity, and normal residuals—remember the acronym LINE for checking conditions
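
Fitting a simple regression line and using it for prediction looks like this with scipy.stats (same kind of hypothetical hours/scores data as the correlation example):

```python
from scipy.stats import linregress

# Hypothetical data; simple linear regression of scores on hours
hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [55, 60, 58, 65, 70, 68, 75, 80]

result = linregress(hours, scores)

# Slope interpretation: predicted change in score per additional hour;
# the fitted line then gives a prediction at any x
predicted_at_5 = result.intercept + result.slope * 5
```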

Compare: Correlation vs. Regression—correlation measures association strength, while regression provides a predictive equation. Correlation is symmetric (r_xy = r_yx), but regression has a clear dependent/independent variable distinction.


Analyzing Categorical Data

When your data consists of categories rather than continuous measurements, you need tests designed for frequency counts. These tests compare observed frequencies to what you'd expect under the null hypothesis.

Chi-Square Test

  • Tests association between categorical variables (test of independence) or whether observed frequencies match expected proportions (goodness of fit)
  • Compares observed vs. expected frequencies—large discrepancies yield large χ² values and small p-values
  • Requires adequate expected cell counts (typically ≥ 5)—small expected frequencies make the test unreliable
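
For a test of independence, scipy computes the expected counts and the χ² statistic from the observed table; the 2×2 counts below are hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: treatment group (rows) x outcome (cols)
observed = [[30, 10],
            [20, 40]]

# Test of independence: compares observed counts to the counts expected
# if the row and column variables were unrelated
chi2, p_value, df, expected = chi2_contingency(observed)
```

Checking the `expected` array is also how you verify the ≥ 5 cell-count condition.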

Compare: Chi-square test of independence vs. goodness of fit—independence tests examine relationships between two categorical variables in a contingency table, while goodness of fit tests whether one variable's distribution matches a hypothesized distribution.


Non-Parametric Alternatives

When your data violates normality assumptions or consists of ordinal (ranked) data, non-parametric tests provide valid alternatives. These tests make fewer assumptions about the underlying distribution and work with ranks rather than raw values.

Mann-Whitney U Test

  • Non-parametric alternative to the two-sample T-test—compares two independent groups without assuming normality
  • Tests whether one group tends to have larger values by analyzing rank sums; works with ordinal data or skewed distributions
  • Useful for small samples or when data clearly violates T-test assumptions—always a safe fallback
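
In Python this is a single rank-based call; the reaction-time numbers are hypothetical:

```python
from scipy.stats import mannwhitneyu

# Hypothetical skewed reaction times (seconds) for two independent groups
group_a = [1.2, 1.5, 1.1, 1.4, 1.3, 1.9]
group_b = [2.1, 2.5, 2.2, 2.8, 2.4, 2.6]

# Rank-based comparison of two independent groups; no normality assumed
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative='two-sided')
```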

Wilcoxon Signed-Rank Test

  • Non-parametric alternative to the paired T-test—compares two related samples or before/after measurements
  • Tests for differences in medians rather than means; uses signed ranks of differences
  • Appropriate for ordinal data or non-normal differences—choose this when paired T-test conditions aren't met
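
A before/after sketch with hypothetical blood-pressure readings—note the test works on the signed ranks of the paired differences:

```python
from scipy.stats import wilcoxon

# Hypothetical before/after systolic blood pressure for 8 patients
before = [150, 142, 160, 155, 148, 162, 158, 151]
after  = [144, 140, 150, 148, 146, 152, 149, 147]

# Signed-rank test on the paired differences; normality not required
w_stat, p_value = wilcoxon(before, after)
```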

Kruskal-Wallis Test

  • Non-parametric alternative to one-way ANOVA—compares three or more independent groups
  • Tests whether samples come from the same distribution by comparing mean ranks across groups
  • No normality assumption required—use when ANOVA conditions are violated or data is ordinal
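
And the three-group rank-based version, again with invented ratings:

```python
from scipy.stats import kruskal

# Hypothetical satisfaction ratings for three independent groups
group_a = [3, 4, 2, 5, 4]
group_b = [6, 7, 8, 6, 7]
group_c = [10, 9, 11, 10, 12]

# Compares mean ranks across the groups; no normality assumption
h_stat, p_value = kruskal(group_a, group_b, group_c)
```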

Compare: Parametric vs. Non-parametric tests—parametric tests (T-test, ANOVA) are more powerful when assumptions are met, but non-parametric tests (Mann-Whitney, Kruskal-Wallis) are more robust when they're not. Know both versions for the exam.


Quick Reference Table

Concept | Best Examples
Comparing one sample mean to a value | Z-test (known σ), One-sample T-test (unknown σ)
Comparing two independent means | Two-sample T-test, Mann-Whitney U test
Comparing paired/matched data | Paired T-test, Wilcoxon signed-rank test
Comparing three+ group means | One-way ANOVA, Kruskal-Wallis test
Examining two factors simultaneously | Two-way ANOVA
Testing categorical associations | Chi-square test
Measuring linear relationships | Pearson correlation, Regression analysis
Comparing variances | F-test

Self-Check Questions

  1. You're comparing test scores between three teaching methods, and your data is normally distributed with equal variances. Which test should you use, and what would you use instead if normality was violated?

  2. A researcher wants to know if a new drug affects blood pressure by measuring the same patients before and after treatment. The difference scores are heavily skewed. Which test is most appropriate—paired T-test or Wilcoxon signed-rank test—and why?

  3. Compare and contrast the Chi-square goodness of fit test and the Chi-square test of independence. What type of research question does each address?

  4. An FRQ presents a scenario where you're testing whether the average height of students at a school differs from the national average of 67 inches. The population standard deviation is unknown, and you have a sample of 25 students. Which test applies, and what conditions must you verify?

  5. What distinguishes correlation analysis from regression analysis? If you found r = 0.85 between study hours and exam scores, what can you conclude—and what can you not conclude?