
📊Honors Statistics

Types of Statistical Tests


Why This Matters

Choosing the right statistical test is one of the most critical skills you'll develop in Honors Statistics—and it's exactly what exam questions target. You're not just being tested on whether you can calculate a test statistic; you're being tested on whether you understand when to use each test and why that test fits the data. The key principles at play include parametric vs. non-parametric assumptions, number of groups being compared, type of variable (categorical vs. continuous), and independence of samples.

Every test in this guide exists to answer a specific type of question: Is there a difference? Is there a relationship? Is there an association? Understanding the underlying logic—what assumptions must hold, what kind of data you need, what your null hypothesis actually claims—will help you navigate both multiple-choice questions and FRQs with confidence. Don't just memorize test names—know what question each test answers and what conditions must be met to use it.


Comparing Means: Parametric Tests

When you need to determine whether group means differ significantly, parametric tests are your go-to tools—provided your data meets assumptions of normality and (often) equal variances. These tests form the foundation of inferential statistics.

t-Test (One-Sample, Two-Sample, Paired)

  • Compares means when population standard deviation is unknown—the one-sample version tests against a known value, two-sample compares independent groups, and paired compares the same subjects at different times
  • Designed for small samples (n < 30) and assumes an approximately normal distribution; the two-sample version also requires equal variances unless using Welch's correction
  • Produces a t-statistic that measures how many standard errors the sample mean falls from the hypothesized value—larger absolute values indicate stronger evidence against H₀ (see the sketch below)
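
If you want to check hand calculations, here is a minimal sketch of all three t-test variants using Python's scipy.stats. The library choice and the data values are illustrative assumptions, not part of the original guide.

    from scipy import stats

    scores  = [72, 75, 68, 80, 77, 74, 69, 73]   # one sample of exam scores (made-up data)
    before  = [12, 15, 11, 14, 13, 16, 12, 15]   # paired "before" measurements
    after   = [14, 17, 12, 15, 15, 18, 13, 16]   # paired "after" measurements
    group_a = [5.1, 4.8, 5.5, 5.0, 4.9]
    group_b = [5.6, 5.9, 5.4, 6.0, 5.7]

    # One-sample: does the mean score differ from a hypothesized value of 70?
    t1, p1 = stats.ttest_1samp(scores, popmean=70)

    # Two-sample (independent groups); equal_var=False applies Welch's correction
    t2, p2 = stats.ttest_ind(group_a, group_b, equal_var=False)

    # Paired: the same subjects measured twice
    t3, p3 = stats.ttest_rel(before, after)

    print(f"one-sample t = {t1:.2f}, p = {p1:.3f}")
    print(f"two-sample t = {t2:.2f}, p = {p2:.3f}")
    print(f"paired     t = {t3:.2f}, p = {p3:.3f}")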

Z-Test

  • Requires known population standard deviation or large samples (n ≥ 30)—this is the key distinction from the t-test
  • Assumes normal distribution and can be one-sample (comparing to population mean) or two-sample (comparing two group means)
  • Produces a z-score using the standard normal distribution; commonly tested in the context of proportion tests, where σ can be calculated from p (worked example below)
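
One-proportion z-tests are usually computed by hand on the exam; the sketch below mirrors that calculation in Python, using scipy only for the standard normal distribution. The counts are made up for illustration.

    import math
    from scipy.stats import norm

    # H0: p = 0.50; we observe 58 successes out of n = 100 (made-up data)
    p0, n, successes = 0.50, 100, 58
    p_hat = successes / n

    # The standard error under H0 uses p0, so sigma is known from the null hypothesis
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se

    # Two-sided p-value from the standard normal distribution
    p_value = 2 * norm.sf(abs(z))
    print(f"z = {z:.2f}, p = {p_value:.3f}")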

Compare: t-test vs. Z-test—both compare means, but the t-test is for unknown σ and small samples while the Z-test requires known σ or n ≥ 30. If an FRQ gives you a small sample and sample standard deviation, you need the t-test.

ANOVA (One-Way, Two-Way)

  • Extends mean comparison to three or more groups—avoids inflated Type I error from running multiple t-tests
  • One-way ANOVA tests one factor's effect; two-way ANOVA examines two factors plus their interaction effect
  • Produces an F-statistic (ratio of between-group variance to within-group variance); a significant result means at least one group differs, requiring post-hoc tests to identify which (quick sketch below)
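
A quick one-way ANOVA sketch with scipy.stats, assuming made-up scores for three teaching methods:

    from scipy import stats

    # Test scores under three teaching methods (made-up data)
    method_a = [78, 82, 85, 80, 79]
    method_b = [88, 84, 90, 86, 87]
    method_c = [75, 73, 78, 74, 77]

    # One-way ANOVA: F = between-group variance / within-group variance
    f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
    # A significant p only says "at least one mean differs"; a post-hoc
    # procedure (e.g., Tukey HSD) is needed to identify which groups differ.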

Comparing Means: Non-Parametric Alternatives

When your data violates normality assumptions or consists of ordinal measurements, non-parametric tests use ranks instead of raw values. They're more flexible but generally less powerful than their parametric counterparts.

Mann-Whitney U Test

  • Non-parametric alternative to the two-sample t-test—compares two independent groups when normality cannot be assumed
  • Works by ranking all observations from both groups combined, then comparing the sum of ranks between groups
  • Produces a U statistic; ideal for ordinal data, skewed distributions, or small samples where t-test assumptions fail (example below)
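
A minimal Mann-Whitney U sketch with scipy.stats, assuming two small, skewed samples of made-up data:

    from scipy import stats

    # Skewed reaction times (seconds) for two independent groups (made-up data)
    group_1 = [1.2, 0.9, 3.5, 1.1, 0.8, 2.7]
    group_2 = [2.4, 3.1, 4.8, 2.9, 5.2, 3.6]

    # Mann-Whitney U compares the groups through ranks, not raw means
    u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
    print(f"U = {u_stat:.1f}, p = {p_value:.3f}")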

Wilcoxon Signed-Rank Test

  • Non-parametric alternative to the paired t-test—compares two related samples or repeated measurements
  • Ranks the absolute differences between paired observations while preserving the direction (positive or negative) of each difference
  • Produces a W statistic; use when paired differences are non-normal or data is ordinal (sketch below)
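
The paired counterpart in code: a Wilcoxon signed-rank sketch with scipy.stats on made-up before/after ratings.

    from scipy import stats

    # Pain ratings (ordinal, 1-10) before and after treatment for the same patients (made-up data)
    before = [7, 6, 8, 5, 9, 7, 6, 8]
    after  = [5, 4, 7, 4, 6, 5, 5, 6]

    # Wilcoxon signed-rank works on the signed ranks of the paired differences
    w_stat, p_value = stats.wilcoxon(before, after)
    print(f"W = {w_stat:.1f}, p = {p_value:.3f}")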

Compare: Mann-Whitney U vs. Wilcoxon signed-rank—both are non-parametric, but Mann-Whitney handles independent groups while Wilcoxon handles paired/related samples. Match the test to your study design.

Kruskal-Wallis Test

  • Non-parametric alternative to one-way ANOVA—compares three or more independent groups without assuming normality
  • Ranks all data points across groups and compares rank sums; essentially an extension of Mann-Whitney to multiple groups
  • Produces an H statistic; like ANOVA, a significant result indicates at least one group differs but doesn't specify which (illustrated below)
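
A Kruskal-Wallis sketch with scipy.stats, assuming made-up ordinal ratings from three independent groups:

    from scipy import stats

    # Customer satisfaction ratings (ordinal, 1-5) at three store locations (made-up data)
    store_a = [3, 4, 2, 5, 3, 4]
    store_b = [5, 4, 5, 4, 5, 3]
    store_c = [2, 1, 3, 2, 2, 3]

    # Kruskal-Wallis ranks all observations together and compares rank sums
    h_stat, p_value = stats.kruskal(store_a, store_b, store_c)
    print(f"H = {h_stat:.2f}, p = {p_value:.4f}")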

Testing Relationships Between Variables

These tests examine how variables relate to each other—whether one predicts another or whether they move together. The distinction between correlation and regression is heavily tested.

Correlation Analysis

  • Measures strength and direction of linear relationships between two continuous variables using the correlation coefficient r, which ranges from -1 to +1 (computed in the sketch below)
  • Does not imply causation—this is perhaps the most frequently tested concept; correlation only indicates association
  • Assumes linearity and bivariate normality; always check a scatterplot first to verify the relationship is actually linear
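
A correlation sketch with scipy.stats on made-up hours-studied vs. exam-score data:

    from scipy import stats

    # Hours studied vs. exam score (made-up data)
    hours  = [1, 2, 3, 4, 5, 6, 7, 8]
    scores = [52, 58, 61, 64, 70, 73, 75, 82]

    # Pearson r: strength and direction of the linear relationship (-1 to +1)
    r, p_value = stats.pearsonr(hours, scores)
    print(f"r = {r:.3f}, p = {p_value:.4f}")
    # r only describes a linear pattern and says nothing about causation;
    # inspect a scatterplot before trusting it.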

Regression Analysis (Simple Linear, Multiple)

  • Models a predictive relationship where one or more independent variables (x) predict a dependent variable (y)
  • Simple linear regression uses one predictor (ŷ = a + bx); multiple regression uses two or more predictors (see the code that follows)
  • Requires four key assumptions: linearity, independence of residuals, homoscedasticity (constant variance), and normality of residuals—memorize these as LINE
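
The same made-up data run through simple linear regression with scipy.stats.linregress, to show how its outputs differ from correlation:

    from scipy import stats

    # Predict exam score (y) from hours studied (x); same made-up data as above
    hours  = [1, 2, 3, 4, 5, 6, 7, 8]
    scores = [52, 58, 61, 64, 70, 73, 75, 82]

    # Simple linear regression: y-hat = a + b*x
    result = stats.linregress(hours, scores)
    print(f"intercept a = {result.intercept:.2f}")
    print(f"slope     b = {result.slope:.2f}")      # predicted change in y per one-unit change in x
    print(f"r-squared   = {result.rvalue**2:.3f}")  # share of variation in y explained by x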

Compare: Correlation vs. Regression—correlation measures association strength (symmetric between variables), while regression establishes a predictive equation (asymmetric: x predicts y). FRQs often ask you to interpret slope and r², not just r.


Testing Categorical Associations

When both variables are categorical, you need tests designed for frequency data rather than means.

Chi-Square Test

  • Tests association between categorical variables using a contingency table that compares observed frequencies to expected frequencies under independence
  • Requires a minimum expected frequency of 5 in each cell—this assumption is commonly tested; violating it makes the chi-square approximation unreliable
  • Two main applications: goodness-of-fit (one variable, testing against an expected distribution) and test of independence (two variables, testing for association)—both shown in the sketch below
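
A chi-square sketch with scipy.stats covering both applications, using made-up counts:

    from scipy import stats

    # Test of independence: study method (rows) vs. pass/fail (columns), made-up counts
    observed = [[30, 10],
                [18, 22]]
    chi2, p_value, dof, expected = stats.chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
    print("expected counts:\n", expected)   # verify every expected count is at least 5

    # Goodness-of-fit: are 100 observations spread evenly across 5 categories?
    gof_stat, gof_p = stats.chisquare(f_obs=[18, 22, 20, 25, 15],
                                      f_exp=[20, 20, 20, 20, 20])
    print(f"goodness-of-fit chi-square = {gof_stat:.2f}, p = {gof_p:.4f}")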

Testing Variance Equality

Before running certain parametric tests, you may need to verify that group variances are equal—this is where the F-test comes in.

F-Test

  • Compares variances between groups to determine if they differ significantly—calculated as the ratio of two sample variances
  • Foundational to ANOVA—the F-statistic in ANOVA is essentially testing whether between-group variance exceeds within-group variance
  • Assumes normality and independence; used to validate the equal-variance assumption required by two-sample t-tests and ANOVA (worked through below)
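
A hand-rolled F-test for equal variances; scipy is used only for the F distribution, and the two samples are made up:

    import numpy as np
    from scipy import stats

    # Two samples whose variances we want to compare (made-up data)
    sample_1 = [4.1, 4.5, 3.9, 4.8, 4.2, 4.6, 4.0, 4.4]
    sample_2 = [3.2, 5.1, 2.8, 5.9, 3.5, 5.4, 2.9, 5.7]

    var_1 = np.var(sample_1, ddof=1)   # sample variances (ddof=1)
    var_2 = np.var(sample_2, ddof=1)

    # By convention the larger variance goes in the numerator; track its degrees of freedom
    if var_1 >= var_2:
        f_stat, dfn, dfd = var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
    else:
        f_stat, dfn, dfd = var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1

    # Two-sided p-value: double the upper-tail area of the F distribution
    p_value = 2 * stats.f.sf(f_stat, dfn, dfd)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")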

Compare: F-test vs. ANOVA F-statistic—both produce F-values, but the standalone F-test compares variances directly, while ANOVA's F-statistic tests whether means differ by comparing variance components. Know which question each answers.


Quick Reference Table

Concept | Best Examples
Comparing two means (parametric) | t-test, Z-test
Comparing three+ means (parametric) | One-way ANOVA, Two-way ANOVA
Comparing two means (non-parametric) | Mann-Whitney U, Wilcoxon signed-rank
Comparing three+ means (non-parametric) | Kruskal-Wallis
Testing relationships (continuous) | Correlation, Regression
Testing categorical association | Chi-square test
Comparing variances | F-test
Paired/related samples | Paired t-test, Wilcoxon signed-rank

Self-Check Questions

  1. You have two independent groups with small sample sizes and non-normal distributions. Which test should you use, and why can't you use a two-sample t-test?

  2. Compare and contrast correlation analysis and simple linear regression—what question does each answer, and how do their outputs differ?

  3. A researcher wants to compare mean test scores across four teaching methods. Which parametric test is appropriate, and what would happen to Type I error if they ran multiple t-tests instead?

  4. Which two tests serve as non-parametric alternatives to the paired t-test and two-sample t-test, respectively? What do they have in common?

  5. An FRQ presents a contingency table with some expected cell frequencies below 5. Why is this problematic for a chi-square test, and what might you conclude about the analysis?