
🤝 Collaborative Data Science

Statistical Tests for Data Analysis

Why This Matters

Statistical tests are the backbone of reproducible data science—they transform raw observations into defensible conclusions. In collaborative environments, your team needs shared language around when to use parametric vs. non-parametric tests, how to compare groups, and what assumptions must hold for results to be valid. Misapplying a test or violating its assumptions can invalidate entire analyses, waste computational resources, and erode trust in your findings.

You're being tested on more than memorizing formulas. Exams and real-world collaborations demand that you choose the right test for the data type and research question, verify assumptions before running analyses, and interpret outputs in context. Don't just memorize that a t-test compares means—know why you'd pick it over a Mann-Whitney U, and what breaks when normality fails.


Comparing Group Means (Parametric)

These tests assume your data follows a normal distribution and compare central tendencies across groups. The underlying principle is that if groups come from populations with identical means, observed differences should fall within predictable sampling variability.

T-Test

  • Compares means between two groups—determines whether observed differences are statistically significant or likely due to chance
  • Three variants serve different designs: independent (separate groups), paired (same subjects measured twice), and one-sample (group vs. known value)
  • Assumes normality and equal variances—violations push you toward non-parametric alternatives like Mann-Whitney U
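
To make this concrete, here is a minimal sketch of an independent two-sample t-test in Python with SciPy. The two groups are simulated and purely illustrative, and the library choice is an assumption, not part of the original material.

```python
# Minimal sketch: independent two-sample t-test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)   # e.g., control group
group_b = rng.normal(loc=5.6, scale=1.0, size=30)   # e.g., treatment group

# Independent t-test (assumes normality and equal variances);
# pass equal_var=False for Welch's t-test if the variances look unequal.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```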

ANOVA (Analysis of Variance)

  • Extends mean comparison to three or more groups—tests whether at least one group mean differs significantly from the others
  • One-way vs. two-way: one-way handles a single factor, two-way examines two factors plus their interaction
  • Uses the F-statistic internally—partitions total variance into between-group and within-group components
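
A minimal one-way ANOVA sketch, again with SciPy and simulated data for three hypothetical treatment groups:

```python
# Minimal sketch: one-way ANOVA across three simulated treatment groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
low = rng.normal(10.0, 2.0, size=25)
medium = rng.normal(11.0, 2.0, size=25)
high = rng.normal(13.0, 2.0, size=25)

# Tests whether at least one group mean differs; a significant F is
# typically followed by pairwise comparisons (e.g., corrected t-tests).
f_stat, p_value = stats.f_oneway(low, medium, high)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```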

F-Test

  • Compares variances between groups—assesses whether variability differs significantly across populations
  • Foundation of ANOVA calculations—the F-statistic is the ratio of between-group variance to within-group variance
  • Critical for model validation—used in regression to test whether predictors collectively explain significant variance
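
One way to see the F-statistic in isolation is a variance-ratio test computed by hand: form the ratio of two sample variances and look it up in the F distribution. The sketch below does exactly that on simulated, roughly normal samples; it is an illustration of the idea, not a prescribed workflow.

```python
# Minimal sketch: variance-ratio F-test computed by hand on simulated,
# roughly normal samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample_1 = rng.normal(0.0, 1.0, size=40)
sample_2 = rng.normal(0.0, 1.5, size=35)

# F = ratio of sample variances (ddof=1 gives the unbiased estimate)
f_stat = np.var(sample_1, ddof=1) / np.var(sample_2, ddof=1)
df1, df2 = len(sample_1) - 1, len(sample_2) - 1

# Two-sided p-value from the F distribution
tail = stats.f.sf(f_stat, df1, df2) if f_stat > 1 else stats.f.cdf(f_stat, df1, df2)
p_value = min(2 * tail, 1.0)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```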

Compare: T-test vs. ANOVA—both compare means assuming normality, but t-tests handle only two groups while ANOVA scales to three or more. If an FRQ asks you to compare multiple treatment conditions, ANOVA is your tool; reserve t-tests for pairwise follow-ups.


Comparing Groups (Non-Parametric)

When normality assumptions fail or you're working with ordinal data, these rank-based tests provide robust alternatives. They convert raw values to ranks, making them resistant to outliers and skewed distributions.

Mann-Whitney U Test

  • Non-parametric alternative to the independent t-test—compares two groups without assuming normal distributions
  • Ranks all observations together—then tests whether one group's ranks tend to be systematically higher or lower
  • Ideal for ordinal data or small samples—maintains validity when parametric assumptions are violated
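
A minimal Mann-Whitney U sketch with SciPy on right-skewed simulated data, the kind of situation where the t-test's normality assumption is shaky:

```python
# Minimal sketch: Mann-Whitney U test on right-skewed simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.exponential(scale=2.0, size=20)   # skewed, small sample
group_b = rng.exponential(scale=3.0, size=22)

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```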

Kruskal-Wallis Test

  • Non-parametric alternative to one-way ANOVA—compares three or more independent groups using ranks
  • Tests whether samples share the same distribution—significant results indicate at least one group differs in central tendency
  • No normality requirement—use when your data is ordinal, heavily skewed, or has unequal variances across groups
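
And a matching Kruskal-Wallis sketch, extending the same rank-based idea to three simulated skewed groups:

```python
# Minimal sketch: Kruskal-Wallis test across three skewed simulated groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
g1 = rng.exponential(scale=2.0, size=20)
g2 = rng.exponential(scale=2.5, size=20)
g3 = rng.exponential(scale=4.0, size=20)

h_stat, p_value = stats.kruskal(g1, g2, g3)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```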

Compare: Mann-Whitney U vs. Kruskal-Wallis—both are rank-based and assumption-light, but Mann-Whitney handles two groups while Kruskal-Wallis extends to three or more. Think of Kruskal-Wallis as "non-parametric ANOVA."


Modeling Relationships (Regression)

Regression methods quantify how predictor variables relate to outcomes, enabling prediction and inference. The core idea is fitting a mathematical function that minimizes the discrepancy between predicted and observed values.

Linear Regression

  • Models a continuous outcome as a linear function of one predictor—estimates slope (β₁) and intercept (β₀) in Y = β₀ + β₁X + ε
  • Four key assumptions: linearity, independence of errors, homoscedasticity (constant variance), and normality of residuals
  • Widely used for forecasting—the slope quantifies how much Y changes per unit increase in X
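
A minimal simple linear regression sketch with scipy.stats.linregress; the data are simulated with a known slope and intercept so the estimates have something to be compared against:

```python
# Minimal sketch: simple linear regression on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, size=50)   # true intercept 2.0, slope 0.8

result = stats.linregress(x, y)
print(f"intercept = {result.intercept:.3f}, slope = {result.slope:.3f}, "
      f"r^2 = {result.rvalue ** 2:.3f}, p = {result.pvalue:.4f}")
```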

Multiple Regression

  • Extends linear regression to multiple predictors—models the outcome as Y = β₀ + β₁X₁ + β₂X₂ + … + ε
  • Controls for confounding variables—isolates each predictor's unique contribution while holding others constant
  • Interpretation grows complex—coefficients represent partial effects, and multicollinearity between predictors can destabilize estimates
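
A minimal multiple regression sketch using statsmodels. The predictors (study_hours and sleep_hours) and the outcome are hypothetical, simulated names chosen only to illustrate partial coefficients; statsmodels itself is an assumed tooling choice.

```python
# Minimal sketch: multiple regression with two simulated predictors
# (hypothetical study_hours and sleep_hours predicting an exam score).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100
study_hours = rng.uniform(0, 10, size=n)
sleep_hours = rng.uniform(4, 9, size=n)
score = 50 + 3.0 * study_hours + 2.0 * sleep_hours + rng.normal(0, 5, size=n)

X = sm.add_constant(np.column_stack([study_hours, sleep_hours]))  # intercept column
model = sm.OLS(score, X).fit()
print(model.params)     # intercept plus partial (per-predictor) coefficients
print(model.summary())  # includes the overall F-test mentioned above
```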

Logistic Regression

  • Models binary outcomes—predicts the probability of an event (e.g., success/failure, yes/no) rather than a continuous value
  • Outputs odds ratios—a one-unit increase in a predictor multiplies the odds of the outcome by e^β
  • Assumes linearity in the logit—the log-odds of the outcome must be linearly related to predictors
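
A minimal logistic regression sketch with statsmodels on a simulated pass/fail outcome; exponentiating the fitted coefficients gives the odds ratios described above. The variable names and the assumed relationship are illustrative only.

```python
# Minimal sketch: logistic regression on a simulated pass/fail outcome;
# exponentiating the coefficients yields odds ratios.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
hours = rng.uniform(0, 10, size=n)
true_log_odds = -3.0 + 0.6 * hours                    # assumed relationship
passed = rng.binomial(1, 1 / (1 + np.exp(-true_log_odds)))

X = sm.add_constant(hours)
model = sm.Logit(passed, X).fit(disp=False)
print(np.exp(model.params))   # odds ratio per one-unit increase in hours
```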

Compare: Linear vs. Logistic Regression—linear regression predicts continuous outcomes; logistic regression predicts probabilities for categorical outcomes. If your dependent variable is binary (pass/fail, churned/retained), logistic is required—linear regression can produce predicted values outside the 0–1 range, which cannot be valid probabilities.


Measuring Association

These methods assess whether and how strongly variables relate to each other, without necessarily implying one causes the other. Association tests help identify patterns worth investigating further.

Correlation Analysis

  • Quantifies strength and direction of linear relationships—Pearson's r ranges from −1 (perfect negative) to +1 (perfect positive)
  • Correlation does not imply causation—two variables can move together due to a shared confounder or pure coincidence
  • Essential for exploratory analysis—quickly identifies which variable pairs warrant deeper investigation
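
A minimal correlation sketch with SciPy on simulated data, computing Pearson's r alongside the rank-based Spearman's rho for comparison:

```python
# Minimal sketch: Pearson's r (and rank-based Spearman's rho) on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(size=60)
y = 0.7 * x + rng.normal(scale=0.5, size=60)

r, p_r = stats.pearsonr(x, y)        # linear association
rho, p_rho = stats.spearmanr(x, y)   # monotonic (rank-based) association
print(f"Pearson r = {r:.3f} (p = {p_r:.4f}); Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
```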

Chi-Square Test

  • Tests association between categorical variables—compares observed cell frequencies to expected frequencies under independence
  • Two main applications: test of independence (contingency tables) and goodness-of-fit (observed vs. theoretical distribution)
  • Requires adequate sample size—expected frequencies below 5 in any cell can invalidate results
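
A minimal chi-square test of independence sketch with SciPy on a hypothetical 2x2 contingency table; the counts are made up for illustration.

```python
# Minimal sketch: chi-square test of independence on a hypothetical 2x2 table
# (rows: treatment vs. control; columns: churned vs. retained).
import numpy as np
from scipy import stats

observed = np.array([[30, 70],
                     [45, 55]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
print("expected counts:\n", expected)   # check that no expected count falls below 5
```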

Compare: Correlation vs. Chi-Square—correlation measures relationships between continuous variables; chi-square tests associations between categorical variables. Choosing between them depends entirely on your data types, not your research question.


Quick Reference Table

Concept | Best Examples
Comparing two group means (parametric) | T-test
Comparing 3+ group means (parametric) | ANOVA, F-test
Comparing groups (non-parametric) | Mann-Whitney U, Kruskal-Wallis
Predicting continuous outcomes | Linear regression, Multiple regression
Predicting binary outcomes | Logistic regression
Measuring continuous variable association | Correlation analysis
Testing categorical variable association | Chi-square test
Assumption-free alternatives | Mann-Whitney U, Kruskal-Wallis

Self-Check Questions

  1. You have three treatment groups and your data is heavily right-skewed. Which test should you use instead of ANOVA, and why?

  2. Compare and contrast linear regression and logistic regression—when would using linear regression on a binary outcome produce misleading results?

  3. A colleague runs a chi-square test on two continuous variables. What's wrong with this approach, and which test should they use instead?

  4. Both the t-test and Mann-Whitney U compare two groups. What specific data conditions would make you choose Mann-Whitney U over a t-test?

  5. If an FRQ asks you to "control for confounding variables" when examining the effect of study hours on exam scores, which regression approach allows this, and how does it isolate each predictor's effect?