๐ŸซIntro to Biostatistics

Common Statistical Tests

Why This Matters

Choosing the right statistical test is one of the most critical decisions in biostatistics, and it's exactly what exams love to test. You're not just being asked to memorize formulas; you're being evaluated on whether you understand when to use each test based on your data type, sample design, and research question. The tests in this guide fall into predictable patterns: comparing means vs. analyzing relationships, parametric vs. non-parametric approaches, and independent vs. paired designs.

Think of statistical tests as tools in a toolkit, each designed for a specific job. A t-test won't help you with categorical data any more than a hammer helps with screws. Master the decision logic behind test selection: What type of variable do I have? How many groups am I comparing? Does my data meet parametric assumptions? Don't just memorize that a Chi-square test exists. Know that it's your go-to when both variables are categorical and you're testing independence.


Comparing Means Between Groups

When your research question asks "Is there a difference between groups?" and your outcome is continuous, you need a test that compares means. The choice depends on how many groups you're comparing and whether your data meets normality assumptions.

t-test (Independent and Paired)

The t-test compares means of exactly two groups to determine if the difference is statistically significant or just due to chance.

  • Independent t-test: groups are separate (e.g., treatment vs. control, two different populations)
  • Paired t-test: measurements are linked (e.g., before vs. after in the same subjects, or matched pairs)
  • Assumes the data are approximately normally distributed and that variances are roughly equal across groups. When these assumptions are violated, switch to a non-parametric alternative: the Mann-Whitney U test for independent groups, or the Wilcoxon signed-rank test for paired data.
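
The decision between the two variants comes down to how the data were collected. A minimal sketch of both calls using SciPy (the measurements are made-up illustration values):

```python
from scipy import stats

# Independent t-test: two separate groups (e.g., treatment vs. control)
treatment = [5.1, 4.8, 6.2, 5.5, 5.9, 6.1]
control = [4.2, 4.5, 4.1, 4.8, 4.3, 4.6]
t_ind, p_ind = stats.ttest_ind(treatment, control)

# Paired t-test: before/after measurements on the same subjects
before = [120, 132, 128, 140, 125]
after = [115, 128, 120, 135, 121]
t_pair, p_pair = stats.ttest_rel(before, after)

print(f"independent: t={t_ind:.2f}, p={p_ind:.4f}")
print(f"paired: t={t_pair:.2f}, p={p_pair:.4f}")
```

Note that `ttest_rel` would raise an error if the two lists had different lengths, which is a quick sanity check that the design really is paired.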

ANOVA (One-Way and Two-Way)

ANOVA extends mean comparison to three or more groups. Running multiple t-tests instead would inflate your Type I error rate (the chance of a false positive), so ANOVA handles all groups in a single test.

  • One-way ANOVA tests one independent variable (e.g., comparing blood pressure across four drug dosages)
  • Two-way ANOVA examines two factors simultaneously and can detect an interaction effect, which tells you whether the impact of one factor depends on the level of the other
  • The F-statistic tells you whether any group differs from the others, but it doesn't tell you which groups differ. For that, you need post-hoc tests like Tukey's HSD.
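
A one-way ANOVA on the blood-pressure example above might look like this in SciPy (the four dosage groups and their readings are hypothetical):

```python
from scipy import stats

# One-way ANOVA: blood pressure under four hypothetical dosages
dose_a = [118, 122, 120, 119]
dose_b = [125, 128, 124, 127]
dose_c = [130, 133, 129, 131]
dose_d = [121, 120, 123, 122]
f_stat, p_value = stats.f_oneway(dose_a, dose_b, dose_c, dose_d)
print(f"F={f_stat:.2f}, p={p_value:.4f}")
# A significant F only says *some* group differs; a post-hoc procedure
# such as Tukey's HSD (scipy.stats.tukey_hsd) identifies which pairs.
```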

F-test

The F-test compares variances between groups rather than means. It determines whether the spread of data differs significantly across populations.

  • It underlies ANOVA calculations and is also used to check the homogeneity of variances assumption required by parametric tests
  • The F-test is sensitive to departures from normality, so violations of that assumption can lead to incorrect conclusions about variance equality
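
As a sketch, the variance-ratio form of the F-test can be computed by hand from the F distribution (SciPy supplies the distribution; the data are made-up illustration values, and by convention the larger variance goes in the numerator):

```python
import statistics
from scipy.stats import f

group1 = [4.1, 5.2, 6.3, 3.8, 5.9, 4.7]
group2 = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8]
var1 = statistics.variance(group1)  # sample variance (n-1 denominator)
var2 = statistics.variance(group2)
f_stat = max(var1, var2) / min(var1, var2)
df1 = df2 = len(group1) - 1
# Two-sided p-value from the F distribution's survival function
p_value = 2 * f.sf(f_stat, df1, df2)
print(f"F={f_stat:.2f}, p={p_value:.4f}")
```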

Compare: t-test vs. ANOVA: both compare means of continuous outcomes, but t-tests handle exactly two groups while ANOVA handles three or more. If an exam asks which test to use for comparing blood pressure across four treatment arms, ANOVA is your answer.


Non-Parametric Alternatives

When your data violate normality assumptions, are measured on an ordinal scale, or come from small samples, non-parametric tests provide robust alternatives. These tests work with ranks rather than raw values, making them distribution-free.

Each non-parametric test mirrors a specific parametric test. Learning these pairs is one of the highest-yield things you can do for an exam.

Mann-Whitney U Test

This is the non-parametric alternative to the independent t-test. It compares two unrelated groups when data are ordinal or non-normally distributed.

  • It works by ranking all observations across both groups, then comparing the sum of ranks. The output is a U statistic and a p-value.
  • Ideal for small samples or when you can't justify normality assumptions. Common in pilot studies and early-phase clinical research.
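
A quick sketch with SciPy, using hypothetical ordinal pain scores from two small, independent groups:

```python
from scipy.stats import mannwhitneyu

# Two small, independent groups of ordinal pain scores (hypothetical)
group_a = [2, 3, 3, 4, 5, 2]
group_b = [4, 5, 6, 5, 7, 6]
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U={u_stat}, p={p_value:.4f}")
```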

Wilcoxon Signed-Rank Test

This is the non-parametric alternative to the paired t-test. Use it for related samples when the difference scores aren't normally distributed.

  • It ranks the absolute differences between pairs while preserving the sign (positive or negative) of each difference
  • The W statistic reflects whether positive or negative ranks dominate, making it useful for before-after designs with non-normal data
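
For a before-after design with the same subjects, the call mirrors the paired t-test (the scores below are fabricated for illustration):

```python
from scipy.stats import wilcoxon

# Before/after scores for the same seven subjects (hypothetical, non-normal)
before = [10, 14, 9, 16, 12, 11, 15]
after = [8, 13, 8, 12, 10, 9, 14]
w_stat, p_value = wilcoxon(before, after)
print(f"W={w_stat}, p={p_value:.4f}")
```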

Kruskal-Wallis Test

This is the non-parametric alternative to one-way ANOVA. It compares three or more independent groups without requiring normality.

  • All data points across groups are ranked together, producing an H statistic that assesses whether rank distributions differ
  • Just like ANOVA, a significant result only tells you something differs. You still need follow-up pairwise comparisons to identify which specific groups differ.
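
With three or more independent groups, the call is a direct drop-in for one-way ANOVA (made-up ordinal data):

```python
from scipy.stats import kruskal

# Three independent groups of ordinal scores (hypothetical)
g1 = [1, 2, 2, 3]
g2 = [3, 4, 4, 5]
g3 = [5, 6, 6, 7]
h_stat, p_value = kruskal(g1, g2, g3)
print(f"H={h_stat:.2f}, p={p_value:.4f}")
# As with ANOVA, a significant H still requires pairwise follow-up tests.
```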

Compare: Mann-Whitney U vs. Wilcoxon signed-rank: both are non-parametric and rank-based, but Mann-Whitney handles independent groups while Wilcoxon handles paired/related samples. This mirrors the independent vs. paired t-test distinction. If the question mentions "matched pairs" or "same subjects measured twice" with non-normal data, Wilcoxon is correct.


Analyzing Relationships Between Variables

These tests ask "How are variables related?" rather than "Are groups different?" The choice depends on whether you're measuring association, predicting outcomes, or modeling probabilities.

Correlation Analysis

Correlation quantifies the strength and direction of a linear relationship between two continuous variables. Pearson's r ranges from −1 (perfect negative) to +1 (perfect positive), with 0 meaning no linear relationship.

  • Correlation does not imply causation. This is perhaps the most tested concept in introductory biostatistics.
  • Pearson's r assumes linearity and homoscedasticity (constant spread of data around the trend line). Always visualize with a scatter plot before interpreting r, because a strong r value can be misleading if the relationship is actually curved.
  • For ordinal data or non-linear relationships, Spearman's rank correlation (rₛ) is the non-parametric alternative.
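
Both coefficients are one function call away in SciPy; here with made-up, roughly linear data:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired measurements with an approximately linear trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]
r, p_r = pearsonr(x, y)
rho, p_rho = spearmanr(x, y)
print(f"Pearson r={r:.3f} (p={p_r:.4f}), Spearman rho={rho:.3f}")
```

Because y is strictly increasing with x, Spearman's rho is exactly 1 here even though the points don't fall perfectly on a line, which is a handy reminder that the two coefficients measure different things (monotonicity vs. linearity).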

Linear Regression

Linear regression models a continuous outcome as a function of one or more predictors. It goes beyond correlation by letting you predict values and quantify the effect of each predictor.

  • Regression coefficients tell you the expected change in Y for a one-unit increase in X
  • R² measures the proportion of variance in the outcome explained by the model. An R² of 0.45 means the predictors explain 45% of the variability in Y.
  • Key assumptions: linear relationship, independence of errors (residuals), and homoscedasticity. Check residual plots to verify these before trusting your results.
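
A minimal sketch with scipy.stats.linregress, using fabricated data that roughly follow y = 2x:

```python
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]  # roughly y = 2x, made up
result = linregress(x, y)
print(f"slope={result.slope:.2f}, intercept={result.intercept:.2f}")
print(f"R^2={result.rvalue**2:.3f}")  # rvalue is Pearson's r
```

The slope here estimates the expected change in y per one-unit increase in x, and squaring the returned r gives the R² discussed above.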

Logistic Regression

Logistic regression models binary outcomes (yes/no, disease/no disease). It's essential for predicting probabilities in clinical and epidemiological research.

  • Instead of coefficients that predict a continuous value, logistic regression produces odds ratios (OR). An OR > 1 means increased odds of the outcome; an OR < 1 means decreased odds. For example, an OR of 2.3 for a drug exposure means the exposed group has 2.3 times the odds of the outcome compared to the unexposed group.
  • There's no normality assumption for the outcome variable. Model fit is assessed using the likelihood ratio test and pseudo-R² values rather than a standard R².
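
The odds-ratio interpretation can be checked by hand on a hypothetical 2×2 exposure-outcome table; with a single binary predictor, exp(coefficient) from a logistic regression equals exactly this ratio:

```python
# Hypothetical 2x2 counts: exposure (rows) by outcome (yes/no)
exposed_yes, exposed_no = 30, 70
unexposed_yes, unexposed_no = 15, 85

odds_exposed = exposed_yes / exposed_no        # 30/70
odds_unexposed = unexposed_yes / unexposed_no  # 15/85
odds_ratio = odds_exposed / odds_unexposed
print(f"OR = {odds_ratio:.2f}")  # → OR = 2.43
# Interpretation: the exposed group has about 2.4 times the odds
# of the outcome compared to the unexposed group.
```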

Compare: Linear vs. logistic regression: both model relationships between predictors and outcomes, but linear regression requires a continuous dependent variable while logistic regression handles binary outcomes. If the outcome is "survived vs. died" or "positive vs. negative test," logistic regression is required.


Categorical Data Analysis

When both your variables are categorical (nominal or ordinal), you need tests designed for frequency data rather than means.

Chi-Square Test

The Chi-square test assesses whether there's an association between two categorical variables by comparing observed frequencies to the frequencies you'd expect if the variables were independent.

  • Goodness-of-fit version: tests whether observed proportions match a theoretical distribution (e.g., does the distribution of blood types in your sample match the known population distribution?)
  • Test of independence: examines relationships in a contingency table (e.g., is smoking status associated with disease status?)
  • Produces a χ² statistic and p-value. One important requirement: expected cell frequencies must be ≥ 5. If any expected cell count falls below 5, use Fisher's exact test instead.
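
In SciPy, the test of independence and its small-sample fallback sit side by side (the contingency table below is hypothetical):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 contingency table: smoking status vs. disease status
table = [[30, 70],   # smokers: diseased, healthy
         [15, 85]]   # non-smokers: diseased, healthy
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
# Check the expected counts the test computed; if any were below 5,
# switch to Fisher's exact test:
odds_ratio, p_exact = fisher_exact(table)
print(f"Fisher's exact: OR={odds_ratio:.2f}, p={p_exact:.4f}")
```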

Compare: Chi-square vs. correlation: both assess relationships, but Chi-square handles categorical-categorical associations while correlation handles continuous-continuous relationships. Exam trap: Don't use correlation for variables like "smoker/non-smoker" and "disease/no disease." That's a Chi-square question.


Quick Reference Table

| Scenario | Appropriate Test |
| --- | --- |
| Comparing two group means (parametric) | Independent t-test, Paired t-test |
| Comparing three+ group means (parametric) | One-way ANOVA, Two-way ANOVA |
| Non-parametric two-group comparison | Mann-Whitney U (independent), Wilcoxon signed-rank (paired) |
| Non-parametric three+ group comparison | Kruskal-Wallis test |
| Variance comparison | F-test |
| Continuous outcome prediction | Linear regression, Correlation analysis |
| Binary outcome prediction | Logistic regression |
| Categorical variable association | Chi-square test (or Fisher's exact if expected counts < 5) |

Self-Check Questions

  1. A researcher wants to compare pain scores (ordinal scale) between three treatment groups with small, non-normally distributed samples. Which test should they use, and why is ANOVA inappropriate?

  2. Compare and contrast the Mann-Whitney U test and the Wilcoxon signed-rank test. What study design features determine which one to use?

  3. You're analyzing whether smoking status (yes/no) is associated with lung cancer diagnosis (yes/no). Which test is appropriate, and what assumption must be checked before proceeding?

  4. A clinical trial measures blood glucose before and after a new medication in the same 50 patients. Data appear normally distributed. Which test should be used? What would change your answer to a non-parametric alternative?

  5. A question presents regression output with an odds ratio of 2.3 for a predictor variable. What type of regression produced this output, and how would you interpret this odds ratio in context?