
📊 AP Statistics

Hypothesis Testing Steps


Why This Matters

Hypothesis testing is the backbone of statistical inference—it's how statisticians move from sample data to conclusions about populations. On the AP Statistics exam, you're not just being tested on whether you can plug numbers into formulas. You're being tested on whether you understand the logic of inference: why we assume the null hypothesis is true, how we measure evidence against it, and what our conclusions actually mean in context. Every free-response question involving inference expects you to demonstrate this reasoning process.

The steps of hypothesis testing connect directly to core concepts like probability, sampling distributions, Type I and Type II errors, and the interpretation of p-values. Understanding each step—and why it matters—will help you tackle everything from one-sample z-tests to chi-square tests for independence. Don't just memorize the sequence; know what each step accomplishes and how errors at any stage can invalidate your conclusions.


Setting Up the Test

Before any calculations happen, you need to establish what you're testing and what would count as convincing evidence. This foundation determines everything that follows—get it wrong, and your entire analysis falls apart.

State the Null and Alternative Hypotheses

  • The null hypothesis ($H_0$) represents the "no effect" or "no difference" claim—it's what you assume is true until evidence suggests otherwise
  • The alternative hypothesis ($H_a$) captures what you're trying to find evidence for—this determines whether you use a one-sided or two-sided test
  • Write hypotheses using population parameters, not sample statistics (use $\mu$, $p$, or $p_1 - p_2$, never $\bar{x}$ or $\hat{p}$); see the example below
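
For instance, borrowing the defective-items context that appears later in this guide (the claimed rate of 0.02 is hypothetical), a one-sided test about a population proportion could be set up like this:

```latex
% p = the true proportion of defective items produced by the process
% One-sided ("greater than") test about a population proportion
H_0:\ p = 0.02 \qquad H_a:\ p > 0.02
```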

Choose the Significance Level ($\alpha$)

  • The significance level $\alpha$ is your threshold for "surprising enough to reject $H_0$"—typically 0.05 unless otherwise specified
  • $\alpha$ equals the probability of a Type I error—rejecting $H_0$ when it's actually true (a false positive)
  • Choosing $\alpha$ involves trade-offs: lowering $\alpha$ reduces Type I error risk but increases Type II error risk and decreases power—the simulation sketch below shows the Type I error rate in action
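
To see why $\alpha$ is the Type I error probability, here is a minimal simulation sketch in Python (the seed, $p_0 = 0.5$, $n = 200$, and the number of trials are all assumed values, not from this guide): when $H_0$ is actually true, a test run at $\alpha = 0.05$ rejects in roughly 5% of samples.

```python
import numpy as np
from scipy import stats

# Sketch: estimate the Type I error rate by simulating many samples
# from a population where H0 (p = 0.5) really is true.
rng = np.random.default_rng(2024)           # arbitrary seed for reproducibility
alpha, p0, n, trials = 0.05, 0.5, 200, 10_000

rejections = 0
for _ in range(trials):
    p_hat = rng.binomial(n, p0) / n                  # sample proportion under H0
    z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)    # one-proportion z statistic
    p_value = 2 * stats.norm.sf(abs(z))              # two-sided p-value
    rejections += p_value <= alpha

print(f"Empirical Type I error rate: {rejections / trials:.3f}")  # close to 0.05
```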

Compare: Null hypothesis vs. Alternative hypothesis—both describe population parameters, but $H_0$ assumes no effect while $H_a$ specifies the effect you're looking for. On FRQs, always define your parameters in context before writing the hypotheses.


Checking Conditions and Selecting Methods

The validity of your inference depends entirely on whether the conditions for the test are met. This is where many students lose points—skipping conditions or stating them without verification.

Select the Appropriate Test Statistic

  • Match your test to your data type and question: z-tests for proportions, t-tests for means, chi-square for categorical distributions
  • The test statistic measures how far your sample result is from what $H_0$ predicts, standardized by the standard error
  • Common formulas include $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$ for proportions and $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ for means—both are sketched in code below
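
As a quick sketch, both formulas translate directly into helper functions; the sample values passed in at the bottom are hypothetical:

```python
import math

def one_prop_z(p_hat: float, p0: float, n: int) -> float:
    """One-sample z statistic for a proportion: (p_hat - p0) / sqrt(p0(1 - p0)/n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic for a mean: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical sample results:
print(one_prop_z(p_hat=0.55, p0=0.50, n=100))        # 1.0
print(one_sample_t(x_bar=52.3, mu0=50, s=8, n=40))   # ≈ 1.82
```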

Verify Conditions for Inference

  • Random condition: data must come from a random sample or randomized experiment—this ensures the sample is representative
  • Independence condition (10% rule): the sample size must be $\leq 10\%$ of the population when sampling without replacement
  • Large counts or normality condition: for proportions, check $np_0 \geq 10$ and $n(1-p_0) \geq 10$; for means, check sample size or population normality—a quick condition checker follows below
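
A small helper can make the proportion checks routine. This is only a sketch—the function name is invented here, the shipment of 5,000 items and sample of 120 are made-up numbers, and the randomness condition has to be judged from the study design, not computed:

```python
def check_proportion_conditions(n: int, p0: float, population_size: int) -> dict:
    """Routine numeric checks for a one-proportion z-test (randomness is checked separately)."""
    return {
        "independence_10_percent": n <= 0.10 * population_size,
        "large_counts_successes": n * p0 >= 10,
        "large_counts_failures": n * (1 - p0) >= 10,
    }

# Sample of 120 items from a shipment of 5,000, testing p0 = 0.02
print(check_proportion_conditions(n=120, p0=0.02, population_size=5000))
# {'independence_10_percent': True, 'large_counts_successes': False, 'large_counts_failures': True}
# Large Counts fails here, so the normal approximation (z-test) would not be appropriate.
```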

Compare: Conditions for proportions vs. conditions for means—both require randomness and independence, but proportions need the Large Counts condition while means rely on the Central Limit Theorem or population normality. If an FRQ asks you to "state and check conditions," you must do both.


Calculating and Evaluating Evidence

This is where the math happens—but remember, the calculations serve the reasoning. You're quantifying how surprising your sample result would be if $H_0$ were true.

Calculate the Test Statistic

  • Plug your sample data into the appropriate formula to get a standardized value measuring distance from the null
  • Show your work clearly: write the formula, substitute values, and compute—this earns process points on FRQs
  • The test statistic tells you how many standard errors your sample result is from the hypothesized parameter—a worked substitution appears below
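
For example, suppose (hypothetically) that 18 of 500 sampled items are defective, so $\hat{p} = 0.036$, and the hypothesized value is $p_0 = 0.02$. Writing out the substitution is exactly the kind of work graders look for:

```latex
z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}
  = \frac{0.036 - 0.02}{\sqrt{\frac{0.02(0.98)}{500}}}
  \approx \frac{0.016}{0.00626}
  \approx 2.56
```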

Determine the P-Value

  • The p-value is the probability of getting a result as extreme as or more extreme than yours, assuming $H_0$ is true
  • Use the appropriate distribution: standard normal (z) for proportions, t-distribution with $df = n - 1$ for means, chi-square for categorical tests
  • Direction matters: for one-sided tests, find the tail probability; for two-sided tests, double it (or find both tails)—see the sketch below
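
Here is a minimal sketch of turning test statistics into p-values with scipy; the statistic values and degrees of freedom are hypothetical, carried over from the examples above:

```python
from scipy import stats

# Sketch: converting test statistics into p-values (all values hypothetical).
z = 2.56            # one-proportion z statistic from the worked example above
t, df = 1.82, 39    # a one-sample t statistic with df = n - 1

p_z_one_sided = stats.norm.sf(z)             # upper-tail area under the standard normal
p_t_one_sided = stats.t.sf(t, df)            # upper-tail area under t with 39 df

p_z_two_sided = 2 * stats.norm.sf(abs(z))    # two-sided: double the tail area
p_t_two_sided = 2 * stats.t.sf(abs(t), df)

print(round(p_z_one_sided, 4))   # ≈ 0.0052
print(round(p_t_two_sided, 4))   # ≈ 0.076
```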

Compare P-Value to $\alpha$

  • If p-value $\leq \alpha$, reject $H_0$—the evidence against the null hypothesis is statistically significant
  • If p-value $> \alpha$, fail to reject $H_0$—there's insufficient evidence to support the alternative
  • Never say "accept $H_0$"—you can only fail to reject it; absence of evidence isn't evidence of absence (a small helper below shows the decision wording)
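
The decision rule itself is mechanical enough to phrase as a tiny helper; this is only a sketch, and the wording strings are one way to phrase the AP-style language:

```python
def decision(p_value: float, alpha: float = 0.05) -> str:
    """Phrase the decision in AP-appropriate language (never 'accept H0')."""
    if p_value <= alpha:
        return f"Reject H0: p-value {p_value:.4f} <= alpha = {alpha}."
    return f"Fail to reject H0: p-value {p_value:.4f} > alpha = {alpha}."

print(decision(0.0052))   # Reject H0: p-value 0.0052 <= alpha = 0.05.
print(decision(0.12))     # Fail to reject H0: p-value 0.1200 > alpha = 0.05.
```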

Compare: P-value approach vs. Critical value approach—both lead to the same decision, but p-values tell you how much evidence you have while critical values just give you a yes/no answer. The AP exam strongly favors the p-value approach.


Drawing Conclusions

The final steps connect your statistical results back to the real-world question. This is where you demonstrate understanding—not just calculation ability.

State Your Decision

  • Use correct language: "reject $H_0$" or "fail to reject $H_0$"—never "accept" either hypothesis
  • Link the decision to your p-value and $\alpha$: "Since p-value = 0.023 < $\alpha$ = 0.05, we reject $H_0$"
  • Acknowledge uncertainty: your decision could be wrong—rejecting $H_0$ risks a Type I error; failing to reject risks a Type II error

Interpret Results in Context

  • Write a conclusion in plain language that someone without statistics training could understand
  • Reference the alternative hypothesis and the context: "There is convincing evidence that the true proportion of defective items is greater than 0.02"
  • Avoid causal language unless the study was a randomized experiment—observational studies only support association claims

Assess Practical Significance

  • Statistical significance $\neq$ practical significance—a tiny difference can be "significant" with a large enough sample
  • Consider effect sizes and real-world implications: Is the detected difference large enough to matter in practice?
  • Discuss limitations: sample size, scope of inference, potential confounding variables, and generalizability

Compare: Statistical significance vs. Practical significance—a drug that lowers blood pressure by 0.5 mmHg might be statistically significant with $n = 10{,}000$ but practically meaningless. FRQs increasingly ask you to address whether results matter in context, not just whether they're "significant."
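
To see the large-sample effect in action, here is a simulation sketch loosely modeled on that blood-pressure example (the true shift of 0.5, the standard deviation of 15, and the seed are all assumed values):

```python
import numpy as np
from scipy import stats

# Sketch: a tiny true effect (a 0.5-unit shift) becomes "statistically
# significant" once the sample is large enough. All numbers are made up.
rng = np.random.default_rng(7)
true_shift, sd = 0.5, 15.0          # small effect relative to the variability

for n in (50, 1_000, 10_000):
    sample = rng.normal(loc=100 + true_shift, scale=sd, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
    print(f"n = {n:>6}: p-value = {p_value:.4f}")
# The p-values tend to shrink as n grows, even though the effect stays tiny.
```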


Quick Reference Table

| Concept | Key Steps/Elements |
| --- | --- |
| Setting up hypotheses | State $H_0$ and $H_a$ using parameters; define parameters in context |
| Significance level | Choose $\alpha$ (usually 0.05); interpret it as the Type I error probability |
| Conditions for inference | Random, Independence (10% rule), Large Counts or Normality |
| Test statistic | $z$ for proportions, $t$ for means, $\chi^2$ for categorical data |
| P-value interpretation | Probability of the observed result (or more extreme) if $H_0$ is true |
| Decision rule | Reject $H_0$ if p-value $\leq \alpha$; fail to reject if p-value $> \alpha$ |
| Type I vs. Type II error | Type I: reject a true $H_0$; Type II: fail to reject a false $H_0$ |
| Conclusion language | "Convincing evidence" for reject; "insufficient evidence" for fail to reject |

Self-Check Questions

  1. What is the difference between the null hypothesis and the alternative hypothesis, and why do we assume $H_0$ is true when calculating the p-value?

  2. A student writes "We accept $H_0$ because the p-value is 0.12." Identify two errors in this statement and explain how to correct them.

  3. Compare and contrast Type I and Type II errors: Which one does $\alpha$ control, and how does increasing sample size affect each?

  4. Why must you check conditions before performing a hypothesis test, and what happens to your conclusions if the conditions aren't met?

  5. An FRQ presents a hypothesis test with p-value = 0.001 and asks whether the result is practically significant. What additional information would you need to answer this question, and why might statistical significance not imply practical importance?