📊AP Statistics

Hypothesis Testing Steps

Why This Matters

Hypothesis testing is the backbone of statistical inference—it's how statisticians move from sample data to conclusions about populations. On the AP Statistics exam, you're not just being tested on whether you can plug numbers into formulas. You're being tested on whether you understand the logic of inference: why we assume the null hypothesis is true, how we measure evidence against it, and what our conclusions actually mean in context. Every free-response question involving inference expects you to demonstrate this reasoning process.

The steps of hypothesis testing connect directly to core concepts like probability, sampling distributions, Type I and Type II errors, and the interpretation of p-values. Understanding each step—and why it matters—will help you tackle everything from one-sample z-tests to chi-square tests for independence. Don't just memorize the sequence; know what each step accomplishes and how errors at any stage can invalidate your conclusions.

Setting Up the Test

Before any calculations happen, you need to establish what you're testing and what would count as convincing evidence. This foundation determines everything that follows—get it wrong, and your entire analysis falls apart.

State the Null and Alternative Hypotheses

The null hypothesis ( $H_0$ ) represents the "no effect" or "no difference" claim—it's what you assume is true until evidence suggests otherwise
The alternative hypothesis ( $H_a$ ) captures what you're trying to find evidence for—this determines whether you use a one-sided or two-sided test
Write hypotheses using population parameters, not sample statistics (use $\mu$ , $p$ , or $p_1 - p_2$ , never $\bar{x}$ or $\hat{p}$ )

Choose the Significance Level ( $\alpha$ )

The significance level $\alpha$ is your threshold for "surprising enough to reject $H_0$ "—typically 0.05 unless otherwise specified
$\alpha$ equals the probability of a Type I error—rejecting $H_0$ when it's actually true (a false positive)
Choosing $\alpha$ involves trade-offs: lowering $\alpha$ reduces Type I error risk but increases Type II error risk and decreases power
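
To see the trade-off in numbers, here is a minimal sketch that approximates the power of an upper-tailed one-proportion z-test at several significance levels. The scenario ($H_0: p = 0.50$, a true proportion of 0.55, $n = 400$) and the function name `power_one_sided_z` are made up for illustration, and the calculation relies on the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def power_one_sided_z(p0, p_true, n, alpha):
    """Approximate power of an upper-tailed one-proportion z-test.

    Find the smallest p-hat that would reject H0 (computed under H0),
    then ask how likely a p-hat that large is when p_true is the real value.
    """
    z_star = STD_NORMAL.inv_cdf(1 - alpha)           # critical z for this alpha
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)   # rejection cutoff for p-hat
    se_true = sqrt(p_true * (1 - p_true) / n)        # SE when p_true is correct
    return 1 - STD_NORMAL.cdf((cutoff - p_true) / se_true)

# Hypothetical scenario: H0: p = 0.50, true p = 0.55, n = 400
for alpha in (0.10, 0.05, 0.01):
    power = power_one_sided_z(0.50, 0.55, 400, alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  P(Type II)={1 - power:.3f}")
```

As $\alpha$ shrinks, the printed power drops and the Type II error probability rises, which is the trade-off described above.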

Compare: Null hypothesis vs. Alternative hypothesis—both describe population parameters, but $H_0$ assumes no effect while $H_a$ specifies the effect you're looking for. On FRQs, always define your parameters in context before writing the hypotheses.

Checking Conditions and Selecting Methods

The validity of your inference depends entirely on whether the conditions for the test are met. This is where many students lose points—skipping conditions or stating them without verification.

Select the Appropriate Test Statistic

Match your test to your data type and question: z-tests for proportions, t-tests for means, chi-square tests for counts of categorical data (goodness of fit, homogeneity, independence)
The test statistic measures how far your sample result is from what $H_0$ predicts, standardized by the standard error
Common formulas include $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$ for proportions and $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ for means
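
The two formulas above translate directly into code. This is a small sketch; the function names and the sample values are invented for illustration and do not come from any particular problem.

```python
from math import sqrt

def one_proportion_z(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)"""
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0 because H0 is assumed true
    return (p_hat - p0) / standard_error

def one_sample_t(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with df = n - 1"""
    standard_error = s / sqrt(n)  # uses the sample standard deviation s
    return (x_bar - mu0) / standard_error

# Made-up sample values, for illustration only
z = one_proportion_z(p_hat=0.56, p0=0.50, n=200)
t = one_sample_t(x_bar=52.0, mu0=50.0, s=6.0, n=36)
print(f"z = {z:.2f}")
print(f"t = {t:.2f} with df = {36 - 1}")
```

Both functions have the same shape: (statistic minus hypothesized parameter) divided by the standard error.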

Verify Conditions for Inference

Random condition: data must come from a random sample or a randomized experiment. Random sampling supports generalizing to the population; random assignment supports cause-and-effect conclusions
Independence condition (10% rule): sample size must be $\leq 10\%$ of the population when sampling without replacement
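Large counts or normality condition: for proportions, check $np_0 \geq 10$ and $n(1-p_0) \geq 10$; for means, check that the population is approximately normal, that $n \geq 30$ (so the Central Limit Theorem applies), or that a graph of the sample data shows no strong skewness or outliers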
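
The numeric checks for a one-proportion z-test can be sketched as a short helper. The function name, its arguments, and the example numbers are assumptions made for illustration. The random condition cannot be computed from the data, so it is passed in as a judgment about the study design.

```python
def check_proportion_conditions(n, p0, population_size, is_random):
    """Return a dict of condition name -> True/False for a one-proportion z-test."""
    return {
        "random": is_random,  # judged from how the data were collected
        "independence (10% rule)": n <= 0.10 * population_size,
        "large counts": n * p0 >= 10 and n * (1 - p0) >= 10,
    }

# Hypothetical study: n = 200 sampled from a population of 5,000, H0: p = 0.50
results = check_proportion_conditions(n=200, p0=0.50, population_size=5000, is_random=True)
for name, met in results.items():
    print(f"{name}: {'met' if met else 'NOT met'}")

# A small sample with a small p0 fails large counts: n * p0 = 50 * 0.1 = 5
small = check_proportion_conditions(n=50, p0=0.10, population_size=5000, is_random=True)
print("large counts with n=50, p0=0.10:", small["large counts"])
```

On an FRQ you would still write out each check with its numbers; the code only mirrors the arithmetic.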

Compare: Conditions for proportions vs. conditions for means—both require randomness and independence, but proportions need the Large Counts condition while means rely on the Central Limit Theorem or population normality. If an FRQ asks you to "state and check conditions," you must do both.

Calculating and Evaluating Evidence

This is where the math happens—but remember, the calculations serve the reasoning. You're quantifying how surprising your sample result would be if $H_0$ were true.

Calculate the Test Statistic

Plug your sample data into the appropriate formula to get a standardized value measuring distance from the null
Show your work clearly: write the formula, substitute values, and compute—this earns process points on FRQs
The test statistic tells you how many standard errors your sample is from the hypothesized parameter

Determine the P-Value

The p-value is the probability of getting a result at least as extreme as the one you observed, in the direction of $H_a$, assuming $H_0$ is true
Use the appropriate distribution: standard normal (z) for proportions, t-distribution with $df = n-1$ for means, chi-square for categorical tests
Direction matters: for one-sided tests, find the tail probability; for two-sided tests, double it (or find both tails)
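
The tail logic for a z statistic can be sketched with Python's standard library. The labels `"greater"`, `"less"`, and `"two-sided"` are naming choices made for this example. It covers the standard normal case only; a t or chi-square p-value needs that distribution's own tail area, which a calculator or table provides.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """Tail probability under the standard normal curve for a z statistic."""
    std_normal = NormalDist()  # mean 0, sd 1
    if alternative == "greater":      # Ha: parameter > hypothesized value
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # Ha: parameter < hypothesized value
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha: parameter != hypothesized value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

z = 1.96  # example test statistic
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: p-value = {p_value_from_z(z, alt):.4f}")
```

The two-sided p-value is exactly double the smaller one-sided tail, which is why the direction of $H_a$ has to be settled before looking at the data.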

Compare P-Value to $\alpha$

If p-value $\leq \alpha$ , reject $H_0$ —the evidence against the null hypothesis is statistically significant
If p-value $> \alpha$ , fail to reject $H_0$ —there's insufficient evidence to support the alternative
Never say "accept $H_0$ "—you can only fail to reject it; absence of evidence isn't evidence of absence

Compare: P-value approach vs. Critical value approach—both lead to the same decision, but p-values tell you how much evidence you have while critical values just give you a yes/no answer. The AP exam strongly favors the p-value approach.
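
A quick way to convince yourself that the two approaches agree is to run both on the same test statistics. This sketch assumes an upper-tailed z-test with $\alpha = 0.05$; the function names and the example z values are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def reject_by_p_value(z, alpha):
    """Reject H0 when the upper-tail area beyond z is at most alpha."""
    p_value = 1 - STD_NORMAL.cdf(z)
    return p_value <= alpha

def reject_by_critical_value(z, alpha):
    """Reject H0 when z is at or beyond the critical value z*."""
    z_star = STD_NORMAL.inv_cdf(1 - alpha)  # cuts off an upper tail of area alpha
    return z >= z_star

alpha = 0.05
for z in (0.50, 1.50, 1.70, 2.40):
    by_p = reject_by_p_value(z, alpha)
    by_crit = reject_by_critical_value(z, alpha)
    print(f"z={z:.2f}  p-value approach: {by_p}  critical value approach: {by_crit}")
```

The decisions match on every line, but only the p-value version tells you how far inside or outside the rejection region the result falls.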

Drawing Conclusions

The final steps connect your statistical results back to the real-world question. This is where you demonstrate understanding—not just calculation ability.

State Your Decision

Use correct language: "reject $H_0$ " or "fail to reject $H_0$ "—never "accept" either hypothesis
Link the decision to your p-value and $\alpha$ : "Since p-value = 0.023 < $\alpha$ = 0.05, we reject $H_0$ "
Acknowledge uncertainty: your decision could be wrong—rejecting $H_0$ risks Type I error; failing to reject risks Type II error

Interpret Results in Context

Write a conclusion in plain language that someone without statistics training could understand
Reference the alternative hypothesis and the context: "There is convincing evidence that the true proportion of defective items is greater than 0.02"
Avoid causal language unless the study was a randomized experiment—observational studies only support association claims
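
Here is a sketch of the full sequence, from hypotheses to a conclusion in context, using the defective-items wording above. The sample (31 defective items in a random sample of 1,000 from a much larger production run) is hypothetical, invented only so the numbers can be worked through.

```python
from math import sqrt
from statistics import NormalDist

# Hypotheses: H0: p = 0.02 versus Ha: p > 0.02,
# where p is the true proportion of defective items.
p0 = 0.02
alpha = 0.05

# Hypothetical data (assumed random sample, under 10% of all items produced)
n = 1000
defective = 31

# Large counts check, using p0 because H0 is assumed true
large_counts_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Test statistic and upper-tail p-value
p_hat = defective / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)

# Decision and conclusion in context
if p_value <= alpha:
    decision = "reject H0"
    conclusion = (f"there is convincing evidence that the true proportion "
                  f"of defective items is greater than {p0}")
else:
    decision = "fail to reject H0"
    conclusion = (f"there is not convincing evidence that the true proportion "
                  f"of defective items is greater than {p0}")

print(f"Large counts met: {large_counts_ok}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"Since p-value {'<=' if p_value <= alpha else '>'} alpha = {alpha}, "
      f"we {decision}; {conclusion}.")
```

Notice that the conclusion names the parameter and the context and is tied to the p-value comparison, rather than stopping at "reject $H_0$."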

Assess Practical Significance

Statistical significance $\neq$ practical significance—a tiny difference can be "significant" with a large enough sample
Consider effect sizes and real-world implications: Is the detected difference large enough to matter in practice?
Discuss limitations: sample size, scope of inference, potential confounding variables, and generalizability

Compare: Statistical significance vs. Practical significance—a drug that lowers blood pressure by 0.5 mmHg might be statistically significant with $n = 10,000$ but practically meaningless. FRQs increasingly ask you to address whether results matter in context, not just whether they're "significant."
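
The blood pressure example can be checked with a rough calculation. This sketch assumes a population standard deviation of 15 mmHg (a made-up value) and treats it as known so that a z statistic can be used for simplicity; a real analysis of means would use a t-test. The function name is also an illustration choice.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(mean_drop, sigma, n):
    """Upper-tail p-value for H0: mean drop = 0 versus Ha: mean drop > 0."""
    z = mean_drop / (sigma / sqrt(n))  # standard errors away from zero
    return 1 - NormalDist().cdf(z)

sigma = 15.0      # assumed standard deviation in mmHg (hypothetical)
mean_drop = 0.5   # the same 0.5 mmHg average drop in every scenario

for n in (100, 1_000, 10_000):
    p = one_sided_p_value(mean_drop, sigma, n)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"n={n:>6}  p-value={p:.4f}  {verdict} at alpha = 0.05")
```

The effect size never changes, yet the result crosses into "significant" once the sample is large enough. Only the standard error shrank; the clinical meaning of a 0.5 mmHg drop did not.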

Quick Reference Table

Concept	Key Steps/Elements
Setting up hypotheses	State $H_0$ and $H_a$ using parameters, define parameters in context
Significance level	Choose $\alpha$ (usually 0.05), understand as Type I error probability
Conditions for inference	Random, Independence (10% rule), Large Counts or Normality
Test statistic	z for proportions, t for means, $\chi^2$ for categorical
P-value interpretation	Probability of observed result (or more extreme) if $H_0$ true
Decision rule	Reject $H_0$ if p-value $\leq \alpha$ ; fail to reject if p-value $> \alpha$
Type I vs. Type II error	Type I: reject true $H_0$ ; Type II: fail to reject false $H_0$
Conclusion language	"Convincing evidence" for reject; "insufficient evidence" for fail to reject

Self-Check Questions

What is the difference between the null hypothesis and the alternative hypothesis, and why do we assume $H_0$ is true when calculating the p-value?
A student writes "We accept $H_0$ because the p-value is 0.12." Identify two errors in this statement and explain how to correct them.
Compare and contrast Type I and Type II errors: Which one does $\alpha$ control, and how does increasing sample size affect each?
Why must you check conditions before performing a hypothesis test, and what happens to your conclusions if the conditions aren't met?
An FRQ presents a hypothesis test with p-value = 0.001 and asks whether the result is practically significant. What additional information would you need to answer this question, and why might statistical significance not imply practical importance?
