AP Statistics

Hypothesis Testing Steps

Why This Matters

Hypothesis testing is how statisticians move from sample data to conclusions about populations. On the AP Statistics exam, you're not just plugging numbers into formulas. You're being tested on the logic of inference: why we assume the null hypothesis is true, how we measure evidence against it, and what our conclusions actually mean in context. Every free-response question involving inference expects you to show this reasoning process.

The steps of hypothesis testing connect directly to core concepts like probability, sampling distributions, Type I and Type II errors, and the interpretation of p-values. Understanding each step will help you tackle everything from one-sample z-tests to chi-square tests for independence. Don't just memorize the sequence; know what each step accomplishes and how errors at any stage can invalidate your conclusions.


Setting Up the Test

Before any calculations happen, you need to establish what you're testing and what would count as convincing evidence. This foundation determines everything that follows.

State the Null and Alternative Hypotheses

  • The null hypothesis ($H_0$) represents the "no effect" or "no difference" claim. It's what you assume is true until evidence suggests otherwise.
  • The alternative hypothesis ($H_a$) captures what you're trying to find evidence for. This determines whether you run a one-sided or two-sided test.
  • Write hypotheses using population parameters, not sample statistics. Use $\mu$, $p$, or $p_1 - p_2$, never $\bar{x}$ or $\hat{p}$. For example, write $H_0: p = 0.5$, not $H_0: \hat{p} = 0.5$.

Choose the Significance Level ($\alpha$)

  • The significance level $\alpha$ is your threshold for "surprising enough to reject $H_0$." It's typically 0.05 unless the problem states otherwise.
  • $\alpha$ equals the probability of a Type I error: rejecting $H_0$ when it's actually true (a false positive).
  • Choosing $\alpha$ involves trade-offs. Lowering $\alpha$ reduces your Type I error risk but increases your Type II error risk and decreases power. You should set $\alpha$ before looking at the data.
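The claim that $\alpha$ is the Type I error probability can be checked with a small simulation. The sketch below is illustrative only: the function names, the sample size of 100, the null value 0.5, and the seed are all assumptions chosen for the example. It draws many samples from a population where $H_0$ is true and counts how often a two-sided one-proportion z-test at $\alpha = 0.05$ (critical value 1.96) rejects anyway.

```python
import math
import random

def one_prop_z(successes, n, p0):
    """z statistic for a one-proportion test against the null value p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def type_i_error_rate(p0=0.5, n=100, z_crit=1.96, trials=5000, seed=1):
    """Fraction of samples drawn with H0 true that a two-sided test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Each observation is a success with probability p0, so H0 really is true.
        successes = sum(rng.random() < p0 for _ in range(n))
        if abs(one_prop_z(successes, n, p0)) >= z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

The printed share should be close to 0.05 but not exactly 0.05, because the counts are discrete and the simulation is random. Every one of those rejections is a Type I error.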

Compare: Null hypothesis vs. Alternative hypothesis: both describe population parameters, but $H_0$ assumes no effect while $H_a$ specifies the effect you're looking for. On FRQs, always define your parameters in context before writing the hypotheses (e.g., "Let $p$ = the true proportion of students who prefer brand A").


Checking Conditions and Selecting Methods

The validity of your inference depends entirely on whether the conditions for the test are met. This is where many students lose points, either by skipping conditions or by stating them without actually verifying them with the given numbers.

Select the Appropriate Test Statistic

  • Match your test to your data type and question. Use z-tests for proportions, t-tests for means, and chi-square for categorical distributions.
  • The test statistic measures how far your sample result is from what $H_0$ predicts, standardized by the standard error.
  • Common formulas:
    • For a one-proportion z-test: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
    • For a one-sample t-test: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Notice that the proportion test uses $p_0$ (the null value) in the standard error, while the t-test uses $s$ (the sample standard deviation). That's because $H_0$ specifies $p_0$, which fixes the standard deviation of $\hat{p}$. For means, $H_0$ says nothing about the population standard deviation $\sigma$, which is almost never known, so you estimate it with $s$.
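As a rough illustration of the two formulas, here is a minimal Python sketch. The function names and the sample values are made up for the example and do not come from any exam problem. Note where each standard error comes from: the null value $p_0$ in the first function, the sample standard deviation $s$ in the second.

```python
import math
import statistics

def one_proportion_z(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n); the standard error uses the null value p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)); the standard error uses the sample standard deviation s."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical data for illustration only.
z = one_proportion_z(p_hat=0.56, p0=0.5, n=100)
t = one_sample_t([12.1, 11.8, 12.4, 12.0, 12.2], mu0=12.0)
print(f"z = {z:.2f}, t = {t:.2f}")
```

With these made-up numbers the script prints z = 1.20 and t = 1.00: the sample proportion sits 1.2 standard errors above $p_0$, and the sample mean sits 1 standard error above $\mu_0$.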

Verify Conditions for Inference

You need to check three conditions. On FRQs, name each condition and show the math that confirms it.

  1. Random condition: The data must come from a random sample or randomized experiment. Random sampling lets you generalize to the population; random assignment lets you draw conclusions about cause and effect.
  2. Independence condition (10% rule): When sampling without replacement, the sample size must be $\leq 10\%$ of the population. This keeps individual observations approximately independent.
  3. Large Counts or Normality condition: For proportions, check that $np_0 \geq 10$ and $n(1-p_0) \geq 10$. For means, if $n \geq 30$, the Central Limit Theorem covers you; if $n < 30$, you need the population distribution to be roughly normal (check for strong skew or outliers in the sample).
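The numeric checks for a one-proportion z-test can be sketched in a few lines of Python. The helper name, its arguments, and the example numbers below are hypothetical. The random condition cannot be computed from the data, so the sketch simply takes it as a yes/no input based on how the study was run.

```python
def check_one_proportion_conditions(n, p0, population_size, is_random):
    """Return each condition for a one-proportion z-test along with the numbers behind it."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return {
        "random": is_random,
        "ten_percent": n <= 0.10 * population_size,
        "large_counts": expected_successes >= 10 and expected_failures >= 10,
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
    }

# Hypothetical scenario: 80 items sampled at random from a shipment of 2000, with p0 = 0.1.
result = check_one_proportion_conditions(n=80, p0=0.1, population_size=2000, is_random=True)
for name, value in result.items():
    print(f"{name}: {value}")
```

In this scenario the 10% rule holds (80 is at most 200), but Large Counts fails because $np_0 = 8 < 10$. On an FRQ, that is the kind of arithmetic you would write out rather than just naming the condition.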

Compare: Conditions for proportions vs. conditions for means: both require randomness and independence, but proportions need the Large Counts condition while means rely on the Central Limit Theorem or an approximately normal population. If an FRQ asks you to "state and check conditions," you must do both.


Calculating and Evaluating Evidence

This is where the math happens. You're quantifying how surprising your sample result would be if $H_0$ were true.

Calculate the Test Statistic

  1. Write out the formula for your chosen test.
  2. Substitute in your sample values and the null parameter.
  3. Compute the result.

Showing all three steps earns you process points on FRQs, even if you make an arithmetic mistake. The test statistic tells you how many standard errors your sample result falls from the hypothesized parameter.
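A worked example of the three steps, using made-up numbers (a sample of $n = 64$ with $\bar{x} = 52$ and $s = 8$, testing $H_0: \mu = 50$), might look like this:

```latex
\begin{align*}
t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}} && \text{(write the formula)} \\
  &= \frac{52 - 50}{8/\sqrt{64}}        && \text{(substitute)} \\
  &= \frac{2}{1} = 2                     && \text{(compute)}
\end{align*}
```

Here the sample mean falls 2 standard errors above the hypothesized mean, and the p-value would come from a t-distribution with $df = 63$.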

Determine the P-Value

  • The p-value is the probability of getting a result as extreme as (or more extreme than) yours, assuming $H_0$ is true.
  • Use the correct distribution: standard normal (z) for proportions, t-distribution with $df = n - 1$ for means, chi-square for categorical tests.
  • Direction matters. For a one-sided test, find the area in one tail. For a two-sided test, double the one-tail probability.

A common mistake: students sometimes describe the p-value as "the probability that $H_0$ is true." That's wrong. The p-value assumes $H_0$ is true and asks how likely a result at least as extreme as yours would be under that assumption.
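For z-based tests, the tail areas can be computed with `NormalDist` from Python's standard library. This is a minimal sketch: the function name and the `alternative` labels are choices made for this example. It covers only the standard normal case; the standard library has no t or chi-square distribution, so those p-values would come from a calculator or table.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """P-value for a z statistic; alternative is 'greater', 'less', or 'two-sided'."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    upper_tail = 1 - standard_normal.cdf(z)
    lower_tail = standard_normal.cdf(z)
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return lower_tail
    if alternative == "two-sided":
        # Double the smaller tail, which works for either sign of z.
        return 2 * min(upper_tail, lower_tail)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

z = 2.0
print(f"one-sided (greater): {z_p_value(z, 'greater'):.4f}")
print(f"two-sided:           {z_p_value(z, 'two-sided'):.4f}")
```

For $z = 2$ this gives about 0.0228 one-sided and 0.0455 two-sided, so the same test statistic carries half as much tail area when the alternative points in one direction.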

Compare P-Value to $\alpha$

  • If p-value $\leq \alpha$: reject $H_0$. The evidence against the null is statistically significant.
  • If p-value $> \alpha$: fail to reject $H_0$. There's insufficient evidence to support the alternative.
  • Never say "accept $H_0$." You can only fail to reject it. Absence of evidence isn't evidence of absence.
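The decision rule is simple enough to write as a one-line comparison. The sketch below (with a hypothetical function name) mainly shows that the decision depends on $\alpha$ as well as the p-value, and that the only two possible outputs are "reject" and "fail to reject."

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha, otherwise fail to reject."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.023))        # reject H0
print(decide(0.12))         # fail to reject H0
print(decide(0.023, 0.01))  # fail to reject H0
```

The same p-value of 0.023 leads to opposite decisions at $\alpha = 0.05$ and $\alpha = 0.01$, which is one reason to fix $\alpha$ before looking at the data.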

Compare: P-value approach vs. Critical value approach: both lead to the same decision, but p-values tell you how much evidence you have, while critical values just give a yes/no answer. The AP exam strongly favors the p-value approach.


Drawing Conclusions

The final steps connect your statistical results back to the real-world question. This is where you show understanding, not just calculation ability.

State Your Decision

  • Use correct language: "reject $H_0$" or "fail to reject $H_0$." Never "accept" either hypothesis.
  • Link the decision to your p-value and $\alpha$: "Since the p-value = 0.023 < $\alpha$ = 0.05, we reject $H_0$."
  • Acknowledge uncertainty. Rejecting $H_0$ risks a Type I error; failing to reject risks a Type II error.

Interpret Results in Context

  • Write a conclusion in plain language that someone without statistics training could understand.
  • Reference the alternative hypothesis and the context. For example: "There is convincing evidence that the true proportion of defective items is greater than 0.02."
  • Avoid causal language unless the study was a randomized experiment. Observational studies only support claims of association, not causation.

Assess Practical Significance

Statistical significance $\neq$ practical significance. A tiny difference can be "significant" with a large enough sample size.

Consider effect sizes and real-world implications. For example, a drug that lowers blood pressure by 0.5 mmHg might reach statistical significance with $n = 10{,}000$ patients, but a 0.5 mmHg drop is clinically meaningless. FRQs increasingly ask you to address whether results matter in context, not just whether they're "significant."
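To see how sample size alone can produce significance, here is a rough sketch of the blood pressure scenario. The standard deviation of 15 mmHg and the smaller sample size of 100 are assumptions invented for illustration. The sketch also uses the normal distribution in place of the t-distribution to stay within Python's standard library.

```python
import math
from statistics import NormalDist

def one_sided_p_value(mean_drop, sd, n):
    """Upper-tail p-value for H0: mean drop = 0 vs Ha: mean drop > 0 (normal approximation)."""
    z = mean_drop / (sd / math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# Hypothetical numbers: the same 0.5 mmHg average drop and sd of 15 mmHg at two sample sizes.
for n in (100, 10_000):
    z, p = one_sided_p_value(mean_drop=0.5, sd=15, n=n)
    print(f"n = {n:>6}: z = {z:.2f}, p-value = {p:.4f}")
```

The effect size is identical in both runs. Only the standard error shrinks, which moves the p-value from well above 0.05 at $n = 100$ to well below it at $n = 10{,}000$.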

Also consider limitations: sample size, scope of inference, potential confounding variables, and how broadly the results can be generalized.

Compare: Statistical significance vs. Practical significance: statistical significance only tells you the result is unlikely under $H_0$. Practical significance asks whether the size of the effect is large enough to matter in the real world. Always think about both.


Quick Reference Table

| Concept | Key Steps/Elements |
| --- | --- |
| Setting up hypotheses | State $H_0$ and $H_a$ using parameters; define parameters in context |
| Significance level | Choose $\alpha$ (usually 0.05); understand it as the Type I error probability |
| Conditions for inference | Random, Independence (10% rule), Large Counts or Normality |
| Test statistic | z for proportions, t for means, $\chi^2$ for categorical |
| P-value interpretation | Probability of observed result (or more extreme) if $H_0$ is true |
| Decision rule | Reject $H_0$ if p-value $\leq \alpha$; fail to reject if p-value $> \alpha$ |
| Type I vs. Type II error | Type I: reject a true $H_0$; Type II: fail to reject a false $H_0$ |
| Conclusion language | "Convincing evidence" for reject; "insufficient evidence" for fail to reject |

Self-Check Questions

  1. What is the difference between the null hypothesis and the alternative hypothesis, and why do we assume $H_0$ is true when calculating the p-value?

  2. A student writes "We accept $H_0$ because the p-value is 0.12." Identify two errors in this statement and explain how to correct them.

  3. Compare and contrast Type I and Type II errors: which one does $\alpha$ control, and how does increasing sample size affect each?

  4. Why must you check conditions before performing a hypothesis test, and what happens to your conclusions if the conditions aren't met?

  5. An FRQ presents a hypothesis test with p-value = 0.001 and asks whether the result is practically significant. What additional information would you need to answer this question, and why might statistical significance not imply practical importance?
