Hypothesis testing is how statisticians move from sample data to conclusions about populations. On the AP Statistics exam, you're not just plugging numbers into formulas. You're being tested on the logic of inference: why we assume the null hypothesis is true, how we measure evidence against it, and what our conclusions actually mean in context. Every free-response question involving inference expects you to show this reasoning process.
The steps of hypothesis testing connect directly to core concepts like probability, sampling distributions, Type I and Type II errors, and the interpretation of p-values. Understanding each step will help you tackle everything from one-sample z-tests to chi-square tests for independence. Don't just memorize the sequence; know what each step accomplishes and how errors at any stage can invalidate your conclusions.
Before any calculations happen, you need to establish what you're testing and what would count as convincing evidence. This foundation determines everything that follows.
Compare: Null hypothesis vs. Alternative hypothesis: both describe population parameters, but H₀ assumes no effect while Hₐ specifies the effect you're looking for. On FRQs, always define your parameters in context before writing the hypotheses (e.g., "Let p = the true proportion of students who prefer brand A").
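Written out for the brand-A scenario (the null value 0.5 is an assumed figure for illustration), the setup looks like:

```latex
H_0\colon\ p = 0.5 \qquad \text{(no preference)}
H_a\colon\ p > 0.5 \qquad \text{(majority prefers brand A)}
```

where p = the true proportion of all students who prefer brand A. Note that both hypotheses are statements about the parameter p, never about the sample statistic p̂.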
The validity of your inference depends entirely on whether the conditions for the test are met. This is where many students lose points, either by skipping conditions or by stating them without actually verifying them with the given numbers.
Notice that the proportion test uses p₀ (the null value) in the standard error, while the t-test uses s (the sample standard deviation). That's because under H₀ we know the hypothesized value p₀, but we never know the population standard deviation σ for means.
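A minimal sketch of the two standard-error formulas described above (the sample values plugged in are hypothetical):

```python
import math

# One-proportion z-test: the SE uses the null value p0,
# because H0 fixes the parameter at p = p0.
def se_proportion(p0, n):
    return math.sqrt(p0 * (1 - p0) / n)

# One-sample t-test: the SE uses the sample standard deviation s,
# because the population sigma is never known for means.
def se_mean(s, n):
    return s / math.sqrt(n)

print(se_proportion(0.5, 100))  # 0.05
print(se_mean(12.0, 36))        # 2.0
```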
You need to check three conditions. On FRQs, name each condition and show the math that confirms it.
Compare: Conditions for proportions vs. conditions for means: both require randomness and independence, but proportions need the Large Counts condition while means rely on the Central Limit Theorem or an approximately normal population. If an FRQ asks you to "state and check conditions," you must do both.
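For a one-proportion z-test, the condition checks reduce to arithmetic you can show explicitly. A sketch with hypothetical values (n = 100 from a population of N = 2000, null value p₀ = 0.5):

```python
n, p0, N = 100, 0.5, 2000  # hypothetical sample size, null value, population size

# Random: must be stated in the problem (e.g., "a simple random sample").
random_sample = True

# Independence (10% rule): sample is at most 10% of the population.
ten_percent = n <= 0.10 * N          # 100 <= 200

# Large Counts: expected successes and failures are both at least 10.
large_counts = n * p0 >= 10 and n * (1 - p0) >= 10   # 50 and 50

print(random_sample, ten_percent, large_counts)  # True True True
```

On an FRQ, the equivalent full-credit answer names each condition and shows the numbers, e.g., "np₀ = 100(0.5) = 50 ≥ 10 and n(1 − p₀) = 50 ≥ 10."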
This is where the math happens. You're quantifying how surprising your sample result would be if H₀ were true.
Showing all three steps earns you process points on FRQs, even if you make an arithmetic mistake. The test statistic tells you how many standard errors your sample result falls from the hypothesized parameter.
A common mistake: students sometimes describe the p-value as "the probability that H₀ is true." That's wrong. The p-value assumes H₀ is true and asks how likely your data would be under that assumption.
Compare: P-value approach vs. Critical value approach: both lead to the same decision, but p-values tell you how much evidence you have, while critical values just give a yes/no answer. The AP exam strongly favors the p-value approach.
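Putting the calculation steps together for a hypothetical one-proportion z-test (58 successes in n = 100, testing Hₐ: p > 0.5 at α = 0.05), using only the standard library:

```python
import math

def normal_sf(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, x, p0 = 100, 58, 0.5            # hypothetical data
p_hat = x / n                      # sample proportion: 0.58
se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0: 0.05
z = (p_hat - p0) / se              # test statistic: 1.6
p_value = normal_sf(z)             # one-sided p-value, about 0.0548

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```

Here p̂ falls 1.6 standard errors above p₀, but the p-value (≈ 0.055) is just above α = 0.05, so we fail to reject H₀ — a good reminder that the decision can hinge on small differences near the threshold.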
The final steps connect your statistical results back to the real-world question. This is where you show understanding, not just calculation ability.
Statistical significance ≠ practical significance. A tiny difference can be "significant" with a large enough sample size.
Consider effect sizes and real-world implications. For example, a drug that lowers blood pressure by 0.5 mmHg might reach statistical significance with enough patients, but a 0.5 mmHg drop is clinically meaningless. FRQs increasingly ask you to address whether results matter in context, not just whether they're "significant."
Also consider limitations: sample size, scope of inference, potential confounding variables, and how broadly the results can be generalized.
Compare: Statistical significance vs. Practical significance: statistical significance only tells you the result is unlikely under H₀. Practical significance asks whether the size of the effect is large enough to matter in the real world. Always think about both.
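To see how sample size alone can drive significance, here is a sketch of the blood-pressure example: the same tiny 0.5 mmHg effect tested at several sample sizes (a hypothetical spread of s = 10 mmHg, using a normal approximation):

```python
import math

def normal_sf(z):
    """P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

effect, s = 0.5, 10.0  # clinically tiny mean drop; hypothetical std. deviation
for n in [100, 1000, 10000]:
    z = effect / (s / math.sqrt(n))  # test statistic grows with sqrt(n)
    print(n, round(normal_sf(z), 4))
```

The effect never changes, yet the p-value shrinks from roughly 0.31 at n = 100 to essentially 0 at n = 10,000. Significance here reflects the sample size, not the importance of the effect.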
| Concept | Key Steps/Elements |
|---|---|
| Setting up hypotheses | State H₀ and Hₐ using population parameters; define parameters in context |
| Significance level | Choose α (usually 0.05); understand it as the Type I error probability |
| Conditions for inference | Random, Independence (10% rule), Large Counts or Normality |
| Test statistic | z for proportions, t for means, χ² for categorical |
| P-value interpretation | Probability of observed result (or more extreme) if H₀ is true |
| Decision rule | Reject H₀ if p-value < α; fail to reject if p-value ≥ α |
| Type I vs. Type II error | Type I: reject a true H₀; Type II: fail to reject a false H₀ |
| Conclusion language | "Convincing evidence" for reject H₀; "insufficient evidence" for fail to reject H₀ |
What is the difference between the null hypothesis and the alternative hypothesis, and why do we assume H₀ is true when calculating the p-value?
A student writes "We accept H₀ because the p-value is 0.12." Identify two errors in this statement and explain how to correct them.
Compare and contrast Type I and Type II errors: which one does α control, and how does increasing sample size affect each?
Why must you check conditions before performing a hypothesis test, and what happens to your conclusions if the conditions aren't met?
An FRQ presents a hypothesis test with p-value = 0.001 and asks whether the result is practically significant. What additional information would you need to answer this question, and why might statistical significance not imply practical importance?