📊 AP Statistics

Confidence Interval Formulas

Why This Matters

Confidence intervals are the backbone of statistical inference on the AP Statistics exam. They show up in Units 6, 7, and 9, and you'll encounter them in both multiple-choice questions and FRQs. The College Board wants you to understand that we use intervals (not single values) to estimate population parameters because sample statistics vary from sample to sample. Every confidence interval you construct reflects this fundamental truth: you're acknowledging uncertainty while still making useful claims about populations.

Here's what you're really being tested on: knowing which interval procedure fits which situation, verifying the conditions that make each formula valid, and interpreting your results correctly. The formulas themselves follow a consistent structure: point estimate ± (critical value)(standard error). But the details change depending on whether you're estimating proportions vs. means, using one sample vs. two samples, or studying a single variable vs. a relationship between two quantitative variables. Don't just memorize formulas; know what type of data and research question each one addresses.
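
As a minimal sketch of that shared structure, the Python below (standard library only, Python 3.8+) computes $z^*$ from a confidence level and builds point estimate ± (critical value)(standard error). The function names `z_critical` and `confidence_interval` are illustrative, not from any calculator or AP-provided tool.

```python
from __future__ import annotations

from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z* so the middle `confidence` area of N(0, 1) lies between -z* and z*."""
    # (1 - confidence) / 2 sits in each tail, so look up the 0.5 + confidence/2 quantile.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(estimate: float, critical: float, std_error: float) -> tuple[float, float]:
    """Point estimate ± (critical value)(standard error)."""
    margin = critical * std_error  # margin of error
    return (estimate - margin, estimate + margin)


print(round(z_critical(0.95), 2))       # 1.96
print(confidence_interval(10, 2, 1.5))  # (7.0, 13.0)
```

Every interval in this guide is this same calculation with a different point estimate, critical value, and standard error.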


One-Sample Intervals for Proportions

When you have categorical data from a single sample and want to estimate the true population proportion, you'll use the one-sample z-interval. The sampling distribution of $\hat{p}$ is approximately normal when sample sizes are large enough, which is why we can use the standard normal (z) distribution here.

One-Sample Z-Interval for a Proportion

  • Formula: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion and $z^*$ is the critical value (e.g., $z^* = 1.96$ for 95% confidence)
  • Success-failure condition: both $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ must be satisfied. This ensures the sampling distribution is approximately normal so that using $z^*$ is valid.
  • Independence conditions: data must come from a random sample or randomized experiment, and the 10% condition ($n \leq 0.10N$) applies when sampling without replacement. Without independence, the standard error formula breaks down.
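
A sketch of the one-sample z-interval with the two checkable conditions built in. `one_prop_z_interval` is a hypothetical helper and the counts are made up; the Random condition still has to be judged from how the data were collected, since no code can check it.

```python
from __future__ import annotations

from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(successes: int, n: int, confidence: float = 0.95,
                        population_size: int | None = None) -> tuple[float, float]:
    """One-sample z-interval for p: p_hat ± z* sqrt(p_hat(1 - p_hat) / n).

    The Random condition cannot be checked from counts; verify it from the study design.
    """
    p_hat = successes / n
    # Success-failure: n*p_hat is the success count and n*(1 - p_hat) the failure count.
    if successes < 10 or n - successes < 10:
        raise ValueError("success-failure condition fails")
    # 10% condition, checkable only when the population size N is known.
    if population_size is not None and n > 0.10 * population_size:
        raise ValueError("10% condition fails")
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return (p_hat - z_star * se, p_hat + z_star * se)


# Hypothetical data: 120 "yes" answers in a random sample of 200.
low, high = one_prop_z_interval(120, 200)
print(f"({low:.3f}, {high:.3f})")  # (0.532, 0.668)
```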

One-Sample Intervals for Means

When estimating a population mean from quantitative data, the choice between z and t depends on whether you know the population standard deviation. In practice, you almost never know $\sigma$, so the t-interval dominates AP Statistics.

Z-Interval for a Mean (Known $\sigma$)

  • Formula: $\bar{x} \pm z^* \left(\frac{\sigma}{\sqrt{n}}\right)$, using the known population standard deviation $\sigma$ in the standard error
  • Rarely used in practice because knowing $\sigma$ while not knowing $\mu$ is an unusual situation. This formula appears mainly in theoretical or textbook problems.
  • Normality condition: either the population is normally distributed, or $n \geq 30$ so the Central Limit Theorem makes the sampling distribution of $\bar{x}$ approximately normal
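
For completeness, a sketch of the known-$\sigma$ z-interval. The inputs are hypothetical textbook values, and the function assumes, rather than checks, the Normality condition.

```python
from __future__ import annotations

from math import sqrt
from statistics import NormalDist


def z_interval_mean(x_bar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """x_bar ± z* (sigma / sqrt(n)), where sigma is the KNOWN population SD.

    Assumes the population is normal or n >= 30; the code cannot verify that.
    """
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / sqrt(n)
    return (x_bar - margin, x_bar + margin)


# Hypothetical textbook numbers: x_bar = 50, sigma = 10, n = 25.
low, high = z_interval_mean(50, 10, 25)
print(f"({low:.2f}, {high:.2f})")  # (46.08, 53.92)
```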

T-Interval for a Mean (Unknown $\sigma$)

  • Formula: $\bar{x} \pm t^* \left(\frac{s}{\sqrt{n}}\right)$, where $s$ replaces $\sigma$, with degrees of freedom $df = n - 1$
  • The t-distribution has heavier tails than the standard normal (z) distribution, so $t^*$ is larger than $z^*$ at the same confidence level, especially for small samples. Those heavier tails account for the extra uncertainty that comes from estimating $\sigma$ with $s$.
  • Conditions: random sample, independence (10% condition), and population approximately normal OR large sample size ($n \geq 30$)
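
A sketch of the one-sample t-interval. Python's standard library has no t-distribution quantile function, so this version assumes you look up $t^*$ for $df = n - 1$ (from a t-table or your calculator) and pass it in. The data set and the helper name are hypothetical.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, stdev


def t_interval_mean(data: list[float], t_star: float) -> tuple[float, float]:
    """x_bar ± t* (s / sqrt(n)).

    t_star must be looked up for df = n - 1; it is not computed here.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_star * s / sqrt(n)
    return (x_bar - margin, x_bar + margin)


# Hypothetical sample of n = 5, so df = 4; t* = 2.776 for 95% confidence.
low, high = t_interval_mean([12, 15, 11, 14, 13], t_star=2.776)
print(f"({low:.2f}, {high:.2f})")  # (11.04, 14.96)
```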

Compare: Z-interval vs. T-interval for means: both estimate $\mu$, but the t-interval uses $s$ instead of $\sigma$ and has heavier tails. On the AP exam, if $\sigma$ isn't explicitly given, use the t-interval. This is the default for quantitative data.


Two-Sample Intervals for Comparing Groups

Comparing two populations is where inference gets more interesting. The key question: are the samples independent (two separate groups) or paired (same subjects measured twice)?

Two-Sample T-Interval for Difference of Means

  • Formula: $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where the standard error combines variability from both samples
  • Degrees of freedom: use your calculator's "2-SampTInt" function, which applies the Welch approximation. Don't try to compute df by hand on the AP exam.
  • Conditions: two independent random samples, 10% condition for each group, and both populations approximately normal OR both sample sizes large ($n_1 \geq 30$ and $n_2 \geq 30$)
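
This sketch shows where the Welch degrees of freedom come from (the Welch-Satterthwaite formula) and then builds the interval. `welch_df` and `two_sample_t_interval` are hypothetical names, the summary statistics are made up, and $t^*$ is again assumed to be looked up separately for the (usually non-integer) df.

```python
from __future__ import annotations

from math import sqrt


def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def two_sample_t_interval(x1: float, s1: float, n1: int,
                          x2: float, s2: float, n2: int,
                          t_star: float) -> tuple[float, float]:
    """(x1 - x2) ± t* sqrt(s1^2/n1 + s2^2/n2); t_star should match welch_df(...)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
    diff = x1 - x2
    return (diff - t_star * se, diff + t_star * se)


# Hypothetical summary statistics for two independent groups.
df = welch_df(4, 20, 5, 25)
print(round(df, 1))  # 43.0
low, high = two_sample_t_interval(25, 4, 20, 22, 5, 25, t_star=2.017)
print(f"({low:.2f}, {high:.2f})")  # (0.29, 5.71)
```

The df lands between the smaller sample's $n - 1$ (19) and $n_1 + n_2 - 2$ (43), as the Welch approximation always does.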

Two-Sample Z-Interval for Difference of Proportions

  • Formula: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. Note that you use each sample's $\hat{p}$ separately. There's no pooling for confidence intervals (pooling only happens in hypothesis tests for proportions).
  • Interpretation: if the interval contains zero, a difference of zero is plausible, so you don't have convincing evidence of a difference. If both endpoints are positive, the data suggest $p_1 > p_2$; if both are negative, they suggest $p_1 < p_2$.
  • Success-failure condition: check all four values: $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$. Each must be $\geq 10$.
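
A sketch of the two-proportion z-interval with the success-failure check on all four counts and the unpooled standard error. The helper name and the trial counts are hypothetical.

```python
from __future__ import annotations

from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x1: int, n1: int, x2: int, n2: int,
                        confidence: float = 0.95) -> tuple[float, float]:
    """(p1_hat - p2_hat) ± z* sqrt(p1(1-p1)/n1 + p2(1-p2)/n2), unpooled."""
    # Success-failure: successes and failures in each group must all be >= 10.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("success-failure condition fails")
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # each p_hat used separately
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p1 - p2
    return (diff - z_star * se, diff + z_star * se)


# Hypothetical trial: 60/100 successes with treatment A, 45/100 with treatment B.
low, high = two_prop_z_interval(60, 100, 45, 100)
print(f"({low:.3f}, {high:.3f})")  # (0.013, 0.287)
```

Both endpoints are positive here, which would suggest treatment A has the higher success rate.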

Compare: Two-sample means vs. two-sample proportions: both compare independent groups, but means use the t-distribution while proportions use z. If an FRQ asks you to compare two treatments with a binary outcome (yes/no, success/failure), you need the two-proportion z-interval.

Paired T-Interval for Mean Difference

  • Formula: $\bar{d} \pm t^* \left(\frac{s_d}{\sqrt{n}}\right)$, where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences
  • When to use: matched pairs designs, before-and-after studies, or any situation where each observation in one sample is linked to a specific observation in the other
  • The core idea: you're reducing a two-sample problem to a one-sample problem by computing the differences first, then analyzing those differences. Here $df = n - 1$, where $n$ is the number of pairs.
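
A sketch of the "compute differences first" idea. The helper name and the measurements are hypothetical, differences are taken as after minus before (reversing the order flips the sign of the interval), and $t^*$ is assumed to be looked up for $df = n - 1$ with $n$ the number of pairs.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, stdev


def paired_t_interval(before: list[float], after: list[float], t_star: float) -> tuple[float, float]:
    """d_bar ± t* (s_d / sqrt(n)) on differences d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per pair
    n = len(diffs)  # number of pairs, not total observations
    d_bar = mean(diffs)
    s_d = stdev(diffs)
    margin = t_star * s_d / sqrt(n)
    return (d_bar - margin, d_bar + margin)


# Hypothetical measurements on 5 subjects (df = 4, t* = 2.776 for 95%).
low, high = paired_t_interval([70, 65, 80, 75, 68], [74, 66, 85, 77, 73], t_star=2.776)
print(f"({low:.2f}, {high:.2f})")  # (1.14, 5.66)
```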

Compare: Two-sample t-interval vs. paired t-interval: the paired approach controls for individual variability and often produces narrower intervals. Watch for FRQ setups where subjects are measured twice or matched by characteristics. That's your cue to use paired procedures.


Inference for Regression Slopes

Unit 9 extends confidence intervals to linear regression. Here you're estimating the true population slope $\beta$ based on your sample slope $b$.

T-Interval for the Slope of a Regression Line

  • Formula: $b \pm t^* \cdot SE_b$, where $SE_b = \frac{s}{\sqrt{\sum(x_i - \bar{x})^2}}$ and $s$ is the residual standard deviation. In practice, your calculator or computer output provides $SE_b$ directly.
  • Degrees of freedom: $df = n - 2$. You lose two degrees of freedom because you're estimating both the slope and the intercept.
  • Conditions (LINE):
    • Linear relationship (check the residual plot for no curved pattern)
    • Independence (random sample, 10% condition)
    • Normal residuals (check with a histogram or Normal probability plot of residuals)
    • Equal variance (residuals show constant spread across all x-values, no "fan" shape)
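
A sketch that computes $b$ and $SE_b$ from raw data using the formula above, then builds the interval. In practice you would read $b$ and $SE_b$ from regression output; the data and the helper name are hypothetical, $t^*$ must be looked up for $df = n - 2$, and the code does not check the LINE conditions (those need residual plots).

```python
from __future__ import annotations

from math import sqrt
from statistics import mean


def slope_t_interval(x: list[float], y: list[float], t_star: float) -> tuple[float, float, float, float]:
    """Return (b, SE_b, low, high) for the interval b ± t* SE_b."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                # least-squares slope
    a = y_bar - b * x_bar        # least-squares intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # residual SD uses n - 2
    se_b = s / sqrt(sxx)
    return (b, se_b, b - t_star * se_b, b + t_star * se_b)


# Hypothetical data with n = 5, so df = 3; t* = 3.182 for 95% confidence.
b, se_b, low, high = slope_t_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(f"b = {b:.2f}, SE_b = {se_b:.3f}, CI = ({low:.2f}, {high:.2f})")
# b = 0.60, SE_b = 0.283, CI = (-0.30, 1.50)
```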

Compare: T-interval for slope vs. t-interval for a mean: both use the t-distribution, but slope inference has $df = n - 2$ instead of $df = n - 1$, and the conditions focus on residual behavior rather than the raw data. If you see regression output on an FRQ, look for the standard error of the slope coefficient in the table.


Advanced Intervals (Beyond Core AP Content)

These formulas occasionally appear in enrichment contexts but are not central to the AP Statistics exam. Know they exist, but prioritize the intervals above.

Confidence Interval for Population Variance

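  • Formula: $\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$, using the chi-squared distribution with $df = n - 1$, which is right-skewed. Here $\chi^2_{\alpha/2}$ is the upper critical value (area $\alpha/2$ to its right) and $\chi^2_{1-\alpha/2}$ is the lower critical value.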
  • Not symmetric: unlike z and t intervals, this interval is asymmetric around the point estimate
  • Strong normality assumption: the population must be normally distributed. This procedure is sensitive to departures from normality.
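
A sketch of the variance interval, assuming you supply both chi-square critical values for $df = n - 1$ from a table (the standard library has no chi-square quantile function). The inputs are hypothetical. The output shows the asymmetry: the sample variance $s^2 = 4$ is not at the center of the interval.

```python
from __future__ import annotations


def variance_interval(s: float, n: int, chi2_upper: float, chi2_lower: float) -> tuple[float, float]:
    """((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower) with df = n - 1.

    chi2_upper: chi-square value with area alpha/2 to its RIGHT.
    chi2_lower: chi-square value with area alpha/2 to its LEFT.
    Dividing by the larger critical value gives the lower bound.
    """
    ss = (n - 1) * s ** 2
    return (ss / chi2_upper, ss / chi2_lower)


# Hypothetical: n = 10, s = 2; for df = 9 the 95% critical values are about 19.023 and 2.700.
low, high = variance_interval(2, 10, chi2_upper=19.023, chi2_lower=2.700)
print(f"({low:.2f}, {high:.2f})")  # (1.89, 13.33)
```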

Confidence Interval for Ratio of Two Variances

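  • Formula: $\left(\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}},\ \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\,n_2-1,\,n_1-1}\right)$, using the F-distribution, where $F_{\alpha/2,\,a,\,b}$ is the upper critical value (area $\alpha/2$ to its right) with $a$ numerator and $b$ denominator degrees of freedom. The two critical values use the degrees of freedom in opposite orders, so they coincide only when $n_1 = n_2$.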
  • Application: testing whether two populations have equal variances before running a pooled two-sample t-test
  • Requires normality in both populations. Rarely tested on AP Statistics but useful for understanding ANOVA assumptions.
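
A sketch of the variance-ratio interval, assuming both upper critical values are supplied from an F-table. The two separate parameters make the swapped degrees of freedom explicit; with equal sample sizes, as in this hypothetical example, they happen to be equal.

```python
from __future__ import annotations


def variance_ratio_interval(s1: float, n1: int, s2: float, n2: int,
                            f_upper_12: float, f_upper_21: float) -> tuple[float, float]:
    """Interval for sigma1^2 / sigma2^2.

    f_upper_12: upper alpha/2 F critical value with (n1 - 1, n2 - 1) df.
    f_upper_21: upper alpha/2 F critical value with (n2 - 1, n1 - 1) df (order swapped).
    """
    ratio = s1 ** 2 / s2 ** 2
    return (ratio / f_upper_12, ratio * f_upper_21)


# Hypothetical: n1 = n2 = 10, so both values are the upper 0.025 point of F(9, 9), about 4.026.
low, high = variance_ratio_interval(3, 10, 2, 10, f_upper_12=4.026, f_upper_21=4.026)
print(f"({low:.2f}, {high:.2f})")  # (0.56, 9.06)
```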

Quick Reference Table

| Situation | Procedure to Use |
|---|---|
| Estimating a single proportion | One-sample z-interval for $p$ |
| Estimating a single mean | T-interval for $\mu$ (use z only if $\sigma$ is known) |
| Comparing two independent proportions | Two-sample z-interval for $p_1 - p_2$ |
| Comparing two independent means | Two-sample t-interval for $\mu_1 - \mu_2$ |
| Comparing paired/matched data | Paired t-interval for $\mu_d$ |
| Estimating a regression slope | T-interval for $\beta$ with $df = n - 2$ |
| Intervals using z-distribution | One-proportion, two-proportion (large samples) |
| Intervals using t-distribution | One-mean, two-means, paired, regression slope |

Self-Check Questions

  1. What conditions must you verify before constructing a one-sample z-interval for a proportion, and why does each condition matter?

  2. Compare the t-interval for a single mean and the paired t-interval: what do they have in common, and when would you choose one over the other?

  3. If a confidence interval for $p_1 - p_2$ is $(0.03, 0.15)$, what can you conclude about the relationship between the two population proportions?

  4. An FRQ gives you regression output including $b = 2.4$ and $SE_b = 0.6$ with $n = 22$. What critical value would you use for a 95% confidence interval, and what are the degrees of freedom?

  5. A researcher wants to determine whether a new teaching method improves test scores. Students are tested before and after the intervention. Which confidence interval procedure is appropriate, and why would using a two-sample t-interval be incorrect here?