๐ŸƒEngineering Probability

Confidence Intervals


Why This Matters

Confidence intervals let you quantify how uncertain a sample-based estimate really is. Every time you estimate a parameter from data (a mean strength, a failure rate, a proportion of defective parts), you need to communicate how much that estimate could vary. CIs do exactly that, connecting directly to concepts like sampling distributions, the Central Limit Theorem, and hypothesis testing.

Don't just memorize the formulas. Know when each interval type applies and why the mechanics work. Can you explain why larger samples shrink your interval? Why you switch from Z to t? Why a 99% CI is wider than a 95% CI? These conceptual questions show up constantly, and the formulas alone won't save you. Master the underlying logic: confidence level trade-offs, sample size effects, distributional assumptions, and the duality between CIs and hypothesis tests.


Foundational Concepts

Before diving into specific interval formulas, you need a solid understanding of what a confidence interval actually means and what controls its width. The confidence level determines how often the procedure captures the true parameter; the margin of error determines how precise your estimate is.

Definition and Interpretation

A confidence interval is a range of plausible values for an unknown population parameter, constructed from sample data. It is not a probability statement about where the parameter lies after you've computed it.

The confidence level (e.g., 95%) means that if you repeated the sampling procedure many times, about 95% of the resulting intervals would contain the true parameter. The width of the interval reflects your uncertainty: wider intervals mean less precision, narrower intervals mean your point estimate is more tightly pinned down.

Confidence Level and Significance Level

  • Confidence level $1-\alpha$ and significance level $\alpha$ are complementary. A 95% CI corresponds to $\alpha = 0.05$.
  • Higher confidence levels produce wider intervals because you're demanding more certainty that you've captured the true value.
  • The trade-off is precision vs. confidence. You can't have both a narrow interval and high confidence without increasing your sample size.

Margin of Error

The margin of error (ME) is the "±" portion of your estimate, equal to half the total interval width. It quantifies the maximum expected sampling error.

ME depends on three factors:

  • Confidence level: higher confidence → larger ME
  • Sample size: larger $n$ → smaller ME
  • Data variability: more spread in the data → larger ME

In practice, you'll often specify a required ME first, then solve for the sample size needed to achieve it.
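As a minimal sketch of that backwards workflow (the $\sigma = 12$ and ±2 target are hypothetical illustration values; the critical value comes from Python's `statistics.NormalDist`):

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, me, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= me (known-sigma case)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * sigma / me) ** 2)

# Hypothetical plan: sigma = 12, want a margin of error of +/- 2 at 95% confidence
n = sample_size_for_mean(sigma=12, me=2)
print(n)  # 139
```

Note the diminishing returns: tightening the target ME from ±2 to ±1 roughly quadruples the required sample size.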

Compare: Confidence level vs. margin of error. Both affect interval width, but confidence level is chosen based on risk tolerance while margin of error reflects data quality. If a problem asks you to "improve precision," think sample size. If it asks about "reliability of the procedure," think confidence level.


Choosing the Right Distribution

The critical decision in constructing any CI is selecting the appropriate sampling distribution. Your choice depends on what you know about the population and how large your sample is.

Z-Score Intervals

Use the Z-distribution when $\sigma$ (population standard deviation) is known and either the population is normal or $n > 30$ (so the CLT applies).

Z critical values are fixed for each confidence level:

  • 90% confidence: $Z = 1.645$
  • 95% confidence: $Z = 1.96$
  • 99% confidence: $Z = 2.576$

This situation is common in quality control settings where historical process data provides a reliable $\sigma$ value.
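These fixed critical values are just quantiles of the standard normal, which you can reproduce with the stdlib's `statistics.NormalDist` rather than a table:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the z with upper-tail area alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```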

T-Score Intervals

Use the t-distribution when $\sigma$ is unknown and you estimate it with the sample standard deviation $s$.

Degrees of freedom ($df = n - 1$) control the shape. Smaller samples produce heavier tails, which makes the interval wider to account for the extra uncertainty in estimating $\sigma$. As $n \to \infty$, the t-distribution converges to the Z-distribution. Using t is always valid when $\sigma$ is unknown, even for large samples.

Chi-Squared Distribution for Variance

When you need a CI for a population variance (not a mean), you use the $\chi^2$ distribution. This works because the quantity $\frac{(n-1)s^2}{\sigma^2}$ follows a chi-squared distribution when the data is normal.

The resulting interval is asymmetric:

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$$

Note the "flipped" critical values in the denominators. The normality assumption is critical here; variance intervals are much more sensitive to non-normality than mean intervals.
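A sketch of the computation (the $n = 10$, $s^2 = 4$ data are hypothetical; chi-squared quantiles are not in the Python standard library, so the two df = 9 critical values below are standard table lookups, approximately 19.023 and 2.700):

```python
def variance_ci(n, s2, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower).

    chi2_upper and chi2_lower are the chi-squared critical values with
    upper-tail areas alpha/2 and 1 - alpha/2 (taken from a table here).
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Hypothetical: n = 10 normal observations, sample variance s^2 = 4.
# Table values for df = 9 at 95%: 19.023 (upper) and 2.700 (lower).
lo, hi = variance_ci(10, 4.0, 19.023, 2.700)
print(round(lo, 2), round(hi, 2))  # 1.89 13.33
```

The asymmetry is visible in the output: $s^2 = 4$ sits much closer to the lower bound than the upper one.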

Compare: Z vs. t intervals. Both estimate population means, but t accounts for the extra uncertainty from estimating $\sigma$. On exams, if $\sigma$ is given, use Z. If you see "sample standard deviation" or just $s$, use t.


Intervals for Means

These are your workhorse formulas. The structure is always the same: point estimate ± (critical value) × (standard error).

CI for Mean (Known $\sigma$)

$$\bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

The term $\frac{\sigma}{\sqrt{n}}$ is the standard error of the mean. This formula requires the sampling distribution of $\bar{x}$ to be approximately normal, which holds when the population is normal or $n > 30$.

This scenario is rare in practice since $\sigma$ is seldom truly known, but it's common on exams to test your understanding of the Z framework.
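The formula translates directly to code. A sketch with hypothetical numbers ($\bar{x} = 50$, known $\sigma = 12$, $n = 36$):

```python
import math
from statistics import NormalDist

def mean_ci_z(xbar, sigma, n, confidence=0.95):
    """CI for a mean with known sigma: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z * sigma / math.sqrt(n)  # margin of error
    return xbar - me, xbar + me

# Hypothetical: xbar = 50 from n = 36 with known process sigma = 12.
lo, hi = mean_ci_z(50, 12, 36)
print(round(lo, 2), round(hi, 2))  # 46.08 53.92
```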

CI for Mean (Unknown $\sigma$)

$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

Here $s$ replaces $\sigma$, and the t critical value depends on $df = n - 1$. Always check your t-table or calculator with the correct degrees of freedom.

This is the default real-world scenario. When in doubt on an exam and $\sigma$ isn't explicitly given, use t.
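A sketch of the t version (the $\bar{x} = 400$ MPa, $s = 12$, $n = 25$ data are hypothetical; t quantiles are not in the Python standard library, so the critical value $t_{0.025,\,24} \approx 2.064$ is taken from a table):

```python
import math

def mean_ci_t(xbar, s, n, t_crit):
    """CI for a mean with unknown sigma; t_crit = t_{alpha/2, n-1}
    supplied from a t-table (not available in the stdlib)."""
    me = t_crit * s / math.sqrt(n)
    return xbar - me, xbar + me

# Hypothetical: n = 25 specimens, xbar = 400 MPa, s = 12 MPa.
# Table value t_{0.025, 24} ~ 2.064 (vs. z = 1.96, so the interval is wider).
lo, hi = mean_ci_t(400, 12, 25, 2.064)
print(round(lo, 2), round(hi, 2))  # 395.05 404.95
```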

CI for Difference Between Two Means

For independent samples with known variances:

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

For unknown variances (the more common case), use a t-interval (pooled or Welch's):

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

If the resulting interval contains zero, you cannot conclude the means differ at that confidence level. This connects directly to two-sample hypothesis tests.
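A large-sample sketch of the difference interval, including the zero check (the summary statistics are hypothetical; with both samples above 30 a z critical value is used in place of t, which would be needed for small samples):

```python
import math
from statistics import NormalDist

def diff_means_ci(x1, x2, s1, s2, n1, n2, confidence=0.95):
    """Large-sample CI for mu1 - mu2; for small samples a t critical
    value with Welch degrees of freedom would replace z."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # combined standard error
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Hypothetical summary stats for two processes:
lo, hi = diff_means_ci(x1=10.0, x2=8.0, s1=2.0, s2=3.0, n1=40, n2=50)
print(round(lo, 2), round(hi, 2))  # 0.96 3.04
# Zero lies outside the interval, so the means differ at the 95% level.
```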

Compare: Single mean vs. difference of means. Both use the same logic (point estimate ± ME), but the standard error for differences combines variability from both samples. Exam problems frequently ask whether zero falls in the difference interval.


Intervals for Proportions

Proportion intervals apply when your data is categorical (success/failure, defective/non-defective). The normal approximation requires checking sample size conditions before you use these formulas.

CI for Single Proportion

$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $\hat{p} = x/n$ is the sample proportion.

Check conditions before using this formula: $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$. These ensure the normal approximation to the binomial is reasonable. This interval is used extensively to estimate defect rates, failure probabilities, and compliance percentages.
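A sketch with the condition check built in (the 40-defects-in-400 count is a hypothetical example):

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Normal-approximation CI for a single proportion p-hat = x / n."""
    p_hat = x / n
    # Conditions from the text for the normal approximation:
    assert n * p_hat >= 10 and n * (1 - p_hat) >= 10
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

# Hypothetical: 40 defective parts out of 400 inspected.
lo, hi = proportion_ci(40, 400)
print(round(lo, 3), round(hi, 3))  # 0.071 0.129
```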

CI for Difference Between Two Proportions

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Both samples must independently satisfy the normal approximation conditions ($n_i\hat{p}_i \geq 10$ and $n_i(1-\hat{p}_i) \geq 10$ for each group). If zero is in the interval, you cannot conclude the proportions differ.
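The two-sample version follows the same pattern; only the standard error changes. A sketch with hypothetical counts (30/300 vs. 15/300, both groups satisfying the conditions above):

```python
import math
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation CI for p1 - p2 (independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

# Hypothetical: 30/300 defects on line A vs. 15/300 on line B.
lo, hi = diff_proportions_ci(30, 300, 15, 300)
print(round(lo, 3), round(hi, 3))  # 0.008 0.092
# Zero is (barely) outside the interval, so the rates differ at 95%.
```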

Compare: Mean intervals vs. proportion intervals. Means use $\sigma/\sqrt{n}$ or $s/\sqrt{n}$ for standard error, while proportions use $\sqrt{\hat{p}(1-\hat{p})/n}$. The proportion formula has no separate "known vs. unknown" cases because the variance depends entirely on $p$ itself.


Sample Size and Interval Design

Often you work backwards: given a desired precision, how much data do you need? Sample size determination is about controlling the margin of error before collecting data.

Sample Size Effects

Larger $n$ shrinks the interval because standard error has $\sqrt{n}$ in the denominator. But the relationship is a square root, so the returns diminish quickly:

  • To cut ME in half, you need $4\times$ the sample size.
  • To cut ME to one-third, you need $9\times$ the sample size.

This means there are real resource constraints. You must balance statistical precision against time, cost, and feasibility.
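The square-root relationship is easy to verify numerically (the $\sigma = 12$ and sample sizes below are hypothetical illustration values):

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """ME for a known-sigma mean interval: z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

# Quadrupling n only halves the margin of error (sqrt(n) in the denominator):
me_n = margin_of_error(sigma=12, n=25)
me_4n = margin_of_error(sigma=12, n=100)
print(round(me_n, 3), round(me_4n, 3))  # 4.704 2.352
```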

One-Sided vs. Two-Sided Intervals

Two-sided intervals (the most common type) bound the parameter above and below: "we're 95% confident $\mu$ is between A and B."

One-sided intervals provide only an upper or lower bound: "we're 95% confident $\mu$ is at least A" or "we're 95% confident $\mu$ is at most B." Use one-sided intervals when the research question is directional; for example, "does the new process reduce defects?" only cares about the lower bound on improvement.

Compare: One-sided vs. two-sided. A one-sided 95% CI uses $Z_{0.05} = 1.645$ while a two-sided 95% CI uses $Z_{0.025} = 1.96$. One-sided intervals are narrower but only answer directional questions.
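The difference in critical values shows up directly in the bounds. A sketch with hypothetical data ($\bar{x} = 100$, known $\sigma = 10$, $n = 25$, so the standard error is 2):

```python
import math
from statistics import NormalDist

# Hypothetical: xbar = 100, known sigma = 10, n = 25 (standard error = 2).
xbar, se = 100, 10 / math.sqrt(25)

z_one = NormalDist().inv_cdf(0.95)   # 1.645: all 5% risk in one tail
z_two = NormalDist().inv_cdf(0.975)  # 1.960: 2.5% risk in each tail

lower_one_sided = xbar - z_one * se  # "mu is at least ..." bound
lower_two_sided = xbar - z_two * se  # lower end of the two-sided CI
print(round(lower_one_sided, 2), round(lower_two_sided, 2))  # 96.71 96.08
```

The one-sided bound is higher (tighter) because it spends the entire error budget on a single tail.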


Assumptions and Connections

Valid inference requires meeting assumptions, and CIs connect directly to hypothesis testing. Understanding these relationships separates strong exam performance from formula memorization.

Assumptions for Valid CIs

Random sampling and independence are non-negotiable. Biased samples produce meaningless intervals regardless of formula correctness.

  • For means: population normality or $n > 30$ (CLT kicks in)
  • For proportions: $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$
  • For variance: the chi-squared method requires strict normality; variance intervals are the most sensitive to this assumption

CI and Hypothesis Test Duality

A two-sided $(1-\alpha)$ CI corresponds to a two-tailed test at significance level $\alpha$. If the null value falls outside the CI, you reject $H_0$. If it falls inside, you fail to reject.

This gives you a visual decision rule: construct the CI, then check whether the hypothesized value is inside or outside. CIs also give more information than tests alone because they show the full range of plausible values, not just a binary "reject" or "fail to reject."
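The decision rule is a one-liner (the interval and null values below are made-up illustration numbers):

```python
def reject_h0(ci, null_value):
    """Two-sided test via the CI: reject H0 iff the null value is outside."""
    lo, hi = ci
    return not (lo <= null_value <= hi)

# Hypothetical 95% CI for mu: (12.1, 15.7)
print(reject_h0((12.1, 15.7), 10.0))  # True  -> 10 outside, reject H0
print(reject_h0((12.1, 15.7), 14.0))  # False -> 14 inside, fail to reject
```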

Interpreting and Reporting CIs

Never say "there's a 95% probability the parameter is in this interval." The parameter is fixed; the interval is what's random across repeated samples.

Correct interpretation: "We are 95% confident that [parameter] lies between [lower bound] and [upper bound]."

When reporting, state the confidence level, sample size, assumptions you checked, and what the result means in context.

Compare: CIs vs. hypothesis tests. Both make inferences about parameters, but CIs provide a range of plausible values while tests give binary decisions. If a problem asks you to "test whether $\mu = 50$" and you've already built a 95% CI, just check if 50 is inside.


Quick Reference Table

  • Known $\sigma$ (use Z): CI for mean with historical process data, large-sample proportion intervals
  • Unknown $\sigma$ (use t): CI for mean with sample std dev, small-sample studies
  • Proportion intervals: defect rate estimation, reliability percentages, A/B testing
  • Difference intervals: comparing two processes, treatment vs. control, before/after studies
  • Variance/spread: CI for population variance using chi-squared distribution
  • Sample size planning: margin of error requirements, precision vs. cost trade-offs
  • CI-hypothesis test link: using a CI to make reject/fail-to-reject decisions
  • One-sided vs. two-sided: directional research questions vs. general estimation

Self-Check Questions

  1. You're estimating the mean tensile strength of a new alloy. You have $n = 25$ specimens and calculate $s = 12$ MPa. Should you use a Z or t interval, and why does this choice affect your interval width?

  2. Compare the CI for a single proportion to the CI for the difference between two proportions. What additional complexity arises in the two-sample case, and what conditions must both samples satisfy?

  3. An engineer wants to estimate a defect rate with a margin of error of ±2% at 95% confidence. If a pilot study suggests $\hat{p} \approx 0.10$, approximately what sample size is needed? How would this change if $\hat{p} \approx 0.50$?

  4. You construct a 95% CI for $\mu_1 - \mu_2$ and obtain $(-3.2, 1.8)$. What can you conclude about a hypothesis test of $H_0: \mu_1 = \mu_2$ at $\alpha = 0.05$? What if the interval were $(-3.2, -0.4)$?

  5. Explain why a 99% confidence interval is wider than a 95% confidence interval for the same data. If you needed to maintain the same width while increasing confidence from 95% to 99%, what would you have to change?