Confidence Intervals and the Student's t-Distribution
When the population standard deviation is unknown, you can't use a z-score to build a confidence interval. Instead, you use the Student's t-distribution, which accounts for the extra uncertainty that comes from estimating the standard deviation from your sample. The t-distribution is wider and has heavier tails than the normal distribution, giving you appropriately wider intervals when you have less information.

Confidence Intervals Using the t-Distribution
The t-distribution is used to construct confidence intervals when the population standard deviation (σ) is unknown and must be estimated from the sample using the sample standard deviation s. While textbooks often cite n < 30 as the threshold for using t, in practice you should use the t-distribution any time σ is unknown, regardless of sample size. For large n, the t- and z-intervals will be nearly identical anyway.
Steps to construct a confidence interval:
- Calculate the sample mean (x̄) and sample standard deviation (s) from your data.
- Determine the degrees of freedom: df = n − 1.
- Look up the critical t-value (t*) for your desired confidence level (e.g., 95%) and your df, using a t-table or calculator.
- Compute the margin of error: ME = t* · s/√n.
- Build the interval: x̄ ± t* · s/√n.
The result is an interval that you are, say, 95% confident contains the true population mean μ.
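The steps above can be sketched in a few lines of Python using SciPy for the critical value. The sample data here is made up purely for illustration; only the procedure comes from the text.

```python
import math
from scipy import stats

# Hypothetical sample (illustrative values, not from the text)
data = [72, 85, 78, 90, 66, 81, 77, 88, 74, 83]

n = len(data)
xbar = sum(data) / n                                          # sample mean
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))   # sample std dev

df = n - 1                            # degrees of freedom
t_star = stats.t.ppf(0.975, df)       # critical t* for a 95% interval (two-sided)

margin = t_star * s / math.sqrt(n)    # margin of error: t* · s/√n
ci = (xbar - margin, xbar + margin)   # interval: x̄ ± margin
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that `0.975` (not `0.95`) goes into `ppf`: a 95% interval leaves 2.5% in each tail.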
What affects the width of the interval?
- Sample size (n): Larger samples shrink the margin of error because √n is in the denominator. More data means a more precise estimate.
- Confidence level: A higher confidence level (e.g., 99% vs. 90%) requires a larger t*, which widens the interval. You're trading precision for greater confidence.
- Sample variability (s): More spread in your data increases the margin of error.
Example: Suppose you sample n = 20 students and compute the sample mean x̄ and sample standard deviation s. For a 95% confidence interval, df = n − 1 = 19 and t* ≈ 2.093. The margin of error is 2.093 · s/√20, and your interval is x̄ ± 2.093 · s/√20.
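Plugging in hypothetical numbers (the mean and standard deviation below are assumed for illustration; only n = 20 comes from the example), the computation looks like this. SciPy's `t.interval` builds the same interval directly, which makes a handy cross-check:

```python
import math
from scipy import stats

# n is from the example; xbar and s are assumed placeholder values
n, xbar, s = 20, 75.0, 8.0

df = n - 1                                  # = 19, as in the example
t_star = stats.t.ppf(0.975, df)             # ≈ 2.093
margin = t_star * s / math.sqrt(n)

# scipy builds the same interval in one call
lo, hi = stats.t.interval(0.95, df, loc=xbar, scale=s / math.sqrt(n))
print(f"t* = {t_star:.3f}, interval = ({lo:.2f}, {hi:.2f})")
```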

T-Scores vs. Z-Scores
Both t-scores and z-scores measure how far a sample mean is from a hypothesized population mean, in units of standard error. The key difference is which standard deviation you're using.
- Z-score: Used when σ is known. The formula is z = (x̄ − μ) / (σ/√n).
- T-score: Used when σ is unknown and you substitute the sample standard deviation s. The formula is: t = (x̄ − μ) / (s/√n)
Where:
- x̄ = sample mean
- μ = hypothesized population mean
- s = sample standard deviation
- n = sample size
Because s is itself a random variable (it changes from sample to sample), the t-score has more variability than a z-score. That extra variability is exactly what the heavier tails of the t-distribution capture.
As n grows large, s becomes a very good estimate of σ, and the t-distribution converges to the standard normal (z) distribution.
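You can see this convergence directly by comparing critical values as the degrees of freedom grow, a quick sketch using SciPy:

```python
from scipy import stats

# 97.5th percentile of the standard normal (the z* behind a 95% interval)
z_star = stats.norm.ppf(0.975)

# The matching t* shrinks toward z* as degrees of freedom increase
for df in [5, 10, 30, 100, 1000]:
    t_star = stats.t.ppf(0.975, df)
    print(f"df={df:>5}: t* = {t_star:.4f}   (z* = {z_star:.4f})")
```

The gap is large at df = 5 and nearly gone by df = 1000, which is the convergence the text describes.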

Properties of the Student's t-Distribution
The t-distribution shares some features with the standard normal distribution but differs in important ways, especially at small sample sizes.
- Symmetric and bell-shaped, centered at zero, just like the standard normal.
- Heavier tails than the standard normal. This means extreme values are more likely under the t-distribution, which reflects the added uncertainty from estimating σ with s.
- Defined by degrees of freedom (df = n − 1). The df is the single parameter that controls the shape. Lower df means heavier tails; higher df means the distribution looks more and more like the standard normal.
- Standard deviation is greater than 1 for small df, and approaches 1 as df increases.
- For df ≥ 30 or so, the t-distribution is nearly indistinguishable from the standard normal. This is why older rules of thumb suggest switching to z at n ≥ 30, but there's no harm in always using t when σ is unknown.
Why heavier tails matter: Heavier tails produce larger critical values (t*), which widen your confidence interval. This is the t-distribution's way of saying, "You have less information, so your interval should be wider to compensate."
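The heavier tails are easy to quantify: the probability of landing more than 2 standard errors from the center is noticeably larger under a low-df t-distribution than under the normal. A small sketch using SciPy's survival function:

```python
from scipy import stats

# Two-sided probability of exceeding 2 standard errors under t vs. normal
for df in [3, 10, 30]:
    p_t = 2 * stats.t.sf(2.0, df)          # P(|T| > 2) for given df
    print(f"df={df:>2}: P(|T| > 2) = {p_t:.4f}")

p_z = 2 * stats.norm.sf(2.0)               # P(|Z| > 2) under the normal
print(f"normal: P(|Z| > 2) = {p_z:.4f}")
```

The df = 3 tail probability is roughly triple the normal's, which is exactly why its critical values, and hence its intervals, are wider.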
Type I and Type II Errors
These concepts connect confidence intervals to the broader framework of hypothesis testing.
- Type I error (false positive): Rejecting the null hypothesis when it is actually true. The probability of this is α, your significance level. If you construct a 95% confidence interval, α = 0.05, meaning there's a 5% chance the interval fails to contain the true mean.
- Type II error (false negative): Failing to reject the null hypothesis when it is actually false. The probability of this is β.
There's a tradeoff: making α smaller (to reduce false positives) increases β (more false negatives), assuming sample size stays the same.
Statistical Power and Effect Size
Statistical power is the probability of correctly rejecting a false null hypothesis. It equals 1 − β. Higher power means you're more likely to detect a real effect when one exists.
Three main factors influence power:
- Sample size: Larger n reduces the standard error (s/√n), making it easier to detect differences. This is the factor researchers have the most control over.
- Effect size: The magnitude of the true difference between the hypothesized value and the actual population parameter. Larger effects are easier to detect. For example, detecting a 10-point difference in mean test scores is much easier than detecting a 1-point difference.
- Significance level (α): A more stringent threshold (e.g., α = 0.01 instead of α = 0.05) requires stronger evidence to reject the null, which decreases power.
Effect size also helps you assess practical significance. A result can be statistically significant (small p-value) but have a tiny effect size, meaning the difference may not matter in the real world. Conversely, a meaningful effect might not reach statistical significance if the sample is too small. Both statistical and practical significance should be considered when interpreting results.
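All three factors can be seen in one calculation. The helper below is a normal-approximation power sketch for a two-sided one-sample test (the function name, the formula's use of z rather than t, and the example numbers are all illustrative assumptions, not from the text):

```python
import math
from scipy import stats

def approx_power(effect, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample test.

    effect: true difference between the actual mean and the null value
    sigma:  population standard deviation (assumed known for this sketch)
    """
    z_crit = stats.norm.ppf(1 - alpha / 2)          # critical value for alpha
    shift = effect / (sigma / math.sqrt(n))         # effect in standard-error units
    # Probability the test statistic falls beyond either critical value
    return stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)

# Larger n raises power; a stricter alpha lowers it; bigger effects raise it
print(approx_power(effect=5, sigma=15, n=30))               # baseline
print(approx_power(effect=5, sigma=15, n=100))              # more data
print(approx_power(effect=5, sigma=15, n=30, alpha=0.01))   # stricter alpha
print(approx_power(effect=10, sigma=15, n=30))              # bigger effect
```

Each printed value is 1 − β under those settings, so the comparisons mirror the three bullet points above.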