Confidence Intervals and the Student's t-Distribution
When the population standard deviation is unknown, you can't use a z-score to build a confidence interval. Instead, you use the Student's t-distribution, which accounts for the extra uncertainty that comes from estimating the standard deviation from your sample. The t-distribution is wider and has heavier tails than the normal distribution, giving you appropriately wider intervals when you have less information.

Confidence Intervals Using the t-Distribution
The t-distribution is used to construct confidence intervals when the population standard deviation (σ) is unknown and must be estimated from the sample using the sample standard deviation s. While textbooks often cite n < 30 as the threshold for using t, in practice you should use the t-distribution any time σ is unknown, regardless of sample size. For large n, the t- and z-intervals will be nearly identical anyway.
Steps to construct a confidence interval:
- Calculate the sample mean (x̄) and sample standard deviation (s) from your data.
- Determine the degrees of freedom: df = n − 1.
- Look up the critical t-value (t*) for your desired confidence level (e.g., 95%) and your df, using a t-table or calculator.
- Compute the margin of error: ME = t* · s/√n.
- Build the interval: x̄ ± t* · s/√n.
The result is an interval that you are, say, 95% confident contains the true population mean μ.
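The steps above can be sketched in a few lines of Python using SciPy for the critical value. The sample data here is made up purely for illustration; only the procedure comes from the text.

```python
import math
from scipy import stats

# Hypothetical sample (illustrative values, not from the text)
data = [72, 85, 78, 90, 66, 81, 77, 88, 74, 83]

n = len(data)
xbar = sum(data) / n                                          # sample mean
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))   # sample std dev

df = n - 1                            # degrees of freedom
t_star = stats.t.ppf(0.975, df)       # critical t* for a 95% interval (two-sided)

margin = t_star * s / math.sqrt(n)    # margin of error: t* · s/√n
ci = (xbar - margin, xbar + margin)   # interval: x̄ ± margin
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that `0.975` (not `0.95`) goes into `ppf`: a 95% interval leaves 2.5% in each tail.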
What affects the width of the interval?
- Sample size (n): Larger samples shrink the margin of error because √n is in the denominator. More data means a more precise estimate.
- Confidence level: A higher confidence level (e.g., 99% vs. 90%) requires a larger t*, which widens the interval. You're trading precision for greater confidence.
- Sample variability (s): More spread in your data increases the margin of error.
Example: Suppose you sample n = 20 students and compute the sample mean x̄ and sample standard deviation s. For a 95% confidence interval, df = n − 1 = 19 and t* ≈ 2.093. The margin of error is 2.093 · s/√20, and your interval is x̄ ± 2.093 · s/√20.
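Plugging in hypothetical numbers (the mean and standard deviation below are assumed for illustration; only n = 20 comes from the example), the computation looks like this. SciPy's `t.interval` builds the same interval directly, which makes a handy cross-check:

```python
import math
from scipy import stats

# n is from the example; xbar and s are assumed placeholder values
n, xbar, s = 20, 75.0, 8.0

df = n - 1                                  # = 19, as in the example
t_star = stats.t.ppf(0.975, df)             # ≈ 2.093
margin = t_star * s / math.sqrt(n)

# scipy builds the same interval in one call
lo, hi = stats.t.interval(0.95, df, loc=xbar, scale=s / math.sqrt(n))
print(f"t* = {t_star:.3f}, interval = ({lo:.2f}, {hi:.2f})")
```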

T-Scores vs. Z-Scores
Both t-scores and z-scores measure how far a sample mean is from a hypothesized population mean, in units of standard error. The key difference is which standard deviation you're using.
- Z-score: Used when σ is known. The formula is z = (x̄ − μ) / (σ/√n).
- T-score: Used when σ is unknown and you substitute the sample standard deviation s. The formula is: t = (x̄ − μ) / (s/√n)
Where:
- x̄ = sample mean
- μ = hypothesized population mean
- s = sample standard deviation
- n = sample size
Because s is itself a random variable (it changes from sample to sample), the t-score has more variability than a z-score. That extra variability is exactly what the heavier tails of the t-distribution capture.
As n grows large, s becomes a very good estimate of σ, and the t-distribution converges to the standard normal (z) distribution.
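You can see this convergence directly by comparing critical values as the degrees of freedom grow, a quick sketch using SciPy:

```python
from scipy import stats

# 97.5th percentile of the standard normal (the z* behind a 95% interval)
z_star = stats.norm.ppf(0.975)

# The matching t* shrinks toward z* as degrees of freedom increase
for df in [5, 10, 30, 100, 1000]:
    t_star = stats.t.ppf(0.975, df)
    print(f"df={df:>5}: t* = {t_star:.4f}   (z* = {z_star:.4f})")
```

The gap is large at df = 5 and nearly gone by df = 1000, which is the convergence the text describes.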

Properties of the Student's t-Distribution
The t-distribution shares some features with the standard normal distribution but differs in important ways, especially at small sample sizes.
- Symmetric and bell-shaped, centered at zero, just like the standard normal.
- Heavier tails than the standard normal. This means extreme values are more likely under the t-distribution, which reflects the added uncertainty from estimating σ with s.
- Defined by degrees of freedom (df = n − 1). The df is the single parameter that controls the shape. Lower df means heavier tails; higher df means the distribution looks more and more like the standard normal.
- Standard deviation is greater than 1 for small df, and approaches 1 as df increases.
- For df ≥ 30 or so, the t-distribution is nearly indistinguishable from the standard normal. This is why older rules of thumb suggest switching to z at n ≥ 30, but there's no harm in always using t when σ is unknown.
Why heavier tails matter: Heavier tails produce larger critical values (t*), which widen your confidence interval. This is the t-distribution's way of saying, "You have less information, so your interval should be wider to compensate."
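The heavier tails are easy to quantify: the probability of landing more than 2 standard errors from the center is noticeably larger under a low-df t-distribution than under the normal. A small sketch using SciPy's survival function:

```python
from scipy import stats

# Two-sided probability of exceeding 2 standard errors under t vs. normal
for df in [3, 10, 30]:
    p_t = 2 * stats.t.sf(2.0, df)          # P(|T| > 2) for given df
    print(f"df={df:>2}: P(|T| > 2) = {p_t:.4f}")

p_z = 2 * stats.norm.sf(2.0)               # P(|Z| > 2) under the normal
print(f"normal: P(|Z| > 2) = {p_z:.4f}")
```

The df = 3 tail probability is roughly triple the normal's, which is exactly why its critical values, and hence its intervals, are wider.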
Type I and Type II Errors
These concepts connect confidence intervals to the broader framework of hypothesis testing.
- Type I error (false positive): Rejecting the null hypothesis when it is actually true. The probability of this is α, your significance level. If you construct a 95% confidence interval, α = 0.05, meaning there's a 5% chance the interval fails to contain the true mean.
- Type II error (false negative): Failing to reject the null hypothesis when it is actually false. The probability of this is β.
There's a tradeoff: making α smaller (to reduce false positives) increases β (more false negatives), assuming sample size stays the same.
Statistical Power and Effect Size
Statistical power is the probability of correctly rejecting a false null hypothesis. It equals 1 − β. Higher power means you're more likely to detect a real effect when one exists.
Three main factors influence power:
- Sample size: Larger n reduces the standard error (s/√n), making it easier to detect differences. This is the factor researchers have the most control over.
- Effect size: The magnitude of the true difference between the hypothesized value and the actual population parameter. Larger effects are easier to detect. For example, detecting a 10-point difference in mean test scores is much easier than detecting a 1-point difference.
- Significance level (α): A more stringent threshold (e.g., α = 0.01 instead of α = 0.05) requires stronger evidence to reject the null, which decreases power.
Effect size also helps you assess practical significance. A result can be statistically significant (small p-value) but have a tiny effect size, meaning the difference may not matter in the real world. Conversely, a meaningful effect might not reach statistical significance if the sample is too small. Both statistical and practical significance should be considered when interpreting results.
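All three factors can be seen in one calculation. The helper below is a normal-approximation power sketch for a two-sided one-sample test (the function name, the formula's use of z rather than t, and the example numbers are all illustrative assumptions, not from the text):

```python
import math
from scipy import stats

def approx_power(effect, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample test.

    effect: true difference between the actual mean and the null value
    sigma:  population standard deviation (assumed known for this sketch)
    """
    z_crit = stats.norm.ppf(1 - alpha / 2)          # critical value for alpha
    shift = effect / (sigma / math.sqrt(n))         # effect in standard-error units
    # Probability the test statistic falls beyond either critical value
    return stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)

# Larger n raises power; a stricter alpha lowers it; bigger effects raise it
print(approx_power(effect=5, sigma=15, n=30))               # baseline
print(approx_power(effect=5, sigma=15, n=100))              # more data
print(approx_power(effect=5, sigma=15, n=30, alpha=0.01))   # stricter alpha
print(approx_power(effect=10, sigma=15, n=30))              # bigger effect
```

Each printed value is 1 − β under those settings, so the comparisons mirror the three bullet points above.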