🎲 Intro to Statistics Unit 7 Review


7.1 The Central Limit Theorem for Sample Means (Averages)


Written by the Fiveable Content Team • Last updated August 2025

The Central Limit Theorem for Sample Means (Averages)

The Central Limit Theorem (CLT) answers a critical question in statistics: how do sample averages behave when you draw repeated samples from a population? It turns out that no matter what the original population looks like, the distribution of sample means will approximate a normal distribution as the sample size grows.

This matters because it's the foundation of statistical inference. Even if your population data is skewed, bimodal, or oddly shaped, the CLT lets you use normal distribution tools to make conclusions about the population mean based on sample data.

The Central Limit Theorem for Sample Means

The CLT says that if you take many random samples of the same size from a population and calculate the mean of each sample, the distribution of those sample means will be approximately normal, regardless of the population's original shape.

Three conditions need to hold:

  • The samples must be independent (one observation doesn't influence another).
  • The sample size must be sufficiently large. The common rule of thumb is $n \geq 30$, though if the population is already roughly normal, smaller samples work fine.
  • If sampling without replacement, the sample should be less than 10% of the population.

As sample size increases, three things happen to the sampling distribution of sample means:

  • It becomes more symmetric and bell-shaped.
  • Its mean equals the population mean: $\mu_{\bar{x}} = \mu$.
  • Its spread shrinks. The standard deviation of the sampling distribution (called the standard error) decreases by a factor of $\sqrt{n}$.

This is what makes the CLT so powerful: you don't need to know the shape of the population distribution to draw conclusions about the population mean.
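To see this behavior concretely, here is a small Python simulation (the `sample_means` helper and the choice of an exponential population are my own, purely for illustration). It draws many samples from a strongly right-skewed population and checks that the sample means center on the population mean with spread near $\sigma / \sqrt{n}$:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

def sample_means(pop_sampler, n, num_samples):
    """Draw num_samples random samples of size n; return each sample's mean."""
    return [statistics.mean(pop_sampler() for _ in range(n))
            for _ in range(num_samples)]

# A right-skewed population: exponential with mean 1 and standard deviation 1.
population = lambda: random.expovariate(1.0)

means = sample_means(population, n=30, num_samples=2000)

# CLT predictions: the mean of the sample means is near the population
# mean (1.0), and their spread is near sigma / sqrt(n) = 1 / sqrt(30) ~ 0.183.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```

A histogram of `means` would look roughly bell-shaped even though the underlying exponential population is heavily skewed.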

Standard Error Calculation

The standard error of the mean ($\sigma_{\bar{x}}$) measures how much sample means vary from sample to sample. It tells you how tightly your sample means cluster around the true population mean.

The formula is:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

where $\sigma$ is the population standard deviation and $n$ is the sample size.

Notice what happens as $n$ gets larger: you're dividing by a bigger number, so the standard error gets smaller. That means larger samples give you more precise estimates of the population mean.

Example: Suppose a population has $\sigma = 20$. With a sample of $n = 25$, the standard error is $\frac{20}{\sqrt{25}} = \frac{20}{5} = 4$. If you increase the sample to $n = 100$, the standard error drops to $\frac{20}{\sqrt{100}} = \frac{20}{10} = 2$. Quadrupling the sample size cut the standard error in half.
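The arithmetic above is easy to check in Python (the `standard_error` helper is my own naming, not a library function):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

print(standard_error(20, 25))   # 4.0
print(standard_error(20, 100))  # 2.0
```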

When the population standard deviation is unknown (which is common), you can estimate it using the sample standard deviation $s$:

$\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}}$

This estimate works well for large sample sizes.

[Figure: Central Limit Theorem for sample means — from Significant Statistics, 6.2 The Sampling Distribution of the Sample Mean (σ Known)]

Z-Scores in Sampling Distributions

A z-score tells you how many standard errors a particular sample mean is from the population mean. It converts your sample mean into a standardized value you can look up on a normal distribution table.

The formula is:

$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, and $\sigma_{\bar{x}}$ is the standard error.

  • A positive z-score means the sample mean is above the population mean.
  • A negative z-score means the sample mean is below the population mean.
  • The magnitude tells you how far away it is in standard-error units.

Example: A population has $\mu = 500$ and $\sigma = 40$. You take a sample of $n = 64$ and get $\bar{x} = 510$.

  1. Calculate the standard error: $\sigma_{\bar{x}} = \frac{40}{\sqrt{64}} = \frac{40}{8} = 5$

  2. Calculate the z-score: $z = \frac{510 - 500}{5} = \frac{10}{5} = 2.0$

  3. Interpret: the sample mean of 510 is 2 standard errors above the population mean.
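The three steps above collapse into one small function (a sketch of my own, not a library routine):

```python
import math

def z_score(x_bar, mu, sigma, n):
    """How many standard errors the sample mean is from the population mean."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

print(z_score(x_bar=510, mu=500, sigma=40, n=64))  # 2.0
```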

Z-scores in sampling distributions are used to:

  • Find the probability of getting a sample mean at least as extreme as the one observed (this connects to p-values later in the course).
  • Build confidence intervals for the population mean. For a 95% confidence interval: $\bar{x} \pm 1.96 \cdot \sigma_{\bar{x}}$
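That confidence-interval formula can be sketched directly (the `ci_95` helper name is mine, and this assumes the population standard deviation is known; 1.96 is the usual 95% critical value):

```python
import math

def ci_95(x_bar, sigma, n):
    """95% confidence interval for the population mean: x_bar +/- 1.96 * SE."""
    margin = 1.96 * sigma / math.sqrt(n)
    return (x_bar - margin, x_bar + margin)

# With the earlier numbers (x_bar = 510, sigma = 40, n = 64),
# the margin is 1.96 * 5 = 9.8.
low, high = ci_95(510, 40, 64)
print(round(low, 1), round(high, 1))  # 500.2 519.8
```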

Additional Concepts in Sampling Theory

  • Law of Large Numbers: As sample size increases, the sample mean converges to the true population mean. This is related to but different from the CLT. The Law of Large Numbers is about a single sample mean getting more accurate; the CLT is about the distribution of many sample means becoming normal.
  • Random Variable: A variable whose value is determined by the outcome of a random process. The sample mean $\bar{x}$ is itself a random variable because it changes from sample to sample.
  • Sampling Bias: A systematic error in how a sample is selected that leads to a non-representative sample. The CLT assumes your samples are randomly and independently drawn. If there's sampling bias, the theorem's guarantees don't apply.
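The Law of Large Numbers is easy to watch in action with a quick simulation (my own illustration, using a fair die whose true mean is 3.5): a single running sample mean drifts toward the population mean as the sample grows.

```python
import random
import statistics

random.seed(0)  # reproducible illustration

# Law of Large Numbers: one running sample mean from a fair-die population
# (true mean 3.5) gets closer to 3.5 as the sample size grows.
rolls = [random.randint(1, 6) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    print(n, round(statistics.mean(rolls[:n]), 3))
```

Note the contrast with the CLT simulation idea: here one sample mean converges as $n$ grows, whereas the CLT describes the shape of the distribution of many sample means at a fixed $n$.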