🎲 Intro to Probability Unit 14 Review

14.3 Applications of the central limit theorem

Written by the Fiveable Content Team • Last updated August 2025

Approximating Probabilities with CLT

The Central Limit Theorem (CLT) says that when you take the mean of a large enough sample, that mean will follow an approximately normal distribution, regardless of what the original population looks like. This is powerful because it lets you use normal distribution tools (z-tables, standard normal calculations) on data that isn't normal at all.

Fundamentals of CLT

As sample size increases, the sampling distribution of the sample mean approaches a normal distribution with:

  • Mean equal to the population mean μ
  • Standard error equal to σ/√n

The conventional guideline is that n ≥ 30 is "large enough" for the CLT to kick in, though populations that are already close to normal need smaller samples, and heavily skewed populations may need larger ones.

To find probabilities involving sample means, you convert to a z-score:

z = (x̄ − μ) / (σ/√n)

where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.

This works even when the underlying population is uniform, exponential, or some other non-normal shape.
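The standardization above is straightforward to sketch in code. In this minimal Python example, the uniform population and the sample values are hypothetical, and the standard normal CDF is computed from `math.erf` rather than looked up in a table:

```python
from math import sqrt, erf

def z_for_sample_mean(xbar, mu, sigma, n):
    """Standardize a sample mean via the CLT: z = (x̄ - μ) / (σ/√n)."""
    return (xbar - mu) / (sigma / sqrt(n))

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical Uniform(0, 1) population: μ = 0.5, σ = √(1/12) ≈ 0.289
z = z_for_sample_mean(0.52, 0.5, sqrt(1 / 12), 100)
print(round(z, 3))           # standardized distance of x̄ = 0.52 from μ
print(round(1 - phi(z), 4))  # P(x̄ > 0.52) for n = 100
```

Even though single draws from a uniform population look nothing like a normal curve, the mean of 100 of them is close enough to normal for this calculation to be trustworthy.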

Probability Calculations

Once you have a z-score, you look up the corresponding probability using a standard normal table (or calculator).

Typical problem types:

  • What's the probability that the sample mean falls within a specific range? Convert both endpoints to z-scores and find the area between them.
  • What's the probability that the sample mean exceeds a certain value? Convert to a z-score and find the area in the upper tail.

Example: Suppose a population has μ = 50 and σ = 10. You draw a sample of n = 64. What's the probability that x̄ > 52?

  1. Compute the standard error: 10/√64 = 1.25

  2. Compute the z-score: (52 − 50)/1.25 = 1.6

  3. Look up P(Z > 1.6) ≈ 0.0548

So there's about a 5.5% chance the sample mean exceeds 52.
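The three steps can be checked with a few lines of Python, using `math.erf` for the standard normal CDF in place of a printed table:

```python
from math import sqrt, erf

mu, sigma, n = 50, 10, 64
se = sigma / sqrt(n)                         # step 1: 10/8 = 1.25
z = (52 - mu) / se                           # step 2: 1.6
p_upper = 1 - 0.5 * (1 + erf(z / sqrt(2)))   # step 3: P(Z > 1.6)
print(se, z, round(p_upper, 4))              # 1.25 1.6 0.0548
```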

When the population standard deviation σ is unknown, you substitute the sample standard deviation s and use the t-distribution instead of the z-distribution. For large n, the t-distribution is very close to the standard normal, so the results are similar.

Confidence Intervals with CLT

A confidence interval gives you a range of plausible values for the population mean based on your sample data.


Constructing Confidence Intervals

The general formula is:

x̄ ± (critical value) × σ/√n

Here's how to build one step by step:

  1. Compute the sample mean x̄.
  2. Determine the standard error σ/√n (or s/√n if σ is unknown).
  3. Choose your confidence level (common choices: 90%, 95%, 99%).
  4. Find the critical value. For large samples using the z-distribution, the critical values are approximately 1.645 (90%), 1.96 (95%), and 2.576 (99%).
  5. Calculate the margin of error: critical value × standard error.
  6. Form the interval: x̄ − margin of error to x̄ + margin of error.
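The six steps translate directly to code. A sketch, assuming σ is known (so the z critical value applies) and using hypothetical sample values:

```python
from math import sqrt

# Hypothetical inputs: x̄ = 50 from a sample of 64, known σ = 10
xbar, sigma, n = 50.0, 10.0, 64
z_star = 1.96                                # step 4: critical value for 95%

se = sigma / sqrt(n)                         # step 2: standard error = 1.25
margin = z_star * se                         # step 5: margin of error = 2.45
lower, upper = xbar - margin, xbar + margin  # step 6: form the interval
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (47.55, 52.45)
```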

Three factors control how wide the interval is:

  • Sample size: Larger n shrinks the standard error, giving a narrower interval.
  • Population variability: Higher σ means a wider interval.
  • Confidence level: Higher confidence (say 99% vs. 95%) requires a wider interval.

Interpretation and Application

A 95% confidence interval does not mean there's a 95% chance the population mean is inside your particular interval. It means that if you repeated the sampling process many times, about 95% of the resulting intervals would contain the true mean.
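The repeated-sampling interpretation can be demonstrated by simulation. A sketch, assuming a normal population with known μ and σ (the specific numbers are hypothetical):

```python
import random
import statistics
from math import sqrt

random.seed(1)
mu, sigma, n = 50, 10, 64
margin = 1.96 * sigma / sqrt(n)  # margin is fixed because σ is known

# Build 2000 intervals from 2000 independent samples and count how
# many of them contain the true mean μ.
trials = 2000
covered = sum(
    1
    for _ in range(trials)
    if abs(statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) - mu) <= margin
)
print(covered / trials)  # close to 0.95, but not exactly 0.95
```

Any single interval either contains μ or it doesn't; the 95% describes the long-run hit rate of the procedure.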

The CLT is what makes these intervals approximately valid for large samples, even when the population isn't normal. Without the CLT, you'd need to know (or assume) the population's exact distribution to build a confidence interval.

Hypothesis Testing with CLT

Hypothesis testing uses sample data to evaluate a claim about a population parameter. The CLT makes this possible for means even when the population distribution is unknown.


Fundamentals of Hypothesis Testing

The setup involves two competing statements:

  • Null hypothesis (H₀): The default claim, typically "no effect" or "no difference." For example, H₀: μ = μ₀.
  • Alternative hypothesis (H₁): What you're trying to find evidence for. This can be one-sided (μ > μ₀ or μ < μ₀) or two-sided (μ ≠ μ₀).

The test statistic measures how far your sample result is from what H₀ predicts:

z = (x̄ − μ₀) / (σ/√n)

where μ₀ is the hypothesized population mean. Thanks to the CLT, this statistic is approximately standard normal for large n, so you can use z-tables to assess the result.

Testing Approaches and Considerations

There are two common ways to make a decision:

P-value approach:

  1. Calculate the test statistic z.
  2. Find the p-value: the probability of observing a result at least as extreme as yours, assuming H₀ is true.
  3. If the p-value is ≤ α (your significance level, often 0.05), reject H₀.

Critical value approach:

  1. Calculate the test statistic z.
  2. Determine the critical value(s) from the significance level and test type (one-tailed or two-tailed).
  3. If z falls in the rejection region, reject H₀.

Both approaches give the same conclusion. The p-value approach is more common because it tells you how much evidence you have against H₀, not just whether you crossed a threshold.
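The two decision rules can be sketched side by side. The numbers below (a two-sided test of H₀: μ = 50 with x̄ = 52.5, σ = 10, n = 64) are hypothetical:

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, sigma, n, xbar, alpha = 50, 10, 64, 52.5, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))  # test statistic: 2.5 / 1.25 = 2.0
p_value = 2 * (1 - phi(abs(z)))       # two-sided p-value
z_crit = 1.96                         # two-sided critical value at α = 0.05

# The two approaches always agree on reject / fail-to-reject
print(round(p_value, 4), p_value <= alpha, abs(z) > z_crit)  # 0.0455 True True
```

Here the p-value (about 0.0455) also conveys that the evidence against H₀ is only slightly stronger than the 0.05 threshold requires.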

Two types of errors to watch for:

  • Type I error: Rejecting H₀ when it's actually true. The probability of this equals α.
  • Type II error: Failing to reject H₀ when it's actually false. This is harder to control, since it depends on the true parameter value and the sample size.

Limitations of CLT

Assumptions and Sample Size Considerations

The CLT requires that the random variables be independent and identically distributed (i.i.d.). This assumption breaks down with time series data (where observations are correlated) or clustered data (where observations within groups are more similar).

Sample size matters more than the n ≥ 30 rule suggests:

  • For symmetric or mildly skewed populations, n = 30 usually works fine.
  • For heavily skewed distributions (exponential, Pareto), you may need n in the hundreds before the sampling distribution looks normal.
  • For distributions with very heavy tails, like the Cauchy distribution, the CLT doesn't apply at all because the population mean and variance don't exist.

One common misconception: the CLT does not say that your data becomes normal as you collect more of it. It says the distribution of the sample mean (across many hypothetical samples) becomes normal.
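This distinction is easy to see by simulation: individual exponential draws stay skewed no matter how many you collect, but means of samples of size 100 cluster around μ = 1 with spread σ/√n = 0.1, just as the CLT predicts. A sketch:

```python
import random
import statistics

random.seed(0)

# Exponential(rate 1) population: heavily skewed, with μ = 1 and σ = 1
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Sampling distribution of the mean: 5000 hypothetical samples of n = 100
means = [sample_mean(100) for _ in range(5000)]

# CLT prediction: mean ≈ 1, standard deviation ≈ 1/√100 = 0.1
print(round(statistics.fmean(means), 2))   # ≈ 1.0
print(round(statistics.stdev(means), 2))   # ≈ 0.1
```

Plotting `means` as a histogram would show a roughly symmetric bell shape, even though a histogram of the raw exponential draws is sharply skewed.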

Scope and Alternative Methods

The CLT applies to sums and means of random variables. It does not automatically extend to other statistics like medians, ranges, or variances.

For other types of data, different tools are more appropriate:

  • Proportions: The CLT can approximate binomial proportions when np and n(1 − p) are both at least 5 or 10 (depending on the convention). Outside that range, use the binomial distribution directly.
  • Counts of rare events: The Poisson distribution is typically a better model than a normal approximation.
  • Small samples from skewed populations: Consider exact methods, bootstrapping, or non-parametric tests rather than relying on the CLT.
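The success/failure check for proportions in the first bullet is a one-liner to encode. A sketch using the threshold of 10 (the function name is made up for illustration):

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: use the normal approximation for a binomial
    proportion only when np and n(1-p) are both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(100, 0.5))   # True: np = 50 and n(1-p) = 50
print(normal_approx_ok(100, 0.02))  # False: np = 2 is far too small
```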