The Central Limit Theorem (CLT) explains why we can use normal distribution techniques on data that isn't normally distributed. It states that the sampling distribution of sample means will be approximately normal, regardless of the shape of the population distribution (provided the population has a finite variance), as long as the sample size is large enough. A common rule of thumb is $n \geq 30$. The pocket change example makes this concrete: the amount of change people carry is almost certainly skewed, but if you repeatedly sample groups of people and plot the means of those samples, you get a bell curve anyway.

Simulating the Distribution of Sample Means

Pocket change data works well here because the population distribution is clearly non-normal. Most people carry small amounts of change (lots of values near zero), with a few people carrying much more, creating a right-skewed distribution.

To simulate the CLT with pocket change data:

1. Define your population (e.g., the amount of change carried by all students at your school).
2. Draw a large number of random samples (100 or more), each of a fixed size (say $n = 30$).
3. Calculate the mean pocket change for each sample.
4. Plot all those sample means in a histogram.

Even though the original pocket change distribution is skewed right, the histogram of sample means will look approximately normal and centered on the true population mean. That's the CLT at work.
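The steps above can be sketched in a few lines of Python. This is a minimal illustration, not real survey data: the population is invented, and the exponential shape, the \$0.88 average, and the names `population` and `simulate_sample_means` are assumptions chosen only to produce a right-skewed distribution.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Step 1: a hypothetical population of 10,000 students (dollars of change).
# The exponential shape is an assumption, picked because it is right-skewed.
population = [round(random.expovariate(1 / 0.88), 2) for _ in range(10_000)]

def simulate_sample_means(pop, n, num_samples):
    """Steps 2 and 3: draw num_samples random samples of size n, return their means."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(num_samples)]

sample_means = simulate_sample_means(population, n=30, num_samples=1_000)

print("population mean:     ", round(statistics.fmean(population), 3))
print("population median:   ", round(statistics.median(population), 3))
print("mean of sample means:", round(statistics.fmean(sample_means), 3))

# Step 4: a crude text histogram of the sample means in 10-cent bins.
bins = {}
for m in sample_means:
    key = int(m * 10) / 10  # lower edge of the 10-cent bin
    bins[key] = bins.get(key, 0) + 1
for key in sorted(bins):
    print(f"{key:4.1f} | {'#' * (bins[key] // 5)}")
```

In the population, the mean sits well above the median, which is a sign of right skew. The histogram of sample means should look roughly symmetric and pile up near the population mean.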

Note: don't confuse the CLT with the law of large numbers. That's a related but different concept. The law of large numbers says a single sample mean gets closer to the population mean as that sample gets larger. The CLT is specifically about the shape of the sampling distribution becoming approximately normal across many samples.
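To see the law of large numbers on its own, you can watch one sample grow instead of drawing many samples. The sketch below assumes a hypothetical exponential population with mean \$0.88; `TRUE_MEAN` and `running_error` are illustrative names.

```python
import random
import statistics

random.seed(7)
TRUE_MEAN = 0.88  # hypothetical population mean in dollars

# One growing sample from a right-skewed population (exponential shape is an assumption).
draws = [random.expovariate(1 / TRUE_MEAN) for _ in range(50_000)]

# Distance between the sample mean and the population mean at several sample sizes.
running_error = {}
for n in (10, 100, 1_000, 50_000):
    running_error[n] = abs(statistics.fmean(draws[:n]) - TRUE_MEAN)
    print(f"n = {n:>6}: sample mean is off by {running_error[n]:.4f}")
```

Notice that nothing here says anything about a bell curve. The law of large numbers is about one sample mean settling down, while the CLT describes the shape you get from many sample means.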

Sample Size Effects on the Sampling Distribution

Shape: As sample size increases, the sampling distribution of the mean becomes more normal. With small samples (say $n = 5$), the distribution of sample means may still reflect the skewness of the population. By $n = 30$ or so, the distribution is usually close to normal. The more skewed the population, the larger $n$ needs to be.
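One way to check the shape claim numerically is to measure skewness, which is 0 for a perfectly symmetric distribution. The sketch below is an illustration under assumptions: the population is a made-up exponential one, and `skewness` and `sample_means` are helper names introduced here.

```python
import random
import statistics

random.seed(0)

# Hypothetical right-skewed population (dollars); the exponential shape is an assumption.
population = [random.expovariate(1 / 0.88) for _ in range(10_000)]

def skewness(values):
    """Average cubed z-score: near 0 when symmetric, positive when right-skewed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(pop, n, num_samples=2_000):
    """Means of num_samples random samples, each of size n."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(num_samples)]

skew_population = skewness(population)
skew_n5 = skewness(sample_means(population, 5))
skew_n30 = skewness(sample_means(population, 30))

print(f"population skewness:       {skew_population:.2f}")
print(f"sample means, n = 5:       {skew_n5:.2f}")
print(f"sample means, n = 30:      {skew_n30:.2f}")
```

The skewness should drop as you move from the raw population to means of 5 and then to means of 30, though it won't hit exactly 0 for a population this lopsided.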

Center: The mean of the sampling distribution equals the population mean $\mu$, regardless of sample size. Whether you're sampling 5 people or 50, individual sample means will vary, but across many samples they average out to $\mu$.

Spread: The standard deviation of the sampling distribution is called the standard error, calculated as:

$SE = \frac{\sigma}{\sqrt{n}}$

As $n$ increases, the standard error decreases, so the sampling distribution gets narrower. This means larger samples produce sample means that cluster more tightly around the true population mean. In practical terms, a sample of 100 people's pocket change will give you a much more reliable estimate of the population mean than a sample of 10.
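The center and spread claims can be checked by simulation. This sketch compares the observed standard deviation of the sample means with $\frac{\sigma}{\sqrt{n}}$ for three sample sizes. The population is hypothetical (an exponential shape is assumed), and samples are drawn with replacement so the formula applies without a finite-population adjustment.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical right-skewed population (dollars); the exponential shape is an assumption.
population = [random.expovariate(1 / 0.88) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

results = {}
for n in (10, 30, 100):
    # 2,000 samples of size n, drawn with replacement
    means = [statistics.fmean(random.choices(population, k=n)) for _ in range(2_000)]
    results[n] = {
        "center": statistics.fmean(means),          # should sit near mu
        "empirical_se": statistics.stdev(means),    # observed spread of sample means
        "theoretical_se": sigma / math.sqrt(n),     # SE = sigma / sqrt(n)
    }
    r = results[n]
    print(f"n = {n:>3}: center = {r['center']:.3f} (mu = {mu:.3f}), "
          f"observed SE = {r['empirical_se']:.3f}, sigma/sqrt(n) = {r['theoretical_se']:.3f}")
```

The center stays put while the spread shrinks. Because of the square root, going from $n = 10$ to $n = 100$ cuts the standard error by a factor of about 3.16, not 10.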

Probability Calculations with the Central Limit Theorem

Because the CLT tells you the sampling distribution is approximately normal, you can use z-scores to find probabilities about sample means. Here's the process:

1. Identify the population parameters: the population mean $\mu$ and population standard deviation $\sigma$ of the pocket change data.
2. Calculate the standard error: $SE = \frac{\sigma}{\sqrt{n}}$, where $n$ is your sample size.
3. Compute the z-score for your sample mean:

$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$

where $\bar{x}$ is the sample mean you're interested in.

4. Use the z-table (or calculator) to find the probability associated with that z-score.

For example, if the population mean pocket change is $\mu = \$0.88$ with $\sigma = \$0.60$, and you take a sample of $n = 36$, the standard error is $\frac{0.60}{\sqrt{36}} = \$0.10$. To find the probability that a sample mean exceeds \$1.00, you'd calculate $z = \frac{1.00 - 0.88}{0.10} = 1.2$, then look up $z = 1.2$. The table gives an area of about 0.885 to the left, so $P(\bar{x} > 1.00) \approx 1 - 0.885 = 0.115$. Roughly 11.5% of samples of 36 people would have a mean above \$1.00.
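If you'd rather check the worked example with code than a z-table, Python's standard library has a normal distribution built in. This is a small sketch using the numbers from the example; the variable names are just illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.88, 0.60, 36   # population mean, population SD, sample size
x_bar = 1.00                    # sample mean of interest

se = sigma / math.sqrt(n)       # standard error: sigma / sqrt(n)
z = (x_bar - mu) / se           # z-score of the sample mean

# P(sample mean > x_bar) is the area to the right of z under the standard normal curve.
p_exceeds = 1 - NormalDist().cdf(z)

print(f"SE = {se:.2f}, z = {z:.2f}, P(x_bar > {x_bar:.2f}) = {p_exceeds:.4f}")
```

`NormalDist().cdf(z)` returns the area to the left of `z`, which is the same number a standard z-table gives, so subtracting from 1 gives the right-tail probability.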

Statistical Inference and the CLT

Statistical inference means drawing conclusions about a population based on sample data. The CLT is what makes most inference techniques work, because it justifies treating the sampling distribution as normal.

Once you know the sampling distribution is approximately normal, you can:

Construct confidence intervals for the population mean:

$\bar{x} \pm z^* \times \frac{\sigma}{\sqrt{n}}$

where $z^*$ is the critical value for your chosen confidence level (e.g., 1.96 for 95% confidence). If $\sigma$ is unknown and estimated by the sample standard deviation $s$, you use a critical value $t^*$ from the $t$-distribution with $n - 1$ degrees of freedom instead of $z^*$.

Conduct hypothesis tests about the population mean using the test statistic:

$z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}}$

where $\mu_0$ is the hypothesized population mean.
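Both formulas can be sketched in code. The example below assumes $\sigma$ is known, uses a hypothetical sample mean of \$1.05 from $n = 36$ people, and runs a two-sided test (the choice of two-sided is an assumption made for illustration). The function names `z_confidence_interval` and `z_test` are made up for this sketch.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for x_bar +/- z* times sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def z_test(x_bar, mu_0, sigma, n):
    """Return (z, two-sided p-value) for the null hypothesis mu = mu_0."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 36 people with a mean of $1.05; sigma = $0.60 is assumed known.
x_bar, sigma, n = 1.05, 0.60, 36

lower, upper = z_confidence_interval(x_bar, sigma, n)
z, p_value = z_test(x_bar, mu_0=0.88, sigma=sigma, n=n)

print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

With these made-up numbers, the hypothesized mean of \$0.88 falls inside the 95% interval and the p-value is above 0.05, so the two methods agree: the sample doesn't give strong evidence against $\mu_0 = 0.88$.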

A random variable is a variable whose value depends on the outcome of a random process. The amount of pocket change a randomly selected person carries is a random variable, and its probability distribution describes how likely each possible value is. The CLT connects individual random variables to the predictable behavior of sample means, which is why it's so central to the rest of the course.