7.2 The Central Limit Theorem for Sums

2 min read · June 25, 2024

The Central Limit Theorem for Sums is a powerful statistical tool. It states that when you add up many independent random variables, the result tends to follow a normal distribution, even if the individual variables aren't normally distributed.

This theorem is key for making predictions about large-scale events. It allows us to estimate probabilities for sums of random variables, like total sales or cumulative test scores, using the properties of the normal distribution.

The Central Limit Theorem for Sums


  • CLT states the sum of a large number of independent and identically distributed (i.i.d.) random variables will be approximately normally distributed, regardless of the distribution of the individual random variables
    • Applies when the sample size is sufficiently large, typically n ≥ 30
    • Mean of the sum will be equal to the sum of the individual means: $μ_{\sum X} = \sum μ_X$
    • Variance of the sum will be equal to the sum of the individual variances: $σ^2_{\sum X} = \sum σ^2_X$
  • CLT for sums allows making inferences about sum of random variables, even if individual variables are not normally distributed (heights, weights, test scores)
  • The law of large numbers complements the CLT, stating that as sample size increases, the sample mean converges to the true population mean
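The behavior described above can be simulated. The NumPy sketch below is illustrative (the exponential distribution, sample sizes, and seed are my choices, not from the text): it draws many sums of n = 30 skewed exponential variables and checks that the mean and variance of the sums match what the CLT predicts.

```python
import numpy as np

# Sketch: sums of 30 i.i.d. exponential draws (a skewed, non-normal
# distribution) still follow the CLT's predictions for mean and variance.
rng = np.random.default_rng(0)
n = 30            # variables per sum (n >= 30 rule of thumb)
trials = 10_000   # number of sums to simulate
# Exponential(scale=1) has mean 1 and variance 1, so the CLT predicts
# the sums have mean n*1 = 30 and variance n*1 = 30.
sums = rng.exponential(scale=1.0, size=(trials, n)).sum(axis=1)

print(sums.mean())  # close to 30
print(sums.var())   # close to 30
```

A histogram of `sums` would also look approximately bell-shaped, even though each individual draw is strongly right-skewed.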

Mean and standard deviation of sums

  • Calculate the mean of the distribution of sums by adding the means of the individual random variables: $μ_{\sum X} = μ_{X_1} + μ_{X_2} + ... + μ_{X_n}$
  • Calculate the variance of the distribution of sums by adding the variances of the individual random variables: $σ^2_{\sum X} = σ^2_{X_1} + σ^2_{X_2} + ... + σ^2_{X_n}$
    • Standard deviation is the square root of the variance: $σ_{\sum X} = \sqrt{σ^2_{\sum X}}$
  • These properties hold for independent random variables, meaning the value of one variable does not affect the outcome of another (rolling dice, flipping coins)
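The formulas above can be applied directly. The short sketch below uses hypothetical means and variances for three independent random variables to compute the mean and standard deviation of their sum.

```python
import math

# Hypothetical per-variable means and variances (illustrative numbers).
means = [10.0, 12.0, 8.0]
variances = [4.0, 9.0, 1.0]

mean_sum = sum(means)        # mu_sum      = 10 + 12 + 8 = 30
var_sum = sum(variances)     # sigma^2_sum = 4 + 9 + 1   = 14
sd_sum = math.sqrt(var_sum)  # sigma_sum   = sqrt(14)

print(mean_sum)          # 30.0
print(round(sd_sum, 3))  # 3.742
```

Note that standard deviations are never added directly; variances add, and the standard deviation is recovered at the end by taking the square root.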

Z-scores in sum analysis

  • The z-score formula standardizes the sum of random variables, allowing calculation of probabilities using the standard normal distribution: $Z = \frac{X - μ}{σ}$
    • In the context of the CLT for sums, $X$ represents the sum of the random variables, $μ$ is the mean of the distribution of sums, and $σ$ is the standard deviation of the distribution of sums
  • Find the probability of the sum being less than or greater than a specific value by calculating the z-score and using a z-table or calculator
    • To find $P(\sum X < a)$, calculate $Z = \frac{a - μ_{\sum X}}{σ_{\sum X}}$ and find the corresponding probability using the standard normal distribution (z-table, online calculator)
  • The z-score formula allows making probability statements about sums of random variables, assuming the conditions for the CLT are met (sufficient sample size, independence)
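A worked sketch of this calculation in Python, using `math.erf` for the standard normal CDF in place of a z-table; the test-score numbers (30 scores, each with mean 70 and standard deviation 10) are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function (replaces a z-table lookup).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical setup: sum of 30 i.i.d. test scores, each with
# mean 70 and standard deviation 10.
n, mu_x, sd_x = 30, 70.0, 10.0
mu_sum = n * mu_x               # 2100
sd_sum = math.sqrt(n) * sd_x    # sqrt(30) * 10, about 54.77

# P(sum of scores < 2200)
a = 2200.0
z = (a - mu_sum) / sd_sum       # about 1.826
p = normal_cdf(z)               # about 0.966

print(round(z, 3), round(p, 3))
```

So under these assumed numbers, the total of 30 scores falls below 2200 roughly 96.6% of the time; $P(\sum X > a)$ would simply be `1 - p`.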

Statistical Inference and Sampling Distributions

  • The CLT for sums is crucial for statistical inference, allowing us to draw conclusions about population parameters based on sample data
  • The sampling distribution of a statistic (such as the sum) describes how that statistic varies across different samples from the same population
  • As the sample size increases, the sampling distribution becomes more normal, facilitating more accurate statistical inference
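This convergence can be checked empirically. The NumPy sketch below (distribution, sizes, and seed are illustrative choices) compares the skewness of sums of 2 versus 100 exponential draws: the larger-n sampling distribution is measurably closer to the symmetric, normal shape.

```python
import numpy as np

rng = np.random.default_rng(1)

def skewness(x):
    # Sample skewness: near 0 for a symmetric, normal-like distribution.
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

# Sampling distributions of the sum for small n vs. large n,
# built from skewed exponential draws.
small = rng.exponential(size=(20_000, 2)).sum(axis=1)    # n = 2
large = rng.exponential(size=(20_000, 100)).sum(axis=1)  # n = 100

# The larger-n sums are much less skewed than the small-n sums.
print(abs(skewness(large)) < abs(skewness(small)))  # True
```

In theory the skewness of a sum of n exponentials is $2/\sqrt{n}$, so it shrinks toward 0 (the normal value) as n grows, matching the bullet above.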

Key Terms to Review

Binomial probability distribution: A binomial probability distribution models the number of successes in a fixed number of independent trials, each with the same probability of success. It is defined by two parameters: the number of trials (n) and the probability of success (p).
Central Limit Theorem: The Central Limit Theorem states that when a sample of size 'n' is taken from any population with a finite mean and variance, the distribution of the sample means will tend to be normally distributed as 'n' becomes large, regardless of the original population's distribution. This theorem allows for the use of normal probability models in various statistical applications, making it fundamental for inference and hypothesis testing.
Central limit theorem for means: The Central Limit Theorem for Sample Means states that the distribution of sample means will approximate a normal distribution, regardless of the population's distribution, provided the sample size is sufficiently large. This approximation improves as the sample size increases.
Central limit theorem for sums: The Central Limit Theorem for Sums states that the distribution of the sum of a large number of independent, identically distributed random variables approaches a normal distribution, regardless of the original distribution. This theorem allows for approximations of sums using the normal distribution when sample sizes are sufficiently large.
Convergence: Convergence refers to the tendency of a sequence of values, such as sample means or proportions, to approach a specific target or limiting value as the sample size increases. It is a fundamental concept in probability theory and statistics, particularly in the context of the Central Limit Theorem.
I.i.d.: The term 'i.i.d.' stands for 'independent and identically distributed.' It is a fundamental concept in probability theory and statistics, particularly in the context of the Central Limit Theorem for Sums, which describes the behavior of the sum of a large number of random variables.
Law of Large Numbers: The law of large numbers is a fundamental concept in probability theory that states that as the number of independent trials or observations increases, the average of the results will converge towards the expected value or mean of the probability distribution. This principle underlies the predictability of large-scale events and the reliability of statistical inferences.
Mean: The mean, also known as the average, is a measure of central tendency that represents the arithmetic average of a set of values. It is calculated by summing up all the values in the dataset and dividing by the total number of values. The mean provides a central point that summarizes the overall distribution of the data.
Normally distributed: A normally distributed variable follows a symmetric, bell-shaped curve where most values cluster around the mean. It is characterized by its mean and standard deviation.
Population Parameters: Population parameters are numerical characteristics that describe the entire population of interest. They are the true, underlying values that exist in the population, as opposed to sample statistics which are estimates of those parameters based on a subset of the population.
Probability Distribution: A probability distribution is a mathematical function that describes the likelihood or probability of different possible outcomes or events occurring in a given situation or experiment. It provides a comprehensive representation of the possible values a random variable can take on and their corresponding probabilities.
Random Variables: A random variable is a numerical quantity that is subject to variation due to chance. It is a variable whose value is determined by the outcome of a random phenomenon or experiment. Random variables are a fundamental concept in probability theory and statistics, and they play a crucial role in understanding and analyzing data.
Sample Size: Sample size refers to the number of observations or data points collected in a statistical study or experiment. It is a crucial factor in determining the reliability and precision of the results, as well as the ability to make inferences about the larger population from the sample data.
Sampling Distribution: The sampling distribution is the probability distribution of a statistic, such as the sample mean or sample proportion, obtained from repeated sampling of a population. It describes the variability of the statistic and is a crucial concept in statistical inference, allowing for the assessment of the reliability and precision of sample-based estimates of population parameters.
Sigma (Σ): Sigma (Σ) is a mathematical symbol used to represent the summation or addition of a series of numbers or values. It is a fundamental concept in statistics and is used extensively in various statistical analyses and calculations.
Standard Deviation: Standard deviation is a statistic that measures the dispersion or spread of a set of values around the mean. It helps quantify how much individual data points differ from the average, indicating the extent to which values deviate from the central tendency in a dataset.
Standard Normal Distribution: The standard normal distribution, also known as the Z-distribution, is a special case of the normal distribution where the mean is 0 and the standard deviation is 1. It is a fundamental concept in statistics that is used to standardize scores from different normal distributions for comparison.
Statistical Inference: Statistical inference is the process of using data analysis and probability theory to draw conclusions about a population from a sample. It allows researchers to make educated guesses or estimates about unknown parameters or characteristics of a larger group based on the information gathered from a smaller, representative subset.
Variance: Variance is a statistical measurement that describes the spread or dispersion of a set of data points in relation to their mean. It quantifies how far each data point in the set is from the mean and thus from every other data point. A higher variance indicates that the data points are more spread out from the mean, while a lower variance shows that they are closer to the mean.
Z-Score: A z-score is a standardized measure that expresses how many standard deviations a data point is from the mean of a distribution. It allows for the comparison of data points across different distributions by converting them to a common scale.
Z-table: The z-table, also known as the standard normal distribution table, is a statistical tool that provides the probabilities associated with a standard normal distribution. It is a crucial resource for understanding and working with normal distributions, which are fundamental in statistical analysis.
μ: The symbol 'μ' represents the population mean in statistics, which is the average of all data points in a given population. Understanding μ is essential as it serves as a key measure of central tendency and is crucial in the analysis of data distributions, impacting further calculations related to spread, normality, and hypothesis testing.
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.