6.1 The Standard Normal Distribution

3 min read · June 25, 2024

The standard normal distribution is a powerful tool for analyzing data. It uses z-scores to measure how far data points are from the mean in terms of standard deviations. This lets us compare values across different datasets easily.

The Empirical Rule is a key feature, showing how data cluster around the mean. It tells us that 68% of data falls within one standard deviation of the mean, 95% within two, and 99.7% within three. This helps predict where most data will lie in a normal distribution.

The Standard Normal Distribution

Z-score calculation for data deviation

  • Calculate using the formula $z = \frac{x - \mu}{\sigma}$
    • $x$ represents the individual data point value
    • $\mu$ represents the population mean
    • $\sigma$ represents the population standard deviation
  • The z-score indicates the number of standard deviations a data point is from the mean
    • Positive z-score signifies data point above the mean (right side of distribution)
    • Negative z-score signifies data point below the mean (left side of distribution)
  • Example calculation: Given $x = 85$, $\mu = 75$, and $\sigma = 5$, $z = \frac{85 - 75}{5} = 2$
    • Interpretation: The data point 85 is 2 standard deviations above the mean (75)
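The calculation above translates directly into code. A minimal sketch in Python; the helper name `z_score` is illustrative, not from the source:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the population mean mu."""
    return (x - mu) / sigma

# Worked example from the text: x = 85, mu = 75, sigma = 5
print(z_score(85, 75, 5))  # 2.0 -> two standard deviations above the mean
```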

Empirical Rule for normal distributions

  • Empirical Rule (68-95-99.7 rule) applies to normally distributed data
    • 68% of data within 1 standard deviation of the mean ($\mu \pm 1\sigma$)
    • 95% of data within 2 standard deviations of the mean ($\mu \pm 2\sigma$)
    • 99.7% of data within 3 standard deviations of the mean ($\mu \pm 3\sigma$)
  • Calculate percentage of data within specific range using z-scores and Empirical Rule
    • Example: Percentage of data between 70 and 80 with $\mu = 75$ and $\sigma = 5$
      • Calculate z-scores: $z_{70} = \frac{70 - 75}{5} = -1$ and $z_{80} = \frac{80 - 75}{5} = 1$
      • Range spans -1 to 1 standard deviations from mean
      • Empirical Rule states 68% of data falls within this range (-1σ to 1σ)
  • The area under the curve between two z-scores represents the probability of data falling within that range
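The 68% figure from the Empirical Rule can be verified exactly from the standard normal CDF. A sketch using Python's built-in error function (the `normal_cdf` helper is illustrative):

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z) via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Probability between z = -1 and z = 1 (the text's 70-to-80 example with mu = 75, sigma = 5)
p = normal_cdf(1) - normal_cdf(-1)
print(round(p, 4))  # 0.6827 -- the Empirical Rule's "68%" rounded
```

The rule's 68/95/99.7 percentages are rounded values of these exact areas (0.6827, 0.9545, 0.9973).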

Interpretation of positive vs negative z-scores

  • Z-scores represent relative position of data point compared to mean
    • Positive z-score indicates data point above mean (right side)
      • Example: $z = 1.5$ means the data point is 1.5 standard deviations above the mean
    • Negative z-score indicates data point below mean (left side)
      • Example: $z = -2$ means the data point is 2 standard deviations below the mean
  • Absolute value of z-score represents distance from mean in standard deviations
    • Larger absolute z-scores indicate data points further from mean
      • Example: $z = 2.5$ is further from the mean than $z = 1.2$ ($2.5\sigma$ vs $1.2\sigma$)
  • Z-scores enable comparison of relative positions across different normal distributions
    • Example: $z = 1.5$ in distribution A marks the same relative position as $z = 1.5$ in distribution B
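The cross-distribution comparison can be sketched with made-up numbers: two raw scores on very different scales land at the same z-score, so they occupy the same relative position. The distributions and values below are illustrative, not from the source:

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    return (x - mu) / sigma

# Distribution A: mean 75, sd 5 (e.g., an exam out of 100)
# Distribution B: mean 500, sd 100 (e.g., a standardized test)
score_a = z_score(82.5, 75, 5)    # 1.5
score_b = z_score(650, 500, 100)  # 1.5
print(score_a == score_b)  # True: equal relative standing despite different scales
```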

Additional Concepts in Normal Distribution

  • The probability density function describes the shape of the normal distribution curve
  • The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases
  • The standard error is the standard deviation of the sampling distribution of a statistic
  • A normal probability plot can be used to assess whether a dataset follows a normal distribution
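The Central Limit Theorem can be seen in a small simulation: even when the population is not normal (uniform here), the means of repeated samples cluster tightly around the population mean. The sample sizes and counts below are arbitrary choices for the sketch:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Draw 2000 samples of size 50 from a Uniform(0, 1) population and record each mean
sample_means = [
    statistics.mean(random.uniform(0, 1) for _ in range(50))
    for _ in range(2000)
]

# Uniform(0, 1) has population mean 0.5; the sample means center on it
print(round(statistics.mean(sample_means), 2))  # close to 0.5
```

Plotting a histogram of `sample_means` would show the familiar bell shape emerging, even though the underlying uniform population is flat.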

Key Terms to Review

68-95-99.7 Rule: The 68-95-99.7 rule, also known as the empirical rule, is a statistical principle that describes the distribution of data in a normal or bell-shaped curve. It states that approximately 68% of the data falls within one standard deviation of the mean, 95% of the data falls within two standard deviations of the mean, and 99.7% of the data falls within three standard deviations of the mean. This rule is particularly useful in understanding the Standard Normal Distribution and in applying the Normal Distribution to real-world scenarios.
Area Under the Curve: The area under the curve, in the context of probability and statistics, refers to the region bounded by the x-axis and a continuous probability density function (PDF) curve. This area represents the probability or likelihood of a random variable falling within a specified range of values.
Bell Curve: The bell curve, also known as the normal distribution, is a symmetrical, bell-shaped probability distribution that describes how a set of data is distributed around the mean. It is a fundamental concept in statistics and probability theory, with applications across many fields.
Central Limit Theorem: The central limit theorem is a fundamental concept in probability and statistics that states that the sampling distribution of the mean of a random variable will tend to a normal distribution as the sample size increases, regardless of the underlying distribution of the variable.
Cumulative Probability: Cumulative probability refers to the sum of all the probabilities of events up to a certain point. It represents the total likelihood of an event occurring or a value being less than or equal to a specified point on a probability distribution.
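Cumulative probabilities for the standard normal distribution are available directly in Python's standard library via `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

print(round(std_normal.cdf(1.0), 4))   # P(Z <= 1), about 0.8413
print(round(std_normal.cdf(-2.0), 4))  # P(Z <= -2), about 0.0228
```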
Empirical Rule: The Empirical Rule, also known as the 68-95-99.7 rule, is a statistical concept that describes the distribution of data in a normal distribution. It provides a general guideline for understanding the relationship between the standard deviation and the proportion of data that falls within certain ranges around the mean.
Equal standard deviations: Equal standard deviations, also known as homoscedasticity, occur when the variability within each group being compared is similar. This is an important assumption for performing One-Way ANOVA.
Normal distribution: A normal distribution is a continuous probability distribution that is symmetrical and bell-shaped, where most of the observations cluster around the central peak. It is characterized by its mean ($\mu$) and standard deviation ($\sigma$).
Normal Probability Plot: A normal probability plot is a graphical technique used to assess whether a dataset follows a normal distribution. It provides a visual representation of how closely the data aligns with the expected pattern of a normal distribution, allowing for the identification of any significant deviations from normality.
Percentile: A percentile is a statistical measure that indicates the relative position of a value within a distribution. It represents the percentage of values in a dataset that fall below a given value. Percentiles are particularly relevant in the context of the standard normal distribution and when using the normal distribution to make inferences about data.
Percentiles: Percentiles are measures that divide a dataset into 100 equal parts, indicating the relative standing of a value within the data. For example, the 25th percentile (first quartile) is the value below which 25% of the observations fall.
Probability density function: A probability density function (PDF) describes the likelihood of a continuous random variable taking on a particular value. It is represented by a curve where the area under the curve within a given interval represents the probability that the variable falls within that interval.
Sigma Notation (Σ): Sigma notation, denoted by the Greek letter Σ, is a concise way to represent the sum of a series of values or the application of a mathematical operation across multiple elements. It is a fundamental concept in statistics and various mathematical disciplines, allowing for the efficient expression and calculation of sums, means, and other statistical measures.
Standard Deviation: Standard deviation is a measure of the spread or dispersion of a set of data around the mean. It quantifies the typical deviation of values from the average, providing insight into the variability within a dataset.
Standard error: Standard error measures the accuracy with which a sample distribution represents a population by using standard deviation. It is crucial for estimating population parameters and conducting hypothesis tests.
Standardization: Standardization is the process of establishing a set of standards or guidelines to ensure consistency, efficiency, and quality across various processes, products, or systems. In the context of statistics, standardization is a technique used to transform data into a common scale, allowing for meaningful comparisons and analyses.
Symmetry: Symmetry refers to the balanced and proportional arrangement of elements or features around a central axis or point. It is a fundamental concept that is closely tied to the measures of center, skewness, and the normal distribution in statistics.
Z-score: A z-score, also known as a standard score, measures how many standard deviations a data point is from the mean of a distribution. It is calculated using the formula $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation, and it is used to determine how unusual or typical a value is within a normal distribution.
Z-test: The z-test is a statistical hypothesis test that uses the standard normal distribution to determine whether the mean of a population is significantly different from a hypothesized value. It is commonly used in various contexts, including the analysis of the Standard Normal Distribution, evaluating Type I and Type II Errors, and selecting the appropriate Probability Distribution for Hypothesis Testing.
μ (Mu): Mu (μ) is a Greek letter commonly used in statistics to represent the population mean or average. It is a central parameter that describes the central tendency or typical value of a population distribution. Mu is a crucial concept in understanding various statistical measures and distributions covered in this course.
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.