Intro to Z-Scores
This section introduces you to z-scores. When I think of statistics, standard deviation and z-scores are among the first things that come to mind. So what are z-scores? A z-score measures how far a value is from the mean, in units of standard deviations. The formula is simple but very powerful. Because z-scores have no units, they can be used to compare values measured on different scales.
For this reason, z-scores are also called standardized values. In sports, for example, judges can use z-scores when they have to calculate a final score for athletes. A negative z-score means that the data value is below the mean, while a positive z-score means that the data value is above the mean. The further the value is from the mean, irrespective of the sign, the more unusual the value is. Here is the formula for a z-score:
z = (x - x̄) / s
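To see the formula in action, here is a minimal Python sketch that standardizes a small data set. The quiz scores and the helper name `z_scores` are made up for illustration, and the sketch assumes s is the sample standard deviation.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the z-score of each value: (x - mean) / sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(x - x_bar) / s for x in values]

# Hypothetical quiz scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(scores)

for x, z in zip(scores, zs):
    # Negative z: below the mean. Positive z: above the mean.
    print(f"x = {x}: z = {z:+.2f}")
```

Values below the mean come out negative, values above it come out positive, and a value equal to the mean gets a z-score of 0.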
As you can see, when we standardize data into z-scores, we shift them by the mean and rescale them by the standard deviation. But how does standardization affect the distribution? In general, shifting data (adding or subtracting a constant) moves the distribution but leaves its shape and spread unchanged. The center and the other measures of position, such as percentiles, the min, and the max, all shift by the same amount. What about rescaling? You may have guessed already that when we multiply or divide every value in a data set by a positive constant, the shape of the distribution won’t change (it will just look stretched or squeezed), but everything else is multiplied or divided by that constant: the mean, min, max, range, IQR, and standard deviation. AP exam MCQs often test whether you know how shifting and rescaling affect shape, center, and spread. Get ready!
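You can check these claims for yourself with a quick sketch like the one below. The heights are invented, `summarize` is a hypothetical helper, and the conversion from inches to centimeters is just one convenient example of rescaling.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return a few measures of position, center, and spread."""
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
        "range": max(values) - min(values),
        "sd": stdev(values),
    }

# Hypothetical heights in inches, invented for illustration.
heights = [60, 62, 65, 67, 70, 72]

shifted = [h - 60 for h in heights]      # shifting: subtract a constant
rescaled = [h * 2.54 for h in heights]   # rescaling: inches to centimeters

for label, data in [("original", heights), ("shifted", shifted), ("rescaled", rescaled)]:
    print(label, {k: round(v, 2) for k, v in summarize(data).items()})
```

In the shifted data, the min, median, mean, and max each drop by 60 while the range and standard deviation stay the same. In the rescaled data, every measure is multiplied by 2.54.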
You may have seen Normal models, or bell-shaped curves, in your Algebra and calculus classes. Normal models are appropriate for distributions that are symmetric and unimodal. A Normal model has two parameters, the mean and the standard deviation, and is written as N(mean, sd). These parameters do not come from the data; they are part of the model.
Standard Normal Model
The Normal model with mean 0 and standard deviation 1 is called the Standard Normal model or the Standard Normal distribution. It can be written as N(0, 1). To standardize a Normal model, we subtract the mean and divide by the standard deviation.
z = (x - x̄) / s
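As a rough illustration of why standardizing is useful, the sketch below uses `NormalDist` from Python's standard library. The model N(500, 100) and the value 650 are assumptions chosen only to make the arithmetic easy.

```python
from statistics import NormalDist

# Hypothetical model N(500, 100); the parameters are assumptions for illustration.
model = NormalDist(mu=500, sigma=100)
standard = NormalDist(mu=0, sigma=1)   # the Standard Normal model N(0, 1)

x = 650
z = (x - model.mean) / model.stdev     # subtract the mean, divide by the sd

# The proportion below x in N(500, 100) should match the proportion below z in N(0, 1).
p_model = model.cdf(x)
p_standard = standard.cdf(z)

print(f"z = {z:.2f}")
print(f"P(X < {x}) = {p_model:.4f}")
print(f"P(Z < {z:.2f}) = {p_standard:.4f}")
```

The two proportions agree, which is why a single Standard Normal table (or calculator function) can serve every Normal model once values are converted to z-scores.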
Since standardizing doesn’t change the shape of the distribution, neither the Normal model nor the Standard Normal model can be used if the distribution is not unimodal and nearly symmetric. As with all the models we work with, we have to make an assumption. Real-world data hardly ever behave exactly Normally, so the condition we check is realistic rather than idealistic: the data only need to be nearly Normal. We check the Nearly Normal Condition by looking at a histogram or a Normal probability plot. The histogram should look roughly symmetric and unimodal. The Normal probability plot should look like a straight line.
Don’t model data with a Normal model without checking the Nearly Normal Condition. Just as we check the Quantitative Data Condition before making a histogram, we check the Nearly Normal Condition before working with Normal models.
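If you are curious how a Normal probability plot is built, the sketch below computes its points by hand. Several things here are assumptions rather than anything the AP course prescribes: both data sets are invented, `normal_plot_points` and `straightness` are hypothetical helper names, the plotting position (i - 0.5) / n is one common convention among several, and using a correlation as a stand-in for "looks like a straight line" is only a rough numerical proxy for eyeballing the plot.

```python
from statistics import NormalDist, mean

def normal_plot_points(values):
    """Pair each sorted value with the z-score a Normal model expects at that rank."""
    n = len(values)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common choice of plotting position; others exist.
    expected_z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected_z, sorted(values)))

def straightness(points):
    """Correlation between the two coordinates; values near 1 suggest a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data sets, invented for illustration.
roughly_symmetric = [48, 52, 55, 57, 59, 60, 61, 63, 65, 68, 72]
right_skewed = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15, 40]

print("roughly symmetric:", round(straightness(normal_plot_points(roughly_symmetric)), 3))
print("right skewed:", round(straightness(normal_plot_points(right_skewed)), 3))
```

The roughly symmetric data give a value very close to 1, while the skewed data give a noticeably lower value, matching the bend you would see in the plot. On the exam you are expected to judge the plot by eye, not to compute this number.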
The 68–95–99.7 Rule
Often we ask ourselves whether we are normal or not. If we are normal, then we should be doing about the same things as the average person does. The 68-95-99.7 rule (the Empirical Rule) tells us that in a Normal model about 68% of the values fall within 1 standard deviation of the mean, about 95% of the values fall within 2 standard deviations of the mean, and about 99.7% (almost all) of the values fall within 3 standard deviations of the mean.
Source: The College Board
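The three percentages in the rule can be recovered from the Standard Normal model itself. Here is a short sketch using `NormalDist` from Python's standard library; the helper name `within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Standard Normal model N(0, 1)

def within(k):
    """Proportion of a Normal model that lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {within(k):.4f}")
```

Rounded to the nearest percent (or tenth of a percent for 3 standard deviations), the results match the 68, 95, and 99.7 in the rule's name.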
This rule works well for Normal models, but never apply it to skewed distributions, because it will fail. For skewed distributions, we can use Chebyshev’s rule (named after a Russian mathematician), but that’s beyond the AP Statistics course. When sketching a Normal model, start at the center and extend the tails to both sides. You do not need to go beyond 3 standard deviations, because very little of the distribution is left beyond that point. Don’t let the curve touch the horizontal axis, because the tails extend forever. The place where the bell shape changes from curving downward to curving upward is called the inflection point, and it is exactly one standard deviation away from the mean.
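The claim about the inflection point can be checked numerically. The sketch below estimates how the Normal curve bends just inside and just outside one standard deviation from the mean. The model N(100, 15), the step size, and the helper name `curvature` are all assumptions chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical model N(100, 15); the parameters are assumptions for illustration.
model = NormalDist(mu=100, sigma=15)

def curvature(x, h=0.01):
    """Approximate the second derivative of the Normal curve at x (central difference)."""
    return (model.pdf(x + h) - 2 * model.pdf(x) + model.pdf(x - h)) / h**2

one_sd_above = model.mean + model.stdev

inside = curvature(one_sd_above - 1)    # slightly closer to the mean than one SD
outside = curvature(one_sd_above + 1)   # slightly farther from the mean than one SD

# Negative curvature: the curve bends downward. Positive: it bends upward.
print("bends downward just inside one SD:", inside < 0)
print("bends upward just outside one SD:", outside > 0)
```

The sign of the curvature flips as you cross one standard deviation from the mean, which is what makes that spot an inflection point. By symmetry, the same thing happens one standard deviation below the mean.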
🎥 Watch: AP Stats - Normal Distributions
Standard Normal Distribution
Normal Probability Plot