What Are Summary Statistics?
A statistic is a measure calculated from a sample to help us analyze the data. A parameter is the corresponding measure for the population. In inferential statistics, we will use statistics to make inferences about parameters. But let's not jump ahead; for now, we focus on summary statistics. The mean, median, standard deviation, IQR, and range are all summary statistics for a quantitative variable. The mean, median, quartiles, and percentiles measure the center and position of quantitative data, whereas the range, IQR, and standard deviation measure its variability. The values of these summary measures change if we convert the data to different units (for example, from inches to centimeters).
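To see what happens when data are converted to different units, here is a minimal sketch using Python's standard `statistics` module. The heights are made-up values for illustration, and the `summarize` helper is hypothetical. Converting inches to centimeters multiplies every value by 2.54, and the mean, median, standard deviation, and range are each multiplied by 2.54 as well.

```python
import statistics

def summarize(data):
    """Hypothetical helper: a few summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),      # sample standard deviation (divides by n - 1)
        "range": max(data) - min(data),
    }

# Made-up heights in inches (illustration only)
heights_in = [60, 62, 63, 65, 66, 68, 70, 71, 74]
# The same heights converted to centimeters (1 inch = 2.54 cm)
heights_cm = [h * 2.54 for h in heights_in]

summary_in = summarize(heights_in)
summary_cm = summarize(heights_cm)
for name in summary_in:
    # Each statistic in cm is 2.54 times the statistic in inches
    print(f"{name:>6}: {summary_in[name]:7.2f} in  ->  {summary_cm[name]:7.2f} cm")
```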
Statistics of Center
The mean (or average), as you learned before, is easy to calculate: add up all the values of the variable and divide the sum by the number of values, n. The formula is:
x̄ = ∑x / n
x̄ is read as "x-bar"; it is the mean of the x values in the data. The variable doesn't have to be x; it could be y as well. The mean is the best summary of center for a symmetric distribution because, as mentioned before, it is the balancing point of the distribution. The mean has a few disadvantages, though. It does not tell us about the individual values (that is why we also need summary measures of spread), and it is not resistant to outliers. A single very high value can pull the mean up sharply, and if we wrongly choose to report the mean instead of the median, our study results can lead to wrong decisions.
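As a minimal sketch of the formula x̄ = ∑x / n and of the mean's sensitivity to outliers, the Python snippet below computes the mean of some made-up quiz scores, then adds a single unusually high score. The `mean` function and the data are illustrative assumptions, not part of any particular library.

```python
# Made-up quiz scores (illustration only)
scores = [70, 72, 75, 78, 80]

def mean(values):
    """Mean: x̄ = Σx / n."""
    return sum(values) / len(values)

print(mean(scores))            # 75.0
with_outlier = scores + [150]  # one unusually high value
print(mean(with_outlier))      # 87.5
```

One extra value moves the mean from 75 to 87.5, even though five of the six scores are 80 or below.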
The median is the middle value of the ordered data. When the number of values is even, we calculate the median by averaging the middle two values. The median is a good alternative for summarizing the center of a skewed distribution or a distribution with an outlier, because it is resistant to outliers. It is not easy to find the exact median from a histogram, but you don't need to; you only need to find its position. With the n values in order, the median is at position (n + 1)/2. If n is odd, that position is a single value; if n is even, it falls halfway between positions n/2 and n/2 + 1, so the median is the average of those two values. In the following section, when we compare two histograms, you will see how to find the median from the histogram.
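The position rule above can be sketched in Python as follows. This is an illustrative implementation (Python's built-in `statistics.median` does the same job); positions in the text are 1-based, while Python lists are 0-based, hence the `- 1` adjustments. The example data are made up.

```python
def median(values):
    """Median using the position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    if n % 2 == 1:
        # Odd n: the median is the value at position (n + 1) / 2 (1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values at positions n/2 and n/2 + 1 (1-based)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([70, 72, 75, 78, 80]))       # 75
print(median([70, 72, 75, 78, 80, 150]))  # 76.5
```

Adding the outlier 150 moves the median only from 75 to 76.5, which illustrates its resistance to outliers.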
Mean or Median
Rule of thumb! If the histogram is unimodal and symmetric, choose the mean; if the distribution is skewed or has an outlier, choose the median. In right-skewed distributions, the mean is typically higher than the median, and in left-skewed distributions, the mean is typically lower than the median. It is OK to report both summary measures and explain why they differ. Previous AP exams have included multiple-choice questions (MCQs) that check whether you know how to compare the mean and median in skewed distributions. And always report units with the measures of center, as you do in math class.
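A minimal sketch of the skewness rule, using Python's `statistics` module on two made-up data sets: one with a long right tail (hypothetical household incomes) and one with a long left tail (hypothetical exam scores). Units are printed alongside each measure of center.

```python
import statistics

# Made-up right-skewed data: household incomes in thousands of dollars
right_skewed = [28, 30, 32, 35, 36, 38, 40, 45, 60, 95, 180]
# Made-up left-skewed data: exam scores in points
left_skewed = [20, 55, 70, 78, 82, 85, 88, 90, 92, 95]

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

# Report units along with each measure of center
print(f"Right-skewed: mean = {right_mean:.1f}, median = {right_median} (thousand dollars)")
print(f"Left-skewed:  mean = {left_mean:.1f}, median = {left_median} (points)")
```

The long right tail pulls the mean (about 56.3) above the median (38), while the long left tail pulls the mean (75.5) below the median (83.5).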
Statistics of Spread
The standard deviation is like the lungs of statistics: just as you cannot breathe without lungs, you cannot analyze data without the standard deviation. It measures how far the values typically fall from the mean, that is, how spread out they are. Calculating the standard deviation by hand is lengthy and time-consuming, as you probably know by now. You will mostly rely on your calculator to do it for you, but just in case, here is the formula:
s = √[ ∑(x − x̄)² / (n − 1) ]
You may wonder why we divide by n − 1 instead of n. The full reason is complicated, but the most straightforward answer is that dividing by n − 1 makes the sample statistic a better estimate of the population parameter. When calculating the standard deviation of a population (σ), we divide by n and do not need to subtract anything.
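The following Python sketch spells out the formula step by step: find the mean, sum the squared deviations, divide by n − 1, and take the square root. For comparison, it also computes the population version (dividing by n), treating the same made-up data as if they were the whole population. The function names are hypothetical; `statistics.stdev` and `statistics.pstdev` from Python's standard library are used only as a cross-check.

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt( Σ(x - x̄)^2 / (n - 1) )"""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def population_sd(values):
    """σ: same steps, but divide by n (values treated as the whole population)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean = 5
print(sample_sd(data))       # about 2.138
print(population_sd(data))   # 2.0
# Cross-check against Python's standard library
print(statistics.stdev(data), statistics.pstdev(data))
```

The sum of squared deviations here is 32, so s = √(32/7) while σ = √(32/8) = 2; dividing by the smaller number n − 1 always makes s slightly larger than σ for the same data.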
As you work through later units, you will revisit the concept of standard deviation and come to understand it better.
From your algebra course, you may already know how to find the interquartile range (IQR). The IQR is simply the difference between the upper and lower quartiles:
IQR = upper quartile (Q3) − lower quartile (Q1)
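However, the IQR does not reflect all of the variability in the data. Because it is the difference between only two quartiles, it describes the spread of the middle 50% of the values and says nothing about how spread out the lowest and highest quarters are.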
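Here is a minimal Python sketch of finding the IQR, assuming the common "median of each half" convention in which the overall median is excluded from both halves when n is odd (the approach TI-84 calculators use). Other software, including Python's `statistics.quantiles`, may use a different convention and give slightly different quartiles. The helper names and data are illustrative.

```python
def median(values):
    """Median of a list (middle value, or average of the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles require at least two values")
    half = n // 2
    lower = ordered[:half]              # values below the median
    upper = ordered[half + (n % 2):]    # values above the median
    return median(lower), median(upper)

def iqr(values):
    """IQR = Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

data = [1, 3, 4, 6, 8, 9, 11, 15, 20]  # made-up data, n = 9
print(quartiles(data))  # (3.5, 13.0)
print(iqr(data))        # 9.5
```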
Standard Deviation or IQR
If the distribution is symmetric and has no outliers, the IQR is usually somewhat larger than the standard deviation. If the distribution is unimodal and symmetric, report the standard deviation along with the mean; for skewed distributions, report the IQR along with the median. It is important to report center and spread together, and even more important to pair the mean with the standard deviation and the median with the IQR. Reporting only the spread, without a measure of center, gives an incomplete and awkward summary.
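To illustrate why the IQR is paired with the median for data with outliers, this sketch compares the standard deviation and IQR of a made-up, roughly symmetric data set with the same data after its largest value is replaced by an outlier. It uses Python's `statistics.quantiles` with its default `"exclusive"` method, so the quartiles may differ slightly from a calculator's; the `spread_summary` helper is hypothetical.

```python
import statistics

def spread_summary(values):
    """Return (standard deviation, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return statistics.stdev(values), q3 - q1

# Made-up data: roughly symmetric, no outliers
symmetric = [10, 12, 13, 14, 15, 15, 16, 17, 18, 20]
# Same data, but the largest value replaced by an outlier
with_outlier = symmetric[:-1] + [60]

sd_sym, iqr_sym = spread_summary(symmetric)
sd_out, iqr_out = spread_summary(with_outlier)
print(f"symmetric:    SD = {sd_sym:.2f}, IQR = {iqr_sym:.2f}")
print(f"with outlier: SD = {sd_out:.2f}, IQR = {iqr_out:.2f}")
```

For the symmetric data, the IQR (4.5) is somewhat larger than the standard deviation (about 2.94). After the outlier is introduced, the standard deviation jumps to about 14.6 while the IQR stays at 4.5.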