Verified for the 2025 AP Statistics exam•Last Updated on June 18, 2024
Once we finish organizing the set of data of our interest into a certain display of our choice, the next task is to describe the data. In other words we should tell what we see. There are three things that we should look for when trying to find trends and patterns: shape, center and spread. ❗
To describe the shape of the display, check the following:
Symmetry in a histogram can occur if the distribution of the data is symmetric around the central value. This means that if you were to fold the histogram in half, the two halves would be approximately mirror images of each other.
A symmetric distribution is one in which the values on either side of the central value (such as the median or mean) are roughly equal. For example, a bell-shaped curve is a symmetric distribution because the values on either side of the peak are roughly equal.
To determine if a histogram is symmetric, you can visually inspect the shape of the histogram and see if it appears to be roughly symmetrical. You can also use statistical measures such as the mean and median to determine if the distribution is symmetric.
Source: Padhai TimeIt's worth noting that a histogram does not have to be perfectly symmetrical to be considered symmetric. Some degree of skewness or asymmetry is often present in real-world data, and a histogram may still be considered symmetric if the degree of asymmetry is relatively small.
Another way to think about skewness is that there are two types of skewness: positive skewness and negative skewness. Positive skewness occurs when the distribution is skewed to the right, with a long tail on the right side and a shorter tail on the left side. This means that the majority of the values in the distribution are clustered on the left side, with a few values on the right side that are much larger or smaller.
On the other hand, negative skewness occurs when the distribution is skewed to the left, with a long tail on the left side and a shorter tail on the right side. This means that the majority of the values in the distribution are clustered on the right side, with a few values on the left side that are much smaller or larger.
Souce: ResearchGateA distribution can have one mode, in which case it is called a unimodal distribution, or it can have two or more modes, in which case it is called a multimodal distribution. A bimodal distribution, for example, is a distribution with two modes.
It's worth noting that a distribution can have a mode even if it is not symmetrical or has skewed data. For example, a positively skewed distribution (with a long tail on the right side) can still have a mode if there is a value or values that occur more frequently than any other values in the distribution.
Uniform distributions, on the other hand, do not have a mode because all of the values in the distribution occur with roughly the same frequency. In a uniform distribution, there is no single value that stands out as being more common than any other value.
Source: Towards Data ScienceIt's important to be aware of the number and location of modes in a distribution because they can provide valuable insights into the underlying data and how it is distributed. For example, the presence of two modes in a distribution may indicate the presence of two distinct groups or subpopulations within the data.
It's important to be aware of outliers in a dataset because they can skew the results of statistical analyses and cause them to be less representative of the underlying data. For this reason, it is often useful to analyze data both with and without outliers to see how they affect the results.
There are a few different ways to identify outliers in a dataset. One way is to use graphical methods such as boxplots, which can help you visualize the distribution of the data and identify any values that are significantly different from the rest of the data. You can also use statistical measures such as the mean and standard deviation to identify outliers.
Source: MinitabIt's important to remember that outliers are not necessarily bad data, and they should not be automatically excluded from analysis. However, it is important to consider whether the outliers are representative of the underlying data or whether they may be the result of errors or other factors that need to be taken into account when interpreting the results of the analysis.
There are three commonly used measures of the "center" of a distribution: mean, median, and mode. 🎯
In a symmetric distribution, the mean, median, and mode are often close to each other or even equal, depending on the exact shape of the distribution. However, in skewed distributions or datasets with outliers, the mean, median, and mode can be significantly different from each other. It's important to consider which measure of central tendency is most appropriate for a given dataset, taking into account the symmetry or skewness of the data as well as the presence of outliers.
The center is a good measure, but it's definitely not perfect if we don’t report it with the spread. There are several measures that can be used to describe the spread or dispersion of a dataset, including the range, standard deviation, and interquartile range (IQR). 🕸️
In general, it is a good practice to report both the center and spread of a dataset when describing its statistical properties. In symmetric distributions, it is common to report the mean with the standard deviation, while in skewed distributions, it is common to report the median with the IQR. This allows you to provide a more complete description of the distribution of the data and how it is dispersed around the center.
As we go through the next couple of sections, you'll be more familiar and comfortable describing distributions in terms of shape, center, and spread... woo-hoo! 🤩