Once we have organized the data of interest into a display of our choice, the next task is to describe the data. In other words, we should tell what we see. There are three things we should always look for: shape, center, and spread. Let's discuss them one by one.
To describe the shape of the display, check the following:
Skewness. A distribution can be symmetric, right-skewed, or left-skewed. In a skewed distribution, a few unusually low or high values pull the distribution toward their side, so one tail is longer than the other, whereas a symmetric distribution has tails of roughly equal length. If the longer tail is on the left, the distribution is called left-skewed; if the longer tail is on the right, it is called right-skewed.
Peaks (modes). Look for the modes in your data. Modes can be seen in histograms, stemplots, and dot plots, but cannot be seen in boxplots. A distribution can have one mode or several. A distribution with a single peak is called unimodal. Many symmetric distributions, such as the bell-shaped curve, are unimodal, but a symmetric distribution can also have more than one peak. Be aware of bimodal (and multimodal) displays and treat them carefully. A bimodal distribution may indicate that the data come from two groups or sources, and a multimodal one from more than two. We call the distribution uniform if it has no clear mode, that is, if all values occur about equally often.
Outliers. Beware of outliers. Outliers are values that are extremely low or high compared with the rest of the data, and they need special treatment. In most cases, we would like students to analyze the data both with and without the outliers. Outliers are unusual individuals in our study. They are as important as the others, but they can alter the distribution significantly and change summary measures such as the mean and the standard deviation.
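The shape checks above can be tried on a small data set. The sketch below is only an illustration: the commute times are made-up numbers, `skew_direction` is a hypothetical helper, and comparing the mean with the median is a rule of thumb for skew direction, not a formal test. It also shows the "with and without outliers" comparison.

```python
import statistics

def skew_direction(values):
    """Rule of thumb: a long right tail tends to pull the mean above the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical data: commute times in minutes, with one unusually long trip.
commute = [12, 14, 15, 15, 16, 18, 20, 22, 25, 90]
without_outlier = [x for x in commute if x != 90]

print("shape:", skew_direction(commute))

# Summaries with and without the outlier.
for label, data in [("with outlier", commute), ("without outlier", without_outlier)]:
    print(label, "mean:", round(statistics.mean(data), 1),
          "median:", statistics.median(data))
```

In this made-up example, removing the single extreme value moves the mean a great deal while the median barely changes.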
There are three measures of center: the mean, the median, and the mode. The mean is the best choice for symmetric distributions; it is the balancing point of the histogram. The median is the best measure to report for skewed distributions, since the median is resistant to outliers. The mode is the value that occurs most often in the data set; the word comes from the French for "fashion". In symmetric unimodal distributions, the three measures are about the same, and they are exactly the same in perfectly symmetric cases, which are rare in real-world applications.
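As a minimal sketch, the three measures of center can be computed with Python's standard `statistics` module. Both data sets below are invented for illustration: `scores` is perfectly symmetric, and `incomes` has one very large value.

```python
import statistics

# Hypothetical quiz scores, perfectly symmetric around 8.
scores = [6, 7, 7, 8, 8, 8, 9, 9, 10]

score_mean = statistics.mean(scores)
score_median = statistics.median(scores)
score_modes = statistics.multimode(scores)  # list of the most frequent values

print("symmetric:", score_mean, score_median, score_modes)

# Hypothetical incomes (in thousands), skewed by one very large value.
incomes = [30, 32, 35, 38, 40, 45, 250]

income_mean = statistics.mean(incomes)
income_median = statistics.median(incomes)

print("skewed: mean", round(income_mean, 1), "median", income_median)
```

For the symmetric scores, the mean, median, and mode coincide. For the skewed incomes, the mean is pulled toward the large value while the median stays near the bulk of the data, which is why the median is the measure to report there.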
A measure of center is useful, but it is incomplete unless we report it together with a measure of spread. The range, the standard deviation, and the IQR (interquartile range) are the main measures of the spread, or dispersion, of the data. For symmetric distributions we report the mean with the standard deviation, and for skewed distributions we report the median with the IQR. These pairs belong together, so do not mix them up. The range, which is found by subtracting the minimum value from the maximum value, has the disadvantage that it depends only on the two most extreme values and so hides how the rest of the data vary. The next three sections discuss shape, center, and spread in more detail.
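To see how the range can hide variability, here is a small sketch using the standard `statistics` module. The two data sets are invented, and `spread_summary` is a hypothetical helper. Quartiles can be computed by several conventions; this sketch uses the default ("exclusive") method of `statistics.quantiles`, so a textbook or calculator may give slightly different IQR values for other data.

```python
import statistics

def spread_summary(values):
    """Return the range, sample standard deviation, and IQR of the values."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    return {
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }

# Two hypothetical data sets with the same minimum and maximum.
clustered = [1, 5, 5, 5, 5, 5, 9]   # most values sit at the center
scattered = [1, 2, 3, 5, 7, 8, 9]   # values are spread across the interval

summary_clustered = spread_summary(clustered)
summary_scattered = spread_summary(scattered)

print("clustered:", summary_clustered)
print("scattered:", summary_scattered)
```

Both sets have the same range, yet the standard deviation and the IQR are clearly smaller for the clustered set, because those measures take the values between the extremes into account.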