Measures of central tendency and variability are the core tools for summarizing datasets into meaningful numbers. They give you a snapshot of what's typical in your data and how spread out the values are. Graphical displays like histograms and box plots let you see those patterns visually, making it easier to spot outliers and understand the overall shape of a distribution.
Measures of Central Tendency and Variability
Central Tendency
Central tendency gives you a single value that represents the "center" of your data. There are three main measures, and each one works best in different situations.
- Mean: Add up all the values and divide by how many there are. If five students scored 70, 80, 85, 90, and 95 on an exam, the mean is $\frac{70 + 80 + 85 + 90 + 95}{5} = \frac{420}{5} = 84$.
- Median: The middle value when you line up all data points from smallest to largest. With those same five scores, the median is 85 (the third value). For an even number of values, average the two middle ones.
- Mode: The value that appears most often. If the scores were 70, 80, 80, 90, 95, the mode is 80.
Choosing the right measure matters. The mean is sensitive to extreme values, so it works best with symmetric distributions. If the top score of 70, 80, 85, 90, and 95 were 200 (say, because of extra credit) instead of 95, the mean would jump from 84 to 105, but the median would stay at 85. That's why the median is preferred for skewed data, like household incomes in a city where a few very wealthy residents pull the mean upward. The mode is most useful for categorical data or for identifying the most common outcome.
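As a quick check on the three measures, here is a minimal Python sketch using the standard-library `statistics` module. The exam scores come from the examples above; the list with 200 in place of the top score is an assumption about how the extra-credit outlier enters the data.

```python
import statistics

scores = [70, 80, 85, 90, 95]
mean_score = statistics.mean(scores)      # sum of values divided by count
median_score = statistics.median(scores)  # middle value of the sorted list
mode_score = statistics.mode([70, 80, 80, 90, 95])  # most frequent value

# Assumption: the extreme score of 200 replaces the top score of 95.
with_outlier = [70, 80, 85, 90, 200]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print("mean, median, mode:", mean_score, median_score, mode_score)
print("with outlier -> mean:", mean_outlier, "median:", median_outlier)
```

The mean moves from 84 to 105 when the outlier is introduced, while the median stays at 85.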
Variability
Variability tells you how spread out the data is. Two classes could have the same average test score, but one might have scores clustered tightly around the mean while the other has scores all over the place.
- Range: Maximum value minus minimum value. Simple but easily distorted by a single outlier. If weekly temperatures ranged from 58°F to 82°F, the range is 24°F.
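- Variance: The average of the squared deviations from the mean. Denoted $\sigma^2$ for a population, where $\sigma^2 = \frac{1}{N}\sum (x_i - \mu)^2$, and $s^2$ for a sample, where $s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$ (the sample version divides by $n - 1$ rather than $n$). Squaring the deviations ensures negative and positive differences don't cancel out.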
- Standard deviation: The square root of the variance, denoted $\sigma$ for a population and $s$ for a sample. This brings the units back to the original scale, making it much easier to interpret than variance.
A larger standard deviation means the data points are more spread out from the mean. A smaller one means they're clustered closer together. For example, test scores with a standard deviation of 3 points are much more consistent than scores with a standard deviation of 15 points.
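The measures of spread can be computed the same way. This sketch reuses the five exam scores from the central tendency section as illustrative data and relies on the standard-library `statistics` module, where the `p`-prefixed functions treat the data as a full population and the unprefixed ones treat it as a sample.

```python
import statistics

scores = [70, 80, 85, 90, 95]

data_range = max(scores) - min(scores)   # maximum minus minimum

pop_var = statistics.pvariance(scores)   # divides by N (population)
sample_var = statistics.variance(scores) # divides by n - 1 (sample)

pop_sd = statistics.pstdev(scores)       # square root of pop_var
sample_sd = statistics.stdev(scores)     # square root of sample_var

print("range:", data_range)
print("population variance / SD:", pop_var, round(pop_sd, 2))
print("sample variance / SD:", sample_var, round(sample_sd, 2))
```

The squared deviations from the mean of 84 sum to 370, so the population variance is 370 / 5 = 74 and the sample variance is 370 / 4 = 92.5.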
Graphical Displays of Data
Histograms
Histograms show the distribution of a continuous variable by grouping values into intervals (called bins).
- The x-axis shows the range of values divided into bins (e.g., age groups 0–10, 10–20, 20–30, with each boundary value counted in only one bin).
- The y-axis shows the frequency or relative frequency of data points in each bin.
- The shape of the histogram reveals the distribution: bell-shaped (symmetric), skewed left or right, bimodal (two peaks), or uniform (roughly flat).
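To make the binning step concrete, here is a small sketch that counts values per bin and prints a text histogram. The ages are hypothetical, and assigning a boundary value such as 10 to the upper bin (10–20) is one common convention, not the only one.

```python
from collections import Counter

# Hypothetical ages, used only for illustration.
ages = [3, 7, 10, 12, 15, 18, 21, 24, 25, 29]
bin_width = 10

# Integer division maps each value to the lower edge of its bin,
# so a boundary value such as 10 lands in the 10-20 bin, not 0-10.
counts = Counter((age // bin_width) * bin_width for age in ages)

for lower in sorted(counts):
    label = f"{lower}-{lower + bin_width}"
    print(f"{label:>6} | {'#' * counts[lower]}")
```

Each row corresponds to one bar of the histogram, and the number of `#` marks is that bin's frequency.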

Box Plots
Box plots (also called box-and-whisker plots) summarize a distribution using five key statistics:
- Minimum: the smallest value (excluding outliers)
- Q1 (first quartile): the value below which 25% of the data falls
- Median (Q2): the middle value (50th percentile)
- Q3 (third quartile): the value below which 75% of the data falls
- Maximum: the largest value (excluding outliers)
The "box" spans from Q1 to Q3, capturing the middle 50% of the data. The interquartile range (IQR) is $Q3 - Q1$, and it's a useful measure of spread that is resistant to outliers. The whiskers extend from the box to the minimum and maximum non-outlier values, and points plotted beyond the whiskers are typically flagged as potential outliers.
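The sketch below computes the quartiles, the IQR, and the whisker ends for a small dataset. Two things here are assumptions rather than part of the text above: the scores are hypothetical, and outliers are flagged with the common 1.5 × IQR rule. Quartile conventions also vary between tools, so other software may report slightly different Q1 and Q3 values than `statistics.quantiles` with `method="inclusive"`.

```python
import statistics

# Hypothetical scores with one unusually high value.
data = [52, 60, 65, 70, 75, 80, 85, 90, 150]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: values more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# The whiskers end at the most extreme values that are not outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

With this data the box runs from 65 to 85, the whiskers reach 52 and 90, and 150 is flagged because it lies above the upper fence of 115.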
Analyzing Any Graph
When you look at a graphical display, focus on four things:
- Shape: Is it symmetric, skewed left, skewed right, bimodal, or uniform?
- Center: Where does the typical value fall (mean or median)?
- Spread: How variable is the data (range, IQR, standard deviation)?
- Outliers: Are there any data points far from the rest?
Scatter plots are another common display. They plot pairs of values to show the relationship between two continuous variables. Patterns in the scatter plot can reveal positive, negative, or no correlation.
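One way to put a number on the direction of a scatter plot pattern is the Pearson correlation coefficient, which the text above does not name; it is used here as an assumed choice. The function and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from each mean.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

# Hypothetical paired values.
x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # y increases as x increases
falling = [10, 8, 6, 4, 2]   # y decreases as x increases

r_rising = pearson_r(x_values, rising)
r_falling = pearson_r(x_values, falling)
print(round(r_rising, 3), round(r_falling, 3))
```

Values near +1 correspond to an upward-sloping cloud of points, values near -1 to a downward-sloping one, and values near 0 to no linear pattern.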
Z-Scores for Data Comparison
A z-score tells you how many standard deviations a data point is from the mean. It is calculated with:
$z = \frac{x - \mu}{\sigma}$
where $x$ is the data point, $\mu$ is the mean, and $\sigma$ is the standard deviation.
- A z-score of 0 means the value equals the mean.
- A positive z-score means the value is above the mean. A z-score of +2 means it's two standard deviations above.
- A negative z-score means the value is below the mean. A z-score of -1.5 means it's 1.5 standard deviations below.
The real power of z-scores is comparing values from different distributions. Say you scored 82 on a biology exam (mean 75, SD 5) and 78 on a chemistry exam (mean 70, SD 4). Which performance was better relative to the class?
- Biology z-score: $z = \frac{82 - 75}{5} = 1.4$
- Chemistry z-score: $z = \frac{78 - 70}{4} = 2.0$
Your chemistry score was actually more impressive relative to the class, even though the raw score was lower. That's what standardizing does: it converts everything to a common scale with a mean of 0 and a standard deviation of 1.
Larger absolute z-scores indicate values that are more unusual. A z-score of +3.2 or -2.8 would be quite far from the center of the distribution.
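The exam comparison can be reproduced in a few lines. The helper name `z_score` is hypothetical; the scores, means, and standard deviations are the ones given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

biology_z = z_score(82, mean=75, sd=5)
chemistry_z = z_score(78, mean=70, sd=4)

print("biology:", biology_z)
print("chemistry:", chemistry_z)

# The larger z-score marks the stronger performance relative to the class.
better = "chemistry" if chemistry_z > biology_z else "biology"
print("better relative performance:", better)
```

Because both results are on the same standardized scale, they can be compared directly even though the exams had different means and spreads.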
Population, Sample, and Descriptive Statistics
- A population is the entire group you're interested in studying (e.g., all registered voters in a state).
- A sample is a subset of that population that you actually collect data from (e.g., 500 randomly selected voters).
Descriptive statistics summarize and describe the features of a dataset. Everything covered in this unit, including measures of central tendency, variability, graphical displays, and z-scores, falls under descriptive statistics. They describe what the data looks like rather than making predictions or generalizations beyond the data.
A frequency distribution organizes data into categories or intervals and shows how often each value or range occurs. This is the foundation for building histograms and understanding how your data is distributed.
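A frequency distribution for a small dataset can be tabulated with `collections.Counter`. This sketch uses the scores from the mode example and also reports each value's relative frequency.

```python
from collections import Counter

scores = [70, 80, 80, 90, 95]
freq = Counter(scores)   # maps each value to how often it occurs
total = len(scores)

for value in sorted(freq):
    count = freq[value]
    print(f"{value}: {count} ({count / total:.0%})")

# The mode is the value with the highest frequency.
most_common_value, most_common_count = freq.most_common(1)[0]
```

Grouping continuous values into intervals before counting gives the table that a histogram is drawn from.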