🎲Intro to Statistics Unit 2 Review

Measures of central tendency give you a single number that represents the "typical" value in a dataset. The mean, median, and mode each approach this differently, and each has strengths depending on the shape of your data. Knowing which one to use (and why) is a core skill in descriptive statistics.

Calculation of Mean and Median

Mean

The mean is the arithmetic average: add up all the values and divide by how many there are.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

$\bar{x}$ is the mean
$\sum_{i=1}^{n} x_i$ is the sum of all values
$n$ is the total number of values

Example: For the dataset {4, 7, 9, 12, 15}, the mean is $\frac{4+7+9+12+15}{5} = 9.4$

The mean uses every value in the dataset, which makes it sensitive to outliers. If you added a value of 100 to that dataset, the mean would jump to 24.5, even though most values are still small.
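As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `mean` is just an illustrative choice; Python's standard library offers `statistics.mean` for the same job.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

data = [4, 7, 9, 12, 15]
print(mean(data))          # 9.4
print(mean(data + [100]))  # 24.5 (a single outlier drags the mean upward)
```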

A weighted mean is a variation where some values count more than others. For instance, if your final exam is worth 40% of your grade and homework is worth 10%, each score is multiplied by its weight, the products are added up, and the total is divided by the sum of the weights.
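The sketch below shows one way to compute a weighted mean. The scores (90 on homework, 80 on the final) are made up for illustration, and only the two grade components mentioned above are included, so the result is divided by the sum of those two weights rather than by 100.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical scores: homework 90 (weight 10), final exam 80 (weight 40).
scores = [90, 80]
weights = [10, 40]

print(weighted_mean(scores, weights))  # 82.0
print(sum(scores) / len(scores))       # 85.0 (unweighted mean, for comparison)
```

The weighted result sits closer to the final exam score because the final carries four times the weight of the homework.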

Median

The median is the middle value when you arrange the data in order from smallest to largest.

Finding the median:

Sort the data in ascending order.
If there's an odd number of values, the median is the single middle value.
If there's an even number of values, the median is the average of the two middle values.

Odd example: For {4, 7, 9, 12, 15}, the median is 9 (the third value out of five).

Even example: For {4, 7, 9, 12, 15, 18}, the median is $\frac{9+12}{2} = 10.5$

The median is much less affected by outliers than the mean. If you replaced 15 with 500 in the odd example, the median would still be 9, while the mean would jump from 9.4 to 106.4. This is why you'll often see median reported for things like household income, where a few extremely high earners would distort the mean.
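The steps for finding the median can be sketched in Python as follows. The function name `median` is an illustrative choice; the standard library's `statistics.median` behaves the same way for these inputs.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle pair

print(median([15, 4, 12, 7, 9]))       # 9
print(median([4, 7, 9, 12, 15, 18]))   # 10.5

# Replacing 15 with 500 leaves the median alone but moves the mean a lot.
with_outlier = [4, 7, 9, 12, 500]
print(median(with_outlier))                   # 9
print(sum(with_outlier) / len(with_outlier))  # 106.4
```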

Sample vs. Population Means

The formulas for sample and population means look almost identical, but the distinction matters because in practice you rarely have data from an entire population.

	Population Mean	Sample Mean
Symbol	$\mu$	$\bar{x}$
Formula	$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$	$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Uses data from	Every member of the population	A subset (sample) of the population
Example	Average height of all students in a school	Average height of a random sample of 100 students from that school

The population mean ( $\mu$ ) is a parameter: a fixed value that describes the entire population. You can calculate it directly only when you have data on every individual in the group you care about.

The sample mean ( $\bar{x}$ ) is a statistic that estimates $\mu$ . Since measuring an entire population is usually impractical, you collect a representative sample and use $\bar{x}$ as your best estimate. Different samples will give slightly different values of $\bar{x}$ , which is a concept you'll revisit later in the course.
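To see how $\bar{x}$ varies from sample to sample while $\mu$ stays fixed, here is a small simulation sketch. The population is entirely made up (1,000 simulated heights centered near 170 cm), and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 student heights in cm (simulated, not real data).
population = [rng.gauss(170, 10) for _ in range(1000)]
mu = statistics.mean(population)  # population mean: one fixed value

# Draw five random samples of 100 students each and compute x-bar for each.
sample_means = [statistics.mean(rng.sample(population, 100)) for _ in range(5)]

print(f"population mean: {mu:.2f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {x_bar:.2f}")
```

Each sample mean should land near the population mean without matching it exactly, which is the sample-to-sample variation described above.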


Mode and Bimodal Datasets

Mode

The mode is the value that appears most frequently in a dataset. Unlike the mean and median, the mode can be used with categorical data (like favorite color or brand preference), not just numerical data.

Unimodal: one mode. In {4, 7, 7, 9, 12, 15}, the mode is 7.
Bimodal: two modes. In {4, 7, 7, 9, 12, 12, 15}, the modes are 7 and 12.
Multimodal: more than two modes.
No mode: every value appears the same number of times. In {4, 7, 9, 12, 15}, there's no mode.
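The four cases above can be checked with a short Python sketch. The helper name `modes` is hypothetical, and it follows this guide's convention of reporting no mode when every distinct value ties. (The standard library's `statistics.multimode` would instead return all of the tied values.)

```python
from collections import Counter

def modes(values):
    """Return a list of the most frequent values, or [] if every value ties."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    # Convention used in this guide: if all distinct values tie, there is no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

print(modes([4, 7, 7, 9, 12, 15]))               # [7]
print(modes([4, 7, 7, 9, 12, 12, 15]))           # [7, 12]
print(modes([4, 7, 9, 12, 15]))                  # []
print(modes(["red", "blue", "red", "green"]))    # ['red'] (works for categorical data)
```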

Bimodal Datasets

A bimodal dataset has two distinct peaks, which often signals that two different groups are mixed together. For example, if you measured the heights of all adults in a room, you might see peaks around 163 cm and 178 cm, reflecting typical heights for women and men. Similarly, exam scores might cluster around 65 and 85 if the class has two distinct performance groups.

Recognizing a bimodal pattern is useful because reporting a single mean for bimodal data can be misleading. The mean might fall right between the two peaks, where very few data points actually sit.
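A small sketch makes the point concrete. The exam scores below are invented so that they cluster around 65 and 85, echoing the example above.

```python
# Hypothetical exam scores with two clusters, near 65 and near 85.
scores = [63, 64, 65, 65, 66, 67, 83, 84, 85, 85, 86, 87]

mean_score = sum(scores) / len(scores)

# Count how many scores fall within 5 points of the mean.
near_mean = [s for s in scores if abs(s - mean_score) <= 5]

print(mean_score)      # 75.0
print(len(near_mean))  # 0 (no student actually scored close to the mean)
```

Here the mean of 75 describes nobody in the class, so reporting the two peaks separately would give a more honest summary.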

Distribution and Central Tendency

The shape of a distribution affects how the mean, median, and mode relate to each other:

Symmetric distribution: For a symmetric distribution with a single peak, the mean, median, and mode are all approximately equal, located at the center.
Right-skewed (positively skewed): The tail stretches to the right. The mean gets pulled toward the tail, so typically: mode < median < mean.
Left-skewed (negatively skewed): The tail stretches to the left. The mean gets pulled toward the tail, so typically: mean < median < mode.
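The typical orderings for skewed data can be checked on small examples. Both datasets below are made up for illustration, and the left-skewed one is simply a mirror image of the right-skewed one. These orderings are tendencies rather than guarantees, so other skewed datasets may not follow them exactly.

```python
import statistics

# Hypothetical right-skewed data: most values are small, with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

# Mirror image (21 - x) gives a left-skewed dataset with a long tail to the left.
left_skewed = [21 - x for x in right_skewed]

def summarize(data):
    """Return (mean, median, mode) for a dataset with a single mode."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

r_mean, r_median, r_mode = summarize(right_skewed)
l_mean, l_median, l_mode = summarize(left_skewed)

print(r_mode < r_median < r_mean)  # True: mode < median < mean
print(l_mean < l_median < l_mode)  # True: mean < median < mode
```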

A practical rule: if the data is heavily skewed or contains outliers, the median is usually a better measure of center than the mean. That's why median home prices and median incomes are reported more often than means. For roughly symmetric data without major outliers, the mean works well and has useful mathematical properties for later statistical analysis.
