8.3 Mean, Median and Mode

June 18, 2024

Measures of central tendency help us understand the typical value in a dataset. The mode shows the most common value, the median represents the middle value, and the mean calculates the average. Each measure has its strengths and weaknesses.

These measures behave differently depending on the data's distribution. For symmetrical data, they're often similar. In skewed distributions, they can vary widely. Understanding when to use each measure is crucial for accurate data interpretation.

Measures of Central Tendency

Calculation of central tendency measures

  • Mode represents the most frequently occurring value in a dataset
    • Datasets can have no mode (no value appears more than once), one mode (unimodal), or multiple modes (bimodal or multimodal)
    • In grouped data, the modal class is the class interval with the highest frequency (age groups, income brackets)
  • Median is the middle value when the dataset is arranged in ascending or descending order
    • For an odd number of values, the median is the middle value (5, 7, 9 - median is 7)
    • For an even number of values, the median is the average of the two middle values (4, 6, 8, 10 - median is (6 + 8) / 2 = 7)
    • The median is the 50th percentile, dividing the data into two equal parts
  • Mean is the arithmetic average of all values in a dataset
    • Calculated by summing all values and dividing by the number of values (10, 20, 30, 40 - mean is (10 + 20 + 30 + 40) / 4 = 25)
    • Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $\bar{x}$ is the mean, $x_i$ are the individual values, and $n$ is the number of values
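
The calculations above can be checked with a short Python sketch. It uses the standard library's `statistics` module for the median and mean; the `modes` helper is a hypothetical function written for this illustration so that a dataset with no repeated value reports no mode, matching the definition in the list above.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return all modes; an empty list means no value repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 3, 4]))      # no mode
print(modes([1, 2, 3, 3, 4]))   # unimodal
print(modes([1, 1, 2, 3, 3]))   # bimodal
print(median([5, 7, 9]))        # odd count: the middle value
print(median([4, 6, 8, 10]))    # even count: average of 6 and 8
print(mean([10, 20, 30, 40]))   # (10 + 20 + 30 + 40) / 4
```

The three datasets passed to `modes` are made up for illustration; the median and mean inputs are the examples from the list.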

Comparison of central tendency measures

  • Mode is best used for categorical or qualitative data to identify the most common value or category (favorite color, most popular car brand)
  • Median is best used for skewed distributions, datasets with outliers, and ordinal data, where values have an order but no meaningful arithmetic (median income, median house price)
  • Mean is best used for symmetrical distributions and interval or ratio data, especially when comparing different datasets (average test scores, average height)
  • Mode and median are less affected by extreme values or outliers, making them more robust measures of central tendency compared to the mean
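
As a rough illustration of matching the measure to the data type, the sketch below takes the mode of categorical data and the median of ordinal data. The color list, the rating scale, and the ratings are all hypothetical. For the ordinal case it uses `statistics.median_low`, which always returns an actual data value rather than averaging two ranks.

```python
from statistics import median_low, mode

# Categorical data: there is no order, so only the mode is meaningful.
favorite_colors = ["blue", "red", "blue", "green", "blue", "red"]
most_common_color = mode(favorite_colors)

# Ordinal data: categories have an order but no arithmetic, so take the
# median of the ranks and map it back to a label.
scale = ["poor", "fair", "good", "excellent"]  # hypothetical rating scale
ratings = ["good", "poor", "excellent", "good", "fair", "fair"]
ranks = [scale.index(rating) for rating in ratings]
median_rating = scale[median_low(ranks)]

print(most_common_color)
print(median_rating)
```

With an even number of ratings, `median_low` picks the lower of the two middle ranks; `median_high` would pick the upper one. Either is a defensible choice for ordinal data.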

Effects of data distribution

  • Extreme values (outliers) have different effects on the mode, median, and mean
    • Mode is not affected by outliers (1, 2, 3, 3, 4, 100 - mode is still 3)
    • Median is less affected by outliers compared to the mean (1, 2, 3, 4, 100 - median is 3)
    • Mean is highly sensitive to outliers and can be pulled in the direction of the outlier (1, 2, 3, 4, 100 - mean is 22)
  • In a perfectly symmetrical distribution with a single peak, the mode, median, and mean are equal (bell curve)
    • For approximately symmetrical distributions, the mean is usually the best measure of central tendency
  • Skewed distributions have different relationships between the mode, median, and mean
    • Right-skewed (positively skewed): typically Mode < Median < Mean (income distribution)
    • Left-skewed (negatively skewed): typically Mean < Median < Mode (age distribution in a retirement community)
    • For skewed distributions, the median is often the best measure of central tendency, as it is less affected by the skewness
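
The effects described above can be seen numerically in the following sketch. The outlier dataset is the one used in the list; the baseline, the incomes (in thousands), and the ages are small hypothetical samples chosen only to show the typical ordering of the three measures.

```python
from statistics import mean, median, mode

# Outlier effect: replacing 5 with 100 moves the mean but not the median.
baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]
print(median(baseline), mean(baseline))
print(median(with_outlier), mean(with_outlier))

# Hypothetical right-skewed sample: a few large values stretch the right tail.
incomes = [20, 25, 25, 30, 35, 40, 120]
# Hypothetical left-skewed sample: a few small values stretch the left tail.
ages = [55, 70, 75, 80, 82, 85, 85]

for name, values in [("incomes", incomes), ("ages", ages)]:
    print(name, mode(values), median(values), round(mean(values), 2))
```

In the income sample the mean lands above the median and the mode; in the age sample the order reverses. Real data does not always follow this ordering exactly, so it is better treated as a rule of thumb.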

Measures of Variability

  • Range is the simplest measure of variability, calculated as the difference between the maximum and minimum values in a dataset
  • Quartiles divide the ordered dataset into four equal parts at Q1 (25th percentile), Q2 (the median, 50th percentile), and Q3 (75th percentile)
  • Standard deviation measures the typical distance of data points from the mean, providing insight into the spread of the data
  • Variance is the square of the standard deviation, representing the average squared deviation from the mean
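
A minimal sketch of these four measures using Python's `statistics` module follows. The dataset is hypothetical. It uses the population forms `pvariance` and `pstdev`, which divide by the number of values and so match the "average squared deviation" wording above; the sample forms `variance` and `stdev` divide by n - 1 instead. Quartile conventions differ between textbooks and software, so the Q1 and Q3 values from `quantiles` may not match a hand calculation that uses another rule.

```python
from statistics import median, pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset, mean is 5

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = quantiles(data, n=4)

# Population variance (average squared deviation) and standard deviation.
variance = pvariance(data)
std_dev = pstdev(data)

print(data_range)
print(q1, q2, q3)
print(variance, std_dev)
print(q2 == median(data))  # Q2 is the median
```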

Key Terms to Review

Arithmetic mean: The arithmetic mean is a measure of central tendency that represents the average of a set of numbers, calculated by summing all values and dividing by the number of values. It provides a single value that summarizes the overall magnitude of a dataset, making it useful for understanding trends and comparing different sets of data. This concept is particularly significant when examining sequences where the mean can reflect regular patterns or shifts in values.
Average (or mean) of a set: The average (or mean) of a set is the sum of all elements in the set divided by the number of elements. It provides a measure of central tendency, representing a typical value within the set.
Bell curve: A bell curve, also known as a normal distribution, is a graphical representation of data that shows how values are distributed around a central mean. It is characterized by its symmetrical shape, where most values cluster around the mean, and the probabilities for values taper off equally in both directions from the mean. This concept is crucial in understanding statistical measures like mean, median, and mode, as well as in determining variability through range and standard deviation.
Bimodal: Bimodal refers to a statistical distribution that has two different modes, which are the values that appear most frequently in a dataset. This characteristic indicates the presence of two distinct groups or peaks within the data, allowing for a deeper understanding of its structure and variability. Recognizing bimodal distributions is important for analyzing and interpreting data, as it can suggest multiple underlying processes or populations.
Central Tendency: Central tendency is a statistical measure that identifies a single value as representative of an entire dataset. It summarizes the data by providing a central point around which other values cluster, allowing for easier comparison and understanding of the overall distribution. This concept is fundamental in statistics and is commonly expressed through the mean, median, and mode, which each highlight different aspects of data interpretation.
Left-skewed: Left-skewed, or negatively skewed, refers to a distribution where the tail on the left side is longer or fatter than the right side. In this type of distribution, most data points are concentrated on the right side of the graph, causing the mean to be less than the median. Understanding this concept helps in analyzing data sets and interpreting the relationship between mean, median, and mode effectively.
Mathematical modeling: Mathematical modeling involves creating mathematical representations of real-world systems to analyze and predict their behavior. It is widely used in fields like medicine to understand complex biological processes and improve healthcare solutions.
Mean: The mean, commonly known as the average, is the sum of all values in a dataset divided by the number of values. It is a measure of central tendency that summarizes the dataset with a single central value and helps in understanding data distributions and making predictions, especially in contexts involving probability and real-world applications like economics or social sciences.
Median: The median is the middle value in a data set when the numbers are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle numbers. It divides the data into two equal halves, making it a useful measure of central tendency for skewed distributions, and it can be more informative than the mean when outliers or extreme values are present.
Modal class: The modal class is the category or interval in a frequency distribution that contains the highest frequency of data points. It helps identify where the most common values fall within a dataset, providing insight into trends and patterns. Understanding the modal class is essential for interpreting data effectively, as it highlights the most prevalent range of values in a grouped frequency distribution.
Mode: The mode is the value that appears most frequently in a data set. It is a measure of central tendency used to identify the most common value or category. Unlike the mean, which is influenced by every value in the data set, the mode reflects only the most common value. In distributions with multiple peaks, identifying the modes can help reveal trends and patterns.
Multimodal: Multimodal refers to a statistical distribution that exhibits more than one mode, meaning it has multiple peaks or values that occur with the highest frequency. This characteristic indicates that the data set can be grouped into different clusters, highlighting the presence of multiple subpopulations within the overall dataset. In statistics, recognizing a multimodal distribution can be crucial for accurate data analysis and interpretation, as it often suggests that different processes or factors may be influencing the outcomes being studied.
Negatively skewed: Negatively skewed refers to a distribution where the tail on the left side is longer or fatter than the right side. In such distributions, most of the data points cluster toward the higher end of the scale, creating a scenario where the mean is typically less than the median, which can significantly affect the interpretation of data in various contexts.
Outlier: An outlier is a data point that differs significantly from other observations in a dataset. It can skew the results and may indicate variability in measurement, experimental errors, or a novel phenomenon. Understanding outliers is crucial when interpreting data, as they can influence statistical measures like mean and can affect visual representations such as box plots and scatter plots.
Percentiles: Percentiles are statistical measures that indicate the relative standing of a value within a data set, representing the percentage of observations that fall below a certain value. Understanding percentiles helps in interpreting data distribution, particularly when comparing scores or values within a population. They play a crucial role in summarizing data by providing insights into its distribution and variability.
Positively skewed: Positively skewed refers to a distribution where most of the data points cluster towards the lower end of the scale, with a long tail extending towards the higher values. This means that the mean is typically greater than the median, reflecting that a few high values pull the average up. In this kind of distribution, the mode often appears to be less than both the median and the mean, indicating that there are more low values than high ones.
Quartiles: Quartiles are values that divide a data set into four equal parts, helping to understand the distribution of the data. The first quartile (Q1) is the median of the lower half of the data, the second quartile (Q2) is the median of the entire data set, and the third quartile (Q3) is the median of the upper half of the data. This division provides insights into how data points spread out and allows for a better understanding of measures like range and interquartile range.
Range: In statistics, the range is the difference between the maximum and minimum values in a dataset. It is the simplest measure of variability, but because it depends only on the two most extreme values, it is highly sensitive to outliers. (This differs from the range of a function, which is the set of all possible output values.)
Right-skewed: A right-skewed distribution is a type of probability distribution where the tail on the right side is longer or fatter than the left side. This means that most of the data points are concentrated on the left, with fewer higher values stretching out to the right. In such distributions, the mean is typically greater than the median, which can affect how we interpret central tendency in data sets.
Skewed Distribution: A skewed distribution is a type of probability distribution that is not symmetrical and tends to have a longer tail on one side than the other. This characteristic affects the mean, median, and mode of the data, leading to significant differences in how these measures of central tendency are interpreted. In a skewed distribution, the direction of the skew indicates whether the tail extends toward the higher or lower values, providing insight into the nature of the dataset.
Standard Deviation: Standard deviation is a statistical measure that quantifies the amount of variation or dispersion in a set of data values. It indicates how much individual data points deviate from the mean, helping to understand the distribution and spread of data. A low standard deviation means that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range. This concept is crucial for interpreting expected values, analyzing central tendencies like the mean, median, and mode, and assessing data distributions, including normal distributions.
Symmetrical distribution: Symmetrical distribution refers to a probability distribution where the left and right sides of the graph are mirror images of each other. This means that data values are evenly distributed around a central point, typically the mean, which also coincides with the median and mode in this type of distribution. Such characteristics make symmetrical distributions essential in understanding the properties of data sets and their averages.
Unimodal: Unimodal refers to a statistical distribution that has a single mode or peak, indicating that most of the data points cluster around one central value. This characteristic plays a vital role in understanding how data behaves and helps in determining measures like mean, median, and mode, which describe the center and spread of data sets.
Variance: Variance is a statistical measure that represents the degree of spread or dispersion of a set of values around their mean. It helps quantify how much the values in a data set deviate from the average, providing insight into the consistency and variability of the data. Understanding variance is essential in probability, distributions, and regression analysis as it influences predictions and expectations derived from data.
© 2024 Fiveable Inc. All rights reserved.