
1.9 Comparing Distributions of a Quantitative Variable

6 min read • December 29, 2022

Lusine Ghazaryan

Jed Quiaoit


We talked a lot about distributions and how to describe, summarize, and represent them in alternative formats; now it's time to put that into practice by comparing multiple sets of data. 🪑

Comparing Groups with Stem-and-Leaf Plots: Warm Up

Before we dive deeper into AP-style questions, which are more descriptive and comprehensive in nature, let's do a warm-up question using a familiar graphical method in statistics: stem plots! 🌳

Question: The weights of two groups of eight animals, Group M and Group N, are recorded, and the data are shown in the stem plots below (each stem is the tens digit and each leaf is the ones digit of a weight in kg). Use the stem plots to compare the weights of the animals in the two groups.

Group M:

1 | 4

2 | 3 4 8

3 | 2 6 8

4 |

5 | 0

Group N:

1 | 0

2 | 3 6

3 | 5

4 | 1

5 | 4 7

6 | 2

To compare the two groups, we can start with the spread of the data. From the stem plots, Group M has weights ranging from 14 to 50 kg (a range of 36 kg), while Group N has weights ranging from 10 to 62 kg (a range of 52 kg). Group N has the wider range, and three of its animals (54, 57, and 62 kg) are heavier than the heaviest animal in Group M.

We can also look at the distribution of the data within each group to see if there are any patterns or trends. For example, we can see that Group M has a cluster of values in the 20s and 30s, while Group N has a more even distribution of values throughout the range. This suggests that Group M has a higher proportion of animals that are relatively similar in weight, while Group N has a more diverse range of weights.

Overall, the stem plots show that Group N has a wider range of weights compared to Group M, with a more diverse distribution of weights within the group!
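To double-check the comparison, here is a minimal Python sketch that rebuilds both groups from the stem plots and computes a few summaries. The helper name `summarize` is just an illustrative choice, and the median is added as an extra measure of center that the discussion above does not use.

```python
from statistics import median

# Weights in kg, read directly from the stem plots (stem = tens, leaf = ones).
group_m = [14, 23, 24, 28, 32, 36, 38, 50]
group_n = [10, 23, 26, 35, 41, 54, 57, 62]

def summarize(weights):
    """Return (minimum, median, maximum, range) for a list of weights."""
    low, high = min(weights), max(weights)
    return low, median(weights), high, high - low

summary_m = summarize(group_m)
summary_n = summarize(group_n)
print("Group M:", summary_m)  # (14, 30.0, 50, 36)
print("Group N:", summary_n)  # (10, 38.0, 62, 52)
```

Group N has both the larger range (52 kg versus 36 kg) and the larger median (38 kg versus 30 kg), which agrees with what the plots show.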

Comparing Groups with Histograms: Practice AP-Style Problem

Records are kept by each state in the United States on the number of pupils enrolled in public schools and the number of teachers employed by public schools for each school year. From these records, the ratio of the number of pupils to the number of teachers (P-T ratio) can be calculated for each state. The histograms below show the P-T ratio for every state during the 2001–2002 school year. The histogram on the left displays the ratios for the 24 states that are west of the Mississippi River, and the histogram on the right displays the ratios for the 26 states that are east of the Mississippi River. 🏫

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-xcDnLqlSOjpS.JPG?alt=media&token=8ad909b1-1ddc-4118-ab9b-44b5a8cdc362

Source: The College Board (via AP Classroom)

a. The question asks us to estimate the median P-T ratio for each group (to estimate, not to compute). For states west of the Mississippi (n = 24), n/2 = 12, so the median falls between the 12th and 13th values in the ordered list, and both of these values fall in the interval 15–16. For states east of the Mississippi (n = 26), n/2 = 13, so the median falls between the 13th and 14th values in the ordered list, and both of these values also fall in the interval 15–16. So, for both groups, the median is at least 15 and at most 16 students per teacher.
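The position bookkeeping for a median can be sketched in a few lines of Python. The function names are illustrative, and the bin counts in `example_counts` are made up purely to show the mechanics; they are not read from the College Board histograms.

```python
def median_positions(n):
    """1-based positions of the ordered value(s) that determine the median."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)

def interval_containing(position, bin_labels, bin_counts):
    """Walk the cumulative counts until they reach the given 1-based position."""
    running = 0
    for label, count in zip(bin_labels, bin_counts):
        running += count
        if running >= position:
            return label
    raise ValueError("position exceeds the total count")

west_positions = median_positions(24)  # (12, 13)
east_positions = median_positions(26)  # (13, 14)

# Hypothetical histogram with 24 observations (NOT the real data).
example_labels = ["12-13", "13-14", "14-15", "15-16", "16-17"]
example_counts = [2, 3, 5, 6, 8]
example_bins = [
    interval_containing(p, example_labels, example_counts)
    for p in median_positions(sum(example_counts))
]
print(west_positions, east_positions, example_bins)
```

When both positions land in the same bin, as they do in this made-up example, the median must lie inside that bin, which is exactly the argument used for the 15–16 interval.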

b. Write a few sentences comparing the distributions of P-T ratios for states in the two groups (west and east) during the 2001–2002 school year.

Here, you address the three features of a distribution one by one: shape, center, and spread. Always start with shape. The shapes of the two histograms look different. The histogram for the West is unimodal and skewed to the right, whereas the histogram for the East is unimodal and nearly symmetric.

For the center, we already found in part (a) that the medians of the two distributions are about the same, between 15 and 16 for both groups.

Aaaaand finally, report the spread! Look at how the values are scattered or concentrated around the center on each histogram. The histograms show that the West values vary more than the East values. Although the data are grouped, we can still approximate the range. The range for the West is at most 22 – 12 = 10, and the range for the East is at most 19 – 12 = 7. The East has less variability than the West.
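The "at most" wording comes from only knowing the bin edges, not the individual values. A tiny Python sketch of that bound, using the edges quoted above (the function name is an illustrative choice):

```python
def max_possible_range(lowest_bin_start, highest_bin_end):
    """Upper bound on the range when only the histogram's bin edges are known."""
    return highest_bin_end - lowest_bin_start

west_bound = max_possible_range(12, 22)
east_bound = max_possible_range(12, 19)
print(west_bound, east_bound)  # 10 7
```

The true ranges could be smaller, because the smallest and largest values may sit anywhere inside their bins.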

c. Using your answers in parts (a) and (b), explain how you think the mean P-T ratio during the 2001–2002 school year will compare for the two groups (west and east).

The two distributions have different shapes. Since the West distribution is skewed to the right, its mean will be pulled toward the high values in the right tail and will be greater than its median. For the East, since the distribution is fairly symmetric, the mean will be close to the median. Because the two medians are about the same, we can conclude that the mean for the West group will probably be greater than the mean for the East group.
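To see why a right tail pulls the mean above the median, here is a small Python sketch. Both lists are made-up values for illustration only; they are not the states' actual P-T ratios.

```python
from statistics import mean, median

# Made-up data: same median, but one list has a longer right tail.
symmetric = [13, 14, 15, 15, 16, 16, 17, 18]
right_skewed = [13, 14, 15, 15, 16, 16, 19, 22]

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(right_skewed), median(right_skewed)

print(sym_mean, sym_median)    # 15.5 15.5
print(skew_mean, skew_median)  # 16.25 15.5
```

The two lists share a median of 15.5, yet stretching the two largest values raises the mean of the skewed list to 16.25. That is the same reasoning used to predict a larger mean for the West.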

Comparing Groups with Box Plots: Practice AP-Style Problem

A team of psychologists studied the concept of visualization in basketball, where players visualize making a basket before shooting the ball. They conducted an experiment in which 20 basketball players with similar abilities were randomly assigned to two groups. The 10 players in group 1 received visualization training, and the 10 players in group 2 did not. 🏀

Each player stood 22 feet from the basket at the same location on the basketball court. Each player was then instructed to attempt to make the basket until two consecutive baskets were made. The players who received visualization training were instructed to use visualization techniques before attempting to make the basket. The total number of attempts, including the last two attempts, was recorded for each player.

The total numbers of attempts for the 20 players are summarized in the following box plots.

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-svt0JEXz6TOi.JPG?alt=media&token=a4b1acbb-261e-477a-9405-2e4405128886

Source: The College Board

We have two groups, with 10 basketball players randomly assigned to each group.

We learn from the question that group 1 received visualization training but group 2 did not. There are a few things we can compare here to answer the question. Both groups have the same minimum number of attempts, while all other measures differ.

The first quartile is 3 attempts for group 1 and 4 attempts for group 2, so about 25% of the players in group 1 needed 3 or fewer attempts, compared with 4 or fewer in group 2.

Now, look at the median. The median is much lower for group 1 than for group 2. Group 1 has an outlier, which is still less than the maximum of group 2. It looks like the training had an impact on group 1, as every summary measure other than the minimum is lower than for group 2; however, we are not asked to generalize this finding yet.

Finally, to answer the question: it is good enough to report only the median. Because the median number of attempts for players who received visualization training (4) is less than the median number of attempts for players who did not receive training (7), those who received visualization training tend to need fewer attempts to make two consecutive baskets. Talk about talent and persistence! 🦘
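A box plot is built from a five-number summary, and outliers are commonly flagged with the 1.5 × IQR rule. The Python sketch below shows one way to compute both, assuming the median-of-halves convention for quartiles (other conventions give slightly different quartiles). The list `example_attempts` is made up for illustration; the study's raw data are not given here.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, leave the middle value out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Values beyond 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical attempt counts for 10 players (NOT the study's data).
example_attempts = [2, 3, 3, 4, 4, 4, 5, 6, 7, 12]
summary = five_number_summary(example_attempts)
flagged = outliers(example_attempts)
print(summary)  # (2, 3, 4.0, 6, 12)
print(flagged)  # [12]
```

In this made-up list the upper fence is 6 + 1.5 × 3 = 10.5, so the value 12 would be drawn as a separate point on the box plot rather than as the end of the whisker.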

Key Terms to Review (10)

Box Plots

: Box plots, also known as box-and-whisker plots, are graphical representations of a set of data that display the distribution and key statistical measures such as the median, quartiles, and outliers.

Center

: The center refers to the middle or average value of a data set. It represents the typical or central value around which the data tends to cluster.

Distributions

: Distributions refer to the way data is spread out or organized. It shows the frequency or probability of different values occurring in a dataset.

Histograms

: Histograms are graphical representations of data that use bars to show how many times each value or range of values occurs within a dataset.

Median

: The median is the middle value in a set of data when the values are arranged in order. It divides the data into two equal halves.

Outlier

: An outlier is a data point that significantly deviates from the rest of the dataset, either being unusually high or low. It is an observation that lies far away from most other observations.

P-T Ratio

: The P-T ratio, or pupil-to-teacher ratio, is the number of pupils enrolled in public schools divided by the number of teachers employed by public schools. In the histogram problem above, it is calculated for each state.

Shape

: In statistics, shape refers to the overall appearance or form of a distribution. It describes how the data is distributed and can be characterized by its symmetry, skewness, or modality.

Skewed to the right

: Skewed to the right refers to a distribution where the tail of the data points extends towards higher values, resulting in a longer right tail compared to the left tail.

Spread

: Spread refers to how much variability or dispersion exists within a data set. It measures how far apart the values are from each other.


© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

