
1.4 Representing a Categorical Variable with Graphs

December 28, 2022

Lusine Ghazaryan and Jed Quiaoit


You might recall from earlier that categorical variables can be represented using tables and/or graphs. This section will provide more context that'll equip us with the ability to eventually construct and describe numerical or graphical representations of data distributions. 👍

As for why graphs are big in statistics: graphs and statistics are powerful tools for understanding and summarizing data. Graphs can help you visualize the patterns and relationships in your data, and statistics can help you quantify and describe those patterns. By using both graphs and statistics, you can gain a deeper understanding of your data and communicate that understanding to others!

Bar Graphs

Bar charts (or bar graphs) are used to display frequencies (counts) or relative frequencies (proportions) for categorical data. The height or length of each bar in a bar chart corresponds to either the number or the proportion of observations falling within each category. 📊

To create a bar chart, you first need to decide on the categories you want to include. Each category corresponds to a separate bar on the graph. The height of each bar represents the frequency (count) or relative frequency of observations in that category. All the bars have the same width, and there is a gap between adjacent bars to distinguish them from each other. 📏

Translated into a step-by-step procedure, here's how we would create a bar chart:

  1. Determine the categories you want to include in the graph.

  2. Count the number of observations in each category.

  3. Mark the frequencies on the vertical axis and the categories on the horizontal axis.

  4. Draw the bars, with the height of each bar representing the frequency of the corresponding category.

  5. Add a title and axis labels to the graph to help interpret the data.
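To make steps 2 to 4 concrete, here's a minimal Python sketch. It assumes the raw data is just a list of category labels (the survey responses below are made up, not taken from the figure), counts each category with `collections.Counter`, and draws a rough text-only bar chart so it runs without a plotting library. The helper names `frequency_table` and `text_bar_chart` are hypothetical, and the bars run horizontally, since bar length works just as well as bar height.

```python
from collections import Counter


def frequency_table(observations):
    """Return (category, count, relative frequency) rows in first-seen order."""
    counts = Counter(observations)  # Counter keeps first-insertion order (Python 3.7+)
    n = len(observations)
    return [(category, count, count / n) for category, count in counts.items()]


def text_bar_chart(rows, width=40):
    """Draw a horizontal text bar chart; each bar's length is proportional to its count."""
    max_count = max(count for _, count, _ in rows)
    lines = []
    for category, count, rel in rows:
        bar = "#" * round(width * count / max_count)
        lines.append(f"{category:<10} {bar} {count} ({rel:.0%})")
    return "\n".join(lines)


# Made-up survey responses (hypothetical data for illustration only).
responses = ["High", "Low", "Moderate", "High", "Moderate",
             "High", "None", "Moderate", "High", "Low"]

rows = frequency_table(responses)
print(text_bar_chart(rows))
```

Because the relative frequencies are printed next to each count, the same table also gives you a relative-frequency bar chart: the bars would keep the same shape, only the axis scale changes.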

It's important to choose an appropriate and consistent scale for the vertical axis. You should also consider adding a legend to the graph if you have multiple series of data that you want to compare.

To keep it short, here is the bar chart of stress on the job. We can also use relative frequencies or percentages to construct the bar chart. You can be creative and give each category a different color; this makes the chart more visually attractive and the categories easier to compare.

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-jdsIRDnpdW2j.JPG?alt=media&token=e049a57b-3b5d-4c78-a7ab-c5c22087177b

Source: Prem S. Mann: Introductory Statistics. John Wiley and Sons Inc. 2020

Pie Charts

A pie chart is a circular graph that is divided into slices, with each slice representing a different category. The size of each slice is proportional to the fraction of the whole that is represented by that category. Pie charts are often used to show the relative proportions of different categories within a dataset. 🥧

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-zHiC0k75UEDf.webp?alt=media&token=d07994d2-c840-49bc-998c-b3af4d39a1d9

To create a pie chart, keep the following steps in mind:

  1. Determine the categories you want to include in the pie chart. (Example: commuter, non-commuter)

  2. Calculate the fraction of the whole that is represented by each category. (Example: out of 50 respondents, the 30 commuters would occupy 3/5 of the pie, while the 20 non-commuters would occupy the remaining 2/5.)

  3. Draw a circle and divide it into slices that are proportional to the fractions calculated in step 2.

  4. Label each slice with the corresponding category and the percentage it represents.

  5. Add a title to the pie chart to help interpret the data.
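Here's a quick sketch of steps 2 and 3 using the commuter example: each category's fraction of the whole, multiplied by the 360 degrees of a full circle, gives the angle of its slice. The `pie_slices` helper is a hypothetical name, and `fractions.Fraction` is used only so the fractions print exactly as 3/5 and 2/5.

```python
from fractions import Fraction


def pie_slices(counts):
    """Return {category: (fraction of the whole, slice angle in degrees)}."""
    total = sum(counts.values())
    slices = {}
    for category, count in counts.items():
        fraction = Fraction(count, total)                     # exact share, e.g. 3/5
        slices[category] = (fraction, float(fraction * 360))  # share of the full circle
    return slices


# The commuter example from step 2: 30 commuters and 20 non-commuters out of 50.
for category, (fraction, angle) in pie_slices({"Commuter": 30, "Non-commuter": 20}).items():
    print(f"{category}: {fraction} of the pie, {angle:.0f} degrees, {float(fraction):.0%}")
# Commuter: 3/5 of the pie, 216 degrees, 60%
# Non-commuter: 2/5 of the pie, 144 degrees, 40%
```

The angles always add up to 360 degrees, which is a handy check that no category was left out.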

Keep in mind that pie charts are best used to compare the relative proportions (percentages or relative frequencies, for example) of different categories. They're not as effective at showing precise values or small differences between categories. If you want to show detailed values or compare the values of many categories, it is usually better to use a different type of graph, such as a bar chart.

💡 Tips:

  • The choice between a bar chart and a pie chart depends on how many categories the variable of interest has and on their relative sizes. Whenever you have many categories, or a few categories with about the same frequencies, the bar chart should be your first choice: if the pie has many slices or slices of nearly the same size, it will be hard to compare the groups.

  • Be careful of quantity distortions and respect the area principle: the area taken up by each bar or slice should be proportional to the quantity it represents.
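The area principle is easy to check with a little arithmetic. In this sketch (the function name is hypothetical), a quantity that is twice as large gets a bar twice as tall, so its area doubles too. But if a picture or icon is enlarged by the same factor in both width and height, its area grows by the square of that factor, so the larger category looks four times as big instead of twice as big.

```python
def apparent_area_ratio(true_ratio, scale_both_dimensions):
    """Ratio of the areas a reader sees when one quantity is `true_ratio` times another."""
    if scale_both_dimensions:
        # Picture enlarged in width AND height: area grows with the square of the ratio.
        return true_ratio ** 2
    # Equal-width bars: only the height changes, so area grows in proportion.
    return true_ratio


# Hypothetical case: one category has twice the frequency of another.
print("Bar chart:", apparent_area_ratio(2, scale_both_dimensions=False))   # Bar chart: 2
print("Scaled icon:", apparent_area_ratio(2, scale_both_dimensions=True))  # Scaled icon: 4
```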

Contingency Table (Two-Way Table)

Now that we know how to represent data in tables and charts, let's add one more character to the tables gang to keep things evenly balanced!

A contingency table is a type of table that is used to organize and (later on) analyze categorical data. It shows how the observations in a dataset are distributed among different categories of two or more variables. Contingency tables can help in understanding relationships between variables and identifying patterns or trends in the data. 🎨

To create a contingency table, you'll have to:

  1. Determine the variables you want to include in the table.

  2. Count the number of observations in each combination of categories (one category from each variable).

  3. Organize the counts in a table, with each row representing a category of one variable and each column representing a category of the other variable.

  4. Add row and column totals to the table. (This step is the easiest to forget!)

  5. Analyze the table to identify any patterns or trends in the data. (This is important when establishing context and responding to Multiple Choice and Free Response Questions on the AP exam!)
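Steps 2 to 4 can be sketched in Python like this. The class-year and commuter data below are made up for illustration, and `two_way_table` / `print_table` are hypothetical helper names. The code counts each (row, column) combination with `collections.Counter`, lays the counts out with one row per category of the first variable, then adds the row, column, and grand totals from step 4.

```python
from collections import Counter


def two_way_table(pairs):
    """Count each (row category, column category) pair and add row/column totals."""
    cell_counts = Counter(pairs)  # step 2: count every combination of categories
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Step 3: one row per category of the first variable, one column per category of the second.
    table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}
    # Step 4: row totals, column totals, and the grand total.
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = len(pairs)
    return table, rows, cols


def print_table(table, rows, cols):
    """Print the table with right-aligned columns."""
    header = ["", *cols, "Total"]
    print("".join(f"{h:>14}" for h in header))
    for r in [*rows, "Total"]:
        print(f"{r:>14}" + "".join(f"{table[r][c]:>14}" for c in [*cols, "Total"]))


# Made-up data: (class year, commuter status) for 50 students.
pairs = ([("Junior", "Commuter")] * 12 + [("Junior", "Non-commuter")] * 8
         + [("Senior", "Commuter")] * 18 + [("Senior", "Non-commuter")] * 12)

table, rows, cols = two_way_table(pairs)
print_table(table, rows, cols)
```

One caveat of this sketch: a category literally named "Total" would clash with the totals row and column.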

Comparing the raw cell counts isn't enough, because the row or column totals may differ. Instead, compare the conditional distributions, such as the row percentages. If the conditional distributions are the same for every category, there is no association between the variables, which suggests they are independent. If the conditional distributions differ (with some categories having noticeably higher proportions than others), then the variables might be related. For example, if you are analyzing data on the relationship between gender and income, you might find that the proportions of men and women in different income categories are different, indicating some sort of relationship between the two variables.
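To see why proportions (not raw counts) are what matter, here's a short sketch that turns each row of counts into row proportions, i.e., the conditional distribution of the column variable for each row category. The counts are made up, and `conditional_distributions` / `same_distribution` are hypothetical helper names. In the first table, seniors outnumber juniors, yet 60% of each group commutes, so there's no association; in the second, the percentages differ, which suggests an association. With real sample data the percentages will rarely match exactly, so in practice you'd judge whether they're roughly the same.

```python
def conditional_distributions(table):
    """For each row category, divide each count by that row's total (row proportions)."""
    result = {}
    for row, counts in table.items():
        row_total = sum(counts.values())
        result[row] = {col: count / row_total for col, count in counts.items()}
    return result


def same_distribution(dists, tol=1e-9):
    """True if every row has the same conditional distribution (within a tiny tolerance)."""
    first, *rest = dists.values()
    return all(abs(d[col] - first[col]) <= tol for d in rest for col in first)


# Made-up counts (rows: class year, columns: commuter status), totals omitted.
no_association = {
    "Junior": {"Commuter": 12, "Non-commuter": 8},
    "Senior": {"Commuter": 18, "Non-commuter": 12},
}
association = {
    "Junior": {"Commuter": 15, "Non-commuter": 5},
    "Senior": {"Commuter": 9, "Non-commuter": 21},
}

for name, table in [("Table 1", no_association), ("Table 2", association)]:
    dists = conditional_distributions(table)
    for row, dist in dists.items():
        pcts = ", ".join(f"{col} {p:.0%}" for col, p in dist.items())
        print(f"{name} {row}: {pcts}")
    print(f"{name} same conditional distributions? {same_distribution(dists)}")
```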


Real-Life Applications: To Trust or Not To Trust a Bar/Pie Chart?

Chances are, you've seen a bar chart or pie chart in some shape or form before, whether in the news, the media you consume, or even other textbooks. It's important to remember that they shouldn't be taken at face value, as they can easily be misused. To help you judge whether a bar or pie chart is reliable, here are some common ways they are misused:

  • Using bar/pie charts to compare variables on different scales: These charts are best used to compare categories or groups that are on the same scale. If you are comparing variables that are on different scales, it can be difficult to accurately compare the sizes of the bars or pie slices.

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-AikDQ6iXfeun.png?alt=media&token=38bc6b36-fe1e-4e31-9ec0-9cb8502fe12b

Source: Infogram

  • Using bar/pie charts to show continuous data: These charts are meant for categorical data, not continuous data. If you have continuous data, it is usually better to use a different type of graph, such as a histogram, line graph, or scatterplot.

  • Using bar/pie charts to show small differences: These charts (pie charts especially) are not very effective at showing small differences between categories. If the differences between the categories are small, it may be difficult to interpret the graph accurately.

  • Using bar/pie charts to show trends over time: These charts are not well suited to showing trends over time. For this purpose, it is usually better to use a line graph or a time series plot.

  • Using bar/pie charts to show more than two variables: These charts typically display one or two categorical variables. If you want to show more than two variables, it is usually better to use a different type of graph. The example below compares A, B, and C; here, you can see that it might make more sense to use a bar chart than a pie chart.

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FPiecharts-JQZrMIv6JO1z.png?alt=media&token=df58733b-3fa5-4a91-b190-3bf32bea7c45

Source: Wikipedia

  • Using bar/pie charts to create a false impression of size: Truncated bar charts (bar charts whose vertical axis doesn't start at 0) can be misleading if the truncation is not clearly labeled or if it distorts the data. For example, if the axis is cut off at an arbitrary value, the differences between categories can look much larger than they really are. See the example below and notice how the difference between 2010 and 2011 is more noticeable in the truncated bar chart (left) than in the full bar chart (right).

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-QmbA4KiKbskt.png?alt=media&token=099f16ae-ae53-48b6-974e-475e898031ea

Source: Wikipedia
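The distortion from a truncated axis comes down to simple arithmetic: when the vertical axis starts at some baseline above 0, each bar is drawn only as tall as its value minus the baseline. This sketch uses made-up values (not the numbers in the truncated-axis figure) and hypothetical function names to show how a 5% difference can look like a 100% difference.

```python
def drawn_height(value, baseline=0):
    """Height of a bar as drawn when the vertical axis starts at `baseline`."""
    return value - baseline


def apparent_ratio(a, b, baseline=0):
    """How many times taller bar b looks than bar a on an axis starting at `baseline`."""
    height_a = drawn_height(a, baseline)
    height_b = drawn_height(b, baseline)
    if height_a <= 0 or height_b <= 0:
        raise ValueError("baseline must be below both values")
    return height_b / height_a


# Made-up values for two years.
a, b = 100, 105
print(f"Actual ratio:         {b / a:.2f}")                              # 1.05
print(f"Axis starting at 0:   {apparent_ratio(a, b):.2f}")               # 1.05
print(f"Axis starting at 95:  {apparent_ratio(a, b, baseline=95):.2f}")  # 2.00
```

The actual values differ by only 5%, but with the axis starting at 95 the second bar is drawn twice as tall as the first.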

In today's age where misinformation can easily and quickly spread, it's very important to choose the appropriate type of graph for your data and the message you want to convey. Carefully considering the limitations of each type of graph can help you avoid misusing (or mistrusting) them! 🤨

Key Terms to Review (11)

Bar charts: Bar charts are graphical representations of categorical data using rectangular bars. The height or length of each bar corresponds to the frequency or relative frequency of a particular category.

Bar Graph: A bar graph is used to display categorical data using rectangular bars whose lengths represent the frequencies or counts for each category. The categories are usually displayed on the x-axis, while the heights or lengths represent the corresponding frequencies on the y-axis.

Bar Graphs: Bar graphs are a type of graphical representation that use rectangular bars to represent different categories or groups. They are commonly used to compare categorical data by showing the frequency or proportion of each category.

Categorical Variables: Categorical variables represent characteristics or qualities that fall into distinct categories or groups. They cannot be measured numerically but instead provide information about group membership or classification.

Contingency Table (Two-Way Table): A contingency table, also known as a two-way table, is a tabular representation of categorical data that shows how two variables are related. It displays frequencies or counts for different combinations of categories from both variables.

Frequency Table: A frequency table is a tabular representation that displays the number of times each value or category occurs in a dataset. It organizes data into groups or intervals and shows their corresponding frequencies.

Graphical representations: Graphical representations are visual tools used to display data in a clear and organized manner. They help us understand patterns and relationships within the data.

Pie Chart: A pie chart is a circular graph divided into sectors, where each sector represents a category and its size corresponds to the proportion or percentage of data values in that category.

Pie Charts: Pie charts are circular graphs that represent data as slices of a whole. Each slice represents a category or group, and the size of each slice corresponds to the proportion or percentage of data it represents.

Relative Frequency Table: A relative frequency table is a table that shows the proportion or percentage of data values that fall into different categories or intervals.

Two-Way Table: A two-way table organizes categorical data for two variables and shows how they are related to each other.



© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

