Categorical Data

Categorical data records which group or category each individual falls into (like grade level, yes/no, or car type) rather than a numerical measurement, and on the AP Stats exam it's summarized with counts and proportions and analyzed with z-procedures for proportions and chi-square tests.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is Categorical Data?

Categorical data is data that puts each individual into a group. Think eye color, political party, "passed or failed," or favorite streaming service. You're not measuring an amount; you're recording a label. That's the whole distinction from quantitative data, where each value is a number you could average.

Here's why the label matters so much in AP Stats: the type of variable decides everything that comes after. With categorical data you summarize using counts and proportions (not means), you display it with bar charts and two-way tables (not histograms), and you run inference with z-procedures for proportions (Unit 6) or chi-square tests (Unit 8), never a t-test for means. One careful note about disguises: a variable like zip code looks numerical but is actually categorical, because the numbers are just labels and averaging them would be meaningless.
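To make "counts and proportions" concrete, here is a minimal Python sketch. The survey responses are made up for illustration, and `summarize` is a hypothetical helper name, not something from the course materials.

```python
from collections import Counter

# Hypothetical sample: favorite streaming service for 10 students (made-up data).
responses = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

def summarize(values):
    """Return {category: (count, proportion)} for one categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}

summary = summarize(responses)
for category, (count, proportion) in sorted(summary.items()):
    print(f"{category}: count={count}, proportion={proportion:.2f}")
```

Notice that nothing here computes a mean: the labels "A", "B", and "C" have no average, only a tally and a share of the total.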

Why Categorical Data matters in AP Statistics

Categorical data is one of the two pillars the entire AP Stats course is built on. Topic 2.1 (AP Stats 2.1.A) asks you to identify questions about relationships in data, and whether your variables are categorical or quantitative determines which relationship tools apply. In Unit 6, learning objectives AP Stats 6.2.A through 6.2.E all start from one categorical variable, leading to a one-sample z-interval for a proportion with margin of error z*√(p̂(1-p̂)/n). In Unit 8, AP Stats 8.1.A frames the big question (is the gap between observed and expected counts random or real?), and AP Stats 8.5.A through 8.5.C have you set up chi-square tests for homogeneity or independence on two-way tables of categorical data. Topic 7.10 focuses on choosing the right inference procedure, and "categorical or quantitative?" is the very first fork in that decision tree.
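The margin-of-error formula above can be sketched in a few lines of Python. This is an illustration under assumptions, not an official procedure: the function name `one_sample_z_interval` and the sample numbers (120 successes out of 200) are made up, and the critical value z* comes from the standard normal distribution in Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_interval(successes, n, confidence=0.95):
    """One-sample z-interval for a proportion: p_hat +/- z* sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    # z* is the standard normal quantile that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 120 "yes" responses out of 200.
low, high = one_sample_z_interval(120, 200)
print(f"95% interval: ({low:.3f}, {high:.3f})")  # roughly (0.532, 0.668)
```

The only inputs are a count of successes and a sample size, which is exactly what one categorical variable with two outcomes gives you.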

How Categorical Data connects across the course

Quantitative Data (Units 1 and 7)

This is the other half of the variable-type split, and it's the most important contrast in the course. Categorical variables get proportions and chi-square; quantitative variables get means and t-procedures. Misclassify the variable and every later choice falls apart.

Contingency Table / Two-Way Table (Units 2 and 8)

Two categorical variables crossed together make a two-way table. In Unit 2 you read it descriptively with marginal and conditional proportions; in Unit 8 that same table becomes the raw material for a chi-square test (AP Stats 8.5.B).
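As a rough sketch of the Unit 2 reading of a two-way table, the Python below computes marginal and conditional proportions. The table counts and the function names are invented for illustration.

```python
# Hypothetical two-way table (made-up counts): rows = grade level, columns = answer.
table = {
    "Junior": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 15, "No": 35},
}

def marginal_proportions(table):
    """Proportion of all individuals that fall in each column category."""
    total = sum(sum(row.values()) for row in table.values())
    columns = list(next(iter(table.values())).keys())
    return {col: sum(row[col] for row in table.values()) / total for col in columns}

def conditional_proportions(table):
    """Within each row group, the proportion that falls in each column category."""
    return {
        row_label: {col: count / sum(row.values()) for col, count in row.items()}
        for row_label, row in table.items()
    }

marginal = marginal_proportions(table)        # "Yes" is 45/100 = 0.45 overall
conditional = conditional_proportions(table)  # "Yes" is 0.60 for juniors, 0.30 for seniors
print(marginal)
print(conditional)
```

The marginal proportion ignores the row variable entirely, while the conditional proportions compare groups, which is the comparison a chi-square test later formalizes.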

Confidence Interval for a Proportion (Unit 6)

One categorical variable with two outcomes (success/failure) is what makes a proportion possible: the proportion is simply the fraction of individuals in the "success" category. AP Stats 6.2.A says the appropriate procedure here is the one-sample z-interval for a proportion, a procedure that applies only because the data is categorical.

Bar Chart (Unit 1)

Bar charts are the standard display for categorical data, with gaps between bars because the categories aren't on a number line. If you catch yourself reaching for a histogram, your data probably isn't categorical.

Is Categorical Data on the AP Statistics exam?

AP Stats tests categorical data less by asking "define it" and more by making you choose correctly because of it. Multiple-choice stems describe a scenario and ask which test or interval fits, like "Which test is appropriate for comparing distributions in two-way tables of categorical data?" The trap answers are usually t-procedures, which only work for quantitative data. You'll also see scenarios asking which situations a chi-square test for homogeneity or independence can NOT handle (hint: anything comparing means). On the FRQ side, inference questions award points for naming the correct procedure and checking conditions, so identifying the variable as categorical is literally where your answer starts. For proportions, that means checking randomness, the 10% condition when sampling without replacement, and large counts (AP Stats 6.2.B); for chi-square, it means checking expected counts (AP Stats 8.5.C).
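The three conditions for a proportion can be written as a small checklist. The sketch below is hypothetical: the function name is invented, and the thresholds (at least 10 successes and 10 failures for large counts, sample no more than 10% of the population) are the commonly taught rules of thumb, assumed here because the text above does not state them.

```python
def proportion_conditions(successes, n, population_size=None, random_sample=True):
    """Check the usual conditions for a one-sample z-interval for a proportion."""
    failures = n - successes
    return {
        # Randomness comes from the study design; it cannot be computed from the data.
        "random": random_sample,
        # 10% condition: only relevant when sampling without replacement from a
        # population of known size; treated as satisfied if no size is given (assumption).
        "ten_percent": population_size is None or n <= 0.10 * population_size,
        # Large counts: n * p_hat and n * (1 - p_hat) are just the observed counts.
        "large_counts": successes >= 10 and failures >= 10,
    }

good = proportion_conditions(120, 200, population_size=5000)
bad = proportion_conditions(4, 50, population_size=5000)
print(good)
print(bad)  # large_counts fails: only 4 successes
```

On an FRQ you would state each condition in context rather than compute it, but the logic is the same.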

Categorical Data vs Quantitative Data

Categorical data assigns labels (group membership); quantitative data assigns numerical measurements where arithmetic makes sense. The quick test is whether the average is meaningful. Average height? Meaningful, so quantitative. Average zip code? Nonsense, so categorical even though it's written with digits. This single distinction routes you to proportions and chi-square versus means and t-tests.
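The "is the average meaningful?" test can be illustrated with a short Python sketch. The heights and zip codes below are made up, and storing zip codes as strings is a deliberate choice here so that nobody averages them by accident.

```python
from collections import Counter
from statistics import mean

# Made-up data for illustration.
heights_cm = [160.0, 172.5, 168.0, 181.5]          # quantitative: arithmetic makes sense
zip_codes = ["10001", "94105", "10001", "60614"]   # categorical: digits used as labels

mean_height = mean(heights_cm)   # 170.5, a meaningful summary
zip_counts = Counter(zip_codes)  # counts per label, the appropriate summary

print(f"Mean height: {mean_height} cm")
print(f"Zip code counts: {dict(zip_counts)}")
```

Converting the zip codes to integers and averaging them would run without error, but the result would not describe any real place, which is the sign that the variable is categorical.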

Key things to remember about Categorical Data

  • Categorical data places each individual into a group or label, like yes/no or vehicle type, instead of recording a numerical measurement.

  • Categorical variables are summarized with counts and proportions, never with means or standard deviations.

  • One categorical variable with two outcomes leads to inference for a proportion, including the one-sample z-interval p̂ ± z*√(p̂(1-p̂)/n) from Topic 6.2.

  • Two categorical variables go into a two-way table, and the chi-square tests for homogeneity and independence (Topic 8.5) check whether a variable's distribution differs across groups or whether the two variables are associated.

  • Numbers can still be categorical if they're just labels, so always ask whether averaging the values would mean anything before classifying a variable.

  • On the exam, correctly identifying data as categorical is the first step in picking the right inference procedure, which is exactly the skill Topic 7.10 targets.

Frequently asked questions about Categorical Data

What is categorical data in AP Stats?

Categorical data records which group or category each individual belongs to, like eye color, political party, or pass/fail. You summarize it with counts and proportions and analyze it with z-procedures for proportions (Unit 6) or chi-square tests (Unit 8).

Is data with numbers always quantitative?

No. If the numbers are just labels, like zip codes, jersey numbers, or area codes, the data is categorical because doing arithmetic on the values would be meaningless. Ask whether the average of the values makes sense; if not, it's categorical.

What's the difference between categorical and quantitative data?

Categorical data assigns group labels; quantitative data assigns numerical measurements you can meaningfully add and average. The split decides your whole analysis: proportions, bar charts, and chi-square for categorical versus means, histograms, and t-procedures for quantitative.

Can you use a t-test on categorical data?

No. T-tests are inference procedures for means, and categorical data has no meaningful mean. For categorical data you use z-procedures for proportions (one or two samples) or chi-square tests for two-way tables, which is why "categorical or quantitative?" is the first question when choosing a procedure.

Which chi-square test do I use for categorical data in a two-way table?

Per AP Stats 8.5.B, use the test for homogeneity when comparing the distribution of one categorical variable across multiple populations or treatments, and the test for independence when checking whether two categorical variables are associated within a single population. The math is identical; the sampling design and hypotheses differ.
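Because the math is the same for both tests, one sketch covers both. The Python below computes expected counts as (row total × column total) / grand total and then the chi-square statistic, the sum of (observed - expected)² / expected over all cells. The observed counts and function names are made up for illustration, and turning the statistic into a p-value is left to a table or calculator.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical observed counts: rows = two groups, columns = Yes / No.
observed = [[30, 20], [15, 35]]
expected = expected_counts(observed)   # [[22.5, 27.5], [22.5, 27.5]]
statistic, df = chi_square_statistic(observed)
print(f"chi-square = {statistic:.2f}, df = {df}")  # chi-square = 9.09, df = 1
```

The expected counts computed here are also the values you inspect when checking the chi-square conditions.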