11.8 Lab 2: Chi-Square Test of Independence

Written by the Fiveable Content Team • Last updated August 2025


Chi-Square Test for Categorical Relationships

The chi-square test of independence tells you whether two categorical variables are related or just appear that way by chance. Categorical variables have two or more categories with no inherent order, like gender, race, or political affiliation.

The hypotheses are straightforward:

  • Null hypothesis ($H_0$): The two categorical variables are independent (no relationship).
  • Alternative hypothesis ($H_a$): The two categorical variables are dependent (there is a relationship).

If the null hypothesis is true, the observed frequencies in your data should be close to what you'd expect by chance alone. The test works by measuring how far your observed data strays from those expected values.

Steps to conduct the test:

  1. Build a contingency table with the observed frequencies for each combination of the two variables.
  2. Calculate the expected frequency for every cell: $E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$
  3. Calculate the chi-square test statistic by summing across all cells: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ is the observed frequency and $E$ is the expected frequency.
  4. Find the degrees of freedom: $\text{df} = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$
  5. Compare your test statistic to the critical value from the chi-square distribution table at your chosen significance level (usually $\alpha = 0.05$):
    • If $\chi^2 >$ critical value → reject $H_0$ and conclude there's a significant association.
    • If $\chi^2 <$ critical value → fail to reject $H_0$. There isn't enough evidence to claim an association.
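The steps above can be sketched in plain Python. The 2×3 table below is hypothetical, and the critical value 5.991 is the standard table value for $\text{df} = 2$ at $\alpha = 0.05$:

```python
# Hypothetical 2x3 contingency table of observed frequencies
# (e.g. rows = two groups, columns = three response categories).
observed = [
    [20, 30, 25],
    [30, 15, 30],
]

# Step 1 gave us the table; steps 2-5 follow.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Step 2: expected frequency for each cell,
# E = (row total) * (column total) / (grand total).
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Step 3: chi-square statistic, summed over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Step 4: degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Step 5: compare to the table critical value (5.991 for df = 2, alpha = 0.05).
critical_value = 5.991
reject_h0 = chi_sq > critical_value
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_h0}")
# -> chi-square = 7.455, df = 2, reject H0: True
```

Here the statistic exceeds the critical value, so for this made-up table you would reject $H_0$.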

Interpretation of Chi-Square Results

The p-value is the probability of getting a chi-square statistic as extreme as (or more extreme than) what you observed, assuming $H_0$ is true.

  • A small p-value (typically < 0.05) means strong evidence against $H_0$. You conclude the variables are associated.
  • A large p-value (> 0.05) means weak evidence against $H_0$. You can't conclude the variables are associated.

Degrees of freedom affect which chi-square distribution you use and therefore which critical value you compare against. A 3×4 contingency table, for example, gives you $\text{df} = (3-1)(4-1) = 6$.

When reporting results, always include:

  • The chi-square test statistic ($\chi^2$)
  • Degrees of freedom (df)
  • The p-value
  • Your decision: reject or fail to reject $H_0$

For example, you might write: "A chi-square test of independence showed a significant association between political affiliation and opinion on the policy, $\chi^2(2) = 11.34$, $p = 0.003$."
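In practice you rarely compute the statistic and p-value by hand. One common route, assuming SciPy is installed, is `scipy.stats.chi2_contingency`, which returns the statistic, p-value, degrees of freedom, and expected counts in one call. The counts below are invented to mirror the reporting example (affiliation in rows, opinion in columns):

```python
from scipy.stats import chi2_contingency

# Hypothetical 3x2 table: three political affiliations (rows)
# by opinion on the policy (support, oppose).
observed = [
    [45, 15],
    [30, 30],
    [20, 40],
]

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected frequencies under H0.
chi_sq, p_value, df, expected = chi2_contingency(observed)
print(f"chi^2({df}) = {chi_sq:.2f}, p = {p_value:.4f}")
```

For a 3×2 table, $\text{df} = (3-1)(2-1) = 2$, so the printed degrees of freedom should be 2.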


Limitations of Chi-Square Tests

The test relies on several assumptions that must be met:

  • The sample is randomly selected from the population.
  • The sample size is large enough that every cell's expected frequency is at least 5.
  • Both variables are categorical.
  • Observations are independent of each other (one person's response doesn't influence another's).

Even when assumptions are met, the test has real limitations:

  • No strength or direction. A significant result tells you the variables are related, but not how strongly or in what direction. Measures like Cramér's V or the phi coefficient can fill that gap by quantifying effect size.
  • Sensitive to sample size. With a very large sample, even a trivially small association can produce a statistically significant result. Always consider practical significance alongside statistical significance.
  • No control for confounding variables. If a third variable is driving the relationship between your two variables, the chi-square test won't catch that.
  • Categories must be mutually exclusive and exhaustive. If a person could fall into more than one category, or if your categories don't cover all possibilities, the results can be misleading.
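To address the first limitation above, Cramér's V can be computed directly from the chi-square statistic: $V = \sqrt{\chi^2 / (n(\min(r, c) - 1))}$, where $n$ is the sample size and $r$, $c$ are the numbers of rows and columns. A minimal sketch, with invented inputs:

```python
from math import sqrt

def cramers_v(chi_sq: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V: effect size for a chi-square test of independence.

    Ranges from 0 (no association) to 1 (perfect association).
    """
    return sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))

# Hypothetical: chi-square of 11.34 from a 3x2 table with n = 200 observations.
v = cramers_v(11.34, n=200, n_rows=3, n_cols=2)
print(f"Cramér's V = {v:.3f}")  # -> Cramér's V = 0.238
```

A value around 0.2 is usually read as a modest association, which illustrates the point: a significant p-value alone says nothing about strength.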

Additional Considerations in Chi-Square Analysis

  • Contingency analysis is the broader method of examining relationships between categorical variables using a contingency table. The chi-square test of independence is the main statistical tool within this framework.
  • Statistical inference is the underlying principle at work here: you're using sample data to draw conclusions about a larger population.
  • Post-hoc analysis comes into play after you get a significant chi-square result. The overall test tells you something is going on, but post-hoc tests (like examining standardized residuals for each cell) help you pinpoint which specific categories are driving the association.
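One simple post-hoc sketch uses the Pearson form of the standardized residual, $(O - E)/\sqrt{E}$, for each cell; cells with absolute residuals much larger than about 2 are the ones contributing most to a significant overall result. The 2×2 table here is hypothetical:

```python
from math import sqrt

# Hypothetical 2x2 contingency table of observed frequencies.
observed = [
    [40, 10],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson (standardized) residual for each cell: (O - E) / sqrt(E),
# where E = (row total) * (column total) / n.
residuals = [
    [(o - rt * ct / n) / sqrt(rt * ct / n) for o, ct in zip(row, col_totals)]
    for row, rt in zip(observed, row_totals)
]

for row in residuals:
    print(["%+.2f" % r for r in row])
```

In this example the off-diagonal cells have residuals beyond ±2, so they would be flagged as the cells driving the association.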