Fiveable

🎲Intro to Statistics Unit 11 Review

11.3 Test of Independence

Written by the Fiveable Content Team • Last updated August 2025

Chi-Square Test of Independence

The chi-square test of independence assesses whether two categorical variables are associated or vary independently of each other. It's one of the most common hypothesis tests you'll encounter for categorical data, and it shows up everywhere from medical research to marketing surveys.

Test Statistic Calculation

The chi-square test works by comparing what you actually observed in your data to what you'd expect to see if the two variables had no relationship at all.

The test statistic formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

  • $O$ = observed frequency (the actual count in each cell of your contingency table)
  • $E$ = expected frequency (the count you'd predict if the variables were independent)

How to find expected frequencies:

For each cell in the contingency table, use this formula:

$$E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$

This gives you the count you'd expect in that cell if the two variables had nothing to do with each other.
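As a minimal sketch of the expected-frequency formula, the Python below computes $E$ for every cell of a small table. The function name `expected_counts` and the observed counts are hypothetical, made up purely for illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total * column total) / grand total, one value per cell
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (illustration only):
# rows = exercise (none, moderate, daily), columns = stress (low, high)
observed = [
    [20, 30],
    [30, 20],
    [40, 10],
]

for row in expected_counts(observed):
    print(row)
```

Every row total here is 50, the column totals are 90 and 60, and the grand total is 150, so each row's expected counts come out to 30 and 20.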

Degrees of freedom:

$$df = (r - 1)(c - 1)$$

  • $r$ = number of rows (categories of one variable)
  • $c$ = number of columns (categories of the other variable)

For example, a $3 \times 2$ contingency table has $(3 - 1)(2 - 1) = 2$ degrees of freedom.

Interpretation of Results

The hypotheses for this test are always structured the same way:

  • Null hypothesis ($H_0$): The two variables are independent (no association).
  • Alternative hypothesis ($H_a$): The two variables are not independent (there is an association).

To make your decision, compare the calculated $\chi^2$ statistic to the critical value from the chi-square distribution table (using your degrees of freedom and significance level, typically $\alpha = 0.05$). You can also compare a p-value to $\alpha$ if your calculator or software gives you one.

  • If $\chi^2 >$ critical value (or p-value $< \alpha$): Reject $H_0$. There is sufficient evidence that the two variables are associated.
  • If $\chi^2 \le$ critical value (or p-value $\ge \alpha$): Fail to reject $H_0$. There is not enough evidence to conclude the variables are associated.

One thing to watch: rejecting $H_0$ tells you the variables are associated, but it does not tell you the direction or strength of that association. You'd need to look back at the contingency table to describe the pattern.
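If your calculator doesn't report a p-value, the p-value side of the decision rule can be sketched in plain Python. This is only an illustration: the helper name `chi2_p_value` is hypothetical, it relies on standard closed-form expressions for the right-tail area that hold for whole-number degrees of freedom, and the statistic of 16.67 is a made-up input. In practice a statistics library would normally handle this (for example, SciPy's `scipy.stats.chi2.sf`).

```python
import math


def chi2_p_value(statistic, df):
    """Right-tail area P(X >= statistic) for a chi-square variable with integer df."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^i / (1*3*...*(2i+1))
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        total += term
        term *= statistic / (2 * i + 3)
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2 * statistic / math.pi) * math.exp(-half) * total


# Hypothetical inputs: a test statistic of 16.67 from a 3 x 2 table (df = 2)
alpha = 0.05
p_value = chi2_p_value(16.67, 2)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

With these made-up numbers the p-value is far below 0.05, so the sketch reports a rejection of $H_0$. A quick sanity check is to plug in a table critical value: a statistic of 5.991 with 2 degrees of freedom should return a p-value very close to 0.05.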

Applying the Test: Step-by-Step

Here's how to carry out a chi-square test of independence from start to finish:

  1. Identify your two categorical variables. For example, you might ask whether a person's exercise frequency (none, moderate, daily) is related to their stress level (low, medium, high).

  2. Organize your data into a contingency table with observed frequencies for every combination of categories.

  3. Calculate expected frequencies for each cell using $E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$.

  4. Compute the test statistic $\chi^2 = \sum \frac{(O - E)^2}{E}$ by finding the contribution from each cell and summing them all.

  5. Find the degrees of freedom: $(r - 1)(c - 1)$.

  6. Look up the critical value in a chi-square table using your df and significance level (usually 0.05), or use your calculator to find the p-value.

  7. Make your conclusion. If the test statistic exceeds the critical value, you have evidence of an association. If not, you lack sufficient evidence to say the variables are related.

Always state your conclusion in context. Don't just say "reject $H_0$." Say something like: "There is sufficient evidence at the 0.05 significance level to conclude that exercise frequency and stress level are associated."
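The steps above can be sketched end to end in Python. Treat this as an illustration under stated assumptions, not a definitive implementation: the function name `independence_test` is hypothetical, the observed counts are invented, stress is collapsed to two levels to keep the table small, and only a handful of $\alpha = 0.05$ critical values from a standard chi-square table are hard-coded.

```python
# Critical values of the chi-square distribution at alpha = 0.05 (standard table values)
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def independence_test(observed):
    """Steps 3 to 7 for a table given as a list of rows; returns (statistic, df, reject_h0)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Steps 3 and 4: expected count per cell, then sum the (O - E)^2 / E contributions
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    # Step 5: degrees of freedom
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Steps 6 and 7: compare with the tabulated critical value (df must be in the table above)
    reject_h0 = statistic > CRITICAL_VALUES_05[df]
    return statistic, df, reject_h0


# Step 2: hypothetical observed counts (illustration only)
# rows = exercise (none, moderate, daily), columns = stress (low, high)
observed = [[20, 30], [30, 20], [40, 10]]

statistic, df, reject_h0 = independence_test(observed)
print(f"chi-square = {statistic:.2f}, df = {df}")
if reject_h0:
    print("Sufficient evidence at the 0.05 level that exercise frequency and stress level are associated.")
else:
    print("Not enough evidence at the 0.05 level that exercise frequency and stress level are associated.")
```

For this made-up table the statistic is about 16.67 with 2 degrees of freedom, which exceeds the critical value of 5.991, so the sketch reports evidence of an association. If SciPy is available, `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value and the expected counts.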

Statistical Inference and Assumptions

The chi-square test of independence is a form of statistical inference: you're using sample data to draw a conclusion about the broader population. It follows the standard hypothesis testing framework you've used throughout the course.

A few things worth knowing:

  • The chi-square test is nonparametric, meaning it doesn't assume your data follow a normal distribution. That's one reason it works well for categorical data.
  • The test does assume that expected frequencies are not too small. A common guideline is that every expected cell count should be at least 5. If some cells fall below that, your results may not be reliable.
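  • The chi-square distribution is always right-skewed and only takes non-negative values. Because larger gaps between observed and expected counts always push $\chi^2$ higher, evidence against $H_0$ sits in the upper tail, so this is always a right-tail test.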
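The expected-count guideline above is easy to check before running the test. Here is a minimal Python sketch that flags any cell whose expected count falls below 5; the function name `small_expected_cells` and the small sample are hypothetical.

```python
def small_expected_cells(observed, minimum=5):
    """Return (row index, column index, expected count) for cells with expected count below minimum."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand_total
            if expected < minimum:
                flagged.append((i, j, expected))
    return flagged


# Hypothetical small sample (illustration only)
observed = [[12, 3], [25, 10]]

for i, j, expected in small_expected_cells(observed):
    print(f"Cell (row {i}, column {j}) has expected count {expected:.1f}, below 5")
```

In this made-up table only the first row, second column is flagged, with an expected count of about 3.9. Note that the check uses expected counts, not observed counts: a cell with a small observed count is fine as long as its expected count is at least 5.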