Chi-Square Tests
Chi-square tests analyze categorical data to determine whether observed patterns differ from what we'd expect. The three types of chi-square tests all use the same basic formula, but they answer different research questions. Knowing which test to use and how to set up the hypotheses is the core skill here.
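For reference, the shared statistic is conventionally written as shown here. The symbols are the usual textbook ones rather than anything defined above: $O_i$ is the observed count in category (or table cell) $i$, and $E_i$ is the count expected if the null hypothesis were true.

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

The larger the statistic, the further the observed counts sit from the expected ones.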
Types of Chi-Square Tests
The goodness-of-fit test checks whether a single categorical variable follows a specified distribution. You compare observed frequencies against the expected frequencies implied by that theoretical distribution.
- Involves one categorical variable with two or more categories
- Example: You roll a die 120 times and want to know if each face comes up equally often. Your expected frequency for each face is 20. You compare your observed counts to those expected counts.
- Another example: A candy company claims 30% of its candies are blue, 20% red, and 50% green. You buy a bag and check whether your observed proportions match.
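The die example can be sketched in a few lines of plain Python. The observed counts below are hypothetical, invented only to illustrate the arithmetic. The critical value is the standard table entry for 5 degrees of freedom (categories minus one) at the 0.05 level.

```python
# Goodness-of-fit statistic for the die example (pure Python, no libraries).
# The observed counts are hypothetical, invented for illustration.
observed = [25, 17, 15, 23, 24, 16]   # counts for faces 1-6 over 120 rolls
n_rolls = sum(observed)               # 120
expected = [n_rolls / len(observed)] * len(observed)  # 20 per face if fair

# Sum of (observed - expected)^2 / expected over all categories
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # categories minus one

# Standard table value for df = 5 at the 0.05 significance level
CRITICAL_VALUE_DF5_ALPHA05 = 11.070

reject_null = chi_square > CRITICAL_VALUE_DF5_ALPHA05
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

With these made-up counts the statistic works out to 5.0, well below 11.070, so the data would not give grounds to call the die unfair.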
The independence test checks whether two categorical variables are associated or unrelated within a single population.
- Involves two categorical variables, each with two or more levels
- Uses a contingency table to organize the data
- Example: You survey 500 college students and record both gender (male/female/nonbinary) and political affiliation (Democrat/Republican/Independent). The question is whether political affiliation depends on gender, or whether the two variables are independent of each other.
The homogeneity test checks whether the distribution of a single categorical variable is the same across different populations or groups.
- Involves one categorical variable and one grouping variable, each with two or more levels
- Also uses a contingency table
- Example: You draw a separate sample from each of three age groups (young, middle-aged, elderly), record whether each person smokes, and ask whether the proportion of smokers is the same across all three groups.
Independence vs. Homogeneity: These two tests use the same formula and the same contingency table setup. The difference is in the study design. In an independence test, you draw one sample and measure two variables on each subject. In a homogeneity test, you draw separate samples from different populations and compare the distribution of one variable across those groups.
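Because both tests share the same contingency-table arithmetic, one sketch covers them. Under the null hypothesis, the expected count for each cell is (row total × column total) / grand total. The survey counts below are hypothetical, and the function names are illustrative rather than taken from any library.

```python
def expected_counts(table):
    """Expected cell counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical survey of 500 students (invented counts).
# Rows: male, female, nonbinary. Columns: Democrat, Republican, Independent.
survey = [
    [70, 70, 60],
    [110, 65, 75],
    [20, 15, 15],
]

expected = expected_counts(survey)
chi_square = chi_square_statistic(survey)
df = (len(survey) - 1) * (len(survey[0]) - 1)  # (rows - 1) * (columns - 1)

print("expected counts:", expected)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The same code would apply to a homogeneity design. Only the way the data were collected, and therefore the wording of the conclusion, would change.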

Selecting the Appropriate Chi-Square Test
Choosing the right test comes down to two questions: how many categorical variables are involved, and what is the research question asking?
- Count your variables. If the question involves only one categorical variable measured against a known or claimed distribution, use the goodness-of-fit test.
- If you have two categorical variables, decide between independence and homogeneity:
  - Ask "Was one sample drawn, and two variables recorded on each subject?" If yes, use the independence test (e.g., surveying students and recording both major and study habits).
  - Ask "Were separate samples drawn from distinct populations?" If yes, use the homogeneity test (e.g., sampling from three different cities and comparing the proportion who prefer public transit).
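The decision rule above can be written as a small helper. This is only a sketch. The function name and parameters are hypothetical, chosen for illustration, and it assumes the question involves either one or two categorical variables.

```python
def choose_chi_square_test(num_categorical_variables, separate_samples=False):
    """Return the name of the chi-square test suggested by the study design.

    num_categorical_variables: 1 or 2
    separate_samples: True if separate samples were drawn from distinct
        populations; False if one sample was drawn and two variables were
        recorded on each subject. Ignored when there is only one variable.
    """
    if num_categorical_variables == 1:
        return "goodness-of-fit"
    if num_categorical_variables == 2:
        return "homogeneity" if separate_samples else "independence"
    raise ValueError("this sketch only handles one or two categorical variables")


# One variable compared with a claimed distribution (the candy example)
print(choose_chi_square_test(1))
# One sample of students, two variables recorded on each
print(choose_chi_square_test(2, separate_samples=False))
# Separate samples from three cities, one variable compared
print(choose_chi_square_test(2, separate_samples=True))
```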

Hypotheses for Chi-Square Tests
Each test has its own way of framing the null and alternative hypotheses. All chi-square tests are non-directional (there's no "greater than" or "less than"), so the alternative hypothesis simply states that the null isn't true.
Goodness-of-fit:
- $H_0$: The observed frequency distribution fits the specified distribution.
- $H_a$: The observed frequency distribution does not fit the specified distribution.
Independence:
- $H_0$: The two categorical variables are independent (no association).
- $H_a$: The two categorical variables are dependent (there is an association).
Homogeneity:
- $H_0$: The distribution of the categorical variable is the same across all populations.
- $H_a$: The distribution of the categorical variable is not the same across all populations.
Interpreting Chi-Square Test Results
Once you run the test, you need to make sense of the output. Here are the key pieces:
- p-value: The probability of getting a chi-square statistic at least as large as yours, assuming the null hypothesis is true. If the p-value is less than your significance level (commonly 0.05), you reject the null.
- Critical value: The threshold on the chi-square distribution that marks the boundary of the rejection region. If your test statistic exceeds the critical value, you reject the null. This approach gives the same conclusion as the p-value method.
- Degrees of freedom: For goodness-of-fit, $df = k - 1$, where $k$ is the number of categories. For independence and homogeneity, $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
- Effect size: A significant result is evidence of an association, but it does not tell you how strong that association is. Cramér's V is a common effect size measure for chi-square tests on contingency tables, ranging from 0 (no association) to 1 (perfect association).
- Standardized residuals: After finding a significant result, these tell you which specific cells in the contingency table contributed most to the overall chi-square statistic. A standardized residual with an absolute value greater than about 2 suggests that cell is a major contributor.
- Post-hoc analysis: When a test with more than two categories is significant, post-hoc comparisons help you pinpoint exactly where the differences lie, similar to how post-hoc tests work after a significant ANOVA.
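Several of these quantities can be sketched in plain Python for a contingency table. The table is hypothetical and the function names are illustrative. Two assumptions should be stated plainly. First, the p-value function uses a closed form that holds only when the degrees of freedom are even, whereas a statistics library would normally handle the general case. Second, the residual computed here is (observed − expected) / √expected. Many introductory texts call this the standardized residual, but some software reserves that name for an adjusted version.

```python
import math


def expected_counts(table):
    """Expected counts under the null: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def residuals(table):
    """(observed - expected) / sqrt(expected) for every cell."""
    expected = expected_counts(table)
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


def even_df_p_value(chi_square, df):
    """Right-tail probability of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = chi_square / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi-square / (n * min(rows - 1, columns - 1)))."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical 3x3 table of counts (invented for illustration)
table = [
    [70, 70, 60],
    [110, 65, 75],
    [20, 15, 15],
]

cell_residuals = residuals(table)
# The squared residuals add up to the chi-square statistic
chi_square = sum(r ** 2 for row in cell_residuals for r in row)
n = sum(sum(row) for row in table)
df = (len(table) - 1) * (len(table[0]) - 1)

p_value = even_df_p_value(chi_square, df)
effect_size = cramers_v(chi_square, n, len(table), len(table[0]))
flagged = [
    (i, j)
    for i, row in enumerate(cell_residuals)
    for j, r in enumerate(row)
    if abs(r) > 2                 # rule of thumb from the text
]

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.3f}")
print(f"Cramér's V = {effect_size:.3f}, cells with |residual| > 2: {flagged}")
```

For this made-up table the p-value is well above 0.05 and no cell passes the residual threshold, so there would be nothing to follow up with post-hoc comparisons.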