Chi-square test overview
Chi-square tests are non-parametric statistical tests used to analyze categorical data. They compare observed frequencies (what you actually count in your data) to expected frequencies (what you'd expect if the null hypothesis were true). The gap between those two sets of numbers drives the entire test.
In econometrics, chi-square tests show up in three main contexts:
- Goodness of fit: Does your data match a hypothesized distribution?
- Independence: Are two categorical variables related or unrelated?
- Homogeneity: Do different populations share the same distribution of a categorical variable?
Hypothesis testing with chi-square
Like any hypothesis test, you start by setting up two competing claims:
- The null hypothesis (H₀) states there's no significant association or difference.
- The alternative hypothesis (H₁) states there is a significant association or difference.
You then calculate a test statistic from your data, and either compare it to a critical value from the chi-square distribution table or look at the p-value. If the test statistic is large enough (or the p-value small enough), you reject H₀.
Chi-square distribution properties
The chi-square distribution is a continuous probability distribution that arises from summing squared standard normal random variables. A few key features:
- It's always right-skewed and non-negative (values range from 0 to infinity).
- Its shape depends entirely on the degrees of freedom (df). With low df, the distribution is heavily skewed right. As df increases, it becomes more symmetric and starts to resemble a normal distribution.
- You'll use chi-square distribution tables (or software) to find critical values for your tests.
Degrees of freedom in chi-square
Degrees of freedom (df) represent the number of independent pieces of information that go into calculating the test statistic. The formula depends on which test you're running:
- Goodness of fit: df = k − 1, where k is the number of categories.
- Independence or homogeneity (contingency table): df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.
The degrees of freedom determine which chi-square distribution you compare your test statistic against, so getting this right is essential.
Chi-square goodness of fit test
The goodness of fit test checks whether the observed frequency distribution of a single categorical variable matches a hypothesized theoretical distribution (uniform, normal, Poisson, etc.). You're asking: "Could my data have plausibly come from this distribution?"
Observed vs expected frequencies
- Observed frequencies () are the actual counts you collect from your data for each category.
- Expected frequencies () are what you'd predict for each category if the hypothesized distribution were correct.
To calculate expected frequencies, multiply the total sample size by the probability of each category under the hypothesized distribution:

Eᵢ = n × pᵢ

where n is the total sample size and pᵢ is the hypothesized probability for category i.
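As a quick sketch of this calculation (the sample size and category probabilities below are invented for illustration):

```python
# Expected frequencies for a goodness-of-fit test: E_i = n * p_i.
# Sample size and hypothesized probabilities are made-up illustration values.
n = 200
hypothesized_probs = {"A": 0.5, "B": 0.3, "C": 0.2}  # must sum to 1

expected = {category: n * p for category, p in hypothesized_probs.items()}
# expected is roughly {"A": 100.0, "B": 60.0, "C": 40.0}
```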
Calculating the chi-square statistic
The test statistic measures how far your observed data deviates from what's expected:

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
Here's the step-by-step process:
1. For each category, subtract the expected frequency from the observed frequency: Oᵢ − Eᵢ.
2. Square that difference: (Oᵢ − Eᵢ)².
3. Divide by the expected frequency: (Oᵢ − Eᵢ)² / Eᵢ.
4. Sum across all categories.
A larger χ² value means a bigger discrepancy between what you observed and what the hypothesized distribution predicts.
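The steps above can be sketched in a few lines of Python (observed and expected counts are invented for illustration):

```python
# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories.
# Observed and expected counts below are made up for illustration.
observed = [45, 35, 20]
expected = [50, 30, 20]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# chi_sq is about 1.33 here: a small discrepancy between observed and expected
```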
Interpreting the p-value
The p-value tells you the probability of getting a chi-square statistic at least as large as yours, assuming H₀ is true.
- p-value < 0.05 (at the 5% significance level): Reject H₀. Your data doesn't fit the hypothesized distribution.
- p-value ≥ 0.05: Fail to reject H₀. Your data is consistent with the hypothesized distribution.
Note that failing to reject H₀ doesn't prove the distribution is correct. It just means you don't have enough evidence to say it's wrong.
Limitations of goodness of fit test
- Expected frequency rule: Each category should have an expected frequency of at least 5. If not, consider combining categories or using an alternative test.
- No direction or magnitude info: The test tells you that the fit is poor, not where or how much it's off in a meaningful way.
- Sensitive to category choices: How you define your categories can change the result. Arbitrary binning decisions matter.
- Alternatives for small samples: Fisher's exact test or the likelihood ratio test may be more reliable when expected frequencies are low.
Chi-square test for independence
This test determines whether two categorical variables are independent or associated. For example, you might ask: "Is there a relationship between a consumer's income bracket and their preferred payment method?"
Contingency tables for categorical data
A contingency table (also called a cross-tabulation) organizes the data for two categorical variables. Rows represent one variable's categories, columns represent the other's, and each cell contains the observed count for that combination.
The marginal totals (row sums and column sums) along the edges are critical because you'll use them to calculate expected frequencies.
Null vs alternative hypotheses
- H₀: The two variables are independent. Knowing the category of one variable tells you nothing about the other.
- H₁: The two variables are dependent (associated). The distribution of one variable changes depending on the category of the other.
Assumptions of the test
Before running the test, verify these conditions:
- The sample is randomly selected from the population.
- Observations are independent of each other (one observation doesn't influence another).
- Expected frequencies in each cell should be at least 5. If this isn't met, consider Fisher's exact test or collapse categories.
Calculating expected frequencies
If the two variables truly are independent, the expected frequency for any cell is:

Eᵢⱼ = (Rᵢ × Cⱼ) / n

where Rᵢ is the total for row i, Cⱼ is the total for column j, and n is the grand total.
The logic: under independence, the proportion in any cell should just reflect the product of the marginal proportions.
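A minimal sketch of this calculation, using an invented 2×3 table of counts:

```python
# Expected cell frequencies under independence: E_ij = (row i total * col j total) / n.
# The observed counts are invented for illustration.
observed = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in observed]         # totals for each row
col_totals = [sum(col) for col in zip(*observed)]   # totals for each column
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
# Each expected cell reflects the product of the marginal proportions.
```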
Computing the chi-square statistic
The formula extends naturally from the goodness of fit version, but now you sum over every cell in the table:

χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
Steps:
1. Calculate the expected frequency for each cell using the formula above.
2. For each cell, compute (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ.
3. Sum all those values across every cell in the table.
Determining the critical value
The critical value comes from the chi-square distribution table, using:
- Degrees of freedom: df = (r − 1)(c − 1)
- Significance level: typically α = 0.05
If your calculated χ² exceeds the critical value, you reject H₀ and conclude the variables are associated.
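Putting the pieces together, here is a sketch of the critical-value decision for an invented 2×2 table (3.841 is the standard chi-square table entry for df = 1 at the 5% level):

```python
# Chi-square test for independence on a 2x2 table (invented counts).
observed = [[35, 15], [20, 30]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i in range(len(observed)):
    for j in range(len(observed[0])):
        e = row_totals[i] * col_totals[j] / n  # expected frequency for this cell
        chi_sq += (observed[i][j] - e) ** 2 / e

critical_value = 3.841  # chi-square table: df = (2-1)*(2-1) = 1, alpha = 0.05
reject_null = chi_sq > critical_value  # True here: the variables look associated
```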

Making decisions based on p-value
You can also use the p-value approach:
- p-value < α: Reject H₀. There's a statistically significant association between the variables.
- p-value ≥ α: Fail to reject H₀. You don't have sufficient evidence of an association.
Both the critical value approach and the p-value approach will always give you the same conclusion. Use whichever your course emphasizes.
Chi-square test for homogeneity
The homogeneity test compares the distribution of a single categorical variable across two or more separate populations. For instance: "Do consumers in different age groups have the same distribution of brand preferences?"
Comparing multiple populations
The data is organized in a contingency table just like the independence test. The difference is conceptual: here, each column (or row) represents a distinct population that was sampled separately, and you're comparing their distributions.
Null vs alternative hypotheses
- H₀: All populations have the same distribution of the categorical variable.
- H₁: At least one population has a different distribution.
Calculating the test statistic
The mechanics are identical to the independence test:
1. Compute expected frequencies: Eᵢⱼ = (Rᵢ × Cⱼ) / n
2. Calculate χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
3. Degrees of freedom: df = (r − 1)(c − 1)
The formulas are the same. What changes is the research question and how the data was collected (separate samples from distinct populations vs. one sample classified on two variables).
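Because the arithmetic is shared, it can live in one helper; in this sketch each row is a separately sampled population and the counts are invented:

```python
# Shared chi-square machinery for contingency tables; for a homogeneity test,
# each row is a separate population sampled on the same categorical variable.
def chi_square_table(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    stat = sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(observed))
        for j in range(len(observed[0]))
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Invented brand-preference counts: two age groups (rows), three brands (columns).
stat, dof = chi_square_table([[40, 35, 25], [30, 45, 25]])
```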
Interpreting the results
- If χ² exceeds the critical value (or p-value < 0.05), reject H₀. The populations don't share the same distribution.
- If you fail to reject, the data is consistent with the populations being homogeneous.
Rejecting H₀ tells you the distributions differ, but it doesn't tell you which populations differ from each other. Follow-up pairwise comparisons may be needed.
Applications of chi-square tests
Market research and consumer preferences
Chi-square tests help researchers determine whether consumer preferences differ across demographic groups. For example, a test for independence could examine whether product choice is associated with age group. A test for homogeneity could compare brand preference distributions across income levels to guide marketing strategy.
Quality control and defect analysis
In manufacturing, the goodness of fit test can check whether defect counts follow a Poisson distribution, which would suggest the process is stable. The independence test can explore whether defect types are related to production factors like shift or machine, helping pinpoint quality problems.
Demographic and social science research
Researchers use chi-square tests to study relationships between categorical variables like education level and employment status. The homogeneity test can compare characteristics across populations (urban vs. rural, different regions) to identify disparities that inform policy.
Limitations and alternatives to chi-square
Small sample size and low expected frequencies
The chi-square approximation breaks down when expected cell frequencies are too small (below 5). Results become unreliable because the test statistic no longer follows the chi-square distribution closely. When this happens, you need an alternative approach.
Fisher's exact test for small samples
Fisher's exact test calculates the exact probability of observing your data (or something more extreme) under H₀, rather than relying on the chi-square approximation. It's most commonly applied to 2×2 contingency tables with small samples. The trade-off: it can be computationally intensive for larger tables, but it gives accurate results when the chi-square test can't.
Yates' correction for continuity
Yates' correction adjusts for the fact that the chi-square distribution is continuous while your data is discrete. It subtracts 0.5 from the absolute difference between observed and expected frequencies before squaring:

χ² = Σ (|Oᵢ − Eᵢ| − 0.5)² / Eᵢ

This correction is typically applied to 2×2 tables when sample sizes are moderate. Be aware that it can be overly conservative, making it harder to reject H₀.
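A sketch of the corrected statistic for an invented 2×2 table; the corrected value comes out smaller than the uncorrected one, which is what makes the test conservative:

```python
# Yates-corrected chi-square: subtract 0.5 from |O - E| before squaring.
# Counts are invented for illustration.
observed = [[35, 15], [20, 30]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

uncorrected = 0.0
corrected = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n  # expected frequency for this cell
        uncorrected += (observed[i][j] - e) ** 2 / e
        corrected += (abs(observed[i][j] - e) - 0.5) ** 2 / e
# corrected < uncorrected, so the corrected test is harder to reject with
```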
Likelihood ratio tests as an alternative
The likelihood ratio test (LRT) compares the likelihood of the data under H₀ to the likelihood under H₁. The test statistic is:

G = 2 Σ Oᵢ ln(Oᵢ / Eᵢ)
LRT tends to perform better than the standard chi-square test when sample sizes are small or data is sparse. It follows the same chi-square distribution asymptotically, so you interpret it the same way (compare to a critical value or use a p-value). The downside is that it can require more computational effort.
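For goodness of fit, the G statistic can be sketched like this (counts invented); computing the Pearson statistic alongside it shows how close the two are for moderate counts:

```python
# Likelihood ratio (G) statistic: G = 2 * sum(O * ln(O / E)).
# Observed and expected counts are invented; zero observed counts are skipped
# because the term 0 * ln(0 / E) is taken as 0.
import math

observed = [45, 35, 20]
expected = [50, 30, 20]

g_stat = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# g_stat (about 1.31) and pearson (about 1.33) are close here; both are
# compared to the chi-square distribution with df = 3 - 1 = 2.
```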