🎲 Intro to Statistics

Key Statistical Inference Techniques


Why This Matters

Statistical inference is the bridge between what you observe in a sample and what you can conclude about an entire population. Whether you're estimating a population mean, comparing two groups, or determining if variables are related, you're using inference techniques that share common logic: sampling distributions, standard errors, test statistics, and probability.

These techniques aren't isolated tools to memorize separately. They're variations on the same fundamental question: "Could this result have happened by chance?" For each technique, know when to use it, what assumptions it requires, and how to interpret results. That conceptual understanding is what separates students who struggle from those who do well on exams.


Estimation: Quantifying What We Don't Know

These techniques focus on estimating population parameters from sample data. Samples vary, so our estimates carry uncertainty. Good statistics means quantifying that uncertainty.

Point Estimation

A point estimate is a single-value best guess for a population parameter. The sample mean (\bar{x}) estimates the population mean (\mu), the sample proportion (\hat{p}) estimates the population proportion (p), and the sample variance (s^2) estimates the population variance (\sigma^2).

The limitation is that point estimates don't tell you how close you might be to the true value. That's why they serve as the foundation for confidence intervals and test statistics, which do communicate uncertainty.

Confidence Intervals

A confidence interval gives you a range of plausible values for a parameter. A 95% CI means that if you repeated your sampling process many times and built an interval each time, about 95% of those intervals would capture the true parameter. It does not mean there's a 95% probability the true value is in your specific interval.

Every confidence interval has this structure:

\text{point estimate} \pm \text{critical value} \times \text{standard error}

The piece after the \pm is called the margin of error. Wider intervals mean more uncertainty, which typically comes from smaller samples or greater variability in the data.
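That structure can be traced in a few lines of Python. This is a minimal sketch, not a library routine: the data (130 successes out of 250) are hypothetical, and the critical value is the large-sample z value from the standard normal.

```python
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample z confidence interval for a population proportion."""
    p_hat = successes / n                            # point estimate
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # critical value
    se = (p_hat * (1 - p_hat) / n) ** 0.5            # standard error
    moe = z * se                                     # margin of error
    return p_hat - moe, p_hat + moe

low, high = proportion_ci(130, 250)   # hypothetical poll: 130 of 250 say yes
```

Quadrupling the sample size cuts the standard error in half, which is why larger samples produce narrower intervals.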

Compare: Point Estimation vs. Confidence Intervals: both estimate population parameters, but point estimates give a single value while confidence intervals communicate uncertainty. If a problem asks you to "estimate and interpret," you'll almost always need a confidence interval, not just a point estimate.


Hypothesis Testing: Making Decisions with Data

Hypothesis testing provides a structured framework for deciding whether sample evidence is strong enough to reject a claim about a population. The logic starts by assuming the null hypothesis is true, then asks how surprising your data would be under that assumption.

Hypothesis Testing Framework

Here's the process, step by step:

  1. State your hypotheses. H_0 represents "no effect" or "no difference." H_a represents the claim you're trying to find evidence for.
  2. Check conditions. Each test has assumptions (randomness, sample size, etc.) that must be met.
  3. Calculate the test statistic. This measures how far your sample result is from what H_0 predicts, in standardized units.
  4. Find the p-value. This is the probability of observing results at least as extreme as yours if H_0 were true. A small p-value (typically < 0.05) means your data would be surprising under H_0.
  5. Make a decision. If the p-value is less than your significance level \alpha, reject H_0. Otherwise, fail to reject it. (You never "accept" H_0.)
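The five steps can be walked through in code. A minimal sketch, assuming a two-sided z-test for a mean with \sigma known; every number (\mu_0 = 50, \sigma = 10, n = 64, \bar{x} = 53.1) is made up for illustration.

```python
from statistics import NormalDist

# Step 1: H0: mu = 50 vs. Ha: mu != 50 (two-sided); hypothetical numbers
mu_0, sigma, n, x_bar, alpha = 50, 10, 64, 53.1, 0.05

# Step 2: conditions (random sample, sigma known) assumed met here

# Step 3: standardized test statistic
z = (x_bar - mu_0) / (sigma / n ** 0.5)

# Step 4: two-sided p-value from the standard normal
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: compare to alpha -- note "fail to reject," never "accept"
decision = "reject H0" if p_value < alpha else "fail to reject H0"
```

Here z comes out to 2.48, a result far enough from H_0's prediction that the p-value falls below 0.05.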

Two types of errors can occur:

  • Type I error: rejecting a true H_0 (false positive). The significance level \alpha controls this rate.
  • Type II error: failing to reject a false H_0 (false negative). This is harder to control and relates to the power of the test.

z-Tests

Use a z-test when the population standard deviation \sigma is known (rare in practice) or when testing a population proportion.

For a mean: z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}

For a proportion: z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}

The test statistic is compared against the standard normal distribution. Proportion tests are the most common z-test you'll see in an intro course.
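The proportion formula can be sketched with a coin-flip style example; the counts (60 heads in 100 flips) are hypothetical. Note that the standard error uses the null value p_0 rather than \hat{p}, because the statistic is computed under the assumption that H_0 is true.

```python
from statistics import NormalDist

# H0: p = 0.5 vs. Ha: p > 0.5; hypothetical data: 60 successes in 100 trials
p0, n, successes = 0.5, 100, 60
p_hat = successes / n
se = (p0 * (1 - p0) / n) ** 0.5     # SE uses the null value p0, not p_hat
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)   # one-sided (upper-tail) p-value
```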

t-Tests

In most real scenarios, you don't know \sigma and have to estimate it with the sample standard deviation s. This extra uncertainty is why you use the t-distribution instead of the z-distribution. The t-distribution has heavier tails (more probability in the extremes), especially with small samples. As n increases, the t-distribution approaches the standard normal.

Three main types:

  • One-sample t-test: compares a sample mean to a hypothesized value
  • Independent two-sample t-test: compares means from two separate groups
  • Paired t-test: compares means from matched observations (before/after, twins, etc.)

Degrees of freedom affect the shape of the t-distribution. For a one-sample t-test, df = n - 1.
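A sketch of the one-sample t statistic with hypothetical data. The standard library has no t-distribution, so only the statistic and df are computed here; turning them into a p-value takes software such as scipy.stats.

```python
from statistics import mean, stdev

# H0: mu = 100; sigma unknown, so estimate it with the sample s
data = [104, 98, 110, 95, 102, 107, 99, 101]   # hypothetical sample
n = len(data)
x_bar = mean(data)
s = stdev(data)              # sample standard deviation (n - 1 denominator)
t = (x_bar - 100) / (s / n ** 0.5)
df = n - 1                   # degrees of freedom for a one-sample t-test
```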

Compare: z-Tests vs. t-Tests: both test hypotheses about means, but z-tests require known \sigma (rare) while t-tests estimate it from data (common). On exams, if you're given \sigma, use z. If you're given s or raw data, use t. When in doubt, the t-test is almost always the safer choice.


Comparing Groups: Testing for Differences

When your research question involves comparing outcomes across multiple groups, these techniques help determine whether observed differences reflect real population differences or just sampling variability.

Analysis of Variance (ANOVA)

ANOVA extends the logic of t-tests to three or more groups at once. Why not just run multiple t-tests? Because each t-test carries a risk of Type I error, and running many of them inflates that risk well beyond your chosen \alpha.

The F-statistic is the key measure. It compares between-group variance (how much the group means differ from each other) to within-group variance (how much individual observations vary within each group). A large F value means the group means differ more than you'd expect from random chance alone.
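The F-statistic can be computed by hand to see exactly what it compares. A minimal sketch with made-up scores for three groups (k = 3 groups, N = 9 observations):

```python
from statistics import mean

# Hypothetical scores under three teaching methods (equal group sizes)
groups = [[82, 85, 88], [78, 80, 79], [90, 92, 94]]
k = len(groups)                              # number of groups
N = sum(len(g) for g in groups)              # total observations
grand = mean(x for g in groups for x in g)   # grand mean of all data

# Between-group variability: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-group variability: how far observations sit from their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ms_within = ss_within / (N - k)

F = ms_between / ms_within   # large F => group means differ more than chance
```

For these numbers the group means (85, 79, 92) are far apart relative to the spread inside each group, so F is large.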

ANOVA requires three assumptions:

  • Independence of observations
  • Normality within each group (less critical with larger samples)
  • Equal variances across groups (called homogeneity of variance); violations matter more when group sizes are unequal

A significant ANOVA result tells you at least one group mean differs, but not which ones. You'd need follow-up procedures (like post-hoc tests) to identify specific differences.

Chi-Square Tests

Chi-square tests are for categorical data. Use them when both your variables are categorical, not quantitative.

The test statistic measures how far observed counts deviate from what you'd expect if there were no relationship:

\chi^2 = \sum \frac{(O - E)^2}{E}

where O is the observed frequency and E is the expected frequency.

Two main applications:

  • Test of independence: Are two categorical variables related? For example, is there an association between smoking status and lung disease diagnosis?
  • Goodness-of-fit: Does an observed distribution match a hypothesized one? For example, does a die produce each number with equal frequency?
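The die example can be worked directly from the formula. A sketch with hypothetical counts from 120 rolls (under H_0, each face is expected 120/6 = 20 times):

```python
# Goodness-of-fit: H0 says each face of the die is equally likely
observed = [25, 17, 15, 23, 24, 16]          # hypothetical counts, 120 rolls
expected = [sum(observed) / 6] * 6           # 20 per face under H0
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                       # categories minus 1

# chi_sq = 5.0 here; the df = 5 critical value at alpha = 0.05 is about 11.07
# (from a chi-square table), so these counts are consistent with a fair die
```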

Compare: ANOVA vs. Chi-Square: both compare groups, but ANOVA tests differences in means of a quantitative response across categorical groups, while chi-square tests associations between categorical variables. Know your variable types: quantitative response โ†’ ANOVA; categorical response โ†’ chi-square.


Modeling Relationships: Prediction and Explanation

These techniques go beyond testing for differences to model how variables relate to each other, enabling both prediction and understanding of relationships.

Regression Analysis

Simple linear regression models how a response variable (y) changes as a predictor variable (x) changes, using the equation:

\hat{y} = b_0 + b_1 x

Here b_0 is the y-intercept and b_1 is the slope. The slope tells you the predicted change in y for each one-unit increase in x.
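The slope and intercept come from least squares, which needs nothing beyond means and deviations. A sketch with hypothetical data (hours studied vs. exam score):

```python
from statistics import mean

x = [1, 2, 3, 4, 5]              # hypothetical: hours studied
y = [52, 58, 65, 68, 77]         # hypothetical: exam score

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx                   # slope: predicted change in y per unit x
b0 = y_bar - b1 * x_bar          # intercept: line passes through (x_bar, y_bar)

def predict(xi):
    return b0 + b1 * xi
```

For these made-up numbers b_1 = 6, so each extra hour of study predicts a 6-point higher score.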

Inference on the slope is where hypothesis testing meets regression. Testing whether \beta_1 = 0 determines if there's a statistically significant linear relationship between x and y. You can also build a confidence interval for the slope to show the precision of your estimated effect.

Multiple regression includes several predictors, which helps control for confounding variables and improve predictions. When interpreting coefficients in multiple regression, you need to say the effect is "holding other variables constant."

Compare: ANOVA vs. Regression: these are surprisingly similar mathematically. ANOVA is essentially regression with categorical predictors coded as indicator variables. The difference is framing: ANOVA emphasizes group mean comparisons, while regression emphasizes the equation and prediction. A quantitative response with a single categorical predictor can be analyzed with either approach.


Advanced Approaches: Beyond the Basics

These techniques represent more sophisticated approaches to inference. They're less commonly tested in intro courses, but understanding their logic deepens your grasp of statistical thinking.

Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable. Think of it this way: given the data you actually collected, what parameter values would have been most likely to generate that sample?
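A sketch of that idea for a simple case: 7 successes in 10 Bernoulli trials (hypothetical data). A grid search over candidate values of p shows the log-likelihood peaking at the sample proportion, which is the analytic MLE for this model:

```python
import math

k, n = 7, 10   # hypothetical data: 7 successes in 10 trials

def log_likelihood(p):
    # log P(data | p) for a binomial, dropping the constant n-choose-k term
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Try every candidate p on a fine grid and keep the value that makes
# the observed data most probable
candidates = [i / 1000 for i in range(1, 1000)]
p_mle = max(candidates, key=log_likelihood)   # lands on k / n = 0.7
```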

MLE has useful properties for large samples: estimates become unbiased and achieve minimum variance. Many methods you've already encountered (like logistic regression) use MLE under the hood, so understanding it connects procedures that might seem unrelated.

Bayesian Inference

Bayesian inference takes a fundamentally different approach by incorporating prior knowledge. It combines what you believed before seeing data (the prior) with the evidence from data (the likelihood) to produce updated beliefs (the posterior):

P(\theta \mid \text{data}) \propto P(\text{data} \mid \theta) \times P(\theta)

The key philosophical difference: frequentist methods (everything else in this guide) treat parameters as fixed unknown values. Bayesian methods treat parameters as having probability distributions. This means Bayesian "credible intervals" have a more intuitive interpretation than confidence intervals. A 95% credible interval actually means there's a 95% probability the parameter falls in that range.
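The update rule has a closed form in the classic Beta-Binomial case, which allows a short worked example. This sketch assumes a flat Beta(1, 1) prior and hypothetical data of 7 successes in 10 trials:

```python
# Prior: Beta(a, b); a flat Beta(1, 1) prior treats every p as equally plausible
a, b = 1, 1
k, n = 7, 10                 # hypothetical data: 7 successes in 10 trials

# Conjugate update: posterior is Beta(a + k, b + (n - k))
a_post, b_post = a + k, b + (n - k)

# Posterior mean: a compromise between the prior mean (0.5) and
# the sample proportion (0.7)
posterior_mean = a_post / (a_post + b_post)   # 8 / 12 = 2/3
```

Note the pull toward the prior: the posterior mean (about 0.667) sits between the prior mean of 0.5 and the sample proportion of 0.7.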

Compare: Maximum Likelihood vs. Bayesian Inference: both estimate parameters, but MLE asks "what parameter maximizes the probability of this data?" while Bayesian asks "what's the probability distribution of the parameter given this data?" MLE is frequentist (parameters are fixed), Bayesian treats parameters as random variables with distributions.


Quick Reference Table

Concept | Best Examples
Estimating parameters | Point Estimation, Confidence Intervals, Maximum Likelihood
Testing one mean | z-Test (known \sigma), t-Test (unknown \sigma)
Comparing two means | Independent t-Test, Paired t-Test
Comparing 3+ means | ANOVA
Categorical associations | Chi-Square Test of Independence, Chi-Square Goodness-of-Fit
Modeling relationships | Simple Regression, Multiple Regression
Quantifying uncertainty | Confidence Intervals, Bayesian Credible Intervals
Decision-making framework | Hypothesis Testing, p-values, Significance Level

Self-Check Questions

  1. You want to determine if average test scores differ across four teaching methods. Which technique should you use, and why would multiple t-tests be problematic?

  2. Compare confidence intervals and hypothesis testing: How are they related, and what does it mean when a 95% CI for a mean difference doesn't include zero?

  3. A researcher has categorical data on political party affiliation and opinion on a policy issue. Which technique tests whether these variables are associated, and what does the test statistic measure?

  4. When would you choose a t-test over a z-test for comparing a sample mean to a hypothesized value? What assumption about the population makes this distinction necessary?

  5. Explain how regression analysis and ANOVA are conceptually similar. If you had a quantitative response and a single categorical predictor with three levels, could you use either approach?