Statistical inference is the bridge between what you observe in a sample and what you can conclude about an entire population. Whether you're estimating a population mean, comparing two groups, or determining if variables are related, you're using inference techniques that share common logic: sampling distributions, standard errors, test statistics, and probability.
These techniques aren't isolated tools to memorize separately. They're variations on the same fundamental question: "Could this result have happened by chance?" For each technique, know when to use it, what assumptions it requires, and how to interpret results. That conceptual understanding is what separates students who struggle from those who do well on exams.
These techniques focus on estimating population parameters from sample data. Samples vary, so our estimates carry uncertainty. Good statistics means quantifying that uncertainty.
A point estimate is a single-value best guess for a population parameter. The sample mean ($\bar{x}$) estimates the population mean ($\mu$), the sample proportion ($\hat{p}$) estimates the population proportion ($p$), and the sample variance ($s^2$) estimates the population variance ($\sigma^2$).
The limitation is that point estimates don't tell you how close you might be to the true value. That's why they serve as the foundation for confidence intervals and test statistics, which do communicate uncertainty.
A confidence interval gives you a range of plausible values for a parameter. A 95% CI means that if you repeated your sampling process many times and built an interval each time, about 95% of those intervals would capture the true parameter. It does not mean there's a 95% probability the true value is in your specific interval.
Every confidence interval has this structure:

$$\text{point estimate} \pm (\text{critical value}) \times (\text{standard error})$$

The piece after the $\pm$ is called the margin of error. Wider intervals mean more uncertainty, which typically comes from smaller samples or greater variability in the data.
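As a concrete sketch, here's a 95% t-interval for a mean computed from summary statistics; the sample size, mean, and standard deviation below are invented for illustration:

```python
import math
from scipy import stats

# Hypothetical summary statistics: n = 25 exam scores (values invented)
n, xbar, s = 25, 78.4, 6.2

# 95% t-interval for the mean: xbar +/- t* . s/sqrt(n), with df = n - 1
t_star = stats.t.ppf(0.975, df=n - 1)    # critical value for 95% confidence
margin = t_star * s / math.sqrt(n)       # margin of error
ci = (xbar - margin, xbar + margin)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Notice how the margin of error shrinks if you increase $n$ or decrease $s$, matching the intuition above.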
Compare: Point Estimation vs. Confidence Intervals: both estimate population parameters, but point estimates give a single value while confidence intervals communicate uncertainty. If a problem asks you to "estimate and interpret," you'll almost always need a confidence interval, not just a point estimate.
Hypothesis testing provides a structured framework for deciding whether sample evidence is strong enough to reject a claim about a population. The logic starts by assuming the null hypothesis is true, then asks how surprising your data would be under that assumption.
Here's the process, step by step:

1. State the null hypothesis ($H_0$) and the alternative hypothesis ($H_a$).
2. Choose a significance level $\alpha$ (commonly 0.05).
3. Compute the test statistic from your sample.
4. Find the p-value: the probability of a result at least this extreme if $H_0$ is true.
5. Decide: reject $H_0$ if the p-value is less than $\alpha$; otherwise, fail to reject.
Two types of errors can occur:

- Type I error: rejecting $H_0$ when it's actually true. Its probability is the significance level $\alpha$.
- Type II error: failing to reject $H_0$ when it's actually false. Its probability is $\beta$, and $1 - \beta$ is the test's power.
Use a z-test when the population standard deviation $\sigma$ is known (rare in practice) or when testing a population proportion.

For a mean:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

For a proportion:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$
The test statistic is compared against the standard normal distribution. Proportion tests are the most common z-test you'll see in an intro course.
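The proportion formula above can be sketched directly; the null value $p_0 = 0.5$ and the count of 58 successes in 100 trials are invented for illustration:

```python
import math
from scipy import stats

# Hypothetical one-proportion z-test: H0: p = 0.5, two-sided alternative,
# with 58 successes observed in n = 100 trials (numbers invented)
p0, x, n = 0.5, 58, 100
p_hat = x / n

# The standard error uses p0, because the test assumes H0 is true
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
p_value = 2 * stats.norm.sf(abs(z))   # two-sided p-value from the standard normal
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Here $z \approx 1.6$ and the p-value is above 0.05, so this sample would not be enough to reject $H_0$ at the usual significance level.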
In most real scenarios, you don't know $\sigma$ and have to estimate it with the sample standard deviation $s$. This extra uncertainty is why you use the t-distribution instead of the z-distribution. The t-distribution has heavier tails (more probability in the extremes), especially with small samples. As $n$ increases, the t-distribution approaches the standard normal.
Three main types:

- One-sample t-test: compares a sample mean to a hypothesized value.
- Paired t-test: compares the means of two related measurements (e.g., before and after) on the same subjects.
- Independent two-sample t-test: compares the means of two separate groups.
Degrees of freedom affect the shape of the t-distribution. For a one-sample t-test, $df = n - 1$.
Compare: z-Tests vs. t-Tests: both test hypotheses about means, but z-tests require known $\sigma$ (rare) while t-tests estimate it from data (common). On exams, if you're given $\sigma$, use z. If you're given $s$ or raw data, use t. When in doubt, the t-test is almost always the safer choice.
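A one-sample t-test takes just a couple of lines with scipy; the eight measurements and the hypothesized mean of 10 below are invented for illustration:

```python
from scipy import stats

# Hypothetical sample of 8 measurements; test H0: mu = 10 (two-sided)
sample = [9.1, 10.3, 9.8, 10.9, 9.5, 10.2, 9.7, 10.6]

t_stat, p_value = stats.ttest_1samp(sample, popmean=10.0)
print(f"t = {t_stat:.3f}, df = {len(sample) - 1}, p = {p_value:.3f}")
```

The sample mean here is very close to 10, so the t-statistic is near zero and the p-value is large: no evidence against $H_0$.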
When your research question involves comparing outcomes across multiple groups, these techniques help determine whether observed differences reflect real population differences or just sampling variability.
ANOVA extends the logic of t-tests to three or more groups at once. Why not just run multiple t-tests? Because each t-test carries a risk of Type I error, and running many of them inflates that risk well beyond your chosen .
The F-statistic is the key measure. It compares between-group variance (how much the group means differ from each other) to within-group variance (how much individual observations vary within each group). A large F value means the group means differ more than you'd expect from random chance alone.
ANOVA requires three assumptions:

- Independence: observations are independent within and across groups.
- Normality: the response is approximately normally distributed within each group.
- Equal variances: the groups have roughly the same variance (homogeneity of variance).
A significant ANOVA result tells you at least one group mean differs, but not which ones. You'd need follow-up procedures (like post-hoc tests) to identify specific differences.
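The between-vs-within comparison above can be sketched with scipy's one-way ANOVA; the three groups of scores are invented for illustration:

```python
from scipy import stats

# Hypothetical scores under three teaching methods (values invented)
method_a = [82, 85, 88, 90, 84]
method_b = [75, 78, 80, 77, 79]
method_c = [88, 91, 87, 93, 90]

# One-way ANOVA: F compares between-group to within-group variance
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.5f}")
```

The group means differ far more than the within-group spread, so F is large and the p-value is tiny. Remember: this only says *some* mean differs; post-hoc tests identify which.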
Chi-square tests are for categorical data. Use them when your variables are categorical, not quantitative.
The test statistic measures how far observed counts deviate from what you'd expect if there were no relationship:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed frequency and $E$ is the expected frequency.
Two main applications:

- Goodness-of-fit test: compares the distribution of one categorical variable to a hypothesized distribution.
- Test of independence: checks whether two categorical variables are associated.
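A test of independence can be sketched from a contingency table; the 2×3 table of counts below (party affiliation by opinion) is invented for illustration:

```python
from scipy import stats

# Hypothetical 2x3 contingency table: party (rows) vs. opinion (columns)
observed = [[30, 20, 10],
            [15, 25, 20]]

# chi2_contingency computes expected counts under independence,
# then sums (O - E)^2 / E over all cells
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
```

Degrees of freedom come from the table shape: $(rows - 1)(cols - 1) = 1 \times 2 = 2$ here. A small p-value suggests the two variables are associated.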
Compare: ANOVA vs. Chi-Square: both compare groups, but ANOVA tests differences in means of a quantitative response across categorical groups, while chi-square tests associations between categorical variables. Know your variable types: quantitative response → ANOVA; categorical response → chi-square.
These techniques go beyond testing for differences to model how variables relate to each other, enabling both prediction and understanding of relationships.
Simple linear regression models how a response variable ($y$) changes as a predictor variable ($x$) changes, using the equation:

$$\hat{y} = b_0 + b_1 x$$

Here $b_0$ is the y-intercept and $b_1$ is the slope. The slope tells you the predicted change in $y$ for each one-unit increase in $x$.
Inference on the slope is where hypothesis testing meets regression. Testing whether $\beta_1 = 0$ determines if there's a statistically significant linear relationship between $x$ and $y$. You can also build a confidence interval for the slope to show the precision of your estimated effect.
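Slope estimation and the test of $\beta_1 = 0$ come packaged together in scipy's `linregress`; the hours-studied vs. exam-score data below are invented for illustration:

```python
from scipy import stats

# Hypothetical data: hours studied (x) vs. exam score (y), values invented
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 60, 68, 70, 75, 78]

# linregress returns the fitted slope/intercept plus a two-sided
# p-value for the null hypothesis that the slope is zero
result = stats.linregress(x, y)
print(f"slope b1 = {result.slope:.2f}, intercept b0 = {result.intercept:.2f}")
print(f"p-value for H0: beta1 = 0 -> {result.pvalue:.5f}")
```

The slope says each extra hour of study predicts roughly 3.8 more points, and the tiny p-value indicates a statistically significant linear relationship.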
Multiple regression includes several predictors, which helps control for confounding variables and improve predictions. When interpreting coefficients in multiple regression, you need to say the effect is "holding other variables constant."
Compare: ANOVA vs. Regression: these are surprisingly similar mathematically. ANOVA is essentially regression with categorical predictors coded as indicator variables. The difference is framing: ANOVA emphasizes group mean comparisons, while regression emphasizes the equation and prediction. A quantitative response with a single categorical predictor can be analyzed with either approach.
These techniques represent more sophisticated approaches to inference. They're less commonly tested in intro courses, but understanding their logic deepens your grasp of statistical thinking.
Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable. Think of it this way: given the data you actually collected, what parameter values would have been most likely to generate that sample?
MLE has useful properties for large samples: estimates become unbiased and achieve minimum variance. Many methods you've already encountered (like logistic regression) use MLE under the hood, so understanding it connects procedures that might seem unrelated.
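The "which parameter makes this data most probable?" idea can be sketched numerically; the data (7 successes in 10 Bernoulli trials) are invented, and the answer should land at the sample proportion:

```python
import math
from scipy.optimize import minimize_scalar

# Hypothetical data: 7 successes in 10 Bernoulli trials (numbers invented).
# The MLE of p should equal the sample proportion, 7/10.
x, n = 7, 10

def neg_log_likelihood(p):
    # Binomial log-likelihood up to a constant: x*log(p) + (n-x)*log(1-p);
    # negated because minimize_scalar minimizes
    return -(x * math.log(p) + (n - x) * math.log(1 - p))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                         method="bounded")
print(f"MLE of p: {result.x:.4f}  (sample proportion: {x / n})")
```

The numerical optimizer recovers $\hat{p} = x/n$, the same answer calculus gives in closed form, which is exactly what maximizing the likelihood means.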
Bayesian inference takes a fundamentally different approach by incorporating prior knowledge. It combines what you believed before seeing data (the prior) with the evidence from data (the likelihood) to produce updated beliefs (the posterior):

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$
The key philosophical difference: frequentist methods (everything else in this guide) treat parameters as fixed unknown values. Bayesian methods treat parameters as having probability distributions. This means Bayesian "credible intervals" have a more intuitive interpretation than confidence intervals. A 95% credible interval actually means there's a 95% probability the parameter falls in that range.
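The prior-times-likelihood update can be sketched with a conjugate Beta-Binomial model; the Beta(2, 2) prior and the 7-successes-in-10-trials data below are invented for illustration:

```python
from scipy import stats

# Hypothetical Beta-Binomial example: Beta(2, 2) prior on a proportion,
# then observe 7 successes and 3 failures (all numbers invented)
prior_a, prior_b = 2, 2
successes, failures = 7, 3

# Conjugate update: posterior is Beta(a + successes, b + failures)
post_a, post_b = prior_a + successes, prior_b + failures
posterior = stats.beta(post_a, post_b)

# 95% credible interval: the central 95% of the posterior distribution
lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

Because the posterior is a genuine probability distribution over the parameter, this interval supports the direct reading "there's a 95% probability $p$ is in this range," unlike a frequentist confidence interval.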
Compare: Maximum Likelihood vs. Bayesian Inference: both estimate parameters, but MLE asks "what parameter maximizes the probability of this data?" while Bayesian asks "what's the probability distribution of the parameter given this data?" MLE is frequentist (parameters are fixed), Bayesian treats parameters as random variables with distributions.
| Concept | Best Examples |
|---|---|
| Estimating parameters | Point Estimation, Confidence Intervals, Maximum Likelihood |
| Testing one mean | z-Test (known $\sigma$), t-Test (unknown $\sigma$) |
| Comparing two means | Independent t-Test, Paired t-Test |
| Comparing 3+ means | ANOVA |
| Categorical associations | Chi-Square Test of Independence, Chi-Square Goodness-of-Fit |
| Modeling relationships | Simple Regression, Multiple Regression |
| Quantifying uncertainty | Confidence Intervals, Bayesian Credible Intervals |
| Decision-making framework | Hypothesis Testing, p-values, Significance Level |
You want to determine if average test scores differ across four teaching methods. Which technique should you use, and why would multiple t-tests be problematic?
Compare confidence intervals and hypothesis testing: How are they related, and what does it mean when a 95% CI for a mean difference doesn't include zero?
A researcher has categorical data on political party affiliation and opinion on a policy issue. Which technique tests whether these variables are associated, and what does the test statistic measure?
When would you choose a t-test over a z-test for comparing a sample mean to a hypothesized value? What assumption about the population makes this distinction necessary?
Explain how regression analysis and ANOVA are conceptually similar. If you had a quantitative response and a single categorical predictor with three levels, could you use either approach?