Sampling is the backbone of statistical inference—it's how we draw conclusions about millions of data points by examining just a fraction of them. When you're working with massive datasets or surveying populations, you can't analyze everything, so the method you choose to select your sample determines whether your findings are valid, generalizable, and unbiased. This topic connects directly to core concepts like bias-variance tradeoffs, statistical inference, and experimental design.
You're being tested on more than just definitions here. Exam questions will ask you to identify which sampling method fits a given scenario, explain why one technique introduces bias while another doesn't, and evaluate the tradeoffs between cost, precision, and representativeness. Don't just memorize the names—know what problem each technique solves and when it fails.
Probability sampling techniques give every member of the population a known, non-zero chance of being selected. This mathematical foundation is what allows us to make valid statistical inferences and calculate margins of error.
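To make the "known, non-zero chance" idea concrete, here's a minimal Python sketch. The population and its 6,000/4,000 yes/no split are invented for illustration; the sketch draws a simple random sample and computes the usual 95% margin of error for a proportion.

```python
import math
import random

random.seed(42)

# Hypothetical yes/no population: 1 = supports the proposal, 0 = does not
population = [1] * 6000 + [0] * 4000

n = 400
# Simple random sample: every unit has the same inclusion probability n/N
sample = random.sample(population, n)

p_hat = sum(sample) / n                           # sample proportion
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)   # 95% margin of error for a proportion

print(f"estimate = {p_hat:.3f} +/- {moe:.3f}")
```

The margin-of-error formula is only defensible because the selection probabilities are known; that's the property the non-probability methods below give up.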
Compare: Stratified vs. Cluster Sampling—both divide populations into groups, but stratified sampling takes individuals from every stratum while cluster sampling takes all individuals from selected clusters only. If an FRQ asks about reducing variance, stratified is your answer; if it asks about cost efficiency for geographically dispersed populations, go with cluster.
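As a concrete illustration of that procedural difference, here's a minimal sketch over a hypothetical population grouped by region (the region names and sizes are invented): stratified sampling draws a few individuals from every group, while one-stage cluster sampling keeps everyone in a few randomly chosen groups.

```python
import random

random.seed(0)

# Hypothetical population divided into four regional groups
regions = {
    "north": [f"N{i}" for i in range(100)],
    "south": [f"S{i}" for i in range(100)],
    "east":  [f"E{i}" for i in range(100)],
    "west":  [f"W{i}" for i in range(100)],
}

# Stratified sampling: randomly draw 10 individuals from EVERY stratum
stratified = [person
              for group in regions.values()
              for person in random.sample(group, 10)]

# One-stage cluster sampling: randomly choose 2 clusters, keep EVERYONE in them
chosen = random.sample(list(regions), 2)
cluster = [person for name in chosen for person in regions[name]]

print(len(stratified), len(cluster))   # 40 vs. 200 individuals
```

Notice the tradeoff in miniature: the stratified sample touches all four regions with only 40 interviews, while the cluster sample needs 200 interviews but only two site visits.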
Compare: Simple Random vs. Multi-stage Sampling—simple random is theoretically optimal but often impractical for large populations. Multi-stage sacrifices some precision for massive gains in cost and logistics. Know when practical constraints justify this tradeoff.
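Here's a hedged sketch of the multi-stage idea, using an invented states → precincts → voters frame: each stage is a random draw, so selection probabilities stay known, but they compound across stages.

```python
import random

random.seed(1)

# Invented nested frame: state -> precinct -> list of voters
frame = {
    f"state{s}": {f"precinct{p}": [f"voter_{s}_{p}_{v}" for v in range(50)]
                  for p in range(20)}
    for s in range(50)
}

# Stage 1: randomly select 10 of the 50 states
states = random.sample(list(frame), 10)

sample = []
for st in states:
    # Stage 2: randomly select 4 precincts within each chosen state
    for pr in random.sample(list(frame[st]), 4):
        # Stage 3: randomly select 5 voters within each chosen precinct
        sample.extend(random.sample(frame[st][pr], 5))

print(len(sample))   # 10 states * 4 precincts * 5 voters = 200
```

Cost drops because only 10 states are ever visited, but the clustering at each stage adds sampling error compared with a simple random sample of the same 200 voters.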
Non-probability techniques don't give every population member a known chance of selection. They're faster and cheaper, but they limit your ability to generalize findings or calculate true confidence intervals.
Compare: Stratified vs. Quota Sampling—both aim for subgroup representation, but stratified uses random selection within strata (probability method) while quota lets researchers pick non-randomly (non-probability). This distinction determines whether you can calculate valid confidence intervals.
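A minimal sketch of that distinction, assuming an invented stream of respondents tagged by age group: the quota version enrolls the first people who fit each group (no randomization), while the stratified version draws randomly within each group, which is what keeps selection probabilities known.

```python
import random

random.seed(2)

# Invented stream of respondents tagged by age group, in arrival order
stream = [(f"person{i:03d}", random.choice(["18-34", "35-54", "55+"]))
          for i in range(300)]
targets = {"18-34": 10, "35-54": 10, "55+": 10}

# Quota sampling: take the FIRST respondents who fit each group -- no randomization
quota, counts = [], {g: 0 for g in targets}
for person, group in stream:
    if counts[group] < targets[group]:
        quota.append(person)
        counts[group] += 1

# Stratified sampling: randomly draw WITHIN each group, so inclusion probabilities are known
by_group = {g: [p for p, grp in stream if grp == g] for g in targets}
stratified = [p for g, k in targets.items() for p in random.sample(by_group[g], k)]

print(len(quota), len(stratified))   # both have 30, but only one supports valid inference
```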
Compare: Convenience vs. Snowball Sampling—both are non-probability methods, but convenience samples whoever's available while snowball specifically leverages social connections. Use snowball when your target population has no sampling frame; use convenience only when you need quick preliminary data.
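To show how snowball sampling leans on social connections, here's a small sketch over an invented referral network (the names and edges are hypothetical; in a real study the network is unknown and is discovered one referral at a time).

```python
from collections import deque

# Invented referral network: participant -> people they can refer
referrals = {
    "seed_a": ["p1", "p2"],
    "seed_b": ["p3"],
    "p1": ["p4", "p5"],
    "p2": [],
    "p3": ["p6"],
    "p4": [], "p5": ["p7"], "p6": [], "p7": [],
}

target = 6
queue = deque(["seed_a", "seed_b"])   # start from a few known seed participants
seen = set(queue)
sample = []

# Snowball sampling: enroll each participant, then follow their referrals
while queue and len(sample) < target:
    person = queue.popleft()
    sample.append(person)
    for contact in referrals.get(person, []):
        if contact not in seen:
            seen.add(contact)
            queue.append(contact)

print(sample)   # who ends up in the sample depends entirely on who the seeds know
```

That last comment is the bias to acknowledge: the sample can only reach people connected to the seeds, so isolated members of the population have zero chance of selection.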
The probability vs. non-probability distinction isn't just a labeling category: it's the fundamental divide that determines what statistical claims you can make.
| Concept | Best Examples |
|---|---|
| Equal selection probability | Simple Random Sampling |
| Guaranteed subgroup representation | Stratified Sampling, Quota Sampling |
| Cost-effective for dispersed populations | Cluster Sampling, Multi-stage Sampling |
| Requires complete sampling frame | Simple Random, Systematic, Stratified |
| Hidden/hard-to-reach populations | Snowball Sampling |
| Exploratory research only | Convenience Sampling, Purposive Sampling |
| Vulnerable to periodicity | Systematic Sampling |
| Valid for statistical inference | All probability methods (Simple Random, Stratified, Cluster, Systematic, Multi-stage) |
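The "vulnerable to periodicity" row is easiest to see with a toy example. This sketch assumes invented daily sales data ordered Monday through Sunday, so a sampling interval of 7 lines up exactly with the weekly cycle.

```python
# Invented daily sales for 10 weeks, ordered Mon..Sun; Saturdays are always the peak
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sales = [300 if day == "Sat" else 100 for _ in range(10) for day in days]

k = 7                      # sampling interval matches the length of the weekly cycle
start = 5                  # suppose the random start happens to land on a Saturday
sample = sales[start::k]   # every 7th record is then a Saturday

print(sum(sample) / len(sample))   # 300.0 -- badly overstates the typical day
print(sum(sales) / len(sales))     # ~128.6 -- the true population mean
```

Shuffling the list first, or choosing an interval that doesn't divide the cycle length, breaks the alignment and restores a representative sample.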
A researcher wants to survey voters across 50 states but can only afford to visit 10 states. Within those states, she'll randomly select precincts, then randomly select voters within precincts. Which sampling method is this, and why might it introduce more error than simple random sampling?
Compare stratified sampling and quota sampling: what key procedural difference determines whether you can calculate a valid margin of error?
You're studying individuals with a rare genetic condition that has no registry or public list. Which sampling technique would you use, and what bias should you acknowledge in your findings?
A dataset was collected by surveying people who responded to an online ad. A colleague wants to report 95% confidence intervals for population parameters. What's wrong with this approach?
When would systematic sampling produce a biased sample even though it's technically a probability method? Give a specific example of how list ordering could cause this problem.