Why This Matters
Sampling is the backbone of statistical inference—it's how we draw conclusions about millions of data points by examining just a fraction of them. When you're working with massive datasets or surveying populations, you can't analyze everything, so the method you choose to select your sample determines whether your findings are valid, generalizable, and unbiased. This topic connects directly to core concepts like bias-variance tradeoffs, statistical inference, and experimental design.
You're being tested on more than just definitions here. Exam questions will ask you to identify which sampling method fits a given scenario, explain why one technique introduces bias while another doesn't, and evaluate the tradeoffs between cost, precision, and representativeness. Don't just memorize the names—know what problem each technique solves and when it fails.
Probability Sampling Methods
These techniques give every member of the population a known, non-zero chance of being selected. This mathematical foundation is what allows us to make valid statistical inferences and calculate margins of error.
Simple Random Sampling
- Every individual has an equal selection probability—this is the gold standard for eliminating selection bias
- Implementation uses random number generators or lottery methods; requires a complete sampling frame (a list of all population members)
- Best for homogeneous populations where subgroup representation isn't a concern; forms the theoretical basis for most statistical tests
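The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation; the frame of 100 member IDs and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from a complete sampling frame.

    Every member has the same selection probability (n / len(frame))."""
    rng = random.Random(seed)  # seed only for reproducibility in examples
    return rng.sample(frame, n)

population = list(range(1, 101))  # hypothetical frame: 100 member IDs
sample = simple_random_sample(population, 10, seed=42)
# each member had an equal (10%) chance of ending up in the sample
```

Note that this requires the complete frame up front, which is exactly the practical limitation mentioned above.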
Stratified Sampling
- Divides population into non-overlapping strata based on known characteristics (age, income, region) before sampling within each group
- Guarantees representation of all subgroups—critical when minority groups might be missed by pure random sampling
- Reduces variance and increases precision compared to simple random sampling of the same size; requires prior knowledge of stratifying variables
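One way to sketch stratified selection, assuming the stratifying variable is already attached to each member (here a hypothetical `region` label):

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Group the frame into strata, then draw randomly within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in frame:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        # random selection *within* each stratum keeps this a probability method
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# hypothetical frame: (id, region) pairs, 50 members per region
frame = [(i, "north" if i % 2 else "south") for i in range(100)]
sample = stratified_sample(frame, stratum_of=lambda m: m[1],
                           n_per_stratum=5, seed=1)
# exactly 5 from each region -- subgroup representation is guaranteed
```

Equal allocation per stratum is used here for simplicity; proportional or optimal allocation is common in practice.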
Systematic Sampling
- Selects every kth element after a random starting point, where k = N/n (population size divided by desired sample size)
- Simpler to execute than simple random sampling—no need for random number generation after the initial selection
- Vulnerable to periodicity bias if the list has a hidden pattern that aligns with your sampling interval; always verify list ordering is arbitrary
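A minimal sketch of the every-kth-element rule, assuming the list is in arbitrary order (the function name and frame are illustrative):

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every kth element after a random start, with k = N // n."""
    k = len(frame) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)     # random start in the first interval
    return frame[start::k][:n]

frame = list(range(1000))        # hypothetical ordered frame
sample = systematic_sample(frame, 50, seed=7)
# 50 evenly spaced elements, each pair separated by k = 20 positions
```

Notice that only one random draw (the start) is needed, but if `frame` had a repeating pattern with period 20, every sampled element would share that pattern's phase, which is the periodicity bias warned about above.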
Cluster Sampling
- Randomly selects entire groups (clusters) rather than individuals—often geographic units like schools, city blocks, or hospitals
- Dramatically reduces costs when population members are physically dispersed; doesn't require a complete list of all individuals
- Trades precision for practicality—sampling error increases if clusters differ significantly from each other; works best when clusters are internally heterogeneous
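The contrast with stratified sampling is easiest to see in code: here whole clusters are drawn at random and every member of a chosen cluster is kept. The school/student setup is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep all members of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# hypothetical clusters: 20 schools, each with 30 student IDs
schools = {f"school_{i}": list(range(i * 100, i * 100 + 30)) for i in range(20)}
sample = cluster_sample(schools, n_clusters=3, seed=3)
# all 30 students from each of 3 randomly chosen schools -> 90 members,
# collected from only 3 sites instead of 20
```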
Compare: Stratified vs. Cluster Sampling—both divide populations into groups, but stratified sampling takes individuals from every stratum while cluster sampling takes all individuals from selected clusters only. If an FRQ asks about reducing variance, stratified is your answer; if it asks about cost efficiency for geographically dispersed populations, go with cluster.
Multi-stage Sampling
- Combines sampling methods hierarchically—typically clusters first, then random or stratified sampling within selected clusters
- Balances representativeness with feasibility for large-scale studies like national surveys or census operations
- Requires careful variance estimation since error compounds at each stage; standard formulas must account for the multi-level design
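The hierarchical idea can be sketched by chaining the two previous procedures: cluster selection first, then simple random sampling within each chosen cluster. The two-stage design and names below are illustrative.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: random sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

# hypothetical design: visit 3 of 20 schools, then sample 10 students at each
schools = {f"school_{i}": list(range(i * 100, i * 100 + 30)) for i in range(20)}
sample = multistage_sample(schools, n_clusters=3, n_per_cluster=10, seed=5)
# 30 members total, but only 3 sites to visit instead of a nationwide SRS
```

Remember the caveat above: because randomness enters at both stages, the usual single-stage variance formulas understate the true sampling error.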
Compare: Simple Random vs. Multi-stage Sampling—simple random is theoretically optimal but often impractical for large populations. Multi-stage sacrifices some precision for massive gains in cost and logistics. Know when practical constraints justify this tradeoff.
Non-Probability Sampling Methods
These techniques don't give every population member a known chance of selection. They're faster and cheaper but limit your ability to generalize findings or calculate true confidence intervals.
Convenience Sampling
- Selects whoever is easiest to reach—students in your class, people walking by, users who opt in
- High risk of selection bias since accessible individuals often differ systematically from the broader population
- Appropriate only for pilot studies or exploratory research where generalizability isn't the goal; never use for final inference
Quota Sampling
- Sets target numbers for demographic categories (e.g., 50 men, 50 women) but uses non-random selection within each quota
- Ensures demographic diversity without the logistical demands of true stratified sampling
- Selection bias persists within quotas—the researcher chooses which 50 men, introducing subjectivity; common in market research
Compare: Stratified vs. Quota Sampling—both aim for subgroup representation, but stratified uses random selection within strata (probability method) while quota lets researchers pick non-randomly (non-probability). This distinction determines whether you can calculate valid confidence intervals.
Purposive Sampling
- Researcher deliberately selects cases that fit specific criteria or represent particular phenomena of interest
- Maximizes information for qualitative research—choosing "typical" cases, extreme cases, or expert informants
- Cannot support statistical generalization since selection is based on judgment, not probability; findings apply only to cases studied
Snowball Sampling
- Participants recruit other participants through their social networks—each subject refers others who qualify
- Essential for hidden or hard-to-reach populations—undocumented immigrants, people with rare diseases, underground communities
- Sample clusters around initial contacts creating network-based bias; representativeness depends entirely on starting points and network structure
Compare: Convenience vs. Snowball Sampling—both are non-probability methods, but convenience samples whoever's available while snowball specifically leverages social connections. Use snowball when your target population has no sampling frame; use convenience only when you need quick preliminary data.
The Probability vs. Non-Probability Distinction
This isn't just a category—it's the fundamental divide that determines what statistical claims you can make.
Why This Distinction Matters
- Probability methods enable statistical inference—you can calculate standard errors, confidence intervals, and p-values because selection probabilities are known
- Non-probability methods support exploration, not confirmation—useful for generating hypotheses, understanding mechanisms, or accessing difficult populations
- Choosing incorrectly invalidates your analysis—applying inferential statistics to a convenience sample produces meaningless confidence intervals, even if the math runs
Quick Reference Table
| Scenario or requirement | Technique(s) |
| --- | --- |
| Equal selection probability | Simple Random Sampling |
| Guaranteed subgroup representation | Stratified Sampling, Quota Sampling |
| Cost-effective for dispersed populations | Cluster Sampling, Multi-stage Sampling |
| Requires complete sampling frame | Simple Random, Systematic, Stratified |
| Hidden/hard-to-reach populations | Snowball Sampling |
| Exploratory research only | Convenience Sampling, Purposive Sampling |
| Vulnerable to periodicity | Systematic Sampling |
| Valid for statistical inference | All probability methods (Simple Random, Stratified, Cluster, Systematic, Multi-stage) |
Self-Check Questions
- A researcher wants to survey voters across 50 states but can only afford to visit 10 states. Within those states, she'll randomly select precincts, then randomly select voters within precincts. Which sampling method is this, and why might it introduce more error than simple random sampling?
- Compare stratified sampling and quota sampling: what key procedural difference determines whether you can calculate a valid margin of error?
- You're studying individuals with a rare genetic condition that has no registry or public list. Which sampling technique would you use, and what bias should you acknowledge in your findings?
- A dataset was collected by surveying people who responded to an online ad. A colleague wants to report 95% confidence intervals for population parameters. What's wrong with this approach?
- When would systematic sampling produce a biased sample even though it's technically a probability method? Give a specific example of how list ordering could cause this problem.