Sampling is the backbone of statistical inference—and you're being tested on your ability to choose the right method for a given scenario, not just define terms. Every dataset you analyze in data science starts with how the data was collected, and flawed sampling leads to biased estimates, invalid confidence intervals, and misleading conclusions. The exam will push you to understand when each method works, why it reduces (or introduces) bias, and how sampling design affects the validity of your inferences.
The key concepts here revolve around randomization, representativeness, and practical constraints. You'll need to distinguish between probability and non-probability methods, recognize trade-offs between precision and cost, and identify when a sampling approach threatens external validity. Don't just memorize the list—know what statistical principle each method leverages and what can go wrong when assumptions are violated.
Probability sampling methods ensure every member of the population has a known, non-zero probability of selection. This property is what allows us to calculate standard errors, construct confidence intervals, and make valid inferences about the population.
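To make the link between known selection probabilities and standard errors concrete, here is a minimal sketch of a simple random sample drawn without replacement, with an estimated standard error and an approximate 95% confidence interval. The population is synthetic, and the function name `srs_mean_with_se`, the seeds, and the sample size are illustrative assumptions, not part of any particular library.

```python
import math
import random
import statistics

def srs_mean_with_se(population, n, seed=0):
    """Draw a simple random sample (without replacement) and return
    the sample mean and its estimated standard error."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # every unit has probability n/N
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)        # sample standard deviation
    # Finite population correction, since we sample without replacement.
    fpc = math.sqrt(1 - n / len(population))
    se = s / math.sqrt(n) * fpc
    return mean, se

# Hypothetical population: 10,000 synthetic values centered near 50.
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

mean, se = srs_mean_with_se(population, n=400, seed=1)
ci = (mean - 1.96 * se, mean + 1.96 * se)  # approximate 95% interval
print(f"mean={mean:.2f}, se={se:.2f}, ci=({ci[0]:.2f}, {ci[1]:.2f})")
```

The standard error formula is only justified because the design fixes each unit's selection probability at n/N. With a non-probability sample the same arithmetic would run, but the result would not mean anything.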
Compare: Stratified vs. Cluster Sampling—both divide populations into groups, but stratified sampling takes individuals from every stratum while cluster sampling takes all individuals from selected clusters. If an FRQ asks about reducing variance, stratified is your answer; if it asks about reducing costs for spread-out populations, think cluster.
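The difference in which units end up sampled can be sketched in a few lines. This example assumes a toy population already split into six groups of 20 units. The helper names `stratified_sample` and `cluster_sample` are hypothetical, and the cluster version shown is one-stage (every unit in a chosen cluster is kept).

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Randomly draw n_per_stratum units from EVERY stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick n_clusters whole clusters and keep ALL of their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 6 groups of 20 labeled units each.
groups = {f"group_{g}": [f"g{g}_u{i}" for i in range(20)] for g in range(6)}

rng = random.Random(0)
strat = stratified_sample(groups, 5, rng)  # all 6 groups appear, 5 units each
clust = cluster_sample(groups, 2, rng)     # only 2 groups appear, 20 units each

print({name: len(units) for name, units in strat.items()})
print({name: len(units) for name, units in clust.items()})
```

The same grouping is used for both calls on purpose: whether a group acts as a stratum or a cluster depends on how you sample from it, not on how it is defined.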
Compare: Systematic vs. Simple Random Sampling. Both aim for equal probability selection, but systematic is operationally simpler. The catch: SRS does not depend on how the population is ordered, while systematic sampling can give seriously misleading estimates if the ordering has periodicity that lines up with the sampling interval. When in doubt on an exam, SRS is the safer theoretical baseline.
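The periodicity problem is easiest to see with a toy example. The sketch below assumes a hypothetical series of daily sales with a weekly cycle and a sampling interval of 7, so the interval matches the period exactly. The data and the helper name `systematic_sample` are illustrative assumptions.

```python
import random
import statistics

def systematic_sample(population, k, start):
    """Take every k-th unit beginning at index `start` (0 <= start < k)."""
    return population[start::k]

# Hypothetical daily sales over 52 weeks: weekend days (5 and 6) sell more.
sales = [150 if day % 7 in (5, 6) else 100 for day in range(364)]
true_mean = statistics.mean(sales)  # 800 / 7, about 114.3

# With k = 7 the interval matches the weekly cycle, so each possible
# systematic sample contains only a single day of the week.
systematic_means = [statistics.mean(systematic_sample(sales, 7, start))
                    for start in range(7)]

# An SRS of the same size (52 days) mixes weekdays and weekends.
rng = random.Random(0)
srs_mean = statistics.mean(rng.sample(sales, 52))

print("true mean:", round(true_mean, 1))
print("systematic means by start day:", systematic_means)
print("one SRS mean:", round(srs_mean, 1))
```

Every possible systematic sample here reports either 100 or 150, so no single sample lands near the true mean. Changing the interval to a value that does not divide the period (for example, 10) breaks the alignment.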
Non-probability sampling methods do not give all population members a known chance of selection. Statistical inference becomes problematic because you cannot calculate valid standard errors or confidence intervals without strong extra assumptions, so results reliably describe only your sample, not the population.
Compare: Quota vs. Stratified Sampling—both ensure subgroup representation, but stratified uses random selection within strata while quota uses researcher judgment. Exam tip: if a question describes "ensuring 30% of respondents are from each region" without mentioning random selection, it's quota sampling, and you should flag the bias risk.
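A small simulation shows why matching subgroup proportions is not enough. This sketch assumes a synthetic population with three regions and a made-up variable `hours_at_home`. It also assumes, as a deliberately extreme caricature, that a daytime interviewer filling quotas reaches the people who are home the most. The function names are hypothetical.

```python
import random
import statistics

REGIONS = ("north", "south", "east")

# Hypothetical population: 1,000 people per region, each with a
# synthetic number of hours spent at home per day.
pop_rng = random.Random(7)
people = [{"region": region, "hours_at_home": pop_rng.uniform(0, 16)}
          for region in REGIONS for _ in range(1000)]

def stratified(people, n_per_region, rng):
    """Random selection within every region (probability sampling)."""
    sample = []
    for region in REGIONS:
        members = [p for p in people if p["region"] == region]
        sample.extend(rng.sample(members, n_per_region))
    return sample

def quota(people, n_per_region):
    """Fill each region's quota with the easiest people to reach.
    Assumption: 'easiest' means most hours at home."""
    sample = []
    for region in REGIONS:
        members = sorted((p for p in people if p["region"] == region),
                         key=lambda p: p["hours_at_home"], reverse=True)
        sample.extend(members[:n_per_region])
    return sample

def mean_hours(sample):
    return statistics.mean(p["hours_at_home"] for p in sample)

true_mean = mean_hours(people)
strat_mean = mean_hours(stratified(people, 50, random.Random(1)))
quota_mean = mean_hours(quota(people, 50))

print(f"true={true_mean:.1f}, stratified={strat_mean:.1f}, quota={quota_mean:.1f}")
```

Both samples contain exactly 50 people per region, so they look equally "representative" on paper. Only the stratified one picks people by chance, which is why its estimate stays close to the population value.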
The choice between methods depends on research goals, available resources, and acceptable trade-offs. Probability methods support inference; non-probability methods trade inferential validity for practicality.
| Consideration | Probability Methods | Non-Probability Methods |
|---|---|---|
| Valid inference | Yes—standard errors calculable | No—cannot generalize |
| Bias control | Randomization removes selection bias at the selection step (nonresponse and frame errors can remain) | Selection bias likely |
| Cost/time | Higher (need sampling frame, random selection) | Lower (grab who's available) |
| Best use case | Confirmatory research, policy decisions | Exploratory research, pilot studies |
Compare: Probability vs. Non-Probability Sampling—the fundamental distinction is whether you can calculate the probability that any given unit enters your sample. If yes, you can do inference. If no, your results are descriptive only. FRQs often present a scenario and ask you to identify the sampling method and evaluate whether conclusions are valid—this distinction is your key.
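As a worked illustration of "calculating the probability that any given unit enters your sample," the sketch below computes inclusion probabilities for two probability designs. It assumes simple random selection at every step. The function names and all of the numbers are made up for the example.

```python
from fractions import Fraction

def srs_inclusion_prob(n, N):
    """SRS of n units from N: each unit is included with probability n/N."""
    return Fraction(n, N)

def two_stage_inclusion_prob(m, M, n_i, N_i):
    """Two-stage design: m of M clusters chosen by SRS, then n_i of the
    N_i units in a chosen cluster by SRS. The stage probabilities multiply."""
    return Fraction(m, M) * Fraction(n_i, N_i)

# Hypothetical designs with made-up sizes.
p_srs = srs_inclusion_prob(100, 5000)              # 1/50
p_two = two_stage_inclusion_prob(10, 50, 20, 200)  # (1/5) * (1/10) = 1/50

# The design weight is the inverse of the inclusion probability:
# each sampled unit "stands in" for this many population units.
weight = 1 / p_two

print(p_srs, p_two, weight)  # 1/50 1/50 50
```

For a convenience or quota sample there is no way to fill in these arguments, because nothing in the design determines who could have been selected or how likely that was.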
| Concept | Best Examples |
|---|---|
| Equal probability selection | Simple Random Sampling, Systematic Sampling |
| Variance reduction through within-stratum homogeneity | Stratified Sampling |
| Cost reduction for dispersed populations | Cluster Sampling, Multistage Sampling |
| Hierarchical population structure | Multistage Sampling |
| Speed over validity | Convenience Sampling, Quota Sampling |
| Qualitative/exploratory research | Purposive Sampling |
| Valid statistical inference | All probability methods (SRS, Stratified, Cluster, Systematic, Multistage) |
| Selection bias risk | All non-probability methods (Convenience, Quota, Purposive) |
A researcher wants to estimate average household income in a city but only has resources to visit 10 neighborhoods. She randomly selects 10 neighborhoods and surveys every household in each. What sampling method is this, and what is its main disadvantage compared to SRS?
Which two sampling methods both divide the population into groups but differ in which units ultimately get sampled? Explain the key distinction and when you'd prefer each.
A polling company ensures their sample includes 40% Democrats, 40% Republicans, and 20% Independents by interviewing people at a shopping center until they hit those numbers. Identify the sampling method and explain why confidence intervals from this data would be invalid.
Compare systematic sampling and simple random sampling: under what specific condition does systematic sampling produce biased estimates while SRS would not?
An FRQ describes a study where researchers first randomly selected 5 states, then randomly selected 3 counties within each state, then surveyed 100 randomly chosen residents per county. Name the sampling method, identify how many stages it has, and explain why standard variance formulas cannot be directly applied.