Common Sampling Biases

Why This Matters

Every AP Statistics exam tests your ability to identify what went wrong in a study—and sampling bias is the most common culprit. When you're analyzing an experimental design or critiquing a survey in an FRQ, you need to pinpoint exactly which bias is present and explain why it threatens the validity of the conclusions. The good news? These biases follow predictable patterns based on how samples are selected, who responds, and how questions are asked.

Understanding sampling biases isn't just about memorizing a list of terms. You're being tested on your ability to recognize how bias enters a study and what effect it has on generalizability. Can the results be extended to the entire population, or did something in the sampling process systematically favor certain groups? Master the underlying mechanisms—who's missing, who's overrepresented, and why—and you'll be able to tackle any scenario the exam throws at you.


Biases from Sample Selection

These biases occur before any data is collected—something goes wrong in how or from whom the sample is drawn. The core issue is that the sampling process itself creates systematic differences between your sample and the target population.

Selection Bias

  • Systematic exclusion or inclusion—occurs when the method of choosing participants favors certain groups over others, making the sample unrepresentative from the start
  • Root cause is the selection mechanism itself, not who chooses to participate; the researcher's process creates the problem
  • Threatens external validity because conclusions drawn from a biased sample cannot be generalized to the broader population

Undercoverage Bias

  • Certain population groups are inadequately represented or entirely missing from the sample, often due to a flawed sampling frame
  • Common culprits include outdated lists—phone directories miss cell-only households, voter rolls miss recent movers
  • Systematically excludes demographics that may differ meaningfully on the variable being studied, skewing results in a predictable direction

Sampling Frame Bias

  • The list used to draw the sample doesn't match the target population—if your frame is incomplete, your sample inherits those gaps
  • Outdated or incomplete frames create undercoverage; the frame defines who can possibly be selected
  • Directly limits generalizability because people not on the list had zero probability of selection, violating random sampling assumptions

Convenience Sampling Bias

  • Samples drawn from easily accessible groups rather than through random selection—think surveying students in your own class or shoppers at one mall
  • Prioritizes ease over representativeness, introducing systematic differences between your sample and the population
  • Almost always produces biased results because accessible groups typically share characteristics (location, schedule, interests) that correlate with the variable of interest

Compare: Undercoverage bias vs. Sampling frame bias—both involve missing groups, but undercoverage describes the result (some population groups have little or no chance of selection), while frame bias names one common cause (the list used to draw the sample is incomplete or outdated). Undercoverage can also occur with a good frame if the selection process skips hard-to-reach groups. On FRQs, identify where the problem originates: the list or the selection process.
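A toy simulation makes the frame problem concrete. The numbers below are hypothetical: suppose 30% of households are cell-only and skew younger, but the frame is a landline directory that misses them entirely. Even perfectly random sampling from that frame produces a biased estimate.

```python
import random

random.seed(42)

# Hypothetical population: 70% landline households (older on average)
# and 30% cell-only households (younger on average).
population = ([random.gauss(55, 10) for _ in range(7000)] +   # landline ages
              [random.gauss(30, 8) for _ in range(3000)])     # cell-only ages
true_mean = sum(population) / len(population)

# Flawed frame: a phone directory listing only landline households.
frame = population[:7000]

# The draw from the frame is perfectly random—but cell-only households
# had zero probability of selection, so the estimate is still biased.
sample = random.sample(frame, 500)
sample_mean = sum(sample) / len(sample)

print(f"true mean age: {true_mean:.1f}, sample mean age: {sample_mean:.1f}")
```

The sample mean overestimates the population mean age by several years, illustrating why a flawed frame undermines even a well-executed random draw.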


Biases from Participation Patterns

Even with a perfectly selected sample, bias can enter when who actually participates differs systematically from who was selected. These biases emerge from the gap between your intended sample and your actual respondents.

Non-Response Bias

  • Selected individuals fail to respond, and their characteristics differ meaningfully from those who do respond
  • Creates bias only when non-response is related to the survey topic—if health survey non-respondents are sicker than respondents, results underestimate health problems
  • Can be reduced through follow-ups, incentives, and shorter surveys, but never fully eliminated; always report response rates

Voluntary Response Bias

  • Self-selection attracts people with strong opinions, while those with moderate views rarely bother to participate
  • Classic examples include call-in polls, online reviews, and comment sections—extreme voices dominate because motivation to respond correlates with intensity of opinion
  • Results skew toward extremes and cannot represent the population; this is why voluntary response samples are essentially worthless for inference

Compare: Non-response bias vs. Voluntary response bias—both involve who chooses to participate, but non-response starts with a random sample where some don't respond, while voluntary response never had random selection to begin with. Non-response can sometimes be addressed; voluntary response is fundamentally flawed by design.
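The "extreme voices dominate" mechanism can be sketched with hypothetical numbers: opinions are balanced around neutral, but the chance of bothering to respond grows with the intensity of the opinion.

```python
import random

random.seed(1)

# Hypothetical opinions on a policy, roughly -5 (strongly oppose)
# to +5 (strongly support); the population is balanced around 0.
opinions = [random.gauss(0, 2) for _ in range(10000)]

# Voluntary response: probability of responding grows with the
# intensity of opinion (|x| / 5, capped at 1).
respondents = [x for x in opinions if random.random() < min(abs(x) / 5, 1)]

# Share of "extreme" views (|opinion| > 2) in each group.
pop_extreme = sum(1 for x in opinions if abs(x) > 2) / len(opinions)
resp_extreme = sum(1 for x in respondents if abs(x) > 2) / len(respondents)

print(f"extreme share in population: {pop_extreme:.2f}, "
      f"among respondents: {resp_extreme:.2f}")
```

Extreme views make up a much larger share of respondents than of the population, even though no one was excluded by the researcher: the bias comes entirely from self-selection.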


Biases from How Respondents Answer

These biases occur during data collection itself. The sample may be perfectly representative, but how people respond introduces systematic errors. The issue shifts from "who's in the sample" to "what are they telling us."

Response Bias

  • Respondents provide inaccurate answers due to question wording, survey format, leading questions, or interviewer influence
  • Can be intentional or unintentional—confusing questions, biased wording ("Don't you agree that..."), or interviewer presence all distort responses
  • Threatens data reliability because even a representative sample yields misleading conclusions if the measurements themselves are flawed

Social Desirability Bias

  • Respondents answer to look good rather than honestly, over-reporting positive behaviors (voting, exercise) and under-reporting stigmatized ones (drug use, prejudice)
  • Strongest on sensitive topics where respondents perceive a "right" answer; anonymity and indirect questioning can help
  • Creates systematic overestimation or underestimation depending on the topic—critical to consider when interpreting self-reported data on personal behaviors

Recall Bias

  • Memory limitations cause inaccurate reporting of past events, especially for routine behaviors or distant timeframes
  • Particularly problematic in retrospective studies asking about diet, exercise, or experiences from months or years ago
  • Errors are often systematic, not random—significant events are remembered better, and people reconstruct memories based on current beliefs

Compare: Social desirability bias vs. Response bias—social desirability is a specific type of response bias driven by wanting to appear favorable. On exams, use the more specific term when the scenario clearly involves sensitive topics or impression management.
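A short sketch, with invented rates, shows how social desirability inflates self-reports: everyone who voted reports honestly, but some non-voters claim they voted because voting is the "right" answer.

```python
import random

random.seed(7)

# Hypothetical electorate: 60% actually voted.
true_votes = [random.random() < 0.60 for _ in range(10000)]
true_rate = sum(true_votes) / len(true_votes)

# Social desirability: voters report honestly, but 30% of
# non-voters falsely claim they voted.
reported = [voted or (random.random() < 0.30) for voted in true_votes]
reported_rate = sum(reported) / len(reported)

print(f"true turnout: {true_rate:.2f}, self-reported turnout: {reported_rate:.2f}")
```

The self-reported rate overshoots the true rate by roughly the share of non-voters who misreport, a systematic (not random) error that larger samples cannot fix.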


Biases from Analyzing the Wrong Group

This category involves drawing conclusions from a sample that, while real, doesn't represent what we think it does. The data is accurate for who we measured, but we're missing crucial information about who we didn't.

Survivorship Bias

  • Only "survivors" of a selection process are analyzed, while those who dropped out, failed, or disappeared are ignored
  • Classic example: studying successful companies to find success factors while ignoring failed companies that may have had the same traits
  • Leads to systematically optimistic conclusions because failures are invisible; always ask "what's missing from this dataset?"

Compare: Survivorship bias vs. Undercoverage bias—both involve missing groups, but undercoverage means certain groups were never sampled, while survivorship means they were part of the original group but "disappeared" before measurement. Survivorship often occurs in longitudinal studies or historical analyses.
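Survivorship bias is also easy to simulate with hypothetical numbers: start with a full cohort, let weaker members "fail" and vanish, then measure only the survivors.

```python
import random

random.seed(3)

# Hypothetical cohort of firms with a quality score ~ N(50, 15).
quality = [random.gauss(50, 15) for _ in range(5000)]
cohort_mean = sum(quality) / len(quality)

# Firms below a threshold fail and disappear before measurement,
# so a later dataset contains only the survivors.
survivors = [q for q in quality if q > 45]
survivor_mean = sum(survivors) / len(survivors)

print(f"full cohort mean: {cohort_mean:.1f}, survivor mean: {survivor_mean:.1f}")
```

Analyzing only the survivors overstates the cohort's average quality, which is exactly why "study the successful companies" designs reach optimistic conclusions.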


Quick Reference Table

Concept | Best Examples
Sample selection problems | Selection bias, Undercoverage, Sampling frame bias, Convenience sampling
Participation-related | Non-response bias, Voluntary response bias
Response accuracy | Response bias, Social desirability bias, Recall bias
Missing data patterns | Survivorship bias, Non-response bias
Flawed sampling frame | Sampling frame bias, Undercoverage bias
Self-selection issues | Voluntary response bias, Convenience sampling bias
Threatens generalizability | All of them—but especially selection, undercoverage, and voluntary response
Threatens reliability | Response bias, Social desirability bias, Recall bias

Self-Check Questions

  1. A researcher surveys patients currently enrolled in a weight-loss program about their eating habits from the past year. Identify two distinct biases that could affect these results and explain how each would distort the findings.

  2. Which two biases both involve people choosing whether to participate, and what is the key difference in how the original sample was selected?

  3. An online poll asks readers whether they support a new city policy, and 89% of 2,000 respondents say "no." A city council member claims this proves residents oppose the policy. What bias is present, and why does the large sample size not fix the problem?

  4. Compare sampling frame bias and undercoverage bias. If a researcher uses a 2015 phone directory to survey a city's current residents, which bias (or both) is present? Explain your reasoning.

  5. A study finds that entrepreneurs who took a specific business course have higher success rates than average. Before concluding the course is effective, what bias should you consider, and what additional information would you need to evaluate the claim?