📊Advanced Communication Research Methods

Sampling Methods

Why This Matters

Sampling methods form the backbone of statistical inference, and the AP Statistics exam tests whether you understand why certain methods produce valid conclusions while others don't. You're not just being asked to identify "stratified" versus "cluster" sampling; you're being tested on whether you can explain how each method affects bias, variability, and the validity of inference. Every confidence interval, hypothesis test, and chi-square procedure you'll encounter assumes data was collected properly, so understanding sampling is prerequisite knowledge for Units 5–9.

The key distinction driving this entire topic is probability sampling versus non-probability sampling. Probability methods give every member of the population a known, non-zero chance of selection, and that's what allows us to make legitimate generalizations. When you see questions about the 10% condition, independence assumptions, or why a study's conclusions might be flawed, you're applying sampling concepts. Don't just memorize definitions. Know what makes each method statistically valid (or not) and when each is most appropriate.

Probability Sampling Methods

These methods give every population member a known chance of selection, which is the fundamental requirement for valid statistical inference. Without probability sampling, confidence intervals and hypothesis tests lose their mathematical foundation.

Simple Random Sample (SRS)

Every possible sample of size $n$ has an equal probability of being selected. This is the gold standard that other methods are compared against.
Selection uses random number generators or tables of random digits to eliminate human bias in choosing participants.
Enables direct generalization to the population because the randomization process, not researcher judgment, determines who's included.
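
As a minimal sketch of drawing an SRS in Python, assume a hypothetical sampling frame of 500 student ID numbers; the frame, sample size, seed, and function name are illustrative choices, not part of the guide.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement
    return rng.sample(population, n)

# Hypothetical sampling frame: student ID numbers 1..500
student_ids = list(range(1, 501))
srs = simple_random_sample(student_ids, n=25, seed=42)
print(sorted(srs))
```

The seed is fixed only so the example is repeatable; in a real study the point is that the random number generator, not the researcher, picks the IDs.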

Stratified Random Sample

The population is divided into homogeneous strata (groups where members share a key characteristic), then an SRS is taken within each stratum.
Reduces variability in estimates compared to an SRS of the same size when members within each stratum are similar, because you guarantee representation of important subgroups. For example, if you're estimating average income across a university, stratifying by role (faculty, staff, administration) ensures each group contributes to the estimate proportionally rather than being over- or under-represented by chance.
Strata should be chosen based on a variable related to what you're measuring. Stratifying by a variable that has no connection to your response variable won't help reduce variability.
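
One way to sketch stratified random sampling with proportional allocation is shown below. The group sizes (600 faculty, 300 staff, 100 administration), the total sample size of 50, and the function name are assumptions made up for illustration.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Take an SRS inside every stratum, sized in proportion to the stratum.

    strata: dict mapping stratum name -> list of members.
    n: target total sample size (rounding can shift the realized total slightly).
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, with at least one member per stratum
        k = max(1, round(n * len(members) / total))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical university employees grouped into three strata
strata = {
    "faculty": [f"F{i}" for i in range(600)],
    "staff": [f"S{i}" for i in range(300)],
    "administration": [f"A{i}" for i in range(100)],
}
chosen = stratified_sample(strata, n=50, seed=1)
print({name: len(members) for name, members in chosen.items()})
# prints {'faculty': 30, 'staff': 15, 'administration': 5}
```

Every stratum appears in the sample, and the random draw happens inside each one, which is the feature that separates this design from non-random look-alikes.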

Cluster Sample

The population is divided into heterogeneous clusters (each cluster should mirror the diversity of the whole population), then entire clusters are randomly selected and everyone in those clusters is sampled.
Dramatically reduces cost and logistics when populations are geographically spread out or no complete list of individuals exists. Think of sampling school districts: it's far easier to survey every student in 10 randomly chosen schools than to track down individually selected students scattered across 200 schools.
Typically introduces higher sampling variability than an SRS of the same size because individuals within clusters tend to be more similar to each other than to the broader population (e.g., students at the same school share similar resources and experiences).
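
A cluster sample can be sketched as follows, assuming a hypothetical frame of 200 schools with 30 students each (the school names, sizes, and seed are invented for the example).

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical frame: 200 schools with 30 students each
schools = {
    f"school_{i:03d}": [f"s{i:03d}_{j}" for j in range(30)]
    for i in range(200)
}
chosen_schools, students = cluster_sample(schools, num_clusters=10, seed=7)
print(len(chosen_schools), len(students))  # prints 10 300
```

Note that randomness enters only at the school level: once a school is chosen, every student in it is surveyed.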

Systematic Random Sample

Randomly select a starting point among the first $k$ individuals on an ordered list, then choose every $k$th individual after it, where $k$ equals the population size divided by the desired sample size.
Spreads the sample evenly across the sampling frame and is simpler to implement than pure SRS, especially with large populations.
Risk of bias exists if the list has a periodic pattern that aligns with the sampling interval. For instance, if every 10th house on a street is a corner lot with a larger yard, sampling every 10th house would systematically overrepresent one type of property.
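
The procedure can be sketched in a few lines. This assumes a hypothetical list of 200 houses and a desired sample of 20, so $k = 10$; the function name is illustrative, and it assumes the requested sample size does not exceed the length of the list.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k entries, then take every k-th entry."""
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random index from 0 to k - 1
    return frame[start::k][:n]

# Hypothetical ordered list: house numbers 1..200
houses = list(range(1, 201))
picked = systematic_sample(houses, n=20, seed=3)
print(picked)
```

If every 10th house on this list shared some feature, every house in the sample would either have it or lack it, depending on the starting point, which is the periodicity problem described above.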

Compare: Stratified vs. Cluster sampling. Both divide populations into groups, but stratified samples from every group while cluster samples entire groups. Stratified reduces variability; cluster often increases it. If an FRQ describes dividing by geography and sampling whole areas, that's cluster. If it describes ensuring representation of subgroups by sampling within each one, that's stratified.
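
To make the variability contrast concrete, here is a small simulation sketch. Everything in it is assumed for illustration: 20 schools of 50 students, test scores that cluster around a school-level mean, and 1,000 repetitions of each design with the same total sample size of 100.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population: 20 schools, 50 students each. Scores sit near a
# school-level mean, so students within a school resemble one another.
schools = []
for _ in range(20):
    school_mean = rng.gauss(70, 10)
    schools.append([rng.gauss(school_mean, 5) for _ in range(50)])

def stratified_estimate():
    # SRS of 5 students from every school (each school is a stratum), n = 100
    picks = [score for school in schools for score in rng.sample(school, 5)]
    return statistics.mean(picks)

def cluster_estimate():
    # 2 whole schools chosen at random (each school is a cluster), n = 100
    picks = [score for school in rng.sample(schools, 2) for score in school]
    return statistics.mean(picks)

stratified_sd = statistics.stdev([stratified_estimate() for _ in range(1000)])
cluster_sd = statistics.stdev([cluster_estimate() for _ in range(1000)])
print(f"SD of stratified estimates: {stratified_sd:.2f}")
print(f"SD of cluster estimates:    {cluster_sd:.2f}")
```

Under these assumptions the stratified estimates vary much less from sample to sample, because every school contributes to every sample, while the cluster estimates swing with whichever two schools happen to be drawn.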

Non-Probability Sampling Methods

These methods do not give every population member a known chance of selection, which means results cannot be generalized to the population through statistical inference. The AP exam frequently tests whether you can identify these as sources of bias.

Convenience Sample

Participants are selected based on easy accessibility to the researcher. Surveying people in a mall food court at noon on a Tuesday is a classic example: you're only reaching people who happen to be there.
Quick and inexpensive but produces selection bias because the accessible group likely differs systematically from the population.
Conclusions apply only to those sampled, not the broader population. No amount of sophisticated analysis fixes the flawed data collection.

Voluntary Response Sample

Participants self-select into the study by choosing to respond. Online polls, call-in surveys, and product reviews all fall into this category.
Produces severe bias toward strong opinions because people with moderate views rarely bother to participate. A restaurant's online reviews skew toward people who had either a fantastic or terrible experience; the majority with an average meal stay silent.
Large sample sizes don't fix this problem. A voluntary response poll with 50,000 responses is still biased because the mechanism of selection is flawed, not the quantity.
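
A short simulation sketch shows why size does not rescue a voluntary response sample. The numbers are assumptions chosen for illustration: 40% of a hypothetical population of 100,000 supports a proposal, opponents respond 30% of the time, and supporters respond only 10% of the time.

```python
import random

rng = random.Random(0)

# Hypothetical population: True = supports the proposal, False = opposes it
population = [rng.random() < 0.40 for _ in range(100_000)]
true_p = sum(population) / len(population)

def responds(supports):
    # Assumed behaviour: opponents feel strongly and respond more often
    return rng.random() < (0.10 if supports else 0.30)

# Voluntary response: whoever chooses to answer ends up in the sample
voluntary = [person for person in population if responds(person)]
voluntary_p = sum(voluntary) / len(voluntary)

# A much smaller simple random sample from the same population
srs = rng.sample(population, 500)
srs_p = sum(srs) / len(srs)

print(f"true proportion:          {true_p:.3f}")
print(f"voluntary (n={len(voluntary)}): {voluntary_p:.3f}")
print(f"SRS (n={len(srs)}):            {srs_p:.3f}")
```

In this setup the voluntary sample is dozens of times larger than the SRS yet lands far from the true proportion, because the response mechanism filters who gets counted.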

Quota Sample

The researcher sets quotas to match population proportions for key characteristics, then fills those quotas using non-random selection.
Resembles stratified sampling but lacks randomization. The final step uses convenience or researcher judgment, not random selection. A quota sampler told to interview 30 women aged 18–25 might approach only people who look approachable, introducing subtle bias.
Cannot support valid inference despite appearing representative, because selection within quotas introduces unknown biases that can't be quantified.

Compare: Stratified sampling vs. Quota sampling. Both aim for proportional representation of subgroups, but stratified uses random selection within strata while quota uses researcher judgment. Only stratified supports valid statistical inference.

Complex Sampling Designs

Real-world studies often combine methods to balance statistical validity with practical constraints. The AP exam expects you to recognize these hybrid approaches.

Multistage Sampling

Combines multiple sampling methods in stages. A typical design starts with cluster sampling (randomly select some clusters), then applies SRS or stratified sampling within the selected clusters rather than surveying everyone in them.
Balances efficiency with representativeness by reducing travel and cost while maintaining probability-based selection at every stage.
Used in major national surveys like the Census Bureau's Current Population Survey. The government first randomly selects geographic areas, then randomly selects households within those areas.
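
A two-stage design can be sketched as follows, assuming a hypothetical frame of 50 geographic areas with 40 households each. The names and sizes are invented for the example and do not describe how any actual survey is run.

```python
import random

def two_stage_sample(clusters, num_clusters, per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: SRS within each selected cluster."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), num_clusters)
    sample = {}
    for name in stage_one:
        members = clusters[name]
        # Never ask for more members than the cluster contains
        sample[name] = rng.sample(members, min(per_cluster, len(members)))
    return sample

# Hypothetical frame: 50 areas with 40 households each
areas = {
    f"area_{i:02d}": [f"hh_{i:02d}_{j:02d}" for j in range(40)]
    for i in range(50)
}
selected = two_stage_sample(areas, num_clusters=5, per_cluster=8, seed=11)
for area, households in selected.items():
    print(area, households)
```

Both stages use random selection, so the design remains a probability sample even though only 8 of the 40 households in each chosen area are contacted.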

Compare: Simple cluster vs. Multistage sampling. Cluster samples everyone in selected clusters, while multistage adds another random selection step within clusters. For the same total sample size, multistage typically produces more precise estimates because the sample is spread across more clusters, but it requires more complex analysis.

The Probability vs. Non-Probability Distinction

This conceptual division determines whether your inference is mathematically valid. Every inference procedure in Units 6–9 assumes probability sampling.

Probability Sampling (Category)

Every member has a known, non-zero probability of selection. This mathematical property is what makes inference work.
Includes SRS, stratified, cluster, and systematic methods. All allow calculation of sampling distributions and standard errors.
Supports the 10% condition and independence assumptions required for confidence intervals and hypothesis tests.
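
The 10% condition itself is a simple check: when sampling without replacement, the sample should be no more than 10% of the population. A minimal sketch, with a hypothetical function name and made-up sizes:

```python
def meets_ten_percent_condition(n, population_size):
    """Return True when the sample is at most 10% of the population."""
    # Integer arithmetic avoids floating-point rounding at the boundary
    return 10 * n <= population_size

print(meets_ten_percent_condition(100, 5000))  # prints True
print(meets_ten_percent_condition(100, 800))   # prints False
```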

Non-Probability Sampling (Category)

Selection probabilities are unknown or zero for some members. No mathematical basis for generalization exists.
Includes convenience, voluntary response, and quota methods. These can be useful for exploratory research but not for confirmatory inference.
Results in selection bias and undercoverage that cannot be corrected through larger sample sizes.

Compare: Probability vs. Non-probability sampling. The distinction isn't about sample size or effort; it's about whether randomization determines selection. A carefully designed convenience sample of 10,000 is still biased, while a proper SRS of 100 supports valid inference.

Quick Reference Table

Concept	Best Examples
Every possible sample of size $n$ equally likely	Simple Random Sample
Reducing variability through subgroups	Stratified Random Sample
Cost-effective for spread-out populations	Cluster Sample, Multistage Sampling
Sources of selection bias	Convenience Sample, Voluntary Response Sample
Appears representative but isn't random	Quota Sample
Supports valid statistical inference	SRS, Stratified, Cluster, Systematic (all probability methods)
Risk from periodic patterns in lists	Systematic Random Sample

Self-Check Questions

A researcher divides a city into neighborhoods, randomly selects 5 neighborhoods, and surveys every household in those neighborhoods. A second researcher divides residents by income level and randomly selects participants from each income group. Which method will likely produce estimates with lower variability, and why?
An online poll asks visitors to a news website to vote on a political issue and receives 50,000 responses. Why can't we construct a valid 95% confidence interval for the population proportion from this data, despite the large sample size?
Both stratified and quota sampling aim to ensure representation of subgroups. What specific feature distinguishes them, and how does this affect the validity of inference?
A study uses systematic sampling with $k = 20$ from an alphabetized list of employees. Under what circumstance might this method introduce bias, and how could the researchers check for this problem?
For a chi-square test of homogeneity comparing customer satisfaction across three store locations, explain why a cluster sample (randomly selecting entire stores) would be inappropriate for this specific inference, while a stratified random sample (sampling customers from each store) would work.