Cluster Sample

A cluster sample is a sampling method in which the population is divided into groups (clusters), a random selection of entire clusters is chosen, and every individual in the selected clusters is surveyed. It works best when each cluster looks like a mini version of the whole population.

Verified for the 2027 AP Statistics exam. Last updated June 2026.

What is a Cluster Sample?

In a cluster sample, you split the population into groups called clusters, randomly pick some of those clusters, and then collect data from everyone inside the chosen clusters. The randomness happens at the cluster level, not the individual level. Think of a city with 50 neighborhoods. Instead of randomly picking households scattered all over the city, you randomly pick 10 neighborhoods and knock on every door in those 10. That's a cluster sample.
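Here is a minimal Python sketch of the neighborhood example, just to make the mechanics concrete. The city, the household labels, the 20-households-per-neighborhood figure, and the `cluster_sample` helper are all made up for illustration; the only thing taken from the description above is the procedure itself (randomly choose clusters, then keep everyone in them).

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole clusters and return everyone in them.

    clusters: dict mapping a cluster label to a list of its members.
    """
    rng = random.Random(seed)
    # The only random step: choosing which clusters are in the sample.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # No sampling inside a cluster: every member of a chosen cluster is included.
    sample = [member for label in chosen for member in clusters[label]]
    return chosen, sample

# Hypothetical city: 50 neighborhoods with 20 households each (assumed numbers).
city = {
    f"N{i:02d}": [f"N{i:02d}-H{j:02d}" for j in range(1, 21)]
    for i in range(1, 51)
}

chosen, sample = cluster_sample(city, n_clusters=10, seed=1)
print(len(chosen), len(sample))  # 10 200
```

Notice that `rng.sample` is called once, on the neighborhood labels. No household is ever selected individually, which is what "randomness at the cluster level" means.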

The big idea is that each cluster should be a miniature copy of the population, internally diverse, just like the whole. If that's true, surveying a few whole clusters gives you representative data at a fraction of the cost and travel time of a simple random sample. If clusters are internally similar but different from each other (say, one neighborhood is all wealthy and another is all low-income), your sample can badly misrepresent the population. That trade-off, convenience versus the risk of unrepresentative clusters, is exactly what the AP exam asks you to reason about under DAT-2.D.1.
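A small simulation can make that risk visible. The sketch below uses an invented population of 1,000 "incomes" (just the numbers 0 through 999) and groups it into 50 neighborhoods two different ways: once so that each neighborhood is internally similar, and once so that each neighborhood is a mix of the whole range. The data, the grouping rules, and the `spread_of_cluster_means` helper are assumptions chosen for illustration, not anything from the AP course materials.

```python
import random
import statistics

def spread_of_cluster_means(clusters, n_clusters, reps=2000, seed=0):
    """Take `reps` cluster samples and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        chosen = rng.sample(clusters, n_clusters)            # pick whole clusters at random
        values = [v for cluster in chosen for v in cluster]  # survey everyone inside them
        means.append(statistics.mean(values))
    return statistics.stdev(means)

# Made-up population: 1,000 household incomes in arbitrary units, 0 through 999.
incomes = list(range(1000))

# Internally similar neighborhoods: sorted incomes cut into 50 blocks of 20.
similar_inside = [incomes[i:i + 20] for i in range(0, 1000, 20)]

# "Mini population" neighborhoods: each one gets every 50th income,
# so each neighborhood spans the full range from low to high.
mixed_inside = [incomes[i::50] for i in range(50)]

spread_similar = spread_of_cluster_means(similar_inside, n_clusters=10)
spread_mixed = spread_of_cluster_means(mixed_inside, n_clusters=10)
print(spread_similar > spread_mixed)  # True
```

Both groupings contain exactly the same 1,000 households, and both samples use 10 of 50 neighborhoods. The only difference is how the clusters are built, yet the estimates bounce around far more when each neighborhood is internally similar.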

Why Cluster Sample matters in AP Statistics

Cluster sampling lives in Topic 3.3 (Random Sampling and Data Collection) in Unit 3: Collecting Data. It supports two learning objectives directly. LO 3.3.A asks you to identify the sampling method from a study description, and cluster designs are a favorite because they look deceptively like stratified samples. LO 3.3.B asks you to explain why a method is or isn't appropriate, backed by essential knowledge DAT-2.D.1, which says every sampling method has advantages and disadvantages depending on the question and the population. For cluster sampling, the advantage is practicality (cheaper, faster, less travel) and the disadvantage is the risk that clusters aren't representative. Unit 3 concepts also echo through the rest of the course, since every inference procedure in Units 6-9 assumes the data came from a random sample or a randomized experiment in the first place.

How Cluster Sample connects across the course

Stratified Sample (Unit 3)

Cluster and stratified samples are mirror images of each other. A stratified sample takes some individuals from every group; a cluster sample takes every individual from some groups. Strata should be similar within and different between. Clusters should be different within and similar between.

Random Sampling / SRS (Unit 3)

An SRS gives every possible sample of a given size an equal chance of selection, but it can be expensive or impossible for spread-out populations. Cluster sampling keeps the randomness (clusters are chosen at random) while making data collection manageable, which is exactly the trade-off DAT-2.D.1 wants you to articulate.

Systematic Sample (Unit 3)

Systematic and cluster sampling are both shortcuts that avoid a full SRS. Systematic sampling picks every kth individual from a list, starting from a randomly chosen point, while cluster sampling picks whole groups. Exam questions love describing one and asking you to name it, so know that systematic = a pattern through a list, cluster = entire groups all at once.
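To see the two selection patterns side by side, here is a short Python sketch. The roster of 120 students, the six homerooms, and both helper functions are hypothetical, and the random starting point in `systematic_sample` follows the usual textbook description of the method.

```python
import random

def systematic_sample(roster, k, rng):
    """Pick a random start among the first k names, then take every kth name."""
    start = rng.randrange(k)
    return roster[start::k]

def cluster_sample(groups, n_groups, rng):
    """Pick whole groups at random and keep everyone in them."""
    chosen = rng.sample(groups, n_groups)
    return [name for group in chosen for name in group]

# Hypothetical roster of 120 students, also organized into 6 homerooms of 20.
roster = [f"S{i:03d}" for i in range(120)]
homerooms = [roster[i:i + 20] for i in range(0, 120, 20)]

rng = random.Random(7)
sys_sample = systematic_sample(roster, k=10, rng=rng)
clu_sample = cluster_sample(homerooms, n_groups=2, rng=rng)
print(len(sys_sample), len(clu_sample))  # 12 40
```

The systematic sample is spread evenly through the whole list, one name out of every ten. The cluster sample is two complete homerooms and nobody from the other four.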

10% Condition (Units 5-9)

How you sample in Unit 3 feeds directly into inference later. The standard error formulas behind confidence intervals and tests assume essentially independent observations, which is why, when you sample without replacement, the sample should be no more than 10% of the population. A poorly designed cluster sample can break those assumptions before you ever compute a statistic.
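If you want to see why 10% is a reasonable cutoff, the sketch below checks the condition and also computes the finite population correction, the factor by which the true standard error shrinks when you sample without replacement. The correction is not something the AP exam asks you to compute; it is included here only as background, and the function names and the population size of 10,000 are made up for the example.

```python
import math

def meets_ten_percent_condition(n, N):
    """True when the sample size n is at most 10% of the population size N."""
    return 10 * n <= N

def finite_population_correction(n, N):
    """Factor that shrinks the standard error when sampling n of N without replacement."""
    return math.sqrt((N - n) / (N - 1))

# Hypothetical population of 10,000 with three different sample sizes.
for n in (200, 1000, 5000):
    print(n,
          meets_ten_percent_condition(n, 10_000),
          round(finite_population_correction(n, 10_000), 3))
```

When the condition is met, the correction factor stays at roughly 0.95 or higher, so ignoring it and using the usual standard error formula costs very little. At a sample of half the population, the factor drops to about 0.71 and the usual formula would noticeably overstate the standard error.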

Is Cluster Sample on the AP Statistics exam?

Cluster sampling shows up most often in multiple-choice questions that describe a study and ask you to identify the method (LO 3.3.A) or evaluate whether it's appropriate (LO 3.3.B). Classic stems include a researcher who randomly selects 10 of a city's 50 neighborhoods and surveys every household, or a national health survey that randomly picks 50 of 500 hospitals and interviews all current patients. The follow-up question is usually conceptual. For example, what condition makes a cluster sample effective? The answer is that each cluster must be representative of the whole population. Watch for multistage designs too, like randomly selecting school districts, then randomly selecting schools within them. Those combine cluster selection with further random sampling, and you may need to recognize that it's not a pure cluster sample. On FRQs, Unit 3 sampling questions typically ask you to describe a method or explain an advantage or disadvantage in context, so practice writing one clean sentence like "cluster sampling is appropriate here because it reduces cost, and each neighborhood contains a mix of household types similar to the city overall."

Cluster Sample vs Stratified Sample

Both methods divide the population into groups first, which is why they get mixed up constantly. The difference is what happens next. In a stratified sample, you take a random sample from EVERY group (stratum), so all groups are represented. In a cluster sample, you randomly select only SOME groups and survey EVERYONE in them. The logic flips too. Strata work best when groups are homogeneous inside (all similar within a stratum), while clusters work best when each group is heterogeneous inside (a mini version of the population). Quick check on the exam: if every group contributes data, it's stratified; if whole groups are in or out, it's cluster.
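The quick check above can be sketched in a few lines of Python. The school, its eight homerooms, and both helper functions are invented for illustration; the point is only to show where the random selection happens in each method.

```python
import random

def stratified_sample(groups, per_group, rng):
    """SOME individuals from EVERY group: a random sample inside each stratum."""
    return {label: rng.sample(members, per_group) for label, members in groups.items()}

def cluster_sample(groups, n_groups, rng):
    """EVERY individual from SOME groups: a random choice of whole clusters."""
    chosen = rng.sample(sorted(groups), n_groups)
    return {label: list(groups[label]) for label in chosen}

# Hypothetical school: 8 homerooms of 25 students each.
school = {f"Room{r}": [f"R{r}-S{s}" for s in range(1, 26)] for r in range(1, 9)}
rng = random.Random(42)

strat = stratified_sample(school, per_group=5, rng=rng)
clust = cluster_sample(school, n_groups=2, rng=rng)

print(len(strat), sum(len(v) for v in strat.values()))  # 8 40
print(len(clust), sum(len(v) for v in clust.values()))  # 2 50
```

In the stratified result every homeroom appears, each with a handful of students. In the cluster result only two homerooms appear, each with its full class of 25.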

Key things to remember about Cluster Sample

  • A cluster sample randomly selects entire groups (clusters) and collects data from every individual inside the chosen clusters.

  • Cluster sampling works well only if each cluster is representative of the whole population, meaning the variety inside each cluster mirrors the variety in the population.

  • The main advantage of cluster sampling is practicality, since it cuts cost and travel time when the population is spread out and an SRS would be hard to run.

  • Cluster and stratified samples are opposites in structure. Stratified samples take some individuals from every group, while cluster samples take all individuals from some groups.

  • On the exam, you need to both name the method from a study description (LO 3.3.A) and justify why it is or isn't appropriate in context (LO 3.3.B).

Frequently asked questions about Cluster Sample

What is a cluster sample in AP Stats?

A cluster sample divides the population into groups called clusters, randomly selects some of those clusters, and surveys every individual in the selected clusters. Example: randomly choosing 10 of a city's 50 neighborhoods and surveying every household in those 10.

What's the difference between a cluster sample and a stratified sample?

Stratified sampling takes a random sample from every group, so all strata contribute data. Cluster sampling randomly picks only some groups and surveys everyone in them. Memory trick: stratified samples SOME from ALL groups, cluster samples ALL from SOME groups.

Is a cluster sample a random sample?

Yes, as long as the clusters themselves are chosen randomly. The randomization happens at the group level instead of the individual level, so it still counts as a probability sampling method, unlike a convenience sample.

When should clusters be similar or different from each other?

Each cluster should be internally diverse and similar to the other clusters, so any cluster works as a mini version of the population. That's the reverse of strata, which should be internally similar and different from each other.

Why would a researcher use a cluster sample instead of an SRS?

Cost and logistics. Surveying every household in 10 randomly chosen neighborhoods is far cheaper and faster than tracking down a simple random sample scattered across all 50 neighborhoods. The AP Course and Exam Description (CED), in DAT-2.D.1, frames this as the advantage you trade against the risk of unrepresentative clusters.