Sampling lets you learn about an entire population without surveying every single member. By choosing a sample carefully, you can make reliable estimates about the whole group. This section covers the main sampling methods, how sample statistics vary across repeated samples, and why sample size matters for the accuracy of your estimates.

Random Sampling for Population Estimates

A sampling frame is the complete list of all members in the population you're drawing from. The quality of your sample depends on having a good sampling frame and choosing an appropriate method. Here are the four main approaches:

Simple random sampling gives every member of the population an equal chance of being selected. Think of a lottery or a random number generator. It can be done with replacement (the selected item goes back into the pool before the next draw, so each draw is independent of the others) or without replacement (once selected, an item is out of the pool, so each draw slightly changes the odds for the members who remain).

Systematic sampling selects every $k$th element from a population list. For example, you might pick every 10th student from a class roster. You start by randomly choosing your first element from the initial $k$ elements (say, randomly picking a number between 1 and 10), then count forward by $k$ from there.

Stratified sampling divides the population into homogeneous subgroups (called strata) based on a characteristic like age, gender, or income level. You then apply simple random sampling within each subgroup. The big advantage here is that it guarantees every subgroup is represented in your final sample, which simple random sampling alone can't promise, since a small subgroup can be missed entirely by chance.

Cluster sampling divides the population into naturally occurring groups, such as city blocks, schools, or hospitals. Instead of sampling individuals, you randomly select entire clusters. For instance, you might randomly choose 5 city blocks out of 20, then survey all households within those chosen blocks. This is often more practical and cheaper than other methods, especially when the population is spread across a large area.
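The four methods above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the sampling frame of 100 students, the `grade` and `school` fields, and all function names are hypothetical and chosen only for this example.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 students, each with a grade level and a school.
grades = ("freshman", "sophomore", "junior", "senior")
frame = [{"id": i, "grade": grades[i % 4], "school": i // 10} for i in range(100)]

def simple_random_sample(frame, n, rng):
    """Draw n members without replacement; every member is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k members, then take every k-th."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(frame, key, n_per_stratum, rng):
    """Group members by `key`, then draw a simple random sample in each group."""
    strata = {}
    for member in frame:
        strata.setdefault(member[key], []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(frame, key, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    clusters = sorted({member[key] for member in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [member for member in frame if member[key] in chosen]

srs = simple_random_sample(frame, 10, rng)
sys_s = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, "grade", 3, rng)
clust = cluster_sample(frame, "school", 2, rng)

print(len(srs), len(sys_s), len(strat), len(clust))
```

Note how the stratified sample always contains three students from each grade, while the cluster sample contains every student from just two schools.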

Variability Across Multiple Samples

If you took 100 different samples of 50 people from the same population, you'd get a slightly different sample mean each time. The sampling distribution is the distribution of a statistic (like the mean) across all possible samples of a given size; those 100 sample means give you an approximate picture of it. It shows you how much variability to expect from sample to sample.

The Central Limit Theorem (CLT) is one of the most important ideas in statistics. It states that as sample size increases, the sampling distribution of the sample mean approaches a normal distribution (bell-shaped curve). This holds true regardless of the shape of the original population's distribution, provided the population has a finite variance and the sample size is sufficiently large. A common rule of thumb is $n \geq 30$, though heavily skewed populations may need larger samples.

Two key properties of the sampling distribution under the CLT:

The mean of the sampling distribution equals the population mean ($\mu$)
The standard deviation of the sampling distribution, called the standard error, equals $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation and $n$ is the sample size

The standard error tells you how much a sample statistic typically varies across repeated samples. A smaller standard error means more consistent results from sample to sample. This is different from sampling error, which is the difference between a single sample's statistic and the true population parameter.
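A short simulation can make these two properties concrete. The sketch below assumes a hypothetical right-skewed population (exponential with mean 10) and draws many samples of size 50 with replacement; the population, seed, and sample counts are illustrative choices, not values from any real dataset.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical right-skewed population: exponential with mean 10.
population = [rng.expovariate(1 / 10) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

n = 50               # size of each sample
num_samples = 2_000  # number of repeated samples

# Draw repeated samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5

print(f"population mean       = {mu:.3f}")
print(f"mean of sample means  = {mean_of_means:.3f}")
print(f"observed std. error   = {observed_se:.3f}")
print(f"sigma / sqrt(n)       = {theoretical_se:.3f}")
```

Even though the population is far from bell-shaped, the mean of the sample means should land close to $\mu$, and their spread should land close to $\frac{\sigma}{\sqrt{n}}$.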

Sample Size Effects on Accuracy

Sample size has a direct impact on how trustworthy your estimates are. Three related concepts capture this:

Accuracy refers to how close your sample statistic is to the true population parameter. Larger samples tend to produce statistics that land closer to the real value.
Precision refers to how consistent your results are across repeated samples. Larger samples decrease variability, giving you a narrower range of values each time.
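Margin of error quantifies the largest difference you'd expect between your sample statistic and the population parameter at a chosen confidence level. For a sample mean with a known population standard deviation $\sigma$, it's calculated as: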

$z \times \frac{\sigma}{\sqrt{n}}$

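where $z$ is the critical value (z-score) for your chosen confidence level, for example $z \approx 1.96$ for 95%. Because $\sqrt{n}$ is in the denominator, increasing sample size shrinks the margin of error, but with diminishing returns: you need four times the sample size to cut the margin of error in half.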
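As a quick worked example, the formula can be evaluated with Python's standard library. The helper name `margin_of_error` and the value $\sigma = 15$ are assumptions made for illustration; the critical value comes from the standard normal distribution.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Return z * sigma / sqrt(n) for a two-sided interval at `confidence`."""
    # Two-sided critical value: e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / n ** 0.5

# Hypothetical population standard deviation of 15.
for n in (25, 100, 400):
    print(f"n={n:>3}: margin of error = {margin_of_error(15, n):.2f}")
# n= 25: margin of error = 5.88
# n=100: margin of error = 2.94
# n=400: margin of error = 1.47
```

Each fourfold increase in $n$ halves the margin of error, which is why very small margins can require very large samples.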

The confidence level describes how often the interval-building procedure succeeds in the long run, rather than the probability that any one computed interval contains the true population parameter. A 95% confidence level means that if you repeated the sampling process many times, about 95% of the resulting intervals would capture the true value. Common confidence levels are 90%, 95%, and 99%. At any given confidence level, larger samples produce narrower confidence intervals, giving you more precise estimates.
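The repeated-sampling interpretation can be checked with a small simulation. This sketch assumes a hypothetical normal population with $\mu = 50$ and known $\sigma = 10$, builds a 95% interval from each of 2,000 samples of size 40, and counts how often the interval contains $\mu$. All of these numbers are illustrative.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)  # fixed seed for reproducibility

mu, sigma, n = 50.0, 10.0, 40    # hypothetical population parameters and sample size
z = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
margin = z * sigma / n ** 0.5    # margin of error with known sigma

trials = 2_000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    # Does this sample's interval capture the true mean?
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

coverage = hits / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95. Any single interval either contains $\mu$ or it doesn't; the 95% describes the procedure across many samples.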

Sampling Considerations

Even a well-chosen method can go wrong if you're not careful about a few things:

A representative sample accurately reflects the characteristics of the population it's drawn from. This is always the goal.
Bias is a systematic error in sampling that pushes results in a particular direction. For example, surveying only people at a mall would over-represent certain demographics.
Non-sampling error covers mistakes in the data collection process itself, like measurement errors, poorly worded survey questions, or data entry mistakes. These aren't caused by your sampling method but can still distort your results.
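The mall example can be sketched as a simulation to show what bias looks like in numbers. Everything here is hypothetical: a made-up population in which 30% of people are mall shoppers who spend more per month on average, compared with a simple random sample of the same size.

```python
import random
from statistics import fmean

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of (is_mall_shopper, monthly_spend) pairs.
population = []
for _ in range(10_000):
    shopper = rng.random() < 0.3
    spend = rng.gauss(120, 20) if shopper else rng.gauss(60, 20)
    population.append((shopper, spend))

true_mean = fmean(spend for _, spend in population)

# Simple random sample from the whole population.
random_sample = rng.sample(population, 500)
random_est = fmean(spend for _, spend in random_sample)

# Biased sample: only people found at the mall.
mall_goers = [person for person in population if person[0]]
mall_sample = rng.sample(mall_goers, 500)
mall_est = fmean(spend for _, spend in mall_sample)

print(f"true mean spend        = {true_mean:.1f}")
print(f"random-sample estimate = {random_est:.1f}")
print(f"mall-sample estimate   = {mall_est:.1f}")
```

Both samples have 500 people, but only the random sample lands near the true mean. The mall-only estimate is pushed upward, and drawing more mall shoppers would not correct it.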