
1.3 Sampling and estimation

Written by the Fiveable Content Team • Last updated August 2025

Simple random sampling

Definition of simple random sampling

Simple random sampling (SRS) means selecting a subset of individuals from a population so that every individual has an equal probability of being chosen. You use random selection methods (random number generators, lottery systems) to ensure unbiased selection. This requires a complete list of all members of the population, known as a sampling frame.
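Under these conditions, SRS is straightforward to implement. The sketch below (with a hypothetical frame of numeric IDs) uses Python's `random.sample`, which selects without replacement so that every subset of size n is equally likely:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n from a sampling frame."""
    # random.sample selects without replacement, so every subset of
    # size n is equally likely -- the defining property of SRS.
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: population member IDs 1..1000
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample))  # 50 distinct members of the frame
```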

Advantages of simple random sampling

  • Eliminates selection bias by giving every member of the population an equal chance of inclusion
  • Allows you to use standard statistical methods to analyze results and estimate population parameters
  • Produces a representative sample, enabling generalizations about the larger population

Disadvantages of simple random sampling

  • Can be time-consuming and expensive, especially for large or geographically dispersed populations
  • Requires a complete and accurate sampling frame, which isn't always available
  • May not adequately represent small subgroups if the overall sample size is too small

Stratified sampling

Definition of stratified sampling

Stratified sampling divides the population into distinct subgroups called strata based on known characteristics (age, gender, income level). You then draw an independent random sample from each stratum. This guarantees that each subgroup appears in the final sample.

Advantages of stratified sampling

  • Guarantees representation of all important subgroups within the population
  • Increases precision by reducing sampling variability within each stratum
  • Allows direct comparison across subgroups

Disadvantages of stratified sampling

  • Requires prior knowledge of the population's characteristics to define appropriate strata
  • More complex and time-consuming than SRS, since you draw multiple separate samples
  • Offers little advantage over SRS if the strata resemble one another, so that most of the population's variability lies within strata rather than between them

Proportional vs disproportional allocation

  • Proportional allocation assigns sample sizes to each stratum in proportion to that stratum's share of the population. If women make up 60% of the population, 60% of your sample comes from the women stratum.
  • Disproportional allocation assigns larger samples to strata with greater internal variability, regardless of their population share.
    • Optimal allocation (Neyman allocation) is a specific form of disproportional allocation that minimizes the variance of the overall estimate for a fixed total sample size.
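Proportional allocation is easy to sketch in code. The function below (hypothetical stratum sizes; largest-remainder rounding keeps the allocations summing to the total) illustrates the idea:

```python
def proportional_allocation(strata_sizes, total_n):
    """Allocate total_n across strata in proportion to stratum size."""
    pop = sum(strata_sizes.values())
    raw = {s: total_n * size / pop for s, size in strata_sizes.items()}
    alloc = {s: int(r) for s, r in raw.items()}
    # Hand out leftover units to the strata with the largest remainders
    leftover = total_n - sum(alloc.values())
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical population: 60% women, 40% men; total sample of 200
alloc = proportional_allocation({"women": 600, "men": 400}, 200)
print(alloc)  # {'women': 120, 'men': 80}
```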

Cluster sampling

Definition of cluster sampling

Cluster sampling divides the population into naturally occurring groups called clusters (schools, city blocks, hospitals). You randomly select some clusters, then sample from within those clusters. This is especially useful when a complete list of individuals doesn't exist but clusters can be easily identified.

Advantages of cluster sampling

  • Reduces cost and time by focusing data collection on selected clusters rather than scattered individuals
  • Eliminates the need for a complete sampling frame of every individual
  • Allows study of naturally occurring groups and within-group relationships

Disadvantages of cluster sampling

  • Clusters may not be representative of the entire population, increasing sampling error
  • Requires a larger total sample size than SRS to achieve the same precision
  • The clustering effect (individuals within a cluster tend to be more similar to each other than to those in other clusters) reduces effective sample size

Single-stage vs multi-stage clustering

  • Single-stage: you randomly select clusters and include all members within the chosen clusters.
  • Multi-stage: you randomly select clusters, then randomly select individuals within each chosen cluster. This is more practical for large or geographically dispersed populations because you don't need to survey every person in a selected cluster.
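A minimal two-stage sketch, assuming hypothetical school clusters of students:

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sampling: randomly pick clusters, then
    randomly pick individuals within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)  # stage 1: pick clusters
    return {c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
            for c in chosen}                         # stage 2: pick individuals

# Hypothetical clusters: 10 schools with 30 students each
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(10)}
sample = two_stage_cluster_sample(schools, n_clusters=3, n_per_cluster=5, seed=1)
print(sum(len(v) for v in sample.values()))  # 15 students from 3 schools
```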

Systematic sampling

Definition of systematic sampling

Systematic sampling selects every k-th element from a population list, starting from a randomly chosen element between 1 and k. The sampling interval k is calculated by dividing the population size by the desired sample size. The list needs to be arranged in some order (alphabetical, chronological, etc.).
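The procedure is short enough to sketch directly (a hypothetical ordered roster of 100 people; integer division sets the interval):

```python
import random

def systematic_sample(population, n, seed=None):
    """Systematic sampling: compute interval k = N // n, pick a random
    start within the first interval, then take every k-th element."""
    k = len(population) // n                    # sampling interval
    start = random.Random(seed).randrange(k)    # random start in [0, k)
    return population[start::k][:n]

roster = list(range(1, 101))  # hypothetical ordered list of 100 people
sample = systematic_sample(roster, 10, seed=0)
print(len(sample))  # 10
```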

Advantages of systematic sampling

  • Simple to implement: only the first element is chosen randomly, and the rest follow a fixed interval
  • Ensures even coverage across the entire population list
  • Can be more efficient than SRS for large populations

Disadvantages of systematic sampling

  • Periodicity in the list can introduce bias. If the sampling interval happens to align with a repeating pattern in the data, your sample will be skewed.
  • Requires a complete, ordered list of the population
  • Estimating sampling error and constructing confidence intervals is more complex than with SRS

Sample size determination

Factors influencing sample size

  • Desired precision: smaller margins of error require larger samples
  • Population variability: more variable populations need larger samples
  • Confidence level: higher confidence (99% vs. 95%) requires larger samples
  • Available resources: budget, time, and personnel may constrain feasible sample size

Sample size for estimating means

The required sample size depends on the population standard deviation, desired margin of error, and confidence level:

n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2

where n is the sample size, z_{α/2} is the critical value for the desired confidence level, σ is the population standard deviation, and E is the margin of error.

For example, to estimate a population mean within E = 2 units at 95% confidence with σ = 10: n = (1.96 × 10 / 2)² = 96.04, so you'd need at least 97 observations.
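This calculation can be sketched with Python's standard library (`NormalDist` supplies the critical value, and `math.ceil` rounds up to the next whole observation):

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, E, conf=0.95):
    """Required n to estimate a mean within margin E at given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return math.ceil((z * sigma / E) ** 2)

print(n_for_mean(sigma=10, E=2))  # 97, matching the worked example
```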

Sample size for estimating proportions

n = \frac{z_{\alpha/2}^2 \cdot p(1-p)}{E^2}

where p is the estimated population proportion. If you have no prior estimate for p, use p = 0.5 because that maximizes p(1 − p) and gives the most conservative (largest) sample size.
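The same pattern works for proportions, with the conservative default p = 0.5 built in (the margin E = 0.03 below is an illustrative choice):

```python
import math
from statistics import NormalDist

def n_for_proportion(E, p=0.5, conf=0.95):
    """Required n to estimate a proportion within margin E.

    p = 0.5 is the conservative default: it maximizes p(1-p).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

print(n_for_proportion(E=0.03))  # conservative n for a ±3-point margin
```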

Sample size for comparing means

This depends on the difference in means you want to detect, the population standard deviations, desired statistical power, and significance level. You'll need to specify the null and alternative hypotheses and whether the test is one-tailed or two-tailed.

Sample size for comparing proportions

Similarly, this depends on the difference in proportions to detect, the population proportions, desired power, and significance level. Again, you specify hypotheses and test direction.

Sampling error and bias

Definition of sampling error

Sampling error is the difference between a sample statistic and the corresponding population parameter. It arises from the random variation inherent in drawing a sample rather than measuring the entire population. You can reduce it by increasing sample size or using more efficient sampling methods (like stratified sampling).

Definition of sampling bias

Sampling bias is a systematic error that occurs when the sample is not representative of the population. Unlike sampling error, bias causes estimates to consistently deviate from the true value in one direction. Increasing sample size does not fix bias, because the problem is in the sampling method itself.

Types of sampling bias

  • Selection bias: the sampling method favors certain members over others
    • Voluntary response bias: people who choose to participate often have stronger opinions than those who don't
    • Undercoverage bias: certain population members have a lower probability of being included (e.g., a phone survey that misses people without phones)
  • Non-response bias: people who respond to a survey differ systematically from those who don't
  • Measurement bias: the data collection process itself distorts responses
    • Leading question bias: question wording pushes respondents toward a particular answer
    • Social desirability bias: respondents answer in ways that make them look good rather than answering honestly

Reducing sampling error and bias

  • Use probability sampling methods (SRS, stratified sampling) to ensure representativeness
  • Increase sample size to reduce sampling error (but remember this won't fix bias)
  • Design questionnaires carefully to minimize measurement bias
  • Implement strategies to boost response rates (incentives, follow-up contacts) to reduce non-response bias
  • Compare your sample's key characteristics to known population values and apply weighting techniques to adjust for discrepancies

Point estimation

Definition of point estimation

Point estimation uses sample data to calculate a single value as the best guess for a population parameter. Common point estimates include the sample mean (for the population mean), sample proportion (for the population proportion), and sample variance (for the population variance). A point estimate is concise but tells you nothing about the uncertainty around it.

Properties of good estimators

  • Unbiasedness: the expected value of the estimator equals the true parameter. On average, across many samples, the estimator hits the right target.
  • Efficiency: among all unbiased estimators, the efficient one has the smallest variance. It gives you the tightest clustering around the true value.
  • Consistency: as sample size grows, the estimator converges in probability to the true parameter.
  • Sufficiency: the estimator captures all the information in the sample that's relevant to the parameter.

Maximum likelihood estimation

Maximum likelihood estimation (MLE) finds the parameter values that make the observed data most probable.

  1. Write down the likelihood function, which is the joint probability of observing your sample data given the parameter values.
  2. Take the natural log to get the log-likelihood function (this simplifies the math).
  3. Take the derivative of the log-likelihood with respect to the parameter, set it equal to zero, and solve.

MLE estimators are consistent and asymptotically efficient, which makes them widely used in practice.
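As a concrete illustration of the three steps, consider Bernoulli data: solving step 3 in closed form gives p̂ = (number of successes)/n, and a grid search over the log-likelihood (hypothetical data) confirms the peak lands in the same place:

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of Bernoulli data: sum of log P(x_i | p)."""
    return sum(math.log(p) if x else math.log(1 - p) for x in data)

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 7 successes in 10 trials

# Step 3 in closed form: d/dp log L = 0 gives p_hat = successes / n
p_hat = sum(data) / len(data)

# Numerical check: the log-likelihood peaks at p_hat on a fine grid
grid = [i / 1000 for i in range(1, 1000)]
p_best = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))
print(p_hat, p_best)  # 0.7 0.7
```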


Method of moments estimation

Method of moments (MoM) works by setting sample moments equal to their population counterparts and solving for the parameters.

  1. Express population moments (mean, variance, etc.) in terms of the unknown parameters.
  2. Replace population moments with the corresponding sample moments.
  3. Solve the resulting system of equations for the parameter estimates.

MoM is generally less efficient than MLE but can be easier to compute, especially when the likelihood function is hard to work with.
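A small sketch for an exponential distribution, where the first population moment is mean = 1/rate: setting it equal to the sample mean and solving gives rate_hat = 1/x̄ (data simulated here with a known rate):

```python
import random

# Method of moments for an Exponential(rate) distribution:
# population mean = 1/rate, so matching it to the sample mean
# and solving yields rate_hat = 1 / sample_mean.
rng = random.Random(0)
rate_true = 2.0
data = [rng.expovariate(rate_true) for _ in range(10_000)]

sample_mean = sum(data) / len(data)
rate_hat = 1 / sample_mean
print(round(rate_hat, 1))  # close to the true rate of 2.0
```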

Interval estimation

Definition of interval estimation

Interval estimation produces a range of plausible values for a population parameter rather than a single point estimate. These intervals, called confidence intervals, incorporate sample variability and a chosen confidence level to account for uncertainty.

Confidence intervals

A confidence interval is a range of values centered around the point estimate. It's constructed using three ingredients:

  • The point estimate (e.g., sample mean)
  • The standard error of the estimate
  • A critical value from the appropriate distribution (z-distribution or t-distribution)

The confidence level (typically 90%, 95%, or 99%) specifies how often the procedure produces intervals that contain the true parameter.

Interpreting confidence intervals

This is one of the most commonly misunderstood concepts in statistics. A 95% confidence interval does not mean there's a 95% probability the true parameter lies within that specific interval. The true parameter is fixed; it's either in the interval or it isn't.

The correct interpretation: if you repeated the sampling process many times and built a 95% CI each time, about 95% of those intervals would contain the true parameter.
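A quick simulation makes this concrete: draw many samples from a population with a known mean, build a 95% z-interval from each, and count how often the intervals cover the true mean (the population values below are arbitrary):

```python
import random
from statistics import NormalDist, mean

rng = random.Random(42)
mu, sigma, n = 50.0, 10.0, 100
z = NormalDist().inv_cdf(0.975)  # 95% critical value

covered = 0
trials = 2000
for _ in range(trials):
    xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
    half_width = z * sigma / n ** 0.5
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

print(covered / trials)  # close to 0.95
```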

Factors affecting confidence interval width

  • Sample size: larger samples produce narrower intervals
  • Population variability: more variable populations produce wider intervals
  • Confidence level: higher confidence (99% vs. 95%) produces wider intervals because you need a bigger net to be more certain
  • Sampling method: more efficient methods like stratified sampling can produce narrower intervals than SRS for the same sample size

Estimating population means

Sample mean as an estimator

The sample mean x̄ is an unbiased and consistent estimator of the population mean μ. It's calculated as:

\bar{x} = \frac{\sum_{i=1}^n x_i}{n}

Unbiasedness means E(x̄) = μ: if you could take every possible sample and average all the sample means, you'd get exactly μ.

Standard error of the mean

The standard error measures how much x̄ varies from sample to sample:

SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}

Notice the √n in the denominator: to cut the standard error in half, you need to quadruple your sample size. When σ is unknown (which is almost always the case), substitute the sample standard deviation s.

Confidence intervals for population means

For a 95% confidence interval:

\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}

where z_{α/2} = 1.96 for 95% confidence. If σ is unknown and you use s, replace the z-value with the appropriate t-value.

t-distribution for small samples

When the sample size is small (typically n < 30) and σ is unknown, you use the t-distribution instead of the standard normal. The t-distribution has heavier tails, which produces wider confidence intervals. This accounts for the extra uncertainty introduced by estimating σ with s from a small sample.

The degrees of freedom are df = n − 1. As n grows large, the t-distribution converges to the standard normal distribution.
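A small-sample t-interval can be sketched with only the standard library; the critical value t_{0.025, df=14} = 2.145 comes from a standard t table, and the measurements below are hypothetical:

```python
from statistics import mean, stdev

# Hypothetical measurements, n = 15 (small sample, sigma unknown)
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9,
        12.4, 12.1, 11.7, 12.0, 12.6, 11.8, 12.2]
n = len(data)
xbar, s = mean(data), stdev(data)  # stdev uses the n-1 denominator
t_crit = 2.145                     # t table value for df = n - 1 = 14
half_width = t_crit * s / n ** 0.5
print(f"{xbar:.2f} ± {half_width:.2f}")
```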

Estimating population proportions

Sample proportion as an estimator

The sample proportion p̂ is an unbiased and consistent estimator of the population proportion p:

\hat{p} = \frac{x}{n}

where x is the number of "successes" (individuals with the characteristic of interest) and n is the sample size. Its expected value is E(p̂) = p.

Standard error of the proportion

SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}

When p is unknown, substitute p̂ from your sample.

Confidence intervals for population proportions

For a 95% confidence interval:

\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

where z_{α/2} = 1.96 for 95% confidence.
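This interval (often called the Wald interval) can be computed directly; the poll numbers below are hypothetical:

```python
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Wald confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical poll: 520 of 1000 respondents say yes
lo, hi = proportion_ci(520, 1000)
print(f"({lo:.3f}, {hi:.3f})")  # (0.489, 0.551)
```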

Normal approximation for large samples

The sampling distribution of p̂ is technically binomial, but for large samples you can approximate it with a normal distribution. The standard rule of thumb is that both np̂ ≥ 5 and n(1 − p̂) ≥ 5 should hold.

This approximation comes from the Central Limit Theorem: as n increases, the distribution of p̂ approaches a normal distribution regardless of the population's shape. If these conditions aren't met, you should use exact binomial methods instead.