Probability proportional to size (PPS) sampling is a widely used tool in survey design. It gives larger units a higher chance of being picked, which can lead to more precise estimates when size is related to what you're measuring. This method is especially useful when you're dealing with units of very different sizes or importance.

PPS sampling is a key player in multistage sampling. It helps researchers pick primary sampling units (PSUs) in a way that balances representation and efficiency. This approach can save time and money while still giving reliable data.

PPS Sampling Basics

Understanding PPS and Measure of Size

Probability proportional to size (PPS) sampling selects units with probabilities proportional to their size or importance
Measure of size (MOS) quantifies the relative importance of each unit in the population
A MOS that correlates strongly with the variable of interest improves the precision of estimates
Common MOS includes population counts, land area, or sales volume
Sampling frame contains list of all units in the population with their corresponding MOS values

Unequal Probability and Size Bias

Unequal probability sampling assigns different selection probabilities to units based on their MOS
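Larger units have higher chances of selection, which ensures that the units contributing most to the population total are well represented
Size bias occurs when larger units are overrepresented in the sample and the estimates ignore that overrepresentation
PPS sampling deliberately over-selects larger units, then corrects for size bias at the estimation stage by weighting each unit by the inverse of its selection probability
This weighting restores the balance between small and large units in the final estimates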

PPS Sampling Methods

Cumulative Total and Systematic PPS

Cumulative total method creates a running sum of MOS values across all units
Random number generated between 0 and the total cumulative MOS
First unit whose cumulative total meets or exceeds the random number is selected
Systematic PPS sampling divides the cumulative total by the sample size n to get a fixed sampling interval
Random start point chosen within the first interval; adding the interval repeatedly determines the subsequent selections
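
The two selection schemes above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function names, the `mos` list, and the seeded `rng` are all hypothetical, and MOS values are assumed to be positive.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def cumulative_total_draw(mos, rng):
    """Draw one unit index with probability proportional to its MOS."""
    cum = list(accumulate(mos))          # running sum of MOS values
    r = rng.uniform(0, cum[-1])          # random number in [0, total MOS]
    # First unit whose cumulative total meets or exceeds r.
    return min(bisect_left(cum, r), len(cum) - 1)


def systematic_pps(mos, n, rng):
    """Select n unit indices by systematic PPS (assumes positive MOS)."""
    cum = list(accumulate(mos))
    interval = cum[-1] / n               # sampling interval
    start = rng.uniform(0, interval)     # random start in the first interval
    points = [start + k * interval for k in range(n)]
    # The min() guards against floating-point overshoot at the last point.
    return [min(bisect_left(cum, p), len(cum) - 1) for p in points]


rng = random.Random(42)                  # seeded only for reproducibility
mos = [10, 20, 30, 40]                   # hypothetical measures of size
print(cumulative_total_draw(mos, rng))
print(systematic_pps(mos, 2, rng))
```

Note that in the systematic version a unit whose MOS is larger than the sampling interval can be hit by more than one selection point; in practice such units are often set aside and included with certainty.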

With and Without Replacement PPS

With replacement PPS (PPSWR) allows units to be selected multiple times
PPSWR simplifies calculations and analysis
Without replacement PPS (PPSWOR) ensures each unit appears only once in the sample
PPSWOR increases efficiency by avoiding duplication
PPSWOR requires more complex sampling algorithms (such as Brewer's method or the Hanurav-Vijayan algorithm)
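
To see why PPSWR is the simpler case, here is a minimal sketch that uses the standard library's weighted sampling. The function name and the MOS values are hypothetical. Each draw is independent, so the same unit can appear more than once. A PPSWOR algorithm such as Brewer's method is deliberately not shown here, because it needs considerably more machinery.

```python
import random
from collections import Counter


def ppswr_sample(mos, n, rng):
    """Draw n unit indices with replacement, each draw proportional to MOS."""
    return rng.choices(range(len(mos)), weights=mos, k=n)


rng = random.Random(7)                   # seeded only for reproducibility
mos = [10, 20, 30, 40]                   # hypothetical measures of size
sample = ppswr_sample(mos, 3, rng)
print(sample)
print(Counter(sample))                   # counts above 1 are repeated selections
```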

PPS Sampling Properties

Inclusion Probabilities and Self-Weighting Samples

Inclusion probability represents the chance of a unit being selected in the sample
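For a fixed-size sample of n units, calculated as n times the ratio of the unit's MOS to the total MOS of the population, $\pi_i = n \cdot MOS_i / \sum_j MOS_j$ (the ratio alone is the selection probability on a single draw)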
Self-weighting samples have equal weights for all selected units
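PPS can create self-weighting samples in two-stage designs: selecting PSUs with PPS and then the same number of units within each selected PSU gives every unit the same overall selection probability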
Self-weighting simplifies analysis and reduces the need for complex weighting procedures
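
A short sketch can make the self-weighting property concrete. It assumes a two-stage design in which PSUs are selected with PPS (using PSU size as the MOS) and a fixed number of units is then drawn by simple random sampling inside each selected PSU. The function names and the PSU sizes are hypothetical.

```python
def inclusion_probabilities(mos, n):
    """First-stage inclusion probabilities: pi_i = n * MOS_i / total MOS."""
    total = sum(mos)
    probs = [n * m / total for m in mos]
    if any(p > 1 for p in probs):
        # Such units are usually taken with certainty and handled separately.
        raise ValueError("a unit's MOS is too large for this sample size")
    return probs


def two_stage_probabilities(psu_sizes, n_psus, m_per_psu):
    """Overall selection probability of a unit in each PSU."""
    first_stage = inclusion_probabilities(psu_sizes, n_psus)
    # Second stage: m_per_psu units by simple random sampling within the PSU.
    return [p * m_per_psu / size for p, size in zip(first_stage, psu_sizes)]


psu_sizes = [100, 200, 300, 400]         # hypothetical PSU population counts
print(inclusion_probabilities(psu_sizes, 2))
print(two_stage_probabilities(psu_sizes, 2, 10))
```

The PSU size cancels out in the second function, so every unit ends up with the same overall probability, n times m divided by the total population size. That constant probability is what makes the sample self-weighting.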

Efficiency Gains and Precision

PPS sampling often yields more precise estimates than simple random sampling
Efficiency gains result from incorporating auxiliary information through MOS
Reduces sampling variance by concentrating the sample on the units that contribute most to the population total
Particularly effective when MOS strongly correlates with the variable of interest
Can lead to smaller sample sizes for the same level of precision
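
The efficiency gain can be checked numerically on a toy population. The sketch below assumes with-replacement sampling and uses the textbook variance of the Hansen-Hurwitz estimator of a total, $\frac{1}{n}\sum_i p_i (y_i/p_i - Y)^2$. That estimator is not named in these notes, so treat it as an added assumption. The `y` and `mos` values are made up so that `y` is roughly proportional to `mos`.

```python
def with_replacement_variance(y, p, n):
    """Variance of the estimated total for n draws with per-draw probabilities p."""
    total = sum(y)
    return sum(pi * (yi / pi - total) ** 2 for yi, pi in zip(y, p)) / n


y = [12, 19, 33, 41]                     # hypothetical variable of interest
mos = [10, 20, 30, 40]                   # hypothetical measure of size
n = 2

p_pps = [m / sum(mos) for m in mos]      # draw probabilities proportional to MOS
p_srs = [1 / len(y)] * len(y)            # equal draw probabilities

v_pps = with_replacement_variance(y, p_pps, n)
v_srs = with_replacement_variance(y, p_srs, n)
print(v_pps, v_srs)
```

With these made-up numbers the PPS variance is far smaller than the equal-probability variance. If `y` had little relation to `mos`, the advantage would shrink or even reverse.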

PPS Estimators

Horvitz-Thompson Estimator and Applications

Horvitz-Thompson estimator provides unbiased estimates for population totals in PPS sampling, provided every unit has a positive inclusion probability
Calculated by summing the ratio of observed values to inclusion probabilities
Formula: $\hat{Y} = \sum_{i \in s} \frac{y_i}{\pi_i}$ where $\hat{Y}$ is the estimated total, $y_i$ is the observed value, and $\pi_i$ is the inclusion probability
Accounts for unequal selection probabilities in the estimation process
Widely used in complex surveys and multi-stage sampling designs
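
The formula above translates directly into code. This is a minimal sketch with a hypothetical function name and made-up sample values. It assumes the inclusion probabilities of the sampled units are already known.

```python
def horvitz_thompson_total(y_sample, pi_sample):
    """Estimate a population total as the sum of y_i / pi_i over the sample."""
    if len(y_sample) != len(pi_sample):
        raise ValueError("need one inclusion probability per observation")
    if any(p <= 0 or p > 1 for p in pi_sample):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(y_sample, pi_sample))


y_sample = [19, 41]                      # hypothetical observed values
pi_sample = [0.4, 0.8]                   # their inclusion probabilities
print(horvitz_thompson_total(y_sample, pi_sample))
```

Each observation is scaled up by $1/\pi_i$, so a unit that had only a 40% chance of selection stands in for 2.5 units' worth of the total.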

Variance Estimation and Confidence Intervals

Variance of Horvitz-Thompson estimator depends on joint inclusion probabilities
Exact variance calculation complex for PPSWOR designs
Approximation methods (linearization, replication) often used for variance estimation
Confidence intervals constructed using estimated variance and normal distribution assumptions
Bootstrap methods provide alternative approach for variance and confidence interval estimation in PPS sampling
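
One common shortcut, sketched below, sidesteps joint inclusion probabilities by treating the sample as if it had been drawn with replacement. This with-replacement approximation is an assumption added here, not a method described in these notes, and it is usually regarded as conservative for PPSWOR designs. The function name and sample values are hypothetical, and the default 1.96 multiplier assumes a normal approximation at the 95% level.

```python
import math


def pps_total_with_ci(y_sample, pi_sample, z_crit=1.96):
    """Estimated total, approximate variance, and a normal-theory interval."""
    n = len(y_sample)
    if n < 2:
        raise ValueError("need at least two sampled units")
    # Per-draw probability is pi_i / n, so each n * y_i / pi_i estimates the total.
    expanded = [n * y / p for y, p in zip(y_sample, pi_sample)]
    total_hat = sum(expanded) / n        # equals the Horvitz-Thompson estimate
    var_hat = sum((e - total_hat) ** 2 for e in expanded) / (n * (n - 1))
    half_width = z_crit * math.sqrt(var_hat)
    return total_hat, var_hat, (total_hat - half_width, total_hat + half_width)


total_hat, var_hat, ci = pps_total_with_ci([19, 41], [0.4, 0.8])
print(total_hat, var_hat, ci)
```

With only two sampled units the interval is purely illustrative. A real survey would use many more units, and possibly linearization, replication, or bootstrap methods instead.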