5.6 Sampling Distributions for Differences in Sample Proportions

4 min read, January 2, 2023

Jed Quiaoit

Brianna Bukowski

Josh Argo

Differences (Non-Distribution) Recap

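To find the standard deviation of a difference in proportions or means, remember that variances always add to find the new variance (assuming the two variables are independent), even though you are subtracting. If you need the standard deviation, take the square root of that variance. The mean of a difference, however, is found by simply subtracting the two means. ➖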
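A minimal Python sketch of this recap, assuming two independent random variables X and Y. The function name `difference_summary` and the example numbers are illustrative only, not taken from the original guide.

```python
import math


def difference_summary(mean_x, sd_x, mean_y, sd_y):
    """Return (mean, SD) of X - Y, assuming X and Y are independent."""
    mean_diff = mean_x - mean_y            # means simply subtract
    var_diff = sd_x**2 + sd_y**2           # variances ADD, even for a difference
    return mean_diff, math.sqrt(var_diff)  # square root turns the variance back into an SD


# Hypothetical example: X has mean 10 and SD 3; Y has mean 4 and SD 4
print(difference_summary(10, 3, 4, 4))  # (6, 5.0)
```

Note that the standard deviation of the difference (5.0) is larger than either individual standard deviation: subtracting does not cancel out variability.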

Proportion Differences

To find the standard deviation of the difference in sample proportions, divide each population's variance, p(1 - p), by its sample size, add the two results, and then take the square root to find the overall standard deviation. The simplified formula can be seen below. If you are only given the standard deviations for both samples, square both standard deviations, add them, and then take the square root. This is sometimes called the “Pythagorean Theorem of Statistics.” 📐
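Both routes described above can be sketched in Python. The function names and the values (p1 = 0.5 with n1 = 100, p2 = 0.4 with n2 = 150) are hypothetical and chosen only for illustration; `math.hypot(a, b)` is the standard-library function that computes √(a² + b²), which is exactly the "square, add, square root" step.

```python
import math


def sd_diff_proportions(p1, n1, p2, n2):
    """SD of the difference in sample proportions, from p1, n1, p2, n2."""
    var1 = p1 * (1 - p1) / n1  # variance of sample proportion 1
    var2 = p2 * (1 - p2) / n2  # variance of sample proportion 2
    return math.sqrt(var1 + var2)  # variances add, then take the square root


def sd_diff_from_sds(sd1, sd2):
    """'Pythagorean Theorem of Statistics': square both SDs, add, square root."""
    return math.hypot(sd1, sd2)


# Hypothetical values: p1 = 0.5, n1 = 100 and p2 = 0.4, n2 = 150
print(sd_diff_proportions(0.5, 100, 0.4, 150))
# Same result from the two individual SDs, sqrt(0.0025) = 0.05 and sqrt(0.0016) = 0.04
print(sd_diff_from_sds(0.05, 0.04))
```

Both calls print roughly 0.064, which shows the two descriptions are the same calculation.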

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F22-FZm9oNUSINV6.JPG?alt=media&token=51dfeda2-67f4-446b-875d-57599007955c

Source: NEW AP Statistics Formula Sheet

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-YXYdqZFP6kmT.JPG?alt=media&token=2451ada9-44e0-4daf-9540-f22d2e579130

For any inference about proportions, you must check the Large Counts condition to confirm approximate normality. The Central Limit Theorem check applies only to quantitative data (means).

For a categorical variable, when randomly sampling with replacement from two independent populations with population proportions p1 and p2, the sampling distribution of the difference in sample proportions, p̂1 - p̂2, has mean μ = p1 - p2 and standard deviation σ = √(p1(1 - p1)/n1 + p2(1 - p2)/n2).
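To see where the mean p1 - p2 and the standard deviation √(p1(1 - p1)/n1 + p2(1 - p2)/n2) come from, here is a small simulation sketch. It assumes each observation is an independent success/failure draw, which is what sampling with replacement provides. The function name `simulate_differences`, the seed, the number of repetitions, and the population values are hypothetical choices for illustration.

```python
import math
import random
import statistics


def simulate_differences(p1, n1, p2, n2, reps=5000, seed=42):
    """Simulate `reps` values of the difference in sample proportions."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Each draw is a success with probability p (sampling with replacement)
        successes1 = sum(rng.random() < p1 for _ in range(n1))
        successes2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(successes1 / n1 - successes2 / n2)
    return diffs


# Hypothetical populations: p1 = 0.5 with n1 = 100, p2 = 0.4 with n2 = 150
diffs = simulate_differences(0.5, 100, 0.4, 150)
theory_sd = math.sqrt(0.5 * 0.5 / 100 + 0.4 * 0.6 / 150)  # about 0.064

print(statistics.mean(diffs), "vs. theory", 0.5 - 0.4)
print(statistics.stdev(diffs), "vs. theory", theory_sd)
```

The simulated mean and standard deviation should land close to the theoretical values; they will not match exactly because the simulation uses a finite number of repetitions.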

Additionally, the sampling distribution of the difference in sample proportions p̂1 - p̂2 will be approximately normal provided the sample sizes are large enough:

  • n1p1 ≥ 10

  • n1(1 - p1) ≥ 10

  • n2p2 ≥ 10

  • n2(1 - p2) ≥ 10
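The four Large Counts checks can be sketched as a single function. This is a minimal illustration using the AP convention of "at least 10"; the function name `large_counts_ok` and its `threshold` parameter are hypothetical.

```python
def large_counts_ok(p1, n1, p2, n2, threshold=10):
    """Return True if all four Large Counts are at least `threshold`."""
    counts = [
        n1 * p1,        # expected successes in sample 1
        n1 * (1 - p1),  # expected failures in sample 1
        n2 * p2,        # expected successes in sample 2
        n2 * (1 - p2),  # expected failures in sample 2
    ]
    return all(count >= threshold for count in counts)


print(large_counts_ok(0.6, 1000, 0.7, 1000))  # True: counts are 600, 400, 700, 300
print(large_counts_ok(0.02, 100, 0.5, 100))   # False: n1 * p1 is only 2
```

A single failing count is enough to make the normal approximation questionable, which is why the function uses `all`.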

Here is a review of types of distributions: (Be sure to save this somewhere!) ⭐

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-pD1CvWalmslo.JPG?alt=media&token=f00ef251-12d0-4028-aa25-8cc994dfd8f1

Source: The AP Statistics CED

Practice Problem

Suppose that you are conducting a survey to compare the proportions of people in two different cities who support a new public transportation system. You decide to use simple random samples of 1000 people from each city, and you ask each person whether or not they support the new system. After collecting the data, you find that 600 of the 1000 respondents from City A support the system, and 700 of the 1000 respondents from City B support the system. 🚂

a) Calculate the sample proportions of respondents who support the new system in each city.

b) Explain what the sampling distribution for the difference in sample proportions represents and why it is useful in this situation.

c) Suppose that the true population proportion of people in City A who support the new system is actually 0.6, and the true population proportion of people in City B who support the new system is actually 0.7. Describe the shape, center, and spread of the sampling distribution of the difference in sample proportions, p̂B - p̂A, in this case.

d) Explain why the Central Limit Theorem applies to the sampling distribution in this situation.

e) Discuss one potential source of bias that could affect the results of this study, and explain how it could influence the estimate. (Hint: consider how the bias might play out differently when working with two samples rather than one.)

Answer

a) The sample proportion of respondents who support the new system in City A is 600/1000 = 0.6, and the sample proportion of respondents who support the new system in City B is 700/1000 = 0.7.

b) The sampling distribution for the difference in sample proportions represents the distribution of possible values for the difference between the two sample proportions if the study were repeated many times. It is useful in this situation because it allows us to make inferences about the difference between the population proportions in the two cities based on the sample data.

c) If the true population proportion of people in City A who support the new system is 0.6, and the true population proportion in City B is 0.7, the sampling distribution of p̂B - p̂A would be approximately normal, with a center of 0.7 - 0.6 = 0.1 and a standard deviation of √(0.6(0.4)/1000 + 0.7(0.3)/1000) ≈ 0.021. The spread depends on the sample sizes and the variability of the populations.

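d) The Central Limit Theorem applies to the sampling distribution in this situation because the sample sizes (n1 = 1000 and n2 = 1000) are large enough for the distribution to be approximately normal, even though the populations themselves are not normally distributed (each response is simply "support" or "do not support"). The Large Counts condition confirms this: 1000(0.6) = 600, 1000(0.4) = 400, 1000(0.7) = 700, and 1000(0.3) = 300 are all at least 10.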

e) One potential source of bias in this study could be nonresponse bias, which occurs when certain groups of individuals are more or less likely to respond to the survey. For example, if people in City A who support the new system are more likely to respond to the survey, the sample from City A could be biased toward higher levels of support and produce an overestimate of the population proportion.

On the other hand, if people in City B who do not support the new system are more likely to respond, the sample from City B could be biased toward lower levels of support and produce an underestimate of the population proportion. Because these two biases push the estimates in opposite directions, they would compound rather than cancel in the difference p̂B - p̂A, leading to an incorrect estimate of the difference in support between the two cities.
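The effect described in part (e) can be made concrete with a simplified model, assuming that whether a person responds depends only on whether they support the system. The function name `respondent_support` and all response rates below are hypothetical, invented purely for illustration; they are not data from the study.

```python
def respondent_support(p, support_rate, oppose_rate):
    """Expected share of supporters among respondents (simplified nonresponse model)."""
    responding_supporters = p * support_rate      # supporters who answer
    responding_opposers = (1 - p) * oppose_rate   # non-supporters who answer
    return responding_supporters / (responding_supporters + responding_opposers)


# Hypothetical response rates (not from the study)
est_a = respondent_support(0.6, support_rate=0.8, oppose_rate=0.5)  # City A supporters over-respond
est_b = respondent_support(0.7, support_rate=0.5, oppose_rate=0.8)  # City B non-supporters over-respond
print(round(est_a, 3), round(est_b, 3), round(est_b - est_a, 3))  # 0.706 0.593 -0.113
```

Under these made-up rates, the true difference of +0.1 would appear as roughly -0.11, so the biased samples could even reverse the direction of the comparison. With equal response rates for both groups, the function returns p unchanged, so no bias arises.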

Key Terms to Review (11)

Bias: Bias is a systematic deviation from the true value that affects the results of a statistical analysis. It can arise during data collection, sampling, or analysis and leads to inaccurate or misleading conclusions.

Categorical Variable: A categorical variable records characteristics or qualities rather than numerical values. Its values are categories or groups into which data can be classified.

Central Limit Theorem: The Central Limit Theorem states that as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution.

Nonresponse Bias: Nonresponse bias is the distortion of results that occurs when individuals selected for a survey or study do not participate or fail to respond. It leads to biased conclusions when those who do not respond differ systematically from those who do.

Normal Distribution: A normal distribution is a symmetric, bell-shaped probability distribution characterized by its mean and standard deviation. It is also called the Gaussian distribution.

Population Proportions: A population proportion is the proportion (or percentage) of an entire population that has a specific characteristic or attribute.

Sample Proportion: The sample proportion is the number of successes in a sample divided by the total number of observations in that sample.

Sampling Distribution: A sampling distribution is the distribution of a statistic (such as a mean, proportion, or difference) across all possible random samples of the same size from the same population. It shows how the statistic varies from sample to sample.

Sampling Distribution for the Difference in Sample Proportions: The sampling distribution for the difference in sample proportions is the distribution of the difference between two sample proportions, p̂1 - p̂2, across repeated random samples. It helps us judge how likely a particular observed difference between two groups is when samples are randomly selected from the populations.

Simple Random Samples: A simple random sample is a subset selected from a larger population in such a way that every possible group of n individuals has an equal chance of being chosen. Simple random samples are commonly used in statistical studies to make generalizations about populations.

Standard Deviation: The standard deviation measures the typical amount of variation or dispersion in a set of data, that is, roughly how far values tend to fall from the mean.

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

