6.0 Unit 6 Overview: Inference for Categorical Data: Proportions

4 min read • February 18, 2021

Josh Argo


https://cdn.pixabay.com/photo/2016/03/27/18/40/snow-1283525_960_720.jpg

Image Courtesy of pixabay.com

What is Inference?

Have you ever seen a statistic on Facebook or Twitter and had your doubts? Maybe you read something like this: "The proportion of goofy-footed snowboarders who contract cancer is higher than the proportion of regular-footed snowboarders who do."
Sounds pretty goofy, right? 🤪 The process that scientists and data analysts use to reach a conclusion like that is called statistical inference. In inference, a study is performed on a sample from a population, either to compare two groups or to compare one group against a claimed population value. Through calculations involving the normal distribution, we can use our sample statistics to estimate the true population parameter or to test a claim about the population made in an article or study. To estimate a population parameter, we use a confidence interval; to test a claim, we use a significance test.

Confidence Intervals

For this unit, we are going to be estimating population parameters involving categorical data. This means that our sample statistic will be a sample proportion and we will be using that to estimate, or test against, a population proportion.
The first procedure we are going to use is a confidence interval. A confidence interval is an interval of numbers, built around our sample proportion, that gives a range of plausible values for the true population proportion. A confidence interval depends on three things: the sample proportion, the sample size, and the confidence level (usually 95%).

Sample Proportion

The first ingredient of our confidence interval is our sample proportion. For the sample proportion to be a good estimate of the population proportion, it must come from a random sample. As mentioned before, there is no way to fix a lack of randomness in a sample.

Sample Size

Our sample size also matters when we calculate a confidence interval. It must be large enough that we can use a normal distribution to model the sampling distribution of our sample proportion. For the details of that condition, refer back to Sampling Distributions in Unit 5.
Also, as our sample size increases, the standard deviation of the sample proportion decreases, so a larger sample size results in a narrower confidence interval.
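To make the effect of sample size concrete, here is a minimal Python sketch. It assumes the usual AP Statistics "large counts" check (at least 10 expected successes and 10 expected failures) and the standard error formula sqrt(p̂(1 − p̂)/n). The function names and the sample proportion of 0.30 are made up for illustration.

```python
from math import sqrt

def large_counts_ok(p_hat: float, n: int, threshold: int = 10) -> bool:
    """Large counts check: at least `threshold` successes and failures.

    The threshold of 10 is an assumption based on the common AP convention.
    """
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportion of 0.30 at several sample sizes.
for n in (25, 100, 400):
    print(n, large_counts_ok(0.30, n), round(standard_error(0.30, n), 4))
```

Because n sits under a square root, quadrupling the sample size only cuts the standard error in half.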

Confidence Level

Our confidence level measures how confident we are that the method we used produces an interval containing the true population proportion. To interpret this percentage, imagine taking many samples of the same size from the same population and building a confidence interval from each one: the confidence level is the percentage of those intervals that would contain the true proportion.
For example, if we were to take 100 different samples from the same population and create 100 different 95% confidence intervals, we would expect about 95 of those 100 confidence intervals to contain the true proportion we are trying to estimate.
Our confidence level is also a key part of our confidence interval because it determines our z*, or critical value, based on the standard normal distribution. As our confidence level increases, so does our z*, which in turn increases the width of our confidence interval.
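The pieces above can be put together in a short Python sketch. It assumes the standard one-proportion z interval, p̂ ± z* · sqrt(p̂(1 − p̂)/n), and finds z* with the standard library's NormalDist. The helper names, the counts (120 successes out of 400), and the simulation settings are hypothetical choices for illustration, not values from this guide.

```python
import random
from math import sqrt
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """z* that captures the central `confidence` area of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_interval(successes: int, n: int, confidence: float = 0.95):
    """One-proportion z interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = critical_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(true_p: float, n: int, reps: int, confidence: float = 0.95, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n, confidence)
        hits += low <= true_p <= high
    return hits / reps

# Hypothetical sample: 120 successes out of 400.
low, high = proportion_interval(120, 400)
print(round(low, 4), round(high, 4))

# A higher confidence level gives a larger z* and a wider interval.
for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))

# Simulated share of 95% intervals that capture a true proportion of 0.30.
print(coverage(true_p=0.30, n=200, reps=1000))
```

The simulated coverage should land close to, but not exactly at, 95%, since any finite number of repetitions is only an approximation of the long-run behavior.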

Significance Tests

When we are given a claimed value for a population parameter and have some reason to doubt it, we can perform a significance test to check whether our data are consistent with that value. In a significance test, we assume the claimed population proportion is correct and use the sampling distribution for our sample size to find the probability of obtaining a sample proportion at least as extreme as the one we collected. If that probability is low given those two factors (the claimed population proportion and our sample size), we have reason to reject the claim, or at least to investigate it further.
As with confidence intervals and sampling distributions, our significance test hinges on meeting the three conditions for inference: randomness, independence, and normality. Otherwise, our sample may not be representative of the population, our standard deviation may not be accurate, or our sampling distribution may not be approximately normal, so we cannot accurately calculate the probability of obtaining our sample result.
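As a rough illustration of that probability calculation, here is a minimal Python sketch of a one-proportion z test. It assumes the conditions for inference are already met and uses the claimed proportion p0 in the standard deviation, as the standard test does. The function name, the `alternative` argument, and the counts (60 successes out of 100 against a claim of 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float, alternative: str = "two-sided"):
    """Return (z, p_value) for a one-proportion z test of the claim p = p0."""
    p_hat = successes / n
    # The standard deviation uses the claimed p0, since we assume the claim is true.
    sd = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        # Two-sided: results at least this extreme in either direction.
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing the claim p = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

A small p-value means a sample like ours would be unusual if the claim were true, which is the evidence we use to reject the claim.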

Inference with Two Proportions

Just as we mentioned in Unit 5, we may also have to create confidence intervals or perform significance tests with two proportions. This is common in experiments, where we compare two groups to judge the effectiveness of a treatment.
As mentioned in Unit 5, our conditions for inference must be met for both samples, and we subtract the two centers to find the center of the sampling distribution of the difference between two proportions. The standard deviation for this sampling distribution can be found on the reference page provided for AP testing.
For example, a researcher testing the effectiveness of a new medicine would randomly assign participants to a placebo group or a treatment group. We would start by assuming that there is no difference between the two groups and then compare the sample proportions of participants who recovered more quickly. If the difference is statistically significant, we have evidence that the treatment is effective.
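The medicine example can be sketched in Python as a two-proportion z test. This is an illustration under stated assumptions: it uses the pooled proportion in the standard deviation (the usual choice when we assume no difference between groups), and the recovery counts, group sizes, and function name are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p_value) for the claim that p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pool the groups, since we assume both share one true proportion.
    pooled = (x1 + x2) / (n1 + n2)
    sd = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: 60 of 100 recover quickly with the treatment,
# 45 of 100 recover quickly with the placebo.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```

If the p-value falls below the chosen significance level (often 0.05), the observed difference would be unlikely under the assumption of no difference between the groups.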
© 2021 Fiveable, Inc.