Inference for Quantitative Data
Have you ever been given a piece of information and thought, "🤔🤔 That just doesn't sound right!"? In this unit, we are going to tackle how to actually test such claims when dealing with quantitative data. We will see how to estimate the true mean of a population and how to test a given claim about that mean.
The first half of this unit is dedicated to constructing and interpreting confidence intervals. A confidence interval is a range of plausible values that we use to estimate a true population parameter; in this unit, that parameter is a population mean.
In order to construct a confidence interval, we need to make sure that three conditions are met. They are similar to the conditions in Unit 6.
The first essential step in constructing a confidence interval is to make sure that our sample statistic comes from a random sample. A sample mean from a random sample of our population is an unbiased estimator of the population mean, which is exactly what we want for a good estimate.
The next thing we need to check is that our observations are independent. As you recall from Unit 6, most of the time our samples are taken without replacement, so technically the observations are not independent. Instead, we check the 10% condition, which states that the population is at least 10 times the sample size. This is a necessary part of calculating a confidence interval because it allows us to use the standard deviation formula given on our formula sheet.
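As a rough sketch of this check, the Python snippet below tests the 10% condition and then computes the standard error of the sample mean, s/√n. The function names and the numbers in the usage note are illustrative assumptions, not part of the original unit.

```python
import math

def meets_ten_percent_condition(sample_size, population_size):
    """Return True when the population is at least 10 times the sample size."""
    return population_size >= 10 * sample_size

def standard_error_of_mean(sample_sd, sample_size):
    """Standard error of the sample mean: s divided by the square root of n."""
    return sample_sd / math.sqrt(sample_size)
```

For instance, with a hypothetical sample of 35 from a population of 1,000, `meets_ten_percent_condition(35, 1000)` is true, so `standard_error_of_mean` can reasonably be used.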
The normal condition is a tad different for quantitative data (means). Rather than using z* as our critical value, we shift to the family of t distributions, where the particular distribution we use depends on the sample size (through the degrees of freedom). We will discuss that more in the coming sections. For now, in order to use a t distribution, you need to be sure that one of these three things is true:
The population is normally distributed.
The sample size is at least 30. By the Central Limit Theorem, the sampling distribution of the sample mean is approximately normal when the sample is large, and n ≥ 30 is the usual rule of thumb. Some people refer to this theorem as the Fundamental Theorem of Statistics since so many of our calculations hinge on it.
If worse comes to worst and our sample size isn't large enough, we can plot the sample data on a boxplot or dotplot and show that there is no strong skewness and there are no outliers. That makes it reasonable to assume the population is approximately normal, and therefore that the sampling distribution of our sample mean is too.
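The three options above can be sketched as a small Python check. This is only an approximation of the idea: it screens for outliers with the common 1.5 × IQR rule (an assumption here, since the text only says to look at a plot), and it does not measure skewness, which still needs a look at the boxplot or dotplot. The function names are hypothetical.

```python
import statistics

def find_outliers(data):
    """Return values beyond 1.5 * IQR from the quartiles (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def normal_condition_plausible(data, population_is_normal=False):
    """Check the normal condition for a sample mean.

    Passes if the population is known to be normal, if n >= 30,
    or (for small samples) if the data show no outliers.
    Skewness is NOT checked here; inspect a plot for that.
    """
    if population_is_normal or len(data) >= 30:
        return True
    return not find_outliers(data)
```

Note that quartile conventions differ slightly between calculators and software, so a borderline value may be flagged by one tool and not another.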
The second half of this unit is dedicated to significance tests. This is when we are given a claim about the true population mean, but we also have a sample mean and standard deviation that lead us to doubt the claim.
The same conditions that we checked above for confidence intervals also need to hold in order to perform a significance test. We need to make sure our sample is random, the 10% condition is met and the sampling distribution for our sample mean is approximately normal.
In order to test a given claim, we calculate the probability of obtaining a sample mean at least as extreme as ours, assuming that the claim about the population is true. To do this, we build a sampling distribution from the claimed value and see where our sample mean falls. If our sample mean lies far out in one of the two tails, we have reason to doubt the claim, provided that our sample was randomly selected from this population and our normal condition is met.
For example, the local Co-Op says that they get in approximately 25 🐤 every Thursday. You have been checking their inventory for 35 days and have found the average number of 🐤 to be only 21. Is the Co-Op not being truthful, or is the difference simply due to sampling variability? We will return to this claim later in this unit and actually perform a test.
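To make the logic concrete, here is a minimal Python sketch of where a sample mean of 21 would fall if the claimed mean of 25 were true, with n = 35. The example does not give a sample standard deviation, so the value s = 8 used in the usage note is purely an assumption for illustration. The tail probability is approximated by numerically integrating the t density with Simpson's rule, which avoids any library beyond the standard one; the function names are hypothetical.

```python
import math

def t_statistic(sample_mean, claimed_mean, sample_sd, n):
    """Standardized distance between the sample mean and the claimed mean."""
    return (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def one_tail_probability(t, df, steps=2000):
    """Approximate P(T >= |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Assumed value: s = 8 is NOT given in the example above.
t = t_statistic(sample_mean=21, claimed_mean=25, sample_sd=8, n=35)
p_one_sided = one_tail_probability(t, df=35 - 1)
```

Under that assumed standard deviation, t comes out near -2.96 and the one-sided tail probability is well below 0.01, so a sample mean of 21 would sit far out in the lower tail. For a two-sided test the tail probability would be doubled, and a different value of s would change the conclusion accordingly.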
Two Sample Inference
Another essential part of inference with quantitative data involves constructing a confidence interval or running a significance test for the difference between two means, using the two sample means as our estimates. This comes in handy a lot in experimental design, where we test whether the means of two treatment groups differ or estimate the average difference between the two groups.
For instance, if we were comparing two treatments for poison ivy, we might randomly assign one group of patients one cream and the second group a different cream, then compare the average number of days it took for symptoms to subside. We could analyze the effectiveness using either a confidence interval or a significance test.
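As a sketch of the poison ivy comparison, the Python snippet below computes the difference in sample means, its standard error √(s₁²/n₁ + s₂²/n₂), and the resulting two-sample t statistic. The recovery times are made-up numbers used only for illustration, and the function name is hypothetical.

```python
import math
import statistics

def two_sample_summary(group_a, group_b):
    """Return (difference in means, standard error, t statistic) for two independent groups."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff, se, diff / se

# Hypothetical days until symptoms subside for each cream.
cream_a = [7, 9, 8, 10, 6]
cream_b = [5, 6, 7, 5, 7]
diff, se, t_stat = two_sample_summary(cream_a, cream_b)
```

The same difference and standard error feed both approaches: a confidence interval is the difference plus or minus a t* critical value times the standard error, while a significance test compares the t statistic to a t distribution. With groups this small, the normal condition would need to be checked for each group first.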