Unit 7 Overview: Means

6 min read • January 4, 2023

Jed Quiaoit

"In this unit, students will analyze quantitative data to make inferences about population means. Students should understand that t* and the t-distribution are used for inference with means when the population standard deviation, σ, is not known. Using s for σ in the formula for z gives a slightly different value, t, whose distribution, which depends on sample size, has more area in the tails than a normal distribution. The boundaries for rejecting a null hypothesis using a t-distribution tend to be further from the mean than for a normal distribution. Students should understand how and why conditions for inference with proportions and means are similar and different." -- College Board

Inference for Quantitative Data

Have you ever been given a piece of information and said, "Wait, that just doesn't sound right!" 🤔

In this unit, we are going to tackle how we can actually test these claims when dealing with quantitative data. We are going to see how we can estimate the true mean of a population, or test a given claim about a population.

Similar to the previous unit, there are several ways to test claims about a population when dealing with quantitative data. One common method is hypothesis testing, which involves stating a null hypothesis and an alternative hypothesis, and then using statistical analysis to determine which hypothesis is better supported by the data.

Another method is to use confidence intervals, which provide a range of values within which the true population mean is likely to fall. A confidence interval can be calculated from a sample mean and a measure of the sample's dispersion, such as the standard deviation. The larger the sample size and the smaller the standard deviation, the narrower the confidence interval will be.
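To see why sample size and spread drive the width of the interval, here is a minimal sketch of the margin-of-error calculation, t* · s/√n, with made-up numbers. The critical value t_star = 2.0 is just a round placeholder for comparison, not a value from a t-table:

```python
import math

def margin_of_error(s, n, t_star):
    """Margin of error for a one-sample t-interval: t* * s / sqrt(n)."""
    return t_star * s / math.sqrt(n)

# Hypothetical inputs; t_star = 2.0 is a placeholder critical value.
wide = margin_of_error(s=8.0, n=25, t_star=2.0)     # larger spread, smaller sample
narrow = margin_of_error(s=4.0, n=100, t_star=2.0)  # smaller spread, larger sample
print(wide, narrow)  # 3.2 0.8 -- the second interval is far narrower
```

Quadrupling the sample size halves the margin of error, since n sits under a square root.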


Confidence Intervals

The first half of this unit is dedicated to constructing and interpreting confidence intervals. A confidence interval is a range of numbers with which we can estimate, or predict, a true population mean or proportion.

In order to construct a confidence interval, we need to make sure that three conditions are met, which are similar to the conditions in Unit 6.

Random

The first thing that is essential to constructing a confidence interval is to make sure that our sample statistic comes from a random sample. If our sample statistic is obtained from a random sample of our population, it is an unbiased estimator, which is exactly what we want in order to get a good estimate.

Independence

The next thing we need to check is that our sample is taken independently. As you recall from Unit 6, most of the time our samples are taken without replacement, so they technically are not independent. Therefore, we check the 10% condition, which states that the population is at least 10 times the sample size. This is a necessary piece of calculating a confidence interval because it allows us to use the standard deviation formula given on our formula sheet.

Normal

The normal condition is a tad different for quantitative data (means). Rather than using z* as our critical value, we will shift to using the family of t-distributions, which depends on the sample size. We will discuss that more in the coming sections. For now, in order to use the t-distribution, you need to be sure that one of three things is true:

  • The population is normally distributed.

  • The sample size is at least 30. This is known as the central limit theorem. Some people refer to this as the Fundamental Theorem of Statistics since so many of our calculations hinge on this fact being true.

  • If worst comes to worst and our sample size isn't large enough, we can also check that the sampling distribution for our sample mean is approximately normal by plotting our sample data on a box plot or dot plot and showing that it has no outliers or strong skewness.
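The conditions above can be summarized as a quick checklist. This is a sketch with our own function and parameter names (nothing from the AP formula sheet), and it assumes the Random condition was already satisfied when the sample was collected:

```python
def conditions_met(n, population_size, population_normal=False, plot_is_clean=False):
    """Check the independence and normal conditions for inference about a mean.

    Assumes the sample itself was collected randomly (the Random condition).
    """
    ten_percent = population_size >= 10 * n                 # 10% condition
    normal = population_normal or n >= 30 or plot_is_clean  # normal pop., CLT, or graph check
    return ten_percent and normal

# A sample of 35 days from a large population passes both checks:
print(conditions_met(n=35, population_size=1000))  # True
```

Swap in `plot_is_clean=True` for a small sample whose dot plot shows no outliers or skewness.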

Significance Tests

The second half of this unit is dedicated to significance tests. This is when we have a claim from an author regarding the true population mean, but we also have a sample mean and standard deviation that lead us to doubt this claim.

Conditions

The same conditions that we checked above for confidence intervals also need to hold in order to perform a significance test. We need to make sure our sample is random, the 10% condition is met, and the sampling distribution for our sample mean is approximately normal.

In order to test a given claim, we calculate the probability of obtaining our sample mean, assuming the author's population claim is true. To do this, we create a sampling distribution based on the information given by the author and see where our sample mean would fall. If our sample mean lies far in one of the two tails, that gives us reason to doubt the claim, since our sample was randomly selected from this population and the normal condition is met.

Example

For example, the local Co-Op says that it gets in approximately 25 eggs every Thursday. You have been checking its inventory for 35 days and have found the average number of eggs to be only 21. Is the Co-Op being untruthful, or are your different findings simply due to sampling variability? We will return to this claim later in this unit and actually perform a test.
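As a preview, here is how the t statistic for that claim would be computed. The standard deviation s = 6 is a made-up value, since the example does not give one:

```python
import math

claimed_mu = 25   # the Co-Op's claimed mean
x_bar = 21        # observed sample mean over n = 35 days
n = 35
s = 6.0           # hypothetical sample standard deviation (not given above)

# One-sample t statistic: (x-bar - mu0) / (s / sqrt(n))
t = (x_bar - claimed_mu) / (s / math.sqrt(n))
print(round(t, 2))  # -3.94, far into the left tail for df = 34
```

A t statistic this extreme would make the claimed mean of 25 hard to believe, assuming the conditions for inference hold.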

Two-Sample Inference

Another essential part of inference with means involves constructing a confidence interval or running a significance test for the difference in two sample means. This comes in handy a lot in experimental design, as we test the difference in means between two treatment groups, or try to find the average difference between the two groups.

For instance, if we were comparing two treatments for poison ivy, we might randomly assign one group of patients one cream and the second group a different cream, then compare the average number of days it took for symptoms to subside. We could analyze the effectiveness using either a confidence interval or a significance test.
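The test statistic for comparing the two creams would use the unpooled two-sample formula, (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂). All of the numbers below are hypothetical, invented just to show the mechanics:

```python
import math

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Unpooled two-sample t statistic for a difference in means."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    return (x_bar1 - x_bar2) / se

# Hypothetical poison-ivy data: mean days until symptoms subside.
t = two_sample_t(x_bar1=7.0, s1=2.0, n1=16, x_bar2=9.0, s2=2.0, n2=16)
print(round(t, 3))  # -2.828
```

A negative t here would suggest the first cream clears symptoms faster, pending a p-value check.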

Other questions that this unit will tackle include:

  • How do we know whether to use a t-test or a z-test for inference with means?

The t-test and the z-test are both used for inference with means, but they are used in different situations.

The t-test is used when the population standard deviation is unknown, which matters most when the sample size is small (usually n < 30). Because the t-test uses s in place of σ, the t-distribution's heavier tails account for the extra variability introduced by estimating the standard deviation from the sample.

The z-test is used when the population standard deviation is known, typically with a large sample size (usually n ≥ 30) or a sample drawn from a normally distributed population. Since the z-test assumes σ is known, it does not need to account for the variability of s the way the t-test does.
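The practical consequence of those heavier tails is that t* critical values are larger than z* at the same confidence level, so t-based boundaries sit further from the mean. The z* below comes from the Python standard library; the t* for 10 degrees of freedom is a standard t-table value:

```python
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)  # 95% z critical value, about 1.96
t_star_df10 = 2.228                   # 95% t critical value with df = 10 (t-table)

# The t boundary sits further from the mean than the z boundary:
print(round(z_star, 3), t_star_df10 > z_star)  # 1.96 True
```

As the degrees of freedom grow, t* shrinks toward z*, which is why the distinction matters most for small samples.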

  • How can we make sure that samples are independent?

  • Why is it inappropriate to accept a hypothesis as true based on the results of statistical inference testing?

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-qCM719vz5lsY.png?alt=media&token=9b2e35a1-febc-4588-a096-a563f90ee809

Source: The Pirate's Guide to R

🎥 Watch: AP Stats - Unit 7

Key Terms to Review (24)

10% Condition

: The 10% condition states that for sampling without replacement to be considered valid, the sample size must be less than 10% of the population size.

Box Plot

: A box plot (also known as a box-and-whisker plot) is a graphical representation that displays key features of numerical data including minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum values. It helps visualize data spread and identify outliers.

Central Limit Theorem

: The Central Limit Theorem states that as the sample size increases, the sampling distribution of the mean approaches a normal distribution regardless of the shape of the population distribution.

Confidence Interval

: A confidence interval is a range of values that is likely to contain the true value of a population parameter. It provides an estimate along with a level of confidence about how accurate the estimate is.

Dot Plot

: A dot plot is a simple graphical display that uses dots to represent individual data points along an axis. It shows the distribution of numerical data and helps identify patterns, clusters, or gaps in the data.

Experimental Design

: Experimental design refers to the process of planning and conducting an experiment to investigate cause-and-effect relationships between variables. It involves defining treatments, assigning participants to different groups, and controlling for confounding factors to ensure valid results.

Independence

: Independence refers to events or variables that do not influence each other. If two events are independent, knowing one event occurred does not affect our knowledge about whether or not the other event will occur.

Normal Distribution

: A normal distribution is a symmetric, bell-shaped probability distribution characterized by its mean and standard deviation. It is also known as the Gaussian distribution.

Null Hypothesis

: The null hypothesis is a statement of no effect or relationship between variables in a statistical analysis. It assumes that any observed differences or associations are due to random chance.

Outliers

: Outliers are extreme values that significantly differ from other values in a dataset. They can greatly affect statistical analyses and should be carefully examined.

Quantitative Data

: Quantitative data refers to numerical information that can be measured or counted. It involves quantities and can be analyzed using mathematical methods.

Random Sample

: A random sample is a subset of individuals selected from a larger population in such a way that every individual has an equal chance of being chosen. It helps to ensure that the sample is representative of the population.

Sample Variability

: Sample variability refers to the differences or variations that exist among different samples taken from the same population. It measures how spread out or diverse the values in a sample are.

Sampling Distribution

: A sampling distribution refers to the distribution of a statistic (such as mean, proportion, or difference) calculated from multiple random samples taken from the same population. It provides information about how sample statistics vary from sample to sample.

Significance Tests

: Significance tests help determine whether an observed effect or difference between groups is statistically significant or simply due to chance variation.

Skewness

: Skewness refers to the asymmetry or lack of symmetry in a distribution. It indicates whether data is concentrated on one side or stretched out on both sides.

t-Distribution

: The t-distribution is a probability distribution that is used in statistical inference for small sample sizes or when the population standard deviation is unknown. It is similar to the normal distribution but has thicker tails.

t-test

: A t-test is a statistical test that compares two sample means to determine if they are significantly different from each other.

t-tests

: T-tests are statistical tests used to determine if there is a significant difference between the means of two groups. They are commonly used in hypothesis testing and can be performed for both small and large sample sizes.

Treatment Groups

: Treatment groups refer to the different groups or conditions in an experiment where participants are exposed to different levels of the independent variable. These groups allow researchers to compare the effects of different treatments or interventions.

Two-Sample Inference

: Two-sample inference involves making statistical inferences about two populations based on information obtained from two separate samples. It allows us to compare means, proportions, or other characteristics between two groups.

Unbiased Estimator

: An unbiased estimator is a statistic that accurately estimates the value of a parameter on average, meaning it does not consistently overestimate or underestimate the true value.

z

: The z-score is a measure of how many standard deviations an individual data point is from the mean of a distribution.

z-test

: A z-test is a statistical test used to determine if there is a significant difference between a sample mean and a population mean when the population standard deviation is known. It compares sample data by calculating a z-value based on differences between means and their variability.


© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

