7.1 Introducing Statistics: Should I Worry About Error?

5 min read · January 4, 2023

Josh Argo

Jed Quiaoit

https://cdn.pixabay.com/photo/2017/03/09/12/31/error-2129569_960_720.jpg

image courtesy of pixabay

No statistical study is perfect: there is always a chance of error. Several sources of error can affect the results of a statistical study, including sampling error, measurement error, and bias. 😵‍💫

Sampling error occurs when the sample used in the study is not representative of the population being studied. This can lead to incorrect conclusions about the population based on the sample data.

Measurement error occurs when the variables being studied are measured inaccurately due to the presence of confounding variables. This can also lead to incorrect conclusions.

Bias is another source of error in statistical studies. Bias can occur in the sampling process, the measurement process, or the analysis of the data, and it can lead to incorrect conclusions about the population being studied.

Type I Errors

A Type I error, also known as a false positive, occurs when the null hypothesis is rejected even though it is actually true. The probability of a Type I error is equal to the alpha level, the significance level chosen for the study. In other words, the alpha level is the probability of rejecting the null hypothesis when it is true. A common alpha level is 0.05, which means that there is a 5% chance of making a Type I error when the null hypothesis is true. ➕

It's important to choose an appropriate alpha level for a study because of the trade-off involved: a lower alpha level (e.g. 0.01) lowers the probability of a Type I error but raises the probability of a Type II error (failing to reject a false null hypothesis), while a higher alpha level (e.g. 0.1) does the reverse.

It's also crucial to consider the consequences of making a Type I error, as rejecting the null hypothesis when it is true can lead to incorrect conclusions about the population being studied.
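To see why the alpha level and the Type I error rate are the same thing, it can help to simulate many studies in which the null hypothesis really is true and count how often it gets rejected anyway. The sketch below is only an illustration: it assumes a normally distributed population with a known standard deviation and uses a two-sided z-test, and the population values, sample size, and number of trials are made-up numbers.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha, n=30, trials=4000, mu0=45_000, sigma=2_500, seed=0):
    """Estimate how often a two-sided z-test rejects a TRUE null hypothesis.

    mu0 and sigma are hypothetical population values chosen for illustration.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the chosen alpha
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: samples really come from mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / std_error
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type I error)
    return rejections / trials


rates = {alpha: type_i_error_rate(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, rate in rates.items():
    print(f"alpha = {alpha:.2f}: rejected a true null in {rate:.1%} of trials")
```

The observed rejection rates should land close to each alpha level, which is the sense in which alpha is the probability of a Type I error.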

Example

Let's say an author claims that the mean income for a given area is $45,000. 💸

We sample a group of 50 families and find that the mean income of our sample is $60,000 with a standard deviation of $2,500. In performing a statistical test, we would reject the author's claim. If the author's claim were actually true and our result came from an unrepresentative sample or random chance, this would be a Type I error.
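The decision in this example can be checked with a quick calculation. The sketch below computes the test statistic for a sample mean of $60,000 against the claimed $45,000. As a simplifying assumption it uses the standard normal distribution for the p-value (a t-distribution with 49 degrees of freedom would be the usual choice, but with a sample of 50 the two are close), and the function name is just an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_test(sample_mean, claimed_mean, sample_sd, n, alpha=0.05):
    """Two-sided test of a claimed mean, using a normal approximation."""
    std_error = sample_sd / sqrt(n)
    statistic = (sample_mean - claimed_mean) / std_error
    # Probability of a statistic at least this extreme if the claim is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value, p_value < alpha


statistic, p_value, reject = one_sample_test(60_000, 45_000, 2_500, 50)
print(f"test statistic = {statistic:.2f}, reject the claim: {reject}")
```

A sample mean that sits more than 40 standard errors above the claimed value gives overwhelming evidence against the claim, so rejecting is the expected decision. It would only be a Type I error if the claim were true after all.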

Type II Errors

A Type II error, also known as a false negative, occurs when the null hypothesis is not rejected even though it is actually false. ➖

Type II errors are more likely to occur when the sample size is small, as the statistical test has less power to detect a true difference between the hypothesized value and the actual population value. Recall from the previous unit that the power of a statistical test is the probability of correctly rejecting the null hypothesis when it is false.

Unlike the probability of a Type I error, which is fixed by the alpha level, the probability of a Type II error is influenced by both the alpha level and the sample size. A higher alpha level or a larger sample size will result in a lower probability of making a Type II error.
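The effect of sample size and alpha level on the Type II error probability can be computed directly once a specific "true" value is assumed. The sketch below is a rough illustration under stated assumptions: a two-sided z-test with a known population standard deviation, and a hypothetical true mean of $46,000 against a claimed mean of $45,000. None of these numbers come from a real study.

```python
from statistics import NormalDist


def type_ii_error_probability(true_mean, claimed_mean, sigma, n, alpha):
    """Probability that a two-sided z-test fails to reject a FALSE claim."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits from the claimed mean.
    shift = (true_mean - claimed_mean) / (sigma / n ** 0.5)
    # Power: chance the test statistic lands in either rejection region.
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power


# Hypothetical scenario: claim is $45,000 but the true mean is $46,000.
for n in (10, 50, 200):
    beta = type_ii_error_probability(46_000, 45_000, 2_500, n, alpha=0.05)
    print(f"n = {n:3d}, alpha = 0.05: P(Type II error) = {beta:.3f}")

for alpha in (0.01, 0.05, 0.10):
    beta = type_ii_error_probability(46_000, 45_000, 2_500, 50, alpha)
    print(f"n =  50, alpha = {alpha:.2f}: P(Type II error) = {beta:.3f}")
```

In both loops the probability of a Type II error shrinks as the sample size or the alpha level increases, which matches the relationship described above.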

It is important to consider the consequences of making a Type II error, as failing to reject the null hypothesis when it is false can lead to incorrect conclusions about the population being studied.

Example

Let's say an author claims that the mean income for a given area is $45,000. 💸

We sample a group of 50 families and find that the mean income of our sample is $44,500 with a standard deviation of $2,500. In performing a statistical test, we would fail to reject the author's claim, since the sample mean is only about 1.4 standard errors below $45,000. If the author's claim were actually false and we missed it because of an unrepresentative sample or random chance, this would be a Type II error.

How to Minimize Error in Statistical Studies

(1) Minimizing error due to bias in sampling

✔️ Select a random sample using a method such as simple random sampling.

  • Example: A band director wants to survey the school on their opinions of this year's half-time show. To perform the survey, he assigns each student in the school a number and uses a random number generator to select 20 students. This is a GOOD example of how to select a random sample.

❌ Avoid volunteer samples, convenience samples and other sampling methods that may heavily influence your data in one direction.

  • Example: A band director wants to survey the school on their opinions of this year's half-time show. To perform the survey, he chooses the first 20 students who arrive at the Fall Band Concert and asks them if the band's show is satisfactory. This is a BAD example of how to select a sample, since he is only using students who were coming to the band concert (convenience sample).
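The "number every student and use a random number generator" approach from the good example can be sketched in a few lines. The school size of 1,200 and the seed are assumptions made up for the illustration; only the sample size of 20 comes from the example.

```python
import random

SCHOOL_SIZE = 1200  # hypothetical number of students in the school
SAMPLE_SIZE = 20    # number of students the band director surveys

rng = random.Random(2023)  # fixed seed only so the sketch is repeatable
student_numbers = range(1, SCHOOL_SIZE + 1)  # every student gets a number

# random.sample draws without replacement, so no student is picked twice
# and every group of 20 students is equally likely to be chosen.
selected = sorted(rng.sample(student_numbers, SAMPLE_SIZE))
print(selected)
```

Because the selection depends only on the random draw, students who happen to attend the band concert are no more likely to be picked than anyone else.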

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-MKpqFI11f73r.png?alt=media&token=8d12e19e-81f8-47d9-81b5-fcc6264f5214

Source: Qualtrics

(2) Minimizing error due to bias in questioning

❌ Avoid asking questions in a way that will prompt a certain response.

  • Example: If the band director wants to know how the half-time show was, he should ask the question in a way such as "Rate the band's half-time show on a scale of 1-10" as opposed to saying, "Was the band's half-time show good?" The latter way of asking the question will likely influence students to say "yes". This is known as response bias.

❌ Avoid having someone ask questions that may influence the response (for example, don't have a police officer ask someone if they have ever broken the speed limit)

  • Example: If the band director wants to know how the half-time show was, an anonymous survey would be the best way. If the band director asked the students directly, they may feel more inclined to give it a higher rating (especially if their grade in class could be influenced by the way they feel).

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-5Gn9y2MgmXdR.png?alt=media&token=b7666ae1-da9f-4d0d-83e6-4a59ca1360e1

Source: Survicate

(3) Minimizing error due to confounding variables

✔️ Use blocking in your experiment to account for any known or suspected confounding variables.

  • Example: If the band director wants to know how the student body feels about the half-time show, he may consider blocking by grade. That would ensure an equal number of responses from each class and make sure that the age of students is not a confounding variable.
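Grouping by grade before selecting students can be sketched as follows. The roster, the student ID format, and the choice of 5 students per grade are all hypothetical values invented for the illustration.

```python
import random


def sample_by_grade(roster, per_grade, seed=None):
    """Randomly pick the same number of students from each grade."""
    rng = random.Random(seed)
    return {
        grade: sorted(rng.sample(students, per_grade))
        for grade, students in roster.items()
    }


# Hypothetical roster: 300 made-up student IDs in each of grades 9 to 12.
roster = {
    grade: [f"G{grade}-{i:03d}" for i in range(1, 301)]
    for grade in (9, 10, 11, 12)
}

chosen = sample_by_grade(roster, per_grade=5, seed=1)
for grade, students in chosen.items():
    print(f"Grade {grade}: {students}")
```

Each grade contributes exactly the same number of responses, so differences in opinion cannot be explained by one grade being overrepresented.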

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F-0O8lAt0S6S3M.png?alt=media&token=967030fd-d47b-46f8-9d13-6a138fb32278

Source: Data Science Discovery

🎥 Watch: AP Stats - Errors and Powers of Tests

Key Terms to Review (10)

Bias

: Bias refers to a systematic deviation from the true value or an unfair influence that affects the results of statistical analysis. It can occur during data collection, sampling, or analysis and leads to inaccurate or misleading conclusions.

Blocking

: Blocking is a technique used in experimental design where subjects or items are grouped together based on certain characteristics before being assigned to different treatment groups. It helps control for potential sources of variation and increases precision in estimating treatment effects.

Confounding Variables

: Confounding variables are additional factors that are not accounted for in a study but can influence both the independent and dependent variables. They can lead to incorrect conclusions about cause-and-effect relationships.

Measurement Error

: Measurement error refers to inaccuracies or variations that occur when measuring variables or collecting data from individuals or objects. It can arise from various sources such as faulty instruments, human errors, or inconsistencies in respondents' answers.

Random Sample

: A random sample is a subset of individuals selected from a larger population in such a way that every individual has an equal chance of being chosen. It helps to ensure that the sample is representative of the population.

Sampling Error

: Sampling error refers to the discrepancy between a sample statistic and the true population parameter due to random chance. It occurs when the sample selected is not perfectly representative of the entire population.

Simple random sampling

: Simple random sampling is a type of random sampling where each possible sample of a given size has an equal chance of being selected. It ensures that every combination of individuals has an equal probability of being chosen.

Type I Error

: Type I error refers to rejecting a true null hypothesis. It occurs when we conclude there is a significant difference or relationship between variables when there actually isn't one.

Type II error

: Type II error occurs when we fail to reject a null hypothesis that is actually false. In other words, it's the mistake of retaining the null hypothesis when we should have rejected it.

Volunteer Samples

: Volunteer samples refer to a type of sampling method where individuals self-select to participate in a study or survey. It is not a random sample and may introduce bias into the results.

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

