1.1 Introducing Statistics: What Can We Learn from Data?

6 min read · December 25, 2022

Lusine Ghazaryan

Statistics & Data, Data, Data

Depending on how we use data, the study of statistics is divided into two main areas: descriptive and inferential. In descriptive statistics, we describe a situation by collecting, organizing, summarizing, and presenting the data. In inferential statistics, we try to make an inference from our collected data to populations by generalizing, estimating, testing, and making predictions. We will reserve the inferential branch for later and focus on the descriptive branch of statistics here.

Suppose the statistics class just had a test. The teacher checked and recorded the students' test scores. In statistical terms, these scores are called data, and the whole set of scores for the class is called a data set. But the numbers are meaningless if we don't know what they measure and whom they were measured on. Since we know that these are the test scores for the students enrolled in the statistics class, the numbers, placed in context, may convey important information about class performance, test difficulty, students' abilities, content knowledge, and even the testing environment.

Statisticians call the students elements and each student's score an observation. These scores were part of the teacher's assessment, and she needs to use the data to analyze the content she taught. If she had more than 30 students, it would be hard for her to make sense of the raw data set just by looking at it. It would be much more helpful if she organized the data into tables, drew graphs, or calculated the average.
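As a rough illustration of what "organizing the data" might look like, here is a minimal Python sketch. The scores are made up for this example, and the 10-point score bands and the `score_band` helper are assumptions chosen for convenience, not something the teacher in the story necessarily used.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical test scores for a small class (made up for illustration).
scores = [72, 85, 91, 68, 85, 77, 94, 85, 60, 77]


def score_band(score):
    """Map a score to a 10-point band label such as '80-89'."""
    low = (score // 10) * 10
    return f"{low}-{low + 9}"


# Frequency table: how many students fall in each band.
frequency = Counter(score_band(s) for s in scores)

# Two common summary measures of the whole data set.
average = mean(scores)
middle = median(scores)

for band in sorted(frequency):
    print(band, frequency[band])
print("mean:", average)
print("median:", middle)
```

Even for ten scores, the table and the two summary numbers are easier to take in than the raw list, which is the point of descriptive statistics.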

Going Deeper into Data and the "W"s

As mentioned earlier, data can refer to numbers or other subjective labels, and they are useless without their context. One easy way to provide context is to answer the Ws—who, what, when, where, why (if possible), and how—of the dataset we're working with.

(1) Who

Knowing who is involved in generating the data tells us more about the cases for which (or for whom) the data was collected. There are several ways to describe the individuals involved:

  • Respondents refer to individuals who answer surveys, providing information about themselves or their opinions on a particular topic. 🦉

  • Subjects (or participants) refer to individuals (or sometimes other types of units, such as groups or organizations) involved in experiments, where they are exposed to a treatment or intervention and the effect of that treatment is measured. 👩

  • In addition to human subjects, data can be collected from a wide range of other types of units, such as animals, plants, or inanimate objects. These units are often referred to as experimental units. 🌳

It is important to consider the who of data when designing a study or analysis, as the characteristics of the units being studied can affect the results and conclusions that can be drawn. For example, a study that is conducted with a sample of college students may not be generalizable (you'll learn more about generalizability down the road!) to the broader population, while a study that is conducted with a representative sample of the population may be more generalizable.

(2) What

Variables are characteristics or attributes that are measured or observed for each individual in a study. Each variable should have a name that clearly identifies what has been measured, so that the data collected can be easily understood and analyzed. 🔎

There are different types of variables, including:

  • Dependent variables: These are the variables that are being measured or observed in a study. The value of the dependent variable is thought to depend on the value of one or more independent variables.

  • Independent variables: These are the variables that are being manipulated or controlled in a study. The value of the independent variable is thought to influence the value of the dependent variable.

  • Controlled variables: These are variables that are kept constant or controlled in a study, in order to eliminate their influence on the dependent variable.

It is important to carefully consider the variables that will be measured in a study, as they will determine the questions that can be answered and the conclusions that can be drawn. It is also important to ensure that the variables are accurately and consistently measured, in order to ensure the reliability and validity of the study (we'll learn more about this when we talk about experimental design and set-up!). In addition, section 1.2 goes further in-depth on the specifics of variables. 😁

(3) When and Where

The more we know about the context, the more we'll understand about the data we have! This is where the when and where of our data come in.

The when refers to the time at which the data was collected, which can have an impact on the values that are recorded. For example, values recorded at different points in time may reflect different trends or patterns. ⏰

The where of data refers to the location where the data was collected, which can also have an impact on the values that are recorded. For example, values recorded in different geographical locations may reflect different social, cultural, or economic factors. 🗺️

Both the when and where of data can be important considerations when interpreting the results of a study or analysis. It is important to carefully consider the context in which the data was collected, as it can help to better understand the meaning and implications of the results.

(4) Why

The questions that we ask of a variable, or the why of our analysis, shape how we think about and approach the variable. The questions we ask can influence the way we define and measure the variable, as well as the type of statistical analysis that we use to analyze the data. 🖥️

For example, if we are interested in understanding the relationship between two variables (say, amount of sleep and test scores), we might ask questions such as:

  • Is there a relationship between the two variables?

  • If there is a relationship, what is the nature of the relationship (e.g., positive or negative)?

  • Is the relationship statistically significant, or could it have occurred by chance?

Answering these types of questions can help us to better understand the data and draw meaningful conclusions. It is important to carefully consider the questions that we want to answer when designing a study or analysis, in order to ensure that the appropriate data is collected and analyzed.
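To make the first two questions concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient for a handful of paired observations. The sleep and score values are invented for illustration, and `pearson_r` is a hypothetical helper written out by hand. The correlation coefficient is only one way to describe a relationship, and this sketch says nothing about statistical significance, which belongs to the inferential branch.

```python
from math import sqrt

# Hypothetical paired observations (made up): hours of sleep and test score
# for the same seven students, listed in the same order.
sleep_hours = [5, 6, 6, 7, 8, 8, 9]
test_scores = [62, 70, 68, 75, 83, 80, 88]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(sleep_hours, test_scores)

# The sign of r describes the direction of the linear relationship.
if r > 0:
    direction = "positive"
elif r < 0:
    direction = "negative"
else:
    direction = "none"

print(round(r, 3), direction)
```

For this made-up data the relationship is positive: students who slept more tended to score higher. Whether such a pattern could have occurred by chance is a separate question that descriptive tools alone cannot answer.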

(5) How

The how of data collection refers to the methods or techniques that are used to collect the data, and it can have a significant impact on the quality of the data.

There are many different methods for collecting data, including surveys, experiments, observations, and secondary data sources. Each method has its own strengths and limitations, and it is important to choose the most appropriate method for the research question being addressed. 📜

For example, Internet surveys can be a convenient and cost-effective way to collect data from a large number of respondents, but they may also be unreliable due to biases, such as nonresponse bias (where certain groups are more or less likely to respond to the survey) or response bias (where the responses are not accurate or honest). 😔
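A small worked example may help show why nonresponse bias matters. The Python sketch below uses an entirely hypothetical course-satisfaction survey: the group sizes, ratings, and response counts are invented, and the `weighted_mean` helper is just one simple way to do the arithmetic.

```python
# Hypothetical course-satisfaction survey (all numbers made up).
# For each group: how many students are in it, the rating (1-10) each
# member would give, and how many of them actually answer the survey.
groups = {
    "enjoyed the course": {"size": 50, "rating": 8, "responded": 40},
    "did not enjoy it": {"size": 50, "rating": 3, "responded": 10},
}


def weighted_mean(groups, count_key):
    """Average rating, weighting each group by the chosen head count."""
    total = sum(g[count_key] * g["rating"] for g in groups.values())
    count = sum(g[count_key] for g in groups.values())
    return total / count


# What we would see if every student answered.
population_mean = weighted_mean(groups, "size")

# What the survey actually reports, based only on those who responded.
respondent_mean = weighted_mean(groups, "responded")

print("all students:", population_mean)
print("respondents only:", respondent_mean)
```

In this invented scenario the survey reports an average of 7.0 while the true class average is 5.5, simply because satisfied students were more likely to respond.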

It is important to carefully consider the how of data collection when designing a study or analysis, in order to ensure that the data is of sufficient quality to support the research question and conclusions.

Tying these factors together: a large data set is hard to read and to draw conclusions from. Constructing tables, drawing graphs, and calculating summary measures such as averages make up the descriptive portion of statistics. The next few sections will show how to construct tables, draw graphs, and calculate summary measures. The two branches of statistics are strongly connected, and the knowledge gained in the first few units will help you when you are introduced to the many inference procedures.

Key Vocabulary

  • Data

Key Terms to Review (14)

Controlled Variables

: Controlled variables, also called constant variables, refer to factors in an experiment that remain consistent throughout all conditions. By keeping these factors unchanged, researchers can isolate the effects of the independent variable on the dependent variable.

Data Set

: A data set refers to any collection of observations, measurements, or information gathered for analysis.

Dependent Variables

: Dependent variables are the outcomes or results that are being measured or observed in an experiment. They depend on the independent variable and can change as a result of its manipulation.

Descriptive Statistics

: Descriptive statistics involves organizing, summarizing, and presenting data in a meaningful way to describe its main features.

Element

: An element refers to an individual unit or object in a population that is being studied.

Experimental Units

: Experimental units are the individuals or objects on which we collect data in an experiment. They can be people, animals, plants, or any other entities that are being studied.

Independent Variables

: Independent variables are factors or conditions that researchers manipulate or change in an experiment. Researchers change them to see whether they affect the dependent variable, which is what allows an experiment to examine cause-and-effect relationships.

Inferential Statistics

: Inferential statistics involves using sample data to make inferences or draw conclusions about a population.

Observations

: Observations are data points collected during an experiment or study, often involving measurements or recordings.

Reliability

: Reliability refers to the consistency and stability of a measurement or test over time, across different conditions, and among different raters.

Respondents

: Respondents are individuals who participate in surveys or questionnaires by providing answers to specific questions.

Subjects

: Subjects refer to the individuals or objects that are being studied or observed in a statistical experiment. They can be people, animals, plants, or any other entities of interest.

Validity

: Validity refers to the extent to which a measurement or test accurately measures what it is intended to measure.

Variables

: Variables are characteristics or attributes that can vary among individuals or objects in a study. They are the measurable quantities that researchers collect data on during an experiment or survey.

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

