🎲 Data, Inference, and Decisions – Unit 1: Introduction

Data, Inference, and Decisions is a foundational course in statistical analysis and decision-making. It covers key concepts like data types, collection methods, probability, and inferential techniques, providing students with tools to analyze information and draw meaningful conclusions. The course also explores decision-making frameworks, data visualization, and ethical considerations in data analysis. Students learn to apply statistical methods, interpret results, and make informed decisions while considering potential biases and ethical implications.

Key Concepts and Terminology

  • Data refers to raw facts, observations, or measurements collected through various methods
  • Information is data that has been processed, organized, and given context to provide meaning and value
  • Variables are characteristics or attributes of interest that can take on different values across observations
    • Quantitative variables are numerical and can be measured or counted (age, height, income)
    • Qualitative variables are categorical and describe qualities or characteristics (gender, color, occupation)
  • Descriptive statistics summarize the main features of a dataset (mean, median, mode, standard deviation); each is computed in the sketch after this list
  • Inferential statistics use sample data to make generalizations or predictions about a larger population
  • Probability is the likelihood of an event occurring, expressed as a number between 0 and 1
  • Hypothesis testing is a statistical method used to determine whether there is enough evidence to support a claim about a population parameter
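A minimal sketch of those four descriptive statistics using Python's built-in statistics module; the ages list is invented purely for illustration:

```python
from statistics import mean, median, mode, stdev

# Hypothetical sample: ages of ten survey respondents (invented data)
ages = [23, 25, 25, 29, 31, 34, 34, 34, 40, 45]

print("mean:  ", mean(ages))    # arithmetic average
print("median:", median(ages))  # middle value of the sorted data
print("mode:  ", mode(ages))    # most frequent value (34 appears three times)
print("stdev: ", stdev(ages))   # sample standard deviation
```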

Types of Data and Their Characteristics

  • Nominal data consists of categories with no inherent order or numerical value (blood type, country of origin)
  • Ordinal data has categories with a natural order but no consistent scale between values (education level, customer satisfaction ratings); nominal and ordinal types are contrasted in the sketch after this list
  • Interval data has a consistent scale between values but no true zero point (temperature in Celsius or Fahrenheit)
  • Ratio data has a consistent scale and a true zero point, allowing for meaningful ratios between values (height, weight, income)
  • Discrete data can only take on specific, countable values (number of children in a family, number of defective products)
  • Continuous data can take on any value within a range and is typically measured (time taken to complete a task, weight of an object)
  • Cross-sectional data is collected at a single point in time (a survey of consumer preferences)
  • Time series data is collected over a period of time at regular intervals (daily stock prices, monthly sales figures)
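One way to make the nominal/ordinal distinction concrete is with pandas categorical types (pandas is assumed available; the blood types and ratings are invented):

```python
import pandas as pd

# Nominal: categories with no inherent order (invented blood types)
blood_type = pd.Categorical(["A", "O", "B", "O", "AB"])

# Ordinal: ordered categories with no consistent scale between them
satisfaction = pd.Categorical(
    ["low", "high", "medium", "high"],
    categories=["low", "medium", "high"],
    ordered=True,
)

print(blood_type.ordered)    # False: "A" < "B" is undefined
print(satisfaction.ordered)  # True: "low" < "medium" < "high"
print(satisfaction.min(), satisfaction.max())  # low high
```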

Data Collection Methods

  • Surveys involve asking participants a series of questions to gather information about their opinions, behaviors, or characteristics
    • Surveys can be administered online, by phone, or in person
    • Questions should be clear, unbiased, and relevant to the research objectives
  • Interviews are one-on-one conversations with participants to gather detailed, qualitative data
    • Interviews can be structured (following a set of predetermined questions) or unstructured (allowing for more open-ended exploration of topics)
  • Observations involve watching and recording the behavior of individuals or groups in a natural or controlled setting
  • Experiments manipulate one or more variables to determine their effect on an outcome variable
    • Participants are typically divided into treatment and control groups
    • Randomization helps ensure that any differences between groups are due to the manipulation rather than pre-existing differences (see the sketch after this list)
  • Secondary data is data previously collected by someone else for another purpose (government statistics, academic research, company reports)
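A minimal sketch of random assignment to treatment and control groups, using only the standard library; the participant IDs are hypothetical:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

participants = [f"P{i:02d}" for i in range(1, 11)]  # invented IDs

# Shuffle, then split in half: on average the groups are comparable,
# so outcome differences can be attributed to the treatment
random.shuffle(participants)
half = len(participants) // 2
treatment, control = participants[:half], participants[half:]

print("treatment:", treatment)
print("control:  ", control)
```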

Introduction to Probability and Statistics

  • Probability is the foundation of inferential statistics and helps quantify uncertainty
  • The probability of an event A is written P(A) and ranges from 0 (impossible) to 1 (certain)
  • Independent events are not influenced by the occurrence of other events (rolling a die multiple times)
  • Dependent events are influenced by the occurrence of other events (drawing cards from a deck without replacement)
  • Conditional probability is the probability of event A occurring given that event B has already occurred, denoted P(A|B)
  • The law of large numbers states that as the number of trials increases, the average of the results will converge to the expected value
  • The central limit theorem states that, as the sample size increases, the distribution of sample means approaches a normal distribution regardless of the shape of the population distribution; both this and the law of large numbers are simulated in the sketch after this list
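Both results can be checked by simulation. A sketch with the standard library (all numbers are simulated, not real data):

```python
import random
import statistics

random.seed(0)

# Law of large numbers: the mean of die rolls approaches the
# expected value (1 + 2 + ... + 6) / 6 = 3.5 as trials increase
for n in (100, 10_000, 1_000_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(f"n = {n:>9,}: mean = {statistics.mean(rolls):.3f}")

# Central limit theorem: means of samples drawn from a skewed
# exponential population (mean 1.0) cluster normally around 1.0
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2_000)
]
print("mean of sample means: ", round(statistics.mean(sample_means), 3))   # ≈ 1.0
print("stdev of sample means:", round(statistics.stdev(sample_means), 3))  # ≈ 1/√50 ≈ 0.141
```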

Basic Inferential Techniques

  • Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the entire population
    • Simple random sampling ensures that each individual has an equal chance of being selected
    • Stratified sampling divides the population into subgroups (strata) and then randomly samples from each subgroup
  • Confidence intervals provide a range of values that is likely to contain the true population parameter at a stated confidence level (e.g., a 95% confidence interval)
  • Hypothesis testing compares a sample statistic to a hypothesized population parameter to determine whether there is enough evidence to reject the null hypothesis (both techniques are sketched in code after this list)
    • The null hypothesis (H0) represents the status quo or no effect
    • The alternative hypothesis (Ha) represents the research claim or expected effect
  • T-tests compare the means of two groups to determine whether they are significantly different from each other
  • ANOVA (analysis of variance) tests compare the means of three or more groups to determine whether they are significantly different from each other
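A sketch of a confidence interval and a two-sample t-test using NumPy and SciPy (both assumed available); the group scores are randomly generated stand-ins, not real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented scores for two groups of 30
group_a = rng.normal(loc=72, scale=8, size=30)
group_b = rng.normal(loc=76, scale=8, size=30)

# 95% confidence interval for group A's mean (t distribution)
ci = stats.t.interval(0.95, df=len(group_a) - 1,
                      loc=group_a.mean(), scale=stats.sem(group_a))
print("95% CI for group A's mean:", ci)

# Two-sample t-test: H0 says the two group means are equal
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # reject H0 at the 5% level if p < 0.05
```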

Decision-Making Frameworks

  • Decision trees visually represent the possible outcomes of a series of decisions, along with their associated probabilities and values
  • Expected value is the probability-weighted average outcome of a decision, calculated by multiplying each possible outcome by its probability and summing the results (computed in the sketch after this list)
  • Sensitivity analysis examines how changes in the input variables affect the outcome of a decision
  • Cost-benefit analysis compares the expected costs and benefits of a decision to determine whether it is worthwhile
  • Multi-criteria decision analysis (MCDA) evaluates alternatives based on multiple, often conflicting, criteria
    • Criteria are assigned weights based on their relative importance
    • Alternatives are scored on each criterion and then combined using the weights to determine an overall score
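A sketch of the two calculations named above, expected value and an MCDA weighted score; the probabilities, payoffs, weights, and vendor scores are all invented:

```python
# Expected value: multiply each outcome by its probability and sum
outcomes = [(0.6, 100_000), (0.3, 20_000), (0.1, -50_000)]  # (probability, payoff)
expected_value = sum(p * v for p, v in outcomes)
print("expected value:", expected_value)  # 61000.0

# MCDA weighted score: weights sum to 1, scores are on a 0-10 scale
weights = {"cost": 0.5, "quality": 0.3, "speed": 0.2}
alternatives = {
    "vendor_a": {"cost": 7, "quality": 9, "speed": 5},
    "vendor_b": {"cost": 9, "quality": 6, "speed": 8},
}
for name, scores in alternatives.items():
    total = sum(w * scores[criterion] for criterion, w in weights.items())
    print(f"{name}: {total:.2f}")  # vendor_a: 7.20, vendor_b: 7.90
```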

Data Visualization and Interpretation

  • Data visualization helps communicate complex data in a clear and accessible format
  • Bar charts compare the values of different categories using horizontal or vertical bars
  • Line graphs show trends or changes over time by connecting data points with lines
  • Scatter plots display the relationship between two continuous variables using points on a coordinate plane
  • Pie charts show the proportions of different categories within a whole using slices of a circle
  • Histograms display the distribution of a continuous variable using bars that represent the frequency of values within each bin
  • Box plots summarize the distribution of a continuous variable using the median, quartiles, and outliers; a histogram and a box plot are drawn in the sketch after this list
  • Heat maps use color intensity to represent the magnitude of values in a matrix or grid
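A minimal matplotlib sketch (assumed available) that draws two of these chart types side by side on simulated data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=500)  # simulated sample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.hist(data, bins=20)  # histogram: frequency of values per bin
ax1.set_title("Histogram")

ax2.boxplot(data)        # box plot: median, quartiles, outliers
ax2.set_title("Box plot")

plt.tight_layout()
plt.show()
```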

Ethical Considerations in Data Analysis

  • Privacy concerns arise when collecting, storing, and analyzing personal or sensitive data
    • Data should be anonymized or aggregated to protect individual identities (a minimal sketch appears after this list)
    • Informed consent should be obtained from participants before collecting their data
  • Bias can enter the data analysis process at various stages, from data collection to interpretation
    • Sampling bias occurs when the sample is not representative of the population
    • Measurement bias occurs when the data collection instruments or methods are flawed
  • Transparency involves being open and clear about the data sources, methods, and limitations of the analysis
  • Reproducibility ensures that the analysis can be replicated by others using the same data and methods
  • Responsible use of data and results involves considering potential consequences and ensuring that findings are not misused or misinterpreted
  • Fairness and non-discrimination require that data analysis does not perpetuate or amplify existing biases or disparities
    • Algorithms and models should be regularly audited for fairness and adjusted as needed
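A minimal sketch of pseudonymization and aggregation with the standard library. The records and salt are invented, and a real pipeline would need a secret salt, key management, and a threat model; hashing alone does not defeat re-identification:

```python
import hashlib

# Invented records with an identifying field
records = [
    {"name": "Ana Silva",  "zip": "94110", "income": 52_000},
    {"name": "Ben Okafor", "zip": "94110", "income": 61_000},
    {"name": "Cara Liu",   "zip": "10001", "income": 58_000},
]

# Pseudonymize: replace the name with a salted one-way hash
SALT = "example-salt"  # placeholder; a real salt must be kept secret
for r in records:
    r["id"] = hashlib.sha256((SALT + r["name"]).encode()).hexdigest()[:12]
    del r["name"]

# Aggregate: report group-level averages instead of individual rows
by_zip = {}
for r in records:
    by_zip.setdefault(r["zip"], []).append(r["income"])
for z, incomes in by_zip.items():
    print(z, "mean income:", sum(incomes) / len(incomes))
```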


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
