Math for Non-Math Majors

8.1 Gathering and Organizing Data

Sampling methods let researchers gather data from a subset of a population instead of from every member. Techniques such as simple random and stratified sampling are designed to make the sample representative of the whole group, which is why understanding them is essential for conducting accurate studies and surveys.

Data types fall into two main categories: categorical and quantitative. Categorical data groups information into labels, while quantitative data deals with numerical values. Organizing data into frequency distributions makes it easier to analyze and interpret large datasets.

Sampling Methods and Data Types

Types of sampling methods

Simple random sampling
- Selects members from a population randomly, giving each member an equal chance of being chosen
- Can be performed using a random number generator or by assigning numbers to population members and randomly selecting them (lottery)
Systematic sampling
- Selects members from a population at regular intervals from a list
- Determines the interval by dividing the population size by the desired sample size (a population of 1,000 and a sample of 100 give an interval of 10)
- Randomly selects a starting point and then selects every nth member (every 10th person on a list)
Stratified sampling
- Divides a population into subgroups (strata) based on a specific characteristic (age groups)
- Performs simple random sampling within each stratum to ensure representation of all subgroups in the sample
- Useful when subgroups have different characteristics that need to be represented in the sample (income levels)
Cluster sampling
- Divides a population into naturally occurring groups (clusters) such as geographic regions (cities)
- Selects a random sample of clusters and includes all members within the selected clusters in the sample
- Useful when a complete list of population members is not available or when the population is geographically dispersed (households in a city)
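
The four methods above can be sketched in a few lines of Python. This is a minimal illustration using only the standard `random` module, not a production survey tool. The function names, the `seed` parameter, and the example population of 100 numbered members are all assumptions made for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick every k-th member after a random start (assumes n <= len(population))."""
    rng = random.Random(seed)
    k = len(population) // n      # interval = population size / sample size
    start = rng.randrange(k)      # random starting point inside the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, key, n_per_stratum, seed=None):
    """Group members into strata with key(), then sample randomly inside each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        # Take at most n_per_stratum members from each stratum.
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: 100 people identified by the numbers 1 to 100.
population = list(range(1, 101))
every_tenth = systematic_sample(population, 10, seed=1)

# Hypothetical clusters: households grouped by city.
cities = {"A": [1, 2], "B": [3, 4, 5], "C": [6]}
two_cities = cluster_sample(cities, 2, seed=1)
```

Note the difference between the last two functions: stratified sampling takes a few members from every subgroup, while cluster sampling takes every member from a few subgroups.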

Classification of data types

Categorical data
- Groups data into categories or labels that describe a characteristic or attribute
- Examples include gender (male, female), race (White, Black, Asian), and political affiliation (Democrat, Republican)
- Frequency distribution for categorical data lists each category and the number (frequency) of observations in that category
  - Can also include relative frequency (proportion or percentage) for each category (25% Democrats)
Quantitative data
- Measures or counts numerical values that represent quantities or amounts
- Examples include height (inches), weight (pounds), age (years), and test scores (points)
- Frequency distribution for quantitative data lists each unique value and the number (frequency) of observations with that value
  - Can also include relative frequency (proportion or percentage) for each value (10% scored 85 points)
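
A frequency distribution with relative frequencies can be built the same way for both data types. The sketch below uses Python's standard `collections.Counter`; the function name `frequency_table` and the sample responses are made up for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(observations)
    total = len(observations)
    return {value: (count, count / total) for value, count in counts.items()}

# Hypothetical categorical data: political affiliation of four respondents.
affiliations = ["Democrat", "Republican", "Democrat", "Independent"]
affiliation_table = frequency_table(affiliations)

# Hypothetical quantitative data: test scores of five students.
scores = [85, 90, 85, 70, 90]
score_table = frequency_table(scores)
```

Each entry pairs the count with its proportion of the total, so multiplying the second number by 100 gives the percentage.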

Organizing Quantitative Data

Construction of frequency distributions

Binned frequency distribution
- Groups quantitative data into intervals (bins) of equal width to summarize the distribution of the data
- Useful when there are many unique values in the dataset (ages of a large group of people)
Determining bin width
- Follows a common rule of thumb: use between 5 and 20 bins so the distribution stays easy to read and interpret
- Calculates bin width using the formula: $\text{Bin Width} = \frac{\text{Maximum Value} - \text{Minimum Value}}{\text{Number of Bins}}$
  - Example: If the maximum age is 80, the minimum age is 20, and we want 10 bins, the bin width would be $\frac{80 - 20}{10} = 6$ years
Setting bin boundaries
- Sets the lower boundary of the first bin to be less than or equal to the minimum value in the dataset
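- Sets the upper boundary of the last bin to be greater than or equal to the maximum value in the dataset (for whole-number ages from 20 to 80, ten bins of width 6 starting at 20 end at 79, so an extra bin or a wider last bin is needed to include 80)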
- Ensures bins do not overlap and cover the entire range of data (20-25, 26-31, 32-37)
- Chooses bin boundaries that are easily interpretable values (multiples of 5 or 10)
Constructing the binned frequency distribution
- Determines the number of observations falling within each bin by counting or using software (12 people aged 20-25)
- Can also include relative frequency (proportion or percentage) for each bin (15% of people aged 26-31)
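
The binning steps above can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the only convention: each bin includes its lower boundary and excludes its upper boundary, except the last bin, which also includes its upper boundary so the maximum value is counted. The function names and the list of ages are hypothetical.

```python
def bin_width(minimum, maximum, num_bins):
    """Bin width = (maximum value - minimum value) / number of bins."""
    return (maximum - minimum) / num_bins

def binned_frequencies(values, lower, width, num_bins):
    """Return one (bin start, bin end, frequency, relative frequency) row per bin."""
    if not values:
        raise ValueError("values must not be empty")
    upper = lower + width * num_bins
    counts = [0] * num_bins
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} falls outside the bins")
        # Assumption: a value equal to the overall upper boundary goes in the last bin.
        index = min(int((v - lower) // width), num_bins - 1)
        counts[index] += 1
    total = len(values)
    return [
        (lower + i * width, lower + (i + 1) * width, count, count / total)
        for i, count in enumerate(counts)
    ]

# Hypothetical ages, using the minimum of 20, maximum of 80, and 10 bins from the example.
ages = [20, 23, 25, 26, 31, 45, 80]
width = bin_width(20, 80, 10)
rows = binned_frequencies(ages, 20, width, 10)
```

For whole-number ages, the first bin here (20 up to but not including 26) holds the same values as the bin written 20-25 above.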

Data Collection and Analysis

Population and Sample
- Population refers to the entire group being studied
- Sample is a subset of the population used to make inferences about the population
Data Collection Methods
- Survey: A method of gathering information from a sample of individuals
- Census: A complete enumeration of an entire population
Bias in Data Collection
- Occurs when the sample is not representative of the population, leading to skewed results
Statistical Analysis
- The process of examining and interpreting collected data to draw meaningful conclusions and make informed decisions
