# 1.0 Unit 1 Overview: Exploring One-Variable Data

Josh Argo

## What is Statistics?

Statistics is all about data. We collect sets of data, analyze them, and ultimately use them to make inferences about the larger population of individuals they come from.
In this unit, we are going to focus on univariate data, or one-variable data. This is data in which only one characteristic of each individual is measured. We will divide our univariate data sets into two types: quantitative and categorical.

## Quantitative Data

Have you ever wondered what the average AP score was? Or perhaps the average number of bananas per bunch purchased at the grocery store? 🤔🤔🤔 Yeah, me neither. 🤣🤣🤣 However, both of these are examples of quantitative data because each individual is assigned a quantity. Whether it is assigning each test taker an AP score or each purchased bunch a number of bananas, each individual being measured is assigned a number. One of the big giveaways for quantitative data is that we can take the mean, or average, of the data set. In other words, quantitative data is average-able.
Quantitative data uses means, or averages, to make inferences.
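
To see what "average-able" means in practice, here is a minimal sketch in Python. The bunch sizes are made up for illustration and are not real data.

```python
from statistics import mean

# Hypothetical data: the number of bananas in each purchased bunch.
bananas_per_bunch = [4, 5, 6, 5, 5]

# Each individual (bunch) has a number, so taking the mean makes sense.
average = mean(bananas_per_bunch)
print(f"The average number of bananas per bunch is {average}.")
```

Notice that the printed sentence ties the number back to bananas per bunch instead of reporting a bare value.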

## Categorical Data

On the flip side, we have categorical data. Have you ever asked a group of people whether they liked coffee? What about what their favorite vegetable is? How about if they prefer 🍩 or 🍪 for dessert? Each of these surveys would be an example of categorical data, because each individual falls into a category: are you in the 🍩 or the 🍪 category? Because the data is split into categories rather than numbers, it is impossible to calculate the average dessert preference. After all, it would not make sense to make a statement like "the average dessert preference is a cookie." Instead, we typically summarize categorical data sets using measures like proportions. It makes a lot more sense to make a statement like, "the proportion of people who prefer cookies is 0.65."
Categorical data uses percentages, or proportions, to make inferences.
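
As a rough illustration of how a proportion comes from category counts, here is a short Python sketch. The 20 survey responses are invented so that the cookie proportion matches the 0.65 in the example statement above.

```python
from collections import Counter

# Hypothetical survey: 13 people chose cookies and 7 chose donuts.
responses = ["cookie"] * 13 + ["donut"] * 7

# Count how many individuals fall into each category.
counts = Counter(responses)
total = len(responses)

# A proportion is the category count divided by the total number of responses.
proportions = {category: count / total for category, count in counts.items()}

print(f"The proportion of people who prefer cookies is {proportions['cookie']:.2f}.")
```

There is no way to average the words "cookie" and "donut", but counting them and dividing by the total always works.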

## Context of Data

One of the major things that is going to feel very different in this course, compared with other mathematics courses you have taken, is the way you record your answers. In an Algebra or Calculus course, it is sufficient to say "x=5" when that is your answer. In AP Statistics, it is a good idea to get in the habit of tying your answer to the specific context of the problem you are working on. Instead of simply saying "x=5," make your answer more specific by saying things like "the average number of bananas per bunch is 5."
Our goal in statistics is not just to find the correct answer, but to communicate our findings to our audience so that the answer is useful in making further predictions.

## Describing Data

Perhaps the biggest concept and skill of this first unit is being able to describe data.
In quantitative data, this consists of 4 main parts: center, outliers, spread, and shape. It is also important to include context in your answer. For example, if we had a set of data on the number of bananas per bunch purchased, a model response may look like the following: "The mean number of bananas per bunch purchased was 5 bananas. There was one outlier, when a customer purchased a bunch of 12 bananas. The shape of our data distribution was fairly symmetric. The range of bananas per bunch was 10, with the largest bunch having 12 bananas and the smallest bunch having 2."
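
The four parts of a quantitative description can be sketched in Python as follows. The data set is hypothetical, chosen so that its mean, range, and outlier match the model response. The 1.5 × IQR rule used to flag outliers is an assumption made for this sketch, since the text above does not name a specific rule.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical data: bananas per purchased bunch (invented for illustration).
bunches = [2, 3, 3, 4, 4, 5, 5, 6, 6, 12]

# Center: the mean.
center = mean(bunches)

# Spread: the range (largest value minus smallest value).
data_range = max(bunches) - min(bunches)

# Outliers: flagged with the 1.5 * IQR rule (an assumption, see the note above).
q1, _, q3 = quantiles(bunches, n=4)
iqr = q3 - q1
outliers = [x for x in bunches if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Shape: a quick text dotplot, with one * for each bunch of that size.
counts = Counter(bunches)
for size in range(min(bunches), max(bunches) + 1):
    print(f"{size:>2} | {'*' * counts[size]}")

print(f"The mean number of bananas per bunch was {center}.")
print(f"The range of bananas per bunch was {data_range}.")
print(f"Outlier bunch sizes: {outliers}")
```

The code only produces the numbers and a rough picture. Judging the shape and writing the description in context is still up to you.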
In categorical data, this process may look different. It is usually more valuable with categorical data to discuss which category was most likely to happen and which was least likely to happen. For example, a description could look like this: "Our most likely outcome was people who prefer donuts, with a proportion of 0.45, and our least likely outcome was people who prefer cookies, with a proportion of 0.15." Sometimes it is also beneficial with categorical data to discuss raw counts rather than proportions. However, it is more likely that the AP exam will ask you to describe the distribution of a quantitative data set than of a categorical data set. For more information on content from Unit 1, check the link below.

## Resources:

