2 min read • June 3, 2020

Peter Cao

The simplest form of regression is linear regression, where we find a linear equation of the form **ŷ=a+bx**, where a is the y-intercept and b is the slope. Given a scatterplot, infinitely many lines could be drawn to approximate the data, but only one of them fits best, and this is called the **least squares regression line (LSRL)**. We will discuss this and how to find the formula in a couple of sections.

It is also important to note that ŷ represents the **predicted value of the response variable**, while x represents the explanatory variable. Since x is given in the data set, it is not predicted, but the ŷ-value that comes out of a least squares regression line is always a prediction.

Given a linear regression equation, we can predict values of the response variable for values of the explanatory variable that are not in the data set. These predictions are more trustworthy if you stay within the range of explanatory values that the data set covers. Predicting outside this range is called **extrapolation**, and the farther outside the range you go, the less accurate your predictions will be.
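To make the idea of staying inside the domain concrete, here is a minimal Python sketch. The function name `predict`, its parameters, and the example intercept, slope, and data range are all hypothetical and chosen only for illustration; they do not come from a real data set.

```python
def predict(intercept, slope, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y_hat = intercept + slope * x.

    x_min and x_max are the smallest and largest explanatory values
    in the data that was used to build the line.
    """
    y_hat = intercept + slope * x
    # Any x outside the observed range counts as extrapolation.
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation


# Hypothetical line y_hat = 1.0 + 0.5x built from x-values between 2 and 10.
inside = predict(intercept=1.0, slope=0.5, x=6, x_min=2, x_max=10)
outside = predict(intercept=1.0, slope=0.5, x=30, x_min=2, x_max=10)

print(inside)   # (4.0, False): x = 6 is inside the data range
print(outside)  # (16.0, True): x = 30 is extrapolation
```

The equation happily returns a number for any x, so the flag is the only thing telling us that the second prediction should not be trusted.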

Image courtesy of: statsforstem.org

In a recent model built using data from 19- to 24-year-olds, an individual's comfort level with technology (on a scale of 1-10) can be predicted using the least squares regression line ŷ=0.32x+0.67, where ŷ is the predicted comfort level and x is the individual's age.

Predict the comfort level of a 45-year-old, and explain why this prediction does not make sense.
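One way to work through this, sketched as a short calculation that substitutes x = 45 into the equation given above:

```latex
\hat{y} = 0.32(45) + 0.67 = 14.4 + 0.67 = 15.07
```

A predicted comfort level of 15.07 is impossible on a scale of 1-10. The model was built from 19- to 24-year-olds only, so using it for a 45-year-old is extrapolation far outside the data, and the line has no reason to hold there.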

