Linear models are the foundation of regression analysis. They give you a structured way to describe how one or more input variables relate to an outcome, and they show up constantly across statistics, economics, engineering, and the sciences. This section covers the core components, the general form of the equation, and how to interpret what each piece tells you.
Linear model basics
Key components of linear models
A linear model describes the relationship between one or more independent variables and a dependent variable using a mathematical equation. Every linear model is built from the same four components:
- Dependent variable (also called the response variable): the outcome you're trying to predict or explain.
- Independent variables (also called predictor or explanatory variables): the factors you believe influence the dependent variable.
- Coefficients (parameters): numbers that quantify how much the dependent variable changes when a given independent variable changes by one unit, holding everything else constant.
- Error term (ε): captures all the variability in the dependent variable that the model doesn't explain. No model is perfect, and ε is where that imperfection lives.
Assumptions and general form
Linear models assume a linear relationship between the predictors and the response. "Linear" here means that a one-unit change in any independent variable produces a constant, proportional change in the dependent variable (scaled by that variable's coefficient).
The general form is:

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε

- y is the dependent variable
- x₁, x₂, …, xₖ are the independent variables
- β₀, β₁, …, βₖ are the coefficients
- ε is the error term
Notice that the equation is a weighted sum of the predictors plus a constant. That additive structure is what makes it "linear in the parameters."
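That additive structure can be sketched directly as code. The function and numbers below are illustrative, not from the text; the point is that the prediction is just an intercept plus a weighted sum, and that a one-unit change in any predictor always shifts the prediction by exactly that predictor's coefficient.

```python
# A minimal sketch of the general form with made-up coefficients:
# the prediction is an intercept plus a weighted sum of the predictors.

def predict(intercept, coefs, xs):
    """Return b0 + b1*x1 + ... + bk*xk for one observation."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

# Intercept 2.0; coefficients 0.5 and -1.0 for two predictors.
y_hat = predict(2.0, [0.5, -1.0], [4.0, 1.0])  # 2.0 + 0.5*4.0 - 1.0*1.0 = 3.0

# "Linear" in action: raising the first predictor by one unit changes the
# prediction by exactly its coefficient (0.5), regardless of starting value.
shift = predict(2.0, [0.5, -1.0], [5.0, 1.0]) - y_hat
```

This is also why the model is "linear in the parameters": each βᵢ enters the equation only as a multiplier, never inside a square, product, or other nonlinearity.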
Applications of linear models

Uses in various fields
Linear models show up wherever researchers need to quantify relationships between variables:
- Economics: modeling how price affects quantity demanded, or how GDP growth relates to unemployment rates.
- Finance: estimating how market indicators predict stock returns, or measuring portfolio risk.
- Social sciences: studying how education level and income predict health outcomes, or how demographic factors relate to voter turnout.
- Engineering and natural sciences: relating physical properties like temperature and pressure to system performance or product quality.
Benefits and importance
Linear models provide a framework for hypothesis testing, prediction, and decision-making. They let you identify which predictors matter, quantify how strong each relationship is, and generate predictions from observed data.
Their biggest practical advantage is interpretability. Unlike more complex modeling approaches, every coefficient in a linear model has a direct, readable meaning. That makes it straightforward to communicate results to non-technical audiences and to use those results for real decisions.
Dependent vs independent variables

Defining dependent and independent variables
The distinction between dependent and independent variables comes from the role each plays in your analysis. The dependent variable is the outcome you care about. The independent variables are the factors you think drive or explain that outcome.
Which variable is "dependent" and which is "independent" isn't a property of the data itself. It depends on the research question you're asking. For example, income could be a dependent variable in a study of how education affects earnings, or an independent variable in a study of how earnings affect health.
Relationship and representation
With a single predictor, you get simple linear regression:

y = β₀ + β₁x + ε

Here x is the lone independent variable and y is the dependent variable. This is the model you'll work with most in this unit.
When you add more predictors, you move to multiple linear regression:

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε

The logic is the same; you just have more terms contributing to the prediction of y.
Coefficient interpretation
Meaning and interpretation
Each coefficient tells you something specific:
- Intercept (β₀): the predicted value of y when every independent variable equals zero. Think of it as the baseline or starting point of the model. (In many real-world problems, all-zeros may not be a meaningful scenario, so interpret with caution.)
- Slope coefficients (β₁ through βₖ): each one tells you how much y changes for a one-unit increase in that predictor, holding all other predictors constant.
The sign of a coefficient tells you the direction:
- A positive coefficient means y increases as that predictor increases.
- A negative coefficient means y decreases as that predictor increases.
For a concrete example: if you're predicting monthly rent (y) from apartment size in square feet (x), and β₁ = 1.5, that means each additional square foot is associated with a $1.50 increase in rent, all else equal.
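The rent example can be made concrete in a few lines. Only the $1.50 slope comes from the text; the $500 base rent here is an assumed number for illustration.

```python
# Rent prediction sketch: slope 1.50 per square foot (from the example),
# intercept 500 (an assumed baseline, purely illustrative).

def predicted_rent(sqft, intercept=500.0, per_sqft=1.50):
    """Predicted monthly rent for an apartment of the given size."""
    return intercept + per_sqft * sqft

rent_800 = predicted_rent(800)  # 500 + 1.50 * 800 = 1700.0
rent_801 = predicted_rent(801)  # one more square foot

# The difference between the two predictions is exactly the slope, $1.50:
difference = rent_801 - rent_800
```

Notice that the intercept here (rent for a zero-square-foot apartment) is exactly the kind of "all-zeros" baseline that shouldn't be interpreted literally.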
Estimation and importance
The magnitude of a coefficient reflects the strength of that predictor's influence on y. Larger absolute values mean a stronger effect per unit change.
Coefficients are typically estimated using ordinary least squares (OLS), which finds the values of β₀, β₁, …, βₖ that minimize the sum of squared differences between observed and predicted values of y.
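For the simple (one-predictor) case, OLS has a well-known closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept is chosen so the fitted line passes through the point of means. A minimal sketch, with toy data invented for illustration:

```python
# OLS for simple linear regression via the closed-form solution:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).

def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data generated exactly from y = 1 + 2x, so OLS recovers those values.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = ols_fit(xs, ys)  # b0 = 1.0, b1 = 2.0
```

With more than one predictor, the same minimization is solved with linear algebra rather than this scalar formula, but the criterion (smallest sum of squared residuals) is identical.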
One caution: you can't directly compare the magnitudes of coefficients when predictors are measured on different scales. A coefficient of 500 for a variable measured in kilometers isn't necessarily "more important" than a coefficient of 0.3 for a variable measured in millions of dollars. Standardized coefficients (sometimes called beta coefficients) rescale everything to a common metric so you can compare relative importance across predictors.
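One common way to standardize a slope is to rescale it by the ratio of the predictor's standard deviation to the response's standard deviation, which expresses the effect in standard-deviation units rather than raw units. A sketch under that convention, with invented data:

```python
# Standardized (beta) coefficient sketch: beta_std = b * sd(x) / sd(y),
# i.e. standard deviations of y per standard deviation of x.

def sd(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

def standardized_slope(raw_slope, xs, ys):
    """Rescale a raw slope into standard-deviation units of x and y."""
    return raw_slope * sd(xs) / sd(ys)

# Toy data where y = 2x exactly: the raw slope is 2.0, but the
# standardized slope is 1.0 (a perfect one-sd-per-one-sd relationship).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
b_std = standardized_slope(2.0, xs, ys)
```

After this rescaling, a kilometer-scale predictor and a millions-of-dollars predictor are on the same footing, so their standardized coefficients can be compared directly.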