🥖Linear Modeling Theory Unit 4 Review

4.4 Transformations and Weighted Least Squares

Written by the Fiveable Content Team • Last updated August 2025

Data Transformations for Linear Regression

Purpose and Application of Data Transformations

Data transformations apply mathematical operations to your variables so the data better satisfies the assumptions of linear regression. When residual plots reveal non-normality, heteroscedasticity, or a nonlinear mean function, a well-chosen transformation can often fix the problem without abandoning the linear modeling framework.

Common transformations include logarithmic, square root, reciprocal, and power transformations. You can apply them to the response variable Y, the predictor X, or both, depending on which assumption is violated and how the data behaves.

A transformation can do one or more of the following:

  • Stabilize the variance of residuals (address heteroscedasticity)
  • Linearize a curved relationship between X and Y
  • Pull in long tails to make residuals more normally distributed

The choice of transformation depends on the nature of the data, the pattern you see in diagnostic plots, and how you want to interpret the resulting model coefficients. Keep in mind that once you transform a variable, your regression coefficients describe the relationship on the transformed scale, so interpretation requires extra care.

Types and Selection of Data Transformations

  • Logarithmic (log(Y) or log(X)): Best for positively skewed data or when variance increases with the mean. Very common for financial data, biological measurements, and other strictly positive quantities.
  • Square root (√Y): Often used for count data, where the variance tends to be proportional to the mean.
  • Reciprocal (1/Y): Useful for inverse relationships between X and Y, or when variance decreases as Y increases.
  • Box-Cox power transformation (Y^λ): A family of transformations indexed by λ. The procedure estimates the value of λ that best achieves normality and constant variance simultaneously. Special cases include λ = 1 (no transformation), λ = 0.5 (square root), λ = 0 (log, by convention), and λ = −1 (reciprocal).

To choose among these, use exploratory tools: residual-vs-fitted plots reveal variance patterns, Q-Q plots show departures from normality, and scatterplots of Y against X expose nonlinearity. For a more systematic approach, the Box-Cox procedure profiles the likelihood over λ and provides a confidence interval for the best power.
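The Box-Cox procedure described above is available in `scipy.stats.boxcox`, which profiles the log-likelihood over λ and returns the maximizing value. A minimal sketch on simulated right-skewed data (the lognormal sample here is hypothetical, chosen because its best power should sit near λ = 0, the log case):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical strictly positive, right-skewed response
y = rng.lognormal(mean=1.0, sigma=0.6, size=200)

# boxcox profiles the log-likelihood over lambda and returns
# the transformed data plus the maximizing lambda
y_trans, lam = stats.boxcox(y)
print(f"estimated lambda: {lam:.2f}")

# The transformation should pull in the long right tail
print(f"skewness before: {stats.skew(y):.2f}, after: {stats.skew(y_trans):.2f}")
```

For lognormal data the estimated λ lands near 0, matching the convention that λ = 0 corresponds to the log transformation.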

Addressing Non-normality and Heteroscedasticity

Identifying and Addressing Non-normality

Non-normality means the residuals deviate from a normal distribution. This matters because confidence intervals, prediction intervals, and hypothesis tests all rely on the normality assumption. With non-normal residuals, your p-values and intervals can be misleading.

How to detect it:

  • Q-Q plot: Plot residuals against theoretical normal quantiles. Systematic curvature or heavy tails indicate non-normality.
  • Histogram of residuals: Look for strong skewness or multiple modes.
  • Formal tests: The Shapiro-Wilk test or Kolmogorov-Smirnov test provides a p-value, though with large samples these tests can flag trivially small departures.

How to fix it:

Apply a transformation to Y. If the residual distribution is right-skewed, a log or square root transformation often helps. The Box-Cox procedure can guide you to the best power. After transforming, re-examine the Q-Q plot to confirm improvement.
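The detect-then-fix loop can be sketched with `scipy.stats.shapiro` on simulated data. The data-generating process below is hypothetical: a multiplicative-error model, so residuals are skewed on the raw scale but normal after a log transform of Y.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, n)
# Hypothetical model with multiplicative (right-skewed) errors
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.4, n))

def ols_residuals(x, y):
    """Simple-regression residuals via least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Shapiro-Wilk on raw-scale residuals: a tiny p-value flags non-normality
_, p_raw = stats.shapiro(ols_residuals(x, y))
# After transforming Y, the residuals are much closer to normal
_, p_log = stats.shapiro(ols_residuals(x, np.log(y)))
print(f"p (raw Y): {p_raw:.3g}, p (log Y): {p_log:.3g}")
```

As the section notes, with n = 300 the test has plenty of power, so judge the size of the departure from the Q-Q plot rather than the p-value alone.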

Identifying and Addressing Heteroscedasticity

Heteroscedasticity means the spread of residuals changes across the range of fitted values. For example, residuals might fan out as Ŷ increases. This violates the constant-variance assumption, making OLS standard errors unreliable.

How to detect it:

  • Residuals vs. fitted values plot: Look for a funnel shape or any systematic change in spread.
  • Formal tests: The Breusch-Pagan test regresses squared residuals on the predictors; a significant result indicates heteroscedasticity. The White test is a more general alternative.

How to fix it:

  • If variance increases with the mean, try log(Y).
  • If variance is proportional to the mean (common with counts), try √Y.
  • You can also transform X, or both X and Y.
  • If transformations distort the model's interpretability or don't fully resolve the issue, weighted least squares (below) is the natural alternative.

Weighted Least Squares Regression

Concept and Rationale

Weighted least squares (WLS) is a modification of OLS designed specifically for heteroscedastic data. Instead of treating every observation equally, WLS assigns a weight to each observation that reflects how precise it is.

The core idea: observations with smaller variance carry more information, so they should count more in the fit. Observations with larger variance are noisier, so they should count less.

Formally, OLS minimizes Σᵢ₌₁ⁿ eᵢ², while WLS minimizes:

Σᵢ₌₁ⁿ wᵢ eᵢ²

where wᵢ is the weight for observation i and eᵢ = Yᵢ − Ŷᵢ is the residual. The standard choice is to set weights inversely proportional to the variance of each observation:

wᵢ = 1 / Var(εᵢ)

This gives high-precision points large weights and low-precision points small weights. When the weights are correctly specified, WLS produces estimates that are unbiased and more efficient (lower variance) than OLS under heteroscedasticity. It can also reduce the influence of outliers that happen to fall in high-variance regions.

Implementation of Weighted Least Squares Regression

Fitting a WLS model involves several steps:

  1. Fit an initial OLS regression and examine residual plots or run a Breusch-Pagan/White test to confirm heteroscedasticity.

  2. Identify the variance structure. Determine how the residual variance relates to the predictors or fitted values. For example, you might find that Var(εᵢ) ∝ Xᵢ or Var(εᵢ) ∝ Ŷᵢ².

  3. Define the weights. Set wᵢ as the inverse of the estimated variance function:

    • If variance is proportional to Xᵢ, use wᵢ = 1/Xᵢ.
    • If variance is proportional to Ŷᵢ², use wᵢ = 1/Ŷᵢ² (using fitted values from the initial OLS).
  4. Fit the weighted regression. Most statistical software (R, Python, SAS) has built-in WLS options where you supply the weight vector directly. Conceptually, the procedure multiplies each observation's Yᵢ, Xᵢ, and residual by √wᵢ and then runs OLS on the transformed data.

  5. Check diagnostics on the weighted residuals. Plot weighted residuals vs. fitted values to verify that the variance is now approximately constant. Also check normality with a Q-Q plot.

  6. Interpret coefficients carefully. The regression coefficients from WLS are on the original scale of Y and X (unlike transformation approaches), which can make interpretation more straightforward. However, if you combined WLS with a transformation, remember that coefficients reflect the transformed scale.

Weighted Least Squares vs. Ordinary Least Squares

Assumptions and Limitations of Ordinary Least Squares

OLS relies on four key assumptions: linearity, independence of observations, normality of residuals, and homoscedasticity (constant variance). When homoscedasticity is violated:

  • The coefficient estimates themselves remain unbiased, but they are no longer the most efficient (minimum-variance) estimates available.
  • The standard errors computed by OLS are wrong, which means confidence intervals have incorrect coverage and hypothesis tests have incorrect Type I error rates.
  • OLS gives equal weight to every observation, so noisy data points pull the fitted line just as much as precise ones.
  • Outliers in high-variance regions can have a disproportionate effect on the fit.
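The efficiency loss described above can be seen in a short Monte Carlo sketch using only numpy. The setup is hypothetical (Var(εᵢ) ∝ Xᵢ with the weights treated as known); WLS is implemented by rescaling each row by √wᵢ and running OLS, as described earlier:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
w = 1.0 / x  # correct weights for Var(eps_i) proportional to x_i
Xw = X * np.sqrt(w)[:, None]

ols_slopes, wls_slopes = [], []
for _ in range(reps):
    y = 2.0 + 1.5 * x + rng.normal(0, np.sqrt(x), n)  # true slope 1.5
    ols_slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    # WLS via sqrt-weight rescaling, then ordinary least squares
    wls_slopes.append(np.linalg.lstsq(Xw, y * np.sqrt(w), rcond=None)[0][1])

# Both estimators center on the true slope, but WLS varies less
print(f"mean OLS slope: {np.mean(ols_slopes):.3f}")
print(f"sd OLS: {np.std(ols_slopes):.4f}, sd WLS: {np.std(wls_slopes):.4f}")
```

This illustrates both bullets at once: the OLS slope is still centered on the truth (unbiased), while its sampling spread exceeds that of WLS (inefficiency).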

Advantages of Weighted Least Squares over Ordinary Least Squares

WLS directly addresses heteroscedasticity by incorporating variance information into the estimation. Here's what you gain:

  • More efficient estimates. By down-weighting noisy observations and up-weighting precise ones, WLS achieves lower variance for the estimated coefficients than OLS.
  • Valid inference. Standard errors, confidence intervals, and hypothesis tests from WLS are trustworthy when the weight function is correctly specified, whereas OLS inference is distorted under heteroscedasticity.
  • Narrower confidence intervals and more powerful tests compared to OLS, because the estimator makes better use of the available information.
  • Reduced outlier influence. Observations in high-variance regions naturally receive smaller weights, limiting their pull on the regression line.
  • Flexibility. WLS lets you incorporate external knowledge about measurement precision. For instance, if some observations are averages of many measurements and others are single readings, you can weight accordingly.

The main caveat is that WLS requires you to correctly specify the weight function. If the assumed variance structure is wrong, WLS can actually perform worse than OLS. In practice, you often estimate the variance function from the data, which introduces some uncertainty. Always check the weighted residual plots after fitting to confirm the weights are doing their job.