5.4 Statistical Inference Using Matrix Approach

Written by the Fiveable Content Team • Last updated August 2025

Variance Estimation with Matrices

Estimating the Error Term Variance

The error term in a linear regression model captures the variability in the response that the predictors don't explain. Estimating its variance is the foundation for everything else in this section: confidence intervals, hypothesis tests, and standard errors all depend on it.

In matrix notation, the estimated variance of the error term is:

$$\hat{\sigma}^2 = \frac{RSS}{n - p}$$

where:

  • $RSS$ is the residual sum of squares
  • $n$ is the number of observations
  • $p$ is the number of parameters in the model (including the intercept)

We divide by $n - p$ rather than $n$ because we've used up $p$ degrees of freedom estimating the coefficients. This correction makes $\hat{\sigma}^2$ an unbiased estimator of the true error variance.
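You can check this correction numerically. The following is a minimal simulation sketch (the design, coefficients, and seed are made-up values for illustration): averaging $RSS/(n-p)$ over many simulated datasets recovers a known error variance, while $RSS/n$ systematically underestimates it.

```python
import numpy as np

# Simulation illustrating the n - p correction (all values hypothetical)
rng = np.random.default_rng(0)
n, p, sigma2 = 50, 2, 4.0                              # true error variance is 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta = np.array([1.0, 2.0])

unbiased, biased = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # beta_hat = (X'X)^{-1} X'y
    rss = np.sum((y - X @ beta_hat) ** 2)
    unbiased.append(rss / (n - p))
    biased.append(rss / n)

print(np.mean(unbiased))   # close to 4.0
print(np.mean(biased))     # close to 3.84 = 4.0 * (n - p) / n
```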

Calculating the Residual Sum of Squares

The residual vector is the difference between observed and fitted values. In matrix form:

$$\hat{e} = y - X\hat{\beta}$$

The residual sum of squares is then:

$$RSS = \hat{e}'\hat{e} = (y - X\hat{\beta})'(y - X\hat{\beta})$$

where $y$ is the response vector, $X$ is the design matrix, and $\hat{\beta} = (X'X)^{-1}X'y$ is the vector of estimated coefficients.

Example: In a simple linear regression with $n = 50$ observations and $p = 2$ parameters (intercept and slope), if $RSS = 100$:

$$\hat{\sigma}^2 = \frac{100}{50 - 2} = \frac{100}{48} \approx 2.08$$

This estimate feeds directly into the standard errors you'll use for confidence intervals and tests below.
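As a sketch of these matrix computations in code, here is how $\hat{\beta}$, the residuals, $RSS$, and $\hat{\sigma}^2$ might be computed with NumPy on simulated data (the data and seed are assumptions for illustration; your RSS will not be exactly 100).

```python
import numpy as np

# Simulated data standing in for a real dataset (illustrative values only)
rng = np.random.default_rng(42)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])  # design matrix
y = X @ np.array([3.0, 1.5]) + rng.normal(scale=1.4, size=n)   # response vector

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # beta_hat = (X'X)^{-1} X'y
e_hat = y - X @ beta_hat                       # residual vector: y - X beta_hat
rss = e_hat @ e_hat                            # RSS = e_hat' e_hat
sigma2_hat = rss / (n - p)                     # sigma_hat^2 = RSS / (n - p)

print(beta_hat, rss, sigma2_hat)
```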

Confidence Intervals for Regression Parameters

Constructing Confidence Intervals

A confidence interval gives you a range of plausible values for a true regression coefficient, based on your data and a chosen confidence level.

The key matrix quantity here is the variance-covariance matrix of the estimated coefficients:

$$\text{Var}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$$

The standard error of a single coefficient $\hat{\beta}_j$ comes from the $j$-th diagonal element of this matrix:

$$SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \cdot [(X'X)^{-1}]_{jj}}$$

To build the confidence interval (a code sketch follows the steps):

  1. Compute $\hat{\beta}_j$ from the least squares solution

  2. Calculate $SE(\hat{\beta}_j)$ using the formula above

  3. Look up the critical value $t_{\alpha/2,\, n-p}$ from the t-distribution with $n - p$ degrees of freedom

  4. Form the interval: $\hat{\beta}_j \pm t_{\alpha/2,\, n-p} \cdot SE(\hat{\beta}_j)$
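Here is a minimal sketch of the four steps in NumPy/SciPy. The simulated data, the seed, and the choice $j = 1$ (the slope) are assumptions for illustration:

```python
import numpy as np
from scipy import stats

# Simulated data (illustrative); j = 1 picks out the slope coefficient
rng = np.random.default_rng(42)
n, p, j, alpha = 50, 2, 1, 0.05
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([3.0, 1.5]) + rng.normal(scale=1.4, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)               # step 1
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)
se_j = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[j, j])  # step 2
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)              # step 3: 95% level
ci = (beta_hat[j] - t_crit * se_j,
      beta_hat[j] + t_crit * se_j)                         # step 4
print(ci)
```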

Interpreting Confidence Intervals

A $(1 - \alpha) \times 100\%$ confidence interval means that if you repeated the sampling process many times, roughly $(1 - \alpha) \times 100\%$ of the constructed intervals would contain the true parameter value. It does not mean there's a $(1 - \alpha)$ probability that this particular interval contains the true value; the true value is fixed, and the interval is random.

Example: A 95% confidence interval for a slope parameter of $(0.5,\; 1.2)$ tells you that, based on the data, values between 0.5 and 1.2 are plausible for the true slope. Since the interval doesn't include 0, you also have evidence that this predictor has a nonzero effect.
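The repeated-sampling interpretation can be demonstrated with a quick Monte Carlo sketch (the true slope, seed, and sample sizes are made-up values): across many samples from the same model, about 95% of the 95% intervals should cover the true slope.

```python
import numpy as np
from scipy import stats

# Repeated-sampling check of the coverage interpretation (illustrative setup)
rng = np.random.default_rng(1)
n, p, true_slope, reps = 50, 2, 2.0, 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design
t_crit = stats.t.ppf(0.975, df=n - p)
XtX_inv_11 = np.linalg.inv(X.T @ X)[1, 1]

covered = 0
for _ in range(reps):
    y = 1.0 + true_slope * X[:, 1] + rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)
    se = np.sqrt(sigma2_hat * XtX_inv_11)
    covered += abs(beta_hat[1] - true_slope) <= t_crit * se

print(covered / reps)   # close to 0.95
```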

Hypothesis Testing with Matrices

Conducting Hypothesis Tests

Hypothesis tests let you assess whether a particular regression coefficient is significantly different from zero (or some other hypothesized value). The standard setup:

  • Null hypothesis: $H_0: \beta_j = 0$ (the predictor has no linear effect)
  • Alternative hypothesis: $H_a: \beta_j \neq 0$ (two-sided), or $H_a: \beta_j > 0$ / $H_a: \beta_j < 0$ (one-sided)

Steps for the t-test of a single coefficient (a code sketch follows the list):

  1. Compute the estimated coefficient $\hat{\beta}_j$
  2. Compute its standard error $SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \cdot [(X'X)^{-1}]_{jj}}$
  3. Calculate the test statistic: $t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$
  4. Compare to the t-distribution with $n - p$ degrees of freedom to obtain a p-value
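As a sketch, the same four steps in code on simulated data (the true slope of 0.8 and the seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# t-test of H0: beta_1 = 0 on simulated data (illustrative values)
rng = np.random.default_rng(7)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 0.8 * X[:, 1] + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)             # step 1
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)
se = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])  # step 2
t_stat = beta_hat[1] / se                                # step 3
p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)          # step 4, two-sided
print(t_stat, p_value)
```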

Evaluating Hypothesis Test Results

Under $H_0$, the test statistic follows a $t_{n-p}$ distribution. You reject $H_0$ when the p-value falls below your chosen significance level $\alpha$.

Example: Suppose $\hat{\beta}_j = 0.8$ and $SE(\hat{\beta}_j) = 0.2$. Then:

$$t = \frac{0.8}{0.2} = 4.0$$

With, say, 48 degrees of freedom, a $t$-value of 4.0 gives a two-sided p-value well below 0.05. You'd reject $H_0$ and conclude this coefficient is statistically significant.
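For a quick check of that claim, the two-sided p-value can be computed directly:

```python
from scipy import stats

# Two-sided p-value for t = 4.0 with 48 degrees of freedom
p_value = 2 * stats.t.sf(4.0, df=48)
print(p_value)   # roughly 2e-4, far below 0.05
```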

Notice the connection to confidence intervals: rejecting $H_0: \beta_j = 0$ at level $\alpha$ is equivalent to 0 falling outside the $(1 - \alpha)$ confidence interval for $\beta_j$. These are two views of the same inference.

Interpreting Matrix-Based Inference

Understanding Regression Coefficients

Each estimated coefficient $\hat{\beta}_j$ represents the expected change in the response variable for a one-unit increase in predictor $x_j$, holding all other predictors constant. That "holding constant" part is critical in multiple regression because the matrix approach simultaneously accounts for all predictors.

Example: In a model predicting house prices, if the coefficient for square footage is 50, each additional square foot is associated with a $50 increase in price, assuming the other predictors (number of bedrooms, location, etc.) stay the same.

Assessing Model Fit and Precision

Confidence interval width tells you about estimation precision. Narrow intervals mean your data provide a tight estimate of the true coefficient; wide intervals signal more uncertainty, often due to small sample size, high collinearity, or large error variance.

For overall model performance, the coefficient of determination summarizes how much variability the predictors explain:

$$R^2 = 1 - \frac{RSS}{TSS}$$

where $TSS = (y - \bar{y})'(y - \bar{y})$ is the total sum of squares. An $R^2$ of 0.85 means the predictors account for 85% of the variation in the response.

In multiple regression, adjusted $R^2$ is generally preferred because it penalizes for adding predictors that don't meaningfully improve the fit:

$$R^2_{adj} = 1 - \frac{RSS/(n-p)}{TSS/(n-1)}$$
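Both quantities fall out of the matrix pieces already defined. A short sketch on simulated, illustrative data:

```python
import numpy as np

# R^2 and adjusted R^2 from RSS and TSS (simulated, illustrative data)
rng = np.random.default_rng(3)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2.0 + 1.0 * X[:, 1] + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ beta_hat) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)            # TSS = (y - ybar)'(y - ybar)

r2 = 1 - rss / tss
r2_adj = 1 - (rss / (n - p)) / (tss / (n - 1))
print(r2, r2_adj)
```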

The matrix formulation ties all of this together compactly. The same $(X'X)^{-1}$ matrix that gives you $\hat{\beta}$ also gives you the standard errors, which in turn give you confidence intervals and test statistics. That's the real payoff of the matrix approach: one coherent framework for estimation and inference.