Variance Estimation with Matrices
Estimating the Error Term Variance
The error term in a linear regression model captures the variability in the response that the predictors don't explain. Estimating its variance is the foundation for everything else in this section: confidence intervals, hypothesis tests, and standard errors all depend on it.
In matrix notation, the estimated variance of the error term is:

$$\hat{\sigma}^2 = \frac{SSE}{n - p}$$

where:
- $SSE$ is the residual sum of squares
- $n$ is the number of observations
- $p$ is the number of parameters in the model (including the intercept)

We divide by $n - p$ rather than $n$ because we've used up $p$ degrees of freedom estimating the coefficients. This correction makes $\hat{\sigma}^2$ an unbiased estimator of the true error variance $\sigma^2$.
Calculating the Residual Sum of Squares
The residual vector is the difference between observed and fitted values. In matrix form:

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$$

The residual sum of squares is then:

$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$

where $\mathbf{y}$ is the response vector, $\mathbf{X}$ is the design matrix, and $\hat{\boldsymbol{\beta}}$ is the vector of estimated coefficients.
Example: In a simple linear regression with $n$ observations and $p = 2$ parameters (intercept and slope), the estimate is $\hat{\sigma}^2 = SSE / (n - 2)$.
This estimate feeds directly into the standard errors you'll use for confidence intervals and tests below.
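As a sketch, the calculation above can be carried out with NumPy. The five-point dataset here is hypothetical, made up purely for illustration:

```python
import numpy as np

# Hypothetical data: n = 5 observations, simple linear regression (p = 2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.9, 6.1, 8.0, 9.9])
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column

# Least squares solution: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual vector e = y - X beta_hat, then SSE = e'e
e = y - X @ beta_hat
sse = e @ e

# Unbiased error-variance estimate: SSE / (n - p)
n, p = X.shape
sigma2_hat = sse / (n - p)
print(sigma2_hat)
```

Note that `np.linalg.solve` is used instead of explicitly inverting $\mathbf{X}'\mathbf{X}$, which is the numerically safer way to obtain $\hat{\boldsymbol{\beta}}$.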
Confidence Intervals for Regression Parameters

Constructing Confidence Intervals
A confidence interval gives you a range of plausible values for a true regression coefficient, based on your data and a chosen confidence level.
The key matrix quantity here is the variance-covariance matrix of the estimated coefficients:

$$\widehat{\operatorname{Var}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (\mathbf{X}'\mathbf{X})^{-1}$$

The standard error of a single coefficient $\hat{\beta}_j$ comes from the $j$-th diagonal element of this matrix:

$$SE(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{jj}}$$
To build the confidence interval:

- Compute $\hat{\beta}_j$ from the least squares solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$
- Calculate $SE(\hat{\beta}_j)$ using the formula above
- Look up the critical value $t_{\alpha/2,\,n-p}$ from the t-distribution with $n - p$ degrees of freedom
- Form the interval: $\hat{\beta}_j \pm t_{\alpha/2,\,n-p} \cdot SE(\hat{\beta}_j)$
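These steps can be sketched in NumPy, with SciPy supplying the t critical value. The five-point dataset is hypothetical, chosen only to make the arithmetic concrete:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 5 observations, simple linear regression (p = 2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.9, 6.1, 8.0, 9.9])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Step 1: least squares coefficients
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Step 2: standard errors from the diagonal of sigma2_hat * (X'X)^{-1}
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - p)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))

# Step 3: critical value t_{alpha/2, n-p} for a 95% interval
t_crit = stats.t.ppf(0.975, df=n - p)

# Step 4: interval endpoints for each coefficient
lower = beta_hat - t_crit * se
upper = beta_hat + t_crit * se
print(lower[1], upper[1])  # interval for the slope
```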
Interpreting Confidence Intervals
A $(1 - \alpha) \times 100\%$ confidence interval means that if you repeated the sampling process many times, roughly $(1 - \alpha) \times 100\%$ of the constructed intervals would contain the true parameter value. It does not mean there's a $(1 - \alpha)$ probability that this particular interval contains the true value; the true value is fixed, and the interval is random.
Example: A 95% confidence interval for a slope parameter of $(0.5, 1.2)$ tells you that, based on the data, values between 0.5 and 1.2 are plausible for the true slope. Since the interval doesn't include 0, you also have evidence that this predictor has a nonzero effect.
Hypothesis Testing with Matrices

Conducting Hypothesis Tests
Hypothesis tests let you assess whether a particular regression coefficient is significantly different from zero (or some other hypothesized value). The standard setup:
- Null hypothesis: $H_0: \beta_j = 0$ (the predictor has no linear effect)
- Alternative hypothesis: $H_a: \beta_j \neq 0$ (two-sided), or $H_a: \beta_j > 0$ / $H_a: \beta_j < 0$ (one-sided)
Steps for the t-test of a single coefficient:
- Compute the estimated coefficient $\hat{\beta}_j$
- Compute its standard error $SE(\hat{\beta}_j)$
- Calculate the test statistic: $t = \dfrac{\hat{\beta}_j}{SE(\hat{\beta}_j)}$
- Compare $t$ to the t-distribution with $n - p$ degrees of freedom to obtain a p-value
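A sketch of these steps in NumPy/SciPy, again using a hypothetical five-point dataset made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 5 observations, simple linear regression (p = 2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.9, 6.1, 8.0, 9.9])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Coefficients and their standard errors
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - p)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

# t statistic for H0: beta_j = 0, and the two-sided p-value from t_{n-p}
t_stat = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stat), df=n - p)
print(t_stat, p_values)
```

For this data the slope's p-value is far below 0.05 (the fitted line tracks the points closely) while the intercept's is not, so only the slope would be called significant.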
Evaluating Hypothesis Test Results
Under $H_0$, the test statistic follows a $t_{n-p}$ distribution. You reject $H_0$ when the p-value falls below your chosen significance level $\alpha$.
Example: Suppose the estimated coefficient and its standard error yield a test statistic of $t = \hat{\beta}_j / SE(\hat{\beta}_j) = 4.0$. With, say, 48 degrees of freedom, a $t$-value of 4.0 gives a two-sided p-value well below 0.05. You'd reject $H_0$ and conclude this coefficient is statistically significant.
Notice the connection to confidence intervals: rejecting $H_0: \beta_j = 0$ at level $\alpha$ is equivalent to 0 falling outside the $(1 - \alpha) \times 100\%$ confidence interval for $\beta_j$. These are two views of the same inference.
Interpreting Matrix-Based Inference
Understanding Regression Coefficients
Each estimated coefficient $\hat{\beta}_j$ represents the expected change in the response variable for a one-unit increase in predictor $x_j$, holding all other predictors constant. That "holding constant" part is critical in multiple regression because the matrix approach simultaneously accounts for all predictors.
Example: In a model predicting house prices, if the coefficient for square footage is 50, each additional square foot is associated with a $50 increase in price, assuming the other predictors (number of bedrooms, location, etc.) stay the same.
Assessing Model Fit and Precision
Confidence interval width tells you about estimation precision. Narrow intervals mean your data provide a tight estimate of the true coefficient; wide intervals signal more uncertainty, often due to small sample size, high collinearity, or large error variance.
For overall model performance, the coefficient of determination $R^2$ summarizes how much variability the predictors explain:

$$R^2 = 1 - \frac{SSE}{SST}$$

where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares. An $R^2$ of 0.85 means the predictors account for 85% of the variation in the response.

In multiple regression, adjusted $R^2$ is generally preferred because it penalizes for adding predictors that don't meaningfully improve the fit:

$$R^2_{\text{adj}} = 1 - \frac{SSE / (n - p)}{SST / (n - 1)}$$
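Both measures can be computed directly from the sums of squares; the sketch below uses a hypothetical five-point dataset made up for illustration:

```python
import numpy as np

# Hypothetical data: n = 5 observations, simple linear regression (p = 2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.9, 6.1, 8.0, 9.9])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sse = np.sum((y - X @ beta_hat) ** 2)   # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)       # total sum of squares

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
print(r2, r2_adj)
```

As expected, the adjusted value is slightly smaller than the plain $R^2$, since it charges a penalty for each estimated parameter.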
The matrix formulation ties all of this together compactly. The same $(\mathbf{X}'\mathbf{X})^{-1}$ matrix that gives you $\hat{\boldsymbol{\beta}}$ also gives you the standard errors, which in turn give you confidence intervals and test statistics. That's the real payoff of the matrix approach: one coherent framework for estimation and inference.