🥖Linear Modeling Theory Unit 5 Review

5.1 Matrix Notation and Operations

Written by the Fiveable Content Team • Last updated August 2025

Matrix notation and operations provide a compact, powerful way to represent and manipulate the data structures that underlie linear regression. Instead of writing out individual equations for every observation, you can express an entire regression model in a single matrix equation. This unit builds the foundation you'll need to derive estimators, compute predictions, and analyze model properties using matrix algebra.

Matrix basics and notation

Matrix fundamentals

A matrix is a rectangular array of numbers arranged in rows and columns, enclosed by square brackets. You denote a matrix as $A = [a_{ij}]$, where $a_{ij}$ represents the element sitting in the $i$-th row and $j$-th column.

The size (or dimension) of a matrix is written as $m \times n$, where $m$ is the number of rows and $n$ is the number of columns. A $3 \times 4$ matrix, for example, has 3 rows and 4 columns, giving you 12 elements total.

Each element is identified by its row and column indices. So $a_{23}$ refers to the element in the 2nd row and 3rd column of matrix $A$.
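As a quick check of the dimension and indexing conventions above, here is a short sketch in NumPy (not part of the original guide; note that NumPy indexing is 0-based, so the math subscript $a_{23}$ becomes `A[1, 2]` in code):

```python
import numpy as np

# A 3x4 matrix: 3 rows, 4 columns, 12 elements total
A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

print(A.shape)   # (3, 4)
# a_23, the element in the 2nd row and 3rd column (0-based: row 1, column 2)
print(A[1, 2])   # 7
```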

Vectors

A vector is a matrix with only one row or only one column:

  • A row vector has dimensions $1 \times n$: $\vec{v} = (v_1, v_2, \ldots, v_n)$
  • A column vector has dimensions $m \times 1$: $\vec{v} = [v_1, v_2, \ldots, v_m]^T$

In regression, you'll encounter column vectors constantly. The response variable $\mathbf{y}$ is stored as an $n \times 1$ column vector, and each estimated coefficient lives inside a parameter vector $\boldsymbol{\beta}$.
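The row/column distinction can be made explicit in NumPy by keeping both dimensions in the shape (a sketch, not from the original guide):

```python
import numpy as np

row = np.array([[1, 2, 3]])       # 1x3 row vector
col = np.array([[1], [2], [3]])   # 3x1 column vector

print(row.shape)   # (1, 3)
print(col.shape)   # (3, 1)
# Transposing a row vector yields the corresponding column vector
print(np.array_equal(row.T, col))   # True
```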

Applications of matrices and vectors

Matrices and vectors let you represent systems of linear equations in a single compact expression. For example, the system

$$2x + 3y = 5$$
$$4x - y = 3$$

can be written as:

$$\begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$$

This is exactly the form $A\mathbf{x} = \mathbf{b}$ that you'll see again when the regression model is written as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. The coefficient matrix holds your predictor data, the vector $\boldsymbol{\beta}$ holds the unknowns, and $\mathbf{y}$ holds the responses.
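The system above can be solved numerically in one call; a minimal sketch using NumPy's built-in solver:

```python
import numpy as np

# Coefficient matrix A and constant vector b for:
#   2x + 3y = 5
#   4x -  y = 3
A = np.array([[2.0,  3.0],
              [4.0, -1.0]])
b = np.array([5.0, 3.0])

x = np.linalg.solve(A, b)   # solves Ax = b
print(x)                    # [1. 1.]  ->  x = 1, y = 1
```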

Matrix operations


Addition, subtraction, and scalar multiplication

Matrix addition and subtraction require the matrices to be the same size. You simply add or subtract corresponding elements:

$$c_{ij} = a_{ij} + b_{ij} \quad \text{for all } i \text{ and } j$$

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

Scalar multiplication means multiplying every element by a single number:

$$2 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}$$
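Both element-wise operations map directly onto NumPy's `+` and `*` operators (a sketch, not from the original guide):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)   # [[ 6  8]
               #  [10 12]]
print(2 * A)   # [[2 4]
               #  [6 8]]
```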

Matrix multiplication

Matrix multiplication between $A$ ($m \times n$) and $B$ ($n \times p$) is only defined when the number of columns in $A$ equals the number of rows in $B$. The result $C$ has dimensions $m \times p$.

To compute each element $c_{ij}$, take the dot product of the $i$-th row of $A$ with the $j$-th column of $B$:

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$$

Here's a worked example, step by step:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$$

  1. Row 1 of $A$ dotted with Column 1 of $B$: $(1)(5) + (2)(7) = 19$
  2. Row 1 of $A$ dotted with Column 2 of $B$: $(1)(6) + (2)(8) = 22$
  3. Row 2 of $A$ dotted with Column 1 of $B$: $(3)(5) + (4)(7) = 43$
  4. Row 2 of $A$ dotted with Column 2 of $B$: $(3)(6) + (4)(8) = 50$

$$C = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$
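The worked example above can be verified with NumPy's `@` operator (a sketch, not part of the original guide; note that `*` would be element-wise, not matrix, multiplication):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B   # matrix product: rows of A dotted with columns of B
print(C)    # [[19 22]
            #  [43 50]]
```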

Solving linear systems

Augmented matrix and Gaussian elimination

A system of linear equations can be represented as an augmented matrix, which appends the constant terms to the right of the coefficient matrix:

$$\left[\begin{array}{cc|c} 2 & 3 & 5 \\ 4 & -1 & 3 \end{array}\right]$$

Gaussian elimination solves the system by applying elementary row operations to transform this augmented matrix into row echelon form (upper triangular, with zeros below each pivot). The three permitted row operations are:

  1. Swap two rows
  2. Multiply a row by a non-zero constant
  3. Add a multiple of one row to another row

These operations never change the solution set. Once you reach row echelon form, use back-substitution to solve for each variable, starting from the last row and working upward.
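The forward-elimination and back-substitution steps above can be sketched in a few lines of NumPy. This is a teaching sketch that assumes non-zero pivots and skips the row swaps a robust implementation would use:

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve Ax = b by Gaussian elimination with back-substitution.
    Sketch only: assumes square A with non-zero pivots (no pivoting)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: add multiples of each pivot row to the rows
    # below it, zeroing out every entry under the pivot
    for i in range(n):
        for r in range(i + 1, n):
            factor = A[r, i] / A[i, i]
            A[r, i:] -= factor * A[i, i:]
            b[r] -= factor * b[i]
    # Back-substitution: solve from the last row upward
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

print(gaussian_solve(np.array([[2, 3], [4, -1]]), np.array([5, 3])))  # [1. 1.]
```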


Cramer's rule

Cramer's rule solves a system of $n$ equations with $n$ unknowns using determinants. It applies only when the coefficient matrix $A$ is square and invertible (i.e., $\det(A) \neq 0$).

The solution for the $i$-th variable is:

$$x_i = \frac{\det(A_i)}{\det(A)}$$

where $A_i$ is the matrix formed by replacing the $i$-th column of $A$ with the constant vector.

For the system $2x + 3y = 5$, $4x - y = 3$:

$$\det(A) = \det\begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix} = (2)(-1) - (3)(4) = -14$$

$$x = \frac{\det\begin{bmatrix} 5 & 3 \\ 3 & -1 \end{bmatrix}}{\det(A)} = \frac{(5)(-1) - (3)(3)}{-14} = \frac{-14}{-14} = 1$$

$$y = \frac{\det\begin{bmatrix} 2 & 5 \\ 4 & 3 \end{bmatrix}}{\det(A)} = \frac{(2)(3) - (5)(4)}{-14} = \frac{-14}{-14} = 1$$

You can verify the solution $x = 1$, $y = 1$ by substituting back into the original equations: $2(1) + 3(1) = 5$ and $4(1) - 1 = 3$.
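Cramer's rule translates directly into code with NumPy's determinant function (a sketch, not from the original guide; results are rounded because floating-point determinants carry tiny errors):

```python
import numpy as np

A = np.array([[2.0,  3.0],
              [4.0, -1.0]])
b = np.array([5.0, 3.0])

det_A = np.linalg.det(A)         # approximately -14

A1 = A.copy(); A1[:, 0] = b      # replace column 1 of A with b
A2 = A.copy(); A2[:, 1] = b      # replace column 2 of A with b

x = np.linalg.det(A1) / det_A
y = np.linalg.det(A2) / det_A
print(round(x, 6), round(y, 6))  # 1.0 1.0
```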

Properties of matrix operations

Commutativity and associativity

  • Matrix addition is commutative: $A + B = B + A$
  • Matrix multiplication is associative: $(AB)C = A(BC)$
  • Matrix multiplication is NOT commutative: $AB \neq BA$ in general

This non-commutativity matters a lot. With $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$:

$$AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \quad \text{but} \quad BA = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$$

The order of multiplication always matters, so be careful when rearranging matrix expressions in regression derivations.
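A quick NumPy check makes the non-commutativity concrete (a sketch, not from the original guide):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A @ B)   # [[19 22]
               #  [43 50]]
print(B @ A)   # [[23 34]
               #  [31 46]]
# The two products differ, so AB != BA here
print(np.array_equal(A @ B, B @ A))   # False
```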

Identity matrix and inverse matrix

The identity matrix $I$ is a square matrix with ones on the main diagonal and zeros everywhere else. It acts like the number 1 in scalar multiplication:

$$AI = IA = A$$

For example, the $2 \times 2$ identity matrix is $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

A square matrix $A$ is invertible if there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. Not all square matrices have inverses; a matrix is invertible if and only if its determinant is non-zero.

The inverse of $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ is $\begin{bmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{bmatrix}$. You can verify this by multiplying the two matrices together and confirming you get $I$.

In regression, the inverse shows up directly in the OLS estimator: $\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{y}$. If $X^TX$ isn't invertible, the model has a problem (typically perfect multicollinearity).
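The OLS formula can be computed literally in NumPy. The sketch below uses a tiny made-up dataset that lies exactly on $y = 1 + 2x$, so the recovered coefficients should be close to $(1, 2)$; in practice `np.linalg.lstsq` is preferred over explicitly forming the inverse:

```python
import numpy as np

# Toy data (made up for illustration): y = 1 + 2x exactly
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # first column of ones = intercept term
y = np.array([1.0, 3.0, 5.0, 7.0])

# OLS estimator: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)   # close to [1. 2.]  (intercept 1, slope 2)
```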

Transpose and determinant

The transpose of a matrix $A$, written $A^T$, flips the matrix over its main diagonal: rows become columns and columns become rows.

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \implies A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$$

A key property for regression derivations: $(AB)^T = B^T A^T$. Notice the order reverses, just like with inverses.

The determinant of a square matrix, written $\det(A)$ or $|A|$, is a scalar that tells you two critical things:

  • Invertibility: $A$ is invertible if and only if $\det(A) \neq 0$
  • Linear independence: The rows (or columns) of $A$ are linearly independent if and only if $\det(A) \neq 0$

For a $2 \times 2$ matrix, the formula is straightforward:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

For example, $\det\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = (1)(4) - (2)(3) = -2$. Since this is non-zero, the matrix is invertible.
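Both the reversal property $(AB)^T = B^T A^T$ and the determinant calculation can be checked in NumPy (a sketch, not from the original guide; `np.linalg.det` returns a float, so the result is only approximately $-2$):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A.T)                                   # [[1. 3.]
                                             #  [2. 4.]]
# Transpose of a product reverses the order of the factors
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
print(np.linalg.det(A))                      # approximately -2
```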