Matrix notation and operations provide a compact, powerful way to represent and manipulate the data structures that underlie linear regression. Instead of writing out individual equations for every observation, you can express an entire regression model in a single matrix equation. This unit builds the foundation you'll need to derive estimators, compute predictions, and analyze model properties using matrix algebra.
Matrix basics and notation
Matrix fundamentals
A matrix is a rectangular array of numbers arranged in rows and columns, enclosed by square brackets. You denote a matrix as $A = [a_{ij}]$, where $a_{ij}$ represents the element sitting in the $i$-th row and $j$-th column.
The size (or dimension) of a matrix is written as $m \times n$, where $m$ is the number of rows and $n$ is the number of columns. A $3 \times 4$ matrix, for example, has 3 rows and 4 columns, giving you 12 elements total.
Each element is identified by its row and column indices. So $a_{23}$ refers to the element in the 2nd row and 3rd column of matrix $A$.
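Since indexing comes up constantly, here is a minimal sketch in plain Python (illustrative values; note that code indexes from 0 while the math notation indexes from 1):

```python
# A matrix stored as a list of rows; entry a_ij lives at A[i-1][j-1]
# because Python lists are 0-based while the math notation is 1-based.
A = [[1,  2,  3,  4],
     [5,  6,  7,  8],
     [9, 10, 11, 12]]   # a 3 x 4 matrix: 3 rows, 4 columns, 12 elements

print(len(A), len(A[0]))  # 3 4  (rows, columns)
print(A[1][2])            # 7 -- the element a_23: 2nd row, 3rd column
```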
Vectors
A vector is a special case of a matrix with only one dimension of length greater than 1:
- A row vector has dimensions $1 \times n$: $\begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$
- A column vector has dimensions $n \times 1$: $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
In regression, you'll encounter column vectors constantly. The response variable is stored as an $n \times 1$ column vector $\mathbf{y}$, and each estimated coefficient lives inside a parameter vector $\boldsymbol{\beta}$.
Applications of matrices and vectors
Matrices and vectors let you represent systems of linear equations in a single compact expression. For example, the system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 &= b_2 \end{aligned}$$

can be written as:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \quad \text{i.e., } A\mathbf{x} = \mathbf{b}$$
This is exactly the form that you'll see again when the regression model is written as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. The coefficient matrix $X$ holds your predictor data, the vector $\boldsymbol{\beta}$ holds the unknowns, and $\mathbf{y}$ holds the responses.
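As a small sketch (plain Python, made-up numbers), you can store a system this way and confirm a candidate solution by multiplying it out:

```python
# A minimal sketch (illustrative numbers): the system
#   2*x1 + 1*x2 = 5
#   1*x1 + 3*x2 = 10
# stored as a coefficient matrix, a solution vector, and a constant vector.
A = [[2.0, 1.0],
     [1.0, 3.0]]   # coefficient matrix
x = [1.0, 3.0]     # candidate solution
b = [5.0, 10.0]    # constants

# "A times x" is the dot product of each row of A with x
Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
print(Ax == b)  # True -- this x solves the system
```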
Matrix operations

Addition, subtraction, and scalar multiplication
Matrix addition and subtraction require the matrices to be the same size. You simply add or subtract corresponding elements:

$$(A \pm B)_{ij} = a_{ij} \pm b_{ij} \quad \text{for all } i \text{ and } j$$
Scalar multiplication means multiplying every element by a single number:

$$(cA)_{ij} = c \, a_{ij}$$
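These elementwise rules translate directly into code. A minimal sketch with illustrative matrices:

```python
# Sketch: elementwise addition and scalar multiplication for matrices
# stored as lists of rows (illustrative values).
def mat_add(A, B):
    """(A + B)_ij = a_ij + b_ij; the matrices must be the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    """(cA)_ij = c * a_ij."""
    return [[c * a for a in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))     # [[6, 8], [10, 12]]
print(scalar_mul(2, A))  # [[2, 4], [6, 8]]
```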
Matrix multiplication
Matrix multiplication between $A$ ($m \times n$) and $B$ ($n \times p$) is only defined when the number of columns in $A$ equals the number of rows in $B$. The result $AB$ has dimensions $m \times p$.
To compute each element $(AB)_{ij}$, take the dot product of the $i$-th row of $A$ with the $j$-th column of $B$:

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$
Here's a worked example, step by step, with $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$:
- Row 1 of $A$ dotted with Column 1 of $B$: $(1)(5) + (2)(7) = 19$
- Row 1 of $A$ dotted with Column 2 of $B$: $(1)(6) + (2)(8) = 22$
- Row 2 of $A$ dotted with Column 1 of $B$: $(3)(5) + (4)(7) = 43$
- Row 2 of $A$ dotted with Column 2 of $B$: $(3)(6) + (4)(8) = 50$

So $AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$.
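The row-by-column rule is mechanical enough to implement in a few lines. A sketch, with the dimension check included (illustrative matrices):

```python
# Sketch: matrix multiplication via the row-by-column dot product rule,
# (AB)_ij = sum_k a_ik * b_kj, where A is m x n and B is n x p.
def mat_mul(A, B):
    n = len(B)  # rows of B; must equal the number of columns of A
    assert all(len(row) == n for row in A), "dimension mismatch"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
```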
Solving linear systems
Augmented matrix and Gaussian elimination
A system of linear equations $A\mathbf{x} = \mathbf{b}$ can be represented as an augmented matrix $[A \mid \mathbf{b}]$, which appends the constant terms to the right of the coefficient matrix.
Gaussian elimination solves the system by applying elementary row operations to transform this augmented matrix into row echelon form (upper triangular, with leading ones and zeros below each pivot). The three permitted row operations are:
- Swap two rows
- Multiply a row by a non-zero constant
- Add a multiple of one row to another row
These operations never change the solution set. Once you reach row echelon form, use back-substitution to solve for each variable, starting from the last row and working upward.
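The whole procedure (row operations down to echelon form, then back-substitution upward) can be sketched compactly. This version adds partial pivoting, swapping in the row with the largest pivot, and assumes the system has a unique solution:

```python
# Sketch of Gaussian elimination with partial pivoting on the augmented
# matrix [A | b], followed by back-substitution (assumes a unique solution).
def gauss_solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # build [A | b]
    for col in range(n):
        # Row swap (a permitted operation): bring up the largest pivot
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Add multiples of the pivot row to zero out entries below it
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    # Back-substitution, starting from the last row and working upward
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

print(gauss_solve([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))  # [1.0, 3.0]
```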

Cramer's rule
Cramer's rule solves a system of $n$ equations with $n$ unknowns using determinants. It applies only when the coefficient matrix $A$ is square and invertible (i.e., $\det(A) \neq 0$).
The solution for the $i$-th variable is:

$$x_i = \frac{\det(A_i)}{\det(A)}$$

where $A_i$ is the matrix formed by replacing the $i$-th column of $A$ with the constant vector $\mathbf{b}$.
For the system $2x_1 + x_2 = 5$, $x_1 + 3x_2 = 10$:

$$\det(A) = \begin{vmatrix} 2 & 1 \\ 1 & 3 \end{vmatrix} = (2)(3) - (1)(1) = 5$$

$$x_1 = \frac{1}{5}\begin{vmatrix} 5 & 1 \\ 10 & 3 \end{vmatrix} = \frac{15 - 10}{5} = 1, \qquad x_2 = \frac{1}{5}\begin{vmatrix} 2 & 5 \\ 1 & 10 \end{vmatrix} = \frac{20 - 5}{5} = 3$$

The solution is $(x_1, x_2) = (1, 3)$, which you can verify by substituting back: $2(1) + 3 = 5$ and $1 + 3(3) = 10$.
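For the $2 \times 2$ case, Cramer's rule is only a few lines of code. A sketch with illustrative values:

```python
# Sketch: Cramer's rule for a 2x2 system, using det([[a,b],[c,d]]) = ad - bc.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cramer2(A, b):
    d = det2(A)
    assert d != 0, "Cramer's rule needs an invertible coefficient matrix"
    # Replace column i of A with b to form A_i; then x_i = det(A_i) / det(A)
    A1 = [[b[0], A[0][1]], [b[1], A[1][1]]]
    A2 = [[A[0][0], b[0]], [A[1][0], b[1]]]
    return [det2(A1) / d, det2(A2) / d]

print(cramer2([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))  # [1.0, 3.0]
```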
Properties of matrix operations
Commutativity and associativity
- Matrix addition is commutative: $A + B = B + A$
- Matrix multiplication is associative: $(AB)C = A(BC)$
- Matrix multiplication is NOT commutative: $AB \neq BA$ in general
This non-commutativity matters a lot. With $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$:

$$AB = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix} \quad \text{but} \quad BA = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}$$
The order of multiplication always matters, so be careful when rearranging matrix expressions in regression derivations.
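A quick way to convince yourself: multiply two small matrices both ways and compare. A sketch with illustrative values and a small helper function:

```python
# Quick check that AB != BA for two small illustrative matrices.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]   # swaps columns when applied on the right
print(mat_mul(A, B))   # [[2, 1], [4, 3]]
print(mat_mul(B, A))   # [[3, 4], [1, 2]]
```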
Identity matrix and inverse matrix
The identity matrix $I$ is a square matrix with ones on the main diagonal and zeros everywhere else. It acts like the number 1 in ordinary multiplication: $AI = IA = A$.
For example, the $2 \times 2$ identity matrix is $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
A square matrix $A$ is invertible if there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. Not all square matrices have inverses; a matrix is invertible if and only if its determinant is non-zero.
For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ with $ad - bc \neq 0$, the inverse is $A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. For example, the inverse of $\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$ is $\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$. You can verify this by multiplying the two matrices together and confirming you get $I$.
In regression, the inverse shows up directly in the OLS estimator: $\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$. If $X^\top X$ isn't invertible, the model has a problem (typically perfect multicollinearity).
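To make the formula concrete, here is a sketch of the OLS computation for a simple regression with an intercept, using made-up noise-free data generated from $y = 1 + 2x$ so the coefficients are easy to check:

```python
# Sketch: the OLS estimator beta_hat = (X^T X)^{-1} X^T y, computed by hand
# for a simple regression (intercept + one predictor), so X^T X is 2x2.
def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]   # determinant; must be non-zero
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # column of ones, then the predictor
y = [[1.0], [3.0], [5.0]]                  # exactly y = 1 + 2x, no noise

XtX = mat_mul(transpose(X), X)
Xty = mat_mul(transpose(X), y)
beta = mat_mul(inv2(XtX), Xty)
print(beta)  # intercept ~1, slope ~2
```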
Transpose and determinant
The transpose of a matrix $A$, written $A^\top$, flips the matrix over its main diagonal: rows become columns and columns become rows.
A key property for regression derivations: $(AB)^\top = B^\top A^\top$. Notice the order reverses, just like with inverses, where $(AB)^{-1} = B^{-1}A^{-1}$.
The determinant of a square matrix $A$, written $\det(A)$ or $|A|$, is a scalar that tells you two critical things:
- Invertibility: $A$ is invertible if and only if $\det(A) \neq 0$
- Linear independence: The rows (or columns) of $A$ are linearly independent if and only if $\det(A) \neq 0$
For a $2 \times 2$ matrix, the formula is straightforward:

$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

For example, $\det \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} = (2)(4) - (3)(1) = 5$. Since this is non-zero, the matrix is invertible.
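A sketch of the determinant-based invertibility check for the $2 \times 2$ case (illustrative matrix):

```python
# Sketch: the 2x2 determinant formula and the invertibility test it gives.
def det2(M):
    """det([[a, b], [c, d]]) = a*d - b*c."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

M = [[2, 3], [1, 4]]   # illustrative matrix
print(det2(M))         # 5
print(det2(M) != 0)    # True -- non-zero determinant, so M is invertible
```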