🥖Linear Modeling Theory Unit 5 Review

5.1 Matrix Notation and Operations

Written by the Fiveable Content Team • Last updated August 2025

Matrix notation and operations provide a compact, powerful way to represent and manipulate the data structures that underlie linear regression. Instead of writing out individual equations for every observation, you can express an entire regression model in a single matrix equation. This unit builds the foundation you'll need to derive estimators, compute predictions, and analyze model properties using matrix algebra.

Matrix basics and notation

Matrix fundamentals

A matrix is a rectangular array of numbers arranged in rows and columns, enclosed by square brackets. You denote a matrix as $A = [a_{ij}]$, where $a_{ij}$ represents the element sitting in the $i$-th row and $j$-th column.

The size (or dimension) of a matrix is written as $m \times n$, where $m$ is the number of rows and $n$ is the number of columns. A $3 \times 4$ matrix, for example, has 3 rows and 4 columns, giving you 12 elements total.

Each element is identified by its row and column indices. So $a_{23}$ refers to the element in the 2nd row and 3rd column of matrix $A$.
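As a quick check of the dimension and indexing conventions above, here is a short sketch in NumPy (not part of the original guide; note that NumPy indexing is 0-based, so the math subscript $a_{23}$ becomes `A[1, 2]` in code):

```python
import numpy as np

# A 3x4 matrix: 3 rows, 4 columns, 12 elements total
A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

print(A.shape)   # (3, 4)
# a_23, the element in the 2nd row and 3rd column (0-based: row 1, column 2)
print(A[1, 2])   # 7
```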

Vectors

A vector is a matrix with only one row or only one column:

  • A row vector has dimensions $1 \times n$: $\vec{v} = (v_1, v_2, \ldots, v_n)$
  • A column vector has dimensions $m \times 1$: $\vec{v} = [v_1, v_2, \ldots, v_m]^T$

In regression, you'll encounter column vectors constantly. The response variable $\mathbf{y}$ is stored as an $n \times 1$ column vector, and each estimated coefficient lives inside a parameter vector $\boldsymbol{\beta}$.
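The row/column distinction can be made explicit in NumPy by keeping both dimensions in the shape (a sketch, not from the original guide):

```python
import numpy as np

row = np.array([[1, 2, 3]])       # 1x3 row vector
col = np.array([[1], [2], [3]])   # 3x1 column vector

print(row.shape)   # (1, 3)
print(col.shape)   # (3, 1)
# Transposing a row vector yields the corresponding column vector
print(np.array_equal(row.T, col))   # True
```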

Applications of matrices and vectors

Matrices and vectors let you represent systems of linear equations in a single compact expression. For example, the system

$$2x + 3y = 5$$
$$4x - y = 3$$

can be written as:

$$\begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$$

This is exactly the form $A\mathbf{x} = \mathbf{b}$ that you'll see again when the regression model is written as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. The coefficient matrix holds your predictor data, the vector $\boldsymbol{\beta}$ holds the unknowns, and $\mathbf{y}$ holds the responses.
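The system above can be solved numerically in one call; a minimal sketch using NumPy's built-in solver:

```python
import numpy as np

# Coefficient matrix A and constant vector b for:
#   2x + 3y = 5
#   4x -  y = 3
A = np.array([[2.0,  3.0],
              [4.0, -1.0]])
b = np.array([5.0, 3.0])

x = np.linalg.solve(A, b)   # solves Ax = b
print(x)                    # [1. 1.]  ->  x = 1, y = 1
```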

Matrix operations


Addition, subtraction, and scalar multiplication

Matrix addition and subtraction require the matrices to be the same size. You simply add or subtract corresponding elements:

$$c_{ij} = a_{ij} + b_{ij} \quad \text{for all } i \text{ and } j$$

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

Scalar multiplication means multiplying every element by a single number:

$$2 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}$$
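Both element-wise operations map directly onto NumPy's `+` and `*` operators (a sketch, not from the original guide):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)   # [[ 6  8]
               #  [10 12]]
print(2 * A)   # [[2 4]
               #  [6 8]]
```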

Matrix multiplication

Matrix multiplication between $A$ ($m \times n$) and $B$ ($n \times p$) is only defined when the number of columns in $A$ equals the number of rows in $B$. The result $C$ has dimensions $m \times p$.

To compute each element $c_{ij}$, take the dot product of the $i$-th row of $A$ with the $j$-th column of $B$:

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}$$

Here's a worked example, step by step:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$$

  1. Row 1 of $A$ dotted with Column 1 of $B$: $(1)(5) + (2)(7) = 19$
  2. Row 1 of $A$ dotted with Column 2 of $B$: $(1)(6) + (2)(8) = 22$
  3. Row 2 of $A$ dotted with Column 1 of $B$: $(3)(5) + (4)(7) = 43$
  4. Row 2 of $A$ dotted with Column 2 of $B$: $(3)(6) + (4)(8) = 50$

$$C = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$
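The worked example above can be verified with NumPy's `@` operator (a sketch, not part of the original guide; note that `*` would be element-wise, not matrix, multiplication):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B   # matrix product: rows of A dotted with columns of B
print(C)    # [[19 22]
            #  [43 50]]
```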

Solving linear systems

Augmented matrix and Gaussian elimination

A system of linear equations can be represented as an augmented matrix, which appends the constant terms to the right of the coefficient matrix:

$$\left[\begin{array}{cc|c} 2 & 3 & 5 \\ 4 & -1 & 3 \end{array}\right]$$

Gaussian elimination solves the system by applying elementary row operations to transform this augmented matrix into row echelon form (upper triangular, with zeros below each pivot). The three permitted row operations are:

  1. Swap two rows
  2. Multiply a row by a non-zero constant
  3. Add a multiple of one row to another row

These operations never change the solution set. Once you reach row echelon form, use back-substitution to solve for each variable, starting from the last row and working upward.
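The forward-elimination and back-substitution steps above can be sketched in a few lines of NumPy. This is a teaching sketch that assumes non-zero pivots and skips the row swaps a robust implementation would use:

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve Ax = b by Gaussian elimination with back-substitution.
    Sketch only: assumes square A with non-zero pivots (no pivoting)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: add multiples of each pivot row to the rows
    # below it, zeroing out every entry under the pivot
    for i in range(n):
        for r in range(i + 1, n):
            factor = A[r, i] / A[i, i]
            A[r, i:] -= factor * A[i, i:]
            b[r] -= factor * b[i]
    # Back-substitution: solve from the last row upward
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

print(gaussian_solve(np.array([[2, 3], [4, -1]]), np.array([5, 3])))  # [1. 1.]
```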


Cramer's rule

Cramer's rule solves a system of $n$ equations with $n$ unknowns using determinants. It applies only when the coefficient matrix $A$ is square and invertible (i.e., $\det(A) \neq 0$).

The solution for the $i$-th variable is:

$$x_i = \frac{\det(A_i)}{\det(A)}$$

where $A_i$ is the matrix formed by replacing the $i$-th column of $A$ with the constant vector.

For the system $2x + 3y = 5$, $4x - y = 3$:

$$\det(A) = \det\begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix} = (2)(-1) - (3)(4) = -14$$

$$x = \frac{\det\begin{bmatrix} 5 & 3 \\ 3 & -1 \end{bmatrix}}{\det(A)} = \frac{(5)(-1) - (3)(3)}{-14} = \frac{-14}{-14} = 1$$

$$y = \frac{\det\begin{bmatrix} 2 & 5 \\ 4 & 3 \end{bmatrix}}{\det(A)} = \frac{(2)(3) - (5)(4)}{-14} = \frac{-14}{-14} = 1$$

You can verify the solution $x = 1$, $y = 1$ by substituting back into the original equations: $2(1) + 3(1) = 5$ and $4(1) - 1 = 3$.
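Cramer's rule translates directly into code with NumPy's determinant function (a sketch, not from the original guide; results are rounded because floating-point determinants carry tiny errors):

```python
import numpy as np

A = np.array([[2.0,  3.0],
              [4.0, -1.0]])
b = np.array([5.0, 3.0])

det_A = np.linalg.det(A)         # approximately -14

A1 = A.copy(); A1[:, 0] = b      # replace column 1 of A with b
A2 = A.copy(); A2[:, 1] = b      # replace column 2 of A with b

x = np.linalg.det(A1) / det_A
y = np.linalg.det(A2) / det_A
print(round(x, 6), round(y, 6))  # 1.0 1.0
```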

Properties of matrix operations

Commutativity and associativity

  • Matrix addition is commutative: $A + B = B + A$
  • Matrix multiplication is associative: $(AB)C = A(BC)$
  • Matrix multiplication is NOT commutative: $AB \neq BA$ in general

This non-commutativity matters a lot. With $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$:

$$AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \quad \text{but} \quad BA = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$$

The order of multiplication always matters, so be careful when rearranging matrix expressions in regression derivations.
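A quick NumPy check makes the non-commutativity concrete (a sketch, not from the original guide):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A @ B)   # [[19 22]
               #  [43 50]]
print(B @ A)   # [[23 34]
               #  [31 46]]
# The two products differ, so AB != BA here
print(np.array_equal(A @ B, B @ A))   # False
```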

Identity matrix and inverse matrix

The identity matrix $I$ is a square matrix with ones on the main diagonal and zeros everywhere else. It acts like the number 1 in scalar multiplication:

$$AI = IA = A$$

For example, the $2 \times 2$ identity matrix is $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

A square matrix $A$ is invertible if there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. Not all square matrices have inverses; a matrix is invertible if and only if its determinant is non-zero.

The inverse of $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ is $\begin{bmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{bmatrix}$. You can verify this by multiplying the two matrices together and confirming you get $I$.

In regression, the inverse shows up directly in the OLS estimator: $\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{y}$. If $X^TX$ isn't invertible, the model has a problem (typically perfect multicollinearity).
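The OLS formula can be computed literally in NumPy. The sketch below uses a tiny made-up dataset that lies exactly on $y = 1 + 2x$, so the recovered coefficients should be close to $(1, 2)$; in practice `np.linalg.lstsq` is preferred over explicitly forming the inverse:

```python
import numpy as np

# Toy data (made up for illustration): y = 1 + 2x exactly
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # first column of ones = intercept term
y = np.array([1.0, 3.0, 5.0, 7.0])

# OLS estimator: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)   # close to [1. 2.]  (intercept 1, slope 2)
```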

Transpose and determinant

The transpose of a matrix $A$, written $A^T$, flips the matrix over its main diagonal: rows become columns and columns become rows.

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \implies A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$$

A key property for regression derivations: $(AB)^T = B^T A^T$. Notice the order reverses, just like with inverses.

The determinant of a square matrix, written $\det(A)$ or $|A|$, is a scalar that tells you two critical things:

  • Invertibility: $A$ is invertible if and only if $\det(A) \neq 0$
  • Linear independence: The rows (or columns) of $A$ are linearly independent if and only if $\det(A) \neq 0$

For a $2 \times 2$ matrix, the formula is straightforward:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

For example, $\det\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = (1)(4) - (2)(3) = -2$. Since this is non-zero, the matrix is invertible.
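Both the reversal property $(AB)^T = B^T A^T$ and the determinant calculation can be checked in NumPy (a sketch, not from the original guide; `np.linalg.det` returns a float, so the result is only approximately $-2$):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(A.T)                                   # [[1. 3.]
                                             #  [2. 4.]]
# Transpose of a product reverses the order of the factors
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
print(np.linalg.det(A))                      # approximately -2
```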