Fiveable

🎛️Control Theory Unit 1 Review


1.1 Linear algebra


Written by the Fiveable Content Team • Last updated August 2025

Linear algebra forms the backbone of control theory, providing the tools to represent and analyze linear systems. Matrices, vectors, and linear transformations let you model system dynamics and design controllers in a compact, efficient way.

Key concepts like eigenvalues, inner products, and least squares approximation enable stability analysis, optimization, and system identification. These fundamentals are essential for understanding and applying control theory principles.

Fundamentals of Linear Algebra

Linear algebra deals with linear equations, matrices, and vector spaces. It shows up everywhere in control theory because control systems are often modeled as linear systems, and linear algebra gives you the language and machinery to work with them.

Scalars, Vectors, and Matrices

Scalars are single numbers (e.g., $5$ or $-3.2$). Vectors are ordered lists of numbers, typically written as columns, that can represent quantities with both magnitude and direction. Matrices are rectangular arrays of numbers arranged in rows and columns, used to represent linear transformations and systems of linear equations.

Basic operations you need to know:

  • Scalar multiplication: multiply every entry of a vector or matrix by a single number
  • Vector/matrix addition: add corresponding elements entry by entry
  • Matrix multiplication: the entry in row $i$, column $j$ of the product $AB$ is the dot product of row $i$ of $A$ with column $j$ of $B$
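As a quick sketch, here's how these three operations look in Python with NumPy (the matrices and vector below are illustrative values, not from the guide):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
v = np.array([1, -1])

# Scalar multiplication: every entry scaled by 2
print(2 * v)          # [ 2 -2]
# Addition: corresponding entries added element by element
print(A + B)          # [[1 3]
                      #  [4 4]]
# Matrix multiplication: (A @ B)[i, j] = row i of A dotted with column j of B
print(A @ B)          # [[2 1]
                      #  [4 3]]
```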

Linear Equations and Systems

A linear equation is one where variables appear only to the first power and aren't multiplied together, such as $ax + by = c$. A system of linear equations consists of two or more such equations sharing the same variables. The solution is the set of values that satisfies all equations simultaneously.

In matrix form, a system can be written as $Ax = b$, where $A$ is the coefficient matrix, $x$ is the vector of unknowns, and $b$ is the vector of constants. This compact notation is exactly how control engineers represent the relationship between inputs and outputs of a linear system.

Row Reduction and Echelon Forms

Row reduction is a systematic process for simplifying a matrix using three elementary row operations:

  1. Swap two rows
  2. Multiply a row by a nonzero scalar
  3. Add a multiple of one row to another row

The goal is to reach one of two standard forms:

  • Row echelon form (REF): each leading entry (first nonzero entry from the left) is to the right of the leading entry in the row above, and all entries below each leading entry are zero
  • Reduced row echelon form (RREF): same as REF, but every leading entry is 1, and it's the only nonzero entry in its column

RREF is unique for any given matrix, which makes it especially useful. Row reduction is the workhorse method for solving systems of linear equations, finding the rank of a matrix, and computing matrix inverses.
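A minimal sketch of row reduction in code, assuming SymPy is available: its `rref` method returns the reduced row echelon form of a matrix along with the pivot columns. The small system below is an illustrative example:

```python
from sympy import Matrix

# Solve x + 2y = 5, 3x + 4y = 11 by row-reducing the augmented matrix
aug = Matrix([[1, 2, 5],
              [3, 4, 11]])

rref_form, pivot_cols = aug.rref()
print(rref_form)   # Matrix([[1, 0, 1], [0, 1, 2]])  -> x = 1, y = 2
print(pivot_cols)  # (0, 1)
```

Reading off the last column of the RREF gives the unique solution x = 1, y = 2.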

Vector Spaces and Subspaces

A vector space is a set of vectors together with addition and scalar multiplication operations that satisfy certain axioms. In control theory, the state space and input/output spaces of a system are modeled as vector spaces, so understanding their structure is critical.

Vector Space Axioms and Properties

A vector space $V$ over a field $F$ must satisfy these axioms:

  1. Closure under addition: $u + v \in V$ for any $u, v \in V$
  2. Associativity of addition: $(u + v) + w = u + (v + w)$
  3. Commutativity of addition: $u + v = v + u$
  4. Additive identity: a zero vector $0$ exists such that $v + 0 = v$
  5. Additive inverses: for every $v$, there exists $-v$ such that $v + (-v) = 0$
  6. Closure under scalar multiplication: $av \in V$ for any $a \in F$, $v \in V$
  7. Distributivity over vector addition: $a(u + v) = au + av$
  8. Distributivity over scalar addition: $(a + b)v = av + bv$
  9. Compatibility of scalar multiplication: $(ab)v = a(bv)$
  10. Scalar multiplicative identity: $1v = v$

Common examples include $\mathbb{R}^n$ (real $n$-dimensional vectors), $\mathbb{C}^n$ (complex $n$-dimensional vectors), and the space of polynomials of degree at most $n$.

Null Space, Column Space, and Row Space

These three subspaces associated with a matrix $A$ reveal a lot about the linear system it represents:

  • Null space (kernel): the set of all vectors $x$ satisfying $Ax = 0$. This tells you about the "freedom" in the system, i.e., which inputs produce zero output.
  • Column space (range): the set of all linear combinations of the columns of $A$. This represents all possible outputs of the transformation $x \mapsto Ax$. A system $Ax = b$ has a solution if and only if $b$ lies in the column space of $A$.
  • Row space: the set of all linear combinations of the rows of $A$, which equals the column space of $A^T$.

These subspaces are connected by the rank-nullity theorem: for an $m \times n$ matrix $A$,

$$\text{rank}(A) + \text{nullity}(A) = n$$

where rank is the dimension of the column space and nullity is the dimension of the null space.
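The theorem can be checked numerically. Here's a sketch assuming NumPy and SciPy (`scipy.linalg.null_space` returns an orthonormal basis for the null space as columns); the matrix is an illustrative example:

```python
import numpy as np
from scipy.linalg import null_space

# A 3x4 matrix whose third row is the sum of the first two, so rank = 2
A = np.array([[1., 0., 2., 1.],
              [0., 1., 1., 3.],
              [1., 1., 3., 4.]])

rank = np.linalg.matrix_rank(A)       # dimension of the column space
nullity = null_space(A).shape[1]      # dimension of the null space
print(rank, nullity)                  # 2 2  -> rank + nullity = n = 4
```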

Basis and Dimension of Vector Spaces

A basis of a vector space $V$ is a set of vectors that is both linearly independent and spans $V$. Every vector in $V$ can be written as a unique linear combination of the basis vectors.

The dimension of $V$ is the number of vectors in any basis. You can think of it as the number of degrees of freedom in the space. For example, the standard basis for $\mathbb{R}^n$ is $\{e_1, e_2, \ldots, e_n\}$, where $e_i$ has a 1 in position $i$ and 0s elsewhere, so $\mathbb{R}^n$ has dimension $n$.

A vector space with a finite basis is finite-dimensional; otherwise it's infinite-dimensional (like the space of all polynomials with no degree bound).


Linear Transformations

A linear transformation is a function between vector spaces that preserves addition and scalar multiplication. In control theory, these describe how a system's state and output change in response to inputs.

Definition and Properties of Linear Transformations

A function $T: V \to W$ is a linear transformation if it satisfies two properties for all $u, v \in V$ and $a \in F$:

  1. Additivity: $T(u + v) = T(u) + T(v)$
  2. Homogeneity: $T(av) = aT(v)$

These two properties together guarantee that $T(0_V) = 0_W$ (the zero vector maps to the zero vector). The composition of two linear transformations is also linear: if $T: V \to W$ and $S: W \to U$ are both linear, then $S \circ T: V \to U$ is linear too.

Familiar examples include matrix multiplication ($x \mapsto Ax$), differentiation of polynomials, and integration over a fixed interval.

Kernel and Range of Linear Transformations

  • The kernel of $T: V \to W$ is $\ker(T) = \{v \in V : T(v) = 0_W\}$. It's a subspace of $V$.
  • The range of $T$ is $\text{range}(T) = \{T(v) : v \in V\}$. It's a subspace of $W$.

The dimension of the kernel is called the nullity, and the dimension of the range is called the rank. These are linked by the rank-nullity theorem:

$$\text{rank}(T) + \text{nullity}(T) = \dim(V)$$

A transformation is injective (one-to-one) if and only if its kernel is $\{0\}$, and surjective (onto) if and only if its range equals the entire codomain $W$.

Matrices of Linear Transformations

Every linear transformation between finite-dimensional vector spaces can be represented by a matrix once you choose bases for the domain and codomain.

If $\{v_1, \ldots, v_n\}$ is a basis for $V$ and $\{w_1, \ldots, w_m\}$ is a basis for $W$, the matrix $A$ of $T$ is the $m \times n$ matrix whose $j$-th column is the coordinate vector of $T(v_j)$ expressed in the basis $\{w_1, \ldots, w_m\}$.

A key consequence: matrix multiplication corresponds to composition of transformations. If $T$ has matrix $A$ and $S$ has matrix $B$, then $S \circ T$ has matrix $BA$ (note the reversed order).
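The reversed-order rule is easy to verify numerically. In the NumPy sketch below, the rotation and scaling matrices are illustrative choices:

```python
import numpy as np

# T: rotate 90 degrees counterclockwise in R^2
A = np.array([[0., -1.],
              [1.,  0.]])
# S: scale the x-coordinate by 2
B = np.array([[2., 0.],
              [0., 1.]])

v = np.array([1., 0.])
# Applying T first, then S, one step at a time...
step_by_step = B @ (A @ v)
# ...matches multiplying by the single matrix BA (reversed order)
composed = (B @ A) @ v
print(step_by_step, composed)   # both [0. 1.]
```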

Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors describe directions along which a linear transformation acts by simple scaling. In control theory, they're central to stability analysis: the eigenvalues of a system matrix determine whether the system's response grows, decays, or oscillates over time.

Characteristic Equation and Eigenvalues

An eigenvector of a square matrix $A$ is a nonzero vector $v$ satisfying:

$$Av = \lambda v$$

The scalar $\lambda$ is the corresponding eigenvalue. To find eigenvalues, you solve the characteristic equation:

$$\det(A - \lambda I) = 0$$

The roots of this polynomial are the eigenvalues of $A$. For an $n \times n$ matrix, the characteristic polynomial has degree $n$, so there are at most $n$ distinct eigenvalues (though some may be complex or repeated).

Eigenvalues reveal key properties: $A$ is invertible if and only if none of its eigenvalues are zero. For a control system with state matrix $A$, the system is stable when all eigenvalues have negative real parts, while any eigenvalue with a positive real part makes it unstable.
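A hedged sketch of this stability check in NumPy (the state matrix below is an illustrative example, not from the guide):

```python
import numpy as np

# State matrix of a damped second-order system (illustrative values)
A = np.array([[0., 1.],
              [-2., -3.]])

eigvals = np.linalg.eigvals(A)
print(np.sort(eigvals.real))               # [-2. -1.]
# All eigenvalues have negative real parts -> x' = Ax is stable
print(bool(np.all(eigvals.real < 0)))      # True
# No zero eigenvalue -> A is invertible
print(not np.any(np.isclose(eigvals, 0)))  # True
```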

Eigenvectors and Eigenspaces

For each eigenvalue $\lambda$, the set of all eigenvectors corresponding to $\lambda$, together with the zero vector, forms a subspace called the eigenspace of $\lambda$. You find it by solving $(A - \lambda I)x = 0$.

Two types of multiplicity matter here:

  • Algebraic multiplicity: how many times $\lambda$ appears as a root of the characteristic polynomial
  • Geometric multiplicity: the dimension of the eigenspace (i.e., the number of linearly independent eigenvectors for $\lambda$)

The geometric multiplicity is always less than or equal to the algebraic multiplicity. When they differ, the matrix cannot be diagonalized using that eigenvalue alone.


Diagonalization of Matrices

A square matrix $A$ is diagonalizable if there exists an invertible matrix $P$ such that:

$$P^{-1}AP = D$$

where $D$ is a diagonal matrix. The diagonal entries of $D$ are the eigenvalues, and the columns of $P$ are the corresponding eigenvectors.

An $n \times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors (equivalently, the geometric multiplicity equals the algebraic multiplicity for every eigenvalue).

Why this matters for control theory: diagonalization makes computing matrix powers and matrix exponentials straightforward. If $A = PDP^{-1}$, then $A^k = PD^kP^{-1}$ and $e^{At} = Pe^{Dt}P^{-1}$, where $e^{Dt}$ is just the exponential applied to each diagonal entry. This directly gives you the solution to the state equation $\dot{x} = Ax$.
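The identity can be sketched in NumPy/SciPy, using an illustrative diagonalizable matrix and comparing against SciPy's general-purpose `scipy.linalg.expm`:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative state matrix with distinct eigenvalues -1 and -2
A = np.array([[0., 1.],
              [-2., -3.]])

# Diagonalize: columns of P are eigenvectors, eigvals fill the diagonal of D
eigvals, P = np.linalg.eig(A)

t = 0.5
# e^{At} = P e^{Dt} P^{-1}, with the exponential applied entrywise on D's diagonal
eAt_diag = P @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(P)

# Compare with scipy's matrix exponential
print(np.allclose(eAt_diag, expm(A * t)))   # True
```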

Inner Product Spaces

An inner product space is a vector space equipped with an inner product, which generalizes the dot product. It lets you measure lengths, angles, and distances between vectors. In control theory, inner products underpin optimization problems, stability analysis via Lyapunov methods, and optimal controller design.

Dot Product and Inner Product

The dot product of two vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ in $\mathbb{R}^n$ is:

$$x \cdot y = x_1y_1 + x_2y_2 + \cdots + x_ny_n$$

More generally, an inner product on a vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$ is a function $\langle \cdot, \cdot \rangle: V \times V \to F$ satisfying:

  1. Conjugate symmetry: $\langle x, y \rangle = \overline{\langle y, x \rangle}$
  2. Linearity in the second argument: $\langle x, ay + z \rangle = a\langle x, y \rangle + \langle x, z \rangle$
  3. Positive definiteness: $\langle x, x \rangle \geq 0$, with equality if and only if $x = 0$

The dot product is the standard inner product on $\mathbb{R}^n$. On $\mathbb{C}^n$, the standard inner product is $\langle x, y \rangle = \bar{x}_1y_1 + \cdots + \bar{x}_ny_n$. The norm (length) of a vector is $\|x\| = \sqrt{\langle x, x \rangle}$.

Orthogonality and Orthonormal Bases

Two vectors are orthogonal if $\langle x, y \rangle = 0$. A set of vectors is orthogonal if every pair is orthogonal, and orthonormal if additionally each vector has unit length ($\langle v_i, v_i \rangle = 1$).

An orthonormal basis is both a basis and an orthonormal set. Orthonormal bases are especially convenient because coordinates are easy to compute: the coefficient of basis vector $e_i$ for any vector $v$ is simply $\langle v, e_i \rangle$. No matrix inversion needed.
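A quick numerical illustration of this, using an assumed rotated orthonormal basis for R^2 (the rotation angle 0.3 is an arbitrary choice):

```python
import numpy as np

# An orthonormal basis for R^2: the standard basis rotated by 0.3 radians
e1 = np.array([np.cos(0.3), np.sin(0.3)])
e2 = np.array([-np.sin(0.3), np.cos(0.3)])

v = np.array([2.0, -1.0])
# Coordinates are just inner products -- no matrix inversion needed
c1, c2 = v @ e1, v @ e2
reconstructed = c1 * e1 + c2 * e2
print(np.allclose(reconstructed, v))   # True
```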

Gram-Schmidt Orthogonalization Process

The Gram-Schmidt process converts any set of linearly independent vectors into an orthonormal set spanning the same subspace. Here's the procedure:

Given linearly independent vectors $\{v_1, \ldots, v_n\}$:

  1. Set $e_1 = \dfrac{v_1}{\|v_1\|}$
  2. For each subsequent vector $v_i$ ($i = 2, \ldots, n$):
    • Subtract off the projections onto all previous basis vectors: $u_i = v_i - \sum_{j=1}^{i-1} \langle v_i, e_j \rangle \, e_j$
    • Normalize: $e_i = \dfrac{u_i}{\|u_i\|}$

Each step removes the component of $v_i$ that lies along the already-constructed orthonormal vectors, leaving only the "new" direction. The result $\{e_1, \ldots, e_n\}$ is orthonormal.

This process is the foundation of the QR decomposition ($A = QR$, where $Q$ is orthogonal and $R$ is upper triangular), which is widely used in numerical least squares and controller design.
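The procedure translates almost line by line into code. Here's a minimal NumPy sketch (the helper name `gram_schmidt` and the input vectors are illustrative, not from the guide):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    basis = []
    for v in V.T:
        # Subtract projections onto the already-built orthonormal vectors
        u = v - sum((v @ e) * e for e in basis)
        # Normalize the remaining "new" direction
        basis.append(u / np.linalg.norm(u))
    return np.column_stack(basis)

V = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
Q = gram_schmidt(V)
# Columns are orthonormal: Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
```

Note that classical Gram-Schmidt as written here can lose orthogonality in floating point; production code typically uses a library QR routine such as `np.linalg.qr` instead.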

Least Squares Approximation

Least squares finds the "best fit" when an exact solution doesn't exist. Given an overdetermined system $Ax = b$ (more equations than unknowns, typically no exact solution), least squares minimizes the sum of squared residuals $\|Ax - b\|^2$. In control theory, this is the go-to method for system identification, parameter estimation, and fitting models to measured data.

Orthogonal Projections and Least Squares

The geometric idea is clean: you're projecting the vector $b$ onto the column space of $A$. The projection $\hat{b} = A\hat{x}$ is the closest point in the column space to $b$, and the residual $b - A\hat{x}$ is orthogonal to the column space.

That orthogonality condition gives you the normal equations:

$$A^TA\hat{x} = A^Tb$$

Any solution $\hat{x}$ to the normal equations minimizes $\|Ax - b\|^2$.

Normal Equations and Pseudoinverse

The normal equations $A^TAx = A^Tb$ come directly from requiring the residual to be orthogonal to every column of $A$.

If $A$ has full column rank (its columns are linearly independent), then $A^TA$ is invertible and the unique least squares solution is:

$$\hat{x} = (A^TA)^{-1}A^Tb$$

The matrix $(A^TA)^{-1}A^T$ is called the left pseudoinverse (it equals the Moore-Penrose pseudoinverse when $A$ has full column rank), often denoted $A^+$. It generalizes the concept of a matrix inverse to non-square or rank-deficient matrices.
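A sketch comparing the normal equations, the pseudoinverse, and NumPy's built-in solver on an illustrative line-fitting problem (the sample data is made up and happens to fit exactly):

```python
import numpy as np

# Overdetermined system: fit y = c0 + c1*t to four data points
t = np.array([0., 1., 2., 3.])
b = np.array([1., 3., 5., 7.])            # exactly y = 1 + 2t here
A = np.column_stack([np.ones_like(t), t]) # full column rank, 4x2

# Normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
# Pseudoinverse route: x = A^+ b
x_pinv = np.linalg.pinv(A) @ b
# Library least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)   # approximately [1. 2.]
print(np.allclose(x_normal, x_pinv) and np.allclose(x_normal, x_lstsq))  # True
```

All three routes agree here because $A$ has full column rank; for ill-conditioned problems, the pseudoinverse or `lstsq` (both SVD-based) are numerically safer than forming $A^TA$ explicitly.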

When $A$ does not have full column rank, $A^TA$ is singular, and the least squares solution is not unique. In that case, the pseudoinverse still selects the minimum-norm solution among all minimizers. This situation arises in control when a system is overparameterized or has redundant inputs.