Linear Algebra and Differential Equations

Key Concepts of Orthogonality

Why This Matters

Orthogonality is one of those foundational ideas that shows up everywhere in linear algebra. The concepts here connect directly to solving systems of equations, simplifying matrix computations, data compression, and signal processing. When you understand orthogonality, you can break complex problems into simpler, independent pieces.

Orthogonality isn't just about right angles in 2D or 3D space. It's about independence and efficiency: orthogonal components don't interfere with each other, which makes calculations cleaner and solutions more stable. Know what principle each concept demonstrates and how it connects to practical applications like least squares regression, QR decomposition, and SVD.


Foundational Definitions and the Dot Product

The dot product is your primary tool for detecting orthogonality. When it equals zero, you've found perpendicular vectors.

Definition of Orthogonal Vectors

  • Two vectors are orthogonal if their dot product equals zero. This is your go-to test for orthogonality in any dimension.
  • Orthogonal vectors are linearly independent, meaning neither can be written as a scalar multiple of the other. (Note: this holds for nonzero vectors. The zero vector is technically orthogonal to everything but isn't linearly independent with anything.)
  • Geometric interpretation: orthogonal vectors meet at right angles in Euclidean space, forming the basis for coordinate systems.

Dot Product and Its Relation to Orthogonality

The dot product connects geometry (angles) to algebra (component-wise multiplication).

  • Geometric form: $\mathbf{a} \cdot \mathbf{b} = ||\mathbf{a}|| \, ||\mathbf{b}|| \cos(\theta)$. When $\theta = 90^\circ$, cosine is zero, so the dot product is zero.
  • Computational form: $\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n$. This gives you a quick way to check orthogonality without thinking about angles at all.
  • Sign interpretation: positive means acute angle, negative means obtuse, zero means orthogonal.
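The computational form is a one-line check in NumPy; the vectors below are made-up example values:

```python
import numpy as np

# Hypothetical example vectors in R^3.
a = np.array([1.0, 2.0, -1.0])
b = np.array([2.0, 0.0, 2.0])

# Computational form: a1*b1 + a2*b2 + a3*b3 = 2 + 0 - 2 = 0.
dot = np.dot(a, b)
print(dot)  # 0.0 -> the vectors are orthogonal
```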

Orthonormal Basis

An orthonormal set goes one step further than orthogonal: the vectors are also all unit length.

  • Orthonormal vectors satisfy both $\mathbf{u}_i \cdot \mathbf{u}_j = 0$ for $i \neq j$ and $||\mathbf{u}_i|| = 1$.
  • Coordinate extraction becomes trivial: to find a vector's coefficient along a basis vector, just take the dot product. No division needed.
  • The standard basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots\}$ is the classic example. This is why Cartesian coordinates work so cleanly.

Compare: Orthogonal basis vs. Orthonormal basis: both have perpendicular vectors, but orthonormal adds the unit length constraint. For exam problems, orthonormal bases make projection calculations simpler since you skip the denominator in the projection formula.
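A small sketch of coefficient extraction with an orthonormal basis; the 45-degree rotated basis below is an assumed example:

```python
import numpy as np

# An orthonormal basis for R^2: the standard basis rotated by 45 degrees.
u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)

x = np.array([3.0, 1.0])

# With an orthonormal basis, coordinates are plain dot products -- no division.
c1 = np.dot(x, u1)
c2 = np.dot(x, u2)

# The coordinates reconstruct x exactly.
x_rebuilt = c1 * u1 + c2 * u2
```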


Building Orthogonal Sets: Gram-Schmidt and Projections

These techniques let you construct orthogonal vectors from arbitrary starting points. The core mechanism is subtraction of projections: you remove the component that lies along previously established directions.

Orthogonal Projections

  • Projection formula: $\text{proj}_{\mathbf{b}} \mathbf{a} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{b} \cdot \mathbf{b}} \mathbf{b}$. This extracts the component of $\mathbf{a}$ in the direction of $\mathbf{b}$.
  • Minimization property: the projection gives the point on the line through $\mathbf{b}$ that is closest to $\mathbf{a}$.
  • Residual is orthogonal: $\mathbf{a} - \text{proj}_{\mathbf{b}} \mathbf{a}$ is perpendicular to $\mathbf{b}$. This fact is what drives Gram-Schmidt.
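The projection formula and its orthogonal residual, sketched with made-up vectors:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([2.0, 0.0])

# proj_b(a) = (a . b / b . b) b
proj = (np.dot(a, b) / np.dot(b, b)) * b   # the component of a along b

# The residual a - proj_b(a) is perpendicular to b.
residual = a - proj
```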

Gram-Schmidt Orthogonalization Process

Gram-Schmidt takes any set of linearly independent vectors and produces orthogonal vectors spanning the same subspace. Here's how it works:

  1. Keep the first vector as-is: set $\mathbf{v}_1 = \mathbf{u}_1$.
  2. For each subsequent vector $\mathbf{u}_k$, subtract its projections onto all previously computed vectors: $\mathbf{v}_k = \mathbf{u}_k - \sum_{j=1}^{k-1} \text{proj}_{\mathbf{v}_j} \mathbf{u}_k$.
  3. Normalize at the end to get an orthonormal set: divide each $\mathbf{v}_k$ by $||\mathbf{v}_k||$.

Each subtraction removes the part of the new vector that "overlaps" with the directions you've already established, leaving only the genuinely new component.
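The steps above can be sketched as a short function; this is a minimal implementation (normalizing as it goes), not one tuned for numerical robustness:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for u in vectors:
        v = np.array(u, dtype=float)
        # Subtract the component along each previously established direction.
        for q in basis:
            v = v - np.dot(v, q) * q   # q is unit length, so no denominator
        basis.append(v / np.linalg.norm(v))
    return np.array(basis)

# Three linearly independent vectors in R^3 (example values).
Q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# The rows of Q are orthonormal: Q @ Q.T is the identity.
```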

Orthogonal Complements

  • Definition: $V^\perp$ contains all vectors orthogonal to every vector in the subspace $V$.
  • Dimension relationship: $\dim(V) + \dim(V^\perp) = n$ for subspaces of $\mathbb{R}^n$.
  • Direct sum property: $V \oplus V^\perp = \mathbb{R}^n$. Any vector decomposes uniquely into a component in $V$ and a component in $V^\perp$.

Compare: Orthogonal projection onto a vector vs. onto a subspace: same principle, but subspace projection requires projecting onto each basis vector of the subspace and summing the results. If an exam problem asks for "closest point in a subspace," you're doing orthogonal projection.
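A quick sketch of the direct-sum decomposition when $V$ is a line in $\mathbb{R}^3$; the vectors are assumed example values:

```python
import numpy as np

# V = span{v}, a line in R^3; v is normalized to unit length.
v = np.array([1.0, 2.0, 2.0])
v = v / np.linalg.norm(v)

x = np.array([4.0, 0.0, 1.0])

# Unique decomposition x = x_V + x_perp, with x_V in V and x_perp in V-perp.
x_V = np.dot(x, v) * v
x_perp = x - x_V
```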


Orthogonal Matrices and Transformations

When orthogonality is built into a matrix's structure, you get transformations that preserve geometry. These matrices satisfy $A^T A = I$, meaning the transpose is the inverse.

Orthogonal Matrices

  • Defining property: $A^T = A^{-1}$, equivalently $A^T A = AA^T = I$.
  • Column (and row) structure: the columns form an orthonormal set. This is how you verify a matrix is orthogonal: check that columns are pairwise orthogonal and each has unit length.
  • Determinant constraint: $\det(A) = \pm 1$ always. A determinant of $+1$ corresponds to a rotation; $-1$ corresponds to a reflection.

Orthogonal Transformations

  • Length preservation: $||A\mathbf{x}|| = ||\mathbf{x}||$ for all vectors $\mathbf{x}$. Distances don't change.
  • Angle preservation: the angle between any two vectors remains unchanged after transformation.
  • Geometric examples: rotations and reflections in $\mathbb{R}^n$ are the classic orthogonal transformations.
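All of these properties can be verified on a concrete rotation matrix (the angle below is chosen arbitrarily):

```python
import numpy as np

theta = np.pi / 3  # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Defining property: Q^T Q = I, so the transpose is the inverse.
is_orthogonal = np.allclose(Q.T @ Q, np.eye(2))

# Rotations have determinant +1 (a reflection would give -1).
det = np.linalg.det(Q)

# Length preservation: ||Qx|| = ||x||.
x = np.array([2.0, -5.0])
lengths_match = np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```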

Orthogonal Diagonalization

  • Applies to symmetric matrices: if $A = A^T$, then $A = QDQ^T$ where $Q$ is orthogonal and $D$ is diagonal.
  • Spectral theorem guarantee: symmetric matrices always have real eigenvalues, and eigenvectors corresponding to distinct eigenvalues are automatically orthogonal. The columns of $Q$ are these orthonormal eigenvectors.
  • Computational advantage: powers and functions of $A$ become straightforward. Since $A^n = QD^nQ^T$, you just raise the diagonal entries to the $n$th power.

Compare: General diagonalization vs. Orthogonal diagonalization: any diagonalizable matrix has $A = PDP^{-1}$, but only symmetric matrices guarantee that $P$ can be chosen orthogonal. This matters because computing $Q^{-1} = Q^T$ is just a transpose, which is computationally cheap compared to a general matrix inverse.
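NumPy's `eigh` routine, which is built for symmetric matrices, returns exactly this factorization; the matrix below is a made-up example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric: A == A.T

# eigh returns real eigenvalues and an orthogonal eigenvector matrix Q.
eigvals, Q = np.linalg.eigh(A)
D = np.diag(eigvals)

# A = Q D Q^T, and powers are cheap: A^5 = Q D^5 Q^T.
A5 = Q @ np.diag(eigvals ** 5) @ Q.T
```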


Matrix Decompositions Using Orthogonality

These decompositions leverage orthogonality to solve practical problems efficiently. Orthogonal factors are numerically stable and easy to invert, which is why they show up in so many algorithms.

QR Decomposition

  • Factorization: $A = QR$ where $Q$ has orthonormal columns and $R$ is upper triangular.
  • Construction method: apply Gram-Schmidt to the columns of $A$. The orthonormal results form $Q$, and the entries of $R$ record the dot products computed during the process.
  • Applications: solving $A\mathbf{x} = \mathbf{b}$ becomes $R\mathbf{x} = Q^T\mathbf{b}$, which you solve by back-substitution since $R$ is upper triangular.
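A sketch of the QR route for a small square system (example values assumed); `np.linalg.solve` stands in here for explicit back-substitution:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

Q, R = np.linalg.qr(A)

# Ax = b  ->  QRx = b  ->  Rx = Q^T b (R is upper triangular).
x = np.linalg.solve(R, Q.T @ b)
```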

Singular Value Decomposition (SVD)

  • Factorization: $A = U\Sigma V^T$ where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with nonnegative entries called singular values.
  • Works for ANY matrix, not just square or symmetric. This is its major advantage over eigenvalue decomposition.
  • Reveals matrix structure: the rank of $A$ equals the number of nonzero singular values. The best rank-$k$ approximation of $A$ uses the top $k$ singular values (this is the Eckart-Young theorem, which underlies data compression techniques like PCA).
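Rank detection and the best rank-$k$ approximation, sketched on a made-up rank-2 matrix:

```python
import numpy as np

# A 3x2 matrix built from two rank-1 pieces, so its rank is 2.
A = (np.outer([1.0, 2.0, 3.0], [1.0, 1.0])
     + 0.5 * np.outer([1.0, -1.0, 0.0], [1.0, -1.0]))

U, s, Vt = np.linalg.svd(A)

# Numerical rank = number of singular values above a tolerance.
rank = int(np.sum(s > 1e-10))

# Best rank-1 approximation keeps only the top singular value (Eckart-Young).
A1 = s[0] * np.outer(U[:, 0], Vt[0])
```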

Orthogonality in Least Squares Problems

When $A\mathbf{x} = \mathbf{b}$ has no exact solution (the system is overdetermined), you find the best approximate solution $\hat{\mathbf{x}}$ using orthogonality.

  • The residual $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to the column space of $A$. This is the geometric condition that defines the least squares solution.
  • Normal equations: $A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$. These are the algebraic form of the orthogonality condition: requiring the residual to be orthogonal to every column of $A$ means $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, which rearranges to the normal equations.
  • Best fit interpretation: orthogonality ensures minimum squared error. No other solution produces a smaller residual.
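The normal equations in action on a small overdetermined fit (the data values are assumed for illustration):

```python
import numpy as np

# Fitting a line c0 + c1*t to three points: more equations than unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual is orthogonal to every column of A.
residual = b - A @ x_hat
```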

Compare: QR decomposition vs. SVD: both use orthogonal matrices, but QR factors into orthogonal × triangular while SVD factors into orthogonal × diagonal × orthogonal. SVD gives more information (singular values reveal rank and conditioning) but requires more computation. For solving $A\mathbf{x} = \mathbf{b}$, QR is usually preferred; for analyzing matrix structure or handling rank-deficient systems, SVD is the better tool.


Orthogonality in Function Spaces

The same principles extend beyond finite-dimensional vectors to functions. The inner product becomes an integral, and "orthogonal" means the integral of the product equals zero.

Orthogonality in Function Spaces

  • Inner product definition: $\langle f, g \rangle = \int_a^b f(x)g(x) \, dx$. Functions are orthogonal when this equals zero.
  • Weight functions: sometimes $\langle f, g \rangle = \int_a^b w(x)f(x)g(x) \, dx$ with weight $w(x) > 0$. The weight function changes which functions count as orthogonal.
  • Infinite-dimensional analogy: function spaces work like $\mathbb{R}^n$ but with infinitely many "directions." Projections, orthogonal complements, and Gram-Schmidt all carry over.
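The integral inner product can be approximated numerically; the trapezoid-rule helper and the interval $[0, 2\pi]$ below are illustrative choices:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 10001)
dx = x[1] - x[0]

def inner(fv, gv):
    """Trapezoid-rule approximation of the integral of f*g over [0, 2*pi]."""
    vals = fv * gv
    return dx * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])

# sin and cos are orthogonal on [0, 2*pi]: the inner product is ~0.
ip = inner(np.sin(x), np.cos(x))
```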

Orthogonal Polynomials

  • Definition: polynomials $\{P_0, P_1, P_2, \ldots\}$ satisfying $\langle P_m, P_n \rangle = 0$ for $m \neq n$ with respect to a given inner product.
  • Famous families: Legendre polynomials use weight $1$ on $[-1, 1]$; Chebyshev polynomials use weight $1/\sqrt{1-x^2}$ on $[-1, 1]$. Each family is tailored to a specific type of problem.
  • Applications: numerical integration (Gaussian quadrature), polynomial approximation, and solving differential equations.
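A numerical check that two Legendre polynomials are orthogonal on $[-1, 1]$; the trapezoid quadrature is an approximation, not exact integration:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 10001)
dx = x[1] - x[0]

def inner(fv, gv):
    """Trapezoid-rule approximation of the integral of f*g over [-1, 1]."""
    vals = fv * gv
    return dx * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])

# First two nonconstant Legendre polynomials (weight 1).
P1 = x                      # P_1(x) = x
P2 = 0.5 * (3 * x**2 - 1)   # P_2(x) = (3x^2 - 1)/2

cross = inner(P1, P2)   # ~0: distinct Legendre polynomials are orthogonal
norm1 = inner(P1, P1)   # ~2/3, the squared norm of P_1
```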

Fourier Series and Orthogonal Functions

  • Basis functions: $\{1, \cos(x), \sin(x), \cos(2x), \sin(2x), \ldots\}$ are orthogonal on $[0, 2\pi]$.
  • Coefficient extraction: $a_n = \frac{\langle f, \cos(nx) \rangle}{\langle \cos(nx), \cos(nx) \rangle}$. Orthogonality is what makes this formula work: each coefficient depends only on the corresponding basis function, not on any of the others.
  • Signal processing foundation: decomposing signals into frequency components relies entirely on this orthogonality.
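Extracting one Fourier coefficient from a signal with known components; the signal and grid are assumed for illustration:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 20001)
dx = x[1] - x[0]

def inner(fv, gv):
    """Trapezoid-rule approximation of the integral of f*g over [0, 2*pi]."""
    vals = fv * gv
    return dx * (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1])

# A signal containing a cos(2x) term with amplitude 3.
f = 3.0 * np.cos(2 * x) + 1.5 * np.sin(x)

# a_2 = <f, cos(2x)> / <cos(2x), cos(2x)>; orthogonality kills the sin term.
c2 = np.cos(2 * x)
a2 = inner(f, c2) / inner(c2, c2)
```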

Compare: Orthogonal polynomials vs. Fourier basis: both are orthogonal function sets, but polynomials are better for approximation on finite intervals while Fourier functions excel at representing periodic signals. Choose based on your problem's structure.


Quick Reference Table

  • Testing orthogonality: dot product $= 0$; for matrices, check $A^T A = I$.
  • Building orthogonal sets: Gram-Schmidt process, QR decomposition.
  • Orthogonal matrix properties: $A^T = A^{-1}$, preserves lengths/angles, $\det = \pm 1$.
  • Matrix decompositions: QR (solving systems), SVD (any matrix), orthogonal diagonalization (symmetric).
  • Projection applications: least squares, closest point in subspace, Gram-Schmidt.
  • Function space orthogonality: Fourier series, Legendre polynomials, Chebyshev polynomials.
  • Subspace relationships: $V \oplus V^\perp = \mathbb{R}^n$, $\dim(V) + \dim(V^\perp) = n$.

Self-Check Questions

  1. If you're given three linearly independent vectors and asked to produce an orthonormal basis for their span, which process do you use, and what's the key operation at each step?

  2. Compare QR decomposition and SVD: what types of matrices can each handle, and when would you choose one over the other for solving $A\mathbf{x} = \mathbf{b}$?

  3. A symmetric matrix $A$ can be orthogonally diagonalized as $A = QDQ^T$. What special property do the columns of $Q$ have, and why does this make computing $A^{100}$ straightforward?

  4. In a least squares problem, the residual vector is orthogonal to what? Explain why this orthogonality condition guarantees the minimum error solution.

  5. How does the concept of orthogonality in $\mathbb{R}^n$ (dot product $= 0$) generalize to function spaces, and why does this make Fourier coefficient calculation possible?