Linear Algebra for Data Science Unit 5 – Inner Products & Orthogonality in Linear Algebra

Inner products and orthogonality are fundamental concepts in linear algebra that extend the idea of dot products to abstract vector spaces. These concepts provide a framework for understanding geometric relationships between vectors, including angles and lengths. Orthogonal projections and the Gram-Schmidt process are key tools for decomposing vectors and constructing orthonormal bases. These techniques have wide-ranging applications in data science, including dimensionality reduction, feature extraction, and signal processing, making them essential for working with high-dimensional data and machine learning algorithms.

Key Concepts

  • Inner products generalize the notion of the dot product to abstract vector spaces
  • Properties of inner products include symmetry, linearity, and positive definiteness
  • Geometric interpretation of inner products relates to angles and lengths of vectors
  • Orthogonality describes vectors that are perpendicular to each other with respect to an inner product
  • Orthogonal projections decompose a vector into components parallel and orthogonal to a subspace
  • Gram-Schmidt process constructs an orthonormal basis from a linearly independent set of vectors
  • Applications in data science include dimensionality reduction, feature extraction, and signal processing
  • Understanding these concepts is crucial for working with high-dimensional data and machine learning algorithms

Definition and Properties of Inner Products

  • An inner product is a function that takes two vectors as input and returns a scalar value
  • Denoted as $\langle \mathbf{u}, \mathbf{v} \rangle$ for vectors $\mathbf{u}$ and $\mathbf{v}$
  • Satisfies the following properties for all vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ and scalar $c$:
    • Symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$
    • Linearity: $\langle c\mathbf{u}, \mathbf{v} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle$ and $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$
    • Positive definiteness: $\langle \mathbf{u}, \mathbf{u} \rangle \geq 0$, and $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ if and only if $\mathbf{u} = \mathbf{0}$
  • The standard inner product (dot product) in $\mathbb{R}^n$ is defined as $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^n u_i v_i$
  • Other examples of inner products include weighted inner products and inner products on function spaces
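As a minimal NumPy sketch of these definitions, the snippet below computes the standard and a weighted inner product and spot-checks symmetry, linearity, and positive definiteness on specific example vectors (the helper names `standard_inner` and `weighted_inner` are illustrative, not library functions, and an assert on particular vectors is a sanity check rather than a proof).

```python
import numpy as np

def standard_inner(u, v):
    """Standard inner product (dot product) on R^n: <u, v> = sum_i u_i v_i."""
    return float(np.dot(u, v))

def weighted_inner(u, v, w):
    """Weighted inner product <u, v>_w = sum_i w_i u_i v_i (weights w_i > 0)."""
    return float(np.sum(w * u * v))

u = np.array([1.0, 2.0, -1.0])
v = np.array([2.0, 0.0, 3.0])
w = np.array([2.0, 3.0, 1.0])   # positive weights keep the form positive definite

# Spot-check the defining properties on these particular vectors
assert np.isclose(standard_inner(u, v), standard_inner(v, u))          # symmetry
assert np.isclose(standard_inner(3 * u, v), 3 * standard_inner(u, v))  # linearity in the first argument
assert standard_inner(u, u) > 0                                        # positive definiteness (u != 0)
print(standard_inner(u, v), weighted_inner(u, v, w))
```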

Geometric Interpretation

  • The inner product of two vectors is related to their lengths and the angle between them
  • $\langle \mathbf{u}, \mathbf{v} \rangle = \|\mathbf{u}\| \|\mathbf{v}\| \cos \theta$, where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$
  • If $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal (perpendicular)
  • The inner product can be used to compute the length (norm) of a vector: $\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}$
  • The Cauchy-Schwarz inequality states that $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \|\mathbf{v}\|$, with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent
  • Geometric interpretation helps to visualize and understand the relationships between vectors in high-dimensional spaces
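As a rough illustration of the formulas above, the following snippet computes the angle between two arbitrarily chosen example vectors and numerically checks the Cauchy-Schwarz inequality.

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([2.0, 0.0, 3.0])

inner = np.dot(u, v)
norm_u = np.linalg.norm(u)        # ||u|| = sqrt(<u, u>)
norm_v = np.linalg.norm(v)

# Angle from <u, v> = ||u|| ||v|| cos(theta); clip guards against round-off outside [-1, 1]
cos_theta = np.clip(inner / (norm_u * norm_v), -1.0, 1.0)
theta = np.arccos(cos_theta)

print(f"<u, v> = {inner}, ||u|| = {norm_u:.4f}, angle = {np.degrees(theta):.2f} degrees")

# Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||
assert abs(inner) <= norm_u * norm_v + 1e-12
```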

Orthogonality and Orthogonal Vectors

  • Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if their inner product is zero: $\langle \mathbf{u}, \mathbf{v} \rangle = 0$
  • Orthogonal vectors are perpendicular to each other in the vector space
  • A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is orthogonal if $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0$ for all $i \neq j$
  • An orthogonal set of nonzero vectors is linearly independent
  • If the vectors in an orthogonal set are also unit vectors (length 1), the set is called orthonormal
  • Orthogonal and orthonormal bases have useful properties for representing and manipulating vectors in a vector space
    • Any vector can be uniquely expressed as a linear combination of the basis vectors
    • The coefficients in the linear combination are easily computed using inner products
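A small NumPy sketch of these ideas: it first verifies that three example vectors are pairwise orthogonal, then normalizes them into an orthonormal basis and reconstructs an arbitrary vector from its inner-product coefficients (the specific vectors are just illustrative).

```python
import numpy as np

# Candidate orthogonal set in R^3 (these particular vectors are pairwise orthogonal)
vecs = [np.array([1.0, 1.0, 1.0]),
        np.array([1.0, -1.0, 0.0]),
        np.array([1.0, 1.0, -2.0])]

# Pairwise inner products vanish for i != j
for i in range(len(vecs)):
    for j in range(i + 1, len(vecs)):
        assert np.isclose(np.dot(vecs[i], vecs[j]), 0.0)

# Normalize to get an orthonormal basis, then expand an arbitrary vector:
# x = sum_i <x, q_i> q_i, so the coefficients are just inner products
basis = [v / np.linalg.norm(v) for v in vecs]
x = np.array([3.0, -2.0, 5.0])
coeffs = [np.dot(x, q) for q in basis]
reconstruction = sum(c * q for c, q in zip(coeffs, basis))
assert np.allclose(reconstruction, x)
print(coeffs)
```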

Orthogonal Projections

  • Orthogonal projection decomposes a vector into components parallel and orthogonal to a subspace
  • The projection of a vector $\mathbf{u}$ onto a subspace $W$ is the closest point in $W$ to $\mathbf{u}$
  • Denoted as $\text{proj}_W(\mathbf{u})$, the orthogonal projection is characterized by:
    • $\text{proj}_W(\mathbf{u}) \in W$
    • $\mathbf{u} - \text{proj}_W(\mathbf{u})$ is orthogonal to every vector in $W$
  • If $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ is an orthonormal basis for $W$, then $\text{proj}_W(\mathbf{u}) = \sum_{i=1}^k \langle \mathbf{u}, \mathbf{w}_i \rangle \mathbf{w}_i$
  • Orthogonal projections are used in least squares approximation, signal denoising, and dimensionality reduction techniques (PCA)
  • The orthogonal complement of a subspace $W$, denoted $W^\perp$, is the set of all vectors orthogonal to every vector in $W$
    • $\mathbf{u} = \text{proj}_W(\mathbf{u}) + \text{proj}_{W^\perp}(\mathbf{u})$ for any vector $\mathbf{u}$
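The sketch below implements the projection formula from this list and numerically checks the two characterizing properties; the helper name `project_onto_subspace` and the particular plane $W \subset \mathbb{R}^3$ are illustrative choices, not a standard API.

```python
import numpy as np

def project_onto_subspace(u, W_basis):
    """Orthogonal projection of u onto span of the columns of W_basis, assumed
    orthonormal: proj_W(u) = sum_i <u, w_i> w_i."""
    coeffs = W_basis.T @ u               # inner products <u, w_i>
    return W_basis @ coeffs

# Orthonormal basis of a plane W in R^3 (columns are unit length, mutually orthogonal)
W_basis = np.column_stack([
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0),
])

u = np.array([2.0, 3.0, -1.0])
p = project_onto_subspace(u, W_basis)
r = u - p                                 # component in the orthogonal complement W^perp

assert np.allclose(W_basis.T @ r, 0.0)    # residual is orthogonal to every basis vector of W
assert np.allclose(p + r, u)              # u = proj_W(u) + proj_{W^perp}(u)
print(p, r)
```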

Gram-Schmidt Process

  • The Gram-Schmidt process constructs an orthonormal basis from a linearly independent set of vectors
  • Given a linearly independent set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$, the process iteratively constructs an orthonormal set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$
  • The process works as follows:
    1. Set $\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|}$
    2. For $i = 2, \ldots, n$:
      • Compute $\mathbf{w}_i = \mathbf{v}_i - \sum_{j=1}^{i-1} \langle \mathbf{v}_i, \mathbf{u}_j \rangle \mathbf{u}_j$
      • Set $\mathbf{u}_i = \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|}$
  • The resulting set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for the span of the original set
  • The classical Gram-Schmidt process can lose orthogonality due to rounding errors (numerical instability), so the modified Gram-Schmidt variant is often used in practice
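Below is a minimal implementation of the classical steps listed above (the function name `gram_schmidt` is an illustrative choice); an inline comment notes the one-line change that gives the more numerically stable modified variant.

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: rows of V are linearly independent vectors v_1..v_n;
    returns rows u_1..u_n forming an orthonormal basis of their span."""
    U = []
    for v in V:
        # Subtract the projection onto each previously built direction:
        # w_i = v_i - sum_j <v_i, u_j> u_j
        w = v.astype(float)
        for u in U:
            w = w - np.dot(v, u) * u   # modified GS would use np.dot(w, u) here for stability
        U.append(w / np.linalg.norm(w))
    return np.array(U)

V = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(V)
assert np.allclose(Q @ Q.T, np.eye(3))   # rows are orthonormal
print(Q)
```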

Applications in Data Science

  • Inner products and orthogonality are fundamental concepts in many data science and machine learning algorithms
  • Principal Component Analysis (PCA) uses orthogonal projections to find the directions of maximum variance in a dataset
    • The principal components are orthogonal and capture the most important information in the data
  • Singular Value Decomposition (SVD) factorizes a matrix into orthogonal matrices and a diagonal matrix of singular values
    • SVD is used in recommender systems, latent semantic analysis, and data compression
  • Orthogonal basis functions (Fourier, wavelets) are used in signal processing and feature extraction
    • Representing signals in an orthogonal basis can reveal important patterns and structures
  • Least squares regression finds the best-fitting linear model by orthogonally projecting the response vector onto the column space of the design matrix, which minimizes the norm of the residual
  • Orthogonal matching pursuit is a sparse coding algorithm that iteratively selects the basis vector most correlated with the current residual
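As a rough illustration of the PCA/SVD connection, the sketch below applies `np.linalg.svd` to centered toy data, confirms that the principal directions are orthonormal, and projects the data onto the top two directions; the random data and the choice of two components are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # toy data: 200 samples, 5 features
Xc = X - X.mean(axis=0)                   # center the data first

# SVD of the centered data: Xc = U S Vt, rows of Vt are orthonormal principal directions
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(Vt @ Vt.T, np.eye(5))  # the principal directions are orthonormal

k = 2
Z = Xc @ Vt[:k].T                         # orthogonal projection onto the top-k directions
print(Z.shape)                            # (200, 2): reduced-dimension representation
```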

Practice Problems and Examples

  1. Verify that the following function defines an inner product on $\mathbb{R}^2$: $\langle (x_1, y_1), (x_2, y_2) \rangle = 2x_1x_2 + 3y_1y_2$
  2. Find the angle between the vectors $\mathbf{u} = (1, 2, -1)$ and $\mathbf{v} = (2, 0, 3)$ using the standard inner product in $\mathbb{R}^3$
  3. Determine whether the vectors $\mathbf{u} = (1, 1, 1)$, $\mathbf{v} = (1, -1, 0)$, and $\mathbf{w} = (1, 1, -2)$ form an orthogonal set in $\mathbb{R}^3$
  4. Find the orthogonal projection of the vector $\mathbf{u} = (3, 4)$ onto the subspace spanned by $\mathbf{v} = (1, 1)$ in $\mathbb{R}^2$
  5. Apply the Gram-Schmidt process to the vectors $\mathbf{v}_1 = (1, 0, 1)$, $\mathbf{v}_2 = (1, 1, 0)$, and $\mathbf{v}_3 = (0, 1, 1)$ to construct an orthonormal basis for $\mathbb{R}^3$
  6. Given a dataset with features $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, explain how PCA uses orthogonal projections to reduce the dimensionality of the data
  7. Discuss how the Gram-Schmidt process can be used in the QR factorization of a matrix and its applications in solving linear systems and least squares problems
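For self-checking, the short NumPy snippet below verifies problems 2 and 4 numerically; the remaining problems can be checked the same way with the sketches given in the earlier sections.

```python
import numpy as np

# Problem 2: angle between u = (1, 2, -1) and v = (2, 0, 3)
u, v = np.array([1.0, 2.0, -1.0]), np.array([2.0, 0.0, 3.0])
theta = np.degrees(np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))))
print(f"Problem 2: angle = {theta:.2f} degrees")

# Problem 4: projection of u = (3, 4) onto span{(1, 1)}: proj = (<u, v>/<v, v>) v
u2, v2 = np.array([3.0, 4.0]), np.array([1.0, 1.0])
proj = (np.dot(u2, v2) / np.dot(v2, v2)) * v2
print(f"Problem 4: projection = {proj}")   # expected (3.5, 3.5)
```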


