Linear Algebra for Data Science Unit 5 – Inner Products & Orthogonality in Linear Algebra

Inner products and orthogonality are fundamental concepts in linear algebra that extend the idea of dot products to abstract vector spaces. These concepts provide a framework for understanding geometric relationships between vectors, including angles and lengths. Orthogonal projections and the Gram-Schmidt process are key tools for decomposing vectors and constructing orthonormal bases. These techniques have wide-ranging applications in data science, including dimensionality reduction, feature extraction, and signal processing, making them essential for working with high-dimensional data and machine learning algorithms.

Key Concepts

  • Inner products generalize the notion of the dot product to abstract vector spaces
  • Properties of inner products include symmetry, linearity, and positive definiteness
  • Geometric interpretation of inner products relates to angles and lengths of vectors
  • Orthogonality describes vectors that are perpendicular to each other with respect to an inner product
  • Orthogonal projections decompose a vector into components parallel and orthogonal to a subspace
  • Gram-Schmidt process constructs an orthonormal basis from a linearly independent set of vectors
  • Applications in data science include dimensionality reduction, feature extraction, and signal processing
  • Understanding these concepts is crucial for working with high-dimensional data and machine learning algorithms

Definition and Properties of Inner Products

  • An inner product is a function that takes two vectors as input and returns a scalar value
  • Denoted as $\langle \mathbf{u}, \mathbf{v} \rangle$ for vectors $\mathbf{u}$ and $\mathbf{v}$
  • Satisfies the following properties for all vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ and scalar $c$:
    • Symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$
    • Linearity: $\langle c\mathbf{u}, \mathbf{v} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle$ and $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$
    • Positive definiteness: $\langle \mathbf{u}, \mathbf{u} \rangle \geq 0$, and $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ if and only if $\mathbf{u} = \mathbf{0}$
  • The standard inner product (dot product) in $\mathbb{R}^n$ is defined as $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^n u_i v_i$
  • Other examples of inner products include weighted inner products and inner products on function spaces
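As a minimal NumPy sketch of these definitions, the snippet below computes the standard and a weighted inner product and spot-checks symmetry, linearity, and positive definiteness on specific example vectors (the helper names `standard_inner` and `weighted_inner` are illustrative, not library functions, and an assert on particular vectors is a sanity check rather than a proof).

```python
import numpy as np

def standard_inner(u, v):
    """Standard inner product (dot product) on R^n: <u, v> = sum_i u_i v_i."""
    return float(np.dot(u, v))

def weighted_inner(u, v, w):
    """Weighted inner product <u, v>_w = sum_i w_i u_i v_i (weights w_i > 0)."""
    return float(np.sum(w * u * v))

u = np.array([1.0, 2.0, -1.0])
v = np.array([2.0, 0.0, 3.0])
w = np.array([2.0, 3.0, 1.0])   # positive weights keep the form positive definite

# Spot-check the defining properties on these particular vectors
assert np.isclose(standard_inner(u, v), standard_inner(v, u))          # symmetry
assert np.isclose(standard_inner(3 * u, v), 3 * standard_inner(u, v))  # linearity in the first argument
assert standard_inner(u, u) > 0                                        # positive definiteness (u != 0)
print(standard_inner(u, v), weighted_inner(u, v, w))
```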

Geometric Interpretation

  • The inner product of two vectors is related to their lengths and the angle between them
  • $\langle \mathbf{u}, \mathbf{v} \rangle = \|\mathbf{u}\| \|\mathbf{v}\| \cos \theta$, where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$
  • If $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal (perpendicular)
  • The inner product can be used to compute the length (norm) of a vector: $\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}$
  • The Cauchy-Schwarz inequality states that $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \|\mathbf{v}\|$, with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent
  • Geometric interpretation helps to visualize and understand the relationships between vectors in high-dimensional spaces
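As a rough illustration of the formulas above, the following snippet computes the angle between two arbitrarily chosen example vectors and numerically checks the Cauchy-Schwarz inequality.

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([2.0, 0.0, 3.0])

inner = np.dot(u, v)
norm_u = np.linalg.norm(u)        # ||u|| = sqrt(<u, u>)
norm_v = np.linalg.norm(v)

# Angle from <u, v> = ||u|| ||v|| cos(theta); clip guards against round-off outside [-1, 1]
cos_theta = np.clip(inner / (norm_u * norm_v), -1.0, 1.0)
theta = np.arccos(cos_theta)

print(f"<u, v> = {inner}, ||u|| = {norm_u:.4f}, angle = {np.degrees(theta):.2f} degrees")

# Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||
assert abs(inner) <= norm_u * norm_v + 1e-12
```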

Orthogonality and Orthogonal Vectors

  • Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if their inner product is zero: $\langle \mathbf{u}, \mathbf{v} \rangle = 0$
  • Orthogonal vectors are perpendicular to each other in the vector space
  • A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is orthogonal if $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0$ for all $i \neq j$
  • An orthogonal set of nonzero vectors is linearly independent
  • If the vectors in an orthogonal set are also unit vectors (length 1), the set is called orthonormal
  • Orthogonal and orthonormal bases have useful properties for representing and manipulating vectors in a vector space
    • Any vector can be uniquely expressed as a linear combination of the basis vectors
    • The coefficients in the linear combination are easily computed using inner products
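A small NumPy sketch of these ideas: it first verifies that three example vectors are pairwise orthogonal, then normalizes them into an orthonormal basis and reconstructs an arbitrary vector from its inner-product coefficients (the specific vectors are just illustrative).

```python
import numpy as np

# Candidate orthogonal set in R^3 (these particular vectors are pairwise orthogonal)
vecs = [np.array([1.0, 1.0, 1.0]),
        np.array([1.0, -1.0, 0.0]),
        np.array([1.0, 1.0, -2.0])]

# Pairwise inner products vanish for i != j
for i in range(len(vecs)):
    for j in range(i + 1, len(vecs)):
        assert np.isclose(np.dot(vecs[i], vecs[j]), 0.0)

# Normalize to get an orthonormal basis, then expand an arbitrary vector:
# x = sum_i <x, q_i> q_i, so the coefficients are just inner products
basis = [v / np.linalg.norm(v) for v in vecs]
x = np.array([3.0, -2.0, 5.0])
coeffs = [np.dot(x, q) for q in basis]
reconstruction = sum(c * q for c, q in zip(coeffs, basis))
assert np.allclose(reconstruction, x)
print(coeffs)
```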

Orthogonal Projections

  • Orthogonal projection decomposes a vector into components parallel and orthogonal to a subspace
  • The projection of a vector $\mathbf{u}$ onto a subspace $W$ is the closest point in $W$ to $\mathbf{u}$
  • Denoted as $\text{proj}_W(\mathbf{u})$, the orthogonal projection is characterized by:
    • $\text{proj}_W(\mathbf{u}) \in W$
    • $\mathbf{u} - \text{proj}_W(\mathbf{u})$ is orthogonal to every vector in $W$
  • If $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ is an orthonormal basis for $W$, then $\text{proj}_W(\mathbf{u}) = \sum_{i=1}^k \langle \mathbf{u}, \mathbf{w}_i \rangle \mathbf{w}_i$
  • Orthogonal projections are used in least squares approximation, signal denoising, and dimensionality reduction techniques (PCA)
  • The orthogonal complement of a subspace $W$, denoted $W^\perp$, is the set of all vectors orthogonal to every vector in $W$
    • $\mathbf{u} = \text{proj}_W(\mathbf{u}) + \text{proj}_{W^\perp}(\mathbf{u})$ for any vector $\mathbf{u}$
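The sketch below implements the projection formula from this list and numerically checks the two characterizing properties; the helper name `project_onto_subspace` and the particular plane $W \subset \mathbb{R}^3$ are illustrative choices, not a standard API.

```python
import numpy as np

def project_onto_subspace(u, W_basis):
    """Orthogonal projection of u onto span of the columns of W_basis, assumed
    orthonormal: proj_W(u) = sum_i <u, w_i> w_i."""
    coeffs = W_basis.T @ u               # inner products <u, w_i>
    return W_basis @ coeffs

# Orthonormal basis of a plane W in R^3 (columns are unit length, mutually orthogonal)
W_basis = np.column_stack([
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0),
])

u = np.array([2.0, 3.0, -1.0])
p = project_onto_subspace(u, W_basis)
r = u - p                                 # component in the orthogonal complement W^perp

assert np.allclose(W_basis.T @ r, 0.0)    # residual is orthogonal to every basis vector of W
assert np.allclose(p + r, u)              # u = proj_W(u) + proj_{W^perp}(u)
print(p, r)
```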

Gram-Schmidt Process

  • The Gram-Schmidt process constructs an orthonormal basis from a linearly independent set of vectors
  • Given a linearly independent set $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$, the process iteratively constructs an orthonormal set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$
  • The process works as follows:
    1. Set $\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|}$
    2. For $i = 2, \ldots, n$:
      • Compute $\mathbf{w}_i = \mathbf{v}_i - \sum_{j=1}^{i-1} \langle \mathbf{v}_i, \mathbf{u}_j \rangle \mathbf{u}_j$
      • Set $\mathbf{u}_i = \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|}$
  • The resulting set $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for the span of the original set
  • The classical Gram-Schmidt process can lose orthogonality due to rounding errors (numerical instability), so the modified Gram-Schmidt variant is often used in practice
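Below is a minimal implementation of the classical steps listed above (the function name `gram_schmidt` is an illustrative choice); an inline comment notes the one-line change that gives the more numerically stable modified variant.

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: rows of V are linearly independent vectors v_1..v_n;
    returns rows u_1..u_n forming an orthonormal basis of their span."""
    U = []
    for v in V:
        # Subtract the projection onto each previously built direction:
        # w_i = v_i - sum_j <v_i, u_j> u_j
        w = v.astype(float)
        for u in U:
            w = w - np.dot(v, u) * u   # modified GS would use np.dot(w, u) here for stability
        U.append(w / np.linalg.norm(w))
    return np.array(U)

V = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(V)
assert np.allclose(Q @ Q.T, np.eye(3))   # rows are orthonormal
print(Q)
```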

Applications in Data Science

  • Inner products and orthogonality are fundamental concepts in many data science and machine learning algorithms
  • Principal Component Analysis (PCA) uses orthogonal projections to find the directions of maximum variance in a dataset
    • The principal components are orthogonal and capture the most important information in the data
  • Singular Value Decomposition (SVD) factorizes a matrix into orthogonal matrices and a diagonal matrix of singular values
    • SVD is used in recommender systems, latent semantic analysis, and data compression
  • Orthogonal basis functions (Fourier, wavelets) are used in signal processing and feature extraction
    • Representing signals in an orthogonal basis can reveal important patterns and structures
  • Least squares regression finds the best-fitting linear model by orthogonally projecting the response vector onto the column space of the design matrix, which minimizes the norm of the residual
  • Orthogonal matching pursuit is a sparse coding algorithm that iteratively selects the basis vector most correlated with the current residual
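As a rough illustration of the PCA/SVD connection, the sketch below applies `np.linalg.svd` to centered toy data, confirms that the principal directions are orthonormal, and projects the data onto the top two directions; the random data and the choice of two components are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # toy data: 200 samples, 5 features
Xc = X - X.mean(axis=0)                   # center the data first

# SVD of the centered data: Xc = U S Vt, rows of Vt are orthonormal principal directions
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(Vt @ Vt.T, np.eye(5))  # the principal directions are orthonormal

k = 2
Z = Xc @ Vt[:k].T                         # orthogonal projection onto the top-k directions
print(Z.shape)                            # (200, 2): reduced-dimension representation
```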

Practice Problems and Examples

  1. Verify that the following function defines an inner product on $\mathbb{R}^2$: $\langle (x_1, y_1), (x_2, y_2) \rangle = 2x_1x_2 + 3y_1y_2$
  2. Find the angle between the vectors $\mathbf{u} = (1, 2, -1)$ and $\mathbf{v} = (2, 0, 3)$ using the standard inner product in $\mathbb{R}^3$
  3. Determine whether the vectors $\mathbf{u} = (1, 1, 1)$, $\mathbf{v} = (1, -1, 0)$, and $\mathbf{w} = (1, 1, -2)$ form an orthogonal set in $\mathbb{R}^3$
  4. Find the orthogonal projection of the vector $\mathbf{u} = (3, 4)$ onto the subspace spanned by $\mathbf{v} = (1, 1)$ in $\mathbb{R}^2$
  5. Apply the Gram-Schmidt process to the vectors $\mathbf{v}_1 = (1, 0, 1)$, $\mathbf{v}_2 = (1, 1, 0)$, and $\mathbf{v}_3 = (0, 1, 1)$ to construct an orthonormal basis for $\mathbb{R}^3$
  6. Given a dataset with features $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, explain how PCA uses orthogonal projections to reduce the dimensionality of the data
  7. Discuss how the Gram-Schmidt process can be used in the QR factorization of a matrix and its applications in solving linear systems and least squares problems
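For self-checking, the short NumPy snippet below verifies problems 2 and 4 numerically; the remaining problems can be checked the same way with the sketches given in the earlier sections.

```python
import numpy as np

# Problem 2: angle between u = (1, 2, -1) and v = (2, 0, 3)
u, v = np.array([1.0, 2.0, -1.0]), np.array([2.0, 0.0, 3.0])
theta = np.degrees(np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))))
print(f"Problem 2: angle = {theta:.2f} degrees")

# Problem 4: projection of u = (3, 4) onto span{(1, 1)}: proj = (<u, v>/<v, v>) v
u2, v2 = np.array([3.0, 4.0]), np.array([1.0, 1.0])
proj = (np.dot(u2, v2) / np.dot(v2, v2)) * v2
print(f"Problem 4: projection = {proj}")   # expected (3.5, 3.5)
```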


