PCA is the workhorse of dimensionality reduction, and you'll encounter it everywhere in data science—from preprocessing high-dimensional datasets to building recommendation systems and visualizing complex data. But here's what exams really test: your understanding of the linear algebra mechanics underneath the algorithm. You're being tested on eigendecomposition, variance maximization, orthogonal projections, and matrix transformations—PCA just happens to be the perfect vehicle for all of these concepts.
Don't just memorize "standardize, then find eigenvectors." Know why each step exists and what linear algebra principle it demonstrates. When an FRQ asks you to explain why we use the covariance matrix or what eigenvalues actually represent, you need to connect the dots between the algorithm and the underlying mathematics. Master these connections, and PCA questions become straightforward applications of concepts you already understand.
Before any linear algebra magic happens, your data needs to be in the right form. Standardization ensures that the covariance matrix reflects true relationships between features, not artifacts of measurement scale.
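As a concrete illustration, here is a minimal NumPy sketch of z-score standardization on made-up placeholder values (the array `X` is not from any real dataset):

```python
import numpy as np

# Toy data: 5 samples, 2 features on very different scales (placeholder values)
X = np.array([
    [170.0, 65000.0],
    [165.0, 48000.0],
    [180.0, 72000.0],
    [175.0, 55000.0],
    [160.0, 43000.0],
])

# Z-score standardization: subtract each column's mean, divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Every feature now has mean 0 and unit variance, so no feature
# dominates the covariance matrix purely because of its scale
print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # [1, 1]
```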
The covariance matrix is the foundation of PCA—it encodes everything about how your features relate to each other. This symmetric matrix contains all the information needed to find directions of maximum spread in your data.
Compare: Covariance matrix vs. Correlation matrix—both capture feature relationships, but the correlation matrix is already standardized (values between -1 and 1). If your data is standardized first, they're identical. FRQs may ask when you'd use one over the other.
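To make that comparison concrete, here is a small sketch on random placeholder data showing that the covariance matrix of standardized data matches the correlation matrix of the raw data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # placeholder data: 100 samples, 3 features
X[:, 1] *= 50.0                         # exaggerate one feature's scale

# Standardize with the sample standard deviation (ddof=1) so the match below is exact
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov_raw = np.cov(X, rowvar=False)       # scale-dependent: dominated by the rescaled column
cov_std = np.cov(X_std, rowvar=False)   # covariance after standardization
corr    = np.corrcoef(X, rowvar=False)  # correlation of the raw data

print(np.allclose(cov_std, corr))       # True: standardized covariance == correlation
```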
This is where core linear algebra takes center stage. Eigenvectors of the covariance matrix point in directions of maximum variance; eigenvalues tell you how much variance each direction captures.
Compare: Eigenvalues vs. Singular values—in PCA on standardized data, the squared singular values of the data matrix, divided by n − 1, equal the eigenvalues of the covariance matrix, so the two routes carry the same information. SVD is more numerically stable because it never forms the covariance matrix explicitly, which is why libraries like scikit-learn use it internally.
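The sketch below, again on random placeholder data, checks that relationship directly by running both routes side by side:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                    # placeholder data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(X_std, rowvar=False)                # symmetric 4x4 matrix
eigvals, eigvecs = np.linalg.eigh(cov)           # eigh exploits symmetry
eigvals = eigvals[::-1]                          # eigh returns ascending order; flip to descending

# Route 2: SVD of the standardized data matrix itself
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

# Squared singular values, divided by n - 1, recover the eigenvalues
n = X_std.shape[0]
print(np.allclose(eigvals, S**2 / (n - 1)))      # True
```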
Choosing how many components to keep is both art and science. The goal is to retain enough variance to preserve meaningful structure while eliminating noise and redundancy.
Compare: Keeping 2 components vs. keeping 10—with 2 components, you can visualize data in 2D but may lose important structure. With 10, you preserve more information but lose interpretability. The right choice depends on your downstream task.
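One common, practical criterion is a cumulative explained-variance threshold. The sketch below shows how that looks with scikit-learn on random placeholder data (whose spectrum is nearly flat, so k lands close to 10; real correlated data usually shows a much sharper elbow):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))                  # placeholder data: 300 samples, 10 features
X_std = StandardScaler().fit_transform(X)

pca = PCA().fit(X_std)                          # keep all components to inspect the spectrum

# Cumulative explained variance: how much structure the first k components retain
cum_var = np.cumsum(pca.explained_variance_ratio_)
k = int(np.argmax(cum_var >= 0.95)) + 1         # smallest k reaching 95% of the variance
print(k, cum_var[k - 1])

# Equivalently, let scikit-learn choose k for a variance threshold
pca_95 = PCA(n_components=0.95).fit(X_std)
print(pca_95.n_components_)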
The final step applies everything you've computed. Projection is a linear transformation that maps your original data onto the subspace spanned by the selected principal components.
Compare: PCA projection vs. feature selection—PCA creates new composite features (linear combinations), while feature selection keeps original features intact. PCA is better for correlated features; selection preserves interpretability.
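Here is a minimal sketch of that projection as plain matrix multiplication, on random placeholder data. Because the eigenvectors of a symmetric covariance matrix are orthonormal, the projected features come out uncorrelated, and the same matrix (transposed) maps you back for an approximate reconstruction:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))                    # placeholder data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the covariance matrix, sorted by decreasing eigenvalue
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]                        # top-2 principal components, shape (5, 2)

# Projection is a single matrix multiplication onto the PC subspace
X_proj = X_std @ W                               # shape (150, 2)

# Orthonormal eigenvectors guarantee uncorrelated projected features:
# the covariance of the projection is diagonal, with the top eigenvalues on the diagonal
print(np.round(np.cov(X_proj, rowvar=False), 3))

# Approximate reconstruction in the original (standardized) space uses the transpose
X_recon = X_proj @ W.T                           # shape (150, 5)
```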
| Concept | Best Examples |
|---|---|
| Matrix centering/scaling | Data standardization, z-score transformation |
| Symmetric matrix properties | Covariance matrix calculation, guaranteed orthogonal eigenvectors |
| Eigendecomposition | Eigenvalue/eigenvector computation, characteristic equation |
| Variance maximization | Sorting by eigenvalues, explained variance ratio |
| Orthogonal projection | Projecting data onto PCs, matrix multiplication |
| Dimensionality trade-offs | Selecting principal components, elbow method |
| Linear transformation | Final projection, reconstruction via the transpose of the component matrix |
Why must data be standardized before computing the covariance matrix, and what would happen to your principal components if you skipped this step with features on different scales?
Which two PCA steps both rely directly on the eigenvalues of the covariance matrix, and how does each step use them differently?
Compare and contrast the information contained in eigenvectors versus eigenvalues—if someone gave you only the eigenvectors, what could you determine about the data, and what would be missing?
If your first three principal components explain 95% of the variance, what does this tell you about the effective dimensionality of your original dataset? How might this inform your modeling choices?
Explain why the projected data has uncorrelated features. What property of the eigenvectors guarantees this, and why is it useful for downstream analysis?