Why This Matters
Matrix multiplication is the backbone of nearly every data science algorithm you'll encounter—from neural network forward passes to PCA dimensionality reduction to solving linear regression with the normal equation. You're being tested not just on whether you can multiply matrices, but on whether you understand why the order matters, how properties like associativity let you optimize computation, and when you can (or can't) rearrange operations.
These properties fall into distinct categories: structural rules that govern when and how multiplication works, algebraic identities that let you simplify expressions, and derived properties that connect multiplication to other operations like transposes and determinants. Don't just memorize formulas—know which property applies when you're debugging a dimension mismatch, optimizing a chain of matrix operations, or proving that a transformation preserves certain characteristics.
Structural Rules: When Multiplication Works
Before you can use any other property, you need matrices that can actually be multiplied. These rules define the fundamental constraints and behaviors of matrix multiplication.
The core requirement: inner dimensions must match, and the order of operands changes the result.
Compatibility (Dimension Matching)
- Inner dimensions must agree—for AB to exist, if A is m×n, then B must be n×p, yielding an m×p result
- Output dimensions come from outer dimensions—the rows of the first matrix and columns of the second determine your result's shape
- Dimension errors are the most common bug in data science code; always verify shapes before multiplying in NumPy or PyTorch
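As a quick check of the compatibility rule, here is a minimal NumPy sketch (shapes chosen arbitrarily for illustration) that verifies the inner dimensions before multiplying:

```python
import numpy as np

A = np.random.rand(3, 4)   # 3x4
B = np.random.rand(4, 2)   # 4x2

# Inner dimensions agree (4 == 4), so A @ B exists and is 3x2.
C = A @ B
print(C.shape)             # (3, 2)

# B @ A is undefined: inner dimensions 2 and 3 do not match.
if B.shape[1] == A.shape[0]:
    print((B @ A).shape)
else:
    print(f"Cannot multiply: {B.shape} x {A.shape}")
```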
Non-Commutativity
- Order matters: AB≠BA in general—even when both products are defined, they typically produce different results
- Geometric interpretation: applying transformation A then B is different from B then A—think rotation then scaling vs. scaling then rotation
- Critical for neural networks—layer order in forward propagation cannot be rearranged without changing the model entirely
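The sketch below, using two small hand-picked matrices purely for illustration, shows that even when both products are defined, AB and BA differ:

```python
import numpy as np

# Two square matrices, so both AB and BA are defined.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # a shear
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # a reflection that swaps axes

print(A @ B)                      # [[2. 1.], [1. 0.]]
print(B @ A)                      # [[0. 1.], [1. 2.]]
print(np.allclose(A @ B, B @ A))  # False: order changes the result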
Compare: Compatibility vs. Non-Commutativity—compatibility tells you whether you can multiply, while non-commutativity tells you whether order matters (it almost always does). If an exam asks why AB exists but BA doesn't, that's a compatibility issue; if both exist but differ, that's non-commutativity.
Algebraic Identities: Simplifying Expressions
These properties let you restructure matrix expressions without changing results—essential for both hand calculations and computational optimization.
Use these rules to factor, expand, and regroup matrix products strategically.
Associativity
- Parentheses can shift: A(BC)=(AB)C—the grouping doesn't change the final result, only the intermediate steps
- Computational optimization—choosing the right grouping can dramatically reduce operations; in a chain like ABv where v is a 100×1 vector, computing Bv first keeps every intermediate result a vector rather than a full matrix
- Enables chain rule derivations in backpropagation where you need to regroup gradient computations
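A minimal sketch of the grouping trade-off, with sizes chosen to mirror the vector-last chain described above (exact timings vary by machine, but the result is identical either way):

```python
import numpy as np

A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)
c = np.random.rand(1000, 1)

# (A @ B) @ c: the 1000x1000 times 1000x1000 product alone costs ~10^9 multiplications.
left = (A @ B) @ c

# A @ (B @ c): two matrix-vector products, roughly 2 * 10^6 multiplications.
right = A @ (B @ c)

# Associativity guarantees both groupings give the same answer (up to rounding).
print(np.allclose(left, right))   # True
```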
Distributivity Over Addition
- Multiplication distributes: A(B+C)=AB+AC—works on both left and right: (B+C)A=BA+CA
- Essential for expanding expressions when solving matrix equations or simplifying loss function gradients
- Connects to linearity—this property is why matrix multiplication represents linear transformations
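A quick numerical check of both distributive laws, using arbitrary random shapes that keep every product defined:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
C = np.random.rand(4, 2)
D = np.random.rand(2, 5)

# Left distributivity: A(B + C) = AB + AC
print(np.allclose(A @ (B + C), A @ B + A @ C))   # True

# Right distributivity: (B + C)D = BD + CD
print(np.allclose((B + C) @ D, B @ D + C @ D))   # True
```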
Multiplication by Identity Matrix
- Identity is neutral: AI=IA=A—the identity matrix I leaves any compatible matrix unchanged
- I is the matrix equivalent of multiplying by 1—it has 1s on the diagonal and 0s elsewhere
- Foundation for inverses—we define A^{-1} as the matrix where AA^{-1}=A^{-1}A=I
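A minimal sketch of the identity's neutral behavior; note that a non-square A needs a different-sized identity on each side:

```python
import numpy as np

A = np.random.rand(3, 4)

I3 = np.eye(3)   # left identity must be 3x3
I4 = np.eye(4)   # right identity must be 4x4

print(np.allclose(I3 @ A, A))   # True
print(np.allclose(A @ I4, A))   # True
```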
Multiplication by Zero Matrix
- Zero annihilates: A⋅0=0—multiplying by the zero matrix (all entries 0) always yields a zero matrix
- Represents null transformation—maps every vector to the zero vector, collapsing all information
- Useful in proofs when showing that certain conditions force a result to vanish
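And the corresponding check for the zero matrix, with shapes chosen arbitrarily so the product is defined:

```python
import numpy as np

A = np.random.rand(3, 4)
Z = np.zeros((4, 2))   # zero matrix with a compatible inner dimension

# Every entry of the product is 0: the transformation collapses all information.
print(np.allclose(A @ Z, np.zeros((3, 2))))   # True
```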
Compare: Identity vs. Zero Matrix—both are special matrices that produce predictable results, but identity preserves information while zero destroys it. Exam questions often test whether you recognize these as the multiplicative analogues of 1 and 0 from scalar arithmetic.
Derived Properties: Connecting Operations
These properties link multiplication to other matrix operations—transposes, determinants, inverses, and traces. They're crucial for proofs and for understanding how transformations compose.
When you combine operations, the order often reverses.
Transpose of a Product
- Order reverses: (AB)^T=B^T A^T—transposing a product flips the order of the factors
- Extends to chains: (ABC)^T=C^T B^T A^T—each factor transposes and the entire sequence reverses
- Essential for gradient derivations—backpropagation relies heavily on this property when computing weight updates
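A short numerical confirmation of the order reversal; the reversed order is also what keeps the shapes compatible:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)

# (AB)^T is 2x3, and B^T A^T is (2x4)(4x3) = 2x3; the values match too.
print(np.allclose((A @ B).T, B.T @ A.T))   # True

# The un-reversed order A^T B^T would not even be dimension-compatible here:
# (4x3)(2x4) has mismatched inner dimensions.
```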
Inverse of a Product
- Order reverses: (AB)^{-1}=B^{-1}A^{-1}—only valid when both A and B are individually invertible
- Think "socks and shoes"—to undo putting on socks then shoes, you remove shoes first, then socks
- Critical for solving ABx=b—you can write x=B^{-1}A^{-1}b, though computing inverses directly is often avoided in practice
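A sketch of the inverse rule, using diagonally dominant random matrices so invertibility is guaranteed, followed by the solve-based alternative to forming inverses explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3 * np.eye(3)   # strictly diagonally dominant => invertible
B = rng.random((3, 3)) + 3 * np.eye(3)

# (AB)^{-1} = B^{-1} A^{-1}, with the factor order reversed.
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True

# In practice, solve ABx = b directly rather than forming inverses.
b = rng.random(3)
x = np.linalg.solve(A @ B, b)
print(np.allclose(A @ B @ x, b))   # True
```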
Compare: Transpose vs. Inverse of a Product—both reverse the order of factors, but the transpose always exists while the inverse requires invertibility. If asked to simplify (AB)^T(AB)^{-1}, you need both rules and must verify invertibility.
Determinant of a Product
- Determinants multiply: det(AB)=det(A)⋅det(B)—the determinant of a product equals the product of determinants
- Geometric meaning—determinants measure volume scaling, so composing transformations multiplies their scaling factors
- Invertibility test—if det(AB)=0, then at least one of A or B is singular (non-invertible)
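A numerical check of the multiplicative rule and of the singularity consequence; the matrices are random except for one built to be explicitly singular:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 3))
B = rng.random((3, 3))

# det(AB) = det(A) * det(B), up to floating-point rounding.
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True

# If one factor is singular, the product is singular too.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # rows are dependent, det(S) = 0
T = rng.random((2, 2))
print(np.isclose(np.linalg.det(S @ T), 0.0))             # True
```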
Trace of a Product
- Cyclic property: tr(AB)=tr(BA)—trace is invariant under cyclic permutations, even when AB≠BA
- Extends to chains: tr(ABC)=tr(CAB)=tr(BCA)—you can cycle but not arbitrarily reorder
- Key in optimization—appears in Frobenius norm calculations and matrix calculus for machine learning objectives
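A sketch contrasting cyclic invariance with full commutativity and with arbitrary reordering; the matrices are random, so the final "typically False" check could in principle come out True by coincidence:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((3, 3))
B = rng.random((3, 3))
C = rng.random((3, 3))

# tr(AB) = tr(BA) even though AB != BA.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))           # True
print(np.allclose(A @ B, B @ A))                              # False

# Cycling a three-factor product preserves the trace...
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))   # True
# ...but swapping two factors (a non-cyclic reordering) generally does not.
print(np.isclose(np.trace(A @ B @ C), np.trace(A @ C @ B)))   # typically False
```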
Compare: Determinant vs. Trace of a Product—determinant is fully multiplicative (det(AB)=det(A)det(B)), while trace is not: tr(AB)≠tr(A)tr(B) in general, though it is cyclically invariant (tr(AB)=tr(BA)). Both appear in eigenvalue problems: the determinant equals the product of the eigenvalues, the trace their sum.
Quick Reference Table
| Category | Properties |
| --- | --- |
| Structural constraints | Compatibility, Non-commutativity |
| Expression simplification | Associativity, Distributivity |
| Special matrices | Identity matrix, Zero matrix |
| Order-reversing properties | Transpose of product, Inverse of product |
| Scalar-valued properties | Determinant of product, Trace of product |
| Computational optimization | Associativity (choosing grouping) |
| Invertibility conditions | Determinant of product, Inverse of product |
| Gradient/backprop essentials | Transpose of product, Distributivity |
Self-Check Questions
- If A is 3×4 and B is 4×2, what are the dimensions of AB? Can you compute BA? Which property determines this?
- You need to compute ABc where A is 1000×1000, B is 1000×1000, and c is 1000×1. Which grouping—(AB)c or A(Bc)—is computationally cheaper, and which property guarantees they give the same result?
- Compare and contrast the transpose-of-product rule (AB)^T=B^T A^T with the inverse-of-product rule (AB)^{-1}=B^{-1}A^{-1}. What do they share, and when does one apply but not the other?
- If det(A)=3 and det(B)=0, what is det(AB)? What does this tell you about the invertibility of AB?
- True or false: tr(AB)=tr(A)⋅tr(B). If false, what is the correct trace property for products, and how does it differ from the determinant property?