Polar coordinates $(r, \theta)$ represent points in a 2D plane using a distance $r$ from the origin and an angle $\theta$ from the positive x-axis. They're the natural choice for problems with circular symmetry, such as circular motion or gravitational fields.

Spherical coordinates $(r, \theta, \phi)$ extend this idea to 3D by adding a polar angle $\phi$ measured from the positive z-axis. These are advantageous whenever the geometry has spherical symmetry (electric fields around point charges, angular momentum problems).

Conversion formulas to Cartesian $(x, y, z)$:

Polar: $x = r\cos\theta$, $y = r\sin\theta$
Spherical: $x = r\sin\phi\cos\theta$, $y = r\sin\phi\sin\theta$, $z = r\cos\phi$

Note the convention used here: $\phi$ is the polar angle from the z-axis and $\theta$ is the azimuthal angle in the xy-plane. Some textbooks swap these labels, so always check which convention your course follows.
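The conversion formulas above can be sketched in a few lines of Python. The function names are illustrative choices, not from any particular library, and the angle convention follows this section ($\theta$ azimuthal, $\phi$ polar from the z-axis).

```python
import math

def polar_to_cartesian(r, theta):
    """Map polar (r, theta) to Cartesian (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def spherical_to_cartesian(r, theta, phi):
    """Map spherical (r, theta, phi) to Cartesian (x, y, z).

    theta is the azimuthal angle in the xy-plane and phi is the polar
    angle measured from the positive z-axis (this section's convention).
    """
    sin_phi = math.sin(phi)
    return (r * sin_phi * math.cos(theta),
            r * sin_phi * math.sin(theta),
            r * math.cos(phi))
```

With this convention, $\phi = 0$ lands on the positive z-axis for any $\theta$, and $\phi = \pi/2$ places the point in the xy-plane. If your course swaps the labels, swap the arguments accordingly.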

Cylindrical and Elliptical Coordinate Systems

Cylindrical coordinates $(r, \theta, z)$ combine polar coordinates in the xy-plane with a standard Cartesian z-coordinate. They're the go-to system for problems with cylindrical symmetry: fluid flow through pipes, electromagnetic fields in waveguides, etc.

Elliptical (ellipsoidal) coordinates $(\xi, \eta, \phi)$ are built from families of confocal ellipsoids and hyperboloids. These show up when solving PDEs like Laplace's equation or the wave equation on ellipsoidal domains.

The guiding principle for choosing a coordinate system: pick the one whose level surfaces align with the natural boundaries of your region. This collapses complicated boundary descriptions into simple constant-coordinate surfaces, which makes setting up integration limits far easier.
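To see the principle in action, here is a rough numerical sketch that computes the area of a disk of radius $a$ two ways. The function names and the midpoint-rule grid size are assumptions made for illustration. In Cartesian coordinates the $y$-limits depend on $x$; in polar coordinates both limits are constants and the area element is $r\, dr\, d\theta$.

```python
import math

def disk_area_cartesian(a, n=2000):
    """Area of the disk x^2 + y^2 <= a^2 using Cartesian limits.

    The y-limits depend on x: y runs from -sqrt(a^2 - x^2) to
    +sqrt(a^2 - x^2), so the inner integral of 1 dy is 2*sqrt(a^2 - x^2).
    The outer integral over x is approximated with the midpoint rule.
    """
    dx = 2.0 * a / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * dx
        total += 2.0 * math.sqrt(a * a - x * x) * dx
    return total

def disk_area_polar(a, n=2000):
    """Same area in polar coordinates: r in [0, a], theta in [0, 2*pi].

    Both limits are constants; the integrand is the polar area factor r.
    """
    dr = a / n
    inner = sum((i + 0.5) * dr * dr for i in range(n))  # integral of r dr
    return 2.0 * math.pi * inner  # theta integral of a constant
```

Both functions approximate $\pi a^2$, but the polar version needs no square roots and no variable limits because the boundary $r = a$ is a constant-coordinate curve.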

Physical Applications

Mass and Moment of Inertia Calculations

Change of variables lets you compute mass and center of mass for objects whose geometry doesn't fit neatly into Cartesian coordinates. The key is transforming the volume element correctly.

In Cartesian coordinates the mass element is $dm = \rho(x, y, z)\, dV$. Under a change to spherical coordinates this becomes:

$dm = \rho(r, \theta, \phi)\, r^2 \sin\phi\, dr\, d\theta\, d\phi$

The factor $r^2 \sin\phi$ is the absolute value of the Jacobian determinant for the Cartesian-to-spherical transformation. Forgetting this factor is one of the most common mistakes on exams.
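As a sanity check on the volume element, the following sketch integrates a density over a ball of radius $R$ with a simple midpoint rule. The function name `mass_spherical`, the grid size, and the example densities are assumptions for illustration; the essential part is the factor $r^2 \sin\phi$ multiplying the density.

```python
import math

def mass_spherical(density, R, n=40):
    """Approximate the mass of a ball of radius R by a midpoint rule.

    density(r, theta, phi) uses this section's convention:
    theta in [0, 2*pi] is azimuthal, phi in [0, pi] is the polar angle.
    """
    dr = R / n
    dtheta = 2.0 * math.pi / n
    dphi = math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for k in range(n):
            phi = (k + 0.5) * dphi
            jacobian = r * r * math.sin(phi)  # the factor that is easy to forget
            for j in range(n):
                theta = (j + 0.5) * dtheta
                total += density(r, theta, phi) * jacobian
    return total * dr * dtheta * dphi

# Uniform density 1 on the unit ball should give roughly 4*pi/3.
uniform_mass = mass_spherical(lambda r, theta, phi: 1.0, 1.0)
```

Dropping the `jacobian` factor would return the volume of the parameter box, $R \cdot 2\pi \cdot \pi$, rather than the mass of the ball.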

The moment of inertia tensor $I$ quantifies an object's resistance to rotational acceleration. It's a $3 \times 3$ symmetric matrix:

Diagonal elements $I_{xx}, I_{yy}, I_{zz}$: moments of inertia about the coordinate axes
Off-diagonal elements $I_{xy}, I_{xz}, I_{yz}$: products of inertia, which capture coupling between rotational axes

Choosing coordinate axes that line up with the object's symmetry axes often zeroes out the off-diagonal terms automatically. Use spherical coordinates for spheres and shells, cylindrical coordinates for shafts and disks.
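The effect of symmetry on the off-diagonal terms is easy to see for a handful of point masses. The sketch below assumes the common definition $I_{ab} = \sum m\,(r^2 \delta_{ab} - x_a x_b)$; sign conventions for the products of inertia vary between textbooks, and the function name and sample configurations are made up for illustration.

```python
def inertia_tensor(points):
    """Inertia tensor of point masses given as (m, x, y, z) tuples.

    Uses I_ab = sum of m * (r^2 * delta_ab - x_a * x_b), so the
    off-diagonal entries are minus the sums of m * x_a * x_b.
    """
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, x, y, z in points:
        pos = (x, y, z)
        r2 = x * x + y * y + z * z
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                tensor[a][b] += m * (r2 * delta - pos[a] * pos[b])
    return tensor

# Masses placed symmetrically on the x- and y-axes: off-diagonals vanish.
aligned = inertia_tensor([(1, 1, 0, 0), (1, -1, 0, 0),
                          (1, 0, 1, 0), (1, 0, -1, 0)])

# Two masses on the line y = x: the x and y axes are now coupled.
tilted = inertia_tensor([(1, 1, 1, 0), (1, -1, -1, 0)])
```

In the first configuration the tensor is diagonal; in the second, $I_{xy}$ is nonzero because the coordinate axes no longer match the symmetry of the mass distribution.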

Surface Area Calculations

For a parametric surface $\mathbf{r}(u, v) = (x(u,v),\, y(u,v),\, z(u,v))$, the surface area is:

$A = \iint_D \left\lVert \frac{\partial \mathbf{r}}{\partial u} \times \frac{\partial \mathbf{r}}{\partial v} \right\rVert \, du\, dv$

The cross product of the two partial derivatives gives a vector normal to the surface, and its magnitude times $du\, dv$ is the infinitesimal area element $dS$.

Worked example: surface area of a sphere of radius $R$

Parametrize with $\mathbf{r}(\theta, \phi) = (R\sin\phi\cos\theta,\, R\sin\phi\sin\theta,\, R\cos\phi)$ where $\theta \in [0, 2\pi)$ and $\phi \in [0, \pi]$. Computing the cross product gives $\lVert \mathbf{r}_\theta \times \mathbf{r}_\phi \rVert = R^2 \sin\phi$, so:

$A = \int_0^{2\pi}\int_0^{\pi} R^2 \sin\phi\, d\phi\, d\theta = 4\pi R^2$
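The same result can be checked numerically. The sketch below is one possible approach, not a reference implementation: it approximates the partial derivatives by central differences, takes the norm of their cross product, and sums with a midpoint rule. The helper names, step size `h`, and grid size `n` are assumptions.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_area(r, u_range, v_range, n=200, h=1e-6):
    """Midpoint-rule estimate of the area of the surface r(u, v)."""
    (u0, u1), (v0, v1) = u_range, v_range
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            # Central-difference approximations of the partial derivatives.
            r_u = tuple((p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v)))
            r_v = tuple((p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h)))
            normal = cross(r_u, r_v)
            total += math.sqrt(sum(c * c for c in normal))
    return total * du * dv

def sphere(radius):
    """Parametrization r(theta, phi) of a sphere, as in the worked example."""
    def r(theta, phi):
        return (radius * math.sin(phi) * math.cos(theta),
                radius * math.sin(phi) * math.sin(theta),
                radius * math.cos(phi))
    return r

area = surface_area(sphere(2.0), (0.0, 2.0 * math.pi), (0.0, math.pi))
```

For $R = 2$ the estimate should be close to $4\pi R^2 = 16\pi$. Because the derivatives are computed numerically, the same function works for other parametrizations where the cross product is tedious by hand.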

The Divergence Theorem (Gauss-Ostrogradsky) connects surface integrals to volume integrals: the outward flux of a continuously differentiable vector field $\mathbf{F}$ through a closed surface $S$ equals the volume integral of $\nabla \cdot \mathbf{F}$ over the enclosed region $V$. When a change of variables simplifies the volume integral, this theorem lets you avoid computing the surface integral directly.
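As a quick check of the theorem, take the illustrative field $\mathbf{F} = (x, y, z)$ (chosen here only as an example) on the ball of radius $R$ centered at the origin. Its divergence is the constant 3, so the volume side is immediate:

$\iiint_V \nabla \cdot \mathbf{F}\, dV = 3 \cdot \tfrac{4}{3}\pi R^3 = 4\pi R^3$

On the sphere the outward unit normal is $\mathbf{n} = \mathbf{r}/R$, so $\mathbf{F} \cdot \mathbf{n} = R$ everywhere on $S$, and the surface side follows from the area of the sphere:

$\iint_S \mathbf{F} \cdot \mathbf{n}\, dS = R \cdot 4\pi R^2 = 4\pi R^3$

The two sides agree, as the theorem requires.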

Probability Distributions

Probability Density Functions and Random Variables

A joint probability density function (PDF) $f(x_1, \ldots, x_n)$ describes a continuous multivariate distribution. The probability that the random vector $(X_1, \ldots, X_n)$ falls in a region $A$ is:

$P((X_1, \ldots, X_n) \in A) = \int_A f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$

The change-of-variables theorem for PDFs is the probabilistic analog of what you've been doing with multiple integrals. If $\mathbf{Y} = \mathbf{g}(\mathbf{X})$ is a one-to-one, differentiable transformation with differentiable inverse $\mathbf{X} = \mathbf{h}(\mathbf{Y})$, then:

$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(\mathbf{h}(\mathbf{y}))\, \lvert \det J_{\mathbf{h}}(\mathbf{y}) \rvert$

The absolute value of the Jacobian determinant plays exactly the same role here as it does in a standard change-of-variables integral: it accounts for how the transformation stretches or compresses volume.
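A one-dimensional instance makes the formula concrete. In the sketch below, $X$ is standard normal and $Y = e^X$, so $h(y) = \ln y$ and $\lvert h'(y) \rvert = 1/y$ for $y > 0$. The example distribution and the helper names are assumptions chosen for illustration.

```python
import math

def std_normal_pdf(x):
    """Density of a standard normal random variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def transformed_pdf(f_x, h, abs_det_jac_h):
    """Build f_Y(y) = f_X(h(y)) * |det J_h(y)| from its three ingredients."""
    return lambda y: f_x(h(y)) * abs_det_jac_h(y)

# Y = exp(X): inverse h(y) = log(y), with |h'(y)| = 1 / y for y > 0.
f_y = transformed_pdf(std_normal_pdf, math.log, lambda y: 1.0 / y)

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

total_probability = integrate(f_y, 0.0, 50.0)
```

The resulting `f_y` is the lognormal density, and `total_probability` should come out close to 1. Without the $1/y$ factor the transformed function would not integrate to 1, which is the probabilistic version of forgetting the Jacobian.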

Two related constructions that rely on integration:

Marginal PDF: integrate out all other variables to get the distribution of a single variable $X_i$: $f_{X_i}(x_i) = \int f(x_1, \ldots, x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n$
Conditional PDF: divide the joint by the marginal, wherever the marginal is positive: $f_{X_i \mid X_j}(x_i \mid x_j) = \frac{f_{X_i, X_j}(x_i, x_j)}{f_{X_j}(x_j)}$
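Both constructions can be tried on a small example. The sketch below assumes an illustrative joint density $f(x, y) = x + y$ on the unit square (not taken from the text above) and uses a midpoint rule for the integrals; all function names are hypothetical.

```python
def integrate(f, a, b, n=1000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def joint(x, y):
    """Illustrative joint PDF on the unit square: f(x, y) = x + y."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0

def marginal_x(x):
    """f_X(x): integrate the joint density over y."""
    return integrate(lambda y: joint(x, y), 0.0, 1.0)

def conditional_y_given_x(y, x):
    """f_{Y|X}(y | x) = f(x, y) / f_X(x), defined where f_X(x) > 0."""
    fx = marginal_x(x)
    if fx <= 0.0:
        raise ValueError("conditional density is undefined where the marginal is zero")
    return joint(x, y) / fx
```

For this density the marginal works out to $f_X(x) = x + \tfrac{1}{2}$ on $[0, 1]$, and for each fixed $x$ the conditional density integrates to 1 over $y$.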

Applications in Statistics and Machine Learning

The multivariate normal distribution is the workhorse model for correlated random variables. Its PDF is:

$f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det \Sigma}}\, \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$

where $\boldsymbol{\mu}$ is the mean vector and $\Sigma$ is the (positive-definite) covariance matrix. A linear change of variables $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$ with invertible $A$ transforms one multivariate normal into another, with mean $A\boldsymbol{\mu} + \mathbf{b}$ and covariance $A \Sigma A^\top$, which is why linear algebra and change-of-variables techniques are so tightly linked in statistics.
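For $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$ with $\mathbf{X}$ multivariate normal, the standard result is that $\mathbf{Y}$ has mean $A\boldsymbol{\mu} + \mathbf{b}$ and covariance $A \Sigma A^\top$. The sketch below computes both with plain nested lists; the helper names and the sample numbers are assumptions for illustration.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def affine_transform_normal(mu, sigma, A, b):
    """Return (mean, covariance) of Y = A X + b for X ~ N(mu, sigma)."""
    mean = [sum(a * m for a, m in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]
    cov = mat_mul(mat_mul(A, sigma), transpose(A))
    return mean, cov

# Example: Y1 = X1 + X2 and Y2 = X1 - X2 + 1.
mu = [1.0, 2.0]
sigma = [[2.0, 0.5], [0.5, 1.0]]
A = [[1.0, 1.0], [1.0, -1.0]]
b = [0.0, 1.0]
mean_y, cov_y = affine_transform_normal(mu, sigma, A, b)
```

Here the variance of $Y_1 = X_1 + X_2$ is $2 + 1 + 2(0.5) = 4$, which matches the top-left entry of `cov_y`.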

Bayesian inference updates a prior distribution $f(\boldsymbol{\theta})$ to a posterior distribution using observed data $\mathbf{x}$:

$f(\boldsymbol{\theta} | \mathbf{x}) \propto f(\mathbf{x} | \boldsymbol{\theta})\, f(\boldsymbol{\theta})$

Computing the normalizing constant requires integrating over the full parameter space, which is often high-dimensional. Two standard ways around this, both of which benefit from coordinate transformations:

MCMC methods (Metropolis-Hastings, Gibbs sampling) generate samples from complex posteriors by constructing Markov chains whose stationary distribution matches the target. Reparametrizations can dramatically improve convergence.
Variational inference sidesteps direct integration by approximating the posterior with a simpler family of distributions, minimizing the Kullback-Leibler divergence between the approximation and the true posterior.

In both cases, a well-chosen change of variables can turn an intractable integral into a tractable one, which is the same core idea you've been applying throughout this unit.
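To make the MCMC idea concrete, here is a minimal random-walk Metropolis sketch, which is one simple member of the Metropolis-Hastings family. It only needs the target density up to a constant, which is why the normalizing integral never has to be computed. The function name, the Gaussian proposal, the step size, and the toy target are all assumptions for illustration.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis sampler for a 1D unnormalized log-density."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

# Toy target: a standard normal known only up to its normalizing constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

For this target the sample mean should be near 0 and the sample variance near 1. How quickly the chain mixes depends on the step size relative to the scale of the target, which is one reason a reparametrization that rescales or decorrelates the parameters can help.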