👁️Computer Vision and Image Processing Unit 2 Review

2.5 Geometric transformations

Written by the Fiveable Content Team • Last updated August 2025

Geometric transformations let you manipulate the spatial relationships between pixels in an image. They're fundamental to computer vision tasks like image registration, feature matching, and 3D reconstruction. This guide covers the main transformation types, their matrix representations, and how they're applied in practice.

Types of geometric transformations

Geometric transformations change where pixels end up in an image. Different transformations preserve different properties (distances, angles, parallelism), and knowing which properties are preserved tells you a lot about what each transformation can and can't do.

Translation vs rotation

Translation moves every point in an image by a fixed distance in a specified direction. If you shift by $t_x$ in x and $t_y$ in y:

$(x', y') = (x + t_x,\ y + t_y)$

Rotation turns every point around a fixed center by angle $\theta$:

$(x', y') = (x \cos \theta - y \sin \theta,\ x \sin \theta + y \cos \theta)$

Both are rigid transformations, meaning they preserve distances between points and the shape/size of objects. The difference: translation also preserves orientation (nothing tilts), while rotation changes it.
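
As a quick check of the rigidity property, here is a minimal NumPy sketch (the helper names are my own) that applies both transformations and confirms that the distance between two points survives:

```python
import numpy as np

def translate(pts, tx, ty):
    """Shift every point by (tx, ty)."""
    return pts + np.array([tx, ty])

def rotate(pts, theta):
    """Rotate every point about the origin by angle theta (radians)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return pts @ R.T

pts = np.array([[1.0, 0.0], [0.0, 2.0]])
d_before = np.linalg.norm(pts[0] - pts[1])

for moved in (translate(pts, 3, -1), rotate(pts, np.pi / 4)):
    d_after = np.linalg.norm(moved[0] - moved[1])
    assert np.isclose(d_before, d_after)  # distances survive rigid motion
```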

Scaling vs shearing

Scaling multiplies coordinates by a scale factor to change object size.

  • Uniform scaling uses the same factor for both axes: $(x', y') = (s \cdot x,\ s \cdot y)$
  • Non-uniform scaling uses different factors: $(x', y') = (s_x \cdot x,\ s_y \cdot y)$

Shearing slants an object along one axis, distorting its shape while preserving area.

  • Horizontal shear: $(x', y') = (x + ky,\ y)$
  • Vertical shear: $(x', y') = (x,\ y + kx)$

Scaling changes size; shearing changes angles. Both are used for perspective correction and image warping.
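
One way to see the "shearing preserves area" claim concretely is through determinants: the determinant of a 2×2 linear map is its area scaling factor. A small NumPy sketch (the factors are chosen arbitrarily):

```python
import numpy as np

# Non-uniform scale: area should grow by s_x * s_y = 6.
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])
# Horizontal shear with k = 0.5: area should be unchanged.
H = np.array([[1.0, 0.5],
              [0.0, 1.0]])

assert np.isclose(np.linalg.det(S), 6.0)  # scaling changes area
assert np.isclose(np.linalg.det(H), 1.0)  # shearing preserves area

# The shear still changes angles: the unit y-axis vector tilts to (0.5, 1).
tilted = H @ np.array([0.0, 1.0])
```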

Affine vs projective transformations

Affine transformations combine translation, rotation, scaling, and shearing. The key property: they preserve parallelism. Lines that are parallel before the transformation stay parallel after. In 2D, an affine transformation is represented by a 2×3 matrix (or 3×3 in homogeneous coordinates).

Projective transformations (also called homographies in 2D) are more general. They map lines to lines but do not necessarily preserve parallelism. Think of how railroad tracks appear to converge toward the horizon. In 2D, a projective transformation uses a full 3×3 matrix; in 3D, a 4×4 matrix.

Every affine transformation is a special case of a projective transformation, but not the other way around. Projective transformations are essential for modeling camera perspective and reconstructing 3D scenes.

Matrix representation

Matrices give you a unified way to represent, apply, and combine geometric transformations. Instead of handling each transformation type differently, you express them all as matrix multiplications.

Homogeneous coordinates

Standard Euclidean coordinates can't represent translation as a matrix multiplication (it's an addition). Homogeneous coordinates fix this by adding an extra dimension:

  • 2D point $(x, y)$ becomes $(x, y, 1)$
  • 3D point $(x, y, z)$ becomes $(x, y, z, 1)$

With this extra coordinate, all geometric transformations (including translation) become matrix multiplications. Homogeneous coordinates also let you represent points at infinity, which matters for projective geometry.
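
A minimal NumPy sketch of the idea (the translation values are arbitrary): with the extra coordinate, translation is just a matrix multiply.

```python
import numpy as np

# Translation by (4, -2) as a 3x3 matrix in homogeneous coordinates.
T = np.array([[1.0, 0.0,  4.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0,  1.0]])

p = np.array([3.0, 5.0, 1.0])  # the point (3, 5) with w = 1 appended
p_moved = T @ p                # matrix multiply, not addition
# p_moved[:2] is (7, 3): the point translated by (4, -2)
```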

Transformation matrices

Using homogeneous coordinates, here are the standard 3×3 matrices for 2D transformations:

Translation: $\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$

Rotation by angle $\theta$: $\begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Scaling: $\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$

To apply a transformation, multiply the matrix by the homogeneous coordinate vector of each point.

Composition of transformations

You can chain multiple transformations by multiplying their matrices together. For example, to rotate then translate, you multiply $T \cdot R$ (the rightmost matrix is applied first).

Two things to remember:

  • Order matters. Matrix multiplication is not commutative. Rotating then translating gives a different result than translating then rotating.
  • Efficiency. You can pre-multiply all your transformation matrices into a single matrix, then apply that one matrix to every point. This is much faster than applying each transformation separately.
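
Both points can be demonstrated in a few lines of NumPy (the helper names are my own):

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

T, R = translation(5, 0), rotation(np.pi / 2)
p = np.array([1.0, 0.0, 1.0])

# Rotate first, then translate: the rightmost matrix acts first.
a = (T @ R) @ p   # (1,0) rotates to (0,1), then shifts to (5, 1)
# Translate first, then rotate: a different result.
b = (R @ T) @ p   # (1,0) shifts to (6,0), then rotates to (0, 6)
```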

2D transformations

These operate on images in a two-dimensional plane and form the basis for most image processing tasks.

2D translation

Shifts every pixel by $(t_x, t_y)$. The matrix:

$\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$

Preserves shape, size, and orientation. Common uses: image alignment, object tracking, and correcting camera shake.

2D rotation

Rotates every pixel around a center point by angle $\theta$. The matrix (for rotation about the origin):

$\begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$

To rotate around an arbitrary point $(c_x, c_y)$, you compose three steps: translate the center to the origin, rotate, then translate back. Preserves shape and size but changes orientation.
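
The translate-rotate-translate-back composition can be sketched like this (the helper name and values are my own):

```python
import numpy as np

def rotate_about(cx, cy, theta):
    """3x3 homogeneous matrix rotating by theta about (cx, cy)."""
    c, s = np.cos(theta), np.sin(theta)
    T_to = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    T_back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return T_back @ R @ T_to   # rightmost matrix acts first

# 180 degrees about (2, 2): the point (3, 2) reflects through the center.
M = rotate_about(2, 2, np.pi)
q = M @ np.array([3.0, 2.0, 1.0])   # lands at (1, 2)
```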

2D scaling

Resizes objects by scale factors $s_x$ and $s_y$. The matrix:

$\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$

When $s_x = s_y$, you get uniform scaling (aspect ratio preserved). When they differ, you get non-uniform scaling, which distorts proportions. Used for image resizing, zooming, and multi-scale analysis.


2D shearing

Slants an object along one axis. The matrices:

Horizontal shear (slants along x): $\begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Vertical shear (slants along y): $\begin{bmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Shearing preserves area and, like every affine transformation, parallelism; what it changes are the angles between lines. It's used in perspective correction and visual effects.

3D transformations

These extend 2D concepts into three-dimensional space using 4×4 matrices in homogeneous coordinates.

3D translation

Moves points by a vector $(t_x, t_y, t_z)$:

$\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Preserves shape, size, and orientation. Used for positioning 3D objects and simulating camera movement.

3D rotation

Rotates points around a specified axis. Unlike 2D (where there's only one rotation axis), 3D has three basic rotation matrices, one for each coordinate axis. For example, rotation around the z-axis by angle $\theta$:

$\begin{bmatrix} \cos \theta & -\sin \theta & 0 & 0 \\ \sin \theta & \cos \theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Arbitrary 3D rotations are composed by multiplying rotations around individual axes. The order of these rotations matters (this is related to the concept of Euler angles and gimbal lock).
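
A small NumPy sketch of the order-dependence (the axis choices and angles are arbitrary):

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def rot_x(theta):
    """4x4 homogeneous rotation about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]])

p = np.array([1.0, 0.0, 0.0, 1.0])
a = rot_x(np.pi / 2) @ rot_z(np.pi / 2) @ p   # z-rotation first -> (0, 0, 1)
b = rot_z(np.pi / 2) @ rot_x(np.pi / 2) @ p   # x-rotation first -> (0, 1, 0)
```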

3D scaling

Scales objects along each axis independently:

$\begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Uniform scaling ($s_x = s_y = s_z$) preserves proportions. Non-uniform scaling distorts them. Used for model resizing and level-of-detail representations.

3D shearing

Slants a 3D object along one or more planes (xy, yz, xz). Like 2D shearing, it preserves volume but changes angles. Applied in deformation modeling and special effects.

Projective geometry

Projective geometry extends Euclidean geometry by including points at infinity. This framework models how 3D scenes look when projected through a camera onto a 2D image.

Perspective projection

Perspective projection maps 3D points onto a 2D image plane, mimicking how a camera captures a scene. It's represented by a 3×4 projection matrix that combines:

  • Intrinsic parameters: focal length, principal point, pixel scaling
  • Extrinsic parameters: camera position and orientation in the world

This projection produces effects like foreshortening (distant objects appear smaller) and vanishing points (parallel lines converge). Understanding this matrix is fundamental to camera modeling.
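
A toy sketch of the projection pipeline, with assumed intrinsic values (focal length 800 px, principal point (320, 240)) and an identity pose:

```python
import numpy as np

f, cx, cy = 800.0, 320.0, 240.0
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0, 1.0]])                    # intrinsic parameters
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])  # extrinsics: camera at origin
P = K @ Rt                                     # the 3x4 projection matrix

X = np.array([0.5, -0.25, 2.0, 1.0])  # 3D point in homogeneous coordinates
x = P @ X
u, v = x[:2] / x[2]  # divide by depth: this division produces foreshortening
```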

Homography

A homography is a projective transformation between two planes, represented by a 3×3 matrix. If you have corresponding points between two images of the same planar surface, you can compute the homography that maps one to the other.

Key properties:

  • Preserves collinearity (points on a line stay on a line)
  • Requires at least 4 point correspondences to solve (since the 3×3 matrix has 8 degrees of freedom, up to scale)

Applications include image stitching (panoramas), augmented reality (overlaying graphics on flat surfaces), and camera calibration.
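
A minimal NumPy sketch of why a homography is not affine: applying it requires dividing by the third homogeneous coordinate (the matrix values here are arbitrary):

```python
import numpy as np

# A mild perspective term in the bottom row makes this non-affine.
H = np.array([[1.0,   0.0, 0.0],
              [0.0,   1.0, 0.0],
              [0.001, 0.0, 1.0]])

def apply_homography(H, x, y):
    """Map (x, y) through H and de-homogenize."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Points farther along x get compressed, like tracks receding to the horizon:
p1 = apply_homography(H, 100, 0)  # x maps to 100 / 1.1
p2 = apply_homography(H, 200, 0)  # x maps to 200 / 1.2
```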

Vanishing points

When parallel lines in 3D (like road edges or building edges) are projected onto a 2D image, they appear to converge at a vanishing point. A set of parallel lines sharing the same 3D direction all converge to the same vanishing point.

Vanishing points are useful because they encode information about 3D scene geometry. You can use them to estimate camera orientation, infer the 3D layout of a scene, and detect dominant directions in architectural images.

Applications in computer vision


Image registration

Image registration aligns multiple images of the same scene, taken from different viewpoints or at different times. The process typically involves:

  1. Detecting and matching features across images
  2. Estimating the transformation (translation, rotation, scaling, or a full homography) that best aligns the matched features
  3. Warping one image to match the other using the estimated transformation

Used in medical imaging (aligning scans over time), remote sensing (combining satellite images), and panorama stitching.
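
For the simplest motion model, pure translation, step 2 reduces to averaging the matched displacements. A toy NumPy sketch with made-up correspondences:

```python
import numpy as np

# Matched feature locations in the source image (made-up values).
src = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 25.0]])
# Their locations in the destination image: shifted by (5, -3) plus noise.
noise = np.array([[0.1, -0.1], [-0.2, 0.0], [0.1, 0.1]])
dst = src + np.array([5.0, -3.0]) + noise

# Least-squares estimate of a pure translation: the mean displacement.
t_hat = (dst - src).mean(axis=0)   # close to the true shift (5, -3)
```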

Camera calibration

Camera calibration determines the intrinsic parameters (focal length, principal point, lens distortion) and extrinsic parameters (position and orientation) of a camera. A common approach uses a known calibration pattern (like a checkerboard):

  1. Capture multiple images of the pattern at different orientations
  2. Detect the pattern's corners in each image
  3. Use the known 3D geometry and detected 2D points to solve for camera parameters

Accurate calibration is critical for 3D reconstruction, augmented reality, and any application where you need to make real-world measurements from images.

3D reconstruction

3D reconstruction recovers the three-dimensional structure of a scene from 2D images. It relies heavily on projective geometry and multi-view geometry:

  • Stereo vision uses two calibrated cameras to triangulate 3D points from corresponding 2D points
  • Structure from Motion (SfM) estimates both camera poses and 3D structure from a sequence of images
  • Depth sensors (like LiDAR or structured light) provide direct 3D measurements

Applications include autonomous navigation, 3D modeling, and scene understanding.
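
The stereo triangulation step can be sketched with the standard linear (DLT) method in NumPy; the camera matrices and point below are toy values:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation from two views via SVD."""
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # null vector of A
    return X[:3] / X[3]   # de-homogenize

# Two toy cameras: one at the origin, one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, 0.3, 4.0])
x1 = X_true[:2] / X_true[2]                   # projection in camera 1
x2 = (X_true - [1, 0, 0])[:2] / X_true[2]     # projection in camera 2
X_rec = triangulate(P1, P2, x1, x2)           # recovers the 3D point
```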

Implementation techniques

OpenCV for transformations

OpenCV is the most widely used open-source computer vision library. Key functions for geometric transformations include:

  • cv2.warpAffine() for affine transformations
  • cv2.warpPerspective() for projective transformations
  • cv2.getRotationMatrix2D() to build a rotation matrix
  • cv2.findHomography() to compute a homography from point correspondences
  • cv2.calibrateCamera() for camera calibration

Available in both C++ and Python.

MATLAB for transformations

MATLAB's Image Processing Toolbox provides high-level functions like imwarp(), affine2d(), and projective2d() for applying transformations. Its visualization tools make it particularly useful for prototyping and debugging transformation pipelines.

Python libraries for transformations

Beyond OpenCV, several Python libraries handle geometric transformations:

  • NumPy: efficient matrix operations for building and applying transformation matrices
  • SciPy (scipy.ndimage): functions like affine_transform() for image warping
  • Pillow (PIL): basic transformations like resize, rotate, and crop
  • scikit-image: more advanced warping and geometric transformation tools via skimage.transform

Optimization of transformations

Inverse transformations

When warping an image, you typically use the inverse transformation rather than the forward one. Instead of asking "where does this source pixel go?", you ask "which source pixel maps to this destination pixel?" This avoids holes in the output image where no source pixel lands.

For simple transformations, the inverse is straightforward (e.g., the inverse of a rotation by $\theta$ is a rotation by $-\theta$). For composed transformations, the inverse of $A \cdot B$ is $B^{-1} \cdot A^{-1}$.
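
A quick NumPy check of the reversed-order rule (the matrices below are an arbitrary rotation and translation):

```python
import numpy as np

theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])        # rotation
B = np.array([[1.0, 0.0, 7.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])        # translation

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
assert np.allclose(lhs, rhs)               # inverse reverses the order
assert np.allclose(np.linalg.inv(A), A.T)  # rotation inverse = rotation by -theta
```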

Efficient computation methods

  • Matrix decomposition (e.g., LU or SVD) speeds up solving transformation equations
  • Caching precomputed transformation matrices avoids redundant calculations when applying the same transformation to many images
  • Fixed-point arithmetic replaces floating-point operations with integer math for faster computation on embedded systems
  • Look-up tables for trigonometric values (sin, cos) used in rotation can reduce computation time

Parallel processing techniques

Geometric transformations are highly parallelizable because each output pixel can be computed independently.

  • GPU acceleration: libraries like CUDA and OpenCL process thousands of pixels simultaneously
  • SIMD instructions: vectorized CPU operations that apply the same transformation to multiple pixels in a single clock cycle
  • Batch processing: applying transformations to multiple images concurrently across CPU cores
  • Distributed computing: frameworks like Apache Spark for processing very large image datasets across multiple machines