Geometric transformations let you manipulate the spatial relationships between pixels in an image. They're fundamental to computer vision tasks like image registration, feature matching, and 3D reconstruction. This guide covers the main transformation types, their matrix representations, and how they're applied in practice.
Types of geometric transformations
Geometric transformations change where pixels end up in an image. Different transformations preserve different properties (distances, angles, parallelism), and knowing which properties are preserved tells you a lot about what each transformation can and can't do.
Translation vs rotation
Translation moves every point in an image by a fixed distance in a specified direction. If you shift by t_x in x and t_y in y:

x' = x + t_x
y' = y + t_y
Rotation turns every point around a fixed center by angle θ. About the origin:

x' = x cos θ - y sin θ
y' = x sin θ + y cos θ
Both are rigid transformations, meaning they preserve distances between points and the shape/size of objects. The difference: translation also preserves orientation (nothing tilts), while rotation changes it.
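A quick NumPy check of these properties (the points, shift, and angle below are arbitrary illustration values):

```python
import numpy as np

# Two points forming a segment.
p, q = np.array([1.0, 0.0]), np.array([3.0, 0.0])

# Translation by (t_x, t_y) = (2, 5): x' = x + t_x, y' = y + t_y.
t = np.array([2.0, 5.0])
p_t, q_t = p + t, q + t

# Rotation about the origin by theta = 90 degrees:
# x' = x cos(theta) - y sin(theta), y' = x sin(theta) + y cos(theta).
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
p_r, q_r = R @ p, R @ q

# Both are rigid: the segment length is unchanged.
print(np.linalg.norm(q - p))      # 2.0
print(np.linalg.norm(q_t - p_t))  # 2.0
print(np.linalg.norm(q_r - p_r))  # 2.0 (up to floating-point error)

# But only translation preserves the segment's direction.
print(q_t - p_t)  # still points along +x
print(q_r - p_r)  # now points along +y
```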
Scaling vs shearing
Scaling multiplies coordinates by a scale factor to change object size.
- Uniform scaling uses the same factor for both axes: x' = s·x, y' = s·y
- Non-uniform scaling uses different factors: x' = s_x·x, y' = s_y·y
Shearing slants an object along one axis, distorting its shape while preserving area.
- Horizontal shear: x' = x + sh_x·y, y' = y
- Vertical shear: x' = x, y' = y + sh_y·x
Scaling changes size; shearing changes angles. Both are used for perspective correction and image warping.
Affine vs projective transformations
Affine transformations combine translation, rotation, scaling, and shearing. The key property: they preserve parallelism. Lines that are parallel before the transformation stay parallel after. In 2D, an affine transformation is represented by a 2×3 matrix (or 3×3 in homogeneous coordinates).
Projective transformations (also called homographies in 2D) are more general. They map lines to lines but do not necessarily preserve parallelism. Think of how railroad tracks appear to converge toward the horizon. In 2D, a projective transformation uses a full 3×3 matrix; in 3D, a 4×4 matrix.
Every affine transformation is a special case of a projective transformation, but not the other way around. Projective transformations are essential for modeling camera perspective and reconstructing 3D scenes.
Matrix representation
Matrices give you a unified way to represent, apply, and combine geometric transformations. Instead of handling each transformation type differently, you express them all as matrix multiplications.
Homogeneous coordinates
Standard Euclidean coordinates can't represent translation as a matrix multiplication (it's an addition). Homogeneous coordinates fix this by adding an extra dimension:
- 2D point (x, y) becomes (x, y, 1)
- 3D point (x, y, z) becomes (x, y, z, 1)
With this extra coordinate, all geometric transformations (including translation) become matrix multiplications. Homogeneous coordinates also let you represent points at infinity, which matters for projective geometry.
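As a small sketch, here is translation written as a matrix multiplication on a homogeneous 2D point (the point and offsets are illustrative values):

```python
import numpy as np

# Translation cannot be written as a 2x2 matrix product in Euclidean
# coordinates, but in homogeneous coordinates it becomes one.
point = np.array([2.0, 3.0, 1.0])  # (x, y) = (2, 3) written as (x, y, 1)

T = np.array([[1.0, 0.0, 4.0],    # t_x = 4
              [0.0, 1.0, -1.0],   # t_y = -1
              [0.0, 0.0, 1.0]])

moved = T @ point
print(moved)  # [6. 2. 1.] -> the Euclidean point (6, 2)
```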
Transformation matrices
Using homogeneous coordinates, here are the standard 3×3 matrices for 2D transformations:
Translation:

| 1  0  t_x |
| 0  1  t_y |
| 0  0   1  |

Rotation by angle θ:

| cos θ  -sin θ  0 |
| sin θ   cos θ  0 |
|   0       0    1 |

Scaling:

| s_x   0   0 |
|  0   s_y  0 |
|  0    0   1 |
To apply a transformation, multiply the matrix by the homogeneous coordinate vector of each point.
Composition of transformations
You can chain multiple transformations by multiplying their matrices together. For example, to rotate then translate, you multiply T·R (the rightmost matrix is applied first).
Two things to remember:
- Order matters. Matrix multiplication is not commutative. Rotating then translating gives a different result than translating then rotating.
- Efficiency. You can pre-multiply all your transformation matrices into a single matrix, then apply that one matrix to every point. This is much faster than applying each transformation separately.
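Both points can be seen in a short NumPy sketch (the angle and offsets are illustrative):

```python
import numpy as np

theta = np.pi / 2  # 90-degree rotation
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
T = np.array([[1, 0, 5],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

p = np.array([1.0, 0.0, 1.0])  # the point (1, 0)

# Rotate first, then translate: T @ R (rightmost matrix applies first).
print((T @ R) @ p)  # (5, 1)
# Translate first, then rotate: a different result.
print((R @ T) @ p)  # (0, 6)

# Pre-multiplying into one matrix gives the same answer as chaining.
M = T @ R
print(np.allclose(M @ p, T @ (R @ p)))  # True
```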
2D transformations
These operate on images in a two-dimensional plane and form the basis for most image processing tasks.
2D translation
Shifts every pixel by (t_x, t_y). The matrix:

| 1  0  t_x |
| 0  1  t_y |
| 0  0   1  |
Preserves shape, size, and orientation. Common uses: image alignment, object tracking, and correcting camera shake.
2D rotation
Rotates every pixel around a center point by angle θ. The matrix (for rotation about the origin):

| cos θ  -sin θ  0 |
| sin θ   cos θ  0 |
|   0       0    1 |
To rotate around an arbitrary point (c_x, c_y), you compose three steps: translate the center to the origin, rotate, then translate back. Preserves shape and size but changes orientation.
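The translate, rotate, translate-back recipe can be sketched in NumPy (the helper name rotation_about and the point values are illustrative):

```python
import numpy as np

def rotation_about(cx, cy, theta):
    """Rotate about (cx, cy): translate center to origin, rotate, translate back."""
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    R = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0, 0, 1]])
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return back @ R @ to_origin  # rightmost matrix applied first

M = rotation_about(2.0, 2.0, np.pi)  # 180 degrees about (2, 2)
p = np.array([3.0, 2.0, 1.0])
print(M @ p)  # (1, 2): the point swings to the opposite side of the center
print(M @ np.array([2.0, 2.0, 1.0]))  # (2, 2): the center itself is fixed
```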
2D scaling
Resizes objects by scale factors s_x and s_y. The matrix:

| s_x   0   0 |
|  0   s_y  0 |
|  0    0   1 |
When s_x = s_y, you get uniform scaling (aspect ratio preserved). When they differ, you get non-uniform scaling, which distorts proportions. Used for image resizing, zooming, and multi-scale analysis.

2D shearing
Slants an object along one axis. The matrices:
Horizontal shear (slants along x):

| 1  sh_x  0 |
| 0    1   0 |
| 0    0   1 |

Vertical shear (slants along y):

|  1    0  0 |
| sh_y  1  0 |
|  0    0  1 |
Shearing preserves area (both matrices above have determinant 1) and, like every affine transformation, it preserves parallelism; what it changes are the angles between lines. It's used in perspective correction and visual effects.
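A quick check of the area claim, using the determinant (the shear factor is an arbitrary illustration value):

```python
import numpy as np

sh = 0.8  # arbitrary shear factor
H = np.array([[1, sh, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)  # horizontal shear: x' = x + sh*y

# Determinant 1 means the mapping preserves area.
print(np.linalg.det(H))  # 1.0

# The unit square's corners slant, but parallel edges stay parallel.
square = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float).T
print((H @ square)[:2].T)  # [[0, 0], [1, 0], [1.8, 1], [0.8, 1]]
```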
3D transformations
These extend 2D concepts into three-dimensional space using 4×4 matrices in homogeneous coordinates.
3D translation
Moves points by a vector (t_x, t_y, t_z):

| 1  0  0  t_x |
| 0  1  0  t_y |
| 0  0  1  t_z |
| 0  0  0   1  |
Preserves shape, size, and orientation. Used for positioning 3D objects and simulating camera movement.
3D rotation
Rotates points around a specified axis. Unlike 2D (where there's only one rotation axis), 3D has three basic rotation matrices, one for each coordinate axis. For example, rotation around the z-axis by angle θ:

| cos θ  -sin θ  0  0 |
| sin θ   cos θ  0  0 |
|   0       0    1  0 |
|   0       0    0  1 |
Arbitrary 3D rotations are composed by multiplying rotations around individual axes. The order of these rotations matters (this is related to the concept of Euler angles and gimbal lock).
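A minimal demonstration that 3D rotation order matters, using just the 3×3 rotation parts (the helper names and angles are illustrative):

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rot_x(theta):
    """Rotation about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

a, b = np.pi / 2, np.pi / 2
# Rotating about z then x is not the same as x then z.
print(np.allclose(rot_x(b) @ rot_z(a), rot_z(a) @ rot_x(b)))  # False
```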
3D scaling
Scales objects along each axis independently:

| s_x   0    0   0 |
|  0   s_y   0   0 |
|  0    0   s_z  0 |
|  0    0    0   1 |
Uniform scaling (s_x = s_y = s_z) preserves proportions. Non-uniform scaling distorts them. Used for model resizing and level-of-detail representations.
3D shearing
Slants a 3D object along one or more planes (xy, yz, xz). Like 2D shearing, it preserves volume but changes angles. Applied in deformation modeling and special effects.
Projective geometry
Projective geometry extends Euclidean geometry by including points at infinity. This framework models how 3D scenes look when projected through a camera onto a 2D image.
Perspective projection
Perspective projection maps 3D points onto a 2D image plane, mimicking how a camera captures a scene. It's represented by a 3×4 projection matrix that combines:
- Intrinsic parameters: focal length, principal point, pixel scaling
- Extrinsic parameters: camera position and orientation in the world
This projection produces effects like foreshortening (distant objects appear smaller) and vanishing points (parallel lines converge). Understanding this matrix is fundamental to camera modeling.
Homography
A homography is a projective transformation between two planes, represented by a 3×3 matrix. If you have corresponding points between two images of the same planar surface, you can compute the homography that maps one to the other.
Key properties:
- Preserves collinearity (points on a line stay on a line)
- Requires at least 4 point correspondences to solve: each correspondence gives two equations, and the 3×3 matrix has 8 degrees of freedom (it's only defined up to scale)
Applications include image stitching (panoramas), augmented reality (overlaying graphics on flat surfaces), and camera calibration.
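For illustration, a homography can be estimated from four correspondences with a minimal direct linear transform (DLT) sketch; a production pipeline would instead use cv2.findHomography, which adds normalization and outlier rejection. The find_homography helper and the point values here are assumptions for the example:

```python
import numpy as np

def find_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (4+ points, plain DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in h.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (last right singular vector).
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the overall scale

# Map the unit square onto a trapezoid (a genuinely projective warp).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (1.5, 1), (0.5, 1)]
H = find_homography(src, dst)

# Check one correspondence: apply H, then divide by the third coordinate.
p = H @ np.array([1.0, 0.0, 1.0])
print(p[:2] / p[2])  # approximately (2, 0)
```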
Vanishing points
When parallel lines in 3D (like road edges or building edges) are projected onto a 2D image, they appear to converge at a vanishing point. A set of parallel lines sharing the same 3D direction all converge to the same vanishing point.
Vanishing points are useful because they encode information about 3D scene geometry. You can use them to estimate camera orientation, infer the 3D layout of a scene, and detect dominant directions in architectural images.
Applications in computer vision
Image registration
Image registration aligns multiple images of the same scene, taken from different viewpoints or at different times. The process typically involves:
- Detecting and matching features across images
- Estimating the transformation (translation, rotation, scaling, or a full homography) that best aligns the matched features
- Warping one image to match the other using the estimated transformation
Used in medical imaging (aligning scans over time), remote sensing (combining satellite images), and panorama stitching.
Camera calibration
Camera calibration determines the intrinsic parameters (focal length, principal point, lens distortion) and extrinsic parameters (position and orientation) of a camera. A common approach uses a known calibration pattern (like a checkerboard):
- Capture multiple images of the pattern at different orientations
- Detect the pattern's corners in each image
- Use the known 3D geometry and detected 2D points to solve for camera parameters
Accurate calibration is critical for 3D reconstruction, augmented reality, and any application where you need to make real-world measurements from images.
3D reconstruction
3D reconstruction recovers the three-dimensional structure of a scene from 2D images. It relies heavily on projective geometry and multi-view geometry:
- Stereo vision uses two calibrated cameras to triangulate 3D points from corresponding 2D points
- Structure from Motion (SfM) estimates both camera poses and 3D structure from a sequence of images
- Depth sensors (like LiDAR or structured light) provide direct 3D measurements
Applications include autonomous navigation, 3D modeling, and scene understanding.
Implementation techniques
OpenCV for transformations
OpenCV is the most widely used open-source computer vision library. Key functions for geometric transformations include:
- cv2.warpAffine() for affine transformations
- cv2.warpPerspective() for projective transformations
- cv2.getRotationMatrix2D() to build a rotation matrix
- cv2.findHomography() to compute a homography from point correspondences
- cv2.calibrateCamera() for camera calibration
Available in both C++ and Python.
MATLAB for transformations
MATLAB's Image Processing Toolbox provides high-level functions like imwarp(), affine2d(), and projective2d() for applying transformations. Its visualization tools make it particularly useful for prototyping and debugging transformation pipelines.
Python libraries for transformations
Beyond OpenCV, several Python libraries handle geometric transformations:
- NumPy: efficient matrix operations for building and applying transformation matrices
- SciPy (scipy.ndimage): functions like affine_transform() for image warping
- Pillow (PIL): basic transformations like resize, rotate, and crop
- scikit-image: more advanced warping and geometric transformation tools via skimage.transform
Optimization of transformations
Inverse transformations
When warping an image, you typically use the inverse transformation rather than the forward one. Instead of asking "where does this source pixel go?", you ask "which source pixel maps to this destination pixel?" This avoids holes in the output image where no source pixel lands.
For simple transformations, the inverse is straightforward (e.g., the inverse of a rotation by θ is a rotation by -θ). For composed transformations, the inverse reverses the order: (AB)^-1 = B^-1 A^-1.
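Both facts are easy to verify numerically (the matrices below are illustrative):

```python
import numpy as np

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
T = np.array([[1, 0, 7], [0, 1, -2], [0, 0, 1]], dtype=float)

M = T @ R  # rotate, then translate

# The inverse of a product reverses the order.
M_inv = np.linalg.inv(R) @ np.linalg.inv(T)
print(np.allclose(M_inv, np.linalg.inv(M)))  # True

# Round-tripping a point through M and its inverse recovers the point.
p = np.array([3.0, 4.0, 1.0])
print(M_inv @ (M @ p))  # [3. 4. 1.]
```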
Efficient computation methods
- Matrix decomposition (e.g., LU or SVD) speeds up solving transformation equations
- Caching precomputed transformation matrices avoids redundant calculations when applying the same transformation to many images
- Fixed-point arithmetic replaces floating-point operations with integer math for faster computation on embedded systems
- Look-up tables for trigonometric values (sin, cos) used in rotation can reduce computation time
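A sketch of the look-up-table idea for whole-degree rotations (names like rotation_matrix_lut are illustrative, and real implementations tune the table resolution to the accuracy they need):

```python
import numpy as np

# Precompute sin/cos for every whole-degree angle once...
angles = np.deg2rad(np.arange(360))
SIN_LUT, COS_LUT = np.sin(angles), np.cos(angles)

def rotation_matrix_lut(deg):
    """Build a 2x2 rotation matrix from the table instead of calling sin/cos."""
    c, s = COS_LUT[deg % 360], SIN_LUT[deg % 360]
    return np.array([[c, -s], [s, c]])

# ...then reuse the table for every rotation at degree resolution.
print(rotation_matrix_lut(90))  # [[0, -1], [1, 0]] up to floating point
```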
Parallel processing techniques
Geometric transformations are highly parallelizable because each output pixel can be computed independently.
- GPU acceleration: libraries like CUDA and OpenCL process thousands of pixels simultaneously
- SIMD instructions: vectorized CPU operations that apply the same transformation to multiple pixels in a single clock cycle
- Batch processing: applying transformations to multiple images concurrently across CPU cores
- Distributed computing: frameworks like Apache Spark for processing very large image datasets across multiple machines