Image preprocessing and enhancement are the steps you take to clean up, correct, and improve raw imagery before doing any real analysis. Without them, distortions, noise, and inconsistent radiometry can lead to wrong conclusions when you try to extract features, classify land cover, or detect changes over time.
This guide covers noise reduction, contrast enhancement, image restoration, geometric transformations, radiometric calibration, and color correction.
Image preprocessing fundamentals
Preprocessing prepares raw digital images for analysis by correcting distortions, removing noise, improving contrast, and standardizing pixel values. The goal is to make sure the data you feed into classification, feature extraction, or change detection algorithms is as clean and consistent as possible.
Noise reduction techniques
Median filtering replaces each pixel with the median value of its neighbors. It's especially good at removing salt-and-pepper noise (random bright/dark pixels) while keeping edges sharp, since the median isn't pulled by extreme outliers.
Gaussian filtering applies a weighted average where closer neighbors contribute more, following a Gaussian (bell curve) distribution. This smooths the image and reduces high-frequency noise, but it blurs edges more than median filtering does.
Bilateral filtering is a smarter version of Gaussian filtering. It weights neighbors by both spatial distance and intensity similarity, so it smooths flat regions while preserving edges. If two pixels are close together but very different in brightness, the filter won't blend them.
Non-local means filtering goes further by searching the entire image for patches that look similar to the current pixel's neighborhood. It averages those similar patches to estimate the noise-free value. This leverages the natural redundancy in images (many patches repeat) and tends to preserve fine detail better than local filters.
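A minimal NumPy sketch of median filtering shows why it rejects salt-and-pepper noise (the explicit loop is for clarity only; in practice you would use scipy.ndimage.median_filter or OpenCV's medianBlur):

```python
import numpy as np

def median_filter(img, size=3):
    """Replace each pixel with the median of its size x size neighborhood.
    Edges are handled by replicating the border pixels."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# A flat 100-valued patch with one salt-noise pixel: the median
# rejects the outlier, whereas a mean filter would smear it around.
img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255
clean = median_filter(img)
```

Because the single 255 outlier can never be the median of a 9-pixel window, it vanishes without disturbing any neighboring values.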
Contrast enhancement methods
Linear contrast stretching maps the original range of pixel values to the full dynamic range of the display. If your image only uses DN values 50–150 out of a possible 0–255, stretching remaps 50→0 and 150→255, spreading everything in between.
Gamma correction applies a power-law transformation: output = input^γ, with input normalized to [0, 1]. Values of γ < 1 brighten the image (lifting dark tones), while γ > 1 darkens it and increases contrast. This compensates for non-linear sensor or display responses.
Adaptive histogram equalization (AHE) divides the image into small tiles and applies histogram equalization independently to each one. This enhances local contrast, which is useful when different parts of the image have very different brightness ranges.
Contrast-limited adaptive histogram equalization (CLAHE) improves on AHE by capping the contrast enhancement in each tile. Without this limit, AHE can amplify noise in homogeneous regions. CLAHE clips the histogram at a user-defined threshold before computing the equalization, keeping noise under control.
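The clip step that distinguishes CLAHE from plain AHE can be sketched in a few lines (simplified: real CLAHE clips per tile and then bilinearly interpolates between neighboring tile mappings):

```python
import numpy as np

def clip_histogram(hist, clip_limit):
    """Cap each histogram bin at clip_limit and redistribute the
    clipped excess uniformly across all bins, preserving total count."""
    excess = np.maximum(hist - clip_limit, 0).sum()
    clipped = np.minimum(hist, clip_limit)
    return clipped + excess / hist.size

# One dominant bin (a large homogeneous region) is capped, limiting
# how steeply equalization can stretch -- and amplify noise in -- it.
hist = np.array([100.0, 4.0, 4.0, 4.0])
clipped = clip_histogram(hist, clip_limit=40.0)
```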
Histogram equalization
Histogram equalization is a global technique that redistributes pixel intensities so the histogram becomes approximately uniform. The steps are:
- Compute the histogram of the image (count of pixels at each intensity level).
- Calculate the cumulative distribution function (CDF) from that histogram.
- Map each original pixel value to a new value using the CDF, scaled to the full output range.
The effect is that the most common intensity values get spread apart, boosting overall contrast. This technique works well when useful information is concentrated in a narrow intensity range, but it can over-enhance noise in already-uniform areas since it's applied globally.
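The three steps above map directly to a few lines of NumPy (a sketch for 8-bit images; it assumes the image is not perfectly uniform, which would make the denominator zero):

```python
import numpy as np

def equalize(img):
    """Global histogram equalization of an 8-bit image:
    histogram -> CDF -> remap through a lookup table."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]               # first occupied intensity level
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast image using only DNs 100-103 is spread across 0-255.
img = np.array([[100, 101], [102, 103]], dtype=np.uint8)
eq = equalize(img)
```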
Spatial filtering for enhancement
Spatial filtering works by sliding a small convolution kernel (a matrix of weights) across the image. At each pixel, the kernel multiplies the surrounding pixel values by its weights and sums the result to produce the output value.
- Low-pass filters (e.g., averaging filter) suppress high-frequency components. They smooth the image and reduce noise, but blur edges.
- High-pass filters (e.g., Laplacian filter) accentuate high-frequency components. They sharpen edges and fine details, but also amplify noise.
- Unsharp masking subtracts a blurred (low-pass filtered) version of the image from the original, then adds the difference back. This emphasizes edges and improves perceived sharpness.
- Morphological operations (erosion, dilation, opening, closing) modify pixel values based on shape-based structuring elements. They can remove small objects, fill gaps, or extract structures of a particular size and shape.
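As an example, unsharp masking can be sketched with a 3x3 box blur standing in for the low-pass step (a simplification; Gaussian blurs are more common in practice):

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """sharpened = original + amount * (original - blurred), with a
    3x3 box blur as the low-pass step (edge-replicated borders)."""
    h, w = img.shape
    pad = np.pad(img.astype(float), 1, mode="edge")
    blurred = sum(pad[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    return np.clip(img + amount * (img - blurred), 0, 255)

# A step edge: pixels on each side of the edge are pushed apart,
# increasing perceived sharpness (with over/undershoot at the edge).
img = np.array([[10.0, 10.0, 200.0, 200.0]] * 3)
sharp = unsharp_mask(img)
```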
Color space transformations
Color space transformations convert images between different representations of color, each useful for different tasks.
- RGB to grayscale reduces a three-channel color image to a single intensity channel. Useful for simplifying processing when color isn't needed.
- RGB to HSV (hue, saturation, value) separates chromatic information (hue, saturation) from brightness (value). This makes color-based segmentation much easier since you can threshold hue independently of illumination.
- RGB to LAB (L*a*b*) maps colors into a perceptually uniform space where equal numerical distances correspond to equal perceived color differences. This is valuable for accurate color comparisons.
- YCbCr separates luminance (Y) from chrominance (Cb, Cr). Widely used in image compression (e.g., JPEG) because the human eye is less sensitive to chrominance detail, allowing heavier compression on those channels.
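As a small example, the RGB-to-grayscale conversion is just a weighted sum, here using the ITU-R BT.601 luminance weights that many imaging libraries default to:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted luminance (ITU-R BT.601): green dominates because the
    human eye is most sensitive to it."""
    return np.asarray(rgb) @ np.array([0.299, 0.587, 0.114])

# Pure green reads much brighter than pure blue at equal channel values.
green = rgb_to_gray([0.0, 255.0, 0.0])
blue = rgb_to_gray([0.0, 0.0, 255.0])
```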
Image restoration techniques
Image restoration tries to recover the original image from a degraded version. Unlike enhancement (which just makes things look better), restoration explicitly models the degradation process and attempts to invert it. In geospatial work, degradation commonly comes from atmospheric distortion, sensor limitations, or platform motion.
Deblurring vs denoising
Deblurring removes blur caused by camera shake, defocus, or atmospheric turbulence. The goal is to restore sharpness and recover lost detail.
Denoising removes random noise introduced during acquisition or transmission, aiming to suppress unwanted variation while preserving real image features.
These two problems are related but distinct. A key challenge is that deblurring tends to amplify noise, while aggressive denoising can interfere with blur estimation. In practice, you often need to address both simultaneously.
Inverse filtering approach
Inverse filtering is the most straightforward restoration method. It models degradation as convolution with a blur kernel (point spread function, or PSF) and attempts to undo it:
- Take the Fourier transform of the degraded image.
- Divide by the Fourier transform of the known blur kernel.
- Apply the inverse Fourier transform to get the restored image.
The problem: wherever the blur kernel's frequency response is near zero, dividing by it massively amplifies noise. This makes inverse filtering extremely sensitive to noise and prone to ringing artifacts. It's rarely used in practice, but it's important to understand as the conceptual starting point for more robust methods.
Wiener filtering
Wiener filtering improves on inverse filtering by incorporating knowledge of the noise. It minimizes the mean square error between the restored and original images by balancing deblurring against noise amplification.
The filter adapts based on the signal-to-noise ratio (SNR) at each frequency. Where SNR is high, it behaves like inverse filtering. Where SNR is low, it backs off to avoid amplifying noise. This requires the power spectra of both the signal and the noise, which can be estimated from the degraded image or assumed from prior knowledge.
Wiener filtering is much more robust than inverse filtering and remains a standard tool for image restoration.
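A frequency-domain sketch of the Wiener filter, using a single constant noise-to-signal power ratio in place of full power spectra (a common simplification): W = H* / (|H|² + NSR), where H is the PSF's frequency response. Setting NSR = 0 degenerates to plain inverse filtering.

```python
import numpy as np

def wiener_deconvolve(degraded, psf, nsr=1e-3):
    """Wiener deconvolution with a constant noise-to-signal ratio nsr;
    larger nsr backs off more where the PSF response is weak."""
    H = np.fft.fft2(psf, s=degraded.shape)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(np.fft.fft2(degraded) * W))

# Blur an impulse with a known 3x3 box PSF, then restore it. This is
# a noise-free case, so a tiny nsr suffices; noisy images need more.
img = np.zeros((16, 16))
img[8, 8] = 1.0
psf = np.ones((3, 3)) / 9.0
H = np.fft.fft2(psf, s=img.shape)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * H))
restored = wiener_deconvolve(blurred, psf, nsr=1e-8)
```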
Regularization methods
Regularization adds constraints or prior knowledge about the original image to stabilize the restoration and prevent noise amplification.
- Tikhonov regularization adds a smoothness penalty, discouraging solutions with rapid intensity variations. It's simple and effective but tends to over-smooth edges.
- Total variation (TV) regularization minimizes the total variation of the image, which promotes piecewise smooth solutions. It preserves edges much better than Tikhonov while still suppressing noise.
- Sparsity-based regularization assumes the image has a sparse representation in some transform domain (e.g., wavelets or gradients) and penalizes solutions with many non-zero coefficients. This captures the structure of natural images well.
Blind deconvolution
Blind deconvolution tackles the harder problem where the blur kernel itself is unknown. You must estimate both the original image and the PSF from the degraded image alone.
This is highly ill-posed (many combinations of image and kernel could produce the same degraded result), so additional constraints are essential. Common assumptions include sparsity of image gradients and non-negativity of the blur kernel.
Iterative approaches alternate between estimating the kernel and estimating the image, refining both until convergence. Deep learning approaches have shown strong results by learning the degraded-to-clean mapping from large training datasets, bypassing explicit kernel estimation entirely.
Geometric transformations
Geometric transformations modify the spatial relationships between pixels. In geospatial engineering, they're essential for image registration (aligning images to each other), orthorectification (removing terrain and sensor distortions), and mosaicking (stitching images together into a seamless map).
Affine vs non-affine transformations
Affine transformations preserve parallel lines and distance ratios. They combine translation, rotation, scaling, and shearing into a single linear transformation. They're good for correcting simple, global geometric distortions.
Non-affine transformations (polynomial, elastic, projective) allow more complex, localized warping. They can model lens distortion, terrain-induced displacement, or other spatially varying effects that affine transforms can't capture.
Use affine when the distortion is uniform across the image. Use non-affine when distortion varies spatially or involves perspective effects.
Translation, rotation, scaling
These are the building blocks of geometric correction:
- Translation shifts every pixel by a constant offset in x and y. Used to align misregistered images.
- Rotation turns the image around a fixed point (typically the center) by a specified angle. Corrects orientation differences.
- Scaling changes image size by a factor in x and/or y. Used to match resolutions between datasets or to zoom into regions of interest.
All three can be combined into a single affine transformation matrix for efficient computation.
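The composition into a single matrix works through homogeneous coordinates, which turn translation into a matrix multiply as well (a sketch; the operation order here is scale, then rotate, then translate):

```python
import numpy as np

def affine_matrix(tx=0.0, ty=0.0, angle=0.0, scale=1.0):
    """3x3 affine transform in homogeneous coordinates: scale and
    rotate about the origin, then translate by (tx, ty)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty],
                     [0.0,        0.0,       1.0]])

# Rotate 90 degrees, then shift +5 in x: (1, 0) -> (0, 1) -> (5, 1).
A = affine_matrix(tx=5.0, angle=np.pi / 2)
p = A @ np.array([1.0, 0.0, 1.0])
```

Because the whole chain lives in one matrix, a sequence of corrections collapses into a single matrix product applied once per pixel.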
Homography estimation
A homography is a projective (perspective) transformation that maps points from one image plane to another, assuming the scene points lie on a common plane (or the camera undergoes pure rotation).
Estimating the homography involves:
- Identifying corresponding feature points in both images.
- Setting up a system of equations relating the point pairs through a 3×3 homography matrix H.
- Solving for H using at least 4 point correspondences (8 equations for 8 unknowns, since H is defined up to scale).
- Using RANSAC (Random Sample Consensus) to robustly handle outlier correspondences.
Homography estimation is fundamental for image stitching, perspective correction, and camera calibration.
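The perspective divide is what separates a homography from an affine map, as this small NumPy sketch shows (in practice the estimation step is usually delegated to something like OpenCV's findHomography with RANSAC):

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography, including the
    perspective divide that makes H projective rather than affine."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# A non-zero bottom row produces perspective foreshortening:
# equal input steps map to progressively smaller output steps.
H = np.array([[1.0,   0.0, 0.0],
              [0.0,   1.0, 0.0],
              [0.001, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]])
out = apply_homography(H, pts)
```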
Resampling techniques
After applying a geometric transformation, the output pixel locations generally don't land exactly on the original pixel grid. Resampling computes new pixel values at these non-integer locations.
- Nearest-neighbor: assigns the value of the closest input pixel. Fast, but produces blocky/aliased results. Preserves original DN values, which matters for thematic data.
- Bilinear interpolation: linearly interpolates using the 4 nearest input pixels. Smoother than nearest-neighbor, with slight blurring.
- Bicubic interpolation: fits a cubic polynomial using a neighborhood of input pixels. Best balance of smoothness and detail preservation, but computationally heavier.
For continuous data (reflectance, elevation), bilinear or bicubic is usually preferred. For categorical data (land cover classes), nearest-neighbor avoids creating invalid intermediate values.
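Bilinear resampling at a single non-integer location can be sketched as follows (a minimal version with no bounds checking; it assumes all 4 neighbors lie inside the image):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinear interpolation at non-integer (x, y): a distance-weighted
    average of the 4 surrounding pixels."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return (img[y0, x0] * (1 - dx) * (1 - dy)
            + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy
            + img[y0 + 1, x0 + 1] * dx * dy)

img = np.array([[0.0, 100.0],
                [100.0, 200.0]])
center = bilinear_sample(img, 0.5, 0.5)   # mean of the 4 corners
```

Note the interpolated 100.0 at the center is a value that never occurs in the input, which is exactly why nearest-neighbor is preferred for categorical data.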
Interpolation methods
Beyond resampling transformed images, interpolation is needed whenever you work with irregularly spaced or missing data.
- Linear interpolation estimates values between two known samples using a straight line. Simple and fast, but can't capture curvature.
- Spline interpolation fits piecewise polynomials (e.g., cubic splines) through the known samples, producing smooth, continuous results that follow the shape of the data.
- Kriging is a geostatistical method that accounts for spatial correlation (modeled via a variogram) when estimating values at unsampled locations. It provides both an estimate and an uncertainty measure, making it especially valuable for spatial data in geospatial applications.
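Of the three, linear interpolation over irregular samples is a one-liner with np.interp (spline fitting and kriging need scipy.interpolate and a geostatistics package such as pykrige, respectively):

```python
import numpy as np

# Irregularly spaced samples along a transect.
x_known = np.array([0.0, 2.0, 5.0])
y_known = np.array([10.0, 30.0, 0.0])

# Estimate the value at x = 1.0, halfway between the first two samples.
y_at_1 = np.interp(1.0, x_known, y_known)
```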
Radiometric calibration
Radiometric calibration converts raw digital numbers (DNs) recorded by a sensor into physically meaningful units like radiance (W/(m²·sr·µm)) or reflectance. Without calibration, you can't reliably compare measurements across different sensors, dates, or viewing conditions.
Sensor response function
The sensor response function describes how incoming radiance maps to the DN value the sensor records. This relationship is typically non-linear and can vary across spectral bands and individual detector elements.
Common models include linear, logarithmic, and gamma functions. The response function is characterized through laboratory calibration before launch and updated via in-flight calibration using known reference targets (e.g., onboard calibration lamps or ground targets with known reflectance).
Accurate knowledge of the response function is the foundation of all radiometric calibration. Without it, you can't convert DNs to physical units.
Flat-field correction
Flat-field correction compensates for spatial non-uniformity in sensor response. Even if every pixel receives the same radiance, differences in detector sensitivity, lens vignetting, or dust on the optics can cause some pixels to read higher or lower than others.
The correction process:
- Image a spatially uniform reference target (e.g., an integrating sphere) under controlled illumination.
- Record this as the flat-field image.
- Divide each pixel in your raw scene image by the corresponding pixel in the flat-field image.
This normalizes out pixel-to-pixel sensitivity variations. Regular flat-field calibration is necessary because sensor characteristics drift over time.
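The correction itself is a per-pixel division, sketched here with an optional dark-frame subtraction (a common extension, assumed rather than stated above):

```python
import numpy as np

def flat_field_correct(raw, flat, dark=None):
    """Divide the scene by the flat-field image, normalized by its
    mean so overall brightness is preserved. Optionally subtract a
    dark frame from both first."""
    if dark is not None:
        raw = raw - dark
        flat = flat - dark
    return raw * (flat.mean() / flat)

# A detector whose right column reads 20% low is evened out: a
# uniform scene comes back uniform after correction.
flat = np.array([[1.0, 0.8],
                 [1.0, 0.8]])
raw = np.array([[100.0, 80.0],
                [100.0, 80.0]])   # uniform scene seen through that response
corrected = flat_field_correct(raw, flat)
```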
Vignetting correction
Vignetting is a radial brightness falloff from the center to the edges of an image, caused by lens geometry, aperture settings, or sensor packaging. It's especially pronounced with wide-angle lenses.
Correction typically involves:
- Modeling the vignetting pattern as a radial function (polynomial or cosine-based).
- Estimating the model parameters from calibration images or from the scene itself.
- Dividing each pixel by the modeled falloff value to restore uniform brightness.
Vignetting correction is critical when mosaicking multiple images, since uncorrected falloff creates visible seams at image boundaries.
Absolute vs relative calibration
Absolute calibration ties sensor measurements to physical units (radiance in W/(m²·sr·µm), or irradiance in W/(m²·µm)). It requires precise knowledge of the sensor response function, atmospheric conditions, and reference target properties. This is necessary when you need actual physical measurements.
Relative calibration ensures consistency between pixels or between images without converting to physical units. It normalizes measurements so they're internally comparable.
Relative calibration is often sufficient for change detection or classification, where you care about relative differences rather than absolute values. Absolute calibration is essential for physical modeling, such as estimating surface temperature or energy budgets.
Reflectance calibration
Reflectance calibration goes a step further than radiance calibration by converting to surface reflectance, the fraction of incoming solar radiation reflected by the surface.
The process involves:
- Converting DNs to at-sensor radiance using the sensor calibration coefficients.
- Correcting for atmospheric effects (scattering and absorption) using an atmospheric model or empirical methods.
- Dividing by the estimated incoming solar irradiance at the surface.
The result is a value between 0 and 1 that represents an intrinsic property of the surface, independent of illumination angle, atmospheric conditions, or sensor characteristics. This makes reflectance data comparable across different dates, sensors, and locations, which is essential for vegetation monitoring, soil mapping, and mineral exploration.
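Omitting the atmospheric-correction step, the surrounding arithmetic can be sketched as follows. The result is top-of-atmosphere rather than surface reflectance, and the gain, offset, and ESUN values are made-up placeholders, not real sensor coefficients:

```python
import numpy as np

def toa_reflectance(dn, gain, offset, esun, sun_elev_deg, d=1.0):
    """DN -> radiance via linear calibration coefficients, then
    rho = pi * L * d^2 / (ESUN * cos(theta_s)), with d the
    Earth-Sun distance in astronomical units."""
    radiance = gain * dn + offset                  # W/(m^2 sr um)
    theta_s = np.radians(90.0 - sun_elev_deg)      # solar zenith angle
    return np.pi * radiance * d ** 2 / (esun * np.cos(theta_s))

# Placeholder coefficients for illustration only.
rho = toa_reflectance(dn=150, gain=0.8, offset=-2.0,
                      esun=1550.0, sun_elev_deg=60.0)
```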
Color correction techniques
Color correction ensures that the color information in your images is accurate and consistent. Differences in illumination, sensor characteristics, and atmospheric conditions can all introduce color shifts that affect visual interpretation and multi-source data fusion.
White balance adjustment
White balance corrects color casts caused by the illumination spectrum. Under tungsten lighting, for example, images appear yellowish; under overcast skies, they appear bluish. White balance adjusts the gain of each color channel so that neutral objects (white or gray) actually appear neutral.
Manual white balance: select a known neutral reference point in the image and scale the color channels to make it neutral.
Automatic white balance algorithms estimate the illuminant from the image content using assumptions like:
- Gray world assumption: the average color of the scene should be neutral gray.
- White patch method: the brightest pixels in the scene represent the illuminant color.
Proper white balance is the foundation of accurate color reproduction.
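The gray world assumption translates into a short per-channel rescaling (a minimal sketch for a float RGB image):

```python
import numpy as np

def gray_world_wb(rgb):
    """Gray-world white balance: scale each channel so its mean matches
    the overall mean, making the average scene color neutral."""
    means = rgb.reshape(-1, 3).mean(axis=0)
    return rgb * (means.mean() / means)

# An image with a yellowish cast (weak blue channel) is rebalanced to
# a neutral average.
img = np.ones((4, 4, 3)) * np.array([120.0, 120.0, 60.0])
balanced = gray_world_wb(img)
```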
Color space conversion
Different color spaces serve different purposes:
- RGB is standard for cameras and displays.
- CMYK is used for print production.
- LAB (L*a*b*) is perceptually uniform, making it ideal for measuring color differences.
- HSV separates hue from brightness, useful for color-based analysis.
Converting between spaces requires mathematical transformations that account for different primaries, white points, and gamma curves. Proper color space management prevents color shifts when moving images between devices or software.
Gamma correction
Gamma correction applies the power-law transformation output = input^γ (with input normalized to [0, 1]) to adjust brightness and contrast.
- γ < 1: brightens the image (lifts shadows)
- γ = 1: no change
- γ > 1: darkens the image (increases contrast in highlights)
Typical gamma values range from about 0.45 to 2.5. Displays typically have a built-in gamma around 2.2, so images are often encoded with γ ≈ 0.45 (the inverse) to compensate. Understanding gamma is important when working with imagery from different sources, since mismatched gamma encoding produces washed-out or overly dark results.
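A tiny sketch of the encode/display round trip, on values normalized to [0, 1]:

```python
import numpy as np

def gamma_correct(img, gamma):
    """Power-law transform on a [0, 1]-normalized image:
    output = input ** gamma."""
    return np.clip(np.asarray(img, dtype=float), 0.0, 1.0) ** gamma

mid = gamma_correct(0.5, 1.0)            # unchanged
bright = gamma_correct(0.5, 1 / 2.2)     # gamma < 1 lifts midtones
dark = gamma_correct(0.5, 2.2)           # gamma > 1 darkens them
# Encoding at 1/2.2 then displaying at 2.2 recovers the original value.
roundtrip = gamma_correct(bright, 2.2)
```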
Color constancy algorithms
Color constancy algorithms try to estimate the true colors of objects regardless of illumination, mimicking how the human visual system perceives stable colors under changing light.
The general approach:
- Estimate the illumination spectrum from the image.
- Compensate for its effect on observed colors.
Simple methods use statistical assumptions (gray world, white patch). More advanced methods incorporate spatial color distributions, specular highlights, or machine learning trained on datasets of images under known illuminants. Color constancy is important for object recognition and scene understanding, where consistent color representation across varying conditions is critical.
Chromatic adaptation
Chromatic adaptation is the human visual system's ability to adjust to changes in illumination and still perceive object colors as relatively stable. Chromatic adaptation transforms (CATs) are mathematical models that simulate this process.
Common CATs include:
- Von Kries transform: scales each cone response independently. Simple but limited.
- Bradford transform: uses a modified cone space that better predicts adaptation for blue colors. Widely used in color management.
- CAT02 transform: part of the CIECAM02 color appearance model, designed for more accurate predictions across a wide range of conditions.
CATs are applied when you need to predict how colors will appear under a different illuminant than the one they were captured under. This is relevant when displaying geospatial imagery on different monitors or when fusing data captured under different lighting conditions.
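At its core, a von Kries-style CAT is a diagonal (per-channel) scaling. The sketch below applies it directly in RGB for simplicity; real CATs such as Bradford first transform to a sharpened cone-response space, scale there, and transform back:

```python
import numpy as np

def von_kries_adapt(rgb, src_white, dst_white):
    """Per-channel von Kries scaling: map a color captured under
    src_white so it appears as it would under dst_white.
    Simplified to operate in RGB rather than a cone space."""
    scale = np.asarray(dst_white, dtype=float) / np.asarray(src_white, dtype=float)
    return np.asarray(rgb, dtype=float) * scale

# A color seen under a reddish illuminant, re-rendered for neutral
# light: red is left alone while the under-stimulated green and blue
# channels are boosted.
adapted = von_kries_adapt([200.0, 100.0, 100.0],
                          src_white=[255.0, 200.0, 180.0],
                          dst_white=[255.0, 255.0, 255.0])
```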