👁️ Computer Vision and Image Processing

Image Enhancement Techniques


Why This Matters

Image enhancement sits at the foundation of nearly every computer vision pipeline. Before any algorithm can detect objects, recognize faces, or segment scenes, the input image often needs preprocessing to correct for poor lighting, sensor noise, or low contrast. You're being tested on your understanding of how these techniques manipulate pixel values and when to apply each one.

The techniques here demonstrate core principles: intensity transformations, spatial domain operations, frequency domain analysis, and adaptive processing. Exam questions will ask you to select the appropriate technique for a given scenario, explain the mathematical basis behind a method, or compare approaches for noise reduction versus edge preservation. Don't just memorize definitions. Know what problem each technique solves and the tradeoffs involved.


Intensity Transformation Techniques

These methods operate directly on pixel values using mathematical functions, transforming input intensity to output intensity without considering neighboring pixels. The key principle: a point operation applies the same transformation function to every pixel independently.

Histogram Equalization

Histogram equalization redistributes pixel intensities to produce a more uniform histogram, which spreads out the most frequent intensity values and increases global contrast.

  • The mapping uses the cumulative distribution function (CDF) of the original histogram: $s_k = (L-1) \sum_{j=0}^{k} p_r(r_j)$, where $p_r(r_j)$ is the probability of intensity level $r_j$ and $L$ is the number of possible intensity levels (typically 256).
  • Best for low-contrast images where pixel values cluster in a narrow range. It's less effective when you need to preserve specific tonal relationships, since it remaps the entire histogram without regard for which regions you care about.
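The CDF mapping above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation, and the function name `equalize_histogram` is our own:

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Histogram equalization via the CDF mapping s_k = (L-1) * CDF(r_k)."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size                  # cumulative distribution in [0, 1]
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[img]                               # apply the lookup table per pixel
```

Note that the whole transformation reduces to one lookup table applied to every pixel, which is exactly what makes it a point operation.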

Contrast Stretching

Contrast stretching linearly maps the existing intensity range $[r_{min}, r_{max}]$ to the full available range $[0, L-1]$:

$$s = \frac{r - r_{min}}{r_{max} - r_{min}} \times (L-1)$$

  • Simpler than histogram equalization because it applies a fixed linear transformation rather than a histogram-dependent mapping.
  • Sensitive to outliers. A single very bright or dark pixel can compress the stretch for everything else. In practice, you'll often clip the top and bottom 1-2% of pixel values before stretching.
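A percentile-clipped version of the stretch can be sketched as follows. The 2% / 98% cutoffs are an assumption to illustrate the outlier-clipping idea, not a fixed rule:

```python
import numpy as np

def stretch_contrast(img, low_pct=2.0, high_pct=98.0, levels=256):
    """Linear contrast stretch with percentile clipping to resist outliers."""
    r_min, r_max = np.percentile(img, [low_pct, high_pct])
    stretched = (img.astype(np.float64) - r_min) / (r_max - r_min) * (levels - 1)
    return np.clip(stretched, 0, levels - 1).astype(np.uint8)
```

Pixels below the low percentile clip to 0 and pixels above the high percentile clip to $L-1$, so a single extreme pixel no longer compresses the mapping for everything else.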

Gamma Correction

This is a nonlinear power-law transformation defined as $s = c \cdot r^\gamma$, where $c$ is a scaling constant and $\gamma$ controls the curve shape.

  • $\gamma < 1$ brightens the image (expands dark tones), while $\gamma > 1$ darkens it (compresses dark tones). Think of $\gamma$ as controlling how aggressively mid-tones get shifted.
  • Unlike linear stretching, gamma correction targets mid-tones specifically while leaving the extremes (very dark and very bright pixels) relatively unchanged.
  • It also compensates for display nonlinearity, since monitors and human vision don't respond to brightness linearly. A typical display gamma is about 2.2, so sRGB encoding applies roughly the inverse curve ($\gamma \approx 1/2.2$) to compensate.
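With intensities normalized to $[0, 1]$ the scaling constant $c$ becomes 1, and the transform is a one-liner. A minimal sketch:

```python
import numpy as np

def gamma_correct(img, gamma, levels=256):
    """Power-law transform s = r**gamma on intensities normalized to [0, 1]."""
    r = img.astype(np.float64) / (levels - 1)     # normalize so c = 1
    s = np.power(r, gamma)
    return np.round(s * (levels - 1)).astype(np.uint8)
```

Because $0^\gamma = 0$ and $1^\gamma = 1$, pure black and pure white are fixed points; only the tones in between move.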

Compare: Histogram Equalization vs. Contrast Stretching. Both improve contrast, but histogram equalization adapts to the image's actual distribution while contrast stretching applies a fixed linear mapping. If an exam question mentions "adaptive contrast improvement," histogram equalization is your answer.


Spatial Domain Filtering

Spatial filters modify pixel values based on the values of neighboring pixels within a defined kernel (or window). The underlying mechanism: convolution of the image with a filter mask determines whether you smooth, sharpen, or detect features.

Spatial Filtering (Smoothing and Sharpening)

  • Smoothing uses averaging kernels like the box filter (all equal weights) or Gaussian kernel (center-weighted) to blur images and reduce high-frequency noise.
  • Sharpening enhances edges by subtracting a smoothed version of the image, or by using Laplacian-based kernels that respond to second-order derivatives (rapid intensity changes).
  • Kernel size matters. Larger kernels produce stronger effects but blur fine details or create halos around edges. A $3 \times 3$ kernel is common for light smoothing; $5 \times 5$ or $7 \times 7$ for heavier noise.
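The convolution mechanism behind these filters can be written out directly. This is a teaching sketch with zero padding; real code would reach for `scipy.ndimage` or `cv2.filter2D`:

```python
import numpy as np

def convolve2d(img, kernel):
    """Direct 2-D convolution with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(np.float64), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]                 # convolution flips the kernel
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

box3 = np.full((3, 3), 1 / 9)                    # box filter: all equal weights
```

Swapping `box3` for a center-weighted Gaussian kernel or a Laplacian kernel changes the effect from smoothing to sharpening without touching the convolution loop itself.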

Unsharp Masking

Unsharp masking sharpens by adding a scaled version of the high-frequency detail back into the original image:

$$g(x,y) = f(x,y) + k\,[f(x,y) - f_{blur}(x,y)]$$

The term $f(x,y) - f_{blur}(x,y)$ isolates the detail (edges and texture) that was removed by blurring.

  • Parameter $k$ controls sharpening strength. Values typically range from 0.5 to 2.0. Setting $k$ too high causes ringing artifacts (bright/dark halos along edges).
  • This is an industry standard in photography and medical imaging where subtle detail enhancement improves diagnostic or visual quality, and it achieves sharpening without directly computing image derivatives.

Noise Reduction Techniques

Choosing the right filter depends on the type of noise:

  • Mean filtering averages all neighbors in the kernel. It reduces noise but blurs edges.
  • Median filtering replaces each pixel with the median of its neighborhood. It's nonlinear and preserves edges much better than averaging.
  • Gaussian filtering weights neighbors by distance from the center: $G(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}$, providing smooth blur with controllable spread via $\sigma$.

The matching rule: salt-and-pepper noise responds best to median filtering (because the median ignores the extreme outlier values), while Gaussian noise responds better to Gaussian or mean filters (which average out the distributed noise).
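A median filter is easy to sketch, and the sketch makes the matching rule concrete: the median of a neighborhood simply never selects an isolated salt or pepper value.

```python
import numpy as np

def median_filter(img, size=3):
    """Median filter with edge padding; outliers are discarded, not averaged in."""
    p = size // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out
```

Run a mean filter on the same corrupted image and the 255-valued salt pixel gets smeared into all of its neighbors instead of being removed; that contrast is the usual exam point.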

Adaptive Filtering

Fixed filters apply the same operation everywhere, which is a problem when noise levels or image content vary across the frame. Adaptive filters adjust their behavior based on local statistics like mean and variance.

  • In uniform regions (low variance), the filter applies stronger smoothing. Near edges (high variance), it backs off to preserve detail.
  • The Wiener filter is a classic example. It minimizes mean square error by adapting to local signal and noise characteristics.
  • Adaptive filters outperform fixed filters in most real-world applications where conditions aren't uniform.
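The local-statistics idea can be sketched as a Wiener-style filter. This is a simplified illustration that assumes the noise variance is known; real implementations estimate it:

```python
import numpy as np

def adaptive_smooth(img, noise_var, size=3):
    """Wiener-style filter: out = mean + gain * (pixel - mean),
    where gain = max(0, 1 - noise_var / local_var)."""
    p = size // 2
    f = img.astype(np.float64)
    padded = np.pad(f, p, mode="edge")
    out = np.empty_like(f)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            window = padded[i:i + size, j:j + size]
            mean, var = window.mean(), window.var()
            gain = max(0.0, 1.0 - noise_var / var) if var > 0 else 0.0
            out[i, j] = mean + gain * (f[i, j] - mean)
    return out
```

In flat regions the local variance is close to the noise variance, the gain goes to zero, and the pixel collapses to the local mean (strong smoothing); near an edge the local variance dominates, the gain approaches one, and the pixel is left nearly untouched.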

Compare: Median Filter vs. Gaussian Filter. Both reduce noise, but median filtering is nonlinear and excels at removing salt-and-pepper noise while preserving edges. Gaussian filtering is linear and better for Gaussian-distributed noise but blurs edges. Exam questions often ask which filter to choose for a specific noise type.


Edge and Boundary Detection

Edge detection identifies locations where intensity changes rapidly, marking boundaries between regions. The mathematical basis: edges correspond to high values of the first derivative (gradient) or zero-crossings of the second derivative (Laplacian).

Edge Detection

Gradient-based operators compute the magnitude of intensity change. The Sobel operator, for example, computes separate horizontal ($G_x$) and vertical ($G_y$) derivatives, then combines them:

$$G = \sqrt{G_x^2 + G_y^2}$$
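The Sobel computation above can be sketched directly with the standard $3 \times 3$ kernels (a teaching sketch; correlation and convolution differ only in sign here, which the magnitude discards):

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude G = sqrt(Gx^2 + Gy^2) with the standard Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    f = np.pad(img.astype(np.float64), 1, mode="edge")
    gx = np.zeros(img.shape)
    gy = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = f[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(win * kx)   # horizontal derivative
            gy[i, j] = np.sum(win * ky)   # vertical derivative
    return np.sqrt(gx**2 + gy**2)
```

On a vertical step edge the response is large along the edge column and zero in the flat regions on either side, which is exactly the behavior an edge map needs.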

The Canny edge detector is a multi-stage pipeline and remains the gold standard for accuracy and noise robustness:

  1. Gaussian smoothing to reduce noise
  2. Gradient computation (magnitude and direction)
  3. Non-maximum suppression to thin edges to single-pixel width
  4. Hysteresis thresholding using two thresholds (high and low) to connect strong edges to weak ones while discarding isolated weak responses

Prewitt ($3 \times 3$, unweighted) and Roberts ($2 \times 2$) are simpler gradient operators, but both are more sensitive to noise than the full Canny pipeline, which smooths before differentiating.

Image Thresholding

Thresholding converts a grayscale image to binary: $g(x,y) = 1$ if $f(x,y) > T$, else $0$. This is a critical step in segmentation pipelines.

  • Global thresholding uses a single value $T$ for the entire image. Otsu's method automatically selects $T$ by maximizing between-class variance, which finds the threshold that best separates foreground and background pixel distributions.
  • Adaptive thresholding computes a different threshold for each pixel based on its local neighborhood. This is essential for images with uneven illumination, such as document scanning and OCR, where shadows or lighting gradients would cause global thresholding to fail.
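Otsu's between-class variance search can be sketched as an exhaustive scan over candidate thresholds (an $O(L^2)$ teaching version; the incremental form is faster):

```python
import numpy as np

def otsu_threshold(img, levels=256):
    """Pick T maximizing between-class variance w0*w1*(mu0 - mu1)^2."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, levels):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, levels) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def threshold(img, T):
    return (img > T).astype(np.uint8)
```

On a clean bimodal histogram the maximum falls between the two modes, which is why Otsu works well precisely when the foreground and background intensities are separable.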

Compare: Global Thresholding vs. Adaptive Thresholding. Global works when lighting is uniform, but adaptive thresholding handles shadows and gradients by computing thresholds from local neighborhoods. If an exam scenario mentions "varying illumination," adaptive is the correct choice.


Frequency Domain Processing

Frequency domain methods transform images using the Fourier Transform, allowing you to manipulate specific frequency components directly. The core insight: low frequencies carry overall structure and smooth variations; high frequencies encode edges, noise, and fine details.

Frequency Domain Filtering

The Discrete Fourier Transform (DFT) converts spatial data to a frequency representation:

$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi(ux/M + vy/N)}$$

Once in the frequency domain, you multiply by a filter function to keep or remove specific frequencies, then apply the inverse transform to get back to a spatial image.

  • Low-pass filters attenuate high frequencies to smooth images. High-pass filters attenuate low frequencies to sharpen or detect edges.
  • Butterworth and Gaussian filters provide smoother frequency transitions than ideal (sharp cutoff) filters. Ideal filters cause ringing artifacts because their sharp cutoffs in frequency correspond to infinite spatial extent.
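The transform-multiply-invert pipeline can be sketched with NumPy's FFT and a Gaussian low-pass transfer function (the cutoff parameter is an illustrative choice):

```python
import numpy as np

def gaussian_lowpass(img, cutoff):
    """Frequency-domain Gaussian low-pass: multiply the centered DFT by
    H(u,v) = exp(-D^2 / (2 * cutoff^2)), then invert."""
    M, N = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))          # center the zero frequency
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2          # squared distance from center
    H = np.exp(-D2 / (2 * cutoff ** 2))             # smooth rolloff, so no ringing
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
```

Replacing `H` with `1 - H` turns the same pipeline into a high-pass filter; only the transfer function changes, not the transform machinery.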

Compare: Spatial Domain vs. Frequency Domain Filtering. Spatial filtering is intuitive and computationally efficient for small kernels. Frequency domain filtering excels for large kernels and gives precise control over which frequencies to modify. The key relationship: convolution in the spatial domain equals multiplication in the frequency domain. This means a large spatial convolution (expensive) can be done as a simple multiplication after a Fourier Transform (often faster).


Quick Reference Table

| Concept | Best Examples |
| --- | --- |
| Point operations (intensity transforms) | Histogram Equalization, Contrast Stretching, Gamma Correction |
| Linear spatial filtering | Gaussian Smoothing, Mean Filter, Sharpening Kernels |
| Nonlinear spatial filtering | Median Filter, Adaptive Filtering |
| Edge detection | Sobel, Canny, Prewitt |
| Segmentation | Image Thresholding, Otsu's Method |
| Frequency domain | Fourier Transform, Low-pass/High-pass Filters |
| Detail enhancement | Unsharp Masking, High-pass Filtering |
| Noise-specific solutions | Median (salt-and-pepper), Gaussian filter (Gaussian noise) |

Self-Check Questions

  1. Which two techniques both improve contrast but differ in whether they adapt to the image's histogram distribution? Explain when you'd choose one over the other.

  2. You're given an image corrupted by salt-and-pepper noise. Compare the effectiveness of mean filtering versus median filtering, and justify which you'd select.

  3. Explain how unsharp masking achieves sharpening without directly computing derivatives. What parameter controls the strength of the effect?

  4. An image has uneven lighting across the frame. Compare global thresholding with adaptive thresholding. Which would you recommend and why?

  5. Describe the relationship between spatial domain convolution and frequency domain multiplication. Why might you choose frequency domain filtering for a very large smoothing kernel?