👁️Computer Vision and Image Processing Unit 7 Review

7.5 Generative Adversarial Networks (GANs)


Written by the Fiveable Content Team • Last updated September 2025


Generative Adversarial Networks (GANs) have transformed image synthesis in computer vision. By training two neural networks against each other, GANs learn to turn random noise into realistic images, enabling applications such as image enhancement and style transfer.

GANs consist of a generator network and a discriminator network, trained through an adversarial process. The generator aims to create convincing fake images, while the discriminator tries to distinguish real images from fakes. This competition pushes both networks to improve, ideally yielding high-quality synthetic images.

Fundamentals of GANs

GANs learn to generate realistic images from random noise, making them a central tool for image synthesis in computer vision
GANs consist of two neural networks competing against each other, enabling the generation of high-quality, diverse visual content
Applications of GANs in computer vision include image enhancement, style transfer, and data augmentation for improved model training

GAN architecture

Two-network structure comprises a generator and a discriminator
Generator network transforms random noise into synthetic images
Discriminator network distinguishes between real and generated images
Networks are typically implemented as deep convolutional neural networks (DCNNs)
Adversarial training process improves both networks iteratively

Generator vs discriminator

Generator aims to create increasingly realistic images to fool the discriminator
Discriminator acts as a binary classifier, predicting whether an image is real or fake
Generator learns to map from latent space to image space
Discriminator improves its ability to detect subtle differences between real and generated images
Keeping the generator and discriminator balanced is crucial: if one network overpowers the other, training tends to stall or become unstable

Adversarial training process

Alternating training steps between generator and discriminator
Generator minimizes the probability of the discriminator correctly classifying generated images
Discriminator maximizes its ability to distinguish between real and fake images
Backpropagation updates network parameters based on the adversarial loss
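At the theoretical Nash equilibrium, the generator's distribution matches the real data distribution and the discriminator outputs 1/2 for every input; in practice, training rarely reaches this point exactly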
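The alternating loop can be sketched in plain Python. This is a minimal toy that rests on stated assumptions and is not an image GAN. Real data are 1-D samples from $N(0, 1)$. The hypothetical generator is $G(z) = z + b$ with a single learnable shift $b$, and its spread is fixed to match the data. The discriminator is a logistic classifier $D(x) = \sigma(wx + c)$. Gradients are derived by hand, and the generator uses the common non-saturating loss $-\log D(G(z))$.

```python
import math
import random


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, batch: int = 64,
                  lr_d: float = 0.1, lr_g: float = 0.01, seed: int = 0):
    """Toy 1-D GAN (assumed setup): real data ~ N(0, 1), G(z) = z + b, D(x) = sigmoid(w*x + c)."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0  # discriminator parameters
    b = 3.0          # generator parameter, starts far from the data mean 0
    for _ in range(steps):
        # Discriminator step: minimize -[mean log D(x) + mean log(1 - D(G(z)))].
        real = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w -= (1.0 - d) * x  # d/dw of -log D(x)
            grad_c -= 1.0 - d
        for g in fake:
            d = sigmoid(w * g + c)
            grad_w += d * g          # d/dw of -log(1 - D(g))
            grad_c += d
        w -= lr_d * grad_w / batch
        c -= lr_d * grad_c / batch

        # Generator step (non-saturating): minimize -mean log D(G(z)) w.r.t. b.
        grad_b = 0.0
        for _ in range(batch):
            g = rng.gauss(0.0, 1.0) + b
            grad_b -= (1.0 - sigmoid(w * g + c)) * w  # chain rule with dG/db = 1
        b -= lr_g * grad_b / batch
    return b, w, c
```

Running `b, w, c = train_toy_gan()` should move $b$ from 3.0 toward the data mean 0. As the two distributions overlap and the discriminator can no longer tell them apart, $w$ should shrink toward 0. The discriminator gets a larger learning rate than the generator, which follows the spirit of the two-timescale update rule (TTUR).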

GAN loss functions

Loss functions guide the optimization process in GANs, influencing the quality and stability of generated images
Different loss functions address various challenges in GAN training, such as mode collapse and vanishing gradients
Choosing the appropriate loss function depends on the specific GAN architecture and application in computer vision tasks

Minimax loss

Original loss function proposed in the GAN paper by Goodfellow et al.
Formulated as a two-player minimax game between generator and discriminator
Generator minimizes $E_z[\log(1 - D(G(z)))]$ while the discriminator maximizes $E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$, so the game is $\min_G \max_D \; E_x[\log D(x)] + E_z[\log(1 - D(G(z)))]$
Can lead to vanishing gradients for the generator when discriminator becomes too strong
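Often replaced in practice by the non-saturating generator loss, which maximizes $E_z[\log D(G(z))]$ instead and gives much stronger gradients when the discriminator confidently rejects fakes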
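The two generator losses differ most clearly in their gradients. This minimal sketch assumes the discriminator ends in a sigmoid, so $D(G(z)) = \sigma(l)$ for a logit $l$. Its inputs are plain lists of discriminator probabilities, not outputs of a real network.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Negative of E[log D(x)] + E[log(1 - D(G(z)))], given discriminator probabilities."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss_minimax(d_fake):
    # Original (saturating) form: minimize E[log(1 - D(G(z)))].
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


def generator_loss_non_saturating(d_fake):
    # Non-saturating form: minimize -E[log D(G(z))].
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def generator_grads_wrt_logit(p):
    """Per-sample gradients of both generator losses w.r.t. the logit l, where p = sigmoid(l)."""
    minimax = -p              # d/dl log(1 - sigmoid(l))
    non_saturating = p - 1.0  # d/dl [-log sigmoid(l)]
    return minimax, non_saturating
```

Take a fake that the discriminator confidently rejects (`p = 0.01`). For it, `generator_grads_wrt_logit(0.01)` gives about `(-0.01, -0.99)`. The minimax gradient has almost vanished, while the non-saturating one is close to its maximum. At the theoretical equilibrium, $D$ outputs 0.5 for every input and `discriminator_loss` equals $\log 4 \approx 1.386$.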

Wasserstein loss

Addresses limitations of the original GAN loss by using Wasserstein distance
Tends to improve training stability and reduce mode collapse in GAN training
Requires enforcement of 1-Lipschitz constraint on the discriminator (critic)
Loss function for generator: $-E[D(G(z))]$ , for discriminator: $E[D(G(z))] - E[D(x)]$
Weight clipping (original WGAN) or a gradient penalty (WGAN-GP) is used to satisfy the Lipschitz constraint
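The critic losses and both ways of enforcing the Lipschitz constraint can be sketched without a deep learning framework. Two assumptions keep it short. The critic is any Python function on a list of floats, a hypothetical stand-in for a network. Its input gradient is estimated by central finite differences instead of automatic differentiation. The clipping limit 0.01 and penalty weight $\lambda = 10$ are values commonly used with WGAN and WGAN-GP respectively; treat them as illustrative defaults.

```python
import random


def mean(values):
    return sum(values) / len(values)


def critic_loss(scores_real, scores_fake):
    # Critic minimizes E[D(G(z))] - E[D(x)]; scores are unbounded (no sigmoid).
    return mean(scores_fake) - mean(scores_real)


def generator_loss(scores_fake):
    # Generator minimizes -E[D(G(z))].
    return -mean(scores_fake)


def clip_weights(weights, limit=0.01):
    # Original WGAN: clamp every critic parameter to [-limit, limit] after each update.
    return [max(-limit, min(limit, w)) for w in weights]


def gradient_penalty(critic, real, fake, lam=10.0, h=1e-5, rng=None):
    """WGAN-GP style penalty lam * (||grad critic(x_hat)|| - 1)^2, averaged over
    random interpolates x_hat = eps * x + (1 - eps) * G(z) of paired samples.
    The gradient is approximated with central differences (an assumption of this sketch)."""
    rng = rng if rng is not None else random.Random()
    penalties = []
    for x, y in zip(real, fake):
        eps = rng.random()
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x, y)]
        grad = []
        for i in range(len(x_hat)):
            up, down = list(x_hat), list(x_hat)
            up[i] += h
            down[i] -= h
            grad.append((critic(up) - critic(down)) / (2.0 * h))
        norm = sum(g * g for g in grad) ** 0.5
        penalties.append(lam * (norm - 1.0) ** 2)
    return mean(penalties)
```

A linear critic $D(x) = 3x_1 + 4x_2$ has gradient norm 5 everywhere. So `gradient_penalty` returns $10 \cdot (5 - 1)^2 = 160$ for it, whichever interpolates are sampled.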

Least squares loss

Proposed to overcome vanishing gradients problem in original GAN loss
Replaces log loss with L2 loss for both generator and discriminator
Generator loss: $\frac{1}{2}E[(D(G(z)) - 1)^2]$
Discriminator loss: $\frac{1}{2}E[(D(x) - 1)^2] + \frac{1}{2}E[D(G(z))^2]$
Provides more stable gradients during training, leading to improved image quality
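The two least-squares losses above translate directly into code. This minimal sketch assumes the discriminator outputs raw scores (no sigmoid), with target 1 for real images and 0 for fakes.

```python
def mean(values):
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real, d_fake):
    # 1/2 E[(D(x) - 1)^2] + 1/2 E[D(G(z))^2]: real target 1, fake target 0.
    return 0.5 * mean([(d - 1.0) ** 2 for d in d_real]) + 0.5 * mean([d ** 2 for d in d_fake])


def lsgan_generator_loss(d_fake):
    # 1/2 E[(D(G(z)) - 1)^2]: the generator wants its fakes scored as real (1).
    return 0.5 * mean([(d - 1.0) ** 2 for d in d_fake])
```

Because the penalty is quadratic, the generator's gradient with respect to a score $D(G(z))$ is simply $D(G(z)) - 1$. A fake scored far from the real target therefore still receives a gradient proportional to that distance, rather than a saturated one.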

Types of GANs

Various GAN architectures have been developed to address specific challenges in image generation and manipulation
Different types of GANs extend the capabilities of the original model for diverse computer vision applications
Specialized GAN architectures enable more controlled and targeted image synthesis

Conditional GANs

Incorporate additional input information to guide the generation process
Condition both generator and discriminator on class labels, text descriptions, or other attributes
Enable more controlled image generation based on specific conditions
Applications include text-to-image synthesis and attribute-based image editing
Outputs can be steered toward a requested class or attribute, making generated images more relevant to the task than unconditional samples
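One common way to condition both networks is to concatenate a one-hot label code to their inputs. This sketch assumes that approach and represents noise vectors and image features as flat lists of floats. Other designs exist, such as learned label embeddings and projection discriminators. The helper names are hypothetical.

```python
def one_hot(label, num_classes):
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def generator_input(z, label, num_classes):
    # Condition the generator by appending the label code to the noise vector.
    return z + one_hot(label, num_classes)


def discriminator_input(image_features, label, num_classes):
    # The discriminator sees the same label, so it can reject images that
    # look realistic but do not match the requested class.
    return image_features + one_hot(label, num_classes)
```

With a 100-dimensional noise vector and 10 classes, `generator_input(z, 3, 10)` has 110 entries. The same one-hot code is appended to whatever the discriminator sees, real or generated.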

Progressive GANs

Generate high-resolution images by incrementally growing both generator and discriminator
Start with low-resolution images and progressively increase resolution during training
Allows for stable training of high-quality, high-resolution image generation
Reduces training time and improves image quality compared to training at full resolution from the start
Enables generation of realistic images at resolutions up to 1024x1024 pixels
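Progressive GANs add each new, higher-resolution block gradually. During a transition, the output is a blend of the upsampled previous-stage output and the new block's output, weighted by a factor $\alpha$ that ramps from 0 to 1. This minimal sketch shows that fade-in and the doubling resolution schedule. It assumes images are 2-D lists of grayscale values and uses nearest-neighbour upsampling. The function names are hypothetical.

```python
def resolution_schedule(start=4, final=1024):
    # The resolution doubles at each stage: 4, 8, 16, ..., final.
    sizes = [start]
    while sizes[-1] < final:
        sizes.append(sizes[-1] * 2)
    return sizes


def upsample_nearest(image):
    # 2x nearest-neighbour upsampling of a 2-D list of pixel values.
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in(low_res_output, high_res_output, alpha):
    """Blend the upsampled previous-stage output with the new block's output.
    alpha ramps from 0 (old path only) to 1 (new block only)."""
    up = upsample_nearest(low_res_output)
    return [[(1.0 - alpha) * a + alpha * b for a, b in zip(ra, rb)]
            for ra, rb in zip(up, high_res_output)]
```

`resolution_schedule()` returns the nine stages from 4x4 to 1024x1024. With `alpha = 0` the new block has no influence; at `alpha = 1` the old path is fully retired.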

CycleGANs

Perform unpaired image-to-image translation without requiring paired training data
Consist of two generators and two discriminators for bidirectional translation
Use a cycle-consistency loss: translating an image to the other domain and back should reproduce the original, which preserves content without paired examples
Applications include style transfer, season transfer, and object transfiguration
Enable learning of mappings between domains with limited or no paired examples
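The cycle-consistency term follows directly from its definition. Translating $x$ to the other domain with $G$ and back with $F$ should reproduce $x$, and likewise for $y$. This minimal sketch assumes images are flat lists of pixel values and uses an L1 penalty with weight $\lambda = 10$, the value used in the CycleGAN paper. In the full objective, this term is added to the two adversarial losses.

```python
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys, lam=10.0):
    """lam * (E[|F(G(x)) - x|] + E[|G(F(y)) - y|]).
    G maps domain X -> Y and F maps Y -> X; images are flat lists of pixels."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return lam * (forward + backward)
```

Consider the toy pair `G = lambda v: [2 * p for p in v]` and `F = lambda v: [p / 2 for p in v]`. Every round trip is exact, so the loss is 0. If `F` is replaced with the identity, each round trip doubles the image and the loss becomes positive.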

GAN training challenges

Training GANs presents unique difficulties due to the adversarial nature of the learning process
Overcoming these challenges is crucial for generating high-quality, diverse images in computer vision applications
Addressing training issues improves the stability, convergence, and overall performance of GANs

Mode collapse

Occurs when the generator produces limited varieties of samples, failing to capture the full data distribution
Results in lack of diversity in generated images
Caused by generator finding a few modes that consistently fool the discriminator
Mitigation strategies include minibatch discrimination, unrolled GANs, and Wasserstein loss
Normalization techniques such as spectral normalization stabilize discriminator training and can help reduce mode collapse
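Mode collapse is easy to see on a toy target whose modes are known. The sketch below is a diagnostic, not a mitigation. It assumes a hypothetical 2-D target made of Gaussian modes spaced evenly on a circle, a common test case in GAN papers. It reports the fraction of modes that receive at least one generated sample.

```python
import math


def ring_centers(num_modes=8, radius=2.0):
    # Toy target: mode centers spaced evenly on a circle.
    return [(radius * math.cos(2 * math.pi * k / num_modes),
             radius * math.sin(2 * math.pi * k / num_modes)) for k in range(num_modes)]


def mode_coverage(samples, centers, max_dist=0.5):
    """Fraction of modes with at least one sample within max_dist of their center.
    A collapsed generator covers only a few modes, however many samples it produces."""
    hit = set()
    for x, y in samples:
        dists = [math.hypot(x - cx, y - cy) for cx, cy in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        if dists[k] <= max_dist:
            hit.add(k)
    return len(hit) / len(centers)
```

A collapsed generator that always outputs points near $(2, 0)$ scores 0.125, one mode of eight.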

Vanishing gradients

Problem arises when discriminator becomes too powerful, providing little useful feedback to the generator
Leads to slow or stalled training progress for the generator
Most pronounced with the original (saturating) minimax loss
Alternative loss functions (Wasserstein loss, least squares loss) help alleviate this issue
Techniques like gradient penalty and spectral normalization improve gradient flow

Training instability

Manifests as oscillations in loss values and image quality during training
Caused by the adversarial nature of GAN training and sensitivity to hyperparameters
Can result in failure to converge or sudden collapse of training progress
Stabilization techniques include the two-timescale update rule (TTUR), which uses separate learning rates for the generator and discriminator, and gradient penalty
Careful hyperparameter tuning and architectural choices crucial for stable training
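The oscillations described above appear even without neural networks. This minimal sketch uses the textbook toy game $\min_x \max_y \; xy$, a stand-in for the generator/discriminator game rather than a real GAN loss. Its equilibrium is $(0, 0)$. Yet simultaneous gradient steps spiral outward, and alternating steps circle the equilibrium without converging.

```python
import math


def play(steps=200, lr=0.1, alternating=False):
    """Gradient play on min_x max_y f(x, y) = x * y (equilibrium at (0, 0)).
    Returns the distance from the equilibrium after each step."""
    x, y = 1.0, 1.0
    norms = [math.hypot(x, y)]
    for _ in range(steps):
        new_x = x - lr * y                     # x descends: df/dx = y
        grad_y = new_x if alternating else x   # df/dy = x (updated x when alternating)
        y = y + lr * grad_y                    # y ascends
        x = new_x
        norms.append(math.hypot(x, y))
    return norms
```

With simultaneous updates, the distance grows by a factor of $\sqrt{1 + \eta^2}$ per step, where $\eta$ is `lr`, and reaches about 3.8 after 200 steps. With alternating updates, it stays close to its starting value of about 1.41, orbiting the equilibrium indefinitely. Stabilization techniques such as those listed above target this kind of non-convergent behaviour in real GANs.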

Applications in computer vision

GANs have substantially advanced computer vision tasks that require high-quality image synthesis and manipulation
GAN-based techniques improve existing computer vision algorithms and enable new applications
GAN components can be integrated into larger computer vision pipelines, for example to augment training data or to enhance input images

Image-to-image translation

Transforms images from one domain to another while preserving content and structure
Applications include style transfer, colorization, and domain adaptation
Pix2Pix architecture uses conditional GANs for paired image translation
CycleGAN enables unpaired translation between domains
Enhances cross-domain learning and data augmentation in computer vision tasks
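Pix2Pix trains its generator on the sum of a conditional adversarial loss and an L1 distance to the paired target image. This minimal sketch of that generator objective makes four assumptions. Images are flat pixel lists. The discriminator probability is computed elsewhere. The adversarial term uses the non-saturating form. The weight is $\lambda = 100$, as reported in the Pix2Pix paper.

```python
import math


def l1_distance(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def pix2pix_generator_loss(d_fake_prob, fake_image, target_image, lam=100.0):
    """Adversarial term plus lam * L1 to the paired target.
    d_fake_prob is D(x, G(x)): the discriminator sees the input image together with the output."""
    adversarial = -math.log(d_fake_prob)
    reconstruction = l1_distance(fake_image, target_image)
    return adversarial + lam * reconstruction
```

The L1 term ties each output to its paired ground truth, which is why this setup requires paired data. CycleGAN replaces it with cycle consistency when pairs are unavailable.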

Super-resolution

Increases the resolution and quality of low-resolution images
GAN-based methods (SRGAN, EnhanceNet) produce sharper and more realistic high-resolution images
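Produces results that human observers rate as more realistic than those of traditional super-resolution techniques, though often with lower PSNR than methods trained only on pixel-wise losses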
Applications in medical imaging, satellite imagery, and video enhancement
Improves downstream computer vision tasks by providing higher quality input images

Inpainting

Reconstructs missing or damaged parts of an image
GAN-based inpainting methods generate contextually appropriate and visually coherent content
Applications include image restoration, object removal, and image editing
Context Encoders and Globally and Locally Consistent Image Completion utilize GANs for inpainting
Enhances image processing pipelines in computer vision systems
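GAN-based inpainting commonly shares two details. The final image takes only the missing region from the generator, and the reconstruction loss focuses on that region, with an adversarial loss added on top. This minimal sketch assumes flat pixel lists and a binary mask in which 1 marks missing pixels. The helper names are hypothetical.

```python
def composite(original, generated, mask):
    """Keep known pixels from the original; take only the hole (mask == 1) from the generator."""
    return [m * g + (1 - m) * o for o, g, m in zip(original, generated, mask)]


def masked_l2(generated, target, mask):
    """Mean squared reconstruction error over the missing region only."""
    hole = [(g - t) ** 2 for g, t, m in zip(generated, target, mask) if m == 1]
    return sum(hole) / len(hole) if hole else 0.0
```

Compositing keeps the known pixels untouched, so any visible seam comes from how well the generated content matches its surroundings. The adversarial loss is meant to penalize exactly that kind of mismatch.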

Evaluation metrics for GANs

Quantitative evaluation of GANs is challenging due to the lack of a single ground truth for generated images
Metrics assess various aspects of GAN performance, including image quality, diversity, and realism
Combination of multiple metrics provides a more comprehensive evaluation of GAN models

Inception Score

Measures both quality and diversity of generated images
Utilizes a pre-trained Inception v3 network to classify generated images
Higher scores indicate better quality and diversity of generated samples
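Computed as $\exp(E_x[KL(p(y|x) \,\|\, p(y))])$, where $p(y|x)$ is the conditional class distribution predicted for a generated image $x$ and $p(y)$ is the marginal class distribution over all generated images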
Limitations include sensitivity to image artifacts and lack of consideration for intra-class diversity
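The formula can be evaluated directly once the class probabilities are available. This minimal sketch takes the per-image distributions $p(y|x)$ as plain lists, hypothetical stand-ins for Inception v3 softmax outputs. Published implementations also average the score over several splits of a large sample set, which this sketch omits.

```python
import math


def inception_score(class_probs):
    """class_probs: one p(y|x) distribution per generated image.
    Returns exp(mean over x of KL(p(y|x) || p(y))), with p(y) the marginal over images."""
    n = len(class_probs)
    num_classes = len(class_probs[0])
    marginal = [sum(p[k] for p in class_probs) / n for k in range(num_classes)]
    total_kl = 0.0
    for p in class_probs:
        # Terms with p_k = 0 contribute nothing to the KL divergence.
        total_kl += sum(pk * math.log(pk / mk) for pk, mk in zip(p, marginal) if pk > 0)
    return math.exp(total_kl / n)
```

Four images, each assigned confidently to a different class, give a score of 4, the number of classes covered. Identical predictions for every image give the minimum score of 1.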

Fréchet Inception Distance

Compares the statistics of real and generated images in the feature space of a pre-trained Inception v3 network, modeling each set of features as a multivariate Gaussian
Lower FID scores indicate higher similarity between real and generated image distributions
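Computed as $\|\mu_r - \mu_g\|^2 + \mathrm{Tr}(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2})$, where $\mu$ and $\Sigma$ are the mean and covariance of the real ($r$) and generated ($g$) image features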
Unlike Inception Score, compares generated images directly against real images, and tracks added distortions (noise, blur) more consistently
Captures both quality and diversity of generated images
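For two-dimensional features, the FID formula has a closed form that needs no matrix-square-root routine. The eigenvalues $\lambda_1, \lambda_2$ of $\Sigma_r\Sigma_g$ are non-negative, so $\mathrm{Tr}((\Sigma_r\Sigma_g)^{1/2}) = \sqrt{\lambda_1} + \sqrt{\lambda_2} = \sqrt{\mathrm{tr}(M) + 2\sqrt{\det M}}$ with $M = \Sigma_r\Sigma_g$. This sketch assumes that 2-D setting to stay standard-library only. Real FID uses 2048-dimensional Inception v3 features and a general matrix square root, for example `scipy.linalg.sqrtm`.

```python
import math


def fid_2d(mu_r, cov_r, mu_g, cov_g):
    """FID between two 2-D Gaussians, given means (length-2 lists) and 2x2 covariances."""
    mean_term = (mu_r[0] - mu_g[0]) ** 2 + (mu_r[1] - mu_g[1]) ** 2
    # M = cov_r @ cov_g (2x2 matrix product)
    m = [[sum(cov_r[i][k] * cov_g[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    tr_m = m[0][0] + m[1][1]
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # Tr(M^{1/2}) = sqrt(tr(M) + 2 sqrt(det(M))) for a 2x2 M with non-negative eigenvalues.
    tr_sqrt = math.sqrt(max(tr_m + 2.0 * math.sqrt(max(det_m, 0.0)), 0.0))
    return mean_term + cov_r[0][0] + cov_r[1][1] + cov_g[0][0] + cov_g[1][1] - 2.0 * tr_sqrt
```

Identical Gaussians give 0. Shifting the mean by $(3, 4)$ gives 25, and scaling the covariance from $I$ to $4I$ gives 2.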

Perceptual Path Length

Measures the smoothness and consistency of the GAN's latent space
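Samples pairs of nearby points (at $t$ and $t + \epsilon$) along interpolation paths between random latent codes and averages the perceptual distance between the resulting images, scaled by $1/\epsilon^2$
Lower PPL values indicate a perceptually smoother latent space, which the StyleGAN authors associate with better disentanglement
Computed using a perceptual similarity metric (LPIPS) between image pairs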
Helps assess the quality of the learned latent representation in GANs
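The PPL computation fits in a few lines once the generator and the perceptual distance are supplied as functions. This minimal sketch uses hypothetical stand-ins for both, since the real metric uses a trained generator and LPIPS. It interpolates linearly; the StyleGAN paper uses spherical interpolation in $Z$ and linear interpolation in $W$.

```python
def lerp(a, b, t):
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def perceptual_path_length(generator, distance, latent_pairs, eps=1e-4):
    """Mean of distance(G(lerp(z1, z2, t)), G(lerp(z1, z2, t + eps))) / eps**2
    over (z1, z2, t) triples; in practice t is drawn uniformly from [0, 1]."""
    total = 0.0
    for z1, z2, t in latent_pairs:
        img_a = generator(lerp(z1, z2, t))
        img_b = generator(lerp(z1, z2, t + eps))
        total += distance(img_a, img_b) / eps ** 2
    return total / len(latent_pairs)
```

Take a toy generator `G(z) = 2z` and squared Euclidean distance. Each latent pair at distance 1 then contributes $4$, so the PPL is 4. A generator whose output jumps abruptly somewhere along the path would score much higher.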

Recent advancements

Continuous research in GANs has led to significant improvements in image quality, stability, and controllability
Recent advancements address limitations of earlier GAN architectures and enable new applications in computer vision
State-of-the-art GAN models push the boundaries of realistic image synthesis and manipulation

StyleGAN architecture

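Introduces a style-based generator: a mapping network transforms the latent code $z$ into an intermediate latent code $w$ that controls each layer of the synthesis network
Separates high-level attributes (pose, identity) from stochastic variation (e.g., exact hair placement, freckles) by injecting per-layer noise
Uses adaptive instance normalization (AdaIN) to apply the style at each layer; mixing latent codes across layers (style mixing) controls coarse versus fine attributes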
Enables fine-grained control over generated image characteristics
Produces state-of-the-art results in high-resolution image generation (faces, objects)
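AdaIN normalizes each feature channel to zero mean and unit variance, then applies a per-channel scale and bias taken from the style. In StyleGAN, that scale and bias come from a learned affine transform of $w$. This minimal sketch assumes they are given directly and that a channel is a flat list of activations.

```python
import math


def adain(channel, style_scale, style_bias, eps=1e-8):
    """AdaIN for one feature channel: normalize, then apply the style's scale and bias."""
    n = len(channel)
    mu = sum(channel) / n
    var = sum((v - mu) ** 2 for v in channel) / n
    return [style_scale * (v - mu) / math.sqrt(var + eps) + style_bias for v in channel]
```

After AdaIN, the channel's mean equals `style_bias` and its standard deviation equals `abs(style_scale)`, whatever statistics the input had. This is how the style overrides the per-channel statistics at each layer.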

BigGAN for high-resolution images

Scales up GAN training to generate high-quality, high-resolution images
Uses very large batch sizes (up to 2048) and large-scale parallel training (Google TPU pods in the original work)
Incorporates self-attention mechanisms and spectral normalization for improved stability
At publication, achieved state-of-the-art class-conditional results on ImageNet at 128x128, 256x256, and 512x512 resolutions
Demonstrates the potential of GANs for generating complex, diverse images at scale
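Spectral normalization divides each weight matrix by its largest singular value, so a linear layer's Lipschitz constant is at most 1. This helps stabilize the discriminator. This minimal sketch shows the power-iteration estimate and assumes weights are a plain 2-D list. Practical implementations typically run a single iteration per training step and reuse the vector $u$ between steps.

```python
import math


def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def spectral_normalize(W, iters=20):
    """Estimate the largest singular value sigma of W by power iteration; return (W / sigma, sigma)."""
    u = normalize([1.0] * len(W))
    Wt = transpose(W)
    for _ in range(iters):
        v = normalize(matvec(Wt, u))
        u = normalize(matvec(W, v))
    sigma = sum(ui * wi for ui, wi in zip(u, matvec(W, v)))  # u^T W v
    return [[wij / sigma for wij in row] for row in W], sigma
```

For `[[3.0, 0.0], [0.0, 1.0]]` the estimate converges to 3, and the normalized matrix has largest singular value 1.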

Self-attention in GANs

Incorporates self-attention mechanisms to capture long-range dependencies in images
Improves coherence and global consistency in generated images
Self-Attention GAN (SAGAN) applies self-attention layers in both generator and discriminator
Enhances the ability to generate complex scenes with multiple objects
Combines local and global information for more realistic image synthesis
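Each position's output mixes information from every other position, weighted by a softmax over dot-product scores, and is added back to the input with a learnable scale $\gamma$. This minimal sketch of that step assumes feature maps flattened to one vector per spatial position. For brevity it uses identity projections where SAGAN uses learned 1x1 convolutions for queries, keys, and values.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features, gamma):
    """features: one vector per spatial position. Every position attends to all positions
    (dot-product scores, softmax weights); the output is gamma * attended + input."""
    out = []
    for query in features:
        weights = softmax([sum(q * k for q, k in zip(query, key)) for key in features])
        attended = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(len(query))]
        out.append([gamma * a + x for a, x in zip(attended, query)])
    return out
```

SAGAN initializes $\gamma$ to 0, so each block starts as the identity; `self_attention(features, 0.0)` returns the input unchanged. The network then learns how much non-local evidence to use.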

Ethical considerations

The powerful capabilities of GANs in image synthesis and manipulation raise important ethical concerns
Addressing these ethical issues is crucial for responsible development and deployment of GAN technologies
Researchers and practitioners must consider the potential societal impacts of GAN applications

Deepfakes and misinformation

GANs enable creation of highly realistic fake images and videos (deepfakes)
Potential for misuse in spreading misinformation and manipulating public opinion
Challenges in detecting and combating deepfake content
Need for robust detection algorithms and public awareness campaigns
Ethical guidelines and regulations required for responsible use of deepfake technology

Privacy concerns

GANs can potentially reconstruct or infer private information from limited data
Risk of generating realistic faces or other identifiable features without consent
Potential for misuse in surveillance and identity theft
Need for privacy-preserving GAN architectures and data protection measures
Ethical considerations in collecting and using personal data for GAN training

Bias in generated images

GANs can perpetuate and amplify biases present in training data
Risk of underrepresentation or misrepresentation of certain groups in generated images
Potential for reinforcing stereotypes and discrimination in downstream applications
Need for diverse and representative training datasets
Importance of fairness and bias evaluation metrics for GAN-generated content
