👁️Computer Vision and Image Processing

Key Convolutional Neural Network Architectures

Convolutional Neural Networks (CNNs) are central to computer vision because they learn spatial features directly from pixels. From early models like LeNet-5 to detection architectures like YOLO, these networks have evolved to handle image classification, object detection, and image segmentation efficiently.

  1. LeNet-5

    • One of the earliest convolutional neural networks, developed by Yann LeCun and colleagues and published in 1998.
    • Primarily designed for handwritten digit recognition (e.g., MNIST dataset).
    • Consists of 7 layers, including convolutional, subsampling, and fully connected layers.
    • Established the pattern of alternating convolution and subsampling (pooling) layers, using tanh-style activations.
    • Paved the way for deeper networks and established the foundation for modern CNNs.
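
To make the layer structure concrete, the sketch below traces how the spatial size shrinks through LeNet-5's convolution and subsampling stages. It assumes the 32x32 input, 5x5 convolution kernels, and 2x2 subsampling windows of the original design; the helper names `conv_out` and `lenet5_spatial_sizes` are illustrative, not from any library.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_spatial_sizes(input_size=32):
    """Trace the feature-map width through C1, S2, C3, S4, C5."""
    sizes = [input_size]
    # (kernel, stride) per stage: 5x5 convs alternate with 2x2 subsampling.
    for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2), (5, 1)]:
        sizes.append(conv_out(sizes[-1], kernel, stride))
    return sizes


print(lenet5_spatial_sizes())  # [32, 28, 14, 10, 5, 1]
```

After C5 the map is 1x1, which is why the remaining layers behave as fully connected layers.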
  2. AlexNet

    • Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, it won the 2012 ImageNet competition (ILSVRC).
    • Features a deeper architecture with 8 layers, including 5 convolutional and 3 fully connected layers.
    • Utilizes ReLU activation function, which speeds up training compared to tanh.
    • Used dropout in the fully connected layers for regularization to reduce overfitting, helping popularize the technique.
    • Demonstrated the effectiveness of GPUs for training deep networks.
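
One common explanation for why ReLU trains faster than tanh is gradient saturation: the tanh derivative shrinks toward zero for large inputs, while the ReLU derivative stays at 1 for any positive input. The following is a minimal sketch of that comparison; the function names are illustrative.

```python
import math


def tanh_grad(x):
    """Derivative of tanh(x): 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of max(0, x), taking 0 at x = 0 by convention."""
    return 1.0 if x > 0 else 0.0


for x in (0.5, 2.0, 5.0):
    # The tanh gradient decays quickly as x grows; the ReLU gradient does not.
    print(f"x={x}: tanh'={tanh_grad(x):.5f}  relu'={relu_grad(x):.1f}")
```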
  3. VGGNet

    • Introduced by the Visual Geometry Group in 2014, known for its simplicity and depth.
    • Consists of 16 to 19 layers, using small 3x3 convolutional filters throughout.
    • Emphasizes uniform architecture with a consistent use of max pooling layers.
    • Achieved high accuracy on ImageNet, showcasing the importance of depth in CNNs.
    • Serves as a backbone for many transfer learning applications.
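
The case for small 3x3 filters can be checked with simple arithmetic: three stacked 3x3 convolutions cover the same 7x7 receptive field as one 7x7 convolution, with fewer weights. The sketch below assumes 256 input and output channels (an arbitrary illustrative choice) and ignores biases.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weight count of a kernel x kernel convolution, biases ignored."""
    return kernel * kernel * in_ch * out_ch


def stacked_receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel - 1)


C = 256  # assumed channel width, for illustration only
three_3x3 = 3 * conv_params(3, C, C)
one_7x7 = conv_params(7, C, C)

print(stacked_receptive_field(3, 3))  # 7
print(three_3x3, one_7x7)             # 1769472 3211264
```

The stack also applies three nonlinearities instead of one, which makes the learned function more discriminative for the same receptive field.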
  4. GoogLeNet (Inception)

    • Developed by Google in 2014, it introduced the Inception module for efficient computation.
    • Features a 22-layer deep architecture with a focus on multi-scale feature extraction.
    • Uses global average pooling before the final classifier instead of large fully connected layers, which sharply reduces the parameter count.
    • Incorporates auxiliary classifiers to improve gradient flow during training.
    • Achieved state-of-the-art performance on ImageNet while being computationally efficient.
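
To illustrate the parameter savings from global average pooling, the sketch below averages each channel to a single value and compares classifier weight counts. It assumes a 7x7x1024 final feature map and 1000 classes, matching the commonly cited GoogLeNet configuration; `global_average_pool` is an illustrative pure-Python helper, not a library call.

```python
def global_average_pool(feature_map):
    """feature_map: list of channels, each an H x W nested list.
    Returns one averaged value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Tiny 2-channel, 2x2 example.
print(global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]]))  # [2.5, 2.0]

H, W, CHANNELS, CLASSES = 7, 7, 1024, 1000  # assumed sizes
fc_on_flattened = H * W * CHANNELS * CLASSES  # classifier on the flattened map
fc_after_gap = CHANNELS * CLASSES             # classifier on pooled features
print(fc_on_flattened, fc_after_gap)          # 50176000 1024000
```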
  5. ResNet

    • Introduced by Kaiming He et al. in 2015, it addresses the degradation problem, where plain networks become harder to optimize and lose accuracy as depth grows.
    • Features skip connections (residual connections) that allow gradients to flow through the network.
    • Can be extremely deep, with architectures like ResNet-50, ResNet-101, and ResNet-152.
    • Demonstrated that much deeper networks can be trained successfully and still improve accuracy, winning ILSVRC 2015.
    • Widely used in various computer vision tasks due to its robustness and accuracy.
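
The effect of a skip connection on gradient flow can be seen in a deliberately simplified scalar model. The sketch below is a toy, not the ResNet architecture: each "layer" is just multiplication by an assumed small weight `w`, and the residual version adds the input back. For these linear layers, the output for an input of 1.0 equals the end-to-end gradient.

```python
def plain_layer(x, w):
    """Toy plain layer: y = w * x, so the local gradient is w."""
    return w * x


def residual_layer(x, w):
    """Toy residual layer: y = x + w * x, so the local gradient is 1 + w."""
    return x + w * x


def stack(layer, x, w, depth):
    """Apply the same toy layer `depth` times."""
    for _ in range(depth):
        x = layer(x, w)
    return x


w, depth = 0.01, 20  # assumed values, chosen only for illustration
print(stack(plain_layer, 1.0, w, depth))     # vanishingly small
print(stack(residual_layer, 1.0, w, depth))  # stays close to 1
```

With the identity path in place, a block whose learned function is near zero simply passes its input through, so adding depth does not have to hurt.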
  6. DenseNet

    • Proposed by Gao Huang et al. in 2017, it connects each layer to every subsequent layer within a dense block in a feed-forward manner.
    • Each layer receives inputs from all preceding layers, promoting feature reuse.
    • Needs fewer parameters than comparably accurate networks because each layer adds only a small number of new feature maps.
    • Addresses the vanishing gradient problem effectively through dense connections.
    • Suitable for tasks requiring detailed feature extraction, such as segmentation.
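
Because each layer's output is concatenated onto everything before it, the channel count inside a dense block grows linearly. The sketch below counts channels under assumed values: 64 input channels, a growth rate of 32, and 6 layers. The function name is illustrative.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return (input channels seen by each layer, block output channels).

    Layer i receives the block input plus the growth_rate feature maps
    produced by each of the i layers before it.
    """
    layer_inputs = [in_channels + growth_rate * i for i in range(num_layers)]
    block_output = in_channels + growth_rate * num_layers
    return layer_inputs, block_output


inputs, output = dense_block_channels(64, 32, 6)
print(inputs)  # [64, 96, 128, 160, 192, 224]
print(output)  # 256
```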
  7. U-Net

    • Developed by Olaf Ronneberger et al. in 2015 for biomedical image segmentation.
    • Features a symmetric encoder-decoder architecture with skip connections.
    • The encoder captures context while the decoder enables precise localization.
    • Works well with limited training data when paired with heavy data augmentation, such as elastic deformations.
    • Widely adopted in medical imaging and other segmentation tasks.
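
The encoder-decoder symmetry is easiest to see by tracing spatial sizes. The sketch below assumes the configuration of the original paper: four downsampling levels, two unpadded 3x3 convolutions per level, 2x2 max pooling, and 2x2 up-convolutions. The function name `unet_output_size` is illustrative.

```python
def unet_output_size(input_size, depth=4):
    """Trace the spatial size through an unpadded U-Net.

    Returns (output size, encoder sizes saved for the skip connections).
    """
    size = input_size
    skips = []
    for _ in range(depth):
        size -= 4                # two unpadded 3x3 convs lose 2 pixels each
        skips.append(size)       # feature map handed to the decoder
        if size % 2:
            raise ValueError("size must be even before 2x2 pooling")
        size //= 2               # 2x2 max pooling
    size -= 4                    # bottleneck convolutions
    for _ in reversed(skips):
        size *= 2                # 2x2 up-convolution
        # The matching encoder map is center-cropped to `size` here,
        # then concatenated before the next two convolutions.
        size -= 4
    return size, skips


print(unet_output_size(572))  # (388, [568, 280, 136, 64])
```

The output is smaller than the input because of the unpadded convolutions; implementations that use padded convolutions keep the two sizes equal and skip the cropping step.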
  8. YOLO (You Only Look Once)

    • Introduced by Joseph Redmon et al. in 2016, it revolutionized real-time object detection.
    • Treats object detection as a single regression problem, predicting bounding boxes and class probabilities simultaneously.
    • Achieves high speed and accuracy, making it suitable for real-time applications.
    • Utilizes a single neural network to predict multiple bounding boxes and class scores from full images.
    • Continues to evolve with versions like YOLOv3 and YOLOv4, improving performance and efficiency.
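
Framing detection as a single regression means the network emits one fixed-size tensor per image. The sketch below computes that shape for the first YOLO version, assuming its published defaults: a 7x7 grid, 2 boxes per cell, and 20 classes. Each box contributes 5 numbers (x, y, width, height, confidence).

```python
def yolo_v1_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    """Shape of the YOLOv1 prediction tensor: S x S x (B * 5 + C)."""
    depth = boxes_per_cell * 5 + num_classes
    return (grid, grid, depth)


shape = yolo_v1_output_shape()
print(shape)                          # (7, 7, 30)
print(shape[0] * shape[1] * shape[2])  # 1470 values predicted per image
```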
  9. Faster R-CNN

    • Developed by Shaoqing Ren et al. in 2015, it improves upon previous R-CNN models for object detection.
    • Introduces a Region Proposal Network (RPN) to generate high-quality region proposals.
    • Combines the RPN with Fast R-CNN for end-to-end training, enhancing speed and accuracy.
    • Achieved state-of-the-art results on object detection benchmarks such as PASCAL VOC and MS COCO at the time of publication.
    • Widely used in applications requiring precise object localization and classification.
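
The RPN scores a fixed set of reference boxes, called anchors, at every feature-map location. The sketch below generates the anchors for one location, assuming the paper's defaults of 3 scales and 3 aspect ratios; `make_anchors` is an illustrative helper, and the ratio is taken to mean height divided by width.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).

    Each anchor keeps an area of scale^2 while its height/width ratio varies.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append(
                (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
            )
    return anchors


anchors = make_anchors(300, 300)
print(len(anchors))  # 9
```

The network then predicts, for each anchor, an objectness score and four offsets that refine the box into a proposal.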
  10. MobileNet

    • Introduced by Google in 2017, designed for mobile and edge devices with limited computational resources.
    • Utilizes depthwise separable convolutions to reduce the number of parameters and computations.
    • Balances accuracy and efficiency, making it suitable for real-time applications on mobile devices.
    • Offers width and resolution multipliers that scale the model up or down to match the available compute.
    • Popular in applications like image classification and object detection on mobile platforms.
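
The savings from depthwise separable convolutions follow from a short cost comparison. The sketch below counts multiply-accumulate operations for a standard convolution versus a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes (3x3 kernel, 256 input and output channels, 14x14 feature map) are assumed for illustration.

```python
def standard_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Multiply-accumulates for a standard convolution."""
    return kernel * kernel * in_ch * out_ch * feature_size * feature_size


def separable_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Depthwise (one filter per input channel) plus 1x1 pointwise cost."""
    depthwise = kernel * kernel * in_ch * feature_size * feature_size
    pointwise = in_ch * out_ch * feature_size * feature_size
    return depthwise + pointwise


k, m, n, f = 3, 256, 256, 14  # assumed layer sizes
ratio = separable_conv_cost(k, m, n, f) / standard_conv_cost(k, m, n, f)
print(round(ratio, 4))  # 0.115, which equals 1/n + 1/k^2
```

For 3x3 kernels the ratio is dominated by the 1/9 term, so the separable form costs roughly 8 to 9 times less than the standard convolution.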