Fiveable

🧐Deep Learning Systems Unit 3 Review


3.2 Loss functions for regression and classification tasks


Written by the Fiveable Content Team • Last updated August 2025

Loss functions are crucial in deep learning, measuring how well a model performs and guiding the learning process. They provide a scalar value to optimize, enabling backpropagation for weight updates. Good loss functions are differentiable, convex or nearly convex, and sensitive to model improvements.

For regression, Mean Squared Error (MSE) and Mean Absolute Error (MAE) are common choices. Classification tasks often use Binary Cross-Entropy (BCE) for binary problems and Categorical Cross-Entropy (CCE) for multi-class scenarios. Selecting the right loss function depends on the problem type, data distribution, and desired model behavior.

Understanding Loss Functions

Concept of loss functions

  • A loss function quantifies model performance by measuring the difference between predicted and actual outputs
  • Guides the learning process by providing a scalar value to optimize, enabling backpropagation for weight updates
  • Good loss function characteristics: differentiable, convex or near-convex shape, sensitive to model improvements
  • Also called a cost function or objective function (e.g., MSE, cross-entropy)
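The points above can be sketched with a tiny example: a minimal NumPy gradient-descent loop (a one-weight linear model, a hypothetical setup for illustration) showing how the loss produces a single scalar whose gradient drives the weight update.

```python
import numpy as np

# Toy data: one feature with the linear relationship y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = 0.0  # single weight to learn
for _ in range(100):
    y_hat = w * x                          # forward pass
    loss = np.mean((y - y_hat) ** 2)       # MSE: one scalar to optimize
    grad = np.mean(-2 * x * (y - y_hat))   # dLoss/dw via the chain rule
    w -= 0.1 * grad                        # gradient-descent weight update

print(round(w, 3))  # converges toward 2.0
```

Because MSE is differentiable and convex in `w`, the gradient always points toward the optimum, which is exactly the "guides the learning process" role described above.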

Loss functions for regression

  • Mean Squared Error (MSE) averages the squared differences between predicted and actual values
    • Formula: $MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$
    • Sensitive to outliers, penalizing larger errors more heavily
  • Mean Absolute Error (MAE) averages the absolute differences between predicted and actual values
    • Formula: $MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|$
    • More robust to outliers than MSE, and stays on the same scale as the original output
  • Huber Loss combines MSE and MAE: quadratic for small errors, linear for large ones
    • Less sensitive to outliers than MSE, more sensitive to small errors than MAE
    • Balances MSE and MAE characteristics (house price prediction, stock market forecasting)
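The three regression losses above can be sketched in NumPy as follows (the `delta` threshold for Huber is a hypothetical default; the data includes a deliberate outlier to show the robustness difference):

```python
import numpy as np

def mse(y, y_hat):
    """Mean Squared Error: penalizes large errors quadratically."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean Absolute Error: linear penalty, robust to outliers."""
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = y - y_hat
    quad = 0.5 * err ** 2
    lin = delta * (np.abs(err) - 0.5 * delta)
    return np.mean(np.where(np.abs(err) <= delta, quad, lin))

y = np.array([1.0, 2.0, 3.0, 100.0])     # last target is an outlier
y_hat = np.array([1.1, 1.9, 3.2, 4.0])

# MSE is dominated by the outlier; MAE and Huber are far less affected.
print(mse(y, y_hat), mae(y, y_hat), huber(y, y_hat))
```

Running this shows MSE exploding into the thousands while MAE and Huber stay near the typical error magnitude, which is why MAE or Huber is preferred when outliers are expected.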
[Video] Introduction to the concept of Cross Entropy and its application — Pavan Mirla

Loss functions for classification

  • Binary Cross-Entropy (BCE) is used for binary classification problems
    • Formula: $BCE = -\frac{1}{n} \sum_{i=1}^n [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]$
    • Measures dissimilarity between the true and predicted probability distributions
    • Operates on predicted probabilities between 0 and 1 (spam detection, sentiment analysis)
  • Categorical Cross-Entropy (CCE) is used for multi-class classification problems
    • Formula: $CCE = -\sum_{i=1}^n \sum_{j=1}^m y_{ij} \log(\hat{y}_{ij})$
    • Generalizes BCE to multiple classes; often paired with a softmax activation in the output layer
    • Suitable for image classification (handwritten digit recognition)
  • Focal Loss, a variant of cross-entropy, addresses class imbalance
    • Down-weights the loss contribution from easy examples
    • Useful in object detection tasks with many background instances
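The classification losses above can be sketched in NumPy (the `eps` clipping guards against `log(0)`, and `gamma=2.0` is a commonly cited but here hypothetical focusing parameter for focal loss; the focal sketch covers only the binary case):

```python
import numpy as np

def bce(y, p, eps=1e-12):
    """Binary cross-entropy; eps clipping avoids log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def cce(y_onehot, p, eps=1e-12):
    """Categorical cross-entropy with one-hot targets and softmax
    outputs, averaged over samples."""
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

def focal(y, p, gamma=2.0, eps=1e-12):
    """Binary focal loss: (1 - p_t)^gamma down-weights easy examples."""
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)
    return -np.mean((1 - p_t) ** gamma * np.log(p_t))

y_bin = np.array([1, 0, 1])
p_bin = np.array([0.9, 0.1, 0.8])
print(bce(y_bin, p_bin))      # small loss: predictions are confident

y_hot = np.array([[1, 0, 0], [0, 1, 0]])
p_cls = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cce(y_hot, p_cls))
```

Note that because these predictions are already confident (easy examples), the focal loss on the same data is much smaller than plain BCE, illustrating the down-weighting behavior described above.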

Selection of appropriate loss functions

  • Regression problems:
    1. Choose MSE for general cases
    2. Use MAE when outliers are present or interpretability matters
    3. Apply Huber loss for a balance between MSE and MAE
  • Binary classification: BCE with a sigmoid activation in the output layer, or hinge loss for support vector machines
  • Multi-class classification: CCE with a softmax activation, or sparse categorical cross-entropy when labels are integers
  • Multi-label classification: BCE applied to each label independently
  • When selecting a loss function, consider the nature of the problem, the distribution of the target variable, the presence of outliers or class imbalance, computational efficiency, and result interpretability
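One selection point above is worth making concrete: sparse categorical cross-entropy is the same loss as CCE, differing only in label format. A minimal NumPy sketch (hypothetical helper names, assuming predictions are already softmax probabilities):

```python
import numpy as np

def cce_onehot(y_onehot, p, eps=1e-12):
    """Categorical cross-entropy with one-hot targets."""
    return -np.mean(np.sum(y_onehot * np.log(np.clip(p, eps, 1.0)), axis=1))

def sparse_cce(labels, p, eps=1e-12):
    """Same loss, but targets are integer class indices (no one-hot)."""
    picked = p[np.arange(len(labels)), labels]   # probability of true class
    return -np.mean(np.log(np.clip(picked, eps, 1.0)))

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
labels = np.array([0, 1])          # integer labels
onehot = np.eye(3)[labels]         # equivalent one-hot encoding

# Both formulations give the identical loss value.
print(cce_onehot(onehot, p), sparse_cce(labels, p))
```

The sparse form avoids materializing one-hot vectors, which matters for memory when the class count is large, so the choice between the two is about label representation, not model behavior.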