Gradient descent is a crucial optimization algorithm in deep learning. It iteratively adjusts model parameters to minimize the loss function, using techniques like backpropagation to efficiently compute gradients. Various flavors exist, including batch, stochastic, and mini-batch gradient descent.

Learning rate optimization is key to effective training. Techniques like fixed rates, decay schedules such as step and exponential decay, and adaptive methods like Adam help control the pace of parameter updates. These schedules impact training dynamics, convergence speed, and final model performance.

Gradient Descent Fundamentals

Concept of gradient descent

  • Gradient descent algorithm iteratively adjusts parameters to minimize the loss function in machine learning models
  • Gradient calculation computes partial derivatives of loss with respect to parameters using backpropagation for efficiency
  • Update rule adjusts parameters: $\theta_{new} = \theta_{old} - \alpha \nabla J(\theta)$, where $\alpha$ represents the learning rate (see the sketch after this list)
  • Deep learning optimization process adjusts model parameters and reduces prediction errors
  • Convergence aims for local or global minima, navigating saddle points and plateaus
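
A minimal NumPy sketch of the update rule above, using a toy quadratic loss purely for illustration (the loss, target, learning rate, and step count are hypothetical choices, not part of these notes):

```python
import numpy as np

# Hypothetical quadratic loss J(theta) = ||theta - target||^2, used only for illustration.
target = np.array([3.0, -1.0])

def loss(theta):
    return np.sum((theta - target) ** 2)

def grad(theta):
    # Partial derivatives of the loss with respect to each parameter.
    return 2.0 * (theta - target)

theta = np.zeros(2)   # initial parameters
alpha = 0.1           # learning rate

for step in range(100):
    theta = theta - alpha * grad(theta)   # theta_new = theta_old - alpha * grad J(theta)

print(theta, loss(theta))  # theta approaches the minimizer [3.0, -1.0]
```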

Variants of gradient descent algorithms

  • Batch gradient descent uses entire dataset for each update, providing stability but high computational cost
  • Stochastic gradient descent (SGD) uses a single sample per update, offering faster convergence with noisy updates
  • Mini-batch gradient descent balances computation and update frequency, compromising between batch and SGD
  • Comparison factors include convergence speed, computational efficiency, memory requirements, and parameter update noise (the three variants are sketched below)
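
A minimal sketch contrasting the three variants on a toy linear-regression problem; the dataset, learning rate, and epoch count are illustrative assumptions, not a prescribed setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # hypothetical dataset
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)   # targets with a little noise

def grad_mse(w, Xb, yb):
    # Gradient of mean squared error for a linear model on the given batch.
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

def train(batch_size, lr=0.05, epochs=20):
    # batch_size = len(X) -> batch GD; batch_size = 1 -> SGD; in between -> mini-batch.
    w = np.zeros(5)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad_mse(w, X[b], y[b])
    return w

w_batch = train(batch_size=len(X))   # stable updates, but only one per epoch
w_sgd   = train(batch_size=1)        # noisy updates, one per sample
w_mini  = train(batch_size=32)       # compromise between the two

for name, w in [("batch", w_batch), ("sgd", w_sgd), ("mini", w_mini)]:
    print(name, np.linalg.norm(w - true_w))    # distance to the true weights
```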

Learning Rate Optimization

Learning rate scheduling techniques

  • Fixed learning rate maintains constant rate, potentially limiting convergence
  • Step decay reduces learning rate at predetermined intervals (epochs, iterations)
  • Exponential decay continuously decreases the learning rate: $\alpha_t = \alpha_0 e^{-kt}$ (the schedules are sketched after this list)
  • Cosine annealing implements oscillating learning rate with periodic restarts for exploration
  • Adaptive methods:
    1. AdaGrad accumulates squared gradients
    2. RMSprop uses exponential moving average of squared gradients
    3. Adam combines momentum and RMSprop approaches
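
The non-adaptive schedules above can be sketched as simple functions of the epoch or step index; the base rate, decay factors, and periods below are illustrative assumptions:

```python
import math

def step_decay(alpha0, epoch, drop=0.5, every=10):
    # Multiply the rate by `drop` every `every` epochs.
    return alpha0 * (drop ** (epoch // every))

def exponential_decay(alpha0, t, k=0.05):
    # alpha_t = alpha_0 * exp(-k * t)
    return alpha0 * math.exp(-k * t)

def cosine_annealing(alpha0, t, period=50, alpha_min=0.0):
    # Cosine curve from alpha0 down to alpha_min, restarting every `period` steps.
    t = t % period
    return alpha_min + 0.5 * (alpha0 - alpha_min) * (1 + math.cos(math.pi * t / period))

for epoch in range(0, 100, 10):
    print(epoch,
          round(step_decay(0.1, epoch), 5),
          round(exponential_decay(0.1, epoch), 5),
          round(cosine_annealing(0.1, epoch), 5))
```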

Impact of learning rate schedules

  • Training dynamics analysis examines loss curve behavior and gradient magnitude changes
  • Convergence speed measures time to reach target performance and required epochs
  • Stability of training evaluates loss oscillations and gradient explosions or vanishing
  • Final model performance considers training accuracy, validation accuracy, and test set generalization
  • Overfitting and underfitting assessment examines model capacity utilization and regularization effects
  • Robustness to hyperparameters tests sensitivity to initial learning rate and adaptability to different architectures
  • Computational efficiency compares training times and hardware utilization across schedules

Key Terms to Review (16)

Adam optimizer: The Adam optimizer is a popular optimization algorithm used for training deep learning models, combining the benefits of two other extensions of stochastic gradient descent. It adjusts the learning rate for each parameter individually, using estimates of first and second moments of the gradients to improve convergence speed and performance. This makes it particularly useful in various applications, including recurrent neural networks and reinforcement learning.
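
A minimal NumPy sketch of one Adam update using the commonly cited default hyperparameters; the toy quadratic loss in the usage example is a hypothetical illustration, not the library implementation:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # First and second moment estimates (exponential moving averages of gradients).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moments.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter update scaled by the adaptive denominator.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on a toy quadratic loss with minimum at [1, -2] (hypothetical example).
theta, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    grad = 2 * (theta - np.array([1.0, -2.0]))
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # approaches [1, -2]
```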
Adaptive learning rate: An adaptive learning rate is a method used in optimization algorithms that adjusts the learning rate during training to improve convergence. Instead of using a fixed learning rate, adaptive learning rates automatically change based on the performance of the model, allowing for faster convergence and better training outcomes. This technique is especially useful in complex models where the optimal learning rate can vary significantly during the training process.
Batch size: Batch size refers to the number of training examples utilized in one iteration of model training. This concept is crucial as it directly impacts how models learn from data and influences the overall efficiency of the training process. The choice of batch size affects memory usage, the stability of gradient updates, and ultimately, the performance of the model during and after training.
Convergence Rate: The convergence rate refers to how quickly an optimization algorithm approaches its optimal solution as it iteratively updates its parameters. A faster convergence rate means fewer iterations are needed to reach a satisfactory result, which is crucial in the context of training deep learning models efficiently. Understanding the convergence rate helps in selecting the right optimization methods and adjusting hyperparameters to improve performance.
Early Stopping: Early stopping is a regularization technique used during the training of deep learning models to prevent overfitting by halting the training process when performance on a validation dataset begins to degrade. This approach allows the model to retain its ability to generalize well to unseen data while avoiding excessive fitting to the training data. It acts as a safeguard against over-optimization, ensuring that the model does not learn noise in the training dataset.
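
A minimal sketch of the early-stopping rule, assuming a patience threshold and a hypothetical sequence of validation losses:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch whose checkpoint would be restored, stopping once the
    validation loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0   # improvement: reset patience
        else:
            waited += 1                                  # no improvement this epoch
            if waited >= patience:
                return best_epoch                        # halt and keep the best model
    return best_epoch

# Hypothetical validation losses that start to degrade after epoch 3.
print(early_stopping([0.9, 0.7, 0.6, 0.55, 0.58, 0.60, 0.63]))  # -> 3
```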
Exponential Decay: Exponential decay refers to a process where a quantity decreases at a rate proportional to its current value, resulting in a rapid decline that slows over time. In the context of deep learning, this concept is often used to describe how learning rates can diminish as training progresses, helping to stabilize convergence. By gradually reducing the learning rate, models can fine-tune their parameters more effectively, allowing for better generalization on unseen data.
Global minima: Global minima refer to the lowest point in the entire loss landscape of a function, representing the optimal set of parameters for a model in machine learning. Finding the global minima is crucial because it ensures that the model performs at its best by minimizing the loss function across all possible parameter configurations. This concept is directly connected to optimization techniques like gradient descent, which aim to find these minima by iteratively adjusting the parameters.
Learning rate decay: Learning rate decay is a technique used in training machine learning models to progressively reduce the learning rate as training progresses. This approach helps optimize the model's convergence by allowing larger updates when the parameters are far from the optimal solution, and smaller updates as the model begins to settle into a more precise solution. As a result, it enhances stability and can prevent overshooting the minimum during optimization.
Local minima: Local minima refer to points in a mathematical function where the value is lower than that of its neighboring points, but not necessarily the lowest point in the entire function. In deep learning, finding local minima is crucial during optimization, as it affects the model's ability to learn and generalize. Local minima can often lead to suboptimal solutions, particularly in complex landscapes of loss functions, which are common in deep learning models.
Loss function: A loss function is a mathematical representation that quantifies how well a model's predictions align with the actual target values. It serves as a guiding metric during training, allowing the optimization algorithm to adjust the model parameters to minimize prediction errors, thus improving performance.
Mini-batch gradient descent: Mini-batch gradient descent is an optimization algorithm used to train machine learning models by breaking down the training dataset into smaller batches and updating the model's parameters based on each mini-batch. This approach strikes a balance between the efficiency of using the entire dataset and the speed of stochastic gradient descent, allowing for faster convergence while maintaining some degree of accuracy. It's particularly relevant when training deep learning models, enabling quicker updates and making better use of computational resources.
Momentum: Momentum in optimization is a technique used to accelerate the convergence of gradient descent algorithms by adding a fraction of the previous update to the current update. This approach helps to smooth out the updates and allows the learning process to move faster in the relevant directions, particularly in scenarios with noisy gradients or complex loss surfaces. It plays a crucial role in various adaptive learning rate methods, learning rate schedules, and gradient descent strategies.
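
A minimal NumPy sketch of a momentum update on a toy quadratic loss; the learning rate, momentum coefficient, and target are illustrative assumptions:

```python
import numpy as np

def momentum_step(theta, grad, velocity, lr=0.1, beta=0.9):
    # Accumulate a fraction of the previous update, then move along the velocity.
    velocity = beta * velocity + grad
    theta = theta - lr * velocity
    return theta, velocity

# Toy quadratic loss with minimum at [2, 2] (hypothetical example).
theta, velocity = np.zeros(2), np.zeros(2)
for _ in range(200):
    grad = 2 * (theta - np.array([2.0, 2.0]))
    theta, velocity = momentum_step(theta, grad, velocity)
print(theta)  # approaches [2, 2]
```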
Regularization: Regularization is a set of techniques used in machine learning to prevent overfitting by introducing additional information or constraints into the model. By penalizing overly complex models or adjusting the training process, regularization encourages simpler models that generalize better to unseen data. It’s essential for improving performance and reliability in various neural network architectures and loss functions.
RMSprop: RMSprop (Root Mean Square Propagation) is an adaptive learning rate optimization algorithm designed to improve the performance of gradient descent methods by adjusting the learning rate for each parameter individually. It achieves this by maintaining a moving average of the squares of gradients, allowing it to adaptively adjust the learning rates based on the scale of the gradients, which helps with convergence in training deep learning models.
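
A minimal NumPy sketch of an RMSprop update on a toy quadratic loss; the hyperparameters and target are illustrative assumptions:

```python
import numpy as np

def rmsprop_step(theta, grad, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    # Moving average of squared gradients scales each parameter's step individually.
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)
    return theta, sq_avg

# Toy quadratic loss with minimum at [1, -1] (hypothetical example).
theta, sq_avg = np.zeros(2), np.zeros(2)
for _ in range(1000):
    grad = 2 * (theta - np.array([1.0, -1.0]))
    theta, sq_avg = rmsprop_step(theta, grad, sq_avg)
print(theta)  # approaches [1, -1]
```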
Step Decay: Step decay is a learning rate scheduling technique where the learning rate is reduced by a specific factor after a predetermined number of epochs or iterations. This approach helps in fine-tuning the learning process, allowing for faster convergence initially and then more stable adjustments as training progresses. By gradually decreasing the learning rate, models can escape local minima and reach better overall performance.
Stochastic gradient descent: Stochastic gradient descent (SGD) is an optimization algorithm used to minimize the loss function in machine learning models by iteratively updating the model parameters based on the gradient of the loss function calculated from a randomly selected subset of data. This method allows for faster convergence compared to traditional gradient descent as it updates the weights more frequently, which can lead to improved performance in training deep learning models.