6.4 Learning rate schedules and warm-up strategies

2 min read · July 25, 2024

Learning rate schedules are crucial for optimizing neural network training. They dynamically adjust the learning rate, allowing for faster initial progress and finer adjustments later on. This adaptive approach can lead to improved model performance and faster convergence.

Various rate decay methods exist, including step decay, exponential decay, and cosine annealing. These techniques offer different ways to adjust the learning rate over time. Additionally, learning rate warm-up can help prevent erratic updates in the early stages of training.

Learning Rate Schedules

Importance of learning rate schedules

  • Learning rate schedules dynamically adjust the learning rate during training, adapting to different optimization stages
  • Higher learning rates enable faster initial progress, while lower rates allow finer adjustments in later stages, potentially escaping local minima
  • Improved model performance on unseen data reduces overfitting risk
  • Accelerated training and more stable optimization lead to better final model accuracy
  • Examples: Step decay (reduces the rate at fixed intervals), Cosine annealing (varies the rate along a cosine curve); see the scheduler sketch after this list
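As a concrete illustration, the sketch below wires a step-decay schedule (with cosine annealing as a commented alternative) into a toy PyTorch training loop; the model, initial learning rate, and schedule constants are placeholder assumptions, not values from this guide.

```python
import torch
from torch import nn, optim

# Toy setup; the model, initial lr, and schedule constants are illustrative
# assumptions chosen only to show the scheduler API.
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Step decay: halve the learning rate every 10 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
# Cosine annealing alternative:
# scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-4)

for epoch in range(30):
    # ... forward/backward passes over the training data would go here ...
    optimizer.step()        # placeholder parameter update
    scheduler.step()        # advance the schedule once per epoch
    print(epoch, scheduler.get_last_lr())
```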

Comparison of rate decay methods

  • Step decay reduces the learning rate by a factor at predetermined intervals: $lr = initial\_lr \cdot drop^{\lfloor epoch / epochs\_drop \rfloor}$
  • Exponential decay continuously decreases the learning rate: $lr = initial\_lr \cdot e^{-kt}$
  • Cosine annealing oscillates the learning rate following a cosine curve: $lr = lr_{min} + \frac{1}{2}(lr_{max} - lr_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$
  • Compare methods based on (see the sketch after this list):
    1. Convergence speed
    2. Sensitivity to hyperparameters
    3. Computational overhead
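To make the formulas above concrete, here is a minimal, framework-free sketch of the three decay rules as plain Python functions; the sample hyperparameter values are assumptions chosen only for illustration.

```python
import math

def step_decay(initial_lr, drop, epochs_drop, epoch):
    """Reduce the rate by a factor of `drop` every `epochs_drop` epochs."""
    return initial_lr * drop ** math.floor(epoch / epochs_drop)

def exponential_decay(initial_lr, k, t):
    """Continuously shrink the rate with decay constant k at step t."""
    return initial_lr * math.exp(-k * t)

def cosine_annealing(lr_min, lr_max, t_cur, t_max):
    """Move from lr_max toward lr_min along a cosine curve over t_max steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_max))

# Compare the three schedules at a few epochs (all constants are assumed).
for epoch in range(0, 51, 10):
    print(epoch,
          round(step_decay(0.1, 0.5, 10, epoch), 5),
          round(exponential_decay(0.1, 0.05, epoch), 5),
          round(cosine_annealing(1e-4, 0.1, epoch, 50), 5))
```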

Learning rate warm-up concept

  • Gradually increases the learning rate at the beginning of training, typically for a few epochs or iterations
  • Allows model weights to adjust slowly at first, preventing large, erratic updates in the early stages
  • Mitigates the "generalization gap" observed in large-batch training and improves gradient estimation quality
  • Types (a warm-up sketch follows this list):
    • Linear warm-up (steady increase)
    • Exponential warm-up (accelerating increase)
    • Constant warm-up (fixed low rate initially)
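A minimal linear warm-up sketch, assuming a PyTorch optimizer and using `LambdaLR` to scale the base rate up to its target; the warm-up length and base rate are illustrative assumptions.

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)  # assumed base (target) rate

warmup_epochs = 5  # assumed warm-up length

def warmup_factor(epoch):
    # Linear warm-up: scale the base lr from 1/warmup_epochs up to 1.0,
    # then hold it constant (a decay schedule could take over afterwards).
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 1.0

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

for epoch in range(10):
    # ... training pass would go here ...
    optimizer.step()        # placeholder parameter update
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```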

Application of rate schedules

  • Integration with existing optimization algorithms (SGD, Adam)
  • Evaluate using metrics:
    • Training loss and validation accuracy progression
    • Final model performance
  • Apply to tasks (image classification, natural language processing, reinforcement learning)
  • Best practices:
    • Combine warm-up with learning rate schedules
    • Tune schedule hyperparameters
    • Monitor and visualize learning rate changes
  • Consider trade-offs between computational cost and performance gains, and between implementation complexity and improvement in results (see the combined warm-up and cosine-annealing sketch below)
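Following the best practices above, the sketch below chains a linear warm-up with cosine annealing via PyTorch's `SequentialLR` and records the learning rate each epoch so the schedule can be visualized; all constants are placeholder assumptions rather than recommended settings.

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # assumed settings

warmup_epochs, total_epochs = 5, 50  # assumed schedule lengths

# Linear warm-up for the first few epochs, then cosine annealing to a small floor.
warmup = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs)
cosine = optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs - warmup_epochs, eta_min=1e-4)
scheduler = optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

lr_history = []
for epoch in range(total_epochs):
    # ... training and validation passes would go here ...
    optimizer.step()                                # placeholder update
    scheduler.step()
    lr_history.append(scheduler.get_last_lr()[0])   # monitor the schedule

print(lr_history[:warmup_epochs], lr_history[-1])   # rising warm-up, small final rate
```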

Key Terms to Review (24)

Adam: Adam is an optimization algorithm used in training deep learning models, combining the benefits of both AdaGrad and RMSprop to adaptively adjust the learning rates of each parameter. This method helps achieve faster convergence and improves the overall performance of the model by using estimates of first and second moments of the gradients.
Adam optimizer: The Adam optimizer is a popular optimization algorithm used for training deep learning models, combining the benefits of two other extensions of stochastic gradient descent. It adjusts the learning rate for each parameter individually, using estimates of first and second moments of the gradients to improve convergence speed and performance. This makes it particularly useful in various applications, including recurrent neural networks and reinforcement learning.
Constant warm-up: Constant warm-up is a strategy used in training deep learning models where the learning rate is gradually increased from a small value to a specified target value over a predetermined number of iterations or epochs. This approach helps to stabilize the training process, allowing the model to start learning effectively while minimizing the risk of divergence or instability during the initial phases of training.
Convergence: Convergence refers to the process where an algorithm approaches a stable solution or optimal point as it iteratively updates its parameters. This is crucial in training models, ensuring that the loss function decreases over time, leading to better performance. Understanding convergence helps optimize training strategies, manage learning rates, and assess the effectiveness of different loss functions, particularly in contexts involving complex data like images or text.
Cosine Annealing: Cosine annealing is a learning rate scheduling technique that gradually reduces the learning rate using a cosine function as a guide. This approach allows the learning rate to oscillate between a maximum value and a minimum value, creating a more dynamic training process that helps models escape local minima and encourages convergence. By effectively adjusting the learning rate during training, cosine annealing helps improve performance and speed up the optimization process.
Exponential Decay: Exponential decay refers to a process where a quantity decreases at a rate proportional to its current value, resulting in a rapid decline that slows over time. In the context of deep learning, this concept is often used to describe how learning rates can diminish as training progresses, helping to stabilize convergence. By gradually reducing the learning rate, models can fine-tune their parameters more effectively, allowing for better generalization on unseen data.
Exponential Warm-Up: Exponential warm-up is a strategy used in training deep learning models that gradually increases the learning rate from a small value to a target value using an exponential function. This method allows the model to start learning at a slow pace, reducing the risk of diverging during the initial training phase and promoting stability. By employing this technique, models can effectively navigate the complex landscape of loss functions early in training, ultimately leading to better convergence and improved performance.
Final model performance: Final model performance refers to the effectiveness and accuracy of a trained machine learning model when applied to unseen data after training is completed. This measure is crucial as it indicates how well the model can generalize and make predictions on new instances, thus revealing its true predictive power. Achieving optimal final model performance often involves tuning hyperparameters, utilizing appropriate learning rate schedules, and implementing warm-up strategies during training.
Grid search: Grid search is a hyperparameter optimization technique used to systematically explore the hyperparameter space by evaluating all possible combinations of given parameters. This approach helps in identifying the best parameter settings for a machine learning model by conducting exhaustive training and validation runs for each combination. It is especially useful when combined with learning rate schedules, visualization tools, and custom loss functions, as it allows researchers to fine-tune their models effectively.
Linear Warm-Up: Linear warm-up is a training strategy that gradually increases the learning rate from a small initial value to a target learning rate over a predefined number of steps or epochs. This approach helps stabilize the training process by allowing the model to adapt to the optimization landscape without making drastic updates early in training, which can lead to better convergence and improved performance.
Momentum: Momentum in optimization is a technique used to accelerate the convergence of gradient descent algorithms by adding a fraction of the previous update to the current update. This approach helps to smooth out the updates and allows the learning process to move faster in the relevant directions, particularly in scenarios with noisy gradients or complex loss surfaces. It plays a crucial role in various adaptive learning rate methods, learning rate schedules, and gradient descent strategies.
Overfitting: Overfitting occurs when a machine learning model learns not only the underlying patterns in the training data but also the noise, resulting in a model that performs well on training data but poorly on unseen data. This is a significant challenge in deep learning as it can lead to poor generalization, where the model fails to make accurate predictions on new data.
Plateau: In the context of deep learning, a plateau refers to a period during training when the model's performance, often measured by the loss or accuracy, remains relatively constant over several iterations despite continued training. This stagnation can occur due to various factors, including an unsuitable learning rate or the model reaching a local minimum in its error landscape. Recognizing and addressing plateaus is essential for optimizing training and improving model performance.
PyTorch: PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing, developed by Facebook's AI Research lab. It is known for its dynamic computation graph, which allows for flexible model building and debugging, making it a favorite among researchers and developers.
Random search: Random search is a hyperparameter optimization technique where random combinations of hyperparameter values are selected to evaluate model performance. This method contrasts with grid search, which exhaustively explores all parameter combinations. It offers a balance between exploration of the hyperparameter space and computational efficiency, making it particularly useful when the search space is large or when it’s difficult to predict which parameters will yield the best results.
RMSprop: RMSprop (Root Mean Square Propagation) is an adaptive learning rate optimization algorithm designed to improve the performance of gradient descent methods by adjusting the learning rate for each parameter individually. It achieves this by maintaining a moving average of the squares of gradients, allowing it to adaptively adjust the learning rates based on the scale of the gradients, which helps with convergence in training deep learning models.
SGD: Stochastic Gradient Descent (SGD) is an optimization algorithm used to minimize the loss function of a model by iteratively adjusting the model parameters based on the gradient of the loss with respect to those parameters. This method helps in efficiently training various neural network architectures, where updates to weights are made based on a randomly selected subset of the training data rather than the entire dataset, leading to faster convergence and reduced computational costs.
Spike: In the context of learning rate schedules and warm-up strategies, a spike refers to a sudden and temporary increase in the learning rate during training. This rapid change can help accelerate learning by allowing the model to explore different regions of the loss landscape more aggressively, potentially leading to faster convergence. Understanding spikes is crucial for effectively managing how the learning rate evolves throughout the training process.
Step Decay: Step decay is a learning rate scheduling technique where the learning rate is reduced by a specific factor after a predetermined number of epochs or iterations. This approach helps in fine-tuning the learning process, allowing for faster convergence initially and then more stable adjustments as training progresses. By gradually decreasing the learning rate, models can escape local minima and reach better overall performance.
TensorFlow: TensorFlow is an open-source deep learning framework developed by Google that allows developers to create and train machine learning models efficiently. It provides a flexible architecture for deploying computations across various platforms, making it suitable for both research and production environments.
Time to convergence: Time to convergence refers to the duration it takes for a deep learning model to reach a stable state where the loss function no longer significantly decreases with further training iterations. This concept is closely tied to the learning rate, as an appropriate learning rate can facilitate faster convergence, while a poorly chosen one may lead to slow or unstable training. Additionally, the implementation of learning rate schedules and warm-up strategies can greatly influence how quickly a model converges during the training process.
Training loss trajectory: The training loss trajectory refers to the progression of the loss value during the training process of a deep learning model, typically plotted against the number of training iterations or epochs. This trajectory helps in understanding how well a model is learning over time and can reveal important insights into issues such as underfitting, overfitting, or convergence. By analyzing this trajectory, practitioners can make informed decisions about adjustments in hyperparameters, including learning rates and warm-up strategies.
Validation Accuracy: Validation accuracy refers to the measure of how well a model performs on a validation dataset, which is separate from the training data used to build the model. This metric provides insights into the model's ability to generalize to unseen data, highlighting its effectiveness in making predictions. A high validation accuracy indicates that the model can successfully apply what it has learned from training, while also being sensitive to issues like overfitting or underfitting, which can be addressed through various strategies and techniques.
Weight decay: Weight decay is a regularization technique used in training machine learning models to prevent overfitting by penalizing large weights. By adding a penalty term to the loss function, it encourages the model to keep the weights small, which can lead to better generalization on unseen data. This concept is particularly important in settings where learning rates are adjusted dynamically or when training recurrent neural networks, as it helps stabilize training and maintain performance across long sequences.