Learning Rate schedule

AdamW for a ResNet56v2 – VI – Super-Convergence after improving the ResNetV2

by eremo
15 Aug 2024 (updated 17 Aug 2024)
ResNets

In previous posts of this series I have shown that a ResNet56v2 with AdamW can converge to acceptable values of the validation accuracy for the CIFAR10 dataset in fewer than 26 epochs. This required an optimal schedule of the learning rate [LR] and optimal values for the weight decay parameter [WD]. My network – a variation of the ResNetV2-structure…
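Super-convergence is commonly associated with a "one-cycle" LR schedule: the LR ramps up to a peak and then descends over the rest of the short training run. The exact schedule used in the post is not given in this excerpt, so the following is only a minimal sketch of a generic piecewise-linear one-cycle schedule. The function name `one_cycle_lr` and all default values (25 epochs, peak at 30 % of training, the LR bounds) are hypothetical choices for illustration.

```python
def one_cycle_lr(epoch: float,
                 total_epochs: float = 25.0,   # hypothetical run length
                 lr_min: float = 1e-4,         # hypothetical start LR
                 lr_max: float = 1e-3,         # hypothetical peak LR
                 warmup_frac: float = 0.3,     # hypothetical position of the peak
                 lr_final: float = 1e-6) -> float:  # hypothetical final LR
    """Return the LR for a (possibly fractional) epoch under a
    piecewise-linear one-cycle schedule. Illustrative sketch only."""
    if not 0.0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    peak = warmup_frac * total_epochs
    if epoch <= peak:
        # Phase 1: linear ramp from lr_min up to lr_max
        return lr_min + (lr_max - lr_min) * epoch / peak
    # Phase 2: linear descent from lr_max down to lr_final
    return lr_max + (lr_final - lr_max) * (epoch - peak) / (total_epochs - peak)


if __name__ == "__main__":
    for e in range(0, 26, 5):
        print(f"epoch {e:2d}: lr = {one_cycle_lr(e):.2e}")
```

A per-epoch callable of this shape could, for example, be wrapped for `tf.keras.callbacks.LearningRateScheduler`, which calls `schedule(epoch, lr)` at the start of each epoch.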

AdamW for a ResNet56v2 – IV – better accuracy and shorter training by pure weight decay and large scale fluctuations of the validation loss

by eremo
15 Jun 2024 (updated 14 Aug 2024)
ResNets

Among other things, this post series is about efforts to reduce the number of training epochs for ResNets. We test our ideas with a ResNet applied to CIFAR10. So far we have tried out rather simple methods, such as modifying the schedule for the learning rate [LR]. In this post I describe experiments regarding a model using the AdamW optimizer, without…
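What distinguishes AdamW from Adam with an L2 penalty is that the weight decay is decoupled from the gradient: the weights shrink directly by a term proportional to LR × WD, instead of a penalty gradient being fed through Adam's adaptive scaling. The following is a minimal sketch of one AdamW step in pure Python on lists of scalars; the function name `adamw_step` and its default hyperparameters are hypothetical, and real frameworks implement the same update on tensors.

```python
import math


def adamw_step(params, grads, m, v, t,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
               weight_decay=1e-2):  # hypothetical default values
    """One AdamW update (decoupled weight decay). Illustrative sketch.

    params, grads, m, v: equal-length lists of floats
    t: 1-based step counter used for bias correction
    Returns (new_params, new_m, new_v).
    """
    new_p, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and squared gradient
        mi = beta1 * mi + (1.0 - beta1) * g
        vi = beta2 * vi + (1.0 - beta2) * g * g
        # Bias-corrected moment estimates
        m_hat = mi / (1.0 - beta1 ** t)
        v_hat = vi / (1.0 - beta2 ** t)
        # Adam step plus decoupled decay: WD acts on the old weight p
        # directly and is NOT added to the gradient g
        p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
        new_p.append(p)
        new_m.append(mi)
        new_v.append(vi)
    return new_p, new_m, new_v


if __name__ == "__main__":
    # With a zero gradient, only the decoupled decay changes the weight
    p, m, v = adamw_step([1.0], [0.0], [0.0], [0.0], t=1)
    print(p)
```

With a zero gradient the weight is simply multiplied by (1 − LR × WD), which makes the "pure weight decay" effect of AdamW visible in isolation.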