
AdamW for a ResNet56v2 – II – Adam with weight decay vs. AdamW, linear LR-schedule and L2-regularization

This series is about a ResNet56v2 tested on the CIFAR10 dataset. In the last post, AdamW for a ResNet56v2 – I – a detailed look at results based on the Adam optimizer, we investigated a piecewise constant reduction schedule for the Learning Rate [LR] over 200 epochs. We found that we could reproduce the validation accuracy values which R. Atienza had claimed for the Adam optimizer. We also saw a dependency on the batch size [BS] and concluded that BS=64 was a good choice.
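
For orientation, a piecewise constant LR schedule of the kind discussed above can be sketched in Keras as follows. The epoch boundaries and reduction factors below are illustrative assumptions, not the exact values used in the experiments:

    import tensorflow as tf

    def lr_schedule(epoch):
        # Piecewise constant LR over 200 epochs; the boundaries and
        # reduction factors here are assumed for illustration only.
        base_lr = 1e-3
        if epoch >= 160:
            return base_lr * 1e-3
        if epoch >= 120:
            return base_lr * 1e-2
        if epoch >= 80:
            return base_lr * 1e-1
        return base_lr

    # Keras evaluates the schedule at the start of each epoch.
    lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule, verbose=1)
    # model.fit(x_train, y_train, batch_size=64, epochs=200,
    #           callbacks=[lr_callback])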