AdamW for a ResNet56v2 – III – excursion: weight decay vs. L2 regularization in Adam and AdamW
A major topic of this post series is the investigation of methods to reduce the number of training epochs required for ResNets, in particular with respect to image analysis. Our test case is a ResNet56v2 neural network trained on the CIFAR10 dataset. For intermediate results of numerical experiments, see the first two posts. During the last week I…