eremo

Probability density function of a Bivariate Normal Distribution – derived from assumptions on marginal distributions and functional factorization

To better understand ML experiments regarding a generator of human faces based on a convolutional autoencoder, we need a solid grasp of multivariate and bivariate normal distributions and their probability densities. This post is about the probability density function of a bivariate normal distribution of two correlated random variables X and Y. Most derivations of the mathematical form… Read More »
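For reference, the standard closed form that such a derivation arrives at (with means μ_X, μ_Y, standard deviations σ_X, σ_Y and correlation coefficient ρ) is:

\[
f_{X,Y}(x, y) \;=\; \frac{1}{2\pi\,\sigma_X \sigma_Y \sqrt{1-\rho^2}}
\exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2}
- \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y}
+ \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)
\]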

AdamW for a ResNet56v2 – IV – better accuracy and shorter training by pure weight decay and large scale fluctuations of the validation loss

Among other things, this post series is about efforts to reduce the number of training epochs for ResNets. We test our ideas with a ResNet applied to CIFAR10. So far we have tried out rather simple methods, such as modifying the schedule for the learning rate [LR]. In this post I describe experiments regarding a model using the AdamW optimizer, without… Read More »
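As a minimal sketch of this kind of setup (assuming TensorFlow/Keras; the stand-in model and the numeric values are illustrative placeholders, not the settings used in the experiments):

```python
import tensorflow as tf

# Hypothetical stand-in model; the series actually uses a ResNet56v2 (not shown here).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),   # CIFAR10 image shape
    tf.keras.layers.Dense(10, activation="softmax"),    # 10 CIFAR10 classes
])

# AdamW applies decoupled ("pure") weight decay directly in the weight update,
# instead of adding an L2 penalty term to the loss.
optimizer = tf.keras.optimizers.AdamW(
    learning_rate=1e-3,   # illustrative start value; the posts vary the LR schedule
    weight_decay=1e-4,    # illustrative decay strength
)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```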

AdamW for a ResNet56v2 – III – excursion: weight decay vs. L2 regularization in Adam and AdamW

A major topic of this post series is the investigation of methods to reduce the number of required training epochs for ResNets, in particular with respect to image analysis. Our test case is defined by a ResNet56v2 neural network trained on the CIFAR10 dataset. For intermediate results of numerical experiments see the first two posts. During the last week I… Read More »
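The core distinction this excursion revolves around can be sketched schematically as follows (η denotes the learning rate, λ the decay coefficient, and \(\hat{m}_t\), \(\hat{v}_t\) Adam's bias-corrected moment estimates, built from \(g_t\)). With L2 regularization, the penalty enters the gradient and therefore Adam's adaptive statistics:

\[
g_t = \nabla_w L(w_t) + \lambda\, w_t, \qquad
w_{t+1} = w_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\]

With decoupled weight decay (AdamW), the decay bypasses the moment estimates and acts directly on the weights:

\[
g_t = \nabla_w L(w_t), \qquad
w_{t+1} = w_t - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, w_t \right)
\]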

AdamW for a ResNet56v2 – II – linear LR-schedules, Adam, L2-regularization, weight decay and a reduction of training epochs

This series is about a ResNet56v2 tested on the CIFAR10 dataset. In the last post, AdamW for a ResNet56v2 – I – a detailed look at results based on the Adam optimizer, we investigated a piecewise constant reduction schedule for the Learning Rate [LR] over 200 epochs. We found that we could reproduce results of R. Atienza, who had claimed… Read More »
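A minimal sketch of a linear LR schedule of the kind examined here (assuming Keras; the start/end values and epoch count are illustrative, not the tested settings):

```python
import tensorflow as tf

LR_START, LR_END, TOTAL_EPOCHS = 1e-3, 1e-5, 200   # illustrative values

def linear_lr(epoch, lr):
    """Linearly interpolate the LR from LR_START down to LR_END."""
    frac = min(epoch / max(TOTAL_EPOCHS - 1, 1), 1.0)
    return LR_START + frac * (LR_END - LR_START)

# Keras calls the schedule function at the start of each epoch.
lr_callback = tf.keras.callbacks.LearningRateScheduler(linear_lr)
# model.fit(x_train, y_train, epochs=TOTAL_EPOCHS, callbacks=[lr_callback])
```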