eremo

Short ResNet training on CIFAR10 over 21 epochs

AdamW for a ResNet56v2 – V – weight decay and cosine shaped schedule of the learning rate

In this post series we look for methods to reduce the number of epochs needed to train ResNets on image datasets. Our test case is CIFAR10. In this post we test a modified cosine-shaped schedule for a systematic and fast reduction of the learning rate [LR]. This supplements the approaches described in previous posts of this…
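For orientation, a plain cosine decay of the learning rate can be sketched as follows. This is the standard textbook form, not the modified schedule tested in the post. The function name `cosine_lr` and the values of `lr_max` and `lr_min` are assumptions chosen for illustration; only the 21 epochs are taken from the listing above.

```python
import math


def cosine_lr(epoch, total_epochs=21, lr_max=1e-3, lr_min=1e-5):
    """Plain cosine decay from lr_max (first epoch) to lr_min (last epoch).

    lr_max and lr_min are illustrative assumptions, not values from the post.
    """
    if total_epochs < 2:
        raise ValueError("total_epochs must be at least 2")
    # Fraction of the training run completed, clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    # Half a cosine period: the factor falls smoothly from 1 to 0.
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_min + (lr_max - lr_min) * factor


if __name__ == "__main__":
    for epoch in range(0, 21, 5):
        print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```

With Keras, such a function could be wrapped in a `LearningRateScheduler` callback; whether the posts do it this way is not stated in the excerpt.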

Bivariate Normal Distribution from face data encoded by a CAE

Bivariate Normal Distribution – derivation of the covariance and correlation by integration of the probability density

In a previous post of this blog we derived the functional form of a bivariate normal distribution [BND] of two 1-dimensional random variables X and Y. By rewriting the probability density function [pdf] in terms of vectors (x, y)ᵀ and a matrix Σ⁻¹ we recognized that a coefficient appearing in a central exponential of the pdf could be…
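As a reminder, the vector form referred to here can be written in the standard textbook notation below. The symbols μ_X, μ_Y, σ_X, σ_Y and ρ are assumed notation and may differ from the one used in the posts.

```latex
p(x, y) \,=\, \frac{1}{2\pi \sqrt{\det \Sigma}}
\exp\!\left( -\frac{1}{2}\,
(\mathbf{v} - \boldsymbol{\mu})^{T}\, \Sigma^{-1}\, (\mathbf{v} - \boldsymbol{\mu}) \right),
\qquad
\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad
\boldsymbol{\mu} = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix},

\Sigma = \begin{pmatrix}
\sigma_X^2 & \rho\, \sigma_X \sigma_Y \\
\rho\, \sigma_X \sigma_Y & \sigma_Y^2
\end{pmatrix},
\qquad
\det \Sigma = \sigma_X^2 \sigma_Y^2 \left( 1 - \rho^2 \right).
```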

Probability density function of a Bivariate Normal Distribution – derived from assumptions on marginal distributions and functional factorization

To better understand ML experiments with a generator of human faces based on a convolutional autoencoder, we need an understanding of multivariate and bivariate normal distributions and their probability densities. This post is about the probability density function of a bivariate normal distribution depending on two correlated random variables X and Y. Most derivations of the mathematical form…

AdamW for a ResNet56v2 – IV – better accuracy and shorter training by pure weight decay and large scale fluctuations of the validation loss

Among other things, this post series is about efforts to reduce the number of training epochs for ResNets. We test our ideas with a ResNet applied to CIFAR10. So far we have tried out rather simple methods such as modifying the schedule for the learning rate [LR]. In this post I describe experiments regarding a model using the AdamW optimizer, without…