To better understand ML experiments regarding a generator of human faces based on a convolutional autoencoder, we need an understanding of multivariate and bivariate normal distributions and their probability densities.
This post is about the probability density function of a bivariate normal distribution depending on two correlated random variables X and Y. Most derivations of the mathematical form of this two-dimensional function start from a general definition including the variance-covariance matrix of two-dimensional distributions and further assumptions. With this post I want to motivate the general functional form of the probability density by symmetry arguments and a factorization approach.
Assumption – the marginal distributions are normalized 1-dimensional normal distributions
Let us name the function for the probability density of the bivariate normal distribution g(x, y); x and y are concrete values that X and Y may assume. We want to derive the form of g(x, y). The basic assumption we make is that the two marginal distributions are 1-dimensional normal distributions.
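Written out, with means μx, μy and standard deviations σx, σy, this assumption reads:

\[
g_x(x) \,=\, \int_{-\infty}^{\infty} g(x, y) \, dy \,=\, \frac{1}{\sigma_x \, \sqrt{2\pi}} \, \exp\left( - \, \frac{(x - \mu_x)^2}{2 \, \sigma_x^2} \right), \quad
g_y(y) \,=\, \int_{-\infty}^{\infty} g(x, y) \, dx \,=\, \frac{1}{\sigma_y \, \sqrt{2\pi}} \, \exp\left( - \, \frac{(y - \mu_y)^2}{2 \, \sigma_y^2} \right) \tag{1}
\]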
Equations (1) already indicate some symmetry of g(x, y) regarding the dependencies on x and y.
Probability density of conditional distributions
We can look at the whole thing from the perspective of conditional probabilities. Let us denote the conditional probability density for Y taking a value y under the condition that X has a value x as cy(y|x).
For the conditional probability density of X taking a value x under the condition Y = y we analogously write cx(x|y). Then we have:
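\[
g(x, y) \,=\, c_y(y|x) \, g_x(x) \,=\, c_x(x|y) \, g_y(y)
\]

These relations suggest a factorization approach: we write g(x, y) as the product of its two marginal densities and a yet unknown coupling function f(x, y):

\[
g(x, y) \,=\, f(x, y) \, g_x(x) \, g_y(y) \tag{6}
\]

so that cy(y|x) = f(x, y) * gy(y) and cx(x|y) = f(x, y) * gx(x).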
Due to symmetry reasons we could already assume a symmetry of f in the sense that f(x, y) = f(y, x). But we wait a bit until we get further indications from other relations.
Why does the factorization in (6) make sense? Well, in the case of independent random variables X and Y we must fulfill
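\[
g(x, y) \,=\, g_x(x) \, g_y(y) \,,
\]

which in view of (6) simply means f(x, y) = 1.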
So, if we can make f(x, y) dependent on the correlation coefficient ρ, or equivalently on the covariance cov(X, Y), such that it becomes 1 for ρ = 0, we would be able to reproduce independence in a simple manner.
Guessing the form of f(x, y) from further conditions
The marginal distributions must fulfill (1) and should in addition result from (6) by integration:
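\[
g_x(x) \,=\, \int_{-\infty}^{\infty} f(x, y) \, g_x(x) \, g_y(y) \, dy \,=\, g_x(x) \int_{-\infty}^{\infty} f(x, y) \, g_y(y) \, dy
\quad \Rightarrow \quad \int_{-\infty}^{\infty} f(x, y) \, g_y(y) \, dy \,=\, 1
\]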
This means nothing else than that the density function cy(y|x) of the conditional distribution Y|X must be normalized. The same holds for X|Y and cx(x|y):
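\[
\int_{-\infty}^{\infty} c_y(y|x) \, dy \,=\, \int_{-\infty}^{\infty} f(x, y) \, g_y(y) \, dy \,=\, 1, \qquad
\int_{-\infty}^{\infty} c_x(x|y) \, dx \,=\, \int_{-\infty}^{\infty} f(x, y) \, g_x(x) \, dx \,=\, 1
\]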
How could we get this to be true? Well, if the conditional distributions were (shifted) Gaussians themselves, we could get this to work. The reason is that if we could bring e.g. cy(y|x) into a fully quadratic form like
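\[
c_y(y|x) \,=\, f(x, y) \, g_y(y) \,=\, \frac{1}{\sigma_{yx} \, \sqrt{2\pi}} \, \exp\left( - \, \frac{ \left( y \, - \, m_y(x) \right)^2 }{ 2 \, \sigma_{yx}^2 } \right),
\]

then the integral over y would equal 1 automatically, for any x-dependent shift m_y(x) of the mean. (The symbol m_y(x) is just a placeholder name chosen here for such a shift; an analogous form with a width σxy would be required for cx(x|y).)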
Note that σxy and σyx must be constants, i.e. independent of the respective x and y values! Our approach to fulfilling normalization means that f(x, y) must provide fitting terms which complete the expressions in the exponentials to fully quadratic terms. f(x, y) must therefore contribute squares in x and y as well as some term containing x*y inside an exponential. In addition we must get some symmetry in f(x, y) regarding x and y. Taking all of this into account, we try a very simple approach which keeps us to quadratic terms:
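\[
f(x, y) \,=\, \eta \, \exp\left( \frac{\alpha}{2} \left( \frac{(x - \mu_x)^2}{\sigma_x^2} \, + \, \frac{(y - \mu_y)^2}{\sigma_y^2} \right) \, + \, \beta \, \frac{(x - \mu_x)(y - \mu_y)}{\sigma_x \, \sigma_y} \right)
\]

The constants η, α and β (these symbol names are placeholders chosen for this sketch) must be fixed such that the normalization conditions above hold and such that f(x, y) = 1 for ρ = 0. The ansatz is obviously symmetric in x and y. Completing the squares and imposing the normalization conditions fixes the constants, for |ρ| < 1, to η = 1/√(1 - ρ²), α = -ρ²/(1 - ρ²), β = ρ/(1 - ρ²), and thus leads to the well-known density

\[
g(x, y) \,=\, \frac{1}{2\pi \, \sigma_x \, \sigma_y \, \sqrt{1 - \rho^2}} \, \exp\left( - \, \frac{1}{2 \left( 1 - \rho^2 \right)} \left[ \frac{(x - \mu_x)^2}{\sigma_x^2} \, - \, 2\rho \, \frac{(x - \mu_x)(y - \mu_y)}{\sigma_x \, \sigma_y} \, + \, \frac{(y - \mu_y)^2}{\sigma_y^2} \right] \right)
\]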
Vector form and relation to the inverse of the variance-covariance matrix
For those of my readers who are used to vector notation and the respective matrix representations of distributions, we define a random vector V containing X and Y and further vectors:
\[ \pmb{V} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad \mbox{concrete values}: \, \pmb{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \pmb{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \pmb{v}_{\mu} = \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix}
\]
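In this notation the density found above takes the compact form

\[
g(\pmb{v}) \,=\, \frac{1}{2\pi \, \sqrt{\det \pmb{\Sigma}}} \, \exp\left( - \, \frac{1}{2} \, \pmb{v}_{\mu}^T \, \pmb{\Sigma}^{-1} \, \pmb{v}_{\mu} \right), \qquad
\pmb{\Sigma}^{-1} \,=\, \frac{1}{1 - \rho^2} \begin{pmatrix} \frac{1}{\sigma_x^2} & - \frac{\rho}{\sigma_x \sigma_y} \\ - \frac{\rho}{\sigma_x \sigma_y} & \frac{1}{\sigma_y^2} \end{pmatrix}
\]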
T symbolizes the transposition operation. The reader may recognize the variance-covariance matrix Σ as the inverse of the matrix Σ-1 in the exponential, and ρ as the correlation coefficient coupling X and Y:
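\[
\pmb{\Sigma} \,=\, \left( \pmb{\Sigma}^{-1} \right)^{-1} \,=\, \begin{pmatrix} \sigma_x^2 & \rho \, \sigma_x \sigma_y \\ \rho \, \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}, \qquad
\rho \,=\, \frac{\operatorname{cov}(X, Y)}{\sigma_x \, \sigma_y}, \qquad
\det \pmb{\Sigma} \,=\, \sigma_x^2 \, \sigma_y^2 \left( 1 - \rho^2 \right)
\]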
By some simple reasoning we have guessed the functional form of a bivariate normal distribution. We have made the assumption that the marginal distributions of a bivariate normal distribution should be one-dimensional normal distributions whose probability density functions are described by normalized Gaussians.
By looking at conditional probabilities we found that a normalization of the respective probability densities could be achieved by following symmetry arguments and a factorization. This led us to the assumption that the conditional distributions could be Gaussian distributions themselves.
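As a quick plausibility check one can verify numerically that the guessed density indeed reproduces the assumed Gaussian marginals. The following is a minimal sketch in Python/NumPy; the parameter values and the integration grid are arbitrary choices for illustration, not part of the derivation:

```python
import numpy as np

# Arbitrary example parameters (illustrative choices, not from the derivation)
mu_x, mu_y = 1.0, -0.5      # means
sig_x, sig_y = 1.5, 0.8     # standard deviations
rho = 0.6                   # correlation coefficient

def g(x, y):
    """Bivariate normal density in the quadratic form derived above."""
    xn = (x - mu_x) / sig_x
    yn = (y - mu_y) / sig_y
    norm = 1.0 / (2.0 * np.pi * sig_x * sig_y * np.sqrt(1.0 - rho**2))
    return norm * np.exp(-(xn**2 - 2.0 * rho * xn * yn + yn**2)
                         / (2.0 * (1.0 - rho**2)))

def gauss(z, mu, sig):
    """Normalized 1-dimensional Gaussian density."""
    return np.exp(-(z - mu)**2 / (2.0 * sig**2)) / (sig * np.sqrt(2.0 * np.pi))

# Integrating g(x, y) over y should reproduce the marginal g_x(x)
y = np.linspace(mu_y - 10.0 * sig_y, mu_y + 10.0 * sig_y, 4001)
for x in (-1.0, 0.0, 2.5):
    integral = np.trapz(g(x, y), y)
    print(f"x = {x:5.2f}:  integral = {integral:.6f},  g_x(x) = {gauss(x, mu_x, sig_x):.6f}")
```

The printed pairs of numbers should agree up to the accuracy of the trapezoidal integration.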
We shall have a look at properties of the bivariate normal distribution, its marginals and conditional sub-distributions in a later post.