For a better understanding of ML experiments regarding a generator of human faces based on a convolutional autoencoder, we need an understanding of multivariate and bivariate normal distributions and their probability densities.
This post is about the probability density function [pdf] of a bivariate normal distribution of two correlated Gaussian random variables X and Y. Most derivations of the mathematical form of this two-dimensional function start from a random vector composed of independent 1-dimensional Gaussian (= normal) distributions and apply a linear transformation to such a vector. With this post, I want to motivate the general functional form of the probability density in a different way, namely by symmetry arguments and a factorization approach.
In a later post I will derive the general form by applying a linear transformation to the random vector (X, Y)^T of two independent Gaussian distributions.
Assumption – the marginal distributions are normalized 1-dimensional normal distributions
Let us name the probability density function of a bivariate normal distribution g2(x, y) – and that of a centered one g2c(x, y). x and y are concrete values which the components X and Y of a random vector (X, Y)^T may assume. We want to derive the form of g2c(x, y) and g2(x, y). X and Y may be correlated.
Centered distributions: We work in a 2-dimensional Cartesian coordinate system [CCS]. Without losing much generality, we center the density distributions such that their mean values are zero:
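\[ \mu_x \,=\, \mu_y \,=\, 0 \,. \]
In addition, and in line with the heading above, we demand that the marginals of g2c(x, y) are normalized 1-dimensional Gaussians:
\[ g_{xc}(x) \,=\, \int_{-\infty}^{\infty} g_{2c}(x, y) \, dy \,=\, \frac{1}{\sqrt{2\pi} \, \sigma_x} \, \exp\left( -\, \frac{x^2}{2 \sigma_x^2} \right) \,, \quad g_{yc}(y) \,=\, \int_{-\infty}^{\infty} g_{2c}(x, y) \, dx \,=\, \frac{1}{\sqrt{2\pi} \, \sigma_y} \, \exp\left( -\, \frac{y^2}{2 \sigma_y^2} \right) \,. \tag{1} \]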
These assumptions can at this point be regarded as conditions imposed on g2c(x, y). The motivation is 2-fold:
We want our bivariate normal distribution to be composed of correlated 1-dimensional (= univariate) normal Gaussian distributions. At some point we have to impose respective conditions on g2c(x, y). We take it that the marginals represent the basic Gaussians X and Y. Note, however, that we do not assume that X and Y are independent. How the correlation can be expressed will be investigated in further posts. See e.g. here and here.
For the case of independent X and Y we need to reproduce g2c(x, y) = gxc(x) * gyc(y). See below.
To be on the safe side, we will later check that the imposed conditions will be reproduced by our final combined pdf g2c(x, y).
Side remark: In other posts of this blog we will later show that it is a general property of (regular) Multivariate Normal (= Gaussian) distributions [MVDs] that all of their marginals are Multivariate Normal distributions of lower dimension. In the extreme case of a 1-dimensional marginal of a general MVD we arrive at a simple Gaussian probability density function.
Equs. (1) already indicate some symmetry of g2c(x, y) regarding the dependencies on x and y.
Probability density of conditional distributions
We can look at the whole thing from the perspective of conditional probabilities. Let us denote the conditional probability density for Y taking a value y under the condition that X has the value x as cy(y|x). For the conditional probability density for X taking a value x under the condition Y = y we analogously write cx(x|y).
Due to the meaning of conditional probabilities, we have:
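\[ g_{2c}(x, y) \,=\, c_y(y \mid x) \cdot g_{xc}(x) \,=\, c_x(x \mid y) \cdot g_{yc}(y) \,. \]
This suggests a factorization approach with a yet unknown coupling function f(x, y):
\[ g_{2c}(x, y) \,=\, g_{xc}(x) \cdot g_{yc}(y) \cdot f(x, y) \,, \quad \mbox{i.e.} \quad c_y(y \mid x) \,=\, g_{yc}(y) \, f(x, y) \,, \quad c_x(x \mid y) \,=\, g_{xc}(x) \, f(x, y) \,. \tag{6} \]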
Due to symmetry reasons we could already now assume a symmetry of f(x, y) in the sense that f(x, y) = f(y, x). But let us wait a bit until we get further indications from other relations.
Why does the factorization in (6) make sense? Well, in the case of independent distributions X and Y we would have to fulfill:
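\[ g_{2c}(x, y) \,=\, g_{xc}(x) \cdot g_{yc}(y) \,, \quad \mbox{i.e.} \quad f(x, y) \,=\, 1 \,. \]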
So, if we can make f(x, y) dependent on the correlation coefficient ρ or, equivalently, on the covariance cov(X, Y) such that it becomes 1 for ρ = 0, we would be able to reproduce independence in a simple manner.
Guessing the form of f(x, y) from further conditions
The marginal distributions must fulfill (1) and should in addition result from (6) by integration:
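\[ \int_{-\infty}^{\infty} g_{xc}(x) \, g_{yc}(y) \, f(x, y) \, dy \,=\, g_{xc}(x) \quad \Longrightarrow \quad \int_{-\infty}^{\infty} g_{yc}(y) \, f(x, y) \, dy \,=\, \int_{-\infty}^{\infty} c_y(y \mid x) \, dy \,=\, 1 \,. \]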
This means nothing else than that the density function cy(y|x) of the conditional distribution Y|X must be normalized, too. The same holds for X|Y and cx(x|y):
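\[ \int_{-\infty}^{\infty} c_x(x \mid y) \, dx \,=\, \int_{-\infty}^{\infty} g_{xc}(x) \, f(x, y) \, dx \,=\, 1 \,. \]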
How could we make this come true? Well, if the conditional distributions were shifted Gaussians themselves, we could get this to work. The reason is the following:
If we could bring e.g. cy(y|x) into a fully quadratic form like
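\[ c_y(y \mid x) \,=\, \frac{1}{\sqrt{2\pi} \, \sigma_{yx}} \, \exp\left( -\, \frac{\left( y \,-\, m(x) \right)^2}{2 \, \sigma_{yx}^2} \right) \,, \]
with some shift m(x) depending only on x (the name m(x) is just a placeholder here), and analogously cx(x|y) with a width σxy, then the normalization would be fulfilled automatically: a shifted, normalized Gaussian integrates to 1 for any value of the shift.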
Note that σxy and σyx must be constants – independent of the respective x and y values!
Our approach to fulfill normalization means that f(x, y) must provide fitting terms to complete the exponents to fully quadratic expressions. What does this in turn mean for our yet unknown function f(x, y)?
f(x, y) must provide squares in x and y as well as some term containing x*y in the exponent. We also must get some symmetry in f(x, y) regarding x and y. Taking all this into account, we try the simplest approach, namely restricting ourselves to quadratic terms only:
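\[ f(x, y) \,=\, C \cdot \exp\left( a \, x^2 \,+\, b \, x y \,+\, c \, y^2 \right) \,, \]
with constants C, a, b, c still to be determined. Fixing these constants by the marginal conditions (1) and the normalization of the conditional densities leads to the well-known form of the probability density of a centered bivariate normal distribution:
\[ g_{2c}(x, y) \,=\, \frac{1}{2\pi \, \sigma_x \sigma_y \, \sqrt{1 - \rho^2}} \, \exp\left( -\, \frac{1}{2 \, (1 - \rho^2)} \left[ \frac{x^2}{\sigma_x^2} \,-\, 2 \rho \, \frac{x y}{\sigma_x \sigma_y} \,+\, \frac{y^2}{\sigma_y^2} \right] \right) \,. \]
For ρ = 0 the coupling factor f(x, y) indeed becomes 1 and g2c(x, y) factorizes into the product of its two marginal Gaussians.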
Vector form and relation to the inverse of the variance-covariance matrix
For those of my readers who are used to vector distributions and respective matrix operations, let us define a random vector V containing X and Y and further vectors:
\[ \pmb{V} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad \mbox{concrete values}: \, \pmb{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \pmb{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \pmb{v}_{\mu} = \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix} \,.
\]
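With these definitions the density derived above can be written in a compact vector form:
\[ g_2(\pmb{v}) \,=\, \frac{1}{2\pi \, \sqrt{\det \pmb{\Sigma}}} \, \exp\left( -\, \frac{1}{2} \, \pmb{v}_{\mu}^{\,T} \, \pmb{\Sigma}^{-1} \, \pmb{v}_{\mu} \right) \,, \]
where for our centered case g2c we simply have vμ = v.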
T symbolizes the transposition operation. The reader may recognize Σ^-1 as the inverse of the variance-covariance matrix Σ, and ρ as the correlation coefficient coupling X and Y:
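\[ \pmb{\Sigma} \,=\, \begin{pmatrix} \sigma_x^2 & \operatorname{cov}(X, Y) \\ \operatorname{cov}(X, Y) & \sigma_y^2 \end{pmatrix} \,=\, \begin{pmatrix} \sigma_x^2 & \rho \, \sigma_x \sigma_y \\ \rho \, \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix} \,, \quad \rho \,=\, \frac{\operatorname{cov}(X, Y)}{\sigma_x \, \sigma_y} \,, \]
\[ \pmb{\Sigma}^{-1} \,=\, \frac{1}{\sigma_x^2 \, \sigma_y^2 \, (1 - \rho^2)} \, \begin{pmatrix} \sigma_y^2 & -\rho \, \sigma_x \sigma_y \\ -\rho \, \sigma_x \sigma_y & \sigma_x^2 \end{pmatrix} \,. \]
Writing out the quadratic form in the exponent with these matrix elements indeed reproduces the exponent of g2c(x, y) given above.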
As I am a bit lazy, let us check the marginal conditions (1) by performing the integration for the special case σx = σy = 1. The essential steps of the integration become obvious also for this simplification. I leave it to the reader to perform the integration for the general case. The trick is a kind of substitution similar to the one which we have performed already for the conditional probability densities. We rewrite the exponent as follows:
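\[ -\, \frac{1}{2\,(1 - \rho^2)} \left( x^2 \,-\, 2 \rho \, x y \,+\, y^2 \right) \,=\, -\, \frac{x^2}{2} \,-\, \frac{\left( y \,-\, \rho \, x \right)^2}{2\,(1 - \rho^2)} \,. \]
The integration over y then reduces to the integral over a shifted Gaussian:
\[ \int_{-\infty}^{\infty} g_{2c}(x, y) \, dy \,=\, \frac{1}{2\pi \sqrt{1 - \rho^2}} \, e^{-x^2/2} \int_{-\infty}^{\infty} e^{ -\, (y - \rho x)^2 / \left( 2 (1 - \rho^2) \right) } \, dy \,=\, \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \,=\, g_{xc}(x) \,, \]
in accordance with condition (1). For readers who prefer a numerical cross-check over the analytical integration, here is a minimal sketch using NumPy; the grid limits, the resolution and the example value ρ = 0.6 are arbitrary choices of mine and not part of the derivation:

```python
import numpy as np

# Guessed centered bivariate normal density for sigma_x = sigma_y = 1
# and an example correlation coefficient rho (arbitrary choice).
rho = 0.6
x = np.linspace(-8.0, 8.0, 801)   # wide grid, step 0.02
y = np.linspace(-8.0, 8.0, 801)
X, Y = np.meshgrid(x, y, indexing="ij")

norm = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
g2c = norm * np.exp(-(X**2 - 2.0 * rho * X * Y + Y**2) / (2.0 * (1.0 - rho**2)))

dx = x[1] - x[0]
dy = y[1] - y[0]

# Check 1: the density should integrate to ~1 over the plane.
total = g2c.sum() * dx * dy

# Check 2: integrating out y should reproduce the standard normal marginal.
marginal_x = g2c.sum(axis=1) * dy
gauss_x = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

print(f"total integral     : {total:.6f}")
print(f"max marginal error : {np.abs(marginal_x - gauss_x).max():.2e}")
```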
By some simple reasoning we have guessed the functional form of a bivariate normal distribution. We have made the assumption that the marginal distributions of a bivariate normal distribution should be one-dimensional normal distributions whose probability density functions are described by normalized Gaussians.
By looking at conditional probabilities we found that a normalization of the respective probability densities could be achieved by symmetry arguments and a factorization approach. This led us to the assumption that the conditional distributions could be normal Gaussian distributions themselves.
We shall have a look at properties of the bivariate normal distribution, its marginals and conditional sub-distributions in later posts.