Bivariate Normal Distribution – derivation of the covariance and correlation by integration of the probability density

In a previous post of this blog we derived the functional form of a bivariate normal distribution [BND] of two 1-dimensional random variables X and Y. By rewriting the probability density function [pdf] in terms of vectors $(x, y)^T$ and a matrix $\boldsymbol{\Sigma}^{-1}$ we recognized that a coefficient appearing in the central exponential of the pdf could be identified as the so-called correlation coefficient, which describes a linear coupling of our two random variables X and Y. In this post I shall show how one can derive the covariance and the correlation coefficient the hard way, by directly integrating the BND's probability density function.

A bit of motivation first

Some of my readers may ask themselves what bivariate distributions have to do with Machine Learning. As a bit of motivation: the feature image of this post shows data points forming a bivariate normal distribution whose pdf can be described by the type of function discussed in the preceding post and below.

The data points correspond to vectors in the multidimensional latent space of a Convolutional AutoEncoder [CAE]. Each vector represents an image of a face after having been encoded by the encoder part of the trained CAE. The vectors' endpoints were projected onto a 2-dim coordinate plane of the basic Euclidean Coordinate System [ECS] spanning the latent space. We see that the contours of the resulting (projected) 2-dimensional probability density are concentric ellipses whose main axes all point in the same directions. The axes of the ellipses are obviously rotated against the ECS axes. In future posts of this blog we will prove that the rotation angle of such ellipses directly depends on the correlation coefficient between the underlying 1-dimensional distributions. The math of bivariate and multivariate distributions is also important for ML applications that work on data from industrial processes.

Formal definition of the covariance of two random variables X and Y

Let us abbreviate the expectation values of the random variables X and Y as E[X] and E[Y], respectively. Then the formal definition of the covariance of X and Y is:

$$\mathrm{Cov}(X,Y) = E\big[(X - E[X])\,(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$$
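As a quick sanity check, the two forms of the covariance can be compared on sample data. The following is a minimal sketch in Python with NumPy; the sample values, the coupling factor 0.6 and the seed are arbitrary assumptions made for illustration, and expectation values are replaced by sample means.

```python
import numpy as np

# Arbitrary illustrative data (assumption): Y depends linearly on X plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.6 * x + rng.normal(size=10_000)

# Form 1: E[(X - E[X]) (Y - E[Y])], with sample means in place of expectations.
cov_centered = np.mean((x - x.mean()) * (y - y.mean()))

# Form 2: E[XY] - E[X] E[Y]
cov_moments = np.mean(x * y) - x.mean() * y.mean()

print(cov_centered, cov_moments)
```

Both numbers agree up to floating point rounding, because the identity holds for sample means just as it does for expectation values.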

The pair (X, Y) has a combined two-dimensional distribution, which is defined by a probability density function $g(x,y)$. This density results from the probability density $g_x(x)$ of X assuming the value x and a conditional probability density $c_y(y|x)$ of Y assuming the value y under the condition X = x:

$$g(x,y) = c_y(y|x)\,g_x(x) = c_x(x|y)\,g_y(y)$$

The second equality gives rise to symmetry arguments. In general we then have

$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,g(x,y)\,dx\,dy\,.$$

The correlation coefficient ρ for the two random variables is defined by looking at the standardized variables:

$$X_n = \frac{X - E[X]}{\sigma_x}, \quad Y_n = \frac{Y - E[Y]}{\sigma_y}, \quad \rho_{X,Y} = \mathrm{Cov}(X_n, Y_n)$$

which after some elementary consideration gives us

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{\mathrm{Cov}(X,Y)}{\sigma_x\,\sigma_y}$$
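The equivalence of the two expressions for the correlation coefficient can be illustrated numerically. The sketch below is only an example under assumed data: the shift, scale and coupling values are arbitrary, the helper `standardize` is a hypothetical name, and sample statistics (with normalization by N) stand in for expectation values.

```python
import numpy as np

# Arbitrary illustrative data (assumption): shifted and scaled X, Y coupled to X.
rng = np.random.default_rng(1)
x = 2.0 + 3.0 * rng.normal(size=20_000)
y = -1.0 + 0.5 * x + rng.normal(size=20_000)

def standardize(v):
    """Subtract the sample mean and divide by the sample standard deviation."""
    return (v - v.mean()) / v.std()

xn, yn = standardize(x), standardize(y)

# Cov(Xn, Yn): both standardized samples have zero mean, so this is E[Xn Yn].
rho_standardized = np.mean(xn * yn)

# Cov(X, Y) / (sigma_x sigma_y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho_ratio = cov_xy / (x.std() * y.std())

print(rho_standardized, rho_ratio, np.corrcoef(x, y)[0, 1])
```

All three printed values coincide up to rounding; `np.corrcoef` is included only as an independent cross-check.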

To simplify the following steps we choose a 2-dimensional Euclidean coordinate system [ECS] centered such that E[X] = 0 and E[Y] = 0.

$$\text{centered ECS}: \quad E[X] = E[Y] = 0 \;\;\Rightarrow\;\; \mathrm{Cov}(X,Y) = E[XY]$$

In the case of a bivariate normal distribution we call the respective probability density function [pdf] $g_2(x,y)$. Using this pdf, we find that we have to solve the following integral:

$$\mathrm{Cov}(X,Y) = E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,g_2(x,y)\,dx\,dy$$

Useful integral formulas

We need formulas for two integrals :

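$$\int_{-\infty}^{\infty} y\,\exp\!\left(-A\,(y - C)^2\right) dy = C\,\sqrt{\frac{\pi}{A}} \tag{5}$$
$$\int_{-\infty}^{\infty} y^2\,\exp\!\left(-a\,y^2\right) dy = \frac{1}{2}\,\sqrt{\frac{\pi}{a^3}} \tag{6}$$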
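Both integral formulas above can be checked numerically before using them. The following sketch approximates the integrals by a plain Riemann sum on a wide uniform grid; the values of A, C and a are arbitrary positive test values (assumptions), and the helper name `riemann_sum` is hypothetical.

```python
import numpy as np

def riemann_sum(f_values, step):
    """Approximate an integral by a Riemann sum on a uniform grid."""
    return np.sum(f_values) * step

# Wide grid; the Gaussian factors are negligible at the boundaries.
y = np.linspace(-40.0, 40.0, 400_001)
dy = y[1] - y[0]

A, C = 0.8, 1.5  # arbitrary test values, A > 0
a = 0.5          # arbitrary test value, a > 0

# First formula: integral of y exp(-A (y - C)^2) equals C sqrt(pi / A).
lhs_first = riemann_sum(y * np.exp(-A * (y - C) ** 2), dy)
rhs_first = C * np.sqrt(np.pi / A)

# Second formula: integral of y^2 exp(-a y^2) equals (1/2) sqrt(pi / a^3).
lhs_second = riemann_sum(y**2 * np.exp(-a * y**2), dy)
rhs_second = 0.5 * np.sqrt(np.pi / a**3)

print(lhs_first, rhs_first)
print(lhs_second, rhs_second)
```

The choice a = 1/2 is the value that appears later in the derivation, where the second integral evaluates to the square root of 2π.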

Performing the integration

Assumptions on the marginal distributions of X and Y (having variances $\sigma_x^2$ and $\sigma_y^2$) of a BND, together with symmetry arguments and normalization conditions, have revealed the general form of $g_2(x,y)$; see the preceding post in this blog. In our centered ECS the pdf of the BND is given by

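$$g_2(x,y) = \frac{1}{2\pi\,\sigma_x \sigma_y}\,\frac{1}{\sqrt{1-\rho^2}}\,\exp\!\left(-\frac{1}{2}\,\frac{1}{1-\rho^2}\left[\frac{x^2}{\sigma_x^2} - 2\rho\,\frac{x}{\sigma_x}\,\frac{y}{\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right)$$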

ρ is a constant parameter in this formula. We already know that it might have to do with the correlation of the random variables X and Y. To prove this analytically, we now perform the integration to get an analytical expression for the covariance.

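$$\mathrm{Cov}(X,Y) = \frac{1}{2\pi\,\sigma_x \sigma_y}\,\frac{1}{\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,y\,\exp\!\left(-\frac{1}{2}\,\frac{1}{1-\rho^2}\left[\frac{x^2}{\sigma_x^2} - 2\rho\,\frac{x}{\sigma_x}\,\frac{y}{\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right) dx\,dy$$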

The trick to solve the integral over y is to complete the expression in the exponential such that we get a full square. First we redefine some variables

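$$y_\sigma = \frac{y}{\sigma_y}, \quad x_\sigma = \frac{x}{\sigma_x}, \quad C_x = \rho\,x_\sigma, \quad A = \frac{1}{2}\,\frac{1}{1-\rho^2}$$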

The term in the exponential thus can be rewritten as

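$$-\frac{1}{2}\,\frac{1}{1-\rho^2}\left[\frac{x^2}{\sigma_x^2} - 2\rho\,\frac{x}{\sigma_x}\,\frac{y}{\sigma_y} + \frac{y^2}{\sigma_y^2}\right] = -A\left[(y_\sigma - C_x)^2 + (1-\rho^2)\,x_\sigma^2\right]$$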
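Written out in the rescaled variables, the completion of the square behind this rewriting is a single line. It uses only the definitions of $x_\sigma$, $y_\sigma$, $C_x$ and $A$ given above:

$$
x_\sigma^2 - 2\rho\,x_\sigma y_\sigma + y_\sigma^2 = (y_\sigma - \rho\,x_\sigma)^2 + (1-\rho^2)\,x_\sigma^2 = (y_\sigma - C_x)^2 + (1-\rho^2)\,x_\sigma^2
$$

Since $A\,(1-\rho^2) = \tfrac{1}{2}$, the part of the exponential that depends only on $x_\sigma$ reduces to $\exp\!\left(-\tfrac{1}{2}\,x_\sigma^2\right)$. The change of variables also contributes a factor $\sigma_x \sigma_y$ from $x\,y$ and another factor $\sigma_x \sigma_y$ from $dx\,dy$.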

This gives us

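$$\mathrm{Cov}(X,Y) = \frac{1}{2\pi}\,\frac{\sigma_x \sigma_y}{\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} x_\sigma\,\exp\!\left(-A\,(1-\rho^2)\,x_\sigma^2\right) \left[\,\int_{-\infty}^{\infty} y_\sigma\,\exp\!\left(-A\,(y_\sigma - C_x)^2\right) dy_\sigma\right] dx_\sigma$$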

We solve the inner integral with the help of (5) to get:

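$$\mathrm{Cov}(X,Y) = \frac{1}{\sqrt{2\pi}}\,\sigma_x \sigma_y\,\rho \int_{-\infty}^{\infty} x_\sigma^2\,\exp\!\left(-\frac{1}{2}\,x_\sigma^2\right) dx_\sigma$$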

By application of (6) we arrive at

$$\mathrm{Cov}(X,Y) = \sigma_x\,\sigma_y\,\rho$$
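The analytical result can be cross-checked by integrating the pdf numerically on a grid. This is only a sketch under assumed parameter values (the numbers chosen for the standard deviations and ρ are arbitrary); the function name `bnd_pdf` is hypothetical, and the double integral is approximated by a simple Riemann sum.

```python
import numpy as np

def bnd_pdf(x, y, sigma_x, sigma_y, rho):
    """Pdf of a centered bivariate normal distribution, as given in the text."""
    norm = 1.0 / (2.0 * np.pi * sigma_x * sigma_y * np.sqrt(1.0 - rho**2))
    xs, ys = x / sigma_x, y / sigma_y
    bracket = xs**2 - 2.0 * rho * xs * ys + ys**2
    return norm * np.exp(-0.5 * bracket / (1.0 - rho**2))

# Arbitrary test parameters (assumption), with |rho| < 1.
sigma_x, sigma_y, rho = 1.5, 0.7, 0.6

# Grid covering +/- 10 standard deviations in each direction.
x = np.linspace(-10.0 * sigma_x, 10.0 * sigma_x, 1201)
y = np.linspace(-10.0 * sigma_y, 10.0 * sigma_y, 1201)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

g = bnd_pdf(X, Y, sigma_x, sigma_y, rho)

total_probability = np.sum(g) * dx * dy   # normalization check
cov_numeric = np.sum(X * Y * g) * dx * dy # E[XY] in the centered ECS
cov_formula = rho * sigma_x * sigma_y

print(total_probability, cov_numeric, cov_formula)
```

The numerically integrated covariance agrees with the product of ρ and the two standard deviations to high precision, and the pdf integrates to one.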

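This is the result we have hoped for. Inserting it into the definition of the correlation coefficient gives $\rho_{X,Y} = \mathrm{Cov}(X,Y) / (\sigma_x \sigma_y) = \rho$. We can therefore identify the parameter ρ appearing in the probability density function of a bivariate normal distribution as the correlation coefficient between the two underlying 1-dimensional normal distributions.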

In the preceding post we have seen that ρ appears in the matrix $\boldsymbol{\Sigma}^{-1}$ defining the quadratic form in the exponential. With $\mathbf{v} = (x, y)^T$ we have

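$$\text{centered ECS}: \quad g_2(x,y) = \frac{1}{2\pi\,\sigma_x \sigma_y}\,\frac{1}{\sqrt{1-\rho^2}}\,\exp\!\left(-\frac{1}{2}\,\mathbf{v}^T\,\boldsymbol{\Sigma}^{-1}\,\mathbf{v}\right)$$
$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x \sigma_y \\ \rho\,\sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}, \quad \boldsymbol{\Sigma}\,\boldsymbol{\Sigma}^{-1} = \mathbf{I}_n$$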

For standardized Gaussian distributions X and Y (with σx = σy = 1) we get a simple covariance matrix completely determined by ρ

$$\boldsymbol{\Sigma}_{st} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$
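That the matrix form and the explicit form of the exponent agree can be verified with a few lines of NumPy. The sketch below uses arbitrary parameter values and an arbitrary test vector (both assumptions); it also checks that the square root of the determinant of the covariance matrix reproduces the factor in the normalization constant.

```python
import numpy as np

# Arbitrary test parameters (assumption), with |rho| < 1.
sigma_x, sigma_y, rho = 1.5, 0.7, 0.6

# Covariance matrix of the BND and its inverse.
Sigma = np.array([
    [sigma_x**2,              rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y**2],
])
Sigma_inv = np.linalg.inv(Sigma)

# Arbitrary test point v = (x, y)^T.
v = np.array([0.4, -1.1])
x, y = v

# Quadratic form v^T Sigma^{-1} v ...
quad_matrix = v @ Sigma_inv @ v

# ... and the explicit bracket expression from the pdf, divided by (1 - rho^2).
xs, ys = x / sigma_x, y / sigma_y
quad_explicit = (xs**2 - 2.0 * rho * xs * ys + ys**2) / (1.0 - rho**2)

# sqrt(det Sigma) equals sigma_x sigma_y sqrt(1 - rho^2).
sqrt_det = np.sqrt(np.linalg.det(Sigma))

print(quad_matrix, quad_explicit)
print(sqrt_det, sigma_x * sigma_y * np.sqrt(1.0 - rho**2))
```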

Conclusion

One can directly derive the correlation coefficient of the Gaussian distributions underlying a Bivariate Normal Distribution by deriving the expectation value for the product xy from the BND’s probability density function. The integrals over x and y have analytical solutions.

In reverse we can say that a Bivariate Normal Distribution [BND] can be constructed via defining a correlation between two 1-dimensional normal distributions and using the inverse of the covariance matrix to define a quadratic form of x and y in an exponential, which after a normalization gives you the probability density function of the BND. We shall see in further posts that such a construction has a direct geometrical interpretation. This will later on also motivate the abstract definition of general multidimensional or Multivariate Normal Distributions and their probability density functions.
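The reverse direction can be illustrated empirically: build a covariance matrix from two standard deviations and a chosen ρ, draw samples from the resulting BND, and compare the sample correlation with ρ. This is a rough Monte Carlo sketch, not a proof; the parameter values, the sample size and the seed are arbitrary assumptions, and the sampling relies on NumPy's `Generator.multivariate_normal`.

```python
import numpy as np

# Arbitrary test parameters (assumption), with |rho| < 1.
sigma_x, sigma_y, rho = 1.5, 0.7, 0.6

# Covariance matrix defined by the chosen correlation.
cov = np.array([
    [sigma_x**2,              rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y**2],
])

# Draw samples from the centered BND.
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)
x, y = samples[:, 0], samples[:, 1]

# Sample estimates of the covariance and of the correlation coefficient.
cov_sample = np.mean(x * y) - x.mean() * y.mean()
rho_sample = cov_sample / (x.std() * y.std())

print(cov_sample, rho * sigma_x * sigma_y)
print(rho_sample, rho)
```

The sample estimates only approximate the analytical values; the deviation shrinks roughly with the inverse square root of the sample size.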

But in the next post on BNDs I will first look at some specific properties characterizing the hyper-curves (of constant probability density) and the expectation values of the BND’s conditional distributions.