In other posts of this blog I have discussed the general form of a Bivariate Normal Distribution [BVD]. For a centered Cartesian coordinate system [CCS] (see below), we have already seen the following:
- A BVD can be understood as the result of a linear transformation of a two-dimensional random vector Z = (Z1, Z2)T. The random variables Z1 and Z2 represent two independent, standardized Gaussian (normal) distributions. The distribution of Z, i.e. of concrete (z1, z2)T vectors, has a simple probability density function, namely the product of the Gaussian probability density functions of Z1 and Z2. See below.
- In a centered Cartesian coordinate system [CCS], the result of the linear transformation of Z (via a (2×2)-matrix M) is a new random vector V = (X, Y)T = M • Z. It has a typical BVD probability density function gv(v) = gv(x, y) that depends on the inverse of the covariance matrix Σ for the transformed random vector V. X and Y represent the marginal distributions of V.
In this post I will give you a recipe to explicitly construct two random variables X, Y of a BVD from 1-dimensional Gaussians Z1, Z2 with the help
- of predefined values of the desired standard deviations σx, σy of X and Y
- and of a predefined value for the Pearson correlation coefficient ρ of X and Y.
To achieve this objective we will use the so-called Cholesky decomposition of the covariance matrix Σ.
Related posts:
- Bivariate normal distribution – derivation by linear transformation of a random vector for two independent Gaussians
- Bivariate Normal Distribution – derivation of the covariance and correlation by integration of the probability density
- Probability density function of a Bivariate Normal Distribution – derived from assumptions on marginal distributions and functional factorization
Hints regarding symbols: A bullet in the equations above indicates standard matrix multiplication. A superscript T denotes the transposition of a matrix or vector.
Reminder 1: BVD and its probability density function as the result of a linear transformation
Below we consider only centered distributions, i.e., we choose a CCS such that each of the constituent random variables has an expectation value of zero. This is no major restriction; the transition to a non-centered distribution is trivial.
We get a (centered) BVD by applying a linear transformation (2×2-matrix M) to a random vector Z of two independent, centered and standardized Gaussian normal distributions Z1 and Z2:
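In matrix form (the bullet again denotes matrix multiplication):

\[
V \;=\; \begin{pmatrix} X \\ Y \end{pmatrix} \;=\; M \bullet Z \;=\; M \bullet \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix},
\qquad Z_1,\, Z_2 \sim N(0,\,1)\ \text{independent}.
\]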
To avoid a degenerate target distribution, we assume that M is invertible. X and Y are marginal distributions of V.
The pdf gz(z) of Z is
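Written out, this is just the product of the two standardized 1-dim Gaussian densities:

\[
g_z(z) \;=\; g_{z_1}(z_1)\, g_{z_2}(z_2)
\;=\; \frac{1}{2\pi}\,\exp\!\left(-\frac{1}{2}\left(z_1^2 + z_2^2\right)\right)
\;=\; \frac{1}{2\pi}\,\exp\!\left(-\frac{1}{2}\, z^T \bullet z\right),
\qquad z = (z_1,\, z_2)^T .
\]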
Contour lines gz(z) = const. are given by zT • z = |z|2 = const. Now define the symmetric matrix Σ = M • MT. With it, the probability density gv(v) for a concrete vector v = (x, y)T of the transformed (centered) random vector V becomes
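\[
g_v(v) \;=\; g_v(x, y) \;=\; \frac{1}{2\pi\,\sqrt{\det \Sigma}}\;\exp\!\left(-\frac{1}{2}\; v^T \bullet \Sigma^{-1} \bullet v \right).
\]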
This, obviously, is the pdf of a BVD (see e.g. this post). Due to M being invertible, Σ is invertible, too. Note that both Σ and Σ-1 are symmetric. Σ is also positive definite. Furthermore:
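Two relations, stated here for convenience, follow directly from Σ = M • MT:

\[
\Sigma^{-1} \;=\; \left(M^T\right)^{-1} \bullet\, M^{-1},
\qquad \det \Sigma \;=\; \left(\det M\right)^2 \;>\; 0 .
\]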
I have already shown in other posts that Σ indeed is the covariance matrix of the random vector V. The coefficients of Σ can be written in terms of the variances and the covariance of the (marginal) distributions X and Y and their Pearson correlation coefficient ρ. Σ and its inverse Σ-1 have the following matrix elements:
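For our 2-dim case the well-known explicit forms are:

\[
\Sigma \;=\; \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x \sigma_y \\ \rho\,\sigma_x \sigma_y & \sigma_y^2 \end{pmatrix},
\qquad
\Sigma^{-1} \;=\; \frac{1}{\sigma_x^2\,\sigma_y^2\,\left(1 - \rho^2\right)}
\begin{pmatrix} \sigma_y^2 & -\rho\,\sigma_x \sigma_y \\ -\rho\,\sigma_x \sigma_y & \sigma_x^2 \end{pmatrix},
\]

with \(\det \Sigma = \sigma_x^2\,\sigma_y^2\,(1-\rho^2)\) and, of course, \(\Sigma \bullet \Sigma^{-1} = I_2\).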
I2 is just the 2-dim identity matrix. For ρ one can show (see here) that it really is related to the covariance cov(X, Y) of the underlying random variables of V:
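Explicitly, in terms of the elements of Σ:

\[
\rho \;=\; \frac{\mathrm{cov}(X, Y)}{\sigma_x\, \sigma_y} .
\]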
Mixing and correlation of the original Gaussians
We started with two independent Gaussian distributions. However, the linear transformation mixes the original random variables. Therefore, the components X, Y of the resulting random vector V are, in general, correlated. This correlation is expressed by the off-diagonal coefficients of the covariance matrix Σ.
So far, our approach was rather abstract; the matrix elements of M were not specified. What we would like to have is an explicit recipe for how to combine and correlate the original Gaussians Z1 and Z2, with the help of the correlation coefficient ρ and the standard deviations σx and σy, to get the BVD’s marginals X and Y. Why would such an explicit transformation be useful? There are three reasons:
- It would deepen our mathematical understanding of BVDs – and show us an explicit way of constructing a BVD from independent 1-dim Gaussians in simple analytic terms.
- We would better understand the relation between the linear transformation and the resulting correlation of X and Y (expressed by ρ).
- If we could describe X and Y as linear combinations of the standardized Z1, Z2 with explicit parameters/factors, then we could later also find a parameterization of a BVD’s contour lines. Such contour lines result from the transformation of circles defined by zT • z = const., whose generating vectors can be parameterized by a radius r = (zT • z)1/2 and an angle θ. We could then transform such z-vectors via the defined linear combination of their components. I will use this parameterization for contours of a BVD in a forthcoming post.
Different transformations to get to the same BVD distribution?
It is clear that we have to use the information contained in Σ – and somehow recover a suitable linear transformation in terms of ρ, σx, σy. What could help us is a decomposition of the symmetric matrix Σ into a product of two matrices which are transposes of each other. Note that such a decomposition is not unique.
One could e.g. use a standard eigendecomposition. We will see in further posts that an eigendecomposition of Σ leads to a very simple geometric interpretation of how to create a BVD. However, regarding an explicit reconstruction recipe for X, Y in linear terms of Z1, Z2 (with parameters ρ, σx, σy), we get a convincingly simple result only for the so-called Cholesky decomposition.
But, wait a minute: Why can a decomposition of Σ different from Σ = M • MT help us at all to recreate a certain defined distribution for a random vector? Are there different ways to create one and the same distribution for a random vector? Looking at gv(v) (eq. 5), we indeed see that this pdf depends only on the inverse of Σ! I.e., as long as we find an invertible matrix B such that B • BT = Σ,
we could apply B to Z – and generate one and the same probability density function for the target random vector.
Where does this ambiguity come from?
By our linear transformation we map a probability density value for a vector z onto a pdf-value for v = M • z. The ambiguity stems from the independence of the components of a centered Z and the simplicity of gz(z) = gz1(z1) * gz2(z2): All z-vectors which have the same length (zT • z)1/2 are equivalent regarding the probability density value gz(z). The end points of these vectors reside on circles. This gives us a degree of freedom: We can rotate any initial z'-vector to become z (z = R • z', with R being an orthogonal rotation matrix). Such an initial rotation could be included in the linear transformation of the random vector – without changing the outcome: For a given z and v = M • z there is an equivalent z', such that v = U • z' (= M • R • z') with U = M • R. This leads from gz(z) = gz(z') to the same value gv(v). M and U will indeed generate the same gv(v).
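The formal reason why M and U = M • R give the same pdf: A rotation matrix R is orthogonal, i.e. R • RT = I2, and therefore

\[
U \bullet U^T \;=\; M \bullet R \bullet R^T \bullet M^T \;=\; M \bullet M^T \;=\; \Sigma ,
\]

so the covariance matrix, and with it gv(v), is unchanged.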
Cholesky decomposition of the covariance matrix Σ
For a real, symmetric and positive semi-definite matrix Σ the Cholesky decomposition reads:
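\[
\Sigma \;=\; A^T \bullet A .
\]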
A is an upper triangular matrix with real values. Note: By convention, the first factor in a Cholesky decomposition is a lower triangular matrix. So, with U = AT, we can alternatively write:
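\[
\Sigma \;=\; U \bullet U^T , \qquad U = A^T .
\]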
U is a lower triangular matrix. As our Σ is positive definite, the Cholesky decomposition into triangular matrices is even unique – and U's diagonal contains only positive values. Note that the inverse (if it exists) of an upper/lower triangular matrix is again an upper/lower triangular matrix.
But note also: In general, U is not equal to M, as we have nowhere required that M should be triangular! But, as we have seen above, applying the specific matrix U to the random vector Z will still create the same pdf for the resulting BVD.
Elements of the matrices resulting from the Cholesky decomposition of Σ
The point which makes life easy now is that we can determine the matrix elements u1, u2, u3 of the matrix U. The relevant condition is:
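\[
U \bullet U^T \;=\;
\begin{pmatrix} u_1 & 0 \\ u_2 & u_3 \end{pmatrix} \bullet \begin{pmatrix} u_1 & u_2 \\ 0 & u_3 \end{pmatrix}
\;=\;
\begin{pmatrix} u_1^2 & u_1 u_2 \\ u_1 u_2 & u_2^2 + u_3^2 \end{pmatrix}
\;\stackrel{!}{=}\;
\begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x \sigma_y \\ \rho\,\sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}
\;=\; \Sigma .
\]

(Here I assume the natural labeling of the elements of the lower triangular U, i.e. u1 = U11, u2 = U21, u3 = U22.)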
The resulting equation system gives us (after some rearrangements):
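Under the convention that the diagonal elements of U are positive:

\[
u_1 \;=\; \sigma_x, \qquad u_2 \;=\; \rho\,\sigma_y, \qquad u_3 \;=\; \sigma_y\,\sqrt{1 - \rho^2},
\qquad\text{i.e.}\qquad
U \;=\; \begin{pmatrix} \sigma_x & 0 \\ \rho\,\sigma_y & \sigma_y\sqrt{1-\rho^2} \end{pmatrix}.
\]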
Reconstruction of the bivariate random vector V
The Cholesky decomposition tells us that we should get a centered V (with centered marginals X and Y) as follows:
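In components, V = U • Z gives:

\[
X \;=\; \sigma_x\, Z_1, \qquad
Y \;=\; \rho\,\sigma_y\, Z_1 \;+\; \sigma_y\,\sqrt{1-\rho^2}\;Z_2 .
\]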
This is a simple formula, but it shows us how we must use the Pearson correlation coefficient ρ to get the right correlation of the marginals X, Y of our BVD. With some simple algebra we can also invert these relations:
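\[
Z_1 \;=\; \frac{X}{\sigma_x}, \qquad
Z_2 \;=\; \frac{1}{\sqrt{1-\rho^2}}\left(\frac{Y}{\sigma_y} \;-\; \rho\,\frac{X}{\sigma_x}\right).
\]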
For concrete vectors we can write these relations as reverse functions of the vector components:
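(In lower-case letters for concrete sample values:)

\[
z_1 \;=\; \frac{x}{\sigma_x}, \qquad
z_2 \;=\; \frac{1}{\sqrt{1-\rho^2}}\left(\frac{y}{\sigma_y} \;-\; \rho\,\frac{x}{\sigma_x}\right).
\]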
The determinant of the Jacobian matrix of this back-transformation is:
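With the relations above, the Jacobian matrix of (z1, z2) with respect to (x, y) is lower triangular, and its determinant is:

\[
\det\!\left(\frac{\partial (z_1, z_2)}{\partial (x, y)}\right)
\;=\; \frac{1}{\sigma_x}\cdot\frac{1}{\sigma_y\sqrt{1-\rho^2}}
\;=\; \frac{1}{\sigma_x\,\sigma_y\,\sqrt{1-\rho^2}}
\;=\; \frac{1}{\sqrt{\det \Sigma}} .
\]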
I leave it to the reader to verify, by following the steps in my last post, that the sum z1² + z2² (appearing in the exponent of gz) indeed gives us the right terms of the exponent of gv(v) after a replacement of z1, z2 by the terms of eqs. (16).
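Written out, the claim is the identity (one can check it by direct insertion):

\[
z_1^2 + z_2^2 \;=\; \frac{1}{1-\rho^2}\left(\frac{x^2}{\sigma_x^2} \;-\; 2\,\rho\,\frac{x\,y}{\sigma_x \sigma_y} \;+\; \frac{y^2}{\sigma_y^2}\right)
\;=\; v^T \bullet \Sigma^{-1} \bullet v .
\]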
For the sake of completeness, I write down the non-centered version with expectation values μx and μy for the marginals:
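\[
X \;=\; \mu_x \;+\; \sigma_x\, Z_1, \qquad
Y \;=\; \mu_y \;+\; \rho\,\sigma_y\, Z_1 \;+\; \sigma_y\,\sqrt{1-\rho^2}\;Z_2 .
\]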
One can show that the marginals X and Y really are normal distributions with the expected variances. Below, I use a somewhat sloppy notation and directly replace Z1, Z2 by their normal distributions:
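In this sloppy notation (a sum of independent normals is again normal, with the variances adding up):

\[
X \;\sim\; \mu_x + \sigma_x\, N(0,1) \;=\; N\!\left(\mu_x,\, \sigma_x^2\right),
\qquad
Y \;\sim\; \mu_y + \rho\,\sigma_y\, N(0,1) + \sigma_y\sqrt{1-\rho^2}\, N(0,1)
\;=\; N\!\left(\mu_y,\; \rho^2\sigma_y^2 + \sigma_y^2\left(1-\rho^2\right)\right)
\;=\; N\!\left(\mu_y,\, \sigma_y^2\right).
\]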
This shows the consistency of our argumentation.
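For readers who prefer a quick numerical cross-check, here is a minimal numpy sketch of the recipe (just an illustration; the parameter values, variable names and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Predefined target parameters of the desired BVD
sigma_x, sigma_y, rho = 2.0, 0.5, 0.7
mu_x, mu_y = 1.0, -3.0

# Two independent, standardized Gaussians Z1, Z2
n = 1_000_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)

# Reconstruction of X, Y via the lower triangular Cholesky factor U
x = mu_x + sigma_x * z1
y = mu_y + rho * sigma_y * z1 + sigma_y * np.sqrt(1.0 - rho**2) * z2

# Empirical standard deviations and Pearson correlation should match the targets
print(np.std(x), np.std(y))        # approx. 2.0 and 0.5
print(np.corrcoef(x, y)[0, 1])     # approx. 0.7

# Cross-check: numpy's Cholesky factor of Sigma reproduces our matrix U
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])
print(np.linalg.cholesky(Sigma))   # [[sigma_x, 0], [rho*sigma_y, sigma_y*sqrt(1-rho^2)]]
```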
Cholesky decomposition and Multivariate Normal Distributions
Without proof, let me just add the following fact: The (re-)construction of a multivariate normal distribution [MVD] in k dimensions can also be achieved by a lower triangular matrix Uk = Chol(Σk) resulting from a Cholesky decomposition of a (k×k) covariance matrix Σk:
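\[
V_k \;=\; \mu_k \;+\; U_k \bullet Z_k , \qquad \Sigma_k \;=\; U_k \bullet U_k^T .
\]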
Zk = (Z1, Z2, …, Zk)T is a k-dimensional random vector of k independent, standardized Gaussians. μk is the (vector-valued) expectation value of the multivariate normal random vector Vk.
Conclusion
In this post we have found an explicit, lower triangular (2×2) matrix U to transform a 2-dimensional random vector Z = (Z1, Z2)T (with two independent Gaussians) into a random vector V = (X, Y)T = U • Z, such that V shows the probability density of a Bivariate Normal Distribution with a (variance-) covariance matrix Σ.
The coefficients of U were derived from a Cholesky decomposition of the covariance matrix Σ – and could be described in terms of the standard deviations σx and σy of the BVD’s marginal distributions and their Pearson correlation coefficient ρ.
In forthcoming posts I will use this transformation matrix to parameterize the contour lines of the BVD’s probability density.
Stay tuned …