In other posts of this blog I have discussed the general form of a Bivariate Normal Distribution [BVD]. For a centered Cartesian coordinate system [CCS] (see below), we have already seen the following:
- A BVD can be understood as the result of a linear transformation of a two-dimensional random vector Z = (Z1, Z2)T. The random variables Z1 and Z2 represent two independent, standardized Gaussian (normal) distributions. The distribution of Z, i.e. of concrete (z1, z2)T vectors, has a simple probability density function, namely the product of the Gaussian probability density functions of Z1 and Z2. See below.
- In a centered Cartesian coordinate system [CCS], the result of the linear transformation of Z (via a (2×2)-matrix M) is a new random vector V = (X, Y)T = M • Z. It has a typical BVD probability density function gv(v) = gv(x, y) that depends on the inverse of the covariance matrix Σ for the transformed random vector V. X and Y represent the marginal distributions of V.
In this post I will give you a recipe to explicitly construct two random variables X, Y of a BVD from 1-dimensional Gaussians Z1, Z2 with the help
- of predefined values of the desired standard deviations σx, σy of X and Y
- and of a predefined value for the Pearson correlation coefficient ρ of X and Y.
To achieve this objective we will use the so-called Cholesky decomposition of the covariance matrix Σ.
Related posts:
- Bivariate normal distribution – derivation by linear transformation of a random vector for two independent Gaussians
- Bivariate Normal Distribution – derivation of the covariance and correlation by integration of the probability density
- Probability density function of a Bivariate Normal Distribution – derived from assumptions on marginal distributions and functional factorization
Hints regarding symbols: A bullet in the equations above indicates standard matrix multiplication. A superscript T denotes the transposition of a matrix or vector.
Reminder 1: BVD and its probability density function as the result of a linear transformation
Below we consider only centered distributions, i.e., we choose a CCS such that each of the constituent random variables has an expectation value of zero. This is no major restriction; the transition to a non-centered distribution is trivial.
We get a (centered) BVD by applying a linear transformation (2×2-matrix M) to a random vector Z of two independent, centered and standardized Gaussian normal distributions Z1 and Z2:
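In matrix form (the bullet again denotes matrix multiplication):

\[
V \;=\; \begin{pmatrix} X \\ Y \end{pmatrix} \;=\; M \bullet Z \;=\; M \bullet \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix},
\qquad Z_1,\, Z_2 \sim N(0,\,1)\ \text{independent}.
\]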
To avoid a degenerate target distribution, we assume that M is invertible. X and Y are marginal distributions of V.
The pdf gz(z) of Z is
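Written out, this is just the product of the two standardized 1-dim Gaussian densities:

\[
g_z(z) \;=\; g_{z_1}(z_1)\, g_{z_2}(z_2)
\;=\; \frac{1}{2\pi}\,\exp\!\left(-\frac{1}{2}\left(z_1^2 + z_2^2\right)\right)
\;=\; \frac{1}{2\pi}\,\exp\!\left(-\frac{1}{2}\, z^T \bullet z\right),
\qquad z = (z_1,\, z_2)^T .
\]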
Contour lines gz(z) = const. are given by zT • z = |z|2 = const. Now define the symmetric matrix Σ = M • MT. With it, the probability density gv(v) for a concrete vector v = (x, y)T of the transformed (centered) random vector V becomes
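\[
g_v(v) \;=\; g_v(x, y) \;=\; \frac{1}{2\pi\,\sqrt{\det \Sigma}}\;\exp\!\left(-\frac{1}{2}\; v^T \bullet \Sigma^{-1} \bullet v \right).
\]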
This, obviously, is the pdf of a BVD (see e.g. this post). Due to M being invertible, Σ is invertible, too. Note that both Σ and Σ-1 are symmetric. Σ is also positive definite. Furthermore:
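Two relations, stated here for convenience, follow directly from Σ = M • MT:

\[
\Sigma^{-1} \;=\; \left(M^T\right)^{-1} \bullet\, M^{-1},
\qquad \det \Sigma \;=\; \left(\det M\right)^2 \;>\; 0 .
\]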
I have already shown in other posts that Σ indeed is the covariance matrix of the random vector V. The coefficients of Σ can be written in terms of the variances and the covariance of the (marginal) distributions X and Y and their Pearson correlation coefficient ρ. Σ and its inverse Σ-1 have the following matrix elements:
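For our 2-dim case the well-known explicit forms are:

\[
\Sigma \;=\; \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x \sigma_y \\ \rho\,\sigma_x \sigma_y & \sigma_y^2 \end{pmatrix},
\qquad
\Sigma^{-1} \;=\; \frac{1}{\sigma_x^2\,\sigma_y^2\,\left(1 - \rho^2\right)}
\begin{pmatrix} \sigma_y^2 & -\rho\,\sigma_x \sigma_y \\ -\rho\,\sigma_x \sigma_y & \sigma_x^2 \end{pmatrix},
\]

with \(\det \Sigma = \sigma_x^2\,\sigma_y^2\,(1-\rho^2)\) and, of course, \(\Sigma \bullet \Sigma^{-1} = I_2\).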
I2 is just the 2-dim identity matrix. For ρ one can show (see here) that it really is related to the covariance cov(X, Y) of the underlying random variables of V:
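Explicitly, in terms of the elements of Σ:

\[
\rho \;=\; \frac{\mathrm{cov}(X, Y)}{\sigma_x\, \sigma_y} .
\]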
Mixing and correlation of the original Gaussians
We started with two independent Gaussian distributions. However, the linear transformation mixes the original random variables. Therefore, the components X, Y of the resulting random vector V are, in general, correlated. This correlation is expressed by the off-diagonal coefficients of the covariance matrix Σ.
So far, our approach was rather abstract; the matrix elements of M were not specified. What we would like to have is an explicit recipe for how to combine and correlate the original Gaussians Z1 and Z2, with the help of the correlation coefficient ρ and the standard deviations σx and σy, to get the BVD’s marginals X and Y. Why would such an explicit transformation be useful? There are three reasons:
- It would deepen our mathematical understanding of BVDs – and show us an explicit way of constructing a BVD from independent 1-dim Gaussians in simple analytic terms.
- We would better understand the relation between the linear transformation and the resulting correlation of X and Y (expressed by ρ).
- If we could describe X and Y as linear combinations of the standardized Z1, Z2 with explicit parameters/factors, then we could later also find a parameterization of a BVD’s contour lines. Such contour lines result from the transformation of circles defined by zT • z = const., whose generating vectors can be parameterized by a radius r = (zT • z)1/2 and an angle θ. We could then transform such z-vectors via the defined linear combination of their components. I will use this parameterization for contours of a BVD in a forthcoming post.
Different transformations to get to the same BVD distribution?
It is clear that we have to use the information contained in Σ – and somehow recover a suitable linear transformation in terms of ρ, σx, σy. What could help us is a decomposition of the symmetric matrix Σ into a product of two matrices which are transposes of each other. Note that such a decomposition is not unique.
One could e.g. use a standard eigendecomposition. We will see in further posts that an eigendecomposition of Σ leads to a very simple geometric interpretation of how to create a BVD. However, regarding an explicit reconstruction recipe for X, Y in linear terms of Z1, Z2 (with parameters ρ, σx, σy), we get a convincingly simple result only for the so-called Cholesky decomposition.
But, wait a minute: Why can a decomposition of Σ different from Σ = M • MT help us at all to recreate a certain defined distribution for a random vector? Are there different ways to create one and the same distribution for a random vector? Looking at gv(v) (eq. 5), we indeed see that this pdf depends only on the inverse of Σ! I.e., as long as we find an invertible matrix B such that B • BT = Σ,
we could apply B to Z – and generate one and the same probability density function for the target random vector.
Where does this ambiguity come from?
By our linear transformation we map a probability density value for a vector z onto a pdf-value for v = M • z. The ambiguity stems from the independence of the components of a centered Z and the simplicity of gz(z) = gz1(z1) * gz2(z2): All z-vectors which have the same length (zT • z)1/2 are equivalent regarding the probability density value gz(z). The end points of these vectors reside on circles. This gives us a degree of freedom: We can rotate any initial z'-vector to become z (z = R • z', with R being an orthogonal rotation matrix). Such an initial rotation could be included in the linear transformation of the random vector – without changing the outcome: For a given z and v = M • z there is an equivalent z', such that v = U • z' (= M • R • z') with U = M • R. This leads from gz(z) = gz(z') to the same value gv(v). M and U will indeed generate the same gv(v).
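The formal reason why M and U = M • R give the same pdf: A rotation matrix R is orthogonal, i.e. R • RT = I2, and therefore

\[
U \bullet U^T \;=\; M \bullet R \bullet R^T \bullet M^T \;=\; M \bullet M^T \;=\; \Sigma ,
\]

so the covariance matrix, and with it gv(v), is unchanged.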
Cholesky decomposition of the covariance matrix Σ
For a real, symmetric and positive semi-definite matrix Σ the Cholesky decomposition reads:
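\[
\Sigma \;=\; A^T \bullet A .
\]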
A is an upper triangular matrix with real values. Note: By convention, the first factor in a Cholesky decomposition is a lower triangular matrix. So, with U = AT, we can alternatively write:
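\[
\Sigma \;=\; U \bullet U^T , \qquad U = A^T .
\]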
U is a lower triangular matrix. As our Σ is positive definite, the Cholesky decomposition into triangular matrices is even unique – and U's diagonal contains only positive values. Note that the inverse (if it exists) of an upper/lower triangular matrix is again an upper/lower triangular matrix.
But note also: In general, U is not equal to M, as we have nowhere required that M should be triangular! But, as we have seen above, applying the specific matrix U to the random vector Z will still create the same pdf for the resulting BVD.
Elements of the matrices resulting from the Cholesky decomposition of Σ
The point which makes life easy now is that we can determine the matrix elements u1, u2, u3 of the matrix U. The relevant condition is:
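\[
U \bullet U^T \;=\;
\begin{pmatrix} u_1 & 0 \\ u_2 & u_3 \end{pmatrix} \bullet \begin{pmatrix} u_1 & u_2 \\ 0 & u_3 \end{pmatrix}
\;=\;
\begin{pmatrix} u_1^2 & u_1 u_2 \\ u_1 u_2 & u_2^2 + u_3^2 \end{pmatrix}
\;\stackrel{!}{=}\;
\begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x \sigma_y \\ \rho\,\sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}
\;=\; \Sigma .
\]

(Here I assume the natural labeling of the elements of the lower triangular U, i.e. u1 = U11, u2 = U21, u3 = U22.)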
The resulting equation system gives us (after some rearrangements):
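Under the convention that the diagonal elements of U are positive:

\[
u_1 \;=\; \sigma_x, \qquad u_2 \;=\; \rho\,\sigma_y, \qquad u_3 \;=\; \sigma_y\,\sqrt{1 - \rho^2},
\qquad\text{i.e.}\qquad
U \;=\; \begin{pmatrix} \sigma_x & 0 \\ \rho\,\sigma_y & \sigma_y\sqrt{1-\rho^2} \end{pmatrix}.
\]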
Reconstruction of the bivariate random vector V
The Cholesky decomposition tells us that we should get a centered V (with centered marginals X and Y) as follows:
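In components, V = U • Z gives:

\[
X \;=\; \sigma_x\, Z_1, \qquad
Y \;=\; \rho\,\sigma_y\, Z_1 \;+\; \sigma_y\,\sqrt{1-\rho^2}\;Z_2 .
\]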
This is a simple formula, but it shows us how we must use the Pearson correlation coefficient ρ to get the right correlation of the marginals X, Y of our BVD. With some simple algebra we can also invert these relations:
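\[
Z_1 \;=\; \frac{X}{\sigma_x}, \qquad
Z_2 \;=\; \frac{1}{\sqrt{1-\rho^2}}\left(\frac{Y}{\sigma_y} \;-\; \rho\,\frac{X}{\sigma_x}\right).
\]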
For concrete vectors we can write these relations as reverse functions of the vector components:
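(In lower-case letters for concrete sample values:)

\[
z_1 \;=\; \frac{x}{\sigma_x}, \qquad
z_2 \;=\; \frac{1}{\sqrt{1-\rho^2}}\left(\frac{y}{\sigma_y} \;-\; \rho\,\frac{x}{\sigma_x}\right).
\]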
The determinant of the Jacobian matrix of this back-transformation is:
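With the relations above, the Jacobian matrix of (z1, z2) with respect to (x, y) is lower triangular, and its determinant is:

\[
\det\!\left(\frac{\partial (z_1, z_2)}{\partial (x, y)}\right)
\;=\; \frac{1}{\sigma_x}\cdot\frac{1}{\sigma_y\sqrt{1-\rho^2}}
\;=\; \frac{1}{\sigma_x\,\sigma_y\,\sqrt{1-\rho^2}}
\;=\; \frac{1}{\sqrt{\det \Sigma}} .
\]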
I leave it to the reader to verify, by following the steps in my last post, that the sum z1² + z2² (appearing in the exponent of gz) indeed gives us the right terms of the exponent of gv(v) after a replacement of z1, z2 by the terms of eqs. (16).
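Written out, the claim is the identity (one can check it by direct insertion):

\[
z_1^2 + z_2^2 \;=\; \frac{1}{1-\rho^2}\left(\frac{x^2}{\sigma_x^2} \;-\; 2\,\rho\,\frac{x\,y}{\sigma_x \sigma_y} \;+\; \frac{y^2}{\sigma_y^2}\right)
\;=\; v^T \bullet \Sigma^{-1} \bullet v .
\]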
For the sake of completeness, I write down the non-centered version with expectation values μx and μy for the marginals:
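\[
X \;=\; \mu_x \;+\; \sigma_x\, Z_1, \qquad
Y \;=\; \mu_y \;+\; \rho\,\sigma_y\, Z_1 \;+\; \sigma_y\,\sqrt{1-\rho^2}\;Z_2 .
\]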
One can show that the marginals X and Y really are normal distributions with the expected variances. Below, I use a somewhat sloppy notation and directly replace Z1, Z2 by their normal distributions:
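In this sloppy notation (a sum of independent normals is again normal, with the variances adding up):

\[
X \;\sim\; \mu_x + \sigma_x\, N(0,1) \;=\; N\!\left(\mu_x,\, \sigma_x^2\right),
\qquad
Y \;\sim\; \mu_y + \rho\,\sigma_y\, N(0,1) + \sigma_y\sqrt{1-\rho^2}\, N(0,1)
\;=\; N\!\left(\mu_y,\; \rho^2\sigma_y^2 + \sigma_y^2\left(1-\rho^2\right)\right)
\;=\; N\!\left(\mu_y,\, \sigma_y^2\right).
\]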
This shows the consistency of our argumentation.
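For readers who prefer a quick numerical cross-check, here is a minimal numpy sketch of the recipe (just an illustration; the parameter values, variable names and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Predefined target parameters of the desired BVD
sigma_x, sigma_y, rho = 2.0, 0.5, 0.7
mu_x, mu_y = 1.0, -3.0

# Two independent, standardized Gaussians Z1, Z2
n = 1_000_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)

# Reconstruction of X, Y via the lower triangular Cholesky factor U
x = mu_x + sigma_x * z1
y = mu_y + rho * sigma_y * z1 + sigma_y * np.sqrt(1.0 - rho**2) * z2

# Empirical standard deviations and Pearson correlation should match the targets
print(np.std(x), np.std(y))        # approx. 2.0 and 0.5
print(np.corrcoef(x, y)[0, 1])     # approx. 0.7

# Cross-check: numpy's Cholesky factor of Sigma reproduces our matrix U
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])
print(np.linalg.cholesky(Sigma))   # [[sigma_x, 0], [rho*sigma_y, sigma_y*sqrt(1-rho^2)]]
```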
Cholesky decomposition and Multivariate Normal Distributions
Without proof, let me just add the following fact: The (re-)construction of a multivariate normal distribution [MVD] in k dimensions can also be achieved by a lower triangular matrix Uk = Chol(Σk) resulting from a Cholesky decomposition of a (k×k) covariance matrix Σk:
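\[
V_k \;=\; \mu_k \;+\; U_k \bullet Z_k , \qquad \Sigma_k \;=\; U_k \bullet U_k^T .
\]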
Zk = (Z1, Z2, …, Zk)T is a k-dimensional random vector of k independent, standardized Gaussians. μk is the (vector-valued) expectation value of the multivariate normal random vector Vk.
Conclusion
In this post we have found an explicit, lower triangular (2×2) matrix U to transform a 2-dimensional random vector Z = (Z1, Z2)T (with two independent Gaussians) into a random vector V = (X, Y)T = U • Z, such that V shows the probability density of a Bivariate Normal Distribution with a (variance-) covariance matrix Σ.
The coefficients of U were derived from a Cholesky decomposition of the covariance matrix Σ – and could be described in terms of the standard deviations σx and σy of the BVD’s marginal distributions and their Pearson correlation coefficient ρ.
In forthcoming posts I will use this transformation matrix to parameterize the contour lines of the BVD’s probability density.
Stay tuned …