
Bivariate normal distribution – derivation by linear transformation of a random vector for two independent Gaussians

In another post on properties of a Bivariate Normal Distribution [BVD] I have motivated the form of its probability density function [pdf] by symmetry arguments and the underlying probability density functions of its marginals, namely 1-dimensional Gaussians. In this post we derive the probability density function by following the line of argumentation for a general Multivariate Normal Distribution [MVD]: we regard a BVD as the result of a linear transformation applied to a random vector of two independent 1-dimensional Gaussian random variables.


Reminder: 1-dimensional centered and standardized Gaussian distributions

Let us focus on two independent 1-dimensional normal distributions. Each has a probability density function gj(wj) given by a Gaussian:

\[ \begin{align} W_j \, &\sim\, \mathcal{N}_1 \left(\mu_j,\,\sigma_j^{2} \right) \,, \\[10pt] g_j(w_j, \, \mu_j, \, \sigma_j) \: &= \: {1 \over {\sigma_j \, \sqrt{2\pi} } } \, {\large e}^{ - \, {\Large 1 \over \Large 2} \left( {\Large w_j \, - \, \Large \mu_j \over \Large \sigma_j} \right)^2 } \,. \end{align} \tag{1} \]

μj is the mean value and σj the standard deviation, i.e. the square root of the variance, of the distribution Wj.
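As a quick sanity check of eq. (1), the following sketch (assuming NumPy and SciPy are available; the parameter values are arbitrary illustrative choices) compares the hand-written density with scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(w, mu, sigma):
    """1-dimensional Gaussian density g_j(w_j), see eq. (1)."""
    return np.exp(-0.5 * ((w - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# illustrative parameter choice (not taken from the post)
mu, sigma = 1.5, 0.7
w = np.linspace(-2.0, 5.0, 201)

# the hand-written density should agree with SciPy's implementation
assert np.allclose(gaussian_pdf(w, mu, sigma), norm.pdf(w, loc=mu, scale=sigma))
```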

We use a 2-dimensional Cartesian coordinate system [CCS] to work with random vectors W = (W1, W2)T composed of such distributions, and with respective concrete vectors w. To make things simpler, we center the distributions by choosing an appropriate location of the origin of our CCS. We also standardize each of the distributions Wj. Then we get distributions Zj and a respective random vector Z, which assumes concrete vectors z (i.e., Z = z) according to a pdf gz(z):

\[ \begin{align} \mu_j \, &=\, 0\,, \quad \sigma_j \,=\, 1 \,, \quad Z_j \, \sim\, \mathcal{N}_1 \left( 0,\,1 \right) \,, \\[10pt] g_{z, j} \left( z_j \right) \: &= \: {1 \over \sqrt{2\, \pi \,} } \, \operatorname{exp} \left( \,-\, {1\over 2} \, z_j^2 \right) \,. \end{align} \tag{2} \]
\[ \pmb{Z} \,=\, \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}\,, \,\, \Rightarrow g_z \,=\, g_z(\pmb{z}), \quad \pmb{z} \,=\, (z_1, \, z_2)^T \,=\, \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\,. \tag{3} \]

Due to the independence (!) of the distributions, the value of the combined probability density function for a vector z = (z1, z2)T is given by:

\[ g_{\small Z}(\pmb{z}, \pmb{0}, \pmb{\operatorname{I}} ) \, = \, {1 \over (2\, \pi) } \, {\large e}^{ - \, {\Large 1 \over \Large 2} \left( {\Large \pmb{z}^T} \, \bullet \, {\Large \pmb{z}} \right) } \, = \, {1 \over (2\, \pi) } \, {\large e}^{ - \, {\Large 1 \over \Large 2} \left( {\Large \pmb{z}^T} \,\bullet\, {\Large \operatorname{\pmb{I}}} \,\bullet\, {\Large \pmb{z}} \right) } \,. \tag{4} \]

For the purpose of a later generalization, I have introduced a coupling matrix in the exponent. For our random vector Z, it is just the identity matrix I.
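Eq. (4) can be verified numerically: for independent Z1, Z2 the joint density is simply the product of the two marginal densities of eq. (2), and this product coincides with the vector form with the identity matrix as coupling matrix. A minimal sketch (NumPy only; the test point z is an arbitrary choice):

```python
import numpy as np

def g_z(z):
    """Joint density of two independent standard normals, vector form of eq. (4)."""
    return np.exp(-0.5 * z @ np.eye(2) @ z) / (2.0 * np.pi)

def std_normal_pdf(u):
    """Standardized 1-dimensional Gaussian density, eq. (2)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

z = np.array([0.3, -1.2])   # arbitrary test point
assert np.isclose(g_z(z), std_normal_pdf(z[0]) * std_normal_pdf(z[1]))
```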

Reminder 2: Probability density function of a BVD

In another post we have already seen that we can write the probability density function g2(x, y) of a BVD in a characteristic vector notation for two correlated (Gaussian) random variables X and Y:

\[ \pmb{V} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad \mbox{concrete values}: \, \pmb{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \pmb{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \pmb{v}_{\mu} = \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix} \,. \tag{5} \]
\[ \mbox{centered CCS :} \quad g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( - {1\over2} \, {\pmb{v}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, \right) \,, \tag{6} \]
\[ \mbox{general CCS :} \quad g_2(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( - {1\over2} \, {\pmb{v}_{\mu}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}_{\mu}} \, \right) \,. \tag{7} \]

g2c(x, y) is the probability density in the centered CCS, in which we have μ = 0. The coupling matrix in this case is the inverse of the so-called variance-covariance matrix (or just covariance matrix), describing the correlation of X and Y:

\[ \pmb{\Sigma}^{-1} \,=\, {1 \over \sigma_x^2\, \sigma_y^2\, \left( 1\,-\, \rho^2\right) } \, \begin{pmatrix} \sigma_y^2 &-\rho\, \sigma_x\sigma_y \\ -\rho\, \sigma_x\sigma_y & \sigma_x^2 \end{pmatrix} \,. \tag{8} \]
\[ \pmb{\Sigma} \,=\, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}, \quad \pmb{\Sigma} \bullet \pmb{\Sigma}^{-1} = \operatorname{\pmb{I}} \,. \tag{9} \]

Note that this matrix is symmetric. In another post I have already shown that ρ indeed is the Pearson correlation coefficient of X and Y.
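The closed form (8) of the inverse can be checked against a numerical matrix inversion. A small sketch, assuming NumPy and purely illustrative values for σx, σy and ρ:

```python
import numpy as np

# illustrative parameters (arbitrary choices)
sx, sy, rho = 1.3, 0.8, 0.6

# covariance matrix, eq. (9)
Sigma = np.array([[sx**2,         rho * sx * sy],
                  [rho * sx * sy, sy**2        ]])

# closed-form inverse, eq. (8)
Sigma_inv_closed = np.array([[ sy**2,         -rho * sx * sy],
                             [-rho * sx * sy,  sx**2        ]]) / (sx**2 * sy**2 * (1.0 - rho**2))

assert np.allclose(Sigma_inv_closed, np.linalg.inv(Sigma))
assert np.allclose(Sigma @ Sigma_inv_closed, np.eye(2))
```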

The notation often used to describe a two-dimensional normal distribution is

\[ \pmb{V} \,\sim\, \mathcal{N}_2 \left[ \pmb{\mu},\, \pmb{\Sigma} \right] \,=\, \mathcal{N}_2 \left[ \begin{pmatrix} \mu_x \\ \mu_y\end{pmatrix}, \, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \right] \,. \tag{11} \]

Below I just regard centered distributions with zero expectation values.

From two independent standardized Gaussian distributions to a BVD

We work in a centered CCS. We assume that we have a random vector of two independent standardized Gaussian distributions Z1 and Z2, each with the standardized probability density defined above. We then apply a linear transformation via a 2×2-matrix M to get another centered random vector Vm :

\[ \begin{align} \pmb{Z} \, &=\, \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} , \quad \pmb{V}_m \, =\, \begin{pmatrix} X_m \\ Y_m \end{pmatrix}, \\[10pt] \pmb{V}_m \, &= \, \operatorname{\pmb{M}} \bullet \, \pmb{Z}, \, \quad \pmb{Z} \,=\, \operatorname{\pmb{M}}^{-1} \bullet \pmb{V}_m \,, \\[10pt] {\partial \,\pmb{V}_m \over \partial \, \pmb{Z} }\,&=\, \operatorname{\pmb{M}} \,, \quad {\partial \,\pmb{Z} \over \partial \, \pmb{V}_m }\,=\, \operatorname{\pmb{M}}^{-1} \,, \quad \left| \operatorname{\pmb{M}} \right| \, \ne \, 0 \,. \end{align} \tag{12}\]

We assume that M is invertible (to avoid degenerate cases). The bullet above indicates the standard matrix multiplication.
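The transformation itself is easy to mimic numerically. The sketch below (NumPy; the matrix M is an arbitrary invertible choice, not prescribed by this post) draws samples of Z, applies Vm = M Z and recovers Z via the inverse:

```python
import numpy as np

rng = np.random.default_rng(42)

# arbitrary invertible 2x2 transformation matrix (illustrative choice)
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])
assert abs(np.linalg.det(M)) > 1e-12        # |M| != 0, i.e. M is invertible

# samples of Z = (Z1, Z2)^T with independent N(0,1) components
Z = rng.standard_normal(size=(2, 100_000))

# linear transformation V_m = M • Z and recovery Z = M^{-1} • V_m
Vm = M @ Z
assert np.allclose(np.linalg.solve(M, Vm), Z)
```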

Regarding the probability density gv(vm) for a concrete vector vm = (x, y)T of the transformed random vector Vm, we must take into account the change of volume elements under this transformation. I.e., we have to take into account the Jacobian determinant of the transformation:

\[ g_v ( \pmb{v}_m ) \, dx\, dy \,=\, g_z \left( \pmb{z} \right) \, d z_1 \, dz_2 \,=\, g_z \left( \operatorname{\pmb{M}}^{-1} \, \pmb{v}_m \right) \, \left| \operatorname{\pmb{M}}^{-1} \right| \, dx \, dy \,. \tag{13} \]

Hence

\[ \begin{align} g_v ( \pmb{v}_m ) \, &=\, {1 \over \left| \operatorname{\pmb{M}} \right| } \, g_z \left( \operatorname{\pmb{M}}^{-1} \, \pmb{v}_m \right) \\[10pt] &=\, {1 \over 2\, \pi \, \left| \operatorname{\pmb{M}} \right| } \, \operatorname{exp} \left( \,-\, {1 \over 2} \, \left(\, \left[ \operatorname{\pmb{M}}^{-1} \pmb{v}_m\right]^T \bullet \operatorname{\pmb{M}}^{-1} \pmb{v}_m \, \right) \,\right) \\[10pt] &=\, {1 \over 2\, \pi \, \left| \operatorname{\pmb{M}} \right| } \, \operatorname{exp} \left( \,-\, {1 \over 2} \, \left( \, \pmb{v}_m^T \bullet \left[\, \left[ \operatorname{\pmb{M}}^{-1}\right]^T \bullet \operatorname{\pmb{M}}^{-1} \, \right] \bullet \pmb{v}_m \right)\, \right) \,. \end{align} \tag{14} \]

With the help of our (2×2)-matrix M we define a new symmetric matrix Σm :

\[ \begin{align} \operatorname{ \pmb{\Sigma}}_m \:&=\: \operatorname{\pmb{M}} \bullet \, \operatorname{\pmb{M}}^T \,, \\[10pt] \operatorname{ \pmb{\Sigma}}_m^{-1} \:&=\: \left[\operatorname{\pmb{M}}^{-1}\right]^T \bullet \operatorname{\pmb{M}}^{-1} \,. \\[10pt] \end{align} \tag{15} \]

Due to M being invertible, Σm is invertible, too. Note that both Σm and (Σm)-1 are symmetric. Furthermore:

\[ \left| \operatorname{\pmb{\Sigma}}_m \right| \:=\: \left| \operatorname{\pmb{M}} \right| * \left| \operatorname{\pmb{M}}^T \right| \:=\: \left( \left| \operatorname{\pmb{M}} \right| \right)^2 \,. \tag{16} \]

This leads to a very compact form of the probability density function of our transformed random vector:

\[ \begin{align} g_v ( \pmb{v}_m ) \, &=\, {1 \over 2\, \pi \, \left| \operatorname{\pmb{\Sigma}}_m \right|^{1/2} } \, \operatorname{exp} \left[ \,-\, {1 \over 2} \, \left( \, \pmb{v}_m^T \bullet \operatorname{\pmb{\Sigma}}_m^{-1} \bullet \pmb{v}_m \right)\, \right] \,. \end{align} \tag{17} \]
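Eq. (17) can be compared with SciPy's bivariate normal density for the covariance Σm = M MT. A short sketch, reusing the illustrative matrix M from above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# same arbitrary invertible M as in the sampling sketch above
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])
Sigma_m = M @ M.T                           # eq. (15)

def g_v(v):
    """Density of the transformed random vector, eq. (17)."""
    return np.exp(-0.5 * v @ np.linalg.inv(Sigma_m) @ v) / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma_m)))

v = np.array([0.5, -0.4])                   # arbitrary test point
assert np.isclose(g_v(v), multivariate_normal(mean=np.zeros(2), cov=Sigma_m).pdf(v))
```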

By comparison with the results in the above section “Reminder 2”, we see that we have already reached our desired form and that

  • we should identify Vm with V,
  • we should identify gv(vm) with g2c(x, y),
  • we should identify Σm with the covariance matrix Σ .

With M being invertible, one can show that Σm indeed is a positive-definite symmetric matrix. What is still open is to prove that Σm really represents the covariance matrix of the transformed random vector.

Variance and the respective covariance matrix

Let us determine the covariance matrix of our transformed random vector Vm. We take a formal route based on the general definitions. Two aspects are important:

(1) An expectation value of a random vector is defined via the expectation values of its components. Thus, the expectation vector of a random vector S is just the vector composed of the expectation values of its (marginal) component distributions:

\[ \pmb{\mu}\left(\pmb{S}\right) \: = \: \operatorname{\mathbb{E}}\left(\pmb{S} \right) \, = \, \left( \operatorname{\mathbb{E}}(S_1), \, \operatorname{\mathbb{E}}(S_2), \, \ldots, \, \operatorname{\mathbb{E}}(S_n) \right)^T \, . \tag{18} \]

(2) The covariance of a random vector is a generalization of the standard covariance for two 1-dim distributions X and Y:

\[ \begin{align} \operatorname{cov}(X, Y) &= \operatorname{\mathbb{E}}\left[\,\left(\,X \,-\, \operatorname{\mathbb{E}}\left(X\right)\,\right) \, \left(\,Y \,-\, \operatorname{\mathbb{E}}\left(Y\right)\, \right)\,\right] \\ &= \operatorname{\mathbb{E}}\left(X Y\right) \,-\, \operatorname{\mathbb{E}}\left(X\right) \operatorname{\mathbb{E}}\left(Y\right) \,. \end{align} \tag{19} \]

The generalization almost naturally leads to a matrix of expectation values for all combinations of the random vector components:

\[ \begin{align} \operatorname{cov}\left(\pmb{S}\right) \: &:= \: \operatorname{\mathbb{E}}\left[\left(\pmb{S} \,-\, \operatorname{\mathbb{E}}(\pmb{S}) \right)\, \left(\pmb{S} \,-\, \operatorname{\mathbb{E}}(\pmb{S}) \right)^T \right] \,. \end{align} \tag{20} \]

Note the order of transposition in the definition of cov: a column vector is multiplied by a transposed (row) vector. The rules of matrix multiplication then give you a matrix as the result. It contains all combinations of the components.

The expectation value has to be determined for every element of the matrix. Thus, the interpretation of the notation above for the 2-dimensional case is: (a) Pick all pairwise combinations (Sj, Sk) of the component distributions. (b) Calculate the covariance of the pair, cov(Sj, Sk), and put it at the (j,k)-place inside the matrix. See this post for more details. Note that the covariance of a distribution with itself is identical to the variance of the distribution: cov(S1, S1) = var(S1).

For a 2-dimensional case we get

\[ \begin{align} \operatorname{cov}\left(\pmb{S}\right)\:&=\quad {\begin{pmatrix} \operatorname{var} (S_{1})&\operatorname{cov} (S_{1},S_{2}) \\ \operatorname{cov} (S_{2},S_{1}) &\operatorname {var} (S_{2}) \end{pmatrix}} \,. \end{align} \tag{21} \]

The above matrix is the (variance-) covariance matrix of a 2-dim random vector S, which we also abbreviate as ΣS. For a general square transformation matrix M one can show that

\[ \operatorname{cov} \left( \operatorname{\pmb{M}} \bullet \, \pmb{S} \right) \:=\: \operatorname{\pmb{M}} \bullet \operatorname{cov}\left( \pmb{S} \right) \bullet \operatorname{\pmb{M}}^T \,. \tag{22} \]

From the definition of the covariance it is easy to derive the following relations for our centered, standardized special random vector Z (with independent component distributions):

\[ \operatorname{\mathbb{E}}\left(\pmb{Z} \right) \, = \, \pmb{0}, \quad \operatorname{cov} \left( \operatorname{\pmb{Z}} \right) \:=\: \operatorname{\mathbb{E}} \left( \operatorname{\pmb{Z}} \operatorname{\pmb{Z}}^T \right) \,=\, \operatorname{\pmb{I}} \,. \tag{23} \]
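Relations (22) and (23) suggest a simple Monte Carlo check: the sample covariance of Z should be close to I, and the sample covariance of Vm = M Z should be close to M MT. A sketch (NumPy; M is again the illustrative matrix used above):

```python
import numpy as np

rng = np.random.default_rng(0)

# arbitrary invertible M (same illustrative choice as above)
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])

Z = rng.standard_normal(size=(2, 500_000))   # independent standard normal components
Vm = M @ Z                                   # transformed samples

# sample estimates of eq. (23) and of cov(M • Z) = M • I • M^T
assert np.allclose(np.cov(Z), np.eye(2), atol=0.02)
assert np.allclose(np.cov(Vm), M @ M.T, atol=0.02)
```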

We use this to determine the covariance of our transformed random vector Vm:

\[ \begin{align} \operatorname{cov}\left( \pmb{V}_m \right) \: &= \: \operatorname{\mathbb{E}} \left[ \left( \pmb{V}_m \,-\, \operatorname{\mathbb{E}}\left[\pmb{V}_m \right] \right) \, \left( \pmb{V}_m \,-\, \operatorname{\mathbb{E}} \left[ \pmb{V}_m \right] \right)^T \, \right] \\[10pt] &= \: \operatorname{\mathbb{E}} \left[ \, \left( \operatorname{\pmb{M}} \operatorname{\pmb{Z}} \right) \, \left( \operatorname{\pmb{M}} \operatorname{\pmb{Z}} \right)^T \, \right] \\[10pt] &= \: \operatorname{\pmb{M}} \, \operatorname{\mathbb{E}} \left[ \, \operatorname{\pmb{Z}} \, \operatorname{\pmb{Z}}^T \, \right] \, \operatorname{\pmb{M}}^T \\[10pt] &= \: \operatorname{\pmb{M}} \, \operatorname{\pmb{I}} \, \operatorname{\pmb{M}}^T \:=\: \operatorname{\pmb{\Sigma}}_m \,. \end{align} \tag{24} \]

So, Σm really is the (symmetric) covariance matrix of Vm. With

\[\sigma_x^2 = \operatorname{var} \left(X\right) \,, \quad \sigma_y^2 = \operatorname{var} \left(Y\right) \tag{25} \]

and by expressing the covariance via a coefficient ρ,

\[ \operatorname{cov}\left(X,\, Y\right) \,=\, \rho * \sigma_x \, \sigma_y \tag{26} \]

this means

\[ \begin{align} \operatorname{cov}\left(\pmb{V}_m\right)\:&=\quad {\begin{pmatrix} \sigma_x^2 & \rho * \sigma_x \, \sigma_y \\ \rho * \sigma_x \, \sigma_y &\sigma_y^2 \end{pmatrix}} \,=\, \operatorname{\pmb{\Sigma}}_m \,. \end{align} \tag{27} \]

The proof that ρ is indeed the Pearson correlation coefficient of X and Y has already been given in another post by explicit integration of gv(vm).
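Conversely, given a concrete Σm, the quantities σx, σy and ρ can be read off directly via eqs. (25) and (26). A small sketch with the illustrative Σm = M MT from above:

```python
import numpy as np

# arbitrary invertible M (same illustrative choice as above)
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])
Sigma_m = M @ M.T

sigma_x = np.sqrt(Sigma_m[0, 0])                  # eq. (25): sigma_x^2 = var(X)
sigma_y = np.sqrt(Sigma_m[1, 1])                  # eq. (25): sigma_y^2 = var(Y)
rho     = Sigma_m[0, 1] / (sigma_x * sigma_y)     # eq. (26): cov(X,Y) = rho * sigma_x * sigma_y

print(sigma_x, sigma_y, rho)   # approximately 1.0, 1.0, 0.8 for this particular M
```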

Conclusion

In this post we have shown that a general centered Bivariate Normal Distribution can be regarded as the result of a linear transformation of a random vector Z of two independent Gaussian distributions Z1 and Z2. We have confirmed the general form of the probability density function with the exponent written in vector form. The central matrix appearing in the exponent is the inverse of the covariance matrix of the transformed random vector (X, Y)T = MZ.

Unfortunately, we do not yet have an explicit rule, based on the correlation coefficient ρ, for constructing correlated X and Y distributions from Z1 and Z2. We will solve this problem in a forthcoming post on BVDs in this blog. Such a construction rule will later allow for an explicit parameterization of the contour lines of a BVD.

Stay tuned ….