
Bivariate normal distribution – derivation by linear transformation of a random vector for two independent Gaussians

In another post on properties of a Bivariate Normal Distribution [BVD] I have motivated the form of its probability density function [pdf] by symmetry arguments and the underlying probability density functions of its marginals, namely 1-dimensional Gaussians. In this post we derive the probability density function by following the line of argumentation for a general Multivariate Normal Distribution [MVD]: we regard a BVD as the result of a linear transformation applied to a random vector of two independent 1-dimensional Gaussian random variables.


Reminder: 1-dimensional centered and standardized Gaussian distributions

Let us focus on two independent 1-dimensional normal distributions. Each has a probability density function gj(wj) given by a Gaussian:

\[ \begin{align} W_j \, &\sim\, \mathcal{N}_1 \left(\mu_j,\,\sigma_j^{2} \right) \,, \\[10pt] g_j(w_j, \, \mu_j, \, \sigma_j) \: &= \: {1 \over {\sigma_j \, \sqrt{2\pi} } } \, {\large e}^{ - \, {\Large 1 \over \Large 2} \left( {\Large w_j \, - \, \Large \mu_j \over \Large \sigma_j} \right)^2 } \,. \end{align} \tag{1} \]

μj is the mean value and σj the standard deviation, i.e. the square root of the variance, of the distribution Wj.
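As a quick sanity check of eq. (1), the following sketch (assuming NumPy and SciPy are available; the parameter values are arbitrary illustrative choices) compares the hand-written density with scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(w, mu, sigma):
    """1-dimensional Gaussian density g_j(w_j), see eq. (1)."""
    return np.exp(-0.5 * ((w - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# illustrative parameter choice (not taken from the post)
mu, sigma = 1.5, 0.7
w = np.linspace(-2.0, 5.0, 201)

# the hand-written density should agree with SciPy's implementation
assert np.allclose(gaussian_pdf(w, mu, sigma), norm.pdf(w, loc=mu, scale=sigma))
```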

We use a 2-dimensional Cartesian coordinate system [CCS] to work with random vectors W = (W1, W2)T composed of such distributions, and with respective concrete vectors w. To make things simpler, we center the distributions by choosing an appropriate location of the origin of our CCS. We also standardize each of the distributions Wj. Then we get distributions Zj and a respective random vector Z, which assumes concrete vectors z (i.e., Z = z) according to a pdf gz(z):

\[ \begin{align} \mu_j \, &=\, 0\,, \quad \sigma_j \,=\, 1 \,, \quad Z_j \, \sim\, \mathcal{N}_1 \left( 0,\,1 \right) \,, \\[10pt] g_{z, j} \left( z_j \right) \: &= \: {1 \over \sqrt{2\, \pi \,} } \, \operatorname{exp} \left( \,-\, {1\over 2} \, z_j^2 \right) \,. \end{align} \tag{2} \]
\[ \pmb{Z} \,=\, \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}\,, \,\, \Rightarrow g_z \,=\, g_z(\pmb{z}), \quad \pmb{z} \,=\, (z_1, \, z_2)^T \,=\, \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\,. \tag{3} \]

Due to the independence (!) of the distributions, the value of the combined probability density function for a vector z = (z1, z2)T is given by:

\[ g_{\small Z}(\pmb{z}, \pmb{0}, \pmb{\operatorname{I}} ) \, = \, {1 \over (2\, \pi) } \, {\large e}^{ - \, {\Large 1 \over \Large 2} \left( {\Large \pmb{z}^T} \, \bullet \, {\Large \pmb{z}} \right) } \, = \, {1 \over (2\, \pi) } \, {\large e}^{ - \, {\Large 1 \over \Large 2} \left( {\Large \pmb{z}^T} \,\bullet\, {\Large \operatorname{\pmb{I}}} \,\bullet\, {\Large \pmb{z}} \right) } \,. \tag{4} \]

For the purpose of a later generalization, I have introduced a coupling matrix in the exponent. For our random vector Z, it is just the identity matrix I.
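Eq. (4) can be verified numerically: for independent Z1, Z2 the joint density is simply the product of the two marginal densities of eq. (2), and this product coincides with the vector form with the identity matrix as coupling matrix. A minimal sketch (NumPy only; the test point z is an arbitrary choice):

```python
import numpy as np

def g_z(z):
    """Joint density of two independent standard normals, vector form of eq. (4)."""
    return np.exp(-0.5 * z @ np.eye(2) @ z) / (2.0 * np.pi)

def std_normal_pdf(u):
    """Standardized 1-dimensional Gaussian density, eq. (2)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

z = np.array([0.3, -1.2])   # arbitrary test point
assert np.isclose(g_z(z), std_normal_pdf(z[0]) * std_normal_pdf(z[1]))
```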

Reminder 2: Probability density function of a BVD

In another post we have already seen that we can write the probability density function g2(x, y) of a BVD in a characteristic vector notation for two correlated (Gaussian) random variables X and Y:

\[ \pmb{V} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad \mbox{concrete values}: \, \pmb{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \pmb{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \pmb{v}_{\mu} = \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix} \,. \tag{5} \]
\[ \mbox{centered CCS :} \quad g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( - {1\over2} \, {\pmb{v}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, \right) \,, \tag{6} \]
\[ \mbox{general CCS :} \quad g_2(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( - {1\over2} \, {\pmb{v}_{\mu}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}_{\mu}} \, \right) \,. \tag{7} \]

g2c(x, y) is the probability density in the centered CCS, in which we have μ = 0. The coupling matrix in this case is the inverse of the so-called variance-covariance matrix (or just covariance matrix), describing the correlation of X and Y:

\[ \pmb{\Sigma}^{-1} \,=\, {1 \over \sigma_x^2\, \sigma_y^2\, \left( 1\,-\, \rho^2\right) } \, \begin{pmatrix} \sigma_y^2 &-\rho\, \sigma_x\sigma_y \\ -\rho\, \sigma_x\sigma_y & \sigma_x^2 \end{pmatrix} \,. \tag{8} \]
\[ \pmb{\Sigma} \,=\, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}, \quad \pmb{\Sigma} \bullet \pmb{\Sigma}^{-1} = \operatorname{\pmb{I}} \,. \tag{9} \]

Note that this matrix is symmetric. In another post I have already shown that ρ indeed is the Pearson correlation coefficient of X and Y.
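The closed form (8) of the inverse can be checked against a numerical matrix inversion. A small sketch, assuming NumPy and purely illustrative values for σx, σy and ρ:

```python
import numpy as np

# illustrative parameters (arbitrary choices)
sx, sy, rho = 1.3, 0.8, 0.6

# covariance matrix, eq. (9)
Sigma = np.array([[sx**2,         rho * sx * sy],
                  [rho * sx * sy, sy**2        ]])

# closed-form inverse, eq. (8)
Sigma_inv_closed = np.array([[ sy**2,         -rho * sx * sy],
                             [-rho * sx * sy,  sx**2        ]]) / (sx**2 * sy**2 * (1.0 - rho**2))

assert np.allclose(Sigma_inv_closed, np.linalg.inv(Sigma))
assert np.allclose(Sigma @ Sigma_inv_closed, np.eye(2))
```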

The notation often used to describe a two-dimensional normal distribution is

\[ \pmb{V} \,\sim\, \mathcal{N}_2 \left[ \pmb{\mu},\, \pmb{\Sigma} \right] \,=\, \mathcal{N}_2 \left[ \begin{pmatrix} \mu_x \\ \mu_y\end{pmatrix}, \, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \right] \,. \tag{11} \]

Below I just regard centered distributions with zero expectation values.

From two independent standardized Gaussian distributions to a BVD

We work in a centered CCS. We assume that we have a random vector of two independent standardized Gaussian distributions Z1 and Z2, each with the standardized probability density defined above. We then apply a linear transformation via a 2×2-matrix M to get another centered random vector Vm :

\[ \begin{align} \pmb{Z} \, &=\, \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} , \quad \pmb{V}_m \, =\, \begin{pmatrix} X_m \\ Y_m \end{pmatrix}, \\[10pt] \pmb{V}_m \, &= \, \operatorname{\pmb{M}} \bullet \, \pmb{Z}, \, \quad \pmb{Z} \,=\, \operatorname{\pmb{M}}^{-1} \bullet \pmb{V}_m \,, \\[10pt] {\partial \,\pmb{V}_m \over \partial \, \pmb{Z} }\,&=\, \operatorname{\pmb{M}} \,, \quad {\partial \,\pmb{Z} \over \partial \, \pmb{V}_m }\,=\, \operatorname{\pmb{M}}^{-1} \,, \quad \left| \operatorname{\pmb{M}} \right| \, \ne \, 0 \,. \end{align} \tag{12}\]

We assume that M is invertible (to avoid degenerate cases). The bullet above indicates the standard matrix multiplication.
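The transformation itself is easy to mimic numerically. The sketch below (NumPy; the matrix M is an arbitrary invertible choice, not prescribed by this post) draws samples of Z, applies Vm = M Z and recovers Z via the inverse:

```python
import numpy as np

rng = np.random.default_rng(42)

# arbitrary invertible 2x2 transformation matrix (illustrative choice)
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])
assert abs(np.linalg.det(M)) > 1e-12        # |M| != 0, i.e. M is invertible

# samples of Z = (Z1, Z2)^T with independent N(0,1) components
Z = rng.standard_normal(size=(2, 100_000))

# linear transformation V_m = M • Z and recovery Z = M^{-1} • V_m
Vm = M @ Z
assert np.allclose(np.linalg.solve(M, Vm), Z)
```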

Regarding the probability density gv(vm) for a concrete vector vm = (x, y)T of the transformed random vector Vm, we must take into account the change of volume elements under this transformation. I.e., we have to take into account the Jacobian determinant of the transformation:

\[ g_v ( \pmb{v}_m ) \, dx\, dy \,=\, g_z \left( \pmb{z} \right) \, d z_1 \, dz_2 \,=\, g_z \left( \operatorname{\pmb{M}}^{-1} \, \pmb{v}_m \right) \, \left| \operatorname{\pmb{M}}^{-1} \right| \, dx \, dy \,. \tag{13} \]

Hence

\[ \begin{align} g_v ( \pmb{v}_m ) \, &=\, {1 \over \left| \operatorname{\pmb{M}} \right| } \, g_z \left( \operatorname{\pmb{M}}^{-1} \, \pmb{v}_m \right) \\[10pt] &=\, {1 \over 2\, \pi \, \left| \operatorname{\pmb{M}} \right| } \, \operatorname{exp} \left( \,-\, {1 \over 2} \, \left(\, \left[ \operatorname{\pmb{M}}^{-1} \pmb{v}_m\right]^T \bullet \operatorname{\pmb{M}}^{-1} \pmb{v}_m \, \right) \,\right) \\[10pt] &=\, {1 \over 2\, \pi \, \left| \operatorname{\pmb{M}} \right| } \, \operatorname{exp} \left( \,-\, {1 \over 2} \, \left( \, \pmb{v}_m^T \bullet \left[\, \left[ \operatorname{\pmb{M}}^{-1}\right]^T \bullet \operatorname{\pmb{M}}^{-1} \, \right] \bullet \pmb{v}_m \right)\, \right) \,. \end{align} \tag{14} \]

With the help of our (2×2)-matrix M we define a new symmetric matrix Σm :

\[ \begin{align} \operatorname{ \pmb{\Sigma}}_m \:&=\: \operatorname{\pmb{M}} \bullet \, \operatorname{\pmb{M}}^T \,, \\[10pt] \operatorname{ \pmb{\Sigma}}_m^{-1} \:&=\: \left[\operatorname{\pmb{M}}^{-1}\right]^T \bullet \operatorname{\pmb{M}}^{-1} \,. \\[10pt] \end{align} \tag{15} \]

Due to M being invertible, Σm is invertible, too. Note that both Σm and (Σm)-1 are symmetric. Furthermore:

\[ \left| \operatorname{\pmb{\Sigma}}_m \right| \:=\: \left| \operatorname{\pmb{M}} \right| * \left| \operatorname{\pmb{M}}^T \right| \:=\: \left( \left| \operatorname{\pmb{M}} \right| \right)^2 \,. \tag{16} \]

This leads to a very compact form of the probability density function of our transformed random vector:

\[ \begin{align} g_v ( \pmb{v}_m ) \, &=\, {1 \over 2\, \pi \, \left| \operatorname{\pmb{\Sigma}}_m \right|^{1/2} } \, \operatorname{exp} \left[ \,-\, {1 \over 2} \, \left( \, \pmb{v}_m^T \bullet \operatorname{\pmb{\Sigma}}_m^{-1} \bullet \pmb{v}_m \right)\, \right] \,. \end{align} \tag{17} \]
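Eq. (17) can be compared with SciPy's bivariate normal density for the covariance Σm = M MT. A short sketch, reusing the illustrative matrix M from above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# same arbitrary invertible M as in the sampling sketch above
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])
Sigma_m = M @ M.T                           # eq. (15)

def g_v(v):
    """Density of the transformed random vector, eq. (17)."""
    return np.exp(-0.5 * v @ np.linalg.inv(Sigma_m) @ v) / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma_m)))

v = np.array([0.5, -0.4])                   # arbitrary test point
assert np.isclose(g_v(v), multivariate_normal(mean=np.zeros(2), cov=Sigma_m).pdf(v))
```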

By comparison with the results in the above section “Reminder 2”, we see that we have already reached our desired form and that

  • we should identify Vm with V,
  • we should identify gv(vm) with g2c(x, y),
  • we should identify Σm with the covariance matrix Σ .

With M being invertible, one can show that Σm indeed is a positive-definite symmetric matrix. What is still open is to prove that Σm really represents the covariance matrix of the transformed random vector.

Variance and the respective covariance matrix

Let us determine the covariance matrix of our transformed random vector Vm. We take a formal route based on the general definitions. Two aspects are important:

(1) An expectation value of a random vector is defined via the expectation values of its components. Thus, the expectation vector of a random vector S is just the vector composed of the expectation values of its (marginal) component distributions:

\[ \pmb{\mu}\left(\pmb{S}\right) \: = \: \operatorname{\mathbb{E}}\left(\pmb{S} \right) \, = \, \left( \operatorname{\mathbb{E}}(S_1), \, \operatorname{\mathbb{E}}(S_2), \, \ldots, \, \operatorname{\mathbb{E}}(S_n) \right)^T \, . \tag{18} \]

(2) The covariance of a random vector is a generalization of the standard covariance for two 1-dim distributions X and Y:

\[ \begin{align} \operatorname{cov}(X, Y) &= \operatorname{\mathbb{E}}\left[\,\left(\,X \,-\, \operatorname{\mathbb{E}}\left(X\right)\,\right) \, \left(\,Y \,-\, \operatorname{\mathbb{E}}\left(Y\right)\, \right)\,\right] \\ &= \operatorname{\mathbb{E}}\left(X Y\right) \,-\, \operatorname{\mathbb{E}}\left(X\right) \operatorname{\mathbb{E}}\left(Y\right) \,. \end{align} \tag{19} \]

The generalization almost naturally leads to a matrix of expectation values for all combinations of the random vector components:

\[ \begin{align} \operatorname{cov}\left(\pmb{S}\right) \: &:= \: \operatorname{\mathbb{E}}\left[\left(\pmb{S} \,-\, \operatorname{\mathbb{E}}(\pmb{S}) \right)\, \left(\pmb{S} \,-\, \operatorname{\mathbb{E}}(\pmb{S}) \right)^T \right] \,. \end{align} \tag{20} \]

Note the order of transposition in the definition of cov: a column vector is multiplied by a transposed (row) vector. The rules of matrix multiplication then give you a matrix as the result. It contains all combinations of the components.

The expectation value has to be determined for every element of the matrix. Thus, the interpretation of the notation above for the 2-dimensional case is: (a) Pick all pairwise combinations (Sj, Sk) of the component distributions. (b) Calculate the covariance of the pair, cov(Sj, Sk), and put it at the (j,k)-place inside the matrix. See this post for more details. Note that the covariance of a distribution with itself is identical to the variance of the distribution: cov(S1, S1) = var(S1).

For a 2-dimensional case we get

\[ \begin{align} \operatorname{cov}\left(\pmb{S}\right)\:&=\quad {\begin{pmatrix} \operatorname{var} (S_{1})&\operatorname{cov} (S_{1},S_{2}) \\ \operatorname{cov} (S_{2},S_{1}) &\operatorname {var} (S_{2}) \end{pmatrix}} \,. \end{align} \tag{21} \]

The above matrix is the (variance-) covariance matrix of a 2-dim random vector S, which we also abbreviate as ΣS. For a general square transformation matrix M one can show that

\[ \operatorname{cov} \left( \operatorname{\pmb{M}} \bullet \, \pmb{S} \right) \:=\: \operatorname{\pmb{M}} \bullet \operatorname{cov}\left( \pmb{S} \right) \bullet \operatorname{\pmb{M}}^T \,. \tag{22} \]

From the definition of the covariance it is easy to derive the following relations for our centered, standardized special random vector Z (with independent component distributions):

\[ \operatorname{\mathbb{E}}\left(\pmb{Z} \right) \, = \, \pmb{0}, \quad \operatorname{cov} \left( \operatorname{\pmb{Z}} \right) \:=\: \operatorname{\mathbb{E}} \left( \operatorname{\pmb{Z}} \operatorname{\pmb{Z}}^T \right) \,=\, \operatorname{\pmb{I}} \,. \tag{23} \]
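Relations (22) and (23) suggest a simple Monte Carlo check: the sample covariance of Z should be close to I, and the sample covariance of Vm = M Z should be close to M MT. A sketch (NumPy; M is again the illustrative matrix used above):

```python
import numpy as np

rng = np.random.default_rng(0)

# arbitrary invertible M (same illustrative choice as above)
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])

Z = rng.standard_normal(size=(2, 500_000))   # independent standard normal components
Vm = M @ Z                                   # transformed samples

# sample estimates of eq. (23) and of cov(M • Z) = M • I • M^T
assert np.allclose(np.cov(Z), np.eye(2), atol=0.02)
assert np.allclose(np.cov(Vm), M @ M.T, atol=0.02)
```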

We use this to determine the covariance of our transformed random vector Vm:

\[ \begin{align} \operatorname{cov}\left( \pmb{V}_m \right) \: &= \: \operatorname{\mathbb{E}} \left[ \left( \pmb{V}_m \,-\, \operatorname{\mathbb{E}}\left[\pmb{V}_m \right] \right) \, \left( \pmb{V}_m \,-\, \operatorname{\mathbb{E}} \left[ \pmb{V}_m \right] \right)^T \, \right] \\[10pt] &= \: \operatorname{\mathbb{E}} \left[ \, \left( \operatorname{\pmb{M}} \operatorname{\pmb{Z}} \right) \, \left( \operatorname{\pmb{M}} \operatorname{\pmb{Z}} \right)^T \, \right] \\[10pt] &= \: \operatorname{\pmb{M}} \, \operatorname{\mathbb{E}} \left[ \, \operatorname{\pmb{Z}} \, \operatorname{\pmb{Z}}^T \, \right] \, \operatorname{\pmb{M}}^T \\[10pt] &= \: \operatorname{\pmb{M}} \, \operatorname{\pmb{I}} \, \operatorname{\pmb{M}}^T \:=\: \operatorname{\pmb{\Sigma}}_m \,. \end{align} \tag{24} \]

So, Σm really is the (symmetric) covariance matrix of Vm. With

\[\sigma_x^2 = \operatorname{var} \left(X\right) \,, \quad \sigma_y^2 = \operatorname{var} \left(Y\right) \tag{25} \]

and by expressing the covariance via a coefficient ρ,

\[ \operatorname{cov}\left(X,\, Y\right) \,=\, \rho * \sigma_x \, \sigma_y \tag{26} \]

this means

\[ \begin{align} \operatorname{cov}\left(\pmb{V}_m\right)\:&=\quad {\begin{pmatrix} \sigma_x^2 & \rho * \sigma_x \, \sigma_y \\ \rho * \sigma_x \, \sigma_y &\sigma_y^2 \end{pmatrix}} \,=\, \operatorname{\pmb{\Sigma}}_m \,. \end{align} \tag{27} \]

The proof that ρ is indeed the Pearson correlation coefficient of X and Y has already been given in another post by explicit integration of gv(vm).
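Conversely, given a concrete Σm, the quantities σx, σy and ρ can be read off directly via eqs. (25) and (26). A small sketch with the illustrative Σm = M MT from above:

```python
import numpy as np

# arbitrary invertible M (same illustrative choice as above)
M = np.array([[1.0, 0.0],
              [0.8, 0.6]])
Sigma_m = M @ M.T

sigma_x = np.sqrt(Sigma_m[0, 0])                  # eq. (25): sigma_x^2 = var(X)
sigma_y = np.sqrt(Sigma_m[1, 1])                  # eq. (25): sigma_y^2 = var(Y)
rho     = Sigma_m[0, 1] / (sigma_x * sigma_y)     # eq. (26): cov(X,Y) = rho * sigma_x * sigma_y

print(sigma_x, sigma_y, rho)   # approximately 1.0, 1.0, 0.8 for this particular M
```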

Conclusion

In this post we have shown that a general centered Bivariate Normal Distribution can be regarded as the result of a linear transformation of a random vector Z of two independent Gaussian distributions Z1 and Z2. We have confirmed the general form of the probability density function with the exponent written in vector form. The central matrix appearing in the exponent is the inverse of the covariance matrix of the transformed random vector (X, Y)T = MZ.

Unfortunately, we do not yet have an explicit rule, based on the correlation coefficient ρ, for constructing correlated X and Y distributions from Z1 and Z2. We will solve this problem in a forthcoming post on BVDs in this blog. Such a construction rule will later allow for an explicit parameterization of the contour lines of a BVD.

Stay tuned ….