Covariance matrix of a cut-off Multivariate Normal Distribution – II – integrals over volume and surface of an n-dimensional sphere

In the 1st post of this series, we posed the following problem: Take the probability density of a Multivariate Normal Distribution [MVN], but set it to zero at Mahalanobis distances bigger than a finite distance D. Consider the volume region enclosed by the contour surface of constant probability density at the Mahalanobis distance dm=D. We have called such a finite ellipsoidal volume around the MVN’s center in n-dimensional space a “core” of the MVN. Can we derive the covariance matrix of the original MVN by using only information about data points inside such a central, limited ellipsoidal core?

I have started to discuss cut-off variants of MVN distributions because real world data samples may exhibit a MVN-like probability density only inside a central inner core of a broader distribution which, overall, violates MVN-criteria. Actually, the distribution outside an identified core may deviate strongly from a MVN. For practical purposes and numerical analysis, we would nevertheless like to estimate the MVN’s covariance matrix CovY from the distribution of data points confined to the space of the ellipsoidal core at the distribution’s center.

In this post, we look a bit closer at the mathematics and in particular the integrals characterizing our problem. We use definitions and conventions as described in the first post.

We will show that the covariance matrix CovY of the infinitely extended MVN differs from the covariance matrix CovC of the core by just a factor – common to all matrix elements. In addition, we bring the required covariance integrals into a manageable form and indicate why and how we can use spherical polar coordinates to determine the covariance matrix of the n-dim ellipsoidal core.

Reminder: Covariance of a multidimensional random vector

In [1] I have discussed a basic formula, which defines the elements of the variance-covariance matrix [“cov-matrix”] of a multivariate random vector S

\[ \pmb{S} \: = \: \left( \, S_1, \, S_2, \, ….\, S_n\, \right)^T \,. \tag{1} \]
\[ \begin{align} \pmb{\mu}\left(\pmb{S}\right) \,=\, \pmb{\mu}_S \: &= \: \operatorname{\mathbb{E}}\left(\pmb{S} \right) \, = \, \left( \operatorname{\mathbb{E}}(S_1), \, \operatorname{\mathbb{E}}(S_2), …, \, \operatorname{\mathbb{E}}(S_n) \right)^T \, , \tag{2} \\[8pt] \text{centered : } \quad \pmb{\mu}_S \,&=\, 0 \,. \tag{3} \end{align} \]

For the time being, we work in a Cartesian Coordinate System [CCS] to describe concrete elements of S, i.e. vectors s, and the distribution of related data points in the ℝn. For reasons of simplicity, let us consider a distribution whose center coincides with the origin of the CCS.

Note that the one-dimensional components of the random vector may be correlated with each other. The basic definition formula for the covariance of two random variables X and Y is:

\[ \begin{align} \operatorname{Cov}(X, Y) &= \operatorname{\mathbb{E}}\left[\,\left(\,X \,-\, \operatorname{\mathbb{E}}\left(X\right)\,\right) \, \left(\,Y \,-\, \operatorname{\mathbb{E}}\left(Y\right)\, \right)\,\right] \\ &= \operatorname{\mathbb{E}}\left(X Y\right) \,-\, \operatorname{\mathbb{E}}\left(X\right) \operatorname{\mathbb{E}}\left(Y\right) \,. \tag{4} \end{align} \]

The multivariate extension to the components of a random vector S is given by the following term:

\[ \begin{align} \operatorname{Cov}\left(\pmb{S}\right) \: &:= \: \operatorname{\mathbb{E}}\left[\left(\pmb{S} – \operatorname{\mathbb{E}}(\pmb{S}) \right)\, \left(\pmb{S} – \operatorname{\mathbb{E}}(\pmb{S}) \right)^T \right] \, . \tag{5} \end{align} \]
\[ \begin{align} \text{centered : } \quad \operatorname{Cov}\left(\pmb{S}\right) \: &:= \: \operatorname{\mathbb{E}} \left[ \pmb{S} \circ \pmb{S}^T \right] \, . \tag{6} \end{align} \]

The circle symbol in (6) symbolizes the matrix multiplication of Linear Algebra. Note that multiplying a vertical column vector with a horizontal row vector in equation (6) actually defines a matrix – an outer product:

\[ \begin{align} \pmb{\Sigma}_S \,=\, \operatorname{Cov}\left(\pmb{S}\right)\:&=\:\operatorname{\mathbb{E}} {\begin{pmatrix} (S_{1}-\mu_{1})^{2}&(S_{1}-\mu_{1})(S_{2}-\mu_{2})&\cdots &(S_{1}-\mu_{1})(S_{n}-\mu_{n})\\(S_{2}-\mu_{2})(S_{1}-\mu_{1})&(S_{2}-\mu_{2})^{2}&\cdots &(S_{2}-\mu_{2})(S_{n}-\mu_{n})\\\vdots &\vdots &\ddots &\vdots \\(S_{n}-\mu_{n})(S_{1}-\mu_{1})&(S_{n}-\mu_{n})(S_{2}-\mu_{2})&\cdots &(S_{n}-\mu_{n})^{2} \end{pmatrix}} \tag{7} \\[16pt] &\rightarrow \quad \pmb{\Sigma}_S \,\, =\,\, {\begin{pmatrix} \operatorname{Var} (S_{1})&\operatorname{Cov} (S_{1},S_{2})&\cdots &\operatorname{Cov} (S_{1},S_{n})\\ \operatorname{Cov} (S_{2},S_{1})&\operatorname{Var} (S_{2})&\cdots &\operatorname{Cov} (S_{2},S_{n})\\\vdots &\vdots &\ddots &\vdots \\ \operatorname{Cov} (S_{n},S_{1})&\operatorname{Cov} (S_{n},S_{2})&\cdots &\operatorname {Var} (S_{n}) \end{pmatrix}} \,\,. \tag{8} \end{align} \]

On the level of the matrix elements the multidimensional case is reduced to the evaluation of the usual variance/covariance between two components of S.
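As a quick numerical cross-check of (7)/(8), here is a minimal NumPy sketch. The sample size and the mixing matrix M are arbitrary choices of mine; the point is only that the element-wise definition reproduces the standard covariance estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a sample of 3-dim random vectors S with correlated components
# (M is an arbitrary illustrative mixing matrix)
M = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.3, 0.2, 0.5]])
S = rng.standard_normal((100_000, 3)) @ M.T

# Element-wise definition (7)/(8): Sigma_ij = E[(S_i - mu_i)(S_j - mu_j)]
mu = S.mean(axis=0)
centered = S - mu
Sigma_est = centered.T @ centered / len(S)

# Matches NumPy's built-in estimator (bias=True selects the 1/N convention)
assert np.allclose(Sigma_est, np.cov(S.T, bias=True))
```

The estimate also approaches the true covariance M MT of the construction as the sample grows.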

Reminder: MVN

In [2], we have defined a MVN of a random vector YN – as the result of a linear transformation of a standardized random vector Z. The components Zi of Z are continuous, independent and standardized 1-dim Gaussians. Z is a special form of an n-dim MVN. We denote these distributions as

\[ \begin{align} \text{general MVN : } \,\,\,\, \pmb{Y}_N \,&\sim\, \pmb{\mathcal{N}}_n \left(\pmb{\mu}_{\small Y},\, \pmb{\Sigma}_{\small Y} \right) \,, \tag{9} \\[10pt] \text{standardized, centered MVN : } \,\,\,\, \pmb{Z} \, &\sim \pmb{\mathcal{N}}_n \left( \pmb{0}, \, \pmb{\operatorname{I}} \right), \,\,\, {\small with} \,\,\, Z_j \,\sim\, \mathcal{N}_1 \left( 0, \,1 \right) \,. \tag{10} \end{align} \]

Z has a very simple probability density function

\[ g_{\small Z}(\pmb{z}) \,=\, {1 \over (2\pi)^{n/2} } \, e^{ – \, {\Large 1 \over \Large 2} \, \left[ \left( {\Large \pmb{z} } \right)^{\Large T} \, \left( {\Large \pmb{z} } \right) \right] } \,=\, {1 \over (2\pi)^{n/2} } \, e^{ – \, {\Large 1 \over \Large 2} \, { \Large |\pmb{z}|}^2 } \,. \tag{10.1} \]

We create a random vector YN of a MVN via a linear transformation of Z

\[ \pmb{Y_N} \,=\, \left(Y_1, \, Y_2, \, ….., \, Y_n \right)^T \: = \: \pmb{\operatorname{M}}_{\small Y} \, {\small \bullet} \, \pmb{Z} + \pmb{\mu}_y \,. \tag{11} \]

To avoid infinitesimally thin degenerations, we require MY to be invertible in our context. I.e.:

\[ \pmb{\operatorname{ M }}_{\small Y} \, {\small \bullet} \, \pmb{\operatorname{ M }}_{\small Y}^{-1} \,=\, \pmb{\operatorname{ I }} \,. \tag{12} \]

The transformed probability density gY(y) at the end-point of a concrete vector y of the continuous non-degenerate distribution YN is

\[ g_{\small Y}(\pmb{y}) \,=\, {1 \over (2\pi)^{n/2} \, \left(\operatorname{det}\pmb{\operatorname{\Sigma}}_{\small Y}\right)^{1/2} } \, {\Large e}^{ – \, {\Large 1 \over \Large 2} \, \left[ \left( {\Large \pmb{y} \,-\, \pmb{\mu}_y } \right)^{\Large T} \, { \Large \pmb{\operatorname{\Sigma}}}_Y^{\Large -1} \, \left( {\Large \pmb{y} \,-\, \pmb{\mu}_y } \right) \right] } \,, \tag{13} \]
\[ \pmb{y} \,=\, \left(\,y_1,\, y_2, \, …\, y_n \right)^T \,. \tag{14} \]

Below, we will work with centered distributions, only. I.e., we assume

\[ \pmb{\mu}_y \,=\, 0 \,. \]

ΣY is an invertible, symmetric and positive-definite matrix. Actually, it is the covariance matrix CovY of YN :

\[ \begin{align} \pmb{\operatorname{\Sigma}}_{\small Y} \, &= \, \pmb{\operatorname{M}}_{\small Y}\, \pmb{\operatorname{M}}_{\small Y}^T \,=\, \pmb{\operatorname{Cov_Y}} \,, \tag{15} \\[10pt] \pmb{\operatorname{\Sigma}}_{\small Y}^{-1} \,&=\, \left(\pmb{\operatorname{M}}_{\small Y}^T\right)^{-1} \pmb{\operatorname{M}}_{\small Y}^{-1} \,=\, \left(\pmb{\operatorname{M}}_{\small Y}^{-1}\right)^T \pmb{\operatorname{M}}_{\small Y}^{-1} \,. \tag{16} \end{align} \, . \]

For a given ΣY we can get a valid MY by a Cholesky decomposition of ΣY. We also have

\[\operatorname{det} \left(\pmb{\operatorname{M}}_Y\right) \,= \, \left|\pmb{\operatorname{M}}_Y\right| \,=\, \operatorname{det} \left(\pmb{\operatorname{M}}_Y^T\right) \,= \, \sqrt{ \operatorname{det} \left(\pmb{\Sigma}_Y\right) \,} \, \gt 0\,. \tag{17} \]
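Eqs. (15) to (17) are easy to verify numerically. The following sketch (with example values for ΣY picked by me) uses NumPy’s Cholesky routine to produce a valid MY:

```python
import numpy as np

# A symmetric, positive-definite covariance matrix Sigma_Y (illustrative values)
Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])

# Cholesky decomposition yields a valid M_Y with M_Y M_Y^T = Sigma_Y, eq. (15)
M_Y = np.linalg.cholesky(Sigma_Y)
assert np.allclose(M_Y @ M_Y.T, Sigma_Y)

# det(M_Y) = sqrt(det(Sigma_Y)) > 0, eq. (17)
assert np.isclose(np.linalg.det(M_Y), np.sqrt(np.linalg.det(Sigma_Y)))
```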

Due to the fact that we can also perform an eigendecomposition of the invertible MY, the linear transformation of Z can be thought of as being composed of a scaling of the vectors (and their components), followed by a rotation. The Mahalanobis distance of a vector y is defined by

\[ d_m(\pmb{y}) \,=\, \sqrt{ \, \pmb{y}^T\, \pmb{\Sigma}_Y^{-1} \, \pmb{y} \, } \,. \tag{18} \]

Vectors y fulfilling

\[ {\small{\text{Surface of n-dim ellipsoid : }}} \quad (d_m(\pmb{y}))^2 \,=\, D^2 \,=\, \pmb{y}^T\, \pmb{\Sigma}_Y^{-1} \, \pmb{y} \,=\, const. \tag{19} \]

reach out from the CCS-origin and end on the (n-1)-dim surface of an n-dim ellipsoid. (Reason: (19) defines a quadratic form with a positive-definite matrix!)

Important Note: When I speak of an ellipsoid, I usually refer to the voluminous body enclosed in an ellipsoidal surface and not the surface, itself. When a clearer distinction gets necessary, I will say this explicitly in the text.
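A small numerical sketch of (18) and (19); the concrete values of ΣY, y and D below are just illustrative:

```python
import numpy as np

Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])

def mahalanobis(y, Sigma):
    """d_m(y) = sqrt(y^T Sigma^{-1} y), eq. (18); solve() avoids an explicit inverse."""
    return np.sqrt(y @ np.linalg.solve(Sigma, y))

# Vectors with d_m(y) = D end on the surface of an n-dim ellipsoid, eq. (19):
# rescaling any y by D / d_m(y) puts its end-point exactly on that surface.
y = np.array([1.0, -2.0])
D = 3.0
y_surf = y * (D / mahalanobis(y, Sigma_Y))
assert np.isclose(mahalanobis(y_surf, Sigma_Y), D)
```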

Random vector Y is just the random vector Z – but seen from a different CCS

At this point I want to give you a different perspective on the linear transformation of eq. (11). Remember: there is always a symmetry between an active and a passive vector rotation. Let us call the CCS in which we describe the random vector Z more specifically “CCSZ“. The scaling operation and forward rotation mediated by MY is equivalent to a description of z-vectors in a backward rotated CCSY with scaled coordinate axes. In this rotated CCSY the vectors get yi-coordinates. Therefore, the vector YN can just be understood as the vector Z described in a different coordinate system CCSY.

Normally, we evaluate the covariance matrix CovY according to (8) in a selected coordinate system CCSY. But: With the help of a proper coordinate transformation we could also work in the original CCSZ. I will use this fact later on.

Re-evaluation of the covariance matrix via integrals

According to the general definitions (5) to (8), we can determine the covariance matrix ΣY of YN via the evaluation of a pretty complex integral in n dimensions. To calculate the expectation value (8) for a centered YN we use YN‘s probability density (13) – and get

\[ \operatorname{\mathbb{E}} \left(\, \pmb{Y}_N\, \pmb{Y}^T_N\,\right) \,=\, C_n \,*\, \int \int … \int \, \pmb{Y}_N\, \pmb{Y}^T_N \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Y}_N^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{Y}_N \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,. \tag{20} \]

Cn covers the constants in the expression for the probability density gY(y); see (13):

\[ C_n \,=\, {1 \over (2\pi)^{n/2} \, \left(\operatorname{det}\pmb{\operatorname{\Sigma}}_{\small Y}\right)^{1/2} } \,. \tag{21} \]

The notation with random vectors in (20) may appear a bit strange at first sight. But remember: This is only an abbreviation. The integration has to be performed over the continuous probability distribution of concrete position vectors y covering the whole ℝn. And once again note: The term YN YNT actually defines a (n×n)-matrix and its elements – and not a scalar product.

If our claim (15) that ΣY actually is the covariance matrix of YN holds, we should be able to properly resolve its self-reference in

\[ \pmb{\Sigma}_Y \,=\, \text{INT} \,=\, C_n \,*\, \int \int … \int \, \pmb{Y}_N\, \pmb{Y}^T_N \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Y}_N^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{Y}_N \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,, \tag{22} \]

without any contradictions. As a MVN stretches out to infinity, the integral has to be evaluated over the whole of the ℝn to get YN ‘s covariance matrix ΣY.

To perform the integration efficiently, we could make use of the fact that the exponent in the probability density is constant on ellipsoidal surfaces. We could, therefore, be tempted to transform everything to ellipsoidal coordinates. However, as we will see, we can actually avoid ellipsoidal coordinates by using a back-transformation of YN to the original Z-vector defined in (11).

Integrals of Z-dependent functions

Regarding the integral (22) we also want to switch to zi -variables instead of yi -variables. The reasons are

  • that Z-components can easily be expressed in spherical coordinates and
  • that the unavoidable exponential terms get substantially simpler.

For a given symmetric positive-definite ΣY, we can use a Cholesky decomposition to find a valid (invertible) MY. This allows us to replace YN by Z (composed of independent Gaussians):

\[ \begin{align} \pmb{Y}_N^T \,\pmb{\Sigma}_Y^{-1} \, \pmb{Y}_N \,&=\, \left[\pmb{\operatorname{M}}_Y\, \pmb{Z} \right]^T \, \left[ \pmb{\operatorname{M}}_Y \, \pmb{\operatorname{\operatorname{M}}}_Y^T \right]^{-1} \, \left[\pmb{\operatorname{M}}_Y\, \pmb{Z} \right] \\[10pt] &= \, \pmb{Z}^T \,\, \pmb{\operatorname{M}}_Y^T \left[ \pmb{\operatorname{M}}_Y^T \right]^{-1} \,\, \left[\pmb{\operatorname{M}}_Y\right]^{-1} \pmb{\operatorname{M}}_Y\,\, \pmb{Z} \\[10pt] &=\, \pmb{Z}^T \pmb{Z} \,. \tag{23} \end{align} \]

What does this switch from a description of our y-vectors to different coordinates, more precisely zi-coordinates, mean? Well, we now pick up our idea discussed two sections above:

By replacing the yi-coordinates by zi-coordinates (written as generic xi-coordinates in the formulas below), we perform a (reverse) coordinate transformation back to our original coordinate system CCSZ, from which we started with our Z. How do we have to compensate for this back-transformation in our integral? This is easy: Multidimensional calculus tells us that we have to take into account the Jacobi determinant of the original coordinate transformation. In the following sense:

\[ \iint .. \int \, f(y_1, y_2, …y_n) \, dy_1 \, dy_2 \, … \, dy_n \, =\, \\[10pt] \iint .. \int \, f\left(\, y_1(x_1, ..x_n), …, \, y_n(x_1, …x_n)\,\right) \, \left| \pmb{\operatorname{J_{y,x} }} \right| \, dx_1\,dx_2, … dx_n \,, \]

with

\[ \left| \pmb{\operatorname{J_{y,x}}} \right| \,= \, \left| \begin{pmatrix} {\partial y_1 \over \partial x_1} & {\partial y_1 \over \partial x_2} & … & {\partial y_1 \over \partial x_n} \\ …& … & … & … \\ {\partial y_n \over \partial x_1} & {\partial y_n \over \partial x_2} & … & {\partial y_n \over \partial x_n} \end{pmatrix} \right| \]

In our linear case this simply boils down to

\[ \left| \pmb{\operatorname{J_{y,x}}} \right| \,= \, \left|\pmb{\operatorname{M}}_Y\right| \, . \]

Our integral expressed in z-coordinates therefore becomes

\[ \begin{align} \text{INT} \,&=\, C_n \,*\, \int \int … \int \, \pmb{\operatorname{M}}_Y \pmb{Z}\, \left[ \pmb{\operatorname{M}}_Y \pmb{Z}\right]^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, \left| \pmb{\operatorname{M}}_Y \right| \, dz_1\,dz_2 \, ….\, dz_n \,, \\[10pt] &=\, C_n \,*\, \int \int … \int \, \pmb{\operatorname{M}}_Y \,\, \pmb{Z}\, \pmb{Z}^T \,\, \pmb{\operatorname{M}}_Y^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, \left| \pmb{\operatorname{M}}_Y \right| \, dz_1\,dz_2 \, ….\, dz_n \,. \tag{24} \end{align}\]

For an evaluation of the MVN’s cov-matrix, the integration has to be performed over the full ℝn, but now in a CCS in which the y-vectors are seen as the original z-vectors.

Note A: If we had finite integrals the integral limits would have to become subject of the inverse transformation, too.

Note B: The term MYZ [MYZ]T defines a matrix via the ZZT-term – it gives variance and covariance terms of Z‘s various marginal distributions. Also note the difference in the order of the factors compared to the term ZTZ appearing in the exponential, which is a scalar product and gives a scalar value.

The term

\[ {\small{\text{Coord. trafo of matrix} : }} \quad \pmb{\operatorname{M}}_Y \,\, \pmb{Z}\, \pmb{Z}^T \,\, \pmb{\operatorname{M}}_Y^T \]

just corresponds to a coordinate transformation of the matrix ZZT to CCSY. It is the scalar matrix elements which get affected by the integration over the z-space. A short consideration shows that we can move the linear operators out of the integral; in addition, (17) and (21) give Cn * |MY| = 1/(2π)n/2 – which eventually results in

\[ \text{INT} \,=\, \pmb{\operatorname{M}}_Y \left[ {1\over (2\,\pi)^{n/2} } \, \int \int … \int \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \right] \pmb{\operatorname{M}}_Y^T \,. \tag{25} \]

This is an interesting formula, because it restricts the integration to the rather simple Z-distribution of standardized independent 1-dim Gaussians.

Why is the integral evaluated over the ℝn equal to the covariance matrix?

Now, we need some additional tricky considerations to understand why equations (15) and (22) actually are correct:

  1. Because of (10.1), the probability density for data points residing on n-dim spherical shells is constant. The reason is that the exponent depends only on zTz = r2, i.e. on the squared length r = |z| of the respective position vectors z.
  2. The integrals for mixed combinations (zi zj) with i ≠ j over such shells will vanish – due to symmetries and terms appearing with opposite signs. This means that the off-diagonal elements of the matrix are zero.
  3. Therefore the central matrix in the rectangular brackets of (25) will always get diagonal by the integration over shells up to any radius r=R.
  4. In addition the exponential can be factorized and written as a product of exponentials, each for squares of coordinate values.
  5. For the remaining diagonal matrix terms, we, therefore, get a product of 1-dim integrals. When integrating over the full ℝn, the integral of zi2 exp(-1/2 zi2) from -∞ to +∞ gives (2π)1/2, and each of the other (n-1) integrals of exp(-1/2 zj2) contributes a factor (2π)1/2 as well. So, together they compensate for the factor 1/(2π)n/2 in front of the integral (25).
  6. Thus, the integration eventually gives us the n-dimensional diagonal identity matrix In in between the M-matrices – and therefore, indeed
\[\pmb{\Sigma}_Y \,=\, \pmb{M}_Y \, \left[ \pmb{I}_n \right] \, \pmb{M}_Y^T \,. \tag{26 } \]

Our result strengthens our trust in the term inside the square brackets of (25).
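The chain of arguments above can be probed by a simple Monte Carlo estimate of the bracketed integral in (25), i.e. of E[Z ZT]; dimension and sample size below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 200_000

# Z: N samples of n independent standardized Gaussians, eq. (10)
Z = rng.standard_normal((N, n))

# Monte Carlo estimate of the bracketed integral in (25): E[Z Z^T]
W = Z.T @ Z / N

# Off-diagonal elements vanish, diagonal elements are 1 -> identity matrix, eq. (26)
assert np.allclose(W, np.eye(n), atol=0.02)
```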

Remember: The off-diagonal matrix elements in (25) become zero – independent of whether we integrate to radial infinity or just to a finite radius r = |z| = R of the z-vectors.

Application in the context of our problem

What do we learn from the above integral for a modified MVN, which starts normally around the CCS-origin, but gets cut off at a certain Mahalanobis distance D? I.e., we look at a distribution gC(y) whose probability density is given by

\[ \begin{align*} d_m(\pmb{y}) \,\le\, D \,:\,\,\, g_{\small C}(\pmb{y}) \,&= \, g_{\small Y}(\pmb{y}) \,, \\[10pt] d_m(\pmb{y}) \,\gt\, D \,:\,\,\, g_{\small C}(\pmb{y}) \,&= \, 0 \,. \end{align*} \tag{27} \]

We have called the ellipsoidal volume inside of D a “core”. The core is limited by vectors having a Mahalanobis distance measure dm = D. An important feature of MVNs is the following:

  • The measure dm is not affected by the transformations MY or its inverse, i.e. dm(y) = dm(z).

We can see this clearly by applying the same steps as in (23) to a concrete vector y and by taking into account that the covariance matrix of Z is the identity matrix:

\[ \begin{align} d_m^2\left(\pmb{y}\right) \,& =\, \pmb{y}^T \, \pmb{\operatorname{\Sigma}}_Y^{-1} \, \pmb{y} \,=\, \left[\pmb{\operatorname{M}}_Y\, \pmb{z} \right]^T \, \, \left[ \pmb{\operatorname{M}}_Y \, \pmb{\operatorname{\operatorname{M}}}_Y^T \right]^{-1} \, \left[ \pmb{\operatorname{M}}_Y \pmb{z} \right] \\[10pt] &=\, \pmb{z}^T \, \pmb{\operatorname{I}_n} \, \pmb{z} \,=\, d_m^2\left(\pmb{z}\right) \,. \end{align} \tag{28} \]

This means that all y-vectors which have a Mahalanobis distance dm(y) ≤ D are mapped to z-vectors having a length |z| ≤ D:

\[ \left\{ \, \pmb{y} \,\,|\,\,\, d_m(\pmb{y}) \le D \, \right\} \quad \Rightarrow \quad \left\{ \, \pmb{z} \,\,|\,\, |\pmb{z}| \le D \, \right\} \,. \tag{29} \]

Note: The length r = |z| of a z-vector is defined by its Euclidean norm as

\[ r \,=\, |\pmb{z}| \,=\, \sqrt{ z_1^2 + z_2^2 + … + z_n^2} \,. \tag{30} \]
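The invariance (28) and the mapping (29) are easy to verify numerically; again, the ΣY values and the sample size are illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)

Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])
M_Y = np.linalg.cholesky(Sigma_Y)

# Map z-vectors to y-vectors via y = M_Y z, eq. (11) (centered case)
z = rng.standard_normal((1000, 2))
y = z @ M_Y.T

# d_m(y) = |z| for every vector, eq. (28)
d_m = np.sqrt(np.einsum('ij,ij->i', y, np.linalg.solve(Sigma_Y, y.T).T))
assert np.allclose(d_m, np.linalg.norm(z, axis=1))
```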

Integration simplified

We use the following symbols for an ellipsoidal volume limited by D, the volume of the corresponding multidimensional sphere and its surface area:

\[ \begin{align} {\small{\text{Volume of n-dim ellipsoid given by }}} d_m=D \,:\, & \quad V_{n,D}^{ell} \,, \\[10pt] {\small{\text{Volume of n-dim sphere given by }}} d_m=D \,:\, & \quad V_{n,D}^{sph} \,, \\[10pt] {\small{\text{Surface of n-dim sphere given by }}} d_m=D \,:\, & \quad A_{n,D}^{sph} \,, \\[10pt] {\small{\text{Surface of n-dim sphere given by }}} r=R \,:\, & \quad A_{n,R}^{sph} \,. \end{align} \]

Note: Some authors in the literature prefer to indicate the fact that the dimensionality of a spherical surface manifold in the ℝn actually is (n-1) by a respective subscript. I do not do this here. The subscript n always refers to the dimensionality of the embedding ℝn.

To get the covariance matrix of the n-dim ellipsoidal core, we just have to limit our integration in (22) to the n-dim volume of the ellipsoid. Due to the mapping (29), the ellipsoid (i.e. the core) is obviously mapped to a sphere when we shift to z-coordinates (in CCSZ):

\[ V_{n,D}^{ell} \quad \Rightarrow \quad V_{n,D}^{sph} \,. \tag{31} \]

Therefore, we get the following integral for the core’s cov-matrix CovC:

\[ \pmb{\operatorname{Cov_C}} \,=\, C_n \,*\, \underset{V_{n,D}^{ell}}{\int \int … \int} \, \pmb{Y}_N\, \pmb{Y}^T_N \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Y}_N^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{Y}_N \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,, \tag{32} \]

which due to (25) and (31) results in

\[ \pmb{\operatorname{Cov_C}} \,=\, \pmb{M}_Y \left[ {1\over (2\,\pi)^{n/2} } \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \right] \pmb{M}_Y^T \,. \tag{33} \]

Again: To avoid any misconceptions, please, note the following fact:

\[ \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \\ \ne \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z}^T \pmb{Z} \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \,. \]

The integrand of the second integral is related to the probability density of a chi-square distribution. But this is not what we are interested in here. However, the topic of chi-squared distributions will come up again in a later post of this series.

Another topic which we will ignore for the time being is the question of how we can normalize our cut-off distribution, which we have confined to the core and whose probability density we have set to zero outside the core. Normalization typically means that we should multiply the probability density by a certain constant factor to ensure that the integral over all space gets the value 1. We ignore such a factor for the cut-off distribution for the time being. But see eq. (39) below and await the upcoming 4th post of this series.

Using symmetries on the surfaces of n-dim spheres …

We can obviously try to solve integral (33) in spherical coordinates by working with n-dim spherical shells and respective n-dim polar coordinates (r, Φ1, Φ2, … Φn-1) – one radius and (n-1) angles. Let us call an infinitesimal surface element expressed in polar coordinates dσ

\[ \begin{align} d\pmb{\sigma} \,& =\, d\pmb{\sigma}_{r, \phi_1, \phi_2, …\phi_{n-1}} \\[10pt] & =\, d\pmb{\sigma}(r, \phi_1, \phi_2, …\phi_{n-1}) \\[10pt] & =\, r^{n-1}\, |\pmb{J}_{polar}| \, d\phi_1\, d\phi_2\, …\, d\phi_{n-1} \,. \end{align} \tag{34} \]

It depends, of course, on the angles and on r – and involves a Jacobi determinant containing the partial derivatives relating the Cartesian and polar coordinates. We are not interested in the details of this Jacobi determinant yet.

Now, let us look a bit closer at integral (33). We already know that the matrix inside the square brackets becomes a diagonal (nxn)-matrix. Let us call this matrix W. The non-zero elements Wii of W are obviously given by integrals of the form

\[ \begin{align} \operatorname{W}_{ii} \,&=\, {1\over (2\,\pi)^{n/2} } \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \left[z_i (r, \phi_1, \phi_2, …, \phi_{n-1})\right]^2 \, \exp\left(-1/2 \, r^2\right) \, d\pmb{\sigma} \, dr \\[10pt] &=\, {1\over (2\,\pi)^{n/2} } \, \int_0^D \, {\huge{[}} \underset{ A_{n,r}^{sph}}{\iint} \, \left[z_i (r, \phi_1, \phi_2, …, \phi_{n-1})\right]^2 \, d\pmb{\sigma}_{r, \phi_1,..,\phi_{n-1}} {\huge{]}} \, \exp\left(-1/2 \, r^2\right) \, dr \,. \end{align} \tag{35} \]

Here, we have legitimately split the volume integral into an inner integral over the surface of an n-dim sphere of radius r, followed by an integral along the radial dimension. Note, by the way, that this integral gives us the variance terms of the cov-matrix for the spherical core of a Z-distribution.

Now, think of an analogous situation in 3 dimensions. Due to symmetries on the surface of a sphere, you will come to the conclusion:

  • The inner surface integral will give us the very same values for all [zi]2 !

The radial integration does not add any differences to the Wii. I.e., also for the full integral, we get :

\[ W_{n,D} \,:=\, \operatorname{W}_{11} \,=\, \operatorname{W}_{22} \,=\, …\,=\, \operatorname{W}_{nn} \,. \tag{36} \]
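The symmetry argument behind (36) can be illustrated numerically: points sampled uniformly on the surface of an n-dim unit sphere (normalized Gaussian samples give such points) yield the same surface average of zi2 for every component i. Dimension and sample size below are my choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 5, 400_000

# Uniform points on the (n-1)-dim unit-sphere surface: normalize Gaussian samples
u = rng.standard_normal((N, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)

# Surface averages of z_i^2 are identical for all i (and sum to 1, since |u| = 1)
means = (u**2).mean(axis=0)
assert np.allclose(means, 1.0 / n, atol=0.005)
```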

An important consequence of this fact is

\[ \begin{align} \pmb{\operatorname{Cov_C}} \,&=\, \pmb{\operatorname{M}}_Y \bullet \begin{pmatrix} W_{n,D} & & \\ & \ddots & \\ & & W_{n,D} \end{pmatrix} \bullet \pmb{\operatorname{M}}_Y^T \\[10pt] &=\, W_{n,D} * \pmb{\operatorname{M}}_Y \, \pmb{\operatorname{M}}_Y^T \,. \tag{37} \end{align} \]

Due to (15), our final result is:

\[ \pmb{\operatorname{Cov_C}} \,=\, W_{n,D} * \pmb{\operatorname{\Sigma}}_Y \,. \tag{38} \]

This is a noteworthy result! It explains one of the numerical findings for a 2-dim example in the 1st post of this series.
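Result (38) can be probed by a Monte Carlo experiment: draw a large MVN sample, keep only the points of the core, and compare the empirical covariance of the core with ΣY. Note that the empirical covariance of the retained points corresponds to the normalized (conditional) core distribution, so the common factor it exhibits absorbs the normalization issue mentioned above; all concrete values below are my own choices:

```python
import numpy as np

rng = np.random.default_rng(4)

Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])
M_Y = np.linalg.cholesky(Sigma_Y)

# Large MVN sample; keep only the core d_m(y) <= D, i.e. |z| <= D, eq. (29)
D = 1.5
z = rng.standard_normal((2_000_000, 2))
core = z[np.linalg.norm(z, axis=1) <= D]
y_core = core @ M_Y.T

# The covariance of the core is a common scalar multiple of Sigma_Y, eq. (38)
Cov_C = np.cov(y_core.T, bias=True)
ratio = Cov_C / Sigma_Y
assert np.allclose(ratio, ratio[0, 0], atol=0.01)
assert ratio[0, 0] < 1.0   # the cut-off shrinks all variances by the same factor
```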

Just to be on the safe side, we add a yet unknown factor fc,n,D to compensate for any differences between CovC as we have defined it and some numerically evaluated CovC,num for a core region:

\[ \pmb{\operatorname{\Sigma}}_Y \,=\, {1 \over W_{n,D}} * f_{c,n,D} * \pmb{\operatorname{Cov}}_{\pmb{\operatorname{C}},num} \,. \tag{39} \]

The friendly reader probably already has an idea what such an additional factor could be good for and by what kind of integral it would be provided … If not, wait for the next two posts.

Conclusion

We have made good progress regarding our mathematical challenge defined in the 1st post of this series. We first used the definition of the covariance matrix of a multidimensional random vector and studied the nature of related integrals over a volume enclosed within a (n-1)-dimensional ellipsoidal surface. Afterwards, a back-transformation of the MVN’s random vector Y to the standardized random vector Z (composed of n simple independent Gaussians) helped us to reduce our problem to an integral over the volume of an n-dimensional sphere. After a check of the infinite case, we then considered situations in which the n-dim sphere had a limited radius D.

We have used n-dimensional polar coordinates to split the required volume integral into an inner integral over the surface area of an n-dimensional sphere (of radius r) and an outer integral over the radial coordinate up to a given limit r=D. Symmetry considerations have shown that the elements of the covariance matrix of such a core region of a MVN, i.e. the cov-matrix of the cut-off MVN, differ from the elements of the cov-matrix of the full and infinitely extended MVN by a common factor Wn,D. This factor depends, however, both on the number n of dimensions and on the Mahalanobis distance dm=D limiting the core – in a way we do not know much about yet.

So, there is more to be done to get an analytical expression for the factor Wn,D. We must fully evaluate our integrals for spheres (or “balls”, if you like) in the ℝn. In the next post of this series we will perform this task for a relatively simple 2-dim case. We will afterwards also compare the theoretical results with numerical ones. A bit surprisingly, we will find out that we have to take yet another factor into account. Stay tuned …

Links

[1] R. Mönchmeyer, 2024, “Multivariate Normal Distributions – I – Basics and a random vector of independent Gaussians”,
https://machine-learning.anracom.com/2024/07/19/multivariate-normal-distributions-i-basics-and-a-random-vector-of-independent-gaussians/

[2] R. Mönchmeyer, 2024, “Multivariate Normal Distributions – II – Linear transformation of a random vector with independent standardized normal components”,
https://machine-learning.anracom.com/2024/07/21/multivariate-normal-distributions-ii-linear-transformation-of-a-random-vector-with-independent-standardized-normal-components/