Covariance matrix of a cut-off Multivariate Normal Distribution – II – integrals over volume and surface of an n-dimensional sphere

In the 1st post of this series, we posed the following problem: Take the probability density of a Multivariate Normal Distribution [MVN], but set it to zero at Mahalanobis distances bigger than a finite distance D. Consider the volume region enclosed by the contour surface of constant probability density at the Mahalanobis distance dm=D. We have called such a finite ellipsoidal volume around the MVN’s center in n-dimensional space a “core” of the MVN. Can we derive the covariance matrix of the original MVN by using only information about data points inside such a central, limited ellipsoidal core?

I have started to discuss cut-off variants of MVN distributions because real world data samples may exhibit a MVN-like probability density only inside a central inner core of a broader distribution which, overall, violates MVN-criteria. Actually, the distribution outside an identified core may deviate strongly from a MVN. For practical purposes and numerical analysis, we would nevertheless like to estimate the MVN’s covariance matrix CovY from the distribution of data points confined to the space of the ellipsoidal core at the distribution’s center.

In this post, we look a bit closer at the mathematics and in particular the integrals characterizing our problem. We use definitions and conventions as described in the first post.

We will show that the covariance matrix CovY of the infinitely extended MVN differs from the covariance matrix CovC of the core by just a factor – common to all matrix elements. In addition, we bring the required covariance integrals into a manageable form and indicate why and how we can use spherical polar coordinates to determine the covariance matrix of the n-dim ellipsoidal core.

Reminder: Covariance of a multidimensional random vector

In [1] I have discussed a basic formula, which defines the elements of the variance-covariance matrix [“cov-matrix”] of a multivariate random vector S

\[ \pmb{S} \: = \: \left( \, S_1, \, S_2, \, ….\, S_n\, \right)^T \,. \tag{1} \]
\[ \begin{align} \pmb{\mu}\left(\pmb{S}\right) \,=\, \pmb{\mu}_S \: &= \: \operatorname{\mathbb{E}}\left(\pmb{S} \right) \, = \, \left( \operatorname{\mathbb{E}}(S_1), \, \operatorname{\mathbb{E}}(S_2), …, \, \operatorname{\mathbb{E}}(S_n) \right)^T \, , \tag{2} \\[8pt] \text{centered : } \quad \pmb{\mu}_S \,&=\, 0 \,. \tag{3} \end{align} \]

For the time being, we work in a Cartesian Coordinate System [CCS] to describe concrete elements of S, i.e. vectors s, and the distribution of related data points in the ℝn. For reasons of simplicity, let us consider a distribution whose center coincides with the origin of the CCS.

Note that the one-dimensional components of the random vector may be correlated with each other. The basic definition formula for the covariance of two random variables X and Y is:

\[ \begin{align} \operatorname{Cov}(X, Y) &= \operatorname{\mathbb{E}}\left[\,\left(\,X \,-\, \operatorname{\mathbb{E}}\left(X\right)\,\right) \, \left(\,Y \,-\, \operatorname{\mathbb{E}}\left(Y\right)\, \right)\,\right] \\ &= \operatorname{\mathbb{E}}\left(X Y\right) \,-\, \operatorname{\mathbb{E}}\left(X\right) \operatorname{\mathbb{E}}\left(Y\right) \,. \tag{4} \end{align} \]

The multivariate extension to the components of a random vector S is given by the following term:

\[ \begin{align} \operatorname{Cov}\left(\pmb{S}\right) \: &:= \: \operatorname{\mathbb{E}}\left[\left(\pmb{S} – \operatorname{\mathbb{E}}(\pmb{S}) \right)\, \left(\pmb{S} – \operatorname{\mathbb{E}}(\pmb{S}) \right)^T \right] \, . \tag{5} \end{align} \]
\[ \begin{align} \text{centered : } \quad \operatorname{Cov}\left(\pmb{S}\right) \: &:= \: \operatorname{\mathbb{E}} \left[ \pmb{S} \circ \pmb{S}^T \right] \, . \tag{6} \end{align} \]

The circle symbol in (6) symbolizes the matrix multiplication of Linear Algebra. Note that multiplying a vertical column vector with a horizontal row vector in equation (6) actually defines a matrix – an outer product:

\[ \begin{align} \pmb{\Sigma}_S \,=\, \operatorname{Cov}\left(\pmb{S}\right)\:&=\:\operatorname{\mathbb{E}} {\begin{pmatrix} (S_{1}-\mu_{1})^{2}&(S_{1}-\mu_{1})(S_{2}-\mu_{2})&\cdots &(S_{1}-\mu_{1})(S_{n}-\mu_{n})\\(S_{2}-\mu_{2})(S_{1}-\mu_{1})&(S_{2}-\mu_{2})^{2}&\cdots &(S_{2}-\mu_{2})(S_{n}-\mu_{n})\\\vdots &\vdots &\ddots &\vdots \\(S_{n}-\mu_{n})(S_{1}-\mu_{1})&(S_{n}-\mu_{n})(S_{2}-\mu_{2})&\cdots &(S_{n}-\mu_{n})^{2} \end{pmatrix}} \tag{7} \\[16pt] &\rightarrow \quad \pmb{\Sigma}_S \,\, =\,\, {\begin{pmatrix} \operatorname{Var} (S_{1})&\operatorname{Cov} (S_{1},S_{2})&\cdots &\operatorname{Cov} (S_{1},S_{n})\\ \operatorname{Cov} (S_{2},S_{1})&\operatorname{Var} (S_{2})&\cdots &\operatorname{Cov} (S_{2},S_{n})\\\vdots &\vdots &\ddots &\vdots \\ \operatorname{Cov} (S_{n},S_{1})&\operatorname{Cov} (S_{n},S_{2})&\cdots &\operatorname {Var} (S_{n}) \end{pmatrix}} \,\,. \tag{8} \end{align} \]

On the level of the matrix elements the multidimensional case is reduced to the evaluation of the usual variance/covariance between two components of S.
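As a quick numerical cross-check of (7)/(8), here is a minimal NumPy sketch. The sample size and the mixing matrix M are arbitrary choices of mine; the point is only that the element-wise definition reproduces the standard covariance estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a sample of 3-dim random vectors S with correlated components
# (M is an arbitrary illustrative mixing matrix)
M = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.3, 0.2, 0.5]])
S = rng.standard_normal((100_000, 3)) @ M.T

# Element-wise definition (7)/(8): Sigma_ij = E[(S_i - mu_i)(S_j - mu_j)]
mu = S.mean(axis=0)
centered = S - mu
Sigma_est = centered.T @ centered / len(S)

# Matches NumPy's built-in estimator (bias=True selects the 1/N convention)
assert np.allclose(Sigma_est, np.cov(S.T, bias=True))
```

The estimate also approaches the true covariance M MT of the construction as the sample grows.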

Reminder: MVN

In [2], we have defined a MVN of a random vector YN – as the result of a linear transformation of a standardized random vector Z. The components Zi of Z are continuous, independent and standardized 1-dim Gaussians. Z is a special form of an n-dim MVN. We denote these distributions as

\[ \begin{align} \text{general MVN : } \,\,\,\, \pmb{Y}_N \,&\sim\, \pmb{\mathcal{N}}_n \left(\pmb{\mu}_{\small Y},\, \pmb{\Sigma}_{\small Y} \right) \,, \tag{9} \\[10pt] \text{standardized, centered MVN : } \,\,\,\, \pmb{Z} \, &\sim \pmb{\mathcal{N}}_n \left( \pmb{0}, \, \pmb{\operatorname{I}} \right), \,\,\, {\small with} \,\,\, Z_j \,\sim\, \mathcal{N}_1 \left( 0, \,1 \right) \,. \tag{10} \end{align} \]

Z has a very simple probability density function

\[ g_{\small Z}(\pmb{z}) \,=\, {1 \over (2\pi)^{n/2} } \, e^{ – \, {\Large 1 \over \Large 2} \, \left[ \left( {\Large \pmb{z} } \right)^{\Large T} \, \left( {\Large \pmb{z} } \right) \right] } \,=\, {1 \over (2\pi)^{n/2} } \, e^{ – \, {\Large 1 \over \Large 2} \, { \Large |\pmb{z}|}^2 } \,. \tag{10.1} \]

We create a random vector YN of a MVN via a linear transformation of Z

\[ \pmb{Y_N} \,=\, \left(Y_1, \, Y_2, \, ….., \, Y_n \right)^T \: = \: \pmb{\operatorname{M}}_{\small Y} \, {\small \bullet} \, \pmb{Z} + \pmb{\mu}_y \,. \tag{11} \]

To avoid infinitesimally thin degenerations, we require MY to be invertible in our context. I.e.:

\[ \pmb{\operatorname{ M }}_{\small Y} \, {\small \bullet} \, \pmb{\operatorname{ M }}_{\small Y}^{-1} \,=\, \pmb{\operatorname{ I }} \,. \tag{12} \]

The transformed probability density gY(y) at the end-point of a concrete vector y of the continuous non-degenerate distribution YN is

\[ g_{\small Y}(\pmb{y}) \,=\, {1 \over (2\pi)^{n/2} \, \left(\operatorname{det}\pmb{\operatorname{\Sigma}}_{\small Y}\right)^{1/2} } \, {\Large e}^{ – \, {\Large 1 \over \Large 2} \, \left[ \left( {\Large \pmb{y} \,-\, \pmb{\mu}_y } \right)^{\Large T} \, { \Large \pmb{\operatorname{\Sigma}}}_Y^{\Large -1} \, \left( {\Large \pmb{y} \,-\, \pmb{\mu}_y } \right) \right] } \,, \tag{13} \]
\[ \pmb{y} \,=\, \left(\,y_1,\, y_2, \, …\, y_n \right)^T \,. \tag{14} \]

Below, we will work with centered distributions, only. I.e., we assume

\[ \pmb{\mu}_y \,=\, 0 \,. \]

ΣY is an invertible, symmetric and positive-definite matrix. Actually, it is the covariance matrix CovY of YN :

\[ \begin{align} \pmb{\operatorname{\Sigma}}_{\small Y} \, &= \, \pmb{\operatorname{M}}_{\small Y}\, \pmb{\operatorname{M}}_{\small Y}^T \,=\, \pmb{\operatorname{Cov_Y}} \,, \tag{15} \\[10pt] \pmb{\operatorname{\Sigma}}_{\small Y}^{-1} \,&=\, \left(\pmb{\operatorname{M}}_{\small Y}^T\right)^{-1} \pmb{\operatorname{M}}_{\small Y}^{-1} \,=\, \left(\pmb{\operatorname{M}}_{\small Y}^{-1}\right)^T \pmb{\operatorname{M}}_{\small Y}^{-1} \,. \tag{16} \end{align} \, . \]

For a given ΣY we can get a valid MY by a Cholesky decomposition of ΣY. We also have

\[\operatorname{det} \left(\pmb{\operatorname{M}}_Y\right) \,= \, \left|\pmb{\operatorname{M}}_Y\right| \,=\, \operatorname{det} \left(\pmb{\operatorname{M}}_Y^T\right) \,= \, \sqrt{ \operatorname{det} \left(\pmb{\Sigma}_Y\right) \,} \, \gt 0\,. \tag{17} \]
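Eqs. (15) to (17) are easy to verify numerically. The following sketch (with example values for ΣY picked by me) uses NumPy’s Cholesky routine to produce a valid MY:

```python
import numpy as np

# A symmetric, positive-definite covariance matrix Sigma_Y (illustrative values)
Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])

# Cholesky decomposition yields a valid M_Y with M_Y M_Y^T = Sigma_Y, eq. (15)
M_Y = np.linalg.cholesky(Sigma_Y)
assert np.allclose(M_Y @ M_Y.T, Sigma_Y)

# det(M_Y) = sqrt(det(Sigma_Y)) > 0, eq. (17)
assert np.isclose(np.linalg.det(M_Y), np.sqrt(np.linalg.det(Sigma_Y)))
```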

Due to the fact that we can also perform an eigendecomposition of the invertible MY, the linear transformation of Z can be thought of as being composed of a scaling of the vectors (and their components), followed by a rotation. The Mahalanobis distance of a vector y is defined by

\[ d_m(\pmb{y}) \,=\, \sqrt{ \, \pmb{y}^T\, \pmb{\Sigma}_Y^{-1} \, \pmb{y} \, } \,. \tag{18} \]

Vectors y fulfilling

\[ {\small{\text{Surface of n-dim ellipsoid : }}} \quad (d_m(\pmb{y}))^2 \,=\, D^2 \,=\, \pmb{y}^T\, \pmb{\Sigma}_Y^{-1} \, \pmb{y} \,=\, const. \tag{19} \]

reach out from the CCS-origin and end on the (n-1)-dim surface of an n-dim ellipsoid. (Reason: (19) defines a quadratic form with a positive-definite matrix!)

Important Note: When I speak of an ellipsoid, I usually refer to the voluminous body enclosed in an ellipsoidal surface and not the surface, itself. When a clearer distinction gets necessary, I will say this explicitly in the text.
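A small numerical sketch of (18) and (19); the concrete values of ΣY, y and D below are just illustrative:

```python
import numpy as np

Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])

def mahalanobis(y, Sigma):
    """d_m(y) = sqrt(y^T Sigma^{-1} y), eq. (18); solve() avoids an explicit inverse."""
    return np.sqrt(y @ np.linalg.solve(Sigma, y))

# Vectors with d_m(y) = D end on the surface of an n-dim ellipsoid, eq. (19):
# rescaling any y by D / d_m(y) puts its end-point exactly on that surface.
y = np.array([1.0, -2.0])
D = 3.0
y_surf = y * (D / mahalanobis(y, Sigma_Y))
assert np.isclose(mahalanobis(y_surf, Sigma_Y), D)
```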

Random vector Y is just the random vector Z – but seen from a different CCS

At this point I want to give you a different perspective on the linear transformation of eq. (11). Remember: there is always a symmetry between an active and a passive vector rotation. Let us call the CCS in which we describe the random vector Z more specifically “CCSZ“. The scaling operation and forward rotation mediated by MY is equivalent to a description of z-vectors in a backward rotated CCSY with scaled coordinate axes. In this rotated CCSY the vectors get yi-coordinates. Therefore, the vector YN can just be understood as the vector Z described in a different coordinate system CCSY.

Normally, we evaluate the covariance matrix CovY according to (8) in a selected coordinate system CCSY. But: With the help of a proper coordinate transformation we could also work in the original CCSZ. I will use this fact later on.

Re-evaluation of the covariance matrix via integrals

According to the general definitions (5) to (8), we can determine the covariance matrix ΣY of YN via the evaluation of a pretty complex integral in n dimensions. To calculate the expectation value (8) for a centered YN we use YN‘s probability density (13) – and get

\[ \operatorname{\mathbb{E}} \left(\, \pmb{Y}_N\, \pmb{Y}^T_N\,\right) \,=\, C_n \,*\, \int \int … \int \, \pmb{Y}_N\, \pmb{Y}^T_N \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Y}_N^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{Y}_N \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,. \tag{20} \]

Cn covers the constants in the expression for the probability density gY(y); see (13):

\[ C_n \,=\, {1 \over (2\pi)^{n/2} \, \left(\operatorname{det}\pmb{\operatorname{\Sigma}}_{\small Y}\right)^{1/2} } \,. \tag{21} \]

The notation with random vectors in (20) may appear a bit strange at first sight. But remember: This is only an abbreviation. The integration has to be performed over the continuous probability distribution of concrete position vectors y covering the whole ℝn. And once again note: The term YN YNT actually defines a (n×n)-matrix and its elements – and not a scalar product.

If our claim (15) that ΣY actually is the covariance matrix of YN holds, we should be able to properly resolve its self-reference in

\[ \pmb{\Sigma}_Y \,=\, \text{INT} \,=\, C_n \,*\, \int \int … \int \, \pmb{Y}_N\, \pmb{Y}^T_N \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Y}_N^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{Y}_N \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,, \tag{22} \]

without any contradictions. As a MVN stretches out to infinity, the integral has to be evaluated over the whole of the ℝn to get YN ‘s covariance matrix ΣY.

To perform the integration efficiently, we could make use of the fact that the exponent in the probability density is constant on ellipsoidal surfaces. We could, therefore, be tempted to transform everything to ellipsoidal coordinates. However, as we will see, we can actually avoid ellipsoidal coordinates by using a back-transformation of YN to the original Z-vector defined in (11).

Integrals of Z-dependent functions

Regarding the integral (22) we also want to switch to zi -variables instead of yi -variables. The reasons are

  • that Z-components can easily be expressed in spherical coordinates and
  • that the unavoidable exponential terms get substantially simpler.

For a given symmetric positive-definite ΣY, we can use a Cholesky decomposition to find a valid (invertible) MY. This allows us to replace YN by Z (composed of independent Gaussians):

\[ \begin{align} \pmb{Y}_N^T \,\pmb{\Sigma}_Y^{-1} \, \pmb{Y}_N \,&=\, \left[\pmb{\operatorname{M}}_Y\, \pmb{Z} \right]^T \, \left[ \pmb{\operatorname{M}}_Y \, \pmb{\operatorname{\operatorname{M}}}_Y^T \right]^{-1} \, \left[\pmb{\operatorname{M}}_Y\, \pmb{Z} \right] \\[10pt] &= \, \pmb{Z}^T \,\, \pmb{\operatorname{M}}_Y^T \left[ \pmb{\operatorname{M}}_Y^T \right]^{-1} \,\, \left[\pmb{\operatorname{M}}_Y\right]^{-1} \pmb{\operatorname{M}}_Y\,\, \pmb{Z} \\[10pt] &=\, \pmb{Z}^T \pmb{Z} \,. \tag{23} \end{align} \]

What does this switch from a description of our y-vectors to different coordinates, more precisely zi-coordinates, mean? Well, we now pick up our idea discussed two sections above:

By replacing the yi-coordinates by zi-coordinates (written as generic xi-coordinates in the formulas below), we perform a (reverse) coordinate transformation back to our original coordinate system CCSZ, from which we started with our Z. How do we have to compensate for this back-transformation in our integral? This is easy: Multidimensional calculus tells us that we have to take into account the Jacobi determinant of the original coordinate transformation. In the following sense:

\[ \iint .. \int \, f(y_1, y_2, …y_n) \, dy_1 \, dy_2 \, … \, dy_n \, =\, \\[10pt] \iint .. \int \, f\left(\, y_1(x_1, ..x_n), …, \, y_n(x_1, …x_n)\,\right) \, \left| \pmb{\operatorname{J_{y,x} }} \right| \, dx_1\,dx_2, … dx_n \,, \]

with

\[ \left| \pmb{\operatorname{J_{y,x}}} \right| \,= \, \left| \begin{pmatrix} {\partial y_1 \over \partial x_1} & {\partial y_1 \over \partial x_2} & … & {\partial y_1 \over \partial x_n} \\ …& … & … & … \\ {\partial y_n \over \partial x_1} & {\partial y_n \over \partial x_2} & … & {\partial y_n \over \partial x_n} \end{pmatrix} \right| \]

In our linear case this simply boils down to

\[ \left| \pmb{\operatorname{J_{y,x}}} \right| \,= \, \left|\pmb{\operatorname{M}}_Y\right| \, . \]

Our integral expressed in z-coordinates therefore becomes

\[ \begin{align} \text{INT} \,&=\, C_n \,*\, \int \int … \int \, \pmb{\operatorname{M}}_Y \pmb{Z}\, \left[ \pmb{\operatorname{M}}_Y \pmb{Z}\right]^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, \left| \pmb{\operatorname{M}}_Y \right| \, dz_1\,dz_2 \, ….\, dz_n \,, \\[10pt] &=\, C_n \,*\, \int \int … \int \, \pmb{\operatorname{M}}_Y \,\, \pmb{Z}\, \pmb{Z}^T \,\, \pmb{\operatorname{M}}_Y^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, \left| \pmb{\operatorname{M}}_Y \right| \, dz_1\,dz_2 \, ….\, dz_n \,. \tag{24} \end{align}\]

For an evaluation of the MVN’s cov-matrix, the integration has to be performed over the full ℝn, but now in a CCS in which the y-vectors are seen as the original z-vectors.

Note A: If we had finite integrals the integral limits would have to become subject of the inverse transformation, too.

Note B: The term MYZ [MYZ]T defines a matrix via the ZZT-term – it gives variance and covariance terms of Z‘s various marginal distributions. Also note the difference in the order of the factors compared to the term ZTZ appearing in the exponential, which is a scalar product and gives a scalar value.

The term

\[ {\small{\text{Coord. trafo of matrix} : }} \quad \pmb{\operatorname{M}}_Y \,\, \pmb{Z}\, \pmb{Z}^T \,\, \pmb{\operatorname{M}}_Y^T \]

just corresponds to a coordinate transformation of the matrix ZZT to CCSY. It is the scalar matrix elements which get affected by the integration over the z-space. A short consideration shows that we can move the linear operators out of the integral; in addition, (17) and (21) give Cn * |MY| = 1/(2π)n/2 – which eventually results in

\[ \text{INT} \,=\, \pmb{\operatorname{M}}_Y \left[ {1\over (2\,\pi)^{n/2} } \, \int \int … \int \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \right] \pmb{\operatorname{M}}_Y^T \,. \tag{25} \]

This is an interesting formula, because it restricts the integration to the rather simple Z-distribution of standardized independent 1-dim Gaussians.

Why is the integral evaluated over the ℝn equal to the covariance matrix?

Now, we need some additional tricky considerations to understand why equations (15) and (22) actually are correct:

  1. Because of (10.1), the probability density for data points residing on n-dim spherical shells is constant. The reason is that the exponent depends only on zTz = r2, i.e. on the squared length r = |z| of the respective position vectors z.
  2. The integrals for mixed combinations (zi zj) with i ≠ j over such shells will vanish – due to symmetries and terms appearing with opposite signs. This means that the off-diagonal elements of the matrix are zero.
  3. Therefore the central matrix in the rectangular brackets of (25) will always get diagonal by the integration over shells up to any radius r=R.
  4. In addition the exponential can be factorized and written as a product of exponentials, each for squares of coordinate values.
  5. For the remaining diagonal matrix terms, we, therefore, get a product of 1-dim integrals. When integrating over the full ℝn, the integral of zi2 exp(-1/2 zi2) from -∞ to +∞ gives (2π)1/2, and each of the other (n-1) integrals of exp(-1/2 zj2) contributes a factor (2π)1/2 as well. So, together they compensate for the factor 1/(2π)n/2 in front of the integral (25).
  6. Thus, the integration eventually gives us the n-dimensional diagonal identity matrix In in between the M-matrices – and therefore, indeed
\[\pmb{\Sigma}_Y \,=\, \pmb{M}_Y \, \left[ \pmb{I}_n \right] \, \pmb{M}_Y^T \,. \tag{26 } \]

Our result strengthens our trust in the term inside the square brackets of (25).
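The chain of arguments above can be probed by a simple Monte Carlo estimate of the bracketed integral in (25), i.e. of E[Z ZT]; dimension and sample size below are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 4, 200_000

# Z: N samples of n independent standardized Gaussians, eq. (10)
Z = rng.standard_normal((N, n))

# Monte Carlo estimate of the bracketed integral in (25): E[Z Z^T]
W = Z.T @ Z / N

# Off-diagonal elements vanish, diagonal elements are 1 -> identity matrix, eq. (26)
assert np.allclose(W, np.eye(n), atol=0.02)
```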

Remember: The off-diagonal matrix elements in (25) become zero – independent of whether we integrate to radial infinity or just to a finite radius r = |z| = R of the z-vectors.

Application in the context of our problem

What do we learn from the above integral for a modified MVN, which starts normally around the CCS-origin, but gets cut off at a certain Mahalanobis distance D? I.e., we look at a distribution gC(y) whose probability density is given by

\[ \begin{align*} d_m(\pmb{y}) \,\le\, D \,:\,\,\, g_{\small C}(\pmb{y}) \,&= \, g_{\small Y}(\pmb{y}) \,, \\[10pt] d_m(\pmb{y}) \,\gt\, D \,:\,\,\, g_{\small C}(\pmb{y}) \,&= \, 0 \,. \end{align*} \tag{27} \]

We have called the ellipsoidal volume inside of D a “core”. The core is limited by vectors having a Mahalanobis distance measure dm = D. An important feature of MVNs is the following:

  • The measure dm is not affected by the transformations MY or its inverse, i.e. dm(y) = dm(z).

We can see this clearly by applying the same steps as in (23) to a concrete vector y and by taking into account that the covariance matrix of Z is the identity matrix:

\[ \begin{align} d_m^2\left(\pmb{y}\right) \,& =\, \pmb{y}^T \, \pmb{\operatorname{\Sigma}}_Y^{-1} \, \pmb{y} \,=\, \left[\pmb{\operatorname{M}}_Y\, \pmb{z} \right]^T \, \, \left[ \pmb{\operatorname{M}}_Y \, \pmb{\operatorname{\operatorname{M}}}_Y^T \right]^{-1} \, \left[ \pmb{\operatorname{M}}_Y \pmb{z} \right] \\[10pt] &=\, \pmb{z}^T \, \pmb{\operatorname{I}_n} \, \pmb{z} \,=\, d_m^2\left(\pmb{z}\right) \,. \end{align} \tag{28} \]

This means that all y-vectors which have a Mahalanobis distance dm(y) ≤ D are mapped to z-vectors having a length |z| ≤ D:

\[ \left\{ \, \pmb{y} \,\,|\,\,\, d_m(\pmb{y}) \le D \, \right\} \quad \Rightarrow \quad \left\{ \, \pmb{z} \,\,|\,\, |\pmb{z}| \le D \, \right\} \,. \tag{29} \]

Note: The length r = |z| of a z-vector is defined by its Euclidean norm as

\[ r \,=\, |\pmb{z}| \,=\, \sqrt{ z_1^2 + z_2^2 + … + z_n^2} \,. \tag{30} \]
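The invariance (28) and the mapping (29) are easy to verify numerically; again, the ΣY values and the sample size are illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng(2)

Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])
M_Y = np.linalg.cholesky(Sigma_Y)

# Map z-vectors to y-vectors via y = M_Y z, eq. (11) (centered case)
z = rng.standard_normal((1000, 2))
y = z @ M_Y.T

# d_m(y) = |z| for every vector, eq. (28)
d_m = np.sqrt(np.einsum('ij,ij->i', y, np.linalg.solve(Sigma_Y, y.T).T))
assert np.allclose(d_m, np.linalg.norm(z, axis=1))
```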

Integration simplified

We use the following symbols for an ellipsoidal volume limited by D, the volume of the corresponding multidimensional sphere and its surface area:

\[ \begin{align} {\small{\text{Volume of n-dim ellipsoid given by }}} d_m=D \,:\, & \quad V_{n,D}^{ell} \,, \\[10pt] {\small{\text{Volume of n-dim sphere given by }}} d_m=D \,:\, & \quad V_{n,D}^{sph} \,, \\[10pt] {\small{\text{Surface of n-dim sphere given by }}} d_m=D \,:\, & \quad A_{n,D}^{sph} \,, \\[10pt] {\small{\text{Surface of n-dim sphere given by }}} r=R \,:\, & \quad A_{n,R}^{sph} \,. \end{align} \]

Note: Some authors in the literature prefer to indicate the fact that the dimensionality of a spherical surface manifold in the ℝn actually is (n-1) by a respective subscript. I do not do this here. The subscript n always refers to the dimensionality of the embedding ℝn.

To get the covariance matrix of the n-dim ellipsoidal core, we just have to limit our integration in (22) to the n-dim volume of the ellipsoid. Due to the mapping (29), the ellipsoid (i.e. the core) is obviously mapped to a sphere when we shift to z-coordinates (in CCSZ):

\[ V_{n,D}^{ell} \quad \Rightarrow \quad V_{n,D}^{sph} \,. \tag{31} \]

Therefore, we get the following integral for the core’s cov-matrix CovC:

\[ \pmb{\operatorname{Cov_C}} \,=\, C_n \,*\, \underset{V_{n,D}^{ell}}{\int \int … \int} \, \pmb{Y}_N\, \pmb{Y}^T_N \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Y}_N^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{Y}_N \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,, \tag{32} \]

which due to (25) and (31) results in

\[ \pmb{\operatorname{Cov_C}} \,=\, \pmb{M}_Y \left[ {1\over (2\,\pi)^{n/2} } \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \right] \pmb{M}_Y^T \,. \tag{33} \]

Again: To avoid any misconceptions, please, note the following fact:

\[ \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \\ \ne \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z}^T \pmb{Z} \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \,. \]

The integrand of the second integral is related to the probability density of a chi-square distribution. But this is not what we are interested in here. However, the topic of chi-squared distributions will come up again in a later post of this series.

Another topic which we will ignore for the time being is the question of how we can normalize our cut-off distribution, which we have confined to the core and whose probability density we have set to zero outside the core. Normalization typically means that we should multiply the probability density by a certain constant factor to ensure that the integral over all space gets the value 1. We ignore such a factor for the cut-off distribution for the time being. But see eq. (39) below and await the upcoming 4th post of this series.

Using symmetries on the surfaces of n-dim spheres …

We can obviously try to solve integral (33) in spherical coordinates by working with n-dim spherical shells and respective n-dim polar coordinates (r, Φ1, Φ2, … Φn-1) – one radius and (n-1) angles. Let us call an infinitesimal surface element expressed in polar coordinates dσ

\[ \begin{align} d\pmb{\sigma} \,& =\, d\pmb{\sigma}_{r, \phi_1, \phi_2, …\phi_{n-1}} \\[10pt] & =\, d\pmb{\sigma}(r, \phi_1, \phi_2, …\phi_{n-1}) \\[10pt] & =\, r^{n-1}\, |\pmb{J}_{polar}| \, d\phi_1\, d\phi_2\, …\, d\phi_{n-1} \,. \end{align} \tag{34} \]

It depends, of course, on the angles and on r – and involves a Jacobi determinant containing the partial derivatives relating the Cartesian and polar coordinates. We are not interested in the details of this Jacobi determinant yet.

Now, let us look a bit closer at integral (33). We already know that the matrix inside the square brackets becomes a diagonal (nxn)-matrix. Let us call this matrix W. The non-zero elements Wii of W are obviously given by integrals of the form

\[ \begin{align} \operatorname{W}_{ii} \,&=\, {1\over (2\,\pi)^{n/2} } \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \left[z_i (r, \phi_1, \phi_2, …, \phi_{n-1})\right]^2 \, \exp\left(-1/2 \, r^2\right) \, d\pmb{\sigma} \, dr \\[10pt] &=\, {1\over (2\,\pi)^{n/2} } \, \int_0^D \, {\huge{[}} \underset{ A_{n,r}^{sph}}{\iint} \, \left[z_i (r, \phi_1, \phi_2, …, \phi_{n-1})\right]^2 \, d\pmb{\sigma}_{r, \phi_1,..,\phi_{n-1}} {\huge{]}} \, \exp\left(-1/2 \, r^2\right) \, dr \,. \end{align} \tag{35} \]

Here, we have legitimately split the volume integral into an inner integral over the surface of an n-dim sphere of radius r, followed by an integral along the radial dimension. Note, by the way, that this integral gives us the variance terms of the cov-matrix for the spherical core of a Z-distribution.

Now, think of an analogous situation in 3 dimensions. Due to symmetries on the surface of a sphere, you will come to the conclusion:

  • The inner surface integral will give us the very same values for all [zi]2 !

The radial integration does not add any differences to the Wii. I.e., also for the full integral, we get :

\[ W_{n,D} \,:=\, \operatorname{W}_{11} \,=\, \operatorname{W}_{22} \,=\, …\,=\, \operatorname{W}_{nn} \,. \tag{36} \]
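The symmetry argument behind (36) can be illustrated numerically: points sampled uniformly on the surface of an n-dim unit sphere (normalized Gaussian samples give such points) yield the same surface average of zi2 for every component i. Dimension and sample size below are my choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 5, 400_000

# Uniform points on the (n-1)-dim unit-sphere surface: normalize Gaussian samples
u = rng.standard_normal((N, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)

# Surface averages of z_i^2 are identical for all i (and sum to 1, since |u| = 1)
means = (u**2).mean(axis=0)
assert np.allclose(means, 1.0 / n, atol=0.005)
```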

An important consequence of this fact is

\[ \begin{align} \pmb{\operatorname{Cov_C}} \,&=\, \pmb{\operatorname{M}}_Y \bullet \begin{pmatrix} W_{n,D} & & \\ & \ddots & \\ & & W_{n,D} \end{pmatrix} \bullet \pmb{\operatorname{M}}_Y^T \\[10pt] &=\, W_{n,D} * \pmb{\operatorname{M}}_Y \, \pmb{\operatorname{M}}_Y^T \,. \tag{37} \end{align} \]

Due to (15), our final result is:

\[ \pmb{\operatorname{Cov_C}} \,=\, W_{n,D} * \pmb{\operatorname{\Sigma}}_Y \,. \tag{38} \]

This is a noteworthy result! It explains one of the numerical findings for a 2-dim example in the 1st post of this series.
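Result (38) can be probed by a Monte Carlo experiment: draw a large MVN sample, keep only the points of the core, and compare the empirical covariance of the core with ΣY. Note that the empirical covariance of the retained points corresponds to the normalized (conditional) core distribution, so the common factor it exhibits absorbs the normalization issue mentioned above; all concrete values below are my own choices:

```python
import numpy as np

rng = np.random.default_rng(4)

Sigma_Y = np.array([[4.0, 1.2],
                    [1.2, 2.0]])
M_Y = np.linalg.cholesky(Sigma_Y)

# Large MVN sample; keep only the core d_m(y) <= D, i.e. |z| <= D, eq. (29)
D = 1.5
z = rng.standard_normal((2_000_000, 2))
core = z[np.linalg.norm(z, axis=1) <= D]
y_core = core @ M_Y.T

# The covariance of the core is a common scalar multiple of Sigma_Y, eq. (38)
Cov_C = np.cov(y_core.T, bias=True)
ratio = Cov_C / Sigma_Y
assert np.allclose(ratio, ratio[0, 0], atol=0.01)
assert ratio[0, 0] < 1.0   # the cut-off shrinks all variances by the same factor
```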

Just to be on the safe side, we add a yet unknown factor fc,n,D to compensate for any differences between CovC as we have defined it and some numerically evaluated CovC,num for a core region:

\[ \pmb{\operatorname{\Sigma}}_Y \,=\, {1 \over W_{n,D}} * f_{c,n,D} * \pmb{\operatorname{Cov}}_{\pmb{\operatorname{C}},num} \,. \tag{39} \]

The friendly reader probably already has an idea what such an additional factor could be good for and by what kind of integral it would be provided … If not, wait for the next two posts.

Conclusion

We have made good progress regarding our mathematical challenge defined in the 1st post of this series. We first used the definition of the covariance matrix of a multidimensional random vector and studied the nature of related integrals over a volume enclosed within a (n-1)-dimensional ellipsoidal surface. Afterwards, a back-transformation of the MVN’s random vector Y to the standardized random vector Z (composed of n simple independent Gaussians) helped us to reduce our problem to an integral over the volume of an n-dimensional sphere. After a check of the infinite case, we then considered situations in which the n-dim sphere had a limited radius D.

We have used n-dimensional polar coordinates to split the required volume integral into an inner integral over the surface area of an n-dimensional sphere (of radius r) and an outer integral over the radial coordinate up to a given limit r=D. Symmetry considerations have shown that the elements of the covariance matrix of such a core region of a MVN, i.e. the cov-matrix of the cut-off MVN, differ from the elements of the cov-matrix of the full and infinitely extended MVN by a common factor Wn,D. This factor depends, however, both on the number n of dimensions and on the Mahalanobis distance dm=D limiting the core – in a way we do not know much about yet.

So, there is more to be done to get an analytical expression for the factor Wn,D. We must fully evaluate our integrals for spheres (or “balls”, if you like) in the ℝn. In the next post of this series we will perform this task for a relatively simple 2-dim case. We will afterwards also compare the theoretical results with numerical ones. A bit surprisingly, we will find out that we have to take yet another factor into account. Stay tuned …

Links

[1] R. Mönchmeyer, 2024, “Multivariate Normal Distributions – I – Basics and a random vector of independent Gaussians”,
https://machine-learning.anracom.com/2024/07/19/multivariate-normal-distributions-i-basics-and-a-random-vector-of-independent-gaussians/

[2] R. Mönchmeyer, 2024, “Multivariate Normal Distributions – II – Linear transformation of a random vector with independent standardized normal components”,
https://machine-learning.anracom.com/2024/07/21/multivariate-normal-distributions-ii-linear-transformation-of-a-random-vector-with-independent-standardized-normal-components/