
Covariance matrix of a cut-off Multivariate Normal Distribution – III – results for a 2-dimensional BVN-core and proper normalization of its cut-off distribution

In the math section of this blog, we try to cover interesting aspects of Multivariate Normal Distributions [MVNs]. The topic of this post series is the covariance matrix of a MVN-like distribution confined inside a hyper-surface of constant probability density. Outside of this surface we set the probability density to zero. This gives us a “cut-off” MVN-distribution. Contour surfaces of a MVN in general enclose a central ellipsoidal volume in the MVN’s n-dimensional space. We have called such a volume a core.

We want to answer the following question: How do the covariance matrices of such a core and its cut-off MVN differ from the covariance matrix of the related standard MVN, which stretches out to infinite distances in the n-dimensional space? In this post we are going to analyze the 2-dimensional problem for a cut-off Bivariate Normal Distribution [BVN] in detail. We will then use a respective sample of data points inside a finite Mahalanobis distance D to numerically test and verify theoretical predictions.

In a first step we will apply the results of the 2nd post in this series and evaluate required integrals in polar coordinates. The analytical result will confirm that the covariance matrix of the original MVN can be calculated by multiplying the core’s covariance matrix by a factor which is the same for all matrix elements. The core’s cov-matrix can be estimated from a numerical integration of respective sample data. We will, however, learn that we must apply an additional correction factor to compensate for a systematic difference still existing between our theoretical results and numerical integrations. This factor is given by a proper normalization of the cut-off distribution.

We use abbreviations and conventions defined in the 1st post of this series. Note that some authors use the abbreviations BVD, MVD instead of our acronyms BVN, MVN, respectively.


A two dimensional scenario

We reuse the 2-dim BVN-example already presented in the 1st post of this series. It was based on the following (2×2)-cov-matrix Σ2 :

\[ \pmb{\operatorname{{Cov}_{2,Y}}} \,=\, \pmb{\operatorname{\Sigma}}_{2} \,=\, \begin{pmatrix} 21 & 6 \\ 6 & 7 \end{pmatrix} \,. \tag{1}\]
[Figure: Cut-off BVN limited to an ellipsoidal core]

We want to calculate the cov-matrix of the infinite distribution depicted in the upper left quadrant. The available data, however, cover only the cut-off distribution depicted in the lower left corner. The value of the Mahalanobis distance dm defining the surface of the core is

\[ d_m\, =\, D \,=\, 1.4 \,. \]

Let us in a first step analyze the situation mathematically for a general D. We, of course, use results of post 2.
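For concreteness, the Mahalanobis distance dm of a (centered) data point y w.r.t. Σ2 can be computed as in the following minimal sketch; the helper name is mine:

```python
import numpy as np

Sigma2 = np.array([[21.0, 6.0], [6.0, 7.0]])   # eq. (1)
Sigma2_inv = np.linalg.inv(Sigma2)

def mahalanobis(y):
    # d_m = sqrt( y^T Sigma2^{-1} y ) for a centered (zero-mean) BVN
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(y @ Sigma2_inv @ y))

# points with mahalanobis(y) <= D = 1.4 belong to the core
print(mahalanobis([0.0, 0.0]))   # → 0.0 at the center
```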

Integrals to solve for the n-dim case

The random vector YN of a MVN has a probability density g(y) controlled by a symmetric and invertible covariance matrix ΣY = CovY; see post 2. This density g(y) has contour surfaces surrounding the volume of an ellipsoid in the n-dimensional space of the MVN. In post 2, we have shown that we can nevertheless evaluate a volume integral over a spherical volume to get the covariance matrix CovC of the contents of a MVN’s ellipsoidal core:

\[ \pmb{\operatorname{Cov_C}} \,=\, \pmb{\operatorname{M}}_Y \left[ {1\over (2\,\pi)^{n/2} } \, \underset{ V_{n,D}^{sph}}{\iint …\int} \, \pmb{Z} \pmb{Z}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}^T \, \pmb{Z} \,\right) \, dz_1\,dz_2 \, ….\, dz_n \right] \pmb{\operatorname{M}}_Y^T \,. \tag{2} \]

The spherical volume is just a mapping of the original ellipsoidal core of Y mediated by the coordinate transformation [MY]-1 that maps a general YN to Z. Z is the random vector of a special MVN-distribution built from independent 1-dimensional Gaussian distributions. (The MVN Y can be generated from Z by MY.) The term in the square brackets actually gives us a matrix.

The integration (2) has to be performed over the volume of an n-dimensional sphere of radius r=D :

\[ \begin{align} {\small{\text{Volume of n-dim sphere of radius }}} d_m=D \,:\, & \quad V_{n,D}^{sph} \,, \\[10pt] {\small{\text{Volume of n-dim sphere of radius }}} d_m=r \,:\, & \quad V_{n,r}^{sph} \,, \\[10pt] {\small{\text{Surface of n-dim sphere of radius }}} d_m=r \,:\, & \quad A_{n,r}^{sph} \,. \end{align} \]

Note: I deviate a bit regarding dimensionality subscripts from other authors, who prefer to use (n-1) as a subscript of surfaces to indicate the lower dimensionality of the manifold in the ℝn.

MY is a linear (n×n)-transformation matrix which generates a general n-dim MVN of a random vector Y with correlations between its components (see the 2nd post for details):

\[ \begin{align} \pmb{\operatorname{\Sigma}}_{Y} \, &= \, \pmb{\operatorname{M}}_{\small Y}\, \pmb{\operatorname{M}}_{\small Y}^T \,=\, \pmb{\operatorname{Cov_Y}} \,, \tag{3} \\[10pt] \pmb{\operatorname{\Sigma}}_{Y}^{-1} \,&=\, \left(\pmb{\operatorname{M}}_{\small Y}^T\right)^{-1} \pmb{\operatorname{M}}_{\small Y}^{-1} \,=\, \left(\pmb{\operatorname{M}}_{\small Y}^{-1}\right)^T \pmb{\operatorname{M}}_{\small Y}^{-1} \,. \tag{4} \end{align} \]

CovY = ΣY is the symmetric and invertible covariance matrix of the MVN.
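One concrete choice for MY is the (lower-triangular) Cholesky factor of ΣY; MY is not unique, since any product with an orthogonal matrix works as well. A minimal check of eqs. (3) and (4) with NumPy, using the Σ2 of eq. (1):

```python
import numpy as np

Sigma2 = np.array([[21.0, 6.0], [6.0, 7.0]])

# Cholesky factor: Sigma2 = M @ M.T, eq. (3)
M = np.linalg.cholesky(Sigma2)

# eq. (4): Sigma^{-1} = (M^{-1})^T @ M^{-1}
M_inv = np.linalg.inv(M)
print(np.allclose(M @ M.T, Sigma2))                         # True
print(np.allclose(np.linalg.inv(Sigma2), M_inv.T @ M_inv))  # True
```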

Two dimensions only …

For two dimensions, the exponential in integral (2) is the probability density of a special standardized BVN distribution for a respective random vector Z = Z2d = (Z1, Z2)T – with no correlation of its two marginals,

\[ \begin{align} & \text{standardized BVN without correlation of the marginals: } \\[10pt] & \pmb{Z}_{2d} \, \sim \pmb{\mathcal{N}}_2 \left( \pmb{0}, \, \pmb{\operatorname{I}} \right), \,\,\, {\small with} \,\,\, Z_1, Z_2 \,\sim\, \mathcal{N}_1 \left( 0, \,1 \right) \,. \tag{5} \end{align} \]

Z2d has a very simple probability density function, given at the end-point of a vector z2d = (z1, z2)T :

\[ \begin{align} & g_{2, {\small Z}}(\pmb{z}_{2d}) \,=\, {1 \over (2\pi) } \, e^{ – \, {\Large 1 \over \Large 2} \, { \Large |\pmb{z}_{2d}|}^2 } \,. \tag{6} \\[10pt] & \pmb{z}_{2d} \, = \, (z_1, \, z_2)^T\,, \,\,\, |\pmb{z}_{2d}| \,=\, r \,=\,\sqrt{z_1^2 + z_2^2\, } \,. \tag{7} \end{align} \]

We name the cov-matrix of a related 2-dim general BVN for a random vector Y2d, which we may have created from Z2d by an invertible (2×2) matrix M2 (see post 2 for details):

\[ \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{Y}}} \,=\, \pmb{\Sigma}_2 \,= \, \pmb{\operatorname{M}}_2\, \pmb{\operatorname{M}}_2^T \,. \tag{8} \]

The integral becomes

\[ \pmb{\operatorname{Cov}}_{2,\pmb{\operatorname{C}}} \, =\, \pmb{\operatorname{M}}_2 \left[ {1\over 2\,\pi } \, \underset{ V_{2,D}^{sph}}{\iint } \, \pmb{Z}_{2d} \pmb{Z}_{2d}^T \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{Z}_{2d}^T \, \pmb{Z}_{2d} \,\right) \, dz_1\,dz_2 \right] \pmb{\operatorname{M}}_2^T \,. \tag{9} \]

The “volume” V2,Dsph of the spherical core (for dm=D) in (9) actually is the area of a circle. To make the notation look more familiar, we replace z1, z2 by x and y:

\[ \pmb{z}_{2d} \, = \, (x, \, y)^T\,, \,\,\,\,|\pmb{z}_{2d}| \,=\, r \,=\,\sqrt{x^2 + y^2\, } \,. \tag{10} \]

Our integral then becomes

\[ \pmb{\operatorname{Cov}}_{2,\pmb{\operatorname{C}}} \, =\, \pmb{\operatorname{M}}_2 \left[ {1\over 2\,\pi } \, \underset{ V_{2,D}^{sph}}{\iint} \, (x,\, y)^T\circ (x, y) \, \, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \right) \, dx \,dy \right] \pmb{\operatorname{M}}_2^T \,. \tag{11} \]

Integrals for the off-diagonal elements

Let us write down the central integral for the off-diagonal elements of the matrix Cov2,C for the contents within our 2-dim core:

\[ \operatorname{Int}_{xy,D} \,=\, \left[ {1\over (2\,\pi) } \, \underset{ V_{2,D}^{sph}}{\iint} \, x \, y * \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right) \, dx \,dy \right] \,. \tag{12} \]

Switching to polar coordinates (r, φ) we find

\[ \begin{align} x \,&=\, r * \cos( \varphi) \,, \\ y \,&=\, r*\sin(\varphi) \,. \end{align} \tag{13} \]
\[ \begin{align} \operatorname{Int}_{xy,D} \,&=\, {1\over (2\,\pi) } \, {\int_0^D \int_0^{2\pi} } \, r^2 \,\cos(\varphi) \, \sin(\varphi) \,\, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right)\, r\, d \varphi \, dr \\[10pt] &=\, {1\over (2\,\pi) } \, \int_0^D \, \left( \int_0^{2\pi} \, {1\over2} \sin(2 \varphi) \, d \varphi \, \right) \, r^3 \, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right) \, dr \\[10pt] & =\, 0\,. \end{align} \tag{14} \]

I.e. CovC in our 2-dim case becomes a diagonal matrix W2,diag :

\[ \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{C}}} \,=\, \pmb{\operatorname{W}}_{2,diag} \,=\, \begin{pmatrix} W_{xx} & 0 \\ 0 & W_{yy} \end{pmatrix} \, . \tag{15} \]

Based on elementary symmetry arguments, we had already predicted this in post 2 – even for the general n-dimensional case.
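The vanishing of (14) can also be checked by direct numerical quadrature; a quick sketch using SciPy’s dblquad (not part of the derivation itself):

```python
import numpy as np
from scipy.integrate import dblquad

D = 1.4

# Int_xy of eqs. (12)/(14) in polar coordinates: the integrand contains
# cos(phi)*sin(phi), whose integral over a full period vanishes
val, _err = dblquad(
    lambda phi, r: (1.0 / (2.0 * np.pi)) * r**3
                   * np.cos(phi) * np.sin(phi) * np.exp(-0.5 * r**2),
    0.0, D,                      # outer variable: r from 0 to D
    lambda r: 0.0,               # inner variable: phi from 0 ...
    lambda r: 2.0 * np.pi,       # ... to 2*pi
)
print(abs(val))                  # numerically zero
```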

Diagonal matrix elements

We can already conclude for symmetry reasons that Wxx = Wyy. But let’s do it explicitly:

\[ \begin{align} W_{xx} \,&=\, {1\over 2\,\pi } \, {\int_0^D \int_0^{2\pi} } \, r^2 \,\cos^2(\varphi) \,\, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right)\, r\, d \varphi \, dr \\[10pt] &=\, {1\over (2\,\pi) } \, {1\over 2}\, \int_0^{D^2} \, \left( \int_0^{2\pi} \, \cos^2(\varphi) \, d \varphi \, \right) \, r^2 \, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right) \, d(r^2) \\[10pt] & = {1 \over 4} \int_0^{D^2} \, r^2 \, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right) \, d(r^2) \,. \end{align} \tag{16} \]

The integral of the cos² can be looked up in tables or solved by substitution. Note that regarding the 2nd integral we changed from the integration variable r to the integration variable r², which gave us an additional factor of 1/2 (due to d(r²) = 2r dr). Analogously,

\[ \begin{align} W_{yy} \,&=\, {1\over 2\,\pi } \, {\int_0^D \int_0^{2\pi} } \, r^2 \,\sin^2(\varphi) \,\, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right)\, r\, d \varphi \, dr \\[10pt] & = {1 \over 4} \int_0^{D^2} \, r^2 \, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right) \, d(r^2) \,, \\[10pt] & \Rightarrow \quad W_{xx} \, = \, W_{yy} \, =:\, W_2 \,. \end{align} \tag{17} \]

We arrive at a first intermediate result, namely that the cov-matrices CovC and Σ2 differ by an element-independent factor W2 :

\[ \begin{align} \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{C}}} \,& =\, W_2 * \pmb{\operatorname{\Sigma}}_2 \, , \tag{18} \\[10pt] \pmb{\operatorname{\Sigma}}_2 \,&=\, f_{2,D,\,pred} * \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{C}}} \,, \quad f_{2,D,\,pred} = 1 / W_2 \, . \tag{19} \end{align} \]

As predicted in post 2. But remember that we have ignored any re-normalization of the cut-off distribution, so far.

The integral over the radius

The remaining radial integral can be solved by integration by parts:

\[ \begin{align} W_2 \,&=\, {1 \over 4} \int_0^{D^2} \, r^2 \, \operatorname{exp} \left( -\, {1\over 2} \, r^2 \,\right) \, d(r^2) \\[10pt] &= \, 1 \,-\, {1\over 2} \, \left( 2 + D^2 \right)\, \exp\left(-{1\over 2}\,D^2\right) \, \,. \end{align} \tag{20} \]

So, this is our (preliminary) prediction for the correction factor which we should apply to the cov-matrix of a cut-off BVN’s core to get the cov-matrix of the original BVN. Let us be brave and compare this with numerical results.
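As a quick sanity check, the closed form (20) can be compared against direct numerical quadrature of the radial integral (16); a sketch using SciPy, with helper names of my own choosing:

```python
import numpy as np
from scipy.integrate import quad

def W2_closed(D):
    # eq. (20): W_2 = 1 - (1/2)(2 + D^2) exp(-D^2/2)
    return 1.0 - 0.5 * (2.0 + D**2) * np.exp(-0.5 * D**2)

def W2_quad(D):
    # eqs. (16)/(17): W_2 = (1/4) * int_0^{D^2} u * exp(-u/2) du, with u = r^2
    val, _err = quad(lambda u: u * np.exp(-0.5 * u), 0.0, D**2)
    return 0.25 * val

print(W2_closed(1.4), W2_quad(1.4))   # both ~ 0.2569
```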

Numerical integration of our sample data for the core

Regarding the numerical evaluation of a given sample, our theoretical integration of the integrands with the probability density must of course be replaced by a corresponding summation over the data points and their coordinate products. An approximation to the probability density is achieved via a division by the total number of elements in the core. There are multiple tools available, e.g. from Scikit-learn, Numpy or SciPy, which provide you with numerical values of the core’s cov-matrix based on a sample of respective data points and their position vectors. What are the numerical steps? I describe them for Numpy’s function “cov()“.

  • Step 1: Create data of some MVN (e.g. in 4 dimensions). Perform a projection of the data down to coordinate planes (via slicing the data arrays) to get a 2-dim distribution of data points and related position vectors. Extract the elements of the respective matrix Σ2 from ΣY.
  • Step 2: Create a reduced sample of data points by calculating each 2-dim vector’s Mahalanobis distance dm with the given matrix and throw away all points with dm > D.
  • Step 3: For the remaining data of your core determine an estimate of its cov-matrix Cov2,C with the help of Numpy’s cov() function.
  • Step 4: Multiply the resulting data with the theoretical estimation factor f2,D,pred = 1/W2.
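The four steps can be sketched as follows; a minimal, purely 2-dim version that skips the projection part of Step 1 (seed and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma2 = np.array([[21.0, 6.0], [6.0, 7.0]])   # eq. (1)
D = 1.4

# Step 1 (simplified): create BVN sample data directly in 2 dims
Y = rng.multivariate_normal([0.0, 0.0], Sigma2, size=40_000)

# Step 2: keep only the points with Mahalanobis distance d_m <= D
Sinv = np.linalg.inv(Sigma2)
d_m2 = np.einsum('ij,jk,ik->i', Y, Sinv, Y)    # squared distances d_m^2
core = Y[d_m2 <= D**2]

# Step 3: numerical cov-matrix of the core via Numpy
Cov_core = np.cov(core, rowvar=False)

# Step 4: apply the (preliminary!) factor 1/W_2 of eq. (20)
W2 = 1.0 - 0.5 * (2.0 + D**2) * np.exp(-0.5 * D**2)
Sigma_pred = Cov_core / W2
print(np.round(Sigma_pred, 3))   # far larger than Sigma2 -- see below
```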

The theoretical estimation factor evaluates (up to rounding) to:

\[ W_2 \,\approx\, 0.2569, \quad {1\over W_2} \,\approx 3.893 \, . \]

The matrix resulting from Numpy’s cov() for our case with 40,000 points used during BVN-creation is

\[ \pmb{\operatorname{Cov}}_{2,\pmb{\operatorname{C}}, num} \,=\, \begin{pmatrix} 8.653 & 2.463 \\ 2.463 & 2.873 \end{pmatrix} \,. \]

So, the prediction of values for the original matrix according to relation (19) would be

\[ \pmb{\operatorname{\Sigma}}_{2, pred} \,=\, {1 \over W_2} * \pmb{\operatorname{Cov}}_{2,\pmb{\operatorname{C}}, num} \,=\, \begin{pmatrix} 33.686 & 9.587 \\ 9.587 & 11.183 \end{pmatrix} \,. \]

Which, unfortunately, is blatantly wrong!

Another correction factor from an integrated chi-squared distribution

Why did we get a wrong result? The reader has probably understood it already: While our theoretical formulas (2), (9) and (11) are in principle correct, Numpy’s cov() function does something different from what we did during our theoretical integration. Our theoretical derivation referred to a probability density which was normalized such that the integral of the probability density gY(y) over the infinite MVN equals 1 (see the 2nd post, eq. (13)). This explains the normalization factor in front of the integrals (as e.g. in eqs. (2) and (11)).

However, what Numpy’s cov() and similar tools do is to normalize with respect to the number of available data points in the sample provided to them – i.e. with the number of data points in our elliptical (or spherical) core. Which in our case is only a fraction of the total number of points used to create a sample for the MVN.

In other words: So far, we have forgotten to properly normalize our cut-off distribution, which is confined to the core and has zero probability density outside. In the general case of n dimensions, the normalization factor is, of course, given by the integral

\[ F_{n,D} \,=\, {1 \over (2\pi)^{n/2} \, \left(\operatorname{det}\pmb{\operatorname{\Sigma}}_{\small Y}\right)^{1/2} } \,*\, \underset{ V_{n,D}^{ell}}{\int \int … \int} \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{y}^T \,\pmb{\Sigma}_{\small Y}^{-1} \, \pmb{y} \,\right) \, dy_1\,dy_2 \, ….\, dy_n \,. \tag{21.1} \]

Regarding theory this means:

  • Whenever we use numerically evaluated cov-matrices for a core region’s data sample as the basis of an estimation, we must apply an additional estimation factor Fn,D which reflects the finite integral of the MVN’s probability density over the core’s region, only.

We will look at the integral Fn,D for the general case a bit closer in the next post of this series.

We already know that this integral is mapped to

\[ F_{2,D} \,=\, {1 \over (2\pi) } \,*\, \underset{ V_{2,D}^{sph}}{\iint} \, \operatorname{exp} \left( -\, {1\over 2} \, \pmb{z}^T \pmb{z} \,\right) \, dz_1\,dz_2 \tag{21.2} \]

for the 2-dim situation. We could solve this integral directly. We can, however, use a result which I have proven in another post of this blog; see [1]. There we integrated a BVN’s probability density over the area of an ellipse up to a Mahalanobis distance dm=D and got for the probability P(dm ≤ D) of finding a data point within the region up to D:

\[ P \left( d_m \le D\right) \:=\: F_{2,D} \,=\, 1 \,- \, \exp \left( -\,{1\over 2} \, D^2 \right) \,. \tag{22} \]
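As an aside, (22) is exactly the CDF of a chi-squared distribution with two degrees of freedom, evaluated at D² – a quick cross-check with SciPy:

```python
import numpy as np
from scipy.stats import chi2

# P(d_m <= D) = 1 - exp(-D^2/2) coincides with the chi-squared CDF
# for 2 degrees of freedom, evaluated at D^2
for D in (1.4, 1.8):
    closed_form = 1.0 - np.exp(-0.5 * D**2)
    print(D, closed_form, chi2.cdf(D**2, df=2))   # identical values
```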

This formula is just what we need to normalize our 2-dim cut-off distribution properly. We just have to multiply the standard probability density (6) by a factor 1/P(D). This leads to a modification of our formula for Σ2 from Cov2,C:

\[ \begin{align} \pmb{\operatorname{\Sigma}}_2 \,& = \, \left(1 \,-\, \exp\left(-{1\over2} D^2\right)\right) \, {1 \over W_2 }\, \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{C}}} \\[10pt] & =:\, C_{2,D} * \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{C}}} \,. \tag{23} \end{align} \]

with

\[ C_{2,D} \,=\, \left(1 \,-\, \exp\left(-{1\over2} D^2\right)\right) \, {1 \over W_2 } \,=\, {F_{2,D} \over W_2} \,. \tag{24} \]

Regarding an estimated value Σ2,est based on a numerically derived cov-matrix Cov2,C,num of the core:

\[ \pmb{\operatorname{\Sigma}}_{2, est} \,= \, C_{2,D} * \pmb{\operatorname{Cov}}_ {2, \pmb{\operatorname{C}}, num} \,. \tag{25} \]
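Putting the numbers together: a minimal sketch applying (24) and (25) to the numerically obtained core matrix quoted above.

```python
import numpy as np

D = 1.4
W2 = 1.0 - 0.5 * (2.0 + D**2) * np.exp(-0.5 * D**2)   # eq. (20)
F2 = 1.0 - np.exp(-0.5 * D**2)                        # eq. (22)
C2 = F2 / W2                                          # eq. (24)

# core cov-matrix obtained numerically earlier in the text
Cov_core_num = np.array([[8.653, 2.463],
                         [2.463, 2.873]])

Sigma_est = C2 * Cov_core_num                         # eq. (25)
print(np.round(Sigma_est, 3))   # close to the original Sigma_2 of eq. (1)
```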

Results

Application of the correct theoretical formula (25) gives for our 2-dim example and D=1.4:

\[ \pmb{\operatorname{\Sigma}}_{2, est}|_{D=1.4} \,=\, \begin{pmatrix} 21.043 & 5.989 \\ 5.989 & 6.986 \end{pmatrix} \,. \]

These values are actually very close to the real values. The error is less than 0.2%. For D=1.8, we get

\[ \pmb{\operatorname{\Sigma}}_{2, est}|_{D=1.8} \,=\, \begin{pmatrix} 21.091 & 6.038 \\ 6.038 & 7.020 \end{pmatrix} \,. \]

The results depend, of course, a bit on the statistical generation of the sample’s data points, but on average the error is less than 2%. I leave it to the reader to evaluate predictions from statistical samples for other values of D.

Numerical cross checks – and numerical integration of gZ(z) over the circle area

For those interested in numerical experiments some additional hints:
You can of course try to approximate the volume integral (8) of the integrands x2 * gZ(z) and/or y2 * gZ(z) over a spherical core numerically as an alternative way for getting an estimation factor. This requires to both rotate the sample’s vectors by angles given by the eigenvectors of Σ2 (see the plot at the top of this article), and, afterward, to standardize them by using values of the standard deviations σ1/2 . The latter are given by the eigenvalues λ1/2 of Σ2 as σ1/2 = 1/sqrt(λ1/2). In addition you may determine the fraction of points in the core in relation to the number of total points created for the MVD. This eventually provides a numerical derived factor C2,D,num for our estimation. Let us compare these numerical factors with the theoretical one, e.g. for D=1.4 and D=1.8:

\[ \begin{align} D = 1.4 \, : \quad C_{2,D} \,&=\, 2.432 \,, \,\,\, C_{2,D,num} \,=\, 2.437 \,, \\[10pt] D = 1.8 \, : \quad C_{2,D} \,&=\, 1.666 \,, \,\,\, C_{2,D,num} \,=\, 1.661 \,. \end{align} \]

I have averaged the numerical results for x² * gZ(z) and y² * gZ(z). These rather good results open up a purely numerical approach to estimating a BVN’s cov-matrix from data points of a limited core.
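A sketch of this cross-check for W2 itself; the eigendecomposition implements the rotation and standardization described above (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma2 = np.array([[21.0, 6.0], [6.0, 7.0]])
D = 1.4
N = 200_000

Y = rng.multivariate_normal([0.0, 0.0], Sigma2, size=N)

# rotate onto the principal axes of Sigma2 and divide by sqrt(lambda_i),
# so that |z| equals the Mahalanobis distance d_m
lam, V = np.linalg.eigh(Sigma2)
Z = (Y @ V) / np.sqrt(lam)

inside = np.sum(Z**2, axis=1) <= D**2

# Monte Carlo estimate of the bracketed integral in (9): average x^2 (and y^2)
# over the core, normalized by the TOTAL number of points
W2_mc = 0.5 * (np.sum(Z[inside, 0]**2) + np.sum(Z[inside, 1]**2)) / N
W2_th = 1.0 - 0.5 * (2.0 + D**2) * np.exp(-0.5 * D**2)
print(W2_mc, W2_th)   # the two values agree to a few times 1e-3
```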

Conclusion

We have evaluated the factors required for an estimation of the covariance matrix Σ2 of a 2-dimensional BVN from sample data of a respective cut-off distribution confined to a core. The core was defined by a given Mahalanobis distance dm=D. In addition to the results of previous posts, we saw that we must take care of a proper normalization of the confined cut-off distribution. After taking this point into account, we got a very good match between theoretical and numerical results for our sample of data points. We evaluated the core’s cov-matrix CovC,num by numerical means and applied both theoretically and numerically derived estimation factors to get estimates for the elements of Σ2 . The error was less than 1% for a case with around 25,000 data points inside a core defined by D=1.4.

In the next post of this series

we look a bit closer at the volume integrals governing the 3-dimensional case.

Links and literature

[1] R. Mönchmeyer, 2025, “Bivariate Normal Distribution – integrated probability up to a given Mahalanobis distance, the Chi-squared distribution and confidence ellipses”,
https://machine-learning.anracom.com/2025/07/30/bivariate-normal-distribution-integrated-probability-up-to-a-given-mahalanobis-distance-the-chi-squared-distribution-and-confidence-ellipses/