
Multivariate Normal Distributions – IV – Spectral decomposition of the covariance matrix and rotation of the coordinate system

In the preceding posts of this series we have considered a comprehensible definition and basic properties of a non-degenerate “Multivariate Normal Distribution” of vectors in the ℝn [N-MND]. In this post we make a step toward a numerical analysis of a given finite vector distribution with properties that indicate an underlying N-MND. We want to find an optimal Euclidean coordinate system [ECS] which allows for a simple representation and handling of the distribution’s probability density function [pdf].


Steps and results so far

In “post I” we represented a vector distribution by a “random vector“. We then described the probability density of a continuous vector distribution and considered random vectors based on independent Gaussians. In “post II” we defined an MND as the result of a linear transformation M applied to a special distribution of vectors whose component values varied according to independent and standardized Gaussian functions. We derived the functional form of an MND’s continuous pdf in a Euclidean coordinate system [ECS]. In the preceding “post III” we showed that the contour hyper-surfaces of the probability density are surfaces of multidimensional ellipsoids. For a general MND the main axes of these ellipsoids are rotated against the ECS-axes. We have understood that such a rotation reflects a correlation of the components of the random vector. In general the off-diagonal elements of an MND’s covariance matrix are not zero.

Objective of this post: Choose an optimal ECS for a given MND-like vector distribution

In this post we will look at MND features from a point of view which is relevant for the practical numerical analysis of assumedly normal distributions given in an ML-context. Our key question is: Can we find a special coordinate system in which the main axes of the (hopefully) ellipsoidal contour surfaces coincide with the ECS-axes? I.e., an ECS built on (abstract) coordinates in which the distributions of the component values de-correlate? Such an ECS would make our analysis significantly easier – in particular with respect to numerical methods.

The answer to the posed question is: Yes, we can. And we will see that finding a suitable ECS corresponds to solving an eigenvalue problem. We start by considering the algebraic representation of ellipsoids whose main axes are oriented in parallel to the axes of an ECS. Afterward we discuss a suitable decomposition (= factorization) of the symmetric covariance matrix Σ of an N-MND and of its inverse Σ-1. The combination will give us a method to determine the desired ECS.

We abbreviate the expression “Multivariate Normal Distribution” by either MND or, synonymously, MVN. Both abbreviations appear in the literature. We refer to the “variance-covariance matrix” of a random vector simply as its “covariance matrix”.

Main axes of a normal W-distribution of independent Gaussians

We work with vector distributions and related point distributions in the ℝn. Remember that we constructed a non-degenerate n-dimensional MND by applying an invertible linear transformation (plus a shift vector) onto a much simpler distribution with independent Gaussian distributions of the vector component values. We have symbolized such a basic distribution by a random vector W – and its centered standardized variant by Z. See posts I and II of this series.

\[ \pmb{W} \,\sim\, \pmb{\mathcal{N}}_n \left(\pmb{\mu}_{\small W},\, \pmb{\Sigma}_{\small W} \right), \, \quad \pmb{Z} \,\sim\, \pmb{\mathcal{N}}_n \left(\pmb{0},\, \pmb{\operatorname{I}} \right) \,, \]
\[ \mbox{with} \quad \pmb{\operatorname{\Sigma}}_{\small W} \, = \, diag \left(\, \sigma_1^2,\, \sigma_2^2, \cdots, \, \sigma_n^2 \, \right) ,\quad \mbox{and} \quad \pmb{\operatorname{\Sigma}}_{\small Z} \, = \, \pmb{\operatorname{I}} \,. \]

“diag” indicates a diagonal matrix and the σi2 represent the variances of the component distributions. Remember that the contour surfaces of the pdf of Z are surfaces of multidimensional spheres. The inverse of ΣW, ΣW-1, is a diagonal matrix, too, with the reciprocals of the variances, 1/σi2, as elements along its diagonal.

An important property of a W-distribution is that a constant Mahalanobis distance (see post III) for its vectors w defines the surface of an ellipsoid whose main axes indeed are oriented in parallel to the ECS axes. How can we conclude this from our basic formulas? Well, the standard definition of an ellipsoidal surface in n dimensions with the main axes of the ellipsoid oriented in parallel to the axes of the chosen ECS is given by an expression of the form

\[ \sum_{i=1}^n \, \left( {x_i \over a_i} \right)^2 \,=\, C = const. \]

with constant factors ai. When we “move” the value of C into the ai, the factors give us the lengths of the main half axes of the ellipsoids. Now compare this to the square of the Mahalanobis distance in an ECS centered with respect to the W-related MND, i.e. in an ECS where μW = 0:

\[ D_W (\pmb{w}) \,=\, \pmb{w}^T {\small \bullet} \, \pmb{\operatorname{\Sigma}}_{\small W}^{-1} \, {\small \bullet} \, \pmb{w} \, = \, \sum_{i=1}^n \, \left( {w_i \over \sigma_i} \right)^2 \,=\, C = const. \]

This is exactly the algebraic form required. What helped us is the fact that the inverse of the covariance matrix of W is a diagonal matrix. A W-distribution can easily be transformed into a standardized distribution Z with the help of a scaling diagonal matrix. So, we have good reason to believe that a given general non-degenerate MND is linearly related to a W-distribution with contours given by axis-parallel ellipsoids. But we apparently need a transition to an ECS in which the respective Σ-matrix and its inverse become diagonal.
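As a quick numerical illustration, here is a minimal numpy sketch (the σi values and the vector w are arbitrary example numbers) verifying that the quadratic form with a diagonal ΣW-1 reduces to the component-wise sum of squares:

```python
import numpy as np

# standard deviations of the independent Gaussian components (arbitrary example values)
sigmas = np.array([1.0, 2.0, 0.5])
Sigma_W = np.diag(sigmas**2)             # diagonal covariance matrix of W
Sigma_W_inv = np.diag(1.0 / sigmas**2)   # its inverse is diagonal, too

w = np.array([0.7, -1.2, 0.3])           # an arbitrary centered vector of the W-distribution

D_quadratic = w @ Sigma_W_inv @ w        # w^T Sigma_W^{-1} w
D_components = np.sum((w / sigmas)**2)   # sum_i (w_i / sigma_i)^2
print(np.isclose(D_quadratic, D_components))   # True
```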

From a given covariance matrix of an N-MND to a normal random vector with de-correlated components

So, let us try to reverse the considerations of previous posts. Let us assume that someone has given us a non-degenerate MND-distribution Y of vectors (assumedly) having a probability density like the g(y) we derived in post II (with μ being the mean vector):

\[ g(\pmb{y}) \,=\, {1 \over (2\pi)^{n/2} \, \left(\operatorname{det} \pmb{\operatorname{\Sigma}}\right)^{1/2} } \, \exp \left[ \, - \, {1 \over 2} \, \left( \pmb{y} - \pmb{\mu} \right)^T \, \pmb{\operatorname{\Sigma}}^{-1} \, \left( \pmb{y} - \pmb{\mu} \right) \right] \,. \]
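If you want to check the formula numerically, here is a minimal sketch (with an arbitrary example μ and Σ; scipy is used only for comparison) that evaluates g(y) directly:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -0.5])               # example mean vector (assumption)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])           # example symmetric, positive-definite covariance matrix

def g_pdf(y, mu, Sigma):
    # direct evaluation of the MND density g(y)
    n = len(mu)
    diff = y - mu
    norm = (2.0 * np.pi)**(n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

y = np.array([0.3, 0.2])
print(np.isclose(g_pdf(y, mu, Sigma),
                 multivariate_normal(mean=mu, cov=Sigma).pdf(y)))  # True
```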

By some magic we have also got the distribution’s covariance matrix Σ (or a numerical approximation of it). As we work in the ℝn, Σ is a symmetric, positive-definite (n x n)-matrix. We know from our construction of N-MNDs that Σ should factorize like

\[ \pmb{\operatorname{\Sigma}} = \pmb{\operatorname{M}} \, {\small \bullet} \, \pmb{\operatorname{M}}^T \,, \quad \mbox{for some} \,\, (n \, \operatorname{x} \, n) \mbox{-matrix} \,\,\pmb{\operatorname{M}}. \]

Can we find a well-defined and invertible matrix M leading us back to an underlying Z-like distribution based on independent Gaussians in all coordinate directions? More precisely: Is there a (numerical) method to derive the elements of such a matrix M from Σ? Obviously, we must find some well-defined factorization of Σ.

A problem you should be aware of is that due to our construction (see post II) M is not unique without further restrictions. Actually, it is unique only up to a multidimensional rotation, i.e. an orthogonal matrix. The reason is that a chosen Z-distribution can be rotated by any angle without changing any of our basic conditions for a non-degenerate MND. In other words: We can choose any rotated ECS with respect to Z to start with. This means that a well-defined method must refer to a specific ECS, which we must select by imposing some condition on the factorization of Σ. In the best case this restriction should have something to do with the de-correlation of the vectors’ component distributions. To achieve this, let us refer to the geometry of the pdf’s contour hyper-surfaces.

From a geometrical point of view a special ECS would be the one in which the orthogonal axes of the multi-dimensional ellipsoids, which define the pdf-contours of a N-MND, would be aligned with the coordinate axes of the ECS.

In such an ECS our MND-distribution would appear like a W-distribution composed of independent Gaussians for the distributions of the vector component values.

Spectral decomposition of the covariance matrix of a non-degenerate MND

We simplify our problem by moving the origin of our ECS to the center of the distribution of our MND vectors y, such that the MND’s mean vector μ becomes μ = 0.

\[ \pmb{Y} \,\sim\, \pmb{\mathcal{N}}_n \left(\pmb{0},\, \pmb{\operatorname{\Sigma}} \right) \,. \]

Let us call this specific ECS in which we describe the vectors y (given by the random vector Y) “ECSY“. We now use some theorems of Linear Algebra regarding matrix decomposition. A factorization of a given matrix is often possible in multiple ways.

Cholesky-decomposition?
In the case of a symmetric, positive-definite and real-valued matrix Σ it is tempting to pick the so-called “Cholesky decomposition” (see [3]). It tells us that such a matrix Σ can always be decomposed into a product K • KT of invertible triangular matrices with positive elements along the diagonal:

\[ \pmb{\operatorname{\Sigma}} \, = \, \pmb{\operatorname{K}} \, {\small \bullet } \, \pmb{\operatorname{K}}^T \, \quad \mbox{with} \,\, \pmb{\operatorname{K}} \,\, \mbox{being an upper or lower triangular matrix} . \]

This would give us the desired form. However, we cannot see any directly interpretable relation to a specific ECS and a diagonalization of Σ. We need to find a better suited decomposition.
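For completeness, a short numpy sketch of the Cholesky factorization (np.linalg.cholesky returns the lower-triangular factor; the example Σ is an arbitrary positive-definite matrix):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])           # arbitrary symmetric, positive-definite example matrix

K = np.linalg.cholesky(Sigma)            # lower-triangular factor with positive diagonal
print(np.allclose(K @ K.T, Sigma))       # True: Sigma = K K^T
```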

Spectral decomposition
Another decomposition, which is of more interest here, is the so-called “spectral decomposition“. You can read all about it in [3] (page 149). A short summary is: A symmetric matrix such as Σ can always be factorized and written as

\[ \pmb{\operatorname{\Sigma}} \: = \: \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}} \pmb{\operatorname{V}}^T \,=\, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{1/2} {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{1/2} \, \pmb{\operatorname{V}}^T \,=\, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{1/2} \, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{1/2} \, \pmb{\operatorname{V}}^{-1} \]
\[ \mbox{with} \quad \pmb{\operatorname{\Lambda}}, \, \pmb{\operatorname{\Lambda}}^{1/2} \,\, \mbox{diagonal}, \quad \pmb{\operatorname{V}} \,\, \mbox{orthogonal} \,. \]

V is an orthogonal matrix whose columns are n orthogonal (or, after normalization, orthonormal) eigenvectors of Σ. Λ is a diagonal matrix with real values.

\[ \pmb{\operatorname{V}} \,=\, \left( \pmb{v}_1, \, \pmb{v}_2, \, \dots, \, \pmb{v}_n \right) , \quad \mbox{with} \,\, \pmb{\operatorname{V}} ^T = \pmb{\operatorname{V}} ^{-1} \,\,\, \mbox{and} \,\,\, \pmb{v}_i \, {\small \bullet } \, \pmb{v}_j \,=\, \delta_{i,j} \, || \pmb{v}_i ||^2 \, , \]
\[ \pmb{\operatorname{\Lambda}} \,=\, diag \left( \lambda_1, \, \lambda_2, \, …, \, \lambda_n \right) \,=\, \pmb{\operatorname{\Lambda}}^{1/2} {\small \bullet } \, \pmb{\operatorname{\Lambda}}^{1/2}, \,\, \mbox{with}\,\, \lambda_i \gt 0, \, \forall \,i \in [1, n] \,. \]

It follows that

\[ \pmb{\operatorname{\Sigma}}^{-1} \: = \: \left[ \, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}} \pmb{\operatorname{V}}^T \, \right]^{-1} \,=\, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1} \pmb{\operatorname{V}}^T \,=\, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1} \pmb{\operatorname{V}}^{-1} \,=\, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1/2}\, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \,. \]

The first positive point with respect to our objective is that V‘s column vectors are orthogonal eigenvectors of Σ. Such vectors can be found for any real symmetric matrix by well-established numerical methods. The other positive point is that Λ is diagonal and contains the respective positive eigenvalues λi. From Linear Algebra we know that all eigenvalues λi of a real symmetric and positive-definite matrix are real and that λi > 0 (see e.g. [4]). Λ1/2 contains the square roots of the eigenvalues on its diagonal. Λ-1 contains the values 1/λi on its diagonal. Λ actually represents Σ in a rotated coordinate system (see below).

Note that we can always normalize the eigenvectors; the eigenvalues are not affected by such a rescaling. So, V can be chosen to be an orthonormal matrix (with ||vi|| = 1). Then the eigenvectors of Σ can be regarded as unit vectors of a special Euclidean coordinate system. In addition the sign of one of the eigenvectors can always be chosen such that the determinant of V becomes +1. This is good, too, because then we can interpret V and its inverse as rotation matrices (see below).
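Numerically, the spectral decomposition of a symmetric matrix can be obtained e.g. with numpy’s eigh-routine, which returns real eigenvalues in ascending order and orthonormal eigenvectors as the columns of V. The following minimal sketch (with an arbitrary example Σ) also flips the sign of one eigenvector, if necessary, to enforce det V = +1:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                # arbitrary example covariance matrix

lambdas, V = np.linalg.eigh(Sigma)            # real eigenvalues (ascending), orthonormal eigenvectors as columns
if np.linalg.det(V) < 0:
    V[:, 0] = -V[:, 0]                        # flip one eigenvector to make V a proper rotation (det = +1)

Lambda = np.diag(lambdas)
print(np.allclose(V @ Lambda @ V.T, Sigma))   # True: Sigma = V Lambda V^T
print(np.allclose(V.T @ V, np.eye(2)))        # True: V is orthonormal, V^T = V^{-1}
print(np.all(lambdas > 0))                    # True for a positive-definite Sigma
```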

It is easy to show that an eigenvector ye of Σ with eigenvalue λe is also an eigenvector of Σ-1, but for the eigenvalue 1/λe:

\[ \pmb{\operatorname{\Sigma}} \, \pmb{y}_e \,=\, \lambda_e \, \pmb{y}_e \quad \Rightarrow \quad \pmb{\operatorname{\Sigma}}^{-1} \, \pmb{\operatorname{\Sigma}} \, \pmb{y}_e \,=\, \lambda_e \, \pmb{\operatorname{\Sigma}}^{-1} \, \pmb{y}_e \quad \Rightarrow \quad \pmb{y}_e \,=\, \lambda_e \, \pmb{\operatorname{\Sigma}}^{-1} \, \pmb{y}_e \\ \Rightarrow \,\, \pmb{\operatorname{\Sigma}}^{-1} \, \pmb{y}_e \,=\, {1\over \lambda_e} \,\pmb{y}_e \]

As Σ has full rank n, V has full rank, too. However, V is not symmetric. (M isn’t either!) A spectral decomposition is a special case of a so-called eigendecomposition.

Orthonormal matrices represent rotations

Note: Angles and scalar products between vectors w1, w2 transformed by V are preserved due to the properties of orthogonal matrices.

\[ \mbox{ECS}_{\small Y} : \,\, \pmb{y}_1^T \, {\small \bullet } \, \pmb{y}_2 \,=\, \left( \pmb{\operatorname{V}} \pmb{w}_1 \right)^T\, \pmb{\operatorname{V}} \pmb{w}_2 \,=\, \pmb{w}_1^T \, \pmb{\operatorname{V}}^{-1} \, {\small \bullet } \, \pmb{\operatorname{V}} \pmb{w}_2 \,=\, \pmb{w}_1^T \, {\small \bullet } \, \pmb{w}_2 \,. \]

And for a matrix By = VBVT we find

\[ \mbox{ECS}_{\small Y} : \,\, \pmb{y}^T \pmb{\operatorname{B}}_y \, \pmb{y} \,=\, \left( \pmb{\operatorname{V}} \pmb{w} \right)^T \, {\small \bullet } \, \pmb{\operatorname{V}} \pmb{\operatorname{B}} \pmb{\operatorname{V}}^T \, {\small \bullet } \, \pmb{\operatorname{V}} \pmb{w} \,=\, \pmb{w}^T \pmb{\operatorname{V}}^{-1} {\small \bullet } \, \pmb{\operatorname{V}} \pmb{\operatorname{B}} \pmb{\operatorname{V}}^{-1} \, {\small \bullet } \, \pmb{\operatorname{V}} \pmb{w} \,=\, \pmb{w}^T \pmb{\operatorname{B}} \pmb{w} \,. \]
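A quick numerical check of these invariances (a 2-dimensional rotation matrix serves as an example of an orthonormal V; the vectors and the matrix B are arbitrary):

```python
import numpy as np

phi = 0.3                                     # arbitrary rotation angle
V = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])   # orthonormal 2D rotation matrix, det = +1

w1 = np.array([1.0, 2.0])                     # arbitrary test vectors
w2 = np.array([-0.5, 0.7])
B = np.array([[1.5, 0.2],
              [0.2, 0.9]])                    # arbitrary test matrix

print(np.isclose((V @ w1) @ (V @ w2), w1 @ w2))              # True: scalar product is preserved
B_y = V @ B @ V.T
print(np.isclose((V @ w1) @ B_y @ (V @ w1), w1 @ B @ w1))    # True: quadratic form is preserved
```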

The geometrical meaning is that an orthogonal matrix represents a rotation of vectors in an ECS (in three dimensions: by an angle φ around some axis given by a vector r).

However, an orthonormal matrix O with determinant +1 can also be interpreted such that it gives us the components of a vector in a new coordinate system ECSW rotated in the opposite direction (-φ) against the original coordinate system ECSY. The elements of a matrix B transform during a transition from ECSY to ECSW as O B OT. The other way round, O-1 can be interpreted to give the coordinates of a given vector y in an ECSW rotated by +φ.

V, in particular, represents a rotation of an ECSW, whose axes are aligned with the orthogonal eigenvectors of Σ, onto ECSY. The inverse matrix V-1 thus determines the component values of vectors y in an ECSW with axes parallel to these eigenvectors. Therefore, our matrix Σ = V Λ VT is a representation of Λ in the rotated ECSY. Or, if you like to see it the other way round, Λ represents our Σ in ECSW.

\[ \mbox{ECS}_W \sim \pmb{\operatorname{V}} {\small \bullet} \mbox{ ECS}_Y \]

The Mahalanobis distance in terms of spectral decomposition matrices

Let us combine our insights. We decide to choose a special M = MS as indicated by the spectral decomposition

\[ \pmb{\operatorname{M}} = \pmb{\operatorname{M}}_S \,=\, \pmb{\operatorname{V}} \, \pmb{\operatorname{\Lambda}}^{1/2} \, \quad \Rightarrow \, \quad \pmb{\operatorname{M}}_S^{-1} \,=\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \, , \]

and find out, how far we get with this. First we have

\[ \pmb{\operatorname{\Sigma}} \,=\, \pmb{\operatorname{M}}_S \, {\small \bullet} \, \pmb{\operatorname{M}}_S^T \quad \Rightarrow \quad \pmb{\operatorname{\Sigma}}^{-1} \,=\, \left(\pmb{\operatorname{M}}_S^T\right)^{-1} {\small \bullet} \,\, \pmb{\operatorname{M}}_S^{-1} \,=\, \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1/2}\, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \,. \]

Let us write down the squared Mahalanobis distance for a vector y of a non-degenerate Y-MND:

\[ \pmb{y}^T \, {\small \bullet} \, \pmb{\operatorname{\Sigma}}^{-1} \, {\small \bullet} \, \pmb{y} \,=\, \pmb{y}^T \, {\small \bullet} \, \left(\pmb{\operatorname{M}}_S^T\right)^{-1} {\small \bullet} \,\, \pmb{\operatorname{M}}_S^{-1} \, {\small \bullet} \, \pmb{y} \,=\, \pmb{y}^T \, {\small \bullet} \, \left[ \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1/2}\, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \right] \, {\small \bullet} \,\, \pmb{y} \, . \]

Thus, with y = MS z, we can fulfill an essential condition of our construction of a non-degenerate MND:

\[ \pmb{y}^T \, {\small \bullet} \, \pmb{\operatorname{\Sigma}}^{-1} \, {\small \bullet} \, \pmb{y} \,=\, \pmb{y}^T \, {\small \bullet} \, \left[ \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1/2}\, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \right] \, {\small \bullet} \,\, \pmb{y} \, = \, \pmb{z}^T \, {\small \bullet} \, \pmb{z} \,, \\ \quad \mbox{with} \,\, \pmb{z} \, =\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \, {\small \bullet} \,\, \pmb{y} . \]
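A minimal numpy sketch of these relations (with an arbitrary example Σ): we build MS = V Λ1/2 from the spectral decomposition, verify Σ = MS MST, and check that z = Λ-1/2 V-1 y indeed yields zT z = yT Σ-1 y:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                  # arbitrary example covariance matrix
lambdas, V = np.linalg.eigh(Sigma)
Lam_sqrt = np.diag(np.sqrt(lambdas))            # Lambda^{1/2}
Lam_inv_sqrt = np.diag(1.0 / np.sqrt(lambdas))  # Lambda^{-1/2}

M_S = V @ Lam_sqrt                              # M_S = V Lambda^{1/2}
print(np.allclose(M_S @ M_S.T, Sigma))          # True: Sigma = M_S M_S^T

y = np.array([0.4, -1.1])                       # an arbitrary centered vector
z = Lam_inv_sqrt @ V.T @ y                      # z = Lambda^{-1/2} V^{-1} y   (V^{-1} = V^T)
print(np.isclose(z @ z, y @ np.linalg.inv(Sigma) @ y))   # True: z^T z equals y^T Sigma^{-1} y
```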

What does this all mean geometrically?

Recreation of the given N-MND from a Z-distribution

Let us first describe the creation of Y in ECSY for given Σ, Λ and V. The elementary operation MS z used to construct an MND obviously consists of two steps:

Step 1: Pick a (spherically symmetric) Z-distribution of vectors and stretch all vector components by the square roots of the respective positive eigenvalues of Σ, i.e. transform our z-vectors by

\[ \pmb{w} \,=\, \pmb{\operatorname{\Lambda}}^{1/2} \, {\small \bullet} \, \, \pmb{z}, \, \quad \mbox{with} \,\, \pmb{w} \,\,\mbox{given by} \,\, \pmb{W} \,\sim\, \pmb{\mathcal{N}}_n \left(\pmb{0},\, \pmb{\operatorname{\Lambda}} \right) \]

This obviously transforms the spheres of equal probability density of Z into ellipsoidal surfaces with the main axes of the ellipsoids being oriented along the ECSY-axes. These w-vectors can be interpreted as elements of a distribution given by a centered normal random vector W with independent Gaussian components and variances λi.

Step 2: Pick the vectors w of W and rotate them via an orthonormal V (defined by the eigenvectors of Σ, arranged in the same order as the respective eigenvalues in Λ). Choose the sign of the eigenvectors such that V becomes a rotation (det V = +1).

\[ \pmb{y} \,=\, \pmb{\operatorname{V}} \, {\small \bullet} \, \pmb{w}, \, \quad \mbox{with} \,\, \pmb{y} \,\,\mbox{given by} \,\, \pmb{Y} \,\sim\, \pmb{\mathcal{N}}_n \left(\pmb{0},\, \pmb{\operatorname{\Sigma}} \right) \,\, . \]
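The two steps translate directly into a small sampling sketch (numpy; the example Σ and the sample size are arbitrary): draw z-vectors from a standard normal distribution, stretch them with Λ1/2 and rotate them with V. The sample covariance of the resulting y-vectors should then approximate Σ:

```python
import numpy as np

rng = np.random.default_rng(42)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                   # arbitrary example target covariance
lambdas, V = np.linalg.eigh(Sigma)

Z = rng.standard_normal((100_000, 2))            # samples of Z ~ N(0, I), one vector per row
W = Z * np.sqrt(lambdas)                         # step 1: w = Lambda^{1/2} z (component-wise stretching)
Y = W @ V.T                                      # step 2: y = V w (row-vector convention)

print(np.round(np.cov(Y, rowvar=False), 2))      # approximately Sigma
```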

Reverse order: From Σ to V-matrices and the respective W- and Z-distributions

We now revert the whole process for the analysis of a given Y which seems to have properties of a N-MND in ECSY. We follow three steps:

Step A – determination of eigenvectors: In ECSY we first determine the variance-covariance matrix Σ and its inverse (e.g. by numerical methods). We then calculate the n orthonormal eigenvectors of Σ (which are also eigenvectors of Σ-1). Afterward we build a matrix V by using the eigenvectors as columns of this matrix. We choose the signs of the eigenvectors such that V defines a rotation (det V = +1). The eigenvectors define unit vectors along the axes of a new Euclidean coordinate system ECSW. We organize the respective eigenvalues λ1, λ2, …, λn in a matrix Λ in the same order as we positioned the eigenvectors as columns in the matrix V.

Step B – Rotation of the coordinate system: We now rotate ECSY by V such that it coincides with a new ECSW. The components of a vector y in ECSW are equal to the components of the following vector w in ECSY (!):

\[ \pmb{w} \,=\, \pmb{\operatorname{V}}^{-1} \, {\small \bullet} \, \pmb{y} \]

I.e., the inverse matrix V-1 gives us the components of y in ECSW.

If the distribution Y really were an N-MND, then V-1 would transform the contour ellipsoids of equal probability density into axis-parallel ellipsoids in ECSW. We can see this via a transformation of the square of the Mahalanobis distance by V-1:

\[ \begin{align} \pmb{y}^T \, {\small \bullet} \, \pmb{\operatorname{\Sigma}}^{-1} \, {\small \bullet} \, \pmb{y} \,&=\, \pmb{y}^T \, {\small \bullet} \, \left[ \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1/2}\, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \right] \, {\small \bullet} \,\, \pmb{y} \, \\ &=\, \pmb{y}^T \, {\small \bullet} \, \left( \pmb{\operatorname{V}} \pmb{\operatorname{V}}^{-1} \right) \, {\small \bullet} \, \left[ \pmb{\operatorname{V}} \pmb{\operatorname{\Lambda}}^{-1/2}\, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1/2} \, \pmb{\operatorname{V}}^{-1} \right] \, {\small \bullet} \left( \pmb{\operatorname{V}} \pmb{\operatorname{V}}^{-1} \right) \, {\small \bullet} \,\, \pmb{y} \, \\ &=\, \left( \pmb{y}^T \, \pmb{\operatorname{V}} \right) \, {\small \bullet} \, \pmb{\operatorname{I}} \, \, {\small \bullet} \, \pmb{\operatorname{\Lambda}}^{-1} \, {\small \bullet} \,\pmb{\operatorname{I}} \, \, {\small \bullet} \, \left( \pmb{\operatorname{V}}^{-1} \, \pmb{y} \right) \, \\ &=\, \left( \pmb{\operatorname{V}}^{-1} \, \pmb{y} \right)^T \, {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1} \, {\small \bullet} \,\, \left( \pmb{\operatorname{V}}^{-1} \, \pmb{y} \right) \\ &=\, \pmb{w}^T {\small \bullet} \,\, \pmb{\operatorname{\Lambda}}^{-1} \, {\small \bullet} \,\, \pmb{w} . \end{align} \]

So, Λ-1 indeed represents Σ-1 in ECSW. If Y really were an N-MND, the diagonal form would guarantee that the main axes of the transformed ellipsoidal hyper-surfaces were aligned with ECSW‘s coordinate axes. Furthermore, the variances of the “de-correlated” Gaussians of the components would just be given by the eigenvalues λ1, λ2, …, λn.

However, if Y did not have the properties of an MND, then we would not get axis-parallel ellipsoids in ECSW. The difference from the theoretical ellipsoids is something we can investigate numerically.

Step C – Scale to get (or not get) a spherically symmetric distribution: As soon as we have our W-distribution we can rescale it by applying Λ-1/2. In the case of an original N-MND Y this would now give us a spherically symmetric distribution Z.
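Putting the three steps together, here is a minimal analysis sketch in numpy. The sample data for Y are generated artificially, just to have something to work on; in a real analysis they would be given to us:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_true = np.array([[2.0, 0.8],
                       [0.8, 1.0]])              # "unknown" covariance behind the artificial data
Y = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma_true, size=100_000)

# Step A: estimate Sigma and determine orthonormal eigenvectors / eigenvalues
Sigma_est = np.cov(Y, rowvar=False)
lambdas, V = np.linalg.eigh(Sigma_est)
if np.linalg.det(V) < 0:
    V[:, 0] = -V[:, 0]                           # enforce det V = +1

# Step B: rotate into ECS_W -> the components de-correlate
W = Y @ V                                        # each row: w = V^{-1} y  (V^{-1} = V^T)
print(np.round(np.cov(W, rowvar=False), 2))      # approximately diag(lambda_1, lambda_2)

# Step C: rescale with Lambda^{-1/2} -> approximately spherical Z-distribution
Z = W / np.sqrt(lambdas)
print(np.round(np.cov(Z, rowvar=False), 2))      # approximately the identity matrix
```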

A comment on the meaning of a rotated coordinate system in ML-contexts

The coordinate system we choose to work with in an ML context is typically given by some predefined set of variables – either corresponding directly to the properties of objects we work with or to already abstract orthogonal coordinates of the latent space of an ML algorithm (like e.g. an Autoencoder). When you move to a rotated ECS you should be very clear about one thing:

The new coordinates will be abstract ones. They (most often) have no direct interpretation in terms of the original properties of the objects we apply an ML-algorithm to.

The correlation of the original (natural) properties does not disappear by some magic when we go over to abstract coordinates via a rotation of the ECS. And: Even in a coordinate system with axis-parallel ellipsoidal contours of an N-MND the ratios of the lengths of the main axes of the ellipsoids have fixed values. These ratios do not disappear via a rotation.

Conclusion

In this post we have seen that we can use the variance-covariance matrix of an N-MND to determine a coordinate system in which the main axes of the ellipsoidal contour hyper-surfaces align with the coordinate axes. Did this remind you of a method often used in the context of classic ML-methods? Probably it did: You may have thought of PCA. However, before we get there, I want to present a more general definition of an MND in the next post. This will also bring us closer to the topic of how to include and justify degenerate MNDs.