In the preceding posts of this series we have considered a comprehensible definition and basic properties of a *non-degenerate* “**M**ultivariate **N**ormal **D**istribution” of vectors in the ℝ^{n} [**N-MND**]. In this post we take a step toward a numerical analysis of some given finite vector distribution with properties that indicate an underlying N-MND. We want to find an optimal Euclidean coordinate system [**ECS**] which allows for a simple representation and handling of the distribution’s probability density function [**pdf**].

Links to introductory posts:

- Multivariate Normal Distributions – III – Variance-Covariance Matrix and a distance measure for vectors of non-degenerate distributions
- Multivariate Normal Distributions – II – Linear transformation of a random vector with independent standardized normal components
- Multivariate Normal Distributions – I – Basics and a random vector of independent Gaussians

## Steps and results so far

In “post I” we represented a vector distribution by a “*random vector*“. We afterward described the probability density of a continuous vector distribution and considered random vectors based on independent Gaussians. In “post II” we defined a MND as the result of a linear transformation **M** applied to a special distribution of vectors whose component values varied according to *independent* and *standardized* Gaussian functions. We derived the functional form of an MND’s continuous **pdf** in a Euclidean Coordinate System [**ECS**]. In the preceding “post III” we have shown that the contour-hypersurfaces of the probability density are surfaces of multidimensional ellipsoids. For a general MND the main axes of these ellipsoids are *rotated* against the ECS-axes. We have understood that such a rotation reflects a **correlation** of the components of the random vector. In general the off-diagonal elements of a MND’s covariance matrix are *not* zero.

## Objective of this post: Choose an optimal ECS for a given MND-like vector distribution

In this post we will look at MND features from a point of view which is relevant for the practical numerical analysis of assumedly *normal* distributions given in a **ML-context**. Our key question is: Can we find a special coordinate system in which the main axes of the (hopefully) ellipsoidal contour surfaces coincide with the ECS-axes? I.e., an ECS built on (abstract) coordinates in which the distributions of the component values de-correlate? Such an ECS would make our analysis significantly easier – in particular with respect to numerical methods.

The answer to the posed question is: Yes, we can. And we will see that finding a suitable ECS corresponds to solving an eigenvalue problem. We start by considering the algebraic representation of ellipsoids whose main axes are oriented in parallel to the axes of an ECS. Afterward we discuss a suitable decomposition (= factorization) of the symmetric covariance matrix **Σ** of a N-MND and its inverse **Σ**^{-1}. The combination will give us a method to determine the desired ECS.

We abbreviate the expression “Multivariate Normal Distribution” by either **MND** or, synonymously, **MVN**. Both abbreviations appear in the literature. We refer to the “variance-covariance matrix” of a random vector just as its “covariance matrix”.

## Main axes of a normal *W*-distribution of *independent* Gaussians

We work with vector distributions and related point distributions in the ℝ^{n}. Remember that we constructed a non-degenerate *n*-dimensional MND by applying an *invertible* linear transformation (plus a shift vector) to a much simpler distribution with independent Gaussian distributions of the vector component values. We have symbolized such a basic distribution by a random vector **W** – and its centered, standardized variant by **Z**. See posts I and II of this series. The covariance matrix of **W** is diagonal:

**Σ**_{W} = *diag*( *σ*_{1}^{2}, *σ*_{2}^{2}, …, *σ*_{n}^{2} )

“*diag*” indicates a diagonal matrix and the *σ*_{i}^{2} represent the variances of the component distributions. Remember that the contour surfaces of the pdf of **Z** are surfaces of multidimensional *spheres*. The inverse of **Σ**_{W}, **Σ**_{W}^{-1}, is a diagonal matrix, too, with the reciprocal variance values 1/*σ*_{i}^{2} as elements along its diagonal.
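As a small numerical illustration of this diagonal structure (a minimal NumPy sketch; the variance values are made up for the example):

```python
import numpy as np

# Hypothetical variances sigma_i^2 of the independent components of W
sigma_sq = np.array([4.0, 9.0, 1.0])

# Covariance matrix of W: Sigma_W = diag(sigma_1^2, ..., sigma_n^2)
Sigma_W = np.diag(sigma_sq)

# The inverse is diagonal, too, with 1/sigma_i^2 along its diagonal
Sigma_W_inv = np.linalg.inv(Sigma_W)

assert np.allclose(Sigma_W_inv, np.diag(1.0 / sigma_sq))
```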

An important property of a **W**-distribution is that a constant **Mahalanobis distance** (see post III) for its vectors **w** defines the surface of an ellipsoid whose main axes indeed are oriented *in parallel* to the ECS axes. How can we conclude this from our basic formulas? Well, the standard definition of an ellipsoidal surface in *n* dimensions with the main axes of the ellipsoid oriented *in parallel* to the axes of the chosen ECS is given by an expression of the form

( *w*_{1}/*a*_{1} )^{2} + ( *w*_{2}/*a*_{2} )^{2} + … + ( *w*_{n}/*a*_{n} )^{2} = *C*

with *constant* factors *a*_{i}. When we “move” the value of *C* into the *a*_{i}, the factors give us the lengths of the main half-axes of the ellipsoids. Now compare this to the square of the Mahalanobis distance in an ECS *centered* with respect to the **W**-related MND, i.e. in an ECS where *μ*_{W} = **0**:

*d*_{M}^{2}(**w**) = **w**^{T} • **Σ**_{W}^{-1} • **w** = *w*_{1}^{2}/*σ*_{1}^{2} + *w*_{2}^{2}/*σ*_{2}^{2} + … + *w*_{n}^{2}/*σ*_{n}^{2}

This is exactly the algebraic form required. What helped us is the fact that the inverse of the covariance matrix of **W** is a diagonal matrix. A **W**-distribution can easily be transformed into a standardized distribution **Z** with the help of a scaling diagonal matrix. So, we have good reason to believe that a given *general* non-degenerate MND is linearly related to a **W**-distribution with contours given by **axis-parallel** ellipsoids. But we apparently need a transition to an ECS in which the respective **Σ**-matrix and its inverse become diagonal.
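To make the equivalence of the matrix form and the componentwise sum concrete, here is a small NumPy check (the variance values are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical variances of a centered W-distribution (mu_W = 0)
sigma_sq = np.array([4.0, 9.0, 1.0])
Sigma_W_inv = np.diag(1.0 / sigma_sq)

w = rng.normal(size=3)  # an arbitrary vector of the distribution

# Squared Mahalanobis distance in matrix form: w^T Sigma_W^{-1} w
d2_matrix = w @ Sigma_W_inv @ w

# The same distance as a componentwise sum: sum_i w_i^2 / sigma_i^2
d2_sum = np.sum(w**2 / sigma_sq)

assert np.isclose(d2_matrix, d2_sum)
```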

## From a given covariance matrix of a N-MND to a normal random vector with de-correlated components

So, let us try to reverse the considerations of previous posts. Let us assume that someone has given us a non-degenerate MND-distribution **Y** of vectors (assumedly) having a probability density like the *g*(**y**) we derived in post II (with *μ* being the mean vector):

*g*(**y**) = ( (2π)^{n/2} • det(**Σ**)^{1/2} )^{-1} • exp( -1/2 • (**y** – *μ*)^{T} • **Σ**^{-1} • (**y** – *μ*) )

By some magic we have also got the distribution’s covariance matrix **Σ** (or a numerical approximation of it). As we work in the ℝ^{n}, **Σ** is a symmetric, positive-definite (*n* x *n*)-matrix. We know from our construction of N-MNDs that **Σ** should factorize like

**Σ** = **M** • **M**^{T}

Can we find a well defined and invertible matrix **M** leading us back to an underlying **Z**-like distribution based on independent Gaussians in all coordinate directions? More precisely: Is there a (numerical) *method* to derive the elements of such a matrix **M** from **Σ**? Obviously, we must find some well defined factorization of **Σ** …

A problem you should be aware of is that due to our construction (see post II) **M** is **not** unique without further restrictions. Actually, it is unique only up to a multidimensional rotation, i.e. an orthogonal matrix. The reason is that a chosen **Z**-distribution can be rotated by any angle without changing any of our basic conditions for a non-degenerate MND. Or in other words: We can choose *any* rotated ECS with respect to **Z** to start with. This means that a well defined method must refer to a **specific** ECS, which we must select by imposing some condition on the factorization of **Σ**. And this restriction should in the best case have something to do with the de-correlation of the vectors’ component distributions. To achieve this let us refer to the geometry of the pdf’s contour hyper-surfaces.

From a geometrical point of view a special ECS would be the one in which the orthogonal axes of the multi-dimensional ellipsoids, which define the pdf-contours of a N-MND, would be aligned with the coordinate axes of the ECS.

In such an ECS our MND-distribution would appear like a ** W**-distribution composed of independent Gaussians for the distributions of the vector component values.

## Spectral decomposition of the covariance matrix of a non-degenerate MND

We simplify our problem by moving the origin of our ECS to the center of the distribution of our MND vectors **y**, such that the MND’s mean vector becomes *μ* = **0**. Let us call this specific ECS in which we describe the vectors **y** (given by the random vector **Y**) “ECS_{Y}“. We now use some theorems of Linear Algebra regarding matrix decomposition. A factorization of a given matrix is often possible in multiple ways.

**Cholesky decomposition?**

In the case of a symmetric, positive-definite and real-valued matrix **Σ** it is tempting to pick the so called *“Cholesky decomposition*” (see [3]). It tells us that such a matrix **Σ** can always be decomposed into a product **K • K**^{T} of an invertible *triangular* matrix **K** (with positive elements along the diagonal) and its transpose:

**Σ** = **K** • **K**^{T}

This would give us the desired form. However, we cannot see any directly understandable relation to a specific ECS and a diagonalization of **Σ**. We need to find a better suited decomposition.
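For completeness, a minimal NumPy sketch of this factorization (the covariance matrix is a made-up example):

```python
import numpy as np

# A made-up symmetric, positive-definite covariance matrix
Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])

# Lower-triangular Cholesky factor K with Sigma = K K^T
K = np.linalg.cholesky(Sigma)

assert np.allclose(K @ K.T, Sigma)   # the factorization holds
assert np.allclose(K, np.tril(K))    # K is (lower) triangular
assert np.all(np.diag(K) > 0)        # positive diagonal elements
```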

**Spectral decomposition**

Another decomposition, which is of more interest to us, is the so called “**spectral decomposition**“. You can read all about it in [3] (page 149). A short summary is: A symmetric matrix like **Σ** can always be factorized and written as

**Σ** = **V** • **Λ** • **V**^{T}

**V** is an orthogonal matrix consisting of *n* orthogonal or even *orthonormal* eigenvectors of **Σ**. **Λ** is a diagonal matrix with real values.

It follows that

**Σ**^{-1} = **V** • **Λ**^{-1} • **V**^{T}

The first positive point with respect to our objective is that **V**‘s column vectors are **orthogonal eigenvectors** of **Σ**. Such vectors can be found for a general symmetric matrix by well established *numerical* methods. The other positive point is that **Λ** is diagonal and contains the respective positive eigenvalues *λ*_{i}. From LinAlg we know that all eigenvalues *λ*_{i} of a real symmetric and positive definite matrix are real and that all *λ*_{i} > 0 (see e.g. [4]). **Λ**^{1/2} contains the square roots of the eigenvalues on the diagonal. **Λ**^{-1} contains the values 1/*λ*_{i} on its diagonal. **Λ** actually represents **Σ** in a rotated coordinate system (see below).
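Numerically, NumPy’s `np.linalg.eigh` delivers exactly this decomposition for symmetric matrices. A minimal sketch with a made-up covariance matrix:

```python
import numpy as np

# A made-up symmetric, positive-definite covariance matrix
Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])

# eigh returns the eigenvalues (in ascending order) and orthonormal
# eigenvectors (as columns) of a symmetric matrix
lam, V = np.linalg.eigh(Sigma)
Lambda = np.diag(lam)

assert np.all(lam > 0)                         # all lambda_i > 0
assert np.allclose(V.T @ V, np.eye(2))         # V is orthonormal
assert np.allclose(V @ Lambda @ V.T, Sigma)    # Sigma = V Lambda V^T

# The inverse factorizes with the reciprocal eigenvalues 1/lambda_i
assert np.allclose(V @ np.diag(1.0 / lam) @ V.T, np.linalg.inv(Sigma))
```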

Note that we can normalize the eigenvectors by moving respective length factors into the eigenvalues. So, **V** can be chosen to be an **orthonormal matrix** (with ||**v**_{i}|| = 1). Then the eigenvectors of **Σ** can be regarded as unit vectors of a special Euclidean coordinate system. In addition the sign of the eigenvectors can always be chosen such that the determinant of **V** becomes +1. This is good, too, because then we can interpret **V** and its inverse as *rotation matrices* (see below).

It is easy to show that eigenvectors *y*_{e} of **Σ** also are eigenvectors of **Σ**^{-1}, but for the eigenvalues 1/*λ*_{i}.

As **Σ** has full rank *n*, **V** has full rank, too. However, **V** is **not** symmetric. (**M** isn’t either!) A spectral decomposition is a special case of a so called eigen-decomposition.

### Orthonormal matrices represent rotations

**Note:** Angles and scalar products between vectors **w**_{1}, **w**_{2} transformed by **V** are preserved due to the properties of orthogonal matrices:

( **V** • **w**_{1} )^{T} • ( **V** • **w**_{2} ) = **w**_{1}^{T} • **V**^{T} • **V** • **w**_{2} = **w**_{1}^{T} • **w**_{2}

And for a matrix **B**_{y} = **VBV**^{T} we find

( **V** • **w**_{1} )^{T} • **B**_{y} • ( **V** • **w**_{2} ) = **w**_{1}^{T} • **B** • **w**_{2}

The geometrical meaning is that an orthogonal matrix represents a *rotation* of vectors in an ECS by an angle *φ* around some axis given by a vector **r**.

However, an orthonormal matrix **O** with determinant +1 can also be interpreted such that it gives us the components of a vector in a new coordinate system ECS_{W} rotated in the opposite direction (-*φ*) against the original coordinate system ECS_{Y}. The elements of a matrix **B** transform during a transition from ECS_{Y} to ECS_{W} as **OBO**^{T}. The other way round, **O**^{-1} can be interpreted to give the coordinates of a given vector **y** in an ECS_{W} rotated by +*φ*.

**V**, in particular, represents a rotation of an ECS_{W}, whose axes were aligned with the orthogonal eigenvectors of **Σ**, onto ECS_{Y}. The inverse matrix **V**^{-1} thus determines the component values of vectors **y** in an ECS_{W} with axes parallel to these eigenvectors. Therefore, our matrix **Σ** = **V Λ V**^{T} is a representation of **Λ** in the rotated ECS_{Y}. Or, if you like to see it the other way round, **Λ** represents our **Σ** in ECS_{W}.

## The Mahalanobis distance in terms of spectral decomposition matrices

Let us combine our insights. We decide to choose a special **M** = **M**_{S} as indicated by the spectral decomposition

**M**_{S} = **V** • **Λ**^{1/2}

and find out how far we get with this. First we have

**M**_{S} • **M**_{S}^{T} = **V** • **Λ**^{1/2} • **Λ**^{1/2} • **V**^{T} = **V** • **Λ** • **V**^{T} = **Σ**

Let us write down the Mahalanobis distance for a vector **y** of a non-degenerate **Y**-MND:

*d*_{M}^{2}(**y**) = **y**^{T} • **Σ**^{-1} • **y** = **y**^{T} • **V** • **Λ**^{-1} • **V**^{T} • **y**

Thus, with **y** = **M**_{S} • **z**, we can fulfill an essential condition of our construction of a non-degenerate MND:

*d*_{M}^{2}(**y**) = **z**^{T} • **M**_{S}^{T} • **V** • **Λ**^{-1} • **V**^{T} • **M**_{S} • **z** = **z**^{T} • **z**

What does this all mean geometrically?
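These identities are easy to verify numerically. A minimal NumPy check (the covariance values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])        # made-up covariance matrix

lam, V = np.linalg.eigh(Sigma)
M_S = V @ np.diag(np.sqrt(lam))       # M_S = V Lambda^{1/2}

# M_S M_S^T reproduces Sigma
assert np.allclose(M_S @ M_S.T, Sigma)

# For y = M_S z the squared Mahalanobis distance reduces to z^T z
z = rng.normal(size=2)
y = M_S @ z
d2 = y @ np.linalg.inv(Sigma) @ y
assert np.isclose(d2, z @ z)
```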

## Recreation of the given N-MND from a Z-distribution

Let us first describe the creation of **Y** in ECS_{Y} for a given **Σ**, **Λ** and **V**. The elementary operation **M**_{S} • **z** to construct a MND obviously consists of **two** steps or operations:

**Step 1:** Pick a (spherically symmetric) **Z**-distribution of vectors and stretch all vector components by the square root of the respective positive eigenvalues of **Σ**. I.e. transform our **z**-vectors by

**w** = **Λ**^{1/2} • **z**

This obviously transforms the spheres of equal probability density of **Z** into ellipsoidal surfaces with the *main* axes of the ellipsoids being oriented along the ECS_{Y}-axes. These **w**-vectors can be interpreted as elements of a distribution given by a centered normal random vector **W** with independent Gaussian components and variances *λ*_{i}.

**Step 2:** Pick the vectors **w** of **W** and rotate them via an orthonormal **V** (defined by eigenvectors of **Σ** – arranged in the order of the respective eigenvalues in **Λ**). Choose the sign of the eigenvectors such that **V** becomes a rotation (det **V** = +1).
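The two steps can be sketched with NumPy as follows (the target covariance and sample size are made-up choices; the empirical covariance of the result should approximate **Σ**):

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[4.0, 1.2],
                  [1.2, 2.0]])        # made-up target covariance
lam, V = np.linalg.eigh(Sigma)

# A centered, standardized, spherically symmetric Z-sample
Z = rng.normal(size=(100_000, 2))

# Step 1: stretch the components by sqrt(lambda_i) -> W-distribution
W = Z * np.sqrt(lam)

# Step 2: rotate the w-vectors by V -> Y-distribution
Y = W @ V.T                           # each row is y = V w

# The empirical covariance approximates Sigma
assert np.allclose(np.cov(Y, rowvar=False), Sigma, atol=0.1)
```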

## Reverse order: From **Σ** to V-matrices and respective *W*- and *Z*-distributions


We now revert the whole process for the analysis of a given **Y** which seems to have the properties of a N-MND in ECS_{Y}. We follow three steps:

**Step A – determination of eigenvectors:** In ECS_{Y} we first determine the variance-covariance matrix **Σ** and its inverse (e.g. by numerical methods). We then calculate the *n* *orthonormal* eigenvectors of **Σ** and **Σ**^{-1}. Afterward we build a matrix **V** by using the eigenvectors as columns of this matrix. We choose the sign of the eigenvectors such that **V** defines a rotation (det **V** = +1). The eigenvectors define unit vectors along the axes of a new Euclidean coordinate system ECS_{W}. We organize the respective eigenvalues λ_{1}, λ_{2}, …, λ_{n} in a matrix **Λ** in the same order as we positioned the eigenvectors as columns in the matrix **V**.

**Step B – Rotation of the coordinate system:** We now rotate ECS_{Y} by **V** such that it coincides with a new ECS_{W}. The components of a vector **y** in ECS_{W} are equal to the components of the following vector **w** in ECS_{Y} (!):

**w** = **V**^{-1} • **y** = **V**^{T} • **y**

I.e., the inverse of **V**, i.e. **V**^{-1}, gives us the components of **y** in ECS_{W}.

If the distribution **Y** really were a N-MND, then **V**^{-1} would transform the contour ellipsoids of equal probability density into axis-parallel ellipsoids in ECS_{W}. We can see this via a transformation of the square of the Mahalanobis distance by **V**^{-1}:

*d*_{M}^{2}(**y**) = **y**^{T} • **V** • **Λ**^{-1} • **V**^{T} • **y** = ( **V**^{T} • **y** )^{T} • **Λ**^{-1} • ( **V**^{T} • **y** ) = **w**^{T} • **Λ**^{-1} • **w**

So, **Λ**^{-1} indeed represents **Σ**^{-1} in ECS_{W}. If **Y** really were a N-MND, the diagonal form would guarantee that the main axes of the transformed ellipsoidal hyper-surfaces were aligned with ECS_{W}‘s coordinate axes. Furthermore the variances of the “de-correlated” Gaussians for the components would just be given by the eigenvalues λ_{1}, λ_{2}, …, λ_{n}.

However, if **Y** did not have the properties of a MND, then we would not get axis-parallel ellipsoids in ECS_{W}. The difference from the theoretical ellipsoids is something that we can investigate numerically.

**Step C – Scale to get (or not get) a spherically symmetric distribution:** As soon as we have our **W**-distribution we can rescale it by applying **Λ**^{-1/2}. In the case of an original N-MND **Y** this would now give us a spherically symmetric distribution **Z**.
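Steps A to C can be sketched end-to-end with NumPy. This is a minimal illustration under made-up assumptions (we fabricate the sample from a known covariance, so we can check that the procedure de-correlates it):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fabricate a centered sample Y that (by construction) is a N-MND
Sigma_true = np.array([[4.0, 1.2],
                       [1.2, 2.0]])
Y = rng.multivariate_normal([0.0, 0.0], Sigma_true, size=100_000)

# Step A: estimate Sigma, compute orthonormal eigenvectors/eigenvalues
Sigma = np.cov(Y, rowvar=False)
lam, V = np.linalg.eigh(Sigma)
if np.linalg.det(V) < 0:
    V[:, 0] = -V[:, 0]            # flip a sign so that det(V) = +1

# Step B: rotate into ECS_W via w = V^T y (applied to the rows of Y)
W = Y @ V

# Step C: rescale by Lambda^{-1/2} to obtain a standardized Z
Z = W / np.sqrt(lam)

# In ECS_W the components are de-correlated with variances lambda_i;
# after the rescaling the covariance is close to the identity
assert np.allclose(np.cov(W, rowvar=False), np.diag(lam), atol=0.1)
assert np.allclose(np.cov(Z, rowvar=False), np.eye(2), atol=0.05)
```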

*Y***.**

## A comment on the meaning of rotated coordinate systems in ML-contexts

The coordinate system we choose to work with in a ML context is typically given by some predefined set of variables – either corresponding directly to the properties of objects we work with or to already abstract orthogonal coordinates of the latent space of an ML algorithm (like e.g. an Autoencoder). When you move to a rotated ECS you should be very clear about one thing:

The new coordinates will be * abstract* ones. They (most often) have no direct interpretation in terms of the original properties of the objects we apply an ML-algorithm to.

The correlation of the original (natural) properties does **not** disappear by some magic when we go over to some *abstract* coordinates via a rotation of the ECS. And: Even in a coordinate system with axis-parallel ellipsoidal contours of a N-MND the ratios of the lengths of the main axes of the ellipsoids would have fixed values. These ratios do not disappear via a rotation.

## Conclusion

In this post we have seen that we can use the variance-covariance matrix of a N-MND to determine a coordinate system in which the main axes of the ellipsoidal contour hyper-surfaces align with the coordinate axes. Did this remind you of a method often used in the context of classic ML-methods? Probably it did: You may have thought of PCA. However, before we get there, I want to present a more general definition of a MND in the next post. This will also bring us closer to the topic of how to include and justify degenerate MNDs.