
Bivariate Normal Distribution – Mahalanobis distance and contour ellipses

I continue with my posts on Bivariate Normal Distributions [BVDs]. In this post we consider the exponent of a BVD’s probability density function [pdf]. This function is governed by a central matrix Σ-1, the inverse of the variance-covariance matrix of the BVD’s random vector. We define the so-called Mahalanobis distance dm for BVD vectors. A constant value of the Mahalanobis distance defines a contour line of the probability density via a quadratic form. The resulting ellipses can be constructed using results of the eigendecomposition of the covariance matrix Σ or of its inverse Σ-1.

Related posts:

Reminder: Density function of a bivariate normal distribution

I summarize some results of previous posts listed above. A given statistical data distribution can be centered in a Cartesian Coordinate System [CCS] by choosing an appropriate location of the origin: The expectation values for the marginal distributions are thereby set to zero. This corresponds to a simple translation. So, without losing much generality, I will only discuss centered BVDs below.

The probability density function g2c(x,y) of a (centered) BVD and a respective random vector V = (X,Y)T (composed of two statistical variables X and Y) can be written as:

\[ g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( – {1\over2} {1 \over 1\,-\, \rho^2 } \left[ {x^2 \over \sigma_x^2} \,-\, 2 \rho \, {x \over \sigma_x} {y \over \sigma_y} \,+\, {y^2 \over \sigma_y^2} \, \right] \right) \,. \tag{1} \]

ρ is the so-called Pearson correlation coefficient of the statistical variables X and Y (see below). σx and σy are the standard deviations of the X- and Y-distributions. Using the following notations

\[ \pmb{V} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad \mbox{concrete values}: \, \pmb{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \pmb{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \pmb{v}_{\mu} = \begin{pmatrix} x – \mu_x \\ y – \mu_y \end{pmatrix} \]

and inserting the symmetric matrix Σ-1

\[ \pmb{\Sigma}^{-1} \,=\, {1 \over \sigma_x^2\, \sigma_y^2\, \left( 1\,-\, \rho^2\right) } \, \begin{pmatrix} \sigma_y^2 &-\rho\, \sigma_x\sigma_y \\ -\rho\, \sigma_x\sigma_y & \sigma_x^2 \end{pmatrix}, \tag{2} \]

we can rewrite the density of a centered BVD ( μ = 0) in the form

\[ \mbox{centered CCS :} \quad g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( – {1\over2} \, {\pmb{v}}^{\operatorname{T}} \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, \right) \,. \tag{3} \]

For the sake of completeness, here is the formula for μ ≠ 0 :

\[ \mbox{general CCS :} \quad g_2(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( – {1\over2} \, {\pmb{v}_{\mu}}^{\operatorname{T}} \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}_{\mu}} \, \right) \,. \tag{4} \]

T symbolizes the transposition operation. The inverse Σ of Σ-1 is the variance-covariance matrix, with ρ coupling X and Y :

\[ \pmb{\Sigma} \,=\, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}, \quad \pmb{\Sigma} \bullet \pmb{\Sigma}^{-1} = \mathbf I_n \,. \tag{5} \]
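The relation Σ • Σ-1 = I can easily be verified numerically, e.g. in Python with NumPy. The following sketch (with hypothetical values for σx, σy and ρ, not taken from the post) builds Σ according to eq. (5) and its closed-form inverse according to eq. (2):

```python
import numpy as np

# Hypothetical example parameters
sx, sy, rho = 2.0, 1.0, 0.6

# Covariance matrix Sigma, eq. (5)
Sigma = np.array([[sx**2,     rho*sx*sy],
                  [rho*sx*sy, sy**2    ]])

# Closed-form inverse, eq. (2)
Sigma_inv = (1.0 / (sx**2 * sy**2 * (1.0 - rho**2))) * np.array(
    [[ sy**2,      -rho*sx*sy],
     [-rho*sx*sy,   sx**2    ]])

# Sigma . Sigma^-1 must give the identity matrix
print(np.allclose(Sigma @ Sigma_inv, np.eye(2)))            # True
print(np.allclose(Sigma_inv, np.linalg.inv(Sigma)))         # True
```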

We require that Σ and Σ-1 are invertible matrices to avoid a discussion of so-called degenerate cases (where the ellipse collapses to a 1-dimensional object). For ρ one can show (see here) that it really is related to the covariance of the distribution:

\[ \rho \,=\, { \operatorname{cov} (X,Y) \over \sigma_x\,\sigma_y} \,. \tag{6} \]

Therefore, ρ indeed is a coefficient characterizing the correlation of the X and Y data.

Note that both matrices Σ and Σ-1 are symmetric and invertible. Being symmetric, each of them equals its own transpose.

A notation often used to describe a two-dimensional normal distribution is

\[ \pmb{V}_B \,=\, \mathcal{N} \left[ \pmb{\mu},\, \pmb{\Sigma} \right] \,=\, \mathcal{N} \left[ \begin{pmatrix} \mu_x \\ \mu_y\end{pmatrix}, \, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \right] \,. \tag{7} \]

Note that the exponent of g2c(x,y) depends only on the inverse Σ-1 of Σ.

Mahalanobis distance

The scalar dm, whose square appears (up to a factor -1/2) in the exponent of a BVD’s pdf,

\[ \begin{align} \left(d_m\right)^2 \,:=\, {\pmb{v}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \,, \\[10pt] d_m \,:=\, \sqrt{ \, {\pmb{v}}^{T^{\phantom{A}}} \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} } \,. \tag{8} \end{align} \]

is also called the Mahalanobis distance. It reminds us strongly of a norm.

Actually, it measures a kind of “distance” of a point (= endpoint of a vector) in the BVD-distribution from its center. It is clear that setting dm = const. means g2c(x,y) = const., and thus defines the data of a contour line of constant probability density.
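As a minimal numerical illustration (with hypothetical parameter values; the helper names mahalanobis and g2c below are my own, not from a library), one can check that two vectors with the same Mahalanobis distance indeed have the same probability density:

```python
import numpy as np

# Hypothetical example parameters
sx, sy, rho = 2.0, 1.0, 0.6
Sigma = np.array([[sx**2, rho*sx*sy],
                  [rho*sx*sy, sy**2]])
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis(v, Sigma_inv):
    """d_m = sqrt(v^T . Sigma^-1 . v), eq. (8), for a centered BVD."""
    v = np.asarray(v, dtype=float)
    return np.sqrt(v @ Sigma_inv @ v)

def g2c(v, Sigma, Sigma_inv):
    """Centered BVD density, eq. (3); 2 pi sx sy sqrt(1-rho^2) = 2 pi sqrt(det Sigma)."""
    return np.exp(-0.5 * (v @ Sigma_inv @ v)) / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma)))

v1 = np.array([1.0, 0.5])
d1 = mahalanobis(v1, Sigma_inv)

# A second point with the same Mahalanobis distance: map a z-vector of
# length d1 with a matrix M fulfilling M . M^T = Sigma (cf. eq. (14) below)
M = np.linalg.cholesky(Sigma)
v2 = M @ np.array([0.0, d1])

print(np.isclose(mahalanobis(v2, Sigma_inv), d1))                        # True
print(np.isclose(g2c(v1, Sigma, Sigma_inv), g2c(v2, Sigma, Sigma_inv)))  # True
```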

I have shown in detail elsewhere that an equation

\[ \left(d_m\right) ^2 \,:=\, {\pmb{v}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, = \, const. \tag{9} \]

(for a symmetric and invertible matrix Σ-1) gives us vectors with end-points on the border of a centered ellipse. (dm)2 = const. defines a so-called quadratic form for conic sections – in our case for an ellipse. See the posts quoted above.

Important note: In the quoted posts the ellipse’s half-axes were unfortunately named σ1 and σ2. They must not be confused with the standard deviations named σx/y used in the present post.

Note: When I refer to ellipses below, I always mean its border line and not its area contents – if not stated otherwise.

Ellipses rotated against the coordinate axes

In the posts named above you will find how the coefficients of a general symmetric and real-valued (2×2)-matrix Aq in the following equation for 2-dimensional vectors v

\[ {\pmb{v}}^T \bullet \, \operatorname{\pmb{A}}_q \bullet {\pmb{v}} \, = \, 1 \,, \tag{10} \]
\[ \pmb{\operatorname{A}}_q \:=\: \begin{pmatrix} \alpha & \beta / 2 \\ \beta / 2 & \gamma \end{pmatrix} \tag{11} \]

are related to properties of a corresponding ellipse. Regarding the vector components of v, Aq (with constant elements) defines a general quadratic form – characteristic of an ellipse. Note that in the sense of the BVD discussion above and interpreting Σ-1 as an example for a matrix Aq, eq. (10) corresponds to

\[ {\pmb{v}}^T \bullet \, \operatorname{\pmb{A}}_q \bullet {\pmb{v}} \, = \, d_m^2 \,=\, 1 \,. \tag{12} \]

We first focus on this special value of dm and give a generalization later on.

In the posts quoted above, I have in particular shown why and how the directions of the main axes of the ellipse are given by the directions of the eigenvectors of the coupling matrix Aq (here of Σ-1). The lengths of the half-axes are given by the reciprocals of the square roots of the eigenvalues. See below.

Geometric interpretation

In the posts on ellipses mentioned above, I have shown that a general ellipse defined by Aq can be created from

  • scaling the orthogonal half-axes of a unit circle (centered in a CCS) along the coordinate axes
  • and afterward rotating the resulting figure (and its constituting position vectors).

How can we relate this to Σ-1 ?

Well, in a previous post I have shown that a centered BVD can be created by applying a linear transformation (in the form of a matrix M) to a centered random vector Z = (Z1, Z2)T . The components Z1, Z2 are random variables representing two independent standardized Gaussian distributions. In a CCS, concrete vectors z of Z (and their end-points) are distributed according to the probability density function

\[ g_{\small Z}(\pmb{z}, \pmb{0}, \pmb{\operatorname{I}} ) \, = \, {1 \over (2\, \pi) } \, {\large e}^{ – \, {\Large 1 \over \Large 2} \left( {\Large \pmb{z}^T} \, \bullet \, {\Large \pmb{z}} \right) } \, = \, {1 \over (2\, \pi) } \, {\large e}^{ – \, {\Large 1 \over \Large 2} \left( {\Large \pmb{z}^T} \,\bullet\, {\Large \operatorname{\pmb{I}}} \,\bullet\, {\Large \pmb{z}} \right) } \,. \tag{13} \]

The symmetric matrices Σ and Σ-1 of our BVD incorporate such a linear transformation matrix M in the sense of

\[ \operatorname{\pmb{\Sigma}} \:=\: \operatorname{\pmb{M}} \bullet \, \operatorname{\pmb{M}}^{\operatorname{T}} \,. \tag{14} \]

M maps Z to a distribution V whose vectors follow the pdf of a centered BVD:

\[ \begin{align} \pmb{Z} \, &=\, \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} , \quad \pmb{V} \, =\, \begin{pmatrix} X \\ Y \end{pmatrix}, \\[10pt] \pmb{V} \, &= \, \operatorname{\pmb{M}} \bullet \, \pmb{Z}, \, \quad \pmb{Z} \,=\, \operatorname{\pmb{M}}^{-1} \bullet \pmb{V} \,, \quad \left| \operatorname{\pmb{M}} \right| \, \ne \, 0 \,, \\[10pt] g_v(\pmb{v}) \:&=\: g_{2c} (x, y) \,. \end{align} \tag{15}\]
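A short sketch of this mapping (with hypothetical parameter values; the Cholesky factor used below is just one possible choice for a matrix M fulfilling eq. (14)):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example parameters
sx, sy, rho = 2.0, 1.0, 0.6
Sigma = np.array([[sx**2, rho*sx*sy],
                  [rho*sx*sy, sy**2]])

# One possible choice for M with M . M^T = Sigma, eq. (14): the Cholesky factor
M = np.linalg.cholesky(Sigma)

# Draw independent standard-normal components Z1, Z2 and map them: v = M . z, eq. (15)
Z = rng.standard_normal((2, 100_000))
V = M @ Z

# The sample covariance matrix of the mapped vectors approximates Sigma
print(np.round(np.cov(V), 2))
```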

So, we map vectors linearly and thereby also their probability density function. Note that, following the line of argumentation for the linear transformation in my last post, end-points of z-vectors of unit length are obviously mapped to end-points of vectors v with dm = 1.

\[ \begin{align} \pmb{z} \:\: & \rightarrow \:\: \pmb{v} \\[10pt] \sqrt{\,\pmb{z}^{\operatorname{T}^{\phantom{A}}} \bullet\, \pmb{z} \,} \:=\: 1 \:\: &\rightarrow \:\: d_m\left(\pmb{v}\right) \:=\: 1 \,. \end{align} \]

A unit circle is mapped to an ellipse with dm = 1. In general and for scaled circles given by z-vectors, we have :

\[ \begin{align} \operatorname{\pmb{M}} \,: \,\quad \quad \pmb{z} \quad & \rightarrow \quad \pmb{v} \,=\, \operatorname{\pmb{M}} \bullet \, \pmb{z} \, \\[10pt] \Rightarrow \quad \|\pmb{z}\|^2 \,=\, \pmb{z}^{\operatorname{T}^{\phantom{A}}} \bullet\, \pmb{z} \:=\: r^2 \, \quad &\rightarrow \quad d_m^2(\pmb{v}) \:=\: {\pmb{v}}^{\operatorname{T}^{\phantom{A}}} \bullet \, \pmb{\Sigma}^{-1} \bullet \pmb{v} \:=\: r^2 \,. \tag{16} \end{align}\]

But, how do we understand the scaling and rotation occurring in the course of this mapping?

Eigendecomposition of Σ as a key for a geometric understanding of BVDs

For a symmetric and invertible matrix, as Aq or Σ-1, we can always find a decomposition which can be interpreted as a combination of a stretching or scaling operation and a subsequent rotation. This corresponds to the facts

  • that a symmetric invertible matrix Aq has real eigenvalues and perpendicular eigenvectors,
  • that a symmetric invertible matrix such as Aq or Σ-1 has an eigendecomposition of the form Aq = Q • Λ • QT, with Q being an orthogonal matrix (e.g. a rotation matrix parameterized by an angle) fulfilling QT = Q-1. The columns of Q are given by the eigenvectors of Aq. Λ, instead, is a diagonal matrix with the eigenvalues λu and λd of Aq on its diagonal. Note that a transformation by orthogonal matrices does not change the eigenvalues of a matrix.
\[ \pmb{\operatorname{A}_q} \:=\: \begin{pmatrix} \alpha & {1 \over 2}\, \beta \\ {1 \over 2}\, \beta & \gamma \end{pmatrix} \:=\: \pmb{\operatorname{Q}} \, \circ \, \begin{pmatrix} \lambda_u & 0 \\ 0 & \lambda_d \end{pmatrix} \, \circ \, \pmb{\operatorname{Q}}^{\operatorname{T}} \,. \tag{17} \]

The eigenvalues directly correspond to the two perpendicular main half-axes rx and ry of the (rotated) ellipse:

\[ \pmb{\operatorname{A}_q} \:=\: \pmb{\operatorname{Q}} \, \circ \, \begin{pmatrix} {1 / r_x^2} & 0 \\ 0 & {1 / r_y^2} \end{pmatrix} \, \circ \, \pmb{\operatorname{Q}}^{\operatorname{T}} \,. \tag{18} \]

See the articles on ellipses and matrices quoted above for more information.

This means that a general symmetric matrix and a relation like eq. (9) define a centered ellipse, but with its main axes rotated against the coordinate axes of the CCS. The lengths of the ellipse’s half-axes are given by the eigenvalues as

\[ \begin{align} r_x \:&=\: {1 \over \sqrt{\lambda_u}} \,, \\[10pt] r_y \:&=\: {1 \over \sqrt{\lambda_d}} \,. \end{align} \tag{19} \]

The directions of the half-axes of the ellipse are given by the perpendicular eigenvectors. These eigenvectors depend only on the parameters of Aq.
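Numerically, the eigendecomposition of eq. (17) and the half-axes of eq. (19) can be obtained directly, e.g. with NumPy's eigh for symmetric matrices (again with hypothetical parameter values):

```python
import numpy as np

# Hypothetical example parameters
sx, sy, rho = 2.0, 1.0, 0.6
Sigma = np.array([[sx**2, rho*sx*sy],
                  [rho*sx*sy, sy**2]])
Sigma_inv = np.linalg.inv(Sigma)

# Eigendecomposition A_q = Q Lambda Q^T, eq. (17); eigh returns ascending eigenvalues
lam, Q = np.linalg.eigh(Sigma_inv)
print(np.allclose(Q @ np.diag(lam) @ Q.T, Sigma_inv))   # True

# Half-axis lengths for d_m = 1, eq. (19): smaller eigenvalue -> longer half-axis
half_axes = 1.0 / np.sqrt(lam)
print(half_axes[0] > half_axes[1])                      # True
```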

Identifying Σ-1 with the central matrix of a quadratic form for a contour ellipse

To get information about a BVD’s contour lines, we now identify the general matrix Aq for an ellipse as the BVD’s inverse covariance matrix Σ-1 :

\[ \begin{align} \pmb{A}_q \:&=\: \begin{pmatrix} \alpha & {1 \over 2}\, \beta \\ {1 \over 2}\, \beta & \gamma \end{pmatrix} \\[10pt] =\: \pmb{\Sigma}^{-1} \:&=\: {1 \over\left( 1\,-\, \rho^2\right) } \, \begin{pmatrix} {1 \,/\, \sigma_x^2} & -\,\rho \,/\, \left( \sigma_x \, \sigma_y\right) \\ -\,\rho \,/\, \left( \sigma_x \, \sigma_y \right) & {1 \,/\, \sigma_y^2} \end{pmatrix} \,. \end{align} \tag{20} \]

Its inverse Σ is a bit simpler (see above and here):

\[ \begin{align} \pmb{A}_q^{-1} \:&=\: \begin{pmatrix} a & {1\over 2 }\, b \\ {1\over 2 }\, b & c \end{pmatrix} \\[10pt] \:=\: \pmb{\Sigma} \:&=\: \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \,. \end{align} \tag{21} \]

Note: The eigenvalues λ1 and λ2 of Σ-1 are the reciprocals of the eigenvalues ξ1 and ξ2 of Σ.

\[ \begin{align} \operatorname{\pmb{A}}_q \,=\, \operatorname{\pmb{\Sigma}}^{-1} \:&\Rightarrow \: \mbox{eigenvalues} \: : \quad \,\, \lambda_1, \quad \quad \quad \lambda_2 \,, \\[10pt] \operatorname{\pmb{A}}_q^{-1} \,=\, \operatorname{\pmb{\Sigma}} \:&\Rightarrow \: \mbox{eigenvalues} \: : \quad \xi_1 = {1 \over \lambda_1} , \:\: \xi_2 = {1 \over \lambda_2} \,. \end{align} \tag{22} \]

Eigenvalues of Σ, Σ-1 and the lengths of the ellipse’s half-axes

In another post of this blog

I have already derived formulas for the eigenvalues of symmetric, real-valued matrices. The formulas there give us the eigenvalues of Σ-1 in terms of

\[ \begin{align} \alpha \,& =\, {1 \over (1 \,-\, \rho^2 ) } \, {1\over \sigma_x^2} \,, \\[10pt] \beta \,& =\, -\, {2\, \rho \over (1 \,-\, \rho^2 ) } \, {1 \over \sigma_x\, \sigma_y } \,, \\[10pt] \gamma \,&= \, {1 \over (1 \,-\, \rho^2 )} \, {1\over \sigma_y^2} \,, \end{align} \tag{23} \]

as

\[ \begin{align} \lambda_1 \:&=\: {1 \over a_L^2} \:=\: {1\over 2} \, \left[ \left(\, \alpha + \gamma \right) \, – \, \left[\, \beta^2 \,+\, \left(\alpha \,-\, \gamma \right)^2 \, \right]^{1/2} \,\right] \,, \\[10pt] \lambda_2 \:&=\: {1 \over a_S^2} \:=\: {1\over 2} \, \left[ \left(\, \alpha + \gamma \right) \, + \, \left[\, \beta^2 \,+\, \left(\alpha \,-\, \gamma \right)^2 \, \right]^{1/2} \,\right] \,. \end{align} \tag{24} \]

aL and aS are the longer and the shorter of the half-axes, respectively. We have to check

  1. whether aL = rx , aS = ry    <=>   rx > ry    <=>   λ1 = λu , λ2 = λd   <=>   σx > σy
  2. or            aS = rx , aL = ry   <=>   rx < ry   <=>   λ1 = λd , λ2 = λu   <=>   σx < σy .

Note:

  • λ1 always refers to the longer half-axis.
  • One can show that which of the two cases applies is determined by whether σx > σy or σx < σy.
  • The half-axes aL and aS depend on all parameters ρ, σx and σy.

Unfortunately, the given expressions cannot be made much simpler. For Machine Learning this obstacle does not matter much, as for most practical purposes we would use numerical values anyway.
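Indeed, a few lines of NumPy suffice to evaluate eqs. (23) and (24). The sketch below (hypothetical parameter values) also cross-checks the closed-form eigenvalues against a numerical eigensolver:

```python
import numpy as np

# Hypothetical example parameters
sx, sy, rho = 2.0, 1.0, 0.6
f = 1.0 / (1.0 - rho**2)

# Coefficients of Sigma^-1, eq. (23); note that beta carries the factor f as well
alpha = f / sx**2
gamma = f / sy**2
beta  = -2.0 * rho * f / (sx * sy)

# Closed-form eigenvalues, eq. (24)
root = np.sqrt(beta**2 + (alpha - gamma)**2)
lam1 = 0.5 * ((alpha + gamma) - root)    # belongs to the longer half-axis a_L
lam2 = 0.5 * ((alpha + gamma) + root)

# Cross-check with a numerical eigensolver (eigvalsh: ascending order)
Sigma = np.array([[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]])
print(np.allclose([lam1, lam2], np.linalg.eigvalsh(np.linalg.inv(Sigma))))  # True
```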

The angle θ by which our ellipse is rotated against the x-coordinate axis of our CCS was found to be given by

\[ \sin \left(2\,\theta\right) \:=\: \,\mp\, { \beta \over \left[ \,\beta^2 \,+\, \left(\alpha \,-\,\gamma\right)^2\,\right]^{1/2} } \,. \tag{25} \]

The minus-sign holds for (σx > σy), the plus sign for (σx < σy).

There are still multiple solutions. By a more detailed analysis (or by trial and error 🙂 ) one can show that with

\[ \psi \:=\: {1 \over 2} \, \arcsin \left(\, { -\, \beta \over \left[ \,\beta^2 \,+\, \left(\alpha \,-\,\gamma\right)^2\,\right]^{1/2} } \right) \,, \tag{26} \]

we should choose the following options for specific conditions

\[ \begin{align} \sigma_x \, & \ge\, \sigma_y \, \,\land\, \rho \,\gt\, 0 \,\,: \quad \theta = \psi \,, \quad \mbox{with} \quad \psi \gt 0 \,, \\[10pt] \sigma_x \, & \ge\, \sigma_y \, \,\land\, \rho \,\lt\, 0 \,\,: \quad \theta = \psi \,, \quad \mbox{with} \quad \psi \lt 0 \,, \\[14pt] \sigma_x \, & \lt\, \sigma_y \,\land\, \rho \,\gt\, 0 \,\,: \quad \theta = {1 \over 2} \, \pi \,- \, \psi \,, \quad \mbox{with} \quad \psi \gt 0 \,, \\[10pt] \sigma_x \, & \lt\, \sigma_y \,\land\, \rho \,\lt \, 0 \,\,: \quad \theta = {1\over 2} \, \pi \,- \psi \,, \quad \mbox{with} \quad \psi \lt 0 \,. \end{align} \tag{27} \]
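A small sketch of the case rules in eqs. (26) and (27) (hypothetical parameter values with σx > σy and ρ > 0), cross-checked against the direction of the eigenvector belonging to the smaller eigenvalue, i.e. to the longer half-axis:

```python
import numpy as np

# Hypothetical example parameters with sigma_x > sigma_y and rho > 0
sx, sy, rho = 2.0, 1.0, 0.6
f = 1.0 / (1.0 - rho**2)
alpha, gamma = f / sx**2, f / sy**2
beta = -2.0 * rho * f / (sx * sy)

psi = 0.5 * np.arcsin(-beta / np.hypot(beta, alpha - gamma))   # eq. (26)
theta = psi if sx >= sy else 0.5*np.pi - psi                   # case rules, eq. (27)

# Cross-check: angle of the eigenvector of Sigma^-1 for the smaller eigenvalue
Sigma = np.array([[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]])
lam, Q = np.linalg.eigh(np.linalg.inv(Sigma))   # eigenvalues in ascending order
theta_ev = np.arctan2(Q[1, 0], Q[0, 0]) % np.pi
print(np.isclose(theta % np.pi, theta_ev))      # True
```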

In an improved version of this post I will provide a thorough derivation. If you are interested, please visit this post again in a few days.

We now have found a way to construct an ellipse for a Mahalanobis distance dm = 1:

  1. We first build an ellipse whose half-axes ae = (1/λ1)1/2 and be = (1/λ2)1/2 are parallel to the CCS’s x-axis and y-axis, respectively.
  2. Then we rotate the ellipse by the angle θ in counter-clockwise direction. The rotated ae then encloses the angle θ with the x-axis of the CCS.

But, how to construct the ellipses for larger values of dm?

Other constant values of dm?

One can derive something analogous to eqs. (24) and (25) for any value of dm. The expressions in the exponents are quadratic in the vectors. Doubling a vector’s length to ||z|| = 2 gives a squared value ||z||2 = 4 in the pdfs’ exponents, but dm itself simply changes by the same factor to dm = 2. We found this already in eq. (16):

\[ \Rightarrow \quad \|\pmb{z}\|^2 \,=\, \pmb{z}^{\operatorname{T}^{\phantom{A}}} \bullet\, \pmb{z} \:=\: r^2 \, \quad \rightarrow \quad d_m^2(\pmb{v}) \:=\: {\pmb{v}}^{\operatorname{T}^{\phantom{A}}} \bullet \, \pmb{\Sigma}^{-1} \bullet \pmb{v} \:=\: r^2 \,. \tag{28} \]

Bringing the right-hand side back into a form with 1 as the target value corresponds to scaling the vector v by a factor 1/r. [Another point of view would be to regard Aq (= Σ-1) as changeable. Then all matrix coefficients, and also the eigenvalues, would be scaled by 1/r2.] We end up with scaling the half-axes of the new ellipse by a factor r. Original z-vectors on a circle with radius r get mapped to an ellipse with dm = r.

So, constructing a contour ellipse for vectors with dm(v) = r just requires starting with an axis-parallel ellipse whose half-axes are scaled by r

\[ \begin{align} &d_m(\pmb{v}) \:=\: \sqrt{\, {\pmb{v}}^{\operatorname{T}^{\phantom{A}}} \bullet \, \pmb{\Sigma}^{-1} \bullet \pmb{v} \,} \:=\: r \quad\Rightarrow \\[10pt] & r_1\:=\: r * {1 \over \sqrt{\lambda_1}} \,, \quad r_2 \:=\: r * {1 \over \sqrt{\lambda_2}} \,. \end{align} \tag{29} \]

Again, we have to determine how the assignment of the longer half-axis (r1 or r2) relates to the cases (σx > σy) or (σx < σy). We will deepen the understanding of these cases in forthcoming posts.
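The construction according to eq. (29) can be sketched as follows (hypothetical parameter values): build an axis-parallel ellipse with half-axes scaled by r, rotate it with the eigenvector matrix, and verify that all resulting points have dm = r:

```python
import numpy as np

# Hypothetical example parameters
sx, sy, rho = 2.0, 1.0, 0.6
Sigma = np.array([[sx**2, rho*sx*sy],
                  [rho*sx*sy, sy**2]])
Sigma_inv = np.linalg.inv(Sigma)

# Eigendecomposition: Sigma_inv = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(Sigma_inv)

r = 2.0                                   # target Mahalanobis distance d_m = r
phi = np.linspace(0.0, 2.0*np.pi, 200)

# Axis-parallel ellipse with half-axes r/sqrt(lam_i), eq. (29), then rotation by Q
pts = Q @ np.vstack([(r / np.sqrt(lam[0])) * np.cos(phi),
                     (r / np.sqrt(lam[1])) * np.sin(phi)])

# Every point on the contour has Mahalanobis distance r
d = np.sqrt(np.sum(pts * (Sigma_inv @ pts), axis=0))
print(np.allclose(d, r))  # True
```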

Note:

  • The angle θ remains unchanged! Despite the scaling, it depends only on the coefficients of the (unchanged) matrix.
  • This also shows that contour ellipses for different constant values of the probability density of a BVD are concentric ellipses!

(If you had instead changed the matrix coefficients, the scaling factors would cancel each other in the fraction of eq. (25).) We will show with the help of numerical calculations and plots in the next post that this gives the correct results.

Special case: Standardized distributions σx = 1, σy = 1

Let us assume that someone has given us data of a (approximate) bivariate distribution in standardized form, such that the standard deviations of the marginals fulfill

\[ \sigma_x \,=\, 1, \quad \sigma_y = 1 \,. \]

Note that the standardization corresponds to a scaling of original x and y values to x’ = x/σx and y’ = y/σy. We assume that we also have determined the Pearson correlation coefficient ρ for our data. Then we get a rather interesting and helpful result:

\[ \begin{align} \lambda_1 \:&=\: {1 \over a_e^2} \:=\: {1\over 2} \, * {1 \over \left(\, 1 \,-\,\rho^2\,\right) } \, \left[ \, 2 \, – \, 2 * \rho \,\right] \:=\: {1 \over 1 \,+\, \rho} \,, \tag{30} \\[10pt] \lambda_2 \:&=\: {1 \over b_e^2} \:=\: {1\over 2} \, * {1 \over \left(\, 1 \,-\,\rho^2\,\right) } \, \left[ \, 2 \,+\, 2*\rho \,\right] \:=\: {1 \over 1 \,-\, \rho} \quad \Rightarrow \tag{31}\\[12pt] a_e \:&=\: \sqrt{ \, 1 \,+\, \rho \, } \,, \quad b_e \:=\: \sqrt{ \, 1 \,-\, \rho \, } \,, \quad \theta \:=\: \pi / 4\,. \tag{32} \end{align} \]
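A quick numerical check of eqs. (30) to (32) for a hypothetical ρ > 0 (for ρ < 0 the roles of the two half-axes simply swap):

```python
import numpy as np

rho = 0.6                                   # hypothetical correlation, rho > 0
# Standardized case: sigma_x = sigma_y = 1
Sigma_inv = np.linalg.inv(np.array([[1.0, rho],
                                    [rho, 1.0]]))

lam = np.linalg.eigvalsh(Sigma_inv)         # ascending: 1/(1+rho), 1/(1-rho)
half_axes = 1.0 / np.sqrt(lam)              # a_e = sqrt(1+rho), b_e = sqrt(1-rho)

print(np.allclose(lam, [1.0/(1.0 + rho), 1.0/(1.0 - rho)]))               # True
print(np.allclose(half_axes, [np.sqrt(1.0 + rho), np.sqrt(1.0 - rho)]))   # True
```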

This, actually, gives rise to another method for the construction of contour ellipses, which I will describe in one of the next posts.

Conclusion

In this post we have seen that constant values of the Mahalanobis distance dm of a BVD define concentric ellipses, which at the same time are contour lines of the BVD’s probability density.

Such contour ellipses can be constructed by calculating their perpendicular half-axes as reciprocals of the square roots of the eigenvalues of the inverse covariance matrix and by determining the inclination angle θ of the half-axes against the axes of the Cartesian coordinate system we work with. θ is also given by elements of the BVD’s covariance matrix.

In the next posts I will first discuss a simple method to parameterize contour ellipses of a BVD by the Mahalanobis distance and an angle 0 ≤ φ ≤ 2π. In addition we will find yet another way to construct contour ellipses.