Bivariate Normal Distribution – integrated probability up to a given Mahalanobis distance, the Chi-squared distribution and confidence ellipses

In previous posts of this blog we have discussed the general form of the probability density function [pdf] of a Bivariate Normal Distribution [BVD]. In this post we consider the integral over a BVD’s pdf up to a defined value of the Mahalanobis Distance. A given value of the latter defines an elliptic contour line of constant probability density. With our integration, we will evaluate the cumulative probability of finding a BVD-vector pointing into the area enclosed by the elliptic contour.

The integral eventually reduces to an analytic expression corresponding to the cumulative distribution function of the famous chi-squared distribution for two degrees of freedom. We can invert the integral function to find the Mahalanobis distance for a given cumulative probability. In later posts on BVDs our results will help us to (numerically) construct confidence ellipses of (approximate) BVDs.

Previous findings

I summarize some of the results derived in previous posts. For the sake of simplicity, I only discuss BVDs centered in a Cartesian coordinate system [CCS]. The probability density function g2c(x,y) of such a centered BVD and a respective random vector V = (X,Y)T (composed of two statistical variables X and Y) can be written as:

\[ g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( -\, {1\over2} {1 \over 1\,-\, \rho^2 } \left[ {x^2 \over \sigma_x^2} \,-\, 2 \rho \, {x \over \sigma_x} {y \over \sigma_y} \,+\, {y^2 \over \sigma_y^2} \, \right] \right) \,. \tag{1} \]

A pair (x,y) marks a point in the 2-dimensional space. Equivalently, the tuple gives us the components of a position vector reaching from the CCS-origin to this point. g2c(x,y) is normalized such that its integral over the whole 2-dimensional (x,y)-space (ℝ2) equals 1.

ρ is the Pearson correlation coefficient of the statistical variables X and Y (see below). σx and σy are the standard deviations of the X- and Y-distributions.

Using the following notations

\[ \pmb{V} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad \mbox{concrete values}: \, \pmb{v} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \pmb{\mu} = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \quad \pmb{v}_{\mu} = \begin{pmatrix} x - \mu_x \\ y - \mu_y \end{pmatrix} \]

and the symmetric matrix Σ-1

\[ \pmb{\Sigma}^{-1} \,=\, {1 \over \sigma_x^2\, \sigma_y^2\, \left( 1\,-\, \rho^2\right) } \, \begin{pmatrix} \sigma_y^2 &-\rho\, \sigma_x\sigma_y \\ -\rho\, \sigma_x\sigma_y & \sigma_x^2 \end{pmatrix}, \tag{2} \]

we can rewrite the pdf of a centered BVD (μ = 0) in the form

\[ \mbox{centered CCS :} \quad g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sigma_x \, \sigma_y } {1 \over { \sqrt{\, 1\,-\, \rho^2}} } \operatorname{exp} \left( -\, {1\over2} \, {\pmb{v}}^{\operatorname{T}} \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, \right) \,. \tag{3} \]

T symbolizes the transposition operation. The inverse Σ of Σ-1 is the variance-covariance matrix, with ρ describing the correlation of X and Y:

\[ \pmb{\Sigma} \,=\, \begin{pmatrix} \sigma_x^2 &\rho\, \sigma_x\sigma_y \\ \rho\, \sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}, \quad \pmb{\Sigma} \bullet \pmb{\Sigma}^{-1} = \mathbf{I}_2 \,. \tag{5} \]
\[ \left| \, \operatorname{\pmb{\Sigma} } \, \right| \:=\: \sigma_x^2 \,\sigma_y^2 \, \left(\, 1 \,-\, \rho^2 \right) \,. \tag{6} \]

i.e.,

\[ \mbox{centered CCS :} \quad g_{2c}(x, y) \,=\, {1 \over 2 \pi \, \sqrt{\, \left|\, \operatorname{\pmb{\Sigma }} \, \right|^{\phantom{1}} \, } } \operatorname{exp} \left( -\, {1\over2} \, {\pmb{v}}^{\operatorname{T}} \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, \right) \,. \tag{7} \]

We require that Σ and Σ-1 are symmetric, invertible and positive-definite matrices to avoid a discussion of degenerate cases (where the contour ellipses collapse to 1-dimensional objects). ρ is related to the covariance of the distributions X,Y:

\[ \rho \,=\, { \operatorname{cov} (X,Y) \over \sigma_x\,\sigma_y} \,. \tag{8} \]

g2c(x,y) depends only on the inverse of Σ. The scalar dm given by the exponent of a BVD's pdf

\[ \begin{align} \left(d_m\right)^2 \,&:=\, {\pmb{v}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \,, \\[10pt] d_m \,&:=\, \sqrt{ \, {\pmb{v}}^{T^{\phantom{A}}} \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} } \,. \end{align} \tag{9} \]

is called the Mahalanobis distance. Setting dm = const. defines an elliptic contour line. The reason: a quadratic form like

\[ {\pmb{v}}^T \bullet \, \pmb{\Sigma}^{-1} \bullet {\pmb{v}} \, = \, const. \,=\, \left(d_m\right) ^2 \,, \tag{10} \]

(for a symmetric, invertible and positive definite matrix Σ-1), defines vectors whose end-points reside on the border of a centered ellipse. All contour ellipses of a centered BVD are concentric ellipses. Their half-axes and orientation in the CCS can be calculated with the help of the eigenvalues and eigenvectors of Σ-1 or Σ. See this post for details and formulas.
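The relations above are easy to cross-check numerically. The following Python sketch (with purely illustrative values for σx, σy, ρ and the test point) verifies the closed form (2) of Σ-1 against a numerical inversion, the determinant formula (6), the Mahalanobis distance (9) and the agreement of the pdf forms (1) and (7):

```python
# A minimal numerical check of eqs. (1), (2), (6), (7) and (9).
# The parameter values are chosen purely for illustration.
import numpy as np

sigma_x, sigma_y, rho = 1.5, 0.8, 0.6   # illustrative example parameters

# Covariance matrix Sigma, eq. (5), and its determinant, eq. (6)
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2             ]])
det_closed = sigma_x**2 * sigma_y**2 * (1.0 - rho**2)
assert np.isclose(np.linalg.det(Sigma), det_closed)

# Inverse of Sigma: numerical inversion vs. the closed form of eq. (2)
Sigma_inv = np.linalg.inv(Sigma)
Sigma_inv_closed = (1.0 / det_closed) * np.array(
    [[sigma_y**2,               -rho * sigma_x * sigma_y],
     [-rho * sigma_x * sigma_y,  sigma_x**2             ]])
assert np.allclose(Sigma_inv, Sigma_inv_closed)

# Mahalanobis distance of a test point, eq. (9)
v = np.array([1.0, -0.5])
d_m = np.sqrt(v @ Sigma_inv @ v)

# The pdf via eq. (1) and via the matrix form of eq. (7) must agree
g_eq1 = (1.0 / (2.0 * np.pi * sigma_x * sigma_y * np.sqrt(1.0 - rho**2))
         * np.exp(-0.5 / (1.0 - rho**2)
                  * (v[0]**2 / sigma_x**2
                     - 2.0 * rho * v[0] * v[1] / (sigma_x * sigma_y)
                     + v[1]**2 / sigma_y**2)))
g_eq7 = np.exp(-0.5 * d_m**2) / (2.0 * np.pi * np.sqrt(det_closed))
assert np.isclose(g_eq1, g_eq7)
print(f"d_m = {d_m:.4f}, g2c = {g_eq7:.6f}")
```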

Integral of the probability density over the area of an ellipse defined by a constant Mahalanobis distance

Let us assume that we want to integrate the probability density function from the CCS’ origin up to a Mahalanobis distance

\[ d_m \,=\, D \,. \tag{11} \]

Let us name the corresponding contour ellipse ED and the enclosed area AD. This ellipse defines a minimum x-value -xmax(ED) and a maximum x-value xmax(ED). For each x-value in between there is a minimum y-value ymin(x, ED) and a maximum y-value ymax(x, ED) given by the ellipse's properties. So, we want to evaluate an integral of the type

\[ p(D) \:=\: \iint\limits_{\pmb{A_D}} \, g_{2c}(x,y) \, dx \, dy \:=\: \int\limits_{-x_{max}(E_D)}^{x_{max}(E_D)} \, \int\limits_{y_{min}(x,E_D)}^{y_{max}(x, E_D)} \, g_{2c}(x,y) \, dy \, dx \,. \tag{12} \]

The integral gives us the probability p(D) that a statistically selected vector of our centered BVD reaches from the CCS-origin to a point within the area AD (enclosed by the ellipse ED).

To properly solve this integral, a parameterization of x and y in terms of an angle and a distance dm would be helpful. Actually, we know such a parameterization already. In a previous post, we have seen that any contour ellipse (given by dm) of a BVD can be parameterized in the following way

\[ \begin{align} x \,&=\, d_m * \sigma_x * \cos \theta \,, \\[10pt] y \,&=\, d_m * \sigma_y * \left[\, \rho * \cos \theta \,+\, \sqrt{\, 1 \,-\, \rho^2 \, }* \sin \theta \, \right] \,. \\[8pt] \end{align} \tag{13} \]

with

\[ 0 \,\le\, \theta \,\lt\, 2\,\pi \,, \quad 0 \,\le\, d_m \,\lt\, \infty \,. \tag{14} \]

As all contour ellipses are concentric, we can cover the area of ED densely by continuously increasing dm.
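We can also convince ourselves numerically that the parameterization (13) indeed traces contours of constant Mahalanobis distance, as claimed by eq. (10). A minimal sketch, again with illustrative parameter values:

```python
# Verify that the parameterization of eq. (13) satisfies eq. (10),
# i.e. yields a constant d_m for all theta (illustrative parameters).
import numpy as np

sigma_x, sigma_y, rho = 1.5, 0.8, 0.6
Sigma_inv = np.linalg.inv(np.array(
    [[sigma_x**2,              rho * sigma_x * sigma_y],
     [rho * sigma_x * sigma_y, sigma_y**2             ]]))

d_m = 2.0                                # chosen contour
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
x = d_m * sigma_x * np.cos(theta)        # eq. (13)
y = d_m * sigma_y * (rho * np.cos(theta)
                     + np.sqrt(1.0 - rho**2) * np.sin(theta))

# v^T . Sigma^{-1} . v should equal d_m^2 for every theta
quad = (x * (Sigma_inv[0, 0] * x + Sigma_inv[0, 1] * y)
        + y * (Sigma_inv[1, 0] * x + Sigma_inv[1, 1] * y))
assert np.allclose(quad, d_m**2)
```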

Changing the integral to variables θ and dm

To properly change the integral from (x, y) to our new variables (θ, dm), we have to take the determinant of the transformation's Jacobian matrix into account. As all contour lines are concentric ellipses, we may expect that the Jacobian determinant depends on dm only. We have

\[\begin{align} \partial x /\partial d_m \,&=\, \sigma_x \, \cos \theta \,, \quad \partial x/\partial \theta \,=\, -\, \sigma_x \, d_m\, \sin \theta \,, \\[10pt] \partial y /\partial d_m \,&=\, \sigma_y \,\left(\, \rho \, \cos \theta \,+\, \sqrt{\, 1 \,-\, \rho^2 \,} \sin \theta \,\right)\,, \\[10pt] \partial y /\partial \theta \,&=\, \sigma_y \, d_m\, \left(\, -\, \rho \, \sin \theta \,+\, \sqrt{\, 1 \,-\, \rho^2 \,} \cos \theta \,\right) \,, \end{align} \tag{15}\]

and this, indeed, means

\[ \left| \begin{pmatrix} \partial x / \partial d_m & \partial x / \partial \theta \\ \partial y / \partial d_m & \partial y/\partial \theta \end{pmatrix} \right| \:=\: \sigma_x \, \sigma_y * d_m * \sqrt{\, 1 \,-\, \rho^2 \,} \:=\: d_m * \sqrt{ \, \left| \operatorname{\pmb{\Sigma}} \right| \, } \,. \tag{16} \]
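If you want to double-check eq. (16) without redoing the algebra, a central finite-difference approximation of the Jacobian matrix suffices. The following sketch (illustrative parameter values and an assumed step size h) compares the numerical determinant with the closed form:

```python
# Numerical cross-check of the Jacobian determinant in eq. (16),
# using central finite differences (illustrative parameters).
import numpy as np

sigma_x, sigma_y, rho = 1.5, 0.8, 0.6

def xy(d_m, theta):
    """Parameterization of eq. (13)."""
    x = d_m * sigma_x * np.cos(theta)
    y = d_m * sigma_y * (rho * np.cos(theta)
                         + np.sqrt(1.0 - rho**2) * np.sin(theta))
    return np.array([x, y])

d_m, theta, h = 1.7, 0.9, 1.0e-6
col_dm    = (xy(d_m + h, theta) - xy(d_m - h, theta)) / (2.0 * h)
col_theta = (xy(d_m, theta + h) - xy(d_m, theta - h)) / (2.0 * h)
jac_num = abs(np.linalg.det(np.column_stack([col_dm, col_theta])))

jac_closed = sigma_x * sigma_y * d_m * np.sqrt(1.0 - rho**2)
assert np.isclose(jac_num, jac_closed, rtol=1.0e-6)
```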

Because of eqs. (10) and (16) our integral, i.e. the integrated probability p(D), becomes

\[ \begin{align} p\left( D \right) \:&=\: \iint\limits_{\pmb{A_D}} \, g_{2c}(x,y) \, dy \, dx \\[10pt] & =\: {1 \over 2\, \pi \, \sqrt{ \, \left| \operatorname{\pmb{\Sigma}} \right|^{\phantom{1}} \, } } \int_0^{2\,\pi} d\theta\, \, \int_0^D \, d_m * \sqrt{ \, \left| \operatorname{\pmb{\Sigma}} \right|^{\phantom{1}} } \, \exp \left( -\, {1\over 2} \, d_m^2 \right) \, d(d_m) \\[10pt] &= \: 1 \,-\, \exp \left( -\, {1\over 2} \, D^2 \right) \,. \tag{17} \end{align} \]

The integral is easy to solve. As the ellipses are concentric, we can replace the upper integration limit D by any given Mahalanobis distance dm to get the probability p(dm):

\[ p \left( d_m \right) \:=\: 1 \,- \, \exp \left( -\,{1\over 2} \, d_m^2 \right) \,. \tag{18} \]
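Eq. (18) can be validated by a simple Monte Carlo experiment: draw a large sample from a centered BVD, compute the Mahalanobis distance of each vector and compare the fraction of vectors ending inside ED with the analytic value. A sketch with illustrative parameters and sample size:

```python
# Monte Carlo sanity check of eq. (18); parameters are illustrative.
import numpy as np

rng = np.random.default_rng(42)
sigma_x, sigma_y, rho = 1.5, 0.8, 0.6
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2             ]])
Sigma_inv = np.linalg.inv(Sigma)

# Draw from the centered BVD and compute squared Mahalanobis distances
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
d_m_sq = np.einsum('ni,ij,nj->n', samples, Sigma_inv, samples)

for D in (1.0, 1.552, 2.146, 2.448):
    p_mc = np.mean(d_m_sq <= D**2)            # empirical fraction inside E_D
    p_an = 1.0 - np.exp(-0.5 * D**2)          # analytic value, eq. (18)
    print(f"D = {D:5.3f}:  MC {p_mc:.4f}  vs  analytic {p_an:.4f}")
```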

Comparison to the chi-squared distribution

Now, readers experienced in statistics will probably recognize that the result is equivalent to the cumulative distribution function F(x; k=2) of a chi-squared distribution, i.e. the integral over the probability density function f(x; k) of the chi-squared distribution χ2:

\[ \begin{align} f_{\chi^2} (x; k) \:&=\: { x^{(k/2) -1} \, \exp(-\, x/2) \over 2^{k/2} \,\Gamma \left( {k \over 2 } \right) } \, \quad \text{for } \,\, x \,\gt\, 0 \,, \\[10pt] f_{\chi^2}(x; k) \:&=\: 0 \,, \quad \text{ for } \,\, x \,\le\, 0\, . \tag{19} \end{align} \]

Γ is the “gamma function”. This pdf can be integrated to give the cumulative function F(x; k=2)

\[ F\left(x; \, k\right) \:=\: { \gamma\left( k/2, x/2 \right) \over \Gamma \left( {k / 2 } \right) } \,, \tag{20} \]

with γ() being the so-called “lower incomplete gamma function”. However, in the case k = 2, you can directly integrate f(x; 2). The integration gives us a very simple function as the result:

\[ F \left( x; \, 2 \right) \:=\: 1 \,-\, \exp(- \, x/2 ) \,. \tag{21} \]

For x = (dm)2 we get the equivalence with p(dm). In a later post in this blog we will see that something similar happens for Multivariate Normal Distributions in n dimensions.
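The equivalence is quickly confirmed numerically, e.g. against the chi-squared implementation in scipy.stats. A minimal sketch:

```python
# Check eq. (18) against the chi-squared CDF for k = 2, eq. (21),
# using scipy.stats with x = d_m^2.
import numpy as np
from scipy.stats import chi2

d_m = np.linspace(0.0, 4.0, 9)
p_formula = 1.0 - np.exp(-0.5 * d_m**2)     # eq. (18)
p_chi2 = chi2.cdf(d_m**2, df=2)             # F(x; 2) with x = d_m^2
assert np.allclose(p_formula, p_chi2)
```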

Inversion of the integrated probability p(dm) to get the Mahalanobis distance for confidence ellipses

In statistics as well as in a Machine Learning context we often need to solve a different task, namely to plot confidence ellipses for a given cumulative probability p. This means that we need to calculate dm via the inverse function dm(p) of p(dm). Fortunately, the inversion is simple in our case:

\[ -\, {1 \over 2}\, d_m^ 2 \:=\: \ln\left(\, 1 \,-\, p\, \right ) \,. \]

This can be evaluated further, and we get

\[ d_m \:=\: d_m\left(p\right) \:=\: \sqrt{ \,-\, 2 \, \ln \left(\, 1 \,-\, p \, \right) } \,. \tag{22}\]

Eq. (22) lets us construct a so-called confidence ellipse for any given value of the integrated probability p (with the help of, e.g., eqs. (13)). The following list gives you the approximate dm-values for interesting standard values of p:

  • p = 0.70 : dm = 1.552 ,
  • p = 0.75 : dm = 1.665 ,
  • p = 0.80 : dm = 1.794 ,
  • p = 0.85 : dm = 1.948 ,
  • p = 0.90 : dm = 2.146 ,
  • p = 0.95 : dm = 2.448 .

As soon as you have a method to create an ellipse defined by its Mahalanobis distance (e.g. via eqs. (13) and (14)), you are able to plot confidence ellipses for a given cumulative probability, such that the ellipse covers a region containing, on average, a fraction p of all data points of the distribution.
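The following sketch puts the pieces together: it computes dm(p) via eq. (22) and draws the corresponding contour via eq. (13) on top of synthetic sample data. All parameter values are illustrative, and matplotlib is assumed for plotting:

```python
# Plot confidence ellipses via eqs. (13) and (22) over synthetic
# sample data (all parameter values are illustrative).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
sigma_x, sigma_y, rho = 1.5, 0.8, 0.6
Sigma = np.array([[sigma_x**2,              rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2             ]])
pts = rng.multivariate_normal([0.0, 0.0], Sigma, size=3000)

theta = np.linspace(0.0, 2.0 * np.pi, 400)
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(pts[:, 0], pts[:, 1], s=2, alpha=0.3)
for p in (0.70, 0.90, 0.95):
    d_m = np.sqrt(-2.0 * np.log(1.0 - p))    # eq. (22)
    x = d_m * sigma_x * np.cos(theta)        # eq. (13)
    y = d_m * sigma_y * (rho * np.cos(theta)
                         + np.sqrt(1.0 - rho**2) * np.sin(theta))
    ax.plot(x, y, label=f"p = {p:.2f}")
ax.set_aspect("equal")
ax.legend()
plt.show()
```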

Chi-squared properties as a test of whether a numerical distribution is a BVD

Let us assume that you have a statistical set of data points in two dimensions and you get the impression that it might stem from a BVD. You then may numerically derive the variance-covariance matrix of the given data points and define respective Mahalanobis distances and contour ellipses. If you are able to numerically verify that the cumulative probability within those ellipses really follows a χ2-distribution in (dm)2, then you have got a very convincing indication that the underlying distribution of your set indeed is a BVD, at least to a rather good approximation.
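A possible realization of such a test is sketched below: estimate the covariance matrix from the data, compute the squared Mahalanobis distances and compare their empirical distribution with a χ2-distribution for two degrees of freedom, e.g. via a Kolmogorov-Smirnov test from scipy.stats. Note that the function name and the synthetic test data are hypothetical, and the test is only approximate because the covariance matrix is estimated from the same data:

```python
# Hypothetical sketch of a chi-squared based BVD test: under the BVD
# hypothesis the squared Mahalanobis distances follow chi2 with 2 dof.
import numpy as np
from scipy.stats import chi2, kstest

def looks_like_bvd(points, alpha=0.05):
    """points: (n, 2) array of 2D data; returns (verdict, KS p-value)."""
    centered = points - points.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.cov(centered, rowvar=False))
    d_m_sq = np.einsum('ni,ij,nj->n', centered, Sigma_inv, centered)
    result = kstest(d_m_sq, chi2(df=2).cdf)
    return result.pvalue > alpha, result.pvalue

# Hypothetical usage with synthetic data:
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=5000)
ok, pval = looks_like_bvd(data)
print(f"consistent with a BVD: {ok} (KS p-value {pval:.3f})")
```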

Conclusion

In this post we have shown how to integrate a BVD’s probability density function up to a given Mahalanobis distance. We established an analytical expression for the relation between the Mahalanobis distance and the cumulative probability of finding data points within the respective contour ellipse of a BVD. The integral function could be inverted, providing us with a formula to find the Mahalanobis distance for a confidence ellipse associated with a certain cumulative probability level and a respective quantile of all data points.

In a forthcoming post we will use our results and a total of four methods to construct confidence ellipses for 2-dimensional datasets which resemble a BVD.