In previous mathematical posts of this blog, we have studied some core properties of Bivariate Normal Distributions [BVDs]. During this rather mathematical tour de force we came across various methods to construct and plot confidence ellipses for a given confidence level and the respective Mahalanobis distance from the distribution’s center. We have also covered the mathematical derivation of the methods. See the last section for the respective links.
In this post, I briefly list and summarize four major methods for convenience. In a further post of this mini-series, I will show graphical results of applying each of the presented methods to samples of bivariate statistical data.
Previous posts of this mini-series:
- How to compute confidence ellipses – I – simple method based on the Pearson correlation coefficient
- How to compute confidence ellipses – II – equivalence of Schelp’s basic construction method for confidence ellipse with other approaches
Motivation: Confidence ellipses for the analysis of samples which resemble Bi- and Multi-Variate Normal Distributions
From what we have learned about BVDs so far, their contour and confidence ellipses are determined by the variance-covariance matrix of their vector distributions. The Pearson correlation coefficient plays a major role in this game. However, in real-world applications we deal with limited samples of bivariate data, which may or may not fully approximate a BVD. So, creating a confidence ellipse always starts with estimating the variance-covariance matrix of a potentially underlying BVD from the limited amount of data points available from the given sample.
Such a matrix can always be calculated by numerical means from the sample’s data points – if the sample fulfills some conditions. The sample must be reasonably large and the data should be concentrated around a well defined center. To be useful, the numerically determined matrix must fulfill the criterion of invertibility – and this in turn imposes restrictions on the variation of the data point density with distance from the sample’s center.
Approximate BVDs
As soon as we have computed an invertible covariance matrix for a given 2-dimensional sample of data points, we can construct a confidence ellipse. We can do this even for samples (and underlying distributions) which do not follow the conditions of a BVD. See [1] for examples and plots. But, of course, confidence ellipses are most useful to analyze distributions which relatively closely resemble a BVD – at least up to some distance from a pronounced density center of the sample’s vectors.
In cases with deviations from a BVD beyond some distance from the center, we can use the visually and mathematically apparent differences between computed contour lines and the theoretical confidence ellipses. Such differences may define a hopefully narrow transition region from a central BVD to some other form of statistical distribution at large Mahalanobis distances. Such a transition region would allow for different treatments of regions within or beyond some specific Mahalanobis distance.
Approximate MVDs
BVDs are also helpful for the analysis of multivariate and thus multi-dimensional data samples which may stem from an underlying population following a “Multivariate Normal Distribution” [MVD], or from a population which at least approximates an MVD in certain parameter regions. While complex data samples in multiple dimensions may not directly be accessible via graphical tools, projections of the data onto 2-dimensional coordinate planes give us manageable and visualizable 2-dimensional distributions. In case of an approximate MVD, the projections hopefully approximate BVDs, too. BVDs are the 2-dimensional marginal distributions of an MVD.
So, checking for approximate BVDs and/or deviations from BVDs via confidence ellipses in 2D-projections is a valuable approach for better understanding the nature of multivariate samples. It may at least help us to better understand the pairwise correlations between certain variables – and it may give us a first indication of whether the total sample follows an MVD. This in turn may trigger other, more thorough tests for MVD-properties.
The problem of fitting estimated confidence ellipses
For real samples of bivariate data, only a fraction of the data points may approximate a BVD. In my experience, even in Machine Learning cases with largely normal data, contour lines start to deviate from ellipses beyond the 70% to 80% confidence levels. This is only in part due to the sparsity of the data and resulting fluctuations in the data point density. In general, a real-world data sample will contain so-called “outliers”, which do not follow the normal BVD-like relations.
The existence of such outliers may be due to different reasons. To name a few: measurements of low quality, real examples which do not follow the norm in certain regions of the variable space, contamination of a probe, etc. The chance for outliers typically rises with large Mahalanobis distances from a sample’s center.
Robust covariance estimators
Unfortunately, outliers can have a significant impact on the numerical determination of the position of a sample’s center and on the computation of the sample’s covariance matrix. In particular, outlier data points at large distances have a relatively large influence on the results. Therefore, it may be necessary to experiment with so-called “robust covariance estimators” – as e.g. provided by Scikit-Learn – and adjust the results. However, respective algorithms like the “Minimum Covariance Determinant Estimation” [MCD] often require an estimated value of the fraction of outliers as an input parameter. Methods which do not need such fraction estimates are discussed in [10] (I have not tested these advanced methods).
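Below is a minimal sketch, assuming Scikit-Learn’s MinCovDet estimator; the synthetic sample and the guessed outlier fraction of 10% are purely illustrative:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
# Illustrative sample: 900 BVD-like points plus 100 crude outliers
regular = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.2], [1.2, 1.5]], size=900)
outliers = rng.uniform(-10.0, 10.0, size=(100, 2))
sample = np.vstack([regular, outliers])

# support_fraction = assumed fraction of regular points,
# i.e. 1 - (estimated fraction of outliers); here we guess 10% outliers
mcd = MinCovDet(support_fraction=0.9, random_state=0).fit(sample)
center_robust = mcd.location_    # robust estimate of the sample's center
Sigma_robust = mcd.covariance_   # robust variance-covariance matrix
```

A classical estimate via np.cov() on the same sample would be visibly distorted by the outliers.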
How to get the required fraction of outliers? I have no standard recipe for it. For outliers beyond a certain Mahalanobis distance, it often helps to numerically calculate and plot contour lines and to determine the fraction of points that lie outside the last ellipse-like contour. When you work purely numerically, you may have to investigate the deviation of the determined contour from elliptic conditions – and set a threshold for some integrated deviation to separate inner from outer regions. In more general cases the problem is that we have to look at density variations and not at individual data points.
For MVDs the analysis may get complicated even for distant outliers, as projections onto the various coordinate planes may show different numbers and regions of such outliers. It may also happen that some variable pairs deviate strongly from MVD/BVD correlation rules, while others respect the required correlations. The reasons must also be analyzed from a theoretical point of view. An additional PCA-analysis may help to assess the importance of certain variables.
As soon as we have a clear idea about the number of outliers and their position, we can try to adjust the center and orientation of confidence ellipses for the regular “normal” data points, which appear to follow a BVD relatively closely. We re-estimate the coordinates of the density center for those regular points. For a better matrix estimation we can eliminate the probable outliers explicitly or we can use the already mentioned robust estimators, which do this job for us automatically (with the help of optimization conditions).
Estimation of the variance-covariance matrix
All of our methods are based on the following steps:
- Step 1: Numerical determination of the density center of the distribution and choice of a new Cartesian coordinate system [CCS] whose origin coincides with this center. The latter corresponds to a simple vector translation and provides us with a centered distribution. For many available covariance estimators the calculation of the center coordinates happens implicitly during the calculation of the matrix elements. For graphical purposes, however, we must know the coordinates of the sample’s center explicitly. Many Python libraries offer respective algorithms.
- Step 2: Numerical computation of the elements of the estimated variance-covariance matrix from the sample’s data. We can e.g. use Numpy’s cov()-function for this purpose, which calculates the mean x,y-values intrinsically. Or we use a robust matrix estimator of another library. (A sketch of steps 1 to 3 follows after this list.)
- Step 3: We then interpret the numerically derived matrix elements as those of the variance-covariance matrix Σ of an underlying BVD, from which we also get its inverse Σ-1 (see [1]).
- Step 4: Construction of the confidence ellipses with the elements of Σ or Σ-1 by one of the four methods described below.
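As announced in the list above, a minimal Numpy sketch of steps 1 to 3; the synthetic sample and all names are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative bivariate sample of shape (n_points, 2)
sample = rng.multivariate_normal([2.0, -1.0], [[2.0, 1.2], [1.2, 1.5]], size=1000)

# Step 1: determine the density center and translate to a centered CCS
center = sample.mean(axis=0)     # simple mean as center estimate
centered = sample - center       # vector translation

# Step 2: numerically estimate the variance-covariance matrix Sigma
Sigma = np.cov(centered, rowvar=False)   # cov() subtracts means intrinsically

# Step 3: interpret the elements as those of an underlying BVD
sig_x, sig_y = np.sqrt(Sigma[0, 0]), np.sqrt(Sigma[1, 1])
rho = Sigma[0, 1] / (sig_x * sig_y)      # Pearson correlation coefficient
Sigma_inv = np.linalg.inv(Sigma)         # requires an invertible matrix
```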
Important point regarding adjustments for certain Mahalanobis distances: When we use the values of the estimated matrix elements to compute confidence ellipses, we can always replace the standard values of σx and σy by dm * σx and dm * σy, to account for a Mahalanobis distance dm ≠ 1 – and get a stretched ellipse for the respective confidence level. This has already been shown in other posts and corresponds to the fact that a BVD’s confidence ellipses are nested and all have the same orientation.
The relation to a cumulative probability P of finding a data point inside a confidence ellipse at dm is given by

P = 1 − exp(−dm²/2) .

See [2] for a mathematical derivation of this formula. The elements of Σ thereby (indirectly) enable us to create contour ellipses for relevant values of the Mahalanobis distance – and related values of a cumulative probability, i.e. the expected fraction of data points inside the confidence ellipse. All methods discussed below use the Mahalanobis distance dm (see [3]) as the defining element for creating selected and nested confidence ellipses.
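For convenience, the relation can be inverted to get the Mahalanobis distance for a desired confidence level. The helper function below is my own naming, but the formula follows directly from the equation above:

```python
import numpy as np

def dm_for_confidence(P):
    """Mahalanobis distance dm of the confidence ellipse that contains
    the fraction P of a BVD's probability mass: P = 1 - exp(-dm**2 / 2)."""
    return np.sqrt(-2.0 * np.log(1.0 - P))

print(dm_for_confidence(0.3935))  # ~1.0  -> standard ellipse at dm = 1
print(dm_for_confidence(0.95))    # ~2.45 -> 95% confidence ellipse
```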
Method 1 – based on a parameterization of the confidence ellipses
This method uses the parameterization discussed in [4], which expresses the coordinates of points on the ellipse for a distance dm in terms of σx, σy, ρ and an angle Φ.
This method is very easy to implement in a Python program. The values of σx, σy and ρ are taken from the elements of the numerically computed matrix Σ. The number of points used to cover the value range of the angle Φ, and the distribution of these points in the relevant interval, should be chosen such that the ellipse’s curvature is represented well enough at the extreme points.
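A minimal sketch of method 1 (the function name is mine); the parameterization used below is one standard form that places all points exactly at Mahalanobis distance dm – see [4] for the derivation:

```python
import numpy as np

def ellipse_method1(center, sig_x, sig_y, rho, dm=1.0, n_points=400):
    # Angle values covering the full circle; a uniform grid may need a
    # higher n_points to resolve the curvature at the extreme points
    phi = np.linspace(0.0, 2.0 * np.pi, n_points)
    # Parameterization of the dm-ellipse in terms of sig_x, sig_y, rho, phi
    x = dm * sig_x * np.cos(phi)
    y = dm * sig_y * (rho * np.cos(phi) + np.sqrt(1.0 - rho**2) * np.sin(phi))
    return center[0] + x, center[1] + y
```

One can verify that inserting these x, y into the quadratic form of the Mahalanobis distance yields dm for every angle Φ.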
Method 2 – based on a parameterization of the confidence ellipses via elements of the inverse Σ-1 of the covariance matrix
This method is basically equivalent to method 1. However, it uses the elements of the inverse matrix Σ-1 (indirectly given by the elements of Σ). I.e., we must first invert the numerically determined matrix Σ to get the elements α, β and γ of Σ-1 acc. to eq. (2). We, of course, use numerical matrix inversion algorithms to achieve this.
Again: instead of the standard values of σx and σy, we use scaled values dm * σx and dm * σy to account for the right Mahalanobis distance of our confidence ellipse. With the resulting values of α, β and γ we then compute a matrix Kch (see [5] for its explicit form). Afterwards, matrix Kch is applied to vectors defining a centered unit circle in our chosen centered coordinate system.
See [5] and [6] for details and a derivation. The matrix operation can be done via LinAlg libraries, e.g. those of Numpy. The vectors vE eventually give us the desired points on the confidence ellipse for dm. The resolution of the unit circle, i.e. the number of numerically used points, has to be chosen according to the required resolution.
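A sketch of method 2; the concrete construction of Kch below, via the Cholesky factor of Σ-1, is one consistent choice (see [5], [6] for the derivation used in this series):

```python
import numpy as np

def ellipse_method2(center, Sigma, dm=1.0, n_points=400):
    # Invert the numerically determined covariance matrix
    Sigma_inv = np.linalg.inv(Sigma)
    # With Sigma_inv = L @ L.T (Cholesky), the map c -> dm * inv(L.T) @ c
    # sends unit-circle vectors c onto the dm-ellipse, since then
    # x.T @ Sigma_inv @ x = dm**2 holds exactly
    L = np.linalg.cholesky(Sigma_inv)
    K_ch = dm * np.linalg.inv(L.T)
    phi = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.vstack([np.cos(phi), np.sin(phi)])  # centered unit circle
    vE = K_ch @ circle                              # points on the ellipse
    return center[0] + vE[0], center[1] + vE[1]
```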
Method 3 – confidence ellipse via eigenvalue analysis of the inverse Σ-1 of the covariance matrix
In [7] we have already seen that the eigenvalues of an invertible and positive-definite symmetric matrix like Σ-1 define an ellipse via a quadratic form. The lengths of the ellipse’s half-axes follow from the eigenvalues of Σ-1 (for dm = 1). We again get the elements of Σ-1 by the inversion of a numerically computed covariance matrix Σ – and a replacement of σx and σy by scaled values dm * σx and dm * σy to account for stretching to the right Mahalanobis distance and confidence level.
With the resulting α, β and γ we then compute the eigenvalues as

λ1,2 = (α + γ)/2 ∓ √( ((α − γ)/2)² + β² )

and respective half-axes

a1,2 = 1 / √λ1,2 .

I.e., we associate the longer half-axis a1 always with the x-axis of an axis-parallel ellipse. After we have built such an ellipse with the help of an elementary parameterization

x(t) = a1 cos(t),  y(t) = a2 sin(t),

we rotate it by an angle Φ according to rules dependent on the values of the elements of Σ-1:

tan(2Φ) = 2β / (α − γ),

with case distinctions for α = γ and for the signs of β and (α − γ), which fix the correct quadrant of Φ.
See [7] for the mathematics. This method can employ calculation and plotting algorithms that only require values for the half-axes and rotation angles of the ellipses as input parameters. The required rotation is a simple matrix operation – often directly offered by plotting libraries.
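A sketch of method 3; instead of spelling out the angle rules, it lets the orthonormal eigenvector matrix of Σ-1 perform the rotation, which is mathematically equivalent to applying the explicit angle formulas of [7]:

```python
import numpy as np

def ellipse_method3(center, Sigma, dm=1.0, n_points=400):
    Sigma_inv = np.linalg.inv(Sigma)
    # eigh returns the eigenvalues of the symmetric, positive-definite
    # Sigma_inv in ascending order, plus orthonormal eigenvectors
    eigvals, eigvecs = np.linalg.eigh(Sigma_inv)
    # Half-axes: dm / sqrt(lambda); the smaller eigenvalue belongs to the
    # longer half-axis, which we place along the local x-axis
    a1, a2 = dm / np.sqrt(eigvals[0]), dm / np.sqrt(eigvals[1])
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    axis_parallel = np.vstack([a1 * np.cos(t), a2 * np.sin(t)])
    vE = eigvecs @ axis_parallel   # rotate the ellipse into position
    return center[0] + vE[0], center[1] + vE[1]
```

Scaling σx and σy by dm before the inversion, as described in the text, gives the same half-axes as the factor dm used here.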
Method 4 – Construct ellipse with the help of the Pearson correlation coefficient, an axis-parallel ellipse, rotation by π/4 and stretching
The 4th method is based on ideas of C. Schelp (see [8], and [1], [9] for modifications/details). From the numerically calculated matrix Σ we pick the correlation coefficient ρ and construct an axis-parallel ellipse ES with the following half-axes hS,x and hS,y in x- and y-direction:

hS,x = √(1 + ρ),  hS,y = √(1 − ρ) .

ES is then rotated by π/4. Afterward, the values of the data points on the resulting intermediate ellipse are stretched by dm*σx and dm*σy in x- and y-direction, respectively. This eventually gives us the confidence ellipse EC by the following vectors (x, y)T:

(x, y)T = D · R(π/4) · (hS,x cos(t), hS,y sin(t))T, with D = diag(dm σx, dm σy) and R(π/4) the rotation matrix for the angle π/4.
See [9] for the math. Similar to method 3, method 4 can use calculation and plotting algorithms that only require input values for the half-axes and a rotation angle. The rotation and the final stretching operations are simple linear algebra operations, anyway.
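A minimal sketch of method 4, with the operations applied in exactly the order described above (the function name is mine):

```python
import numpy as np

def ellipse_method4(center, sig_x, sig_y, rho, dm=1.0, n_points=400):
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    # Axis-parallel ellipse E_S with half-axes sqrt(1+rho), sqrt(1-rho)
    e_s = np.vstack([np.sqrt(1.0 + rho) * np.cos(t),
                     np.sqrt(1.0 - rho) * np.sin(t)])
    # Rotate E_S by pi/4
    c, s = np.cos(np.pi / 4.0), np.sin(np.pi / 4.0)
    rotated = np.array([[c, -s], [s, c]]) @ e_s
    # Stretch by dm*sig_x in x- and dm*sig_y in y-direction
    vE = np.array([[dm * sig_x], [dm * sig_y]]) * rotated
    return center[0] + vE[0], center[1] + vE[1]
```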
Which method to use?
All four named methods have technical advantages and disadvantages. As the problem of determining the variance-covariance matrix is the same for all methods, the choice mainly depends upon the questions
- whether we want to select points on a unit circle ourselves, control the resulting resolution and perform an explicit construction of the ellipses (methods 1 and 2)
- or whether we leave resolution problems to plotting algorithms for ellipses – and only apply rotation and/or stretching operations to axis-parallel ellipses defined by values for their half-axes (methods 3 and 4).
The decision which way to go may depend on the available libraries. In my opinion, method 4 is a very straightforward and easily controllable method, which requires neither matrix inversions nor subtle decisions regarding the rotation angle.
Conclusion
In this post I have listed and summarized four relatively simple methods to construct confidence ellipses for a bivariate sample of statistical data. In the next post of this series we will apply these methods together with standard and robust covariance estimators to carefully constructed data samples and to data from a real ML calculation.
Links and literature
[1] R. Mönchmeyer, 2025, “Compute confidence ellipses – I – simple method based on the Pearson correlation coefficient”,
https://machine-learning.anracom.com/2025/08/02/compute-confidence-ellipses-i-simple-method-based-on-the-pearson-correlation-coefficient/
See also: R. Mönchmeyer, 2025, “Bivariate normal distribution – derivation by linear transformation of a random vector of two independent Gaussians”,
https://machine-learning.anracom.com/2025/06/24/bivariate-normal-distribution-derivation-by-linear-transformation-of-random-vector-of-two-independent-gaussians/
[2] R. Mönchmeyer, 2025, “Bivariate Normal Distribution – integrated probability up to a given Mahalanobis distance, the Chi-squared distribution and confidence ellipses”,
https://machine-learning.anracom.com/2025/07/30/bivariate-normal-distribution-integrated-probability-up-to-a-given-mahalanobis-distance-the-chi-squared-distribution-and-confidence-ellipses
[3] R. Mönchmeyer, 2025, “Bivariate Normal Distribution – Mahalanobis distance and contour ellipses”,
https://machine-learning.anracom.com/2025/07/03/bivariate-normal-distribution-mahalanobis-distance-and-contour-ellipses/
[4] R. Mönchmeyer, 2025, “Bivariate Normal Distributions – parameterization of contour ellipses in terms of the Mahalanobis distance and an angle”,
https://machine-learning.anracom.com/2025/07/07/bivariate-normal-distributions-parameterization-of-contour-ellipses-in-terms-of-the-mahalanobis-distance-and-an-angle/
[5] R. Mönchmeyer, 2025, “Cholesky decomposition of an ellipse-defining symmetric matrix”,
https://machine-learning.anracom.com/2025/07/20/cholesky-decomposition-of-an-ellipse-defining-symmetric-matrix/
[6] R. Mönchmeyer, 2025, “Bivariate normal distribution – explicit reconstruction of a BVD random vector via Cholesky decomposition of the covariance matrix”,
https://machine-learning.anracom.com/2025/06/27/bivariate-normal-distribution-explicit-reconstruction-of-a-bvd-random-vector-via-cholesky-decomposition-of-the-covariance-matrix
[7] R. Mönchmeyer, 2025, “Ellipses via matrix elements – I – basic derivations and formulas”,
https://machine-learning.anracom.com/2025/07/17/ellipses-via-matrix-elements-i-basic-derivations-and-formulas
[8] Carsten Schelp, “An Alternative Way to Plot the Covariance Ellipse”,
https://carstenschelp.github.io/2018/09/14/Plot_Confidence_Ellipse_001.html
[9] R. Mönchmeyer, 2025, “Compute confidence ellipses – II – equivalence of Schelp’s basic construction method for confidence ellipse with other approaches”,
https://machine-learning.anracom.com/2025/09/29/compute-confidence-ellipses-ii-equivalence-of-schelps-basic-construction-method-for-confidence-ellipse-with-other-approaches/
[10] P. Stoica, P. Babu, P. Varshney, 2024, “Robust Estimation of the Covariance Matrix From Data With Outliers”, IEEE Open Journal of Signal Processing,
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10704043