This page focus on the PCA of a multivariate Gaussian distribution. We'll describe here how to compute the PCA of a set of points. The aim here is to calculate:

- the center (mean) of the set of points
- the covariance matrix (standard deviations)
- the rotation angle (or rotation matrix)

Matlab scripts can be downloaded from the *Download* section of this page.

Let's consider the following set of points:

This set of \( N \) points has been picked at random with the following properties:

The center of the set of points is given by:

$$ C = \begin{bmatrix} 8 && 2 \end{bmatrix} $$

The covariance of the set of points is given by:

$$ \Sigma= \begin{bmatrix} 3 && 0\\ 0 && 1 \end{bmatrix} $$

The rotation of the set is: \( \alpha = \dfrac{\pi}{3} \)

Read this page for more details on covariance. The aim of PCA is to calculate the above parameters from the set of points.

In the following, will consider the set of points has the following coordinates:

$$ X = \begin{bmatrix} x_1 && y_1 \\ x_2 && y_2 \\ x_3 && y_3 \\ ... && ...\\ x_n && y_n \end{bmatrix} $$

The following matlab code compute the set of points

```
% Number of sample
N=1000;
% Standard deviation along X and Y
StdX=3;
StdY=1;
% Offset along X and Y
offsetX = 8;
offsetY = 2;
% Rotation of sample (correlation)
alpha=pi/3;
%% Create and display random sample
X = randn(N,2)
X = X*[StdX,0;0,StdY]';
X = X*[cos(alpha) , -sin(alpha) ; sin(alpha) , cos(alpha) ]';
X = X+[offsetX offsetY];
```

Calculating the center of the set of points is probably the simplest operation. The center is the mean of the \( x \) and \( y \) coordinates:

$$ \hat{C} = \begin{bmatrix} x_c && y_c \end{bmatrix} = \begin{bmatrix} \sum\limits_{i=1}^n x_i && \sum\limits_{i=1}^n y_i \end{bmatrix} $$

The following Matlab code compute the center of the set:

```
>> mean(X)
ans =
7.8775 1.9065
```

The center is close from the expected results \( \begin{bmatrix} 8 && 2 \end{bmatrix} \).

The next step is to compute the covariance matrix of the centered set of points. Before computing the covariance, we need to center the set of points, lets name \( X_c \) the set centered:

$$ X_c = \begin{bmatrix} x_1-x_c && y_1-y_c \\ x_2-x_c && y_2-y_c \\ x_3-x_c && y_3-y_c \\ ... && ...\\ x_n-x_c && y_n-y_c \end{bmatrix} $$

This covariance matrix is obtained by computing the following product:

$$ \Sigma = \dfrac{1}{N} \times X_c^\top \times X_c $$

The covariance can be calculated from the following equations:

$$ \Sigma = \begin{bmatrix} \dfrac{1}{N} \sum\limits_{i=1}^N (x_i-x_c)^2 && \dfrac{1}{N} \sum\limits_{i=1}^N (x_i - x_c)(y_i - y_c) \\ \dfrac{1}{N} \sum\limits_{i=1}^N (x_i - x_c)(y_i - y_c) && \dfrac{1}{N}\sum\limits_{i=1}^N (y_i-y_c)^2 \end{bmatrix} $$

In Matlab, the covariance can be calculated with the `cov()`

function:

```
>> cov (X-mean(X))
ans =
3.0458 3.7081
3.7081 7.5771
```

The covariance matrix contains both standard deviations and rotation matrices. These data can be extracted thanks to singular value decomposition (SVD). The result of the SVD is:

$$ \Sigma=R.(S.S).R^T $$

where:

- \(R\) is a rotation matrix (eigenvectors);
- \(S\) is a scaling matrix (square of eigenvalues).

To understand this decomposition, please read this section. I will not detail here how to compute the SVD, since I already detailed the calculation on this page: Singular value decomposition (SVD) of a 2×2 matrix.

On Matlab, the best option is to use the built-in `svd()`

function:

```
>> sigma = cov (X-mean(X));
>> [U,S,D] = svd(sigma)
U =
-0.4892 -0.8722
-0.8722 0.4892
S =
9.6569 0
0 0.9660
D =
-0.4892 -0.8722
-0.8722 0.4892
```

\( U \) is the rotation matrix and \(S\) contains the standard devitations.

Once the SVD is computed, it is quite easy to get the standard deviation along each axis by calculating :

$$ \Sigma = \begin{bmatrix} \sigma_x && 0 \\ 0 && \sigma_y \end{bmatrix} = \sqrt(S) $$

\( \sigma_x \) and \( \sigma_y \) are the standard deviations along each axis. It can also be seen as the sclaing factor along each axis.

Let's check with Matlab if our standard deviations are the expected values:

```
>> sqrt(S)
ans =
3.1076 0
0 0.9829
```

Since the expected values are 3 and 1, the result is quite close.

The next parameter we need to calculate is the rotation angle. The rotation matrix has already been calculated. This is the \( U \) matrix from the singular value decomposition. Since this matrix is a rotation matrix, we can now get the angle.

The rotation matrix is a rotation matrix around the Z-axis. This matrix is described by the following:

$$ U = = \begin{bmatrix} U_{11} && U_{12} \\ U_{21} && U_{22} \end{bmatrix} = R_z(\alpha) = \begin{bmatrix} cos(\gamma) && -sin(\gamma \\ sin(\gamma) && cos(\gamma) \end{bmatrix} $$

The angle \( \alpha\) can be calculated from the matrix coefficients. Since most
programming languages have the `atan2()`

function, this is the best way to get the
angle:

$$ \alpha = atan2(U_{12}, U_{11}) $$

Let's check with Matlab:

```
>> alpha = atan2(U(1,2), U(1,1))
alpha =
-2.0847
```

The expected angle is \( \dfrac{\pi}{3} \approx 1.0472 \). Since the ACP is symetrical, there are two options for the angle : \( \dfrac{\pi}{3} \) and \( \dfrac{\pi}{3} - \pi \approx -2.0944\). The second one is the answer computed in Matlab.

It may sometime be usefull to get the vectors representing the PCA as on this example:

To get the first vector, rotate and scale a unit vector along x-axis:

$$ \vec{u} = U \times \sqrt{S} \times \begin{bmatrix} 1 \\ 0 \end{bmatrix} $$

To get the second vector, rotate and scale a unit vector along y-axis:

$$ \vec{v} = U \times \sqrt{S} \times \begin{bmatrix} 0 \\ 1 \end{bmatrix} $$

With Matlab, we get the following vectors:

```
>> u=U*sqrt(S)*[1 ; 0]
u =
-1.4885
-2.6012
```

```
>> v=U*sqrt(S)*[0 ; 1]
v =
-0.8952
0.5123
```

The vector \( \vec{u} \) is displayed in red on the above figure, \( \vec{v} \) is displayed in green. Drawing the vectors confirms our results.

Here is the Matlab script used for drawing the figures and checking the equations.

principal_component_analysis.m

- Calculating the transformation between two set of points
- Catmull-Rom splines
- Check if a number is prime online
- Check if a point belongs on a line segment
- Cross product
- Common derivatives rules
- Common derivatives
- Dot product
- How to calculate the intersection points of two circles?
- How to check if four points are coplanar?
- Common integrals (primitive functions)
- Least square approximation with a second degree polynomial
- Least-squares fitting of circles
- Least-squares fitting of sphere
- Online square root simplifyer
- Sines, cosines and tangeantes of common angles
- Singular value decomposition (SVD) of a 2×2 matrix
- Tangent line segments to circles
- Understanding covariance matrices

Last update : 03/06/2022