# The mathematics behind PCA

This page focuses on the PCA of a multivariate Gaussian distribution. We'll describe here how to compute the PCA of a set of points, i.e. how to calculate its center, its covariance and its principal axes (rotation and standard deviations).

## Set of points

Let's consider the following set of points:

This set of $$N$$ points has been picked at random with the following properties:

The center of the set of points is given by:

$$C = \begin{bmatrix} 8 && 2 \end{bmatrix}$$

The covariance of the set of points is given by:

$$\Sigma= \begin{bmatrix} 3 && 0\\ 0 && 1 \end{bmatrix}$$

The rotation of the set is: $$\alpha = \dfrac{\pi}{3}$$

Read this page for more details on covariance. The aim of PCA is to calculate the above parameters from the set of points.

In the following, we'll consider that the set of points has the following coordinates:

$$X = \begin{bmatrix} x_1 && y_1 \\ x_2 && y_2 \\ x_3 && y_3 \\ \vdots && \vdots \\ x_n && y_n \end{bmatrix}$$

The following Matlab code creates the set of points:

% Number of samples
N=1000;

% Standard deviation along X and Y
StdX=3;
StdY=1;

% Offset along X and Y
offsetX = 8;
offsetY = 2;

% Rotation of sample (correlation)
alpha=pi/3;

%% Create and display random sample
X = randn(N,2);
X = X*[StdX,0;0,StdY]';
X = X*[cos(alpha) , -sin(alpha) ; sin(alpha) , cos(alpha) ]';
X = X+[offsetX offsetY];

## Center

Calculating the center of the set of points is probably the simplest operation. The center is the mean of the $$x$$ and $$y$$ coordinates:

$$\hat{C} = \begin{bmatrix} x_c && y_c \end{bmatrix} = \begin{bmatrix} \dfrac{1}{N}\sum\limits_{i=1}^{N} x_i && \dfrac{1}{N}\sum\limits_{i=1}^{N} y_i \end{bmatrix}$$

The following Matlab code computes the center of the set:

>> mean(X)

ans =

7.8775    1.9065

The center is close to the expected result $$\begin{bmatrix} 8 && 2 \end{bmatrix}$$.

## Covariance

The next step is to compute the covariance matrix of the centered set of points. Before computing the covariance, we need to center the set of points; let's name the centered set $$X_c$$:

$$X_c = \begin{bmatrix} x_1-x_c && y_1-y_c \\ x_2-x_c && y_2-y_c \\ x_3-x_c && y_3-y_c \\ \vdots && \vdots \\ x_n-x_c && y_n-y_c \end{bmatrix}$$

This covariance matrix is obtained by computing the following product:

$$\Sigma = \dfrac{1}{N} \times X_c^\top \times X_c$$

The covariance can be calculated from the following equations:

$$\Sigma = \begin{bmatrix} \dfrac{1}{N} \sum\limits_{i=1}^N (x_i-x_c)^2 && \dfrac{1}{N} \sum\limits_{i=1}^N (x_i - x_c)(y_i - y_c) \\ \dfrac{1}{N} \sum\limits_{i=1}^N (x_i - x_c)(y_i - y_c) && \dfrac{1}{N}\sum\limits_{i=1}^N (y_i-y_c)^2 \end{bmatrix}$$
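The element-wise formulas above can be checked with a few lines of pure Python. The four points below are a hypothetical toy dataset, not the article's 1000-point sample:

```python
# Hypothetical toy dataset (not the article's sample)
pts = [(1.0, 2.0), (3.0, 3.0), (5.0, 7.0), (7.0, 8.0)]
N = len(pts)

# Center of the set of points
xc = sum(x for x, _ in pts) / N
yc = sum(y for _, y in pts) / N

# Covariance terms with the 1/N normalization used above
sxx = sum((x - xc) ** 2 for x, _ in pts) / N
syy = sum((y - yc) ** 2 for _, y in pts) / N
sxy = sum((x - xc) * (y - yc) for x, y in pts) / N
```

Matlab's cov() would return these values scaled by $$\dfrac{N}{N-1}$$.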

In Matlab, the covariance can be calculated with the cov() function. Note that cov() centers the data itself and uses the unbiased $$\dfrac{1}{N-1}$$ normalization instead of $$\dfrac{1}{N}$$; for large $$N$$ the difference is negligible:

>> cov (X-mean(X))

ans =

3.0458    3.7081
3.7081    7.5771

## Singular value decomposition

The covariance matrix combines the standard deviations and the rotation. These can be extracted thanks to singular value decomposition (SVD). The result of the SVD is:

$$\Sigma = R \cdot (S \cdot S) \cdot R^\top$$

where:

• $$R$$ is a rotation matrix (its columns are the eigenvectors);
• $$S$$ is a scaling matrix (its diagonal contains the standard deviations; $$S \cdot S$$ contains the eigenvalues).

To understand this decomposition, please read this section. I will not detail here how to compute the SVD, since I already detailed the calculation on this page: Singular value decomposition (SVD) of a 2×2 matrix.
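As a sanity check, the decomposition of a symmetric 2×2 matrix also has a simple closed form. The pure-Python sketch below (not the article's Matlab session) uses the exact, noise-free covariance of the example distribution, i.e. $$\operatorname{diag}(9, 1)$$ rotated by $$\pi/3$$:

```python
import math

# Exact covariance of the example distribution:
# sigma = R(pi/3) * diag(9, 1) * R(pi/3)^T = [[a, b], [b, c]]
a, c = 3.0, 7.0
b = 2.0 * math.sqrt(3.0)

# For a symmetric 2x2 matrix the eigenvector angle has a
# closed form: tan(2*alpha) = 2b / (a - c)
alpha = 0.5 * math.atan2(2.0 * b, a - c)
ca, sa = math.cos(alpha), math.sin(alpha)

# Eigenvalues (the diagonal of S.S): variances along the principal axes
l1 = a * ca**2 + 2.0 * b * ca * sa + c * sa**2
l2 = a * sa**2 - 2.0 * b * ca * sa + c * ca**2
```

Here `alpha` recovers the rotation $$\pi/3$$, and the eigenvalues 9 and 1 are the squares of the standard deviations 3 and 1.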

In Matlab, the best option is to use the built-in svd() function:

>> sigma = cov (X-mean(X));
>> [U,S,D] = svd(sigma)

U =

-0.4892   -0.8722
-0.8722    0.4892

S =

9.6569         0
0    0.9660

D =

-0.4892   -0.8722
-0.8722    0.4892

$$U$$ is the rotation matrix, and the diagonal of $$S$$ contains the variances (the squares of the standard deviations) along each principal axis.

## Standard deviation

Once the SVD is computed, it is quite easy to get the standard deviation along each axis by taking the element-wise square root of $$S$$:

$$\begin{bmatrix} \sigma_x && 0 \\ 0 && \sigma_y \end{bmatrix} = \sqrt{S}$$

$$\sigma_x$$ and $$\sigma_y$$ are the standard deviations along each axis. They can also be seen as the scaling factors along each axis.

Let's check with Matlab if our standard deviations are the expected values:

>> sqrt(S)

ans =

3.1076         0
0    0.9829

Since the expected values are 3 and 1, the result is quite close.

## Rotation

The next parameter we need to calculate is the rotation angle. The rotation matrix has already been calculated. This is the $$U$$ matrix from the singular value decomposition. Since this matrix is a rotation matrix, we can now get the angle.

The matrix $$U$$ is a rotation around the Z-axis, described by:

$$U = \begin{bmatrix} U_{11} && U_{12} \\ U_{21} && U_{22} \end{bmatrix} = R_z(\alpha) = \begin{bmatrix} \cos(\alpha) && -\sin(\alpha) \\ \sin(\alpha) && \cos(\alpha) \end{bmatrix}$$

The angle $$\alpha$$ can be calculated from the matrix coefficients. Since most programming languages provide the atan2() function, the most robust way to get the angle is:

$$\alpha = \operatorname{atan2}(U_{21}, U_{11})$$

Let's check with Matlab:

>> alpha = atan2(U(2,1), U(1,1))

alpha =

-2.0847

The expected angle is $$\dfrac{\pi}{3} \approx 1.0472$$. Since the PCA is symmetrical, there are two options for the angle: $$\dfrac{\pi}{3}$$ and $$\dfrac{\pi}{3} - \pi \approx -2.0944$$. The second one is the answer computed by Matlab.
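This sign ambiguity can be reproduced in a few lines of pure Python (a self-contained sketch, not the Matlab session above):

```python
import math

alpha = math.pi / 3.0

# Rotation matrix R_z(alpha) = [[cos, -sin], [sin, cos]]
U = [[math.cos(alpha), -math.sin(alpha)],
     [math.sin(alpha),  math.cos(alpha)]]

# atan2(sin, cos) recovers the angle from the first column
recovered = math.atan2(U[1][0], U[0][0])

# The SVD is free to return the eigenvectors with opposite signs;
# the recovered angle then becomes alpha - pi (same axis, opposite direction)
V = [[-x for x in row] for row in U]
flipped = math.atan2(V[1][0], V[0][0])
```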

## PCA vectors

It may sometimes be useful to get the vectors representing the PCA, as in this example:

To get the first vector, rotate and scale a unit vector along x-axis:

$$\vec{u} = U \times \sqrt{S} \times \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

To get the second vector, rotate and scale a unit vector along y-axis:

$$\vec{v} = U \times \sqrt{S} \times \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

With Matlab, we get the following vectors:

>> u=U*sqrt(S)*[1 ; 0]

u =

-1.4885
-2.6012
>> v=U*sqrt(S)*[0 ; 1]

v =

-0.8952
0.5123

The vector $$\vec{u}$$ is displayed in red on the above figure, $$\vec{v}$$ is displayed in green. Drawing the vectors confirms our results.
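As a cross-check, the same two products can be computed in pure Python, using the exact (noise-free) rotation and variances of the example distribution instead of the sampled values:

```python
import math

alpha = math.pi / 3.0
# Exact rotation matrix and variances of the example distribution
U = [[math.cos(alpha), -math.sin(alpha)],
     [math.sin(alpha),  math.cos(alpha)]]
S = [9.0, 1.0]                       # variances (diagonal of S)

sig = [math.sqrt(s) for s in S]      # standard deviations

# u = U * sqrt(S) * [1; 0] -> first column of U scaled by sigma_x
u = [U[0][0] * sig[0], U[1][0] * sig[0]]
# v = U * sqrt(S) * [0; 1] -> second column of U scaled by sigma_y
v = [U[0][1] * sig[1], U[1][1] * sig[1]]
```

Up to the sign ambiguity discussed in the rotation section, these match the Matlab vectors: $$\lVert\vec{u}\rVert = \sigma_x = 3$$ and $$\lVert\vec{v}\rVert = \sigma_y = 1$$.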