# Weighted PCA

This page focuses on the weighted PCA of a multivariate Gaussian distribution. We'll describe how to compute the weighted PCA of a set of points, i.e. how to calculate the weighted center, the weighted covariance matrix and the principal axes of the set.

This page is the continuation of another article on the [calculation of the unweighted PCA](/en/mathematics/mathematics-behind-pca/). If the reader is not familiar with principal component analysis, it is strongly recommended to start by reading that page.

## Weighted set of points

Let's consider the following set of points:

This set of $$N$$ points has been picked at random with the following properties:

The center of the set of points is given by:

$$C = \begin{bmatrix} 2 & 7 \end{bmatrix}$$

The covariance of the set of points is given by:

$$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

What if we now associate a weight with each point? The following figure shows a set of points with the same properties as the previous one, except that each point has an associated weight. The size of each point represents its weight:

As you can see, the points on the left have higher weights, so this will change the PCA.

In the following, we'll consider that the set of points has the following coordinates:

$$X = \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ x_3 & y_3 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix}$$

The following Matlab code computes the set of points:

% Dimension
D = 2;
% Number of samples
N = 1000;
% Offset
Offset = [2,7];

%% Sample
% Samples, each column is a random variable
X = randn(N,D);
% Offset
X = X + Offset;

A weight $$w_i$$ is associated with each point $$(x_i, y_i)$$:

$$W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \\ w_n \end{bmatrix}$$

In the example presented on this page, the weights are calculated with a Gaussian distribution on the x-axis. The closer the points are to the y-axis, the higher the weight is:

The formula used to calculate the weights is the following (with $$\mu=0.7$$):

$$w_i = 0.01 + \dfrac{1}{\mu \sqrt{2\pi}} e^{-\frac{x_i^2}{2\mu^2}}$$
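The article's examples are in Matlab; as a cross-check, here is an equivalent NumPy sketch of the sampling and weighting steps (the random generator and seed are my own assumptions, so the exact numbers differ from the outputs shown on this page):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same setup as the Matlab code above: N points around the offset (2, 7)
N = 1000
X = rng.standard_normal((N, 2)) + np.array([2.0, 7.0])

# Gaussian weight on the x-coordinate with a small 0.01 floor (mu = 0.7)
mu = 0.7
w = 0.01 + np.exp(-X[:, 0] ** 2 / (2 * mu ** 2)) / (mu * np.sqrt(2 * np.pi))
```

The 0.01 floor keeps every point in play: without it, points far from the y-axis would get a numerically negligible weight.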

By superposing the weights and the Gaussian, the principle used to calculate the weights is easier to understand:

## Weighted center

Calculating the center of the weighted set of points amounts to computing the weighted mean of the $$x$$ and $$y$$ coordinates:

$$\hat{C} = \begin{bmatrix} x_c & y_c \end{bmatrix} = \begin{bmatrix} \dfrac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i} & \dfrac{\sum_{i=1}^n w_i y_i}{\sum_{i=1}^n w_i} \end{bmatrix}$$

The following Matlab code computes the center of the set:

>> C_w = sum(w.*X) / sum(w)

C_w =

0.7999    6.9403

The y-coordinate of the center is close to 7 (the offset defined previously). The x-coordinate, however, is attracted toward the y-axis, since the weights are higher close to zero.
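The same weighted mean can be sketched in NumPy (again with my own assumed seed, so the values are only close to, not equal to, the Matlab output above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2)) + np.array([2.0, 7.0])
mu = 0.7
w = 0.01 + np.exp(-X[:, 0] ** 2 / (2 * mu ** 2)) / (mu * np.sqrt(2 * np.pi))

# Weighted mean: sum(w_i * X_i) / sum(w_i); w is broadcast over both columns
C_w = (w[:, None] * X).sum(axis=0) / w.sum()
```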

## Covariance

The next step is to compute the covariance matrix of the centered set of points. Before computing the covariance, we need to center the set of points; let's call $$X_c$$ the centered set:

$$X_c = \begin{bmatrix} x_1-x_c & y_1-y_c \\ x_2-x_c & y_2-y_c \\ x_3-x_c & y_3-y_c \\ \vdots & \vdots \\ x_n-x_c & y_n-y_c \end{bmatrix}$$

The weighted covariance matrix is obtained by computing the following product, where each row of $$X_c$$ is scaled by its weight:

$$\hat{\Sigma} = \dfrac{1}{\sum_{i=1}^n w_i} X_c^\top \operatorname{diag}(w_1, \dots, w_n) \, X_c$$

In Matlab, the covariance is calculated with the following code:

>> sigma_w =  (w.*(X-C_w))'*(X-C_w) / sum(w)

sigma_w =

0.6491   -0.0246
-0.0246    1.0454
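A NumPy sketch of the same computation (assumed seed again, so only the qualitative behavior matches: the x-variance shrinks below the y-variance because the weights squeeze the cloud toward the y-axis):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2)) + np.array([2.0, 7.0])
mu = 0.7
w = 0.01 + np.exp(-X[:, 0] ** 2 / (2 * mu ** 2)) / (mu * np.sqrt(2 * np.pi))
C_w = (w[:, None] * X).sum(axis=0) / w.sum()

# Weighted covariance: Xc' * diag(w) * Xc / sum(w), without materializing diag(w)
Xc = X - C_w
sigma_w = (w[:, None] * Xc).T @ Xc / w.sum()
```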

## Singular value decomposition

Now that the weighted covariance has been calculated, the remaining steps are the same as for the non-weighted PCA.

In Matlab, the best option is to use the built-in svd() function:

>> [U,S,D] =  svd(sigma_w)

U =

-0.0617    0.9981
0.9981    0.0617

S =

1.0469         0
0    0.6475

D =

-0.0617    0.9981
0.9981    0.0617

$$U$$ is the rotation matrix and $$S$$ contains the variances along the principal axes (the standard deviations are their square roots).
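The decomposition step can be sketched in NumPy as well (assumed seed; since the weighted covariance is symmetric positive-definite here, the SVD sorts the variances in decreasing order and the first column of U is the dominant axis, which should point essentially along y):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2)) + np.array([2.0, 7.0])
mu = 0.7
w = 0.01 + np.exp(-X[:, 0] ** 2 / (2 * mu ** 2)) / (mu * np.sqrt(2 * np.pi))
C_w = (w[:, None] * X).sum(axis=0) / w.sum()
Xc = X - C_w
sigma_w = (w[:, None] * Xc).T @ Xc / w.sum()

# Columns of U are the principal axes; S holds the variances along them
U, S, Vt = np.linalg.svd(sigma_w)
std_devs = np.sqrt(S)
```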

## Results

Let's check if our PCA is correct:

There is no doubt about the result if we compare it with the non-weighted PCA:
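That comparison can also be made numerically. The sketch below (same assumed NumPy setup as before) runs both the plain and the weighted pipeline on the same points; the weighted version should report a clearly smaller variance along its minor axis, since the weights compress the cloud in x:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2)) + np.array([2.0, 7.0])
mu = 0.7
w = 0.01 + np.exp(-X[:, 0] ** 2 / (2 * mu ** 2)) / (mu * np.sqrt(2 * np.pi))

# Non-weighted PCA: plain covariance of the centered points
Xc0 = X - X.mean(axis=0)
sigma = Xc0.T @ Xc0 / len(X)

# Weighted PCA: same pipeline with the Gaussian weights
C_w = (w[:, None] * X).sum(axis=0) / w.sum()
Xc = X - C_w
sigma_w = (w[:, None] * Xc).T @ Xc / w.sum()

# Variances along the principal axes of each covariance
var_unweighted = np.linalg.svd(sigma, compute_uv=False)
var_weighted = np.linalg.svd(sigma_w, compute_uv=False)
```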