Weighted PCA

This page focuses on the weighted PCA of a multivariate Gaussian distribution. It describes how to compute the PCA of a set of points when a weight is associated with each point. The aim is to calculate the weighted center and the principal axes of such a set.

This page is the continuation of another article on the [calculation of the unweighted PCA](/en/mathematics/mathematics-behind-pca/). If the reader is not familiar with principal component analysis, it is strongly recommended to read that article first.

Weighted set of points

Let's consider the following set of points:

This set of $$N$$ points has been picked at random with the following properties:

The center of the set of points is given by:

$$C = \begin{bmatrix} 2 && 7 \end{bmatrix}$$

The covariance of the set of points is given by:

$$\Sigma= \begin{bmatrix} 1 && 0\\ 0 && 1 \end{bmatrix}$$

What if we now associate a weight with each point? The following figure shows a set of points with the same properties as the previous one, except that each point has an associated weight. The size of each point represents its weight:

As you can see, the points on the left carry larger weights, which changes the result of the PCA.

In the following, we consider that the set of points has the following coordinates:

$$X = \begin{bmatrix} x_1 && y_1 \\ x_2 && y_2 \\ x_3 && y_3 \\ ... && ...\\ x_n && y_n \end{bmatrix}$$

The following MATLAB code generates the set of points:

% Dimension
D = 2;
% Number of samples
N = 1000;
% Offset
Offset = [2,7];

%% Sample
% Samples, each column is a random variable
X = randn(N,D);
% Offset
X = X + Offset ;
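
As a quick sanity check, the properties announced above (center and covariance) can be estimated from the sample. The sketch below regenerates the points so that it runs on its own; the fixed seed `rng(0)` is an assumption added here for reproducibility and is not part of the original code.

```matlab
% Fix the random seed (assumption, only to make the run reproducible)
rng(0);
D = 2;                          % Dimension
N = 1000;                       % Number of samples
Offset = [2,7];                 % Expected center
X = randn(N,D) + Offset;        % Samples, one point per row

% Empirical center, expected to be close to [2 7]
C = mean(X);
% Empirical covariance, expected to be close to the identity matrix
Sigma = cov(X);

disp(C);
disp(Sigma);
```

With 1000 samples, the estimates should only differ from the theoretical values by a few hundredths.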

A weight $$w_i$$ is associated with each point $$(x_i, y_i)$$:

$$W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \\ w_n \end{bmatrix}$$

In the example presented on this page, the weights are calculated with a Gaussian function of the x-coordinate. The closer a point is to the y-axis, the higher its weight:

The formula used to calculate the weights is the following, where $$\mu=0.7$$ is the standard deviation of the Gaussian:

$$w_i = 0.01 + \dfrac{1}{\mu \sqrt{2\pi}} e^{-\frac{x_i^2}{2\mu^2}}$$
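
The code on this page uses a vector `w` without showing how it is built. A minimal sketch of the weight computation could look like the following. It assumes that `w` is stored as an N-by-1 column vector (which is what the later expression `w.*X` requires) and that the exponent of the Gaussian is negative, so that the weights decrease away from the y-axis. The variable name `mu` and the seed are choices made for this example.

```matlab
% Regenerate the sample so the snippet runs on its own (seed is an assumption)
rng(0);
N = 1000;
X = randn(N,2) + [2,7];

% Standard deviation of the Gaussian used for the weights
mu = 0.7;
% x-coordinates of the points (first column of X)
x = X(:,1);
% Weights: constant floor of 0.01 plus a Gaussian centered on x = 0
w = 0.01 + exp(-x.^2 / (2*mu^2)) / (mu*sqrt(2*pi));
```

The constant 0.01 guarantees that points far from the y-axis keep a small, non-zero weight.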

By superposing the weights and the Gaussian, the principle used to calculate the weights is easier to understand:

Weighted center

Calculating the center of the set of points amounts to computing the weighted mean of the $$x$$ and $$y$$ coordinates:

$$\hat{C} = \begin{bmatrix} x_c && y_c \end{bmatrix} = \begin{bmatrix} \dfrac{ \sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i} && \dfrac{ \sum_{i=1}^n w_i y_i}{\sum_{i=1}^n w_i} \end{bmatrix}$$

The following MATLAB code computes the center of the set:

>> C_w = sum(w.*X) / sum(w)

C_w =

0.7999    6.9403

The y-coordinate of the center is close to 7 (the offset defined previously), because the weights do not depend on $$y$$. The x-coordinate is pulled from 2 toward zero, since the weights are higher for points close to the y-axis.
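
To see the weighted mean at work on numbers that can be checked by hand, here is a small sketch with three points. The points and weights are made up for this illustration. The third point has twice the weight of the others, so it pulls the center toward itself.

```matlab
% Three hypothetical points (one per row) and their weights
X = [0 0; 2 4; 4 8];
w = [1; 1; 2];

% Weighted center, same expression as above
C_w = sum(w.*X) / sum(w);

% Equivalent formulation with a matrix product
C_w2 = (w' * X) / sum(w);

disp(C_w);    % (1*0 + 1*2 + 2*4)/4 = 2.5 and (1*0 + 1*4 + 2*8)/4 = 5
```

The non-weighted mean of the same points is `[2 4]`, so the weights move the center toward the third point.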

Covariance

The next step is to compute the weighted covariance matrix of the set of points. Before computing the covariance, we need to center the set of points. Let's call $$X_c$$ the centered set:

$$X_c = \begin{bmatrix} x_1-x_c && y_1-y_c \\ x_2-x_c && y_2-y_c \\ x_3-x_c && y_3-y_c \\ ... && ...\\ x_n-x_c && y_n-y_c \end{bmatrix}$$

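The weighted covariance matrix is obtained by multiplying each row of $$X_c$$ by its weight $$w_i$$ before computing the product, where $$\mathrm{diag}(W)$$ is the diagonal matrix built from the weights:

$$\hat{\Sigma} = \dfrac{1}{\sum_{i=1}^n w_i} X_c^\top \, \mathrm{diag}(W) \, X_c$$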

In Matlab, the covariance is calculated with the following code:

>> sigma_w =  (w.*(X-C_w))'*(X-C_w) / sum(w)

sigma_w =

0.6491   -0.0246
-0.0246    1.0454
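
The vectorized expression above is compact but hides the sum over the points. The following sketch recomputes the weighted covariance with an explicit loop, as a weighted sum of outer products, and compares both results. The three points and their weights are made up for this illustration.

```matlab
% Three hypothetical points (one per row) and their weights
X = [0 0; 2 4; 4 8];
w = [1; 1; 2];

% Weighted center
C_w = sum(w.*X) / sum(w);

% Vectorized weighted covariance (same expression as above)
sigma_w = (w.*(X-C_w))'*(X-C_w) / sum(w);

% Same computation as an explicit weighted sum of outer products
sigma_loop = zeros(2);
for i = 1:size(X,1)
    d = X(i,:) - C_w;                       % Centered point (row vector)
    sigma_loop = sigma_loop + w(i)*(d'*d);  % Weighted outer product
end
sigma_loop = sigma_loop / sum(w);

disp(sigma_w);
disp(sigma_loop);
```

Both matrices should be identical up to rounding errors. For this small example the result is `[2.75 5.5; 5.5 11]`.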

Singular value decomposition

Now that the weighted covariance has been calculated, the remaining steps are the same as for the non-weighted PCA.

In MATLAB, the best option is to use the built-in svd() function:

>> [U,S,D] =  svd(sigma_w)

U =

-0.0617    0.9981
0.9981    0.0617

S =

1.0469         0
0    0.6475

D =

-0.0617    0.9981
0.9981    0.0617

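$$U$$ is the rotation matrix (its columns are the principal axes) and the diagonal of $$S$$ contains the variances along these axes. The standard deviations are the square roots of these values.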
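
As an illustration, the sketch below starts from the covariance matrix printed above (values copied with four decimals) and extracts the principal axes and the standard deviations. The variable names `stddev`, `axis1` and `axis2` are choices made for this example.

```matlab
% Weighted covariance matrix, copied from the result above
sigma_w = [0.6491 -0.0246; -0.0246 1.0454];

% Singular value decomposition of the covariance matrix
[U, S, ~] = svd(sigma_w);

% Variances along the principal axes, sorted in decreasing order
variances = diag(S);
% Standard deviations along the principal axes
stddev = sqrt(variances);

% Principal axes (unit vectors, defined up to their sign)
axis1 = U(:,1);
axis2 = U(:,2);

disp(stddev');
```

Since a covariance matrix is symmetric and positive semi-definite, its singular values are also its eigenvalues, so `eig(sigma_w)` would return the same variances (in increasing order).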

Results

Let's check if our PCA is correct:

The effect of the weights leaves no doubt when the result is compared with the non-weighted PCA:
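
The comparison can also be made numerically. The following sketch gathers all the steps of this page in a single script and computes both PCAs on the same sample. The seed is an assumption made for reproducibility. The non-weighted covariance is normalized by $$N$$ here (rather than $$N-1$$ as the built-in `cov()` does), so that it matches the weighted formula when all the weights are equal.

```matlab
% Sample (seed is an assumption, only to make the run reproducible)
rng(0);
N = 1000;
X = randn(N,2) + [2,7];

% Weights: Gaussian of the x-coordinate, plus a floor of 0.01
mu = 0.7;
w = 0.01 + exp(-X(:,1).^2 / (2*mu^2)) / (mu*sqrt(2*pi));

% Non-weighted PCA
C = mean(X);
sigma = (X-C)'*(X-C) / N;
[U, S, ~] = svd(sigma);

% Weighted PCA
C_w = sum(w.*X) / sum(w);
sigma_w = (w.*(X-C_w))'*(X-C_w) / sum(w);
[U_w, S_w, ~] = svd(sigma_w);

% Centers (first row: non-weighted, second row: weighted)
disp([C; C_w]);
% Variances along the principal axes (same row order)
disp([diag(S)'; diag(S_w)']);
```

The weighted center should be shifted toward the y-axis, and the variance along the x-direction should be smaller than in the non-weighted case.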