Weighted PCA

Weighted PCA computed on set of points

This page focuses on the weighted PCA of a set of points drawn from a multivariate Gaussian distribution. The aim is to calculate the weighted center, the weighted covariance and the principal axes of the set when each point carries its own weight.

This page is the continuation of another article on the [calculation of the unweighted PCA](/en/mathematics/mathematics-behind-pca/). If the reader is not familiar with principal component analysis, it is strongly recommended to start by reading that page.

Matlab scripts can be downloaded from the Download section of this page.

Weighted set of points

Let's consider the following set of points:

Original non-weighted set of points

This set of \( N \) points has been picked at random with the following properties:

The center of the set of points is given by:

$$ C = \begin{bmatrix} 2 && 7 \end{bmatrix} $$

The covariance of the set of points is given by:

$$ \Sigma= \begin{bmatrix} 1 && 0\\ 0 && 1 \end{bmatrix} $$

Read this page for more details on covariance.

Following the page on the unweighted PCA, the non-weighted PCA of this set is displayed in the following figure:

PCA of non-weighted set of points

What if we now associate a weight with each point? The following figure shows a set of points with the same properties as the previous one, except that each point has an associated weight. The size of each point represents its weight:

Weighted set of points

As you can see, the points on the left carry more weight, which changes the PCA.

In the following, we consider that the set of points has the following coordinates:

$$ X = \begin{bmatrix} x_1 && y_1 \\ x_2 && y_2 \\ x_3 && y_3 \\ ... && ...\\ x_n && y_n \end{bmatrix} $$

The following Matlab code generates the set of points:

% Dimension
D = 2; 
% Number of samples
N = 1000; 
% Offset
Offset = [2,7];

%% Sample
% Samples, each column is a random variable
X = randn(N,D); 
% Offset
X = X + Offset ;

A weight \( w_i \) is associated with each point \( (x_i, y_i) \):

$$ W = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ ... \\ w_n \end{bmatrix} $$

In the example presented on this page, the weights are calculated with a Gaussian function of the x-coordinate. The closer a point is to the y-axis, the higher its weight:

Value of the weight is a Gaussian function of the x-coordinates of each point

The formula used to calculate the weights is the following ( \( \mu=0.7 \) ):

$$ w_i = 0.01 + \dfrac{1}{\mu \sqrt{2\pi}} e^{-\frac{x_i^2}{2\mu^2}} $$
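The weight vector `w` used in the rest of the page can be built directly from this formula. The snippet below is a minimal sketch, not the author's script: it regenerates a random set of points with the same offset as above and assumes the exponent is negative, so that the weight decays as a point moves away from \( x = 0 \). The variable names `x` and `mu` are chosen here for illustration.

```matlab
% Sketch: Gaussian weights computed from the x-coordinate of each point
N  = 1000;                      % number of samples
X  = randn(N,2) + [2,7];        % random points centered on [2,7]
mu = 0.7;                       % width of the Gaussian used for the weights

x = X(:,1);                     % x-coordinates (N-by-1 column)
% Constant floor of 0.01 plus a Gaussian bump centered on x = 0
w = 0.01 + exp(-x.^2 / (2*mu^2)) / (mu*sqrt(2*pi));
```

The floor of 0.01 keeps every weight strictly positive, so points far from the y-axis still contribute slightly to the center and covariance.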

Superimposing the weights and the Gaussian makes the principle used to calculate the weights easier to understand:

Weighted points and Gaussian used to calculate the weights

Weighted center

Weighted center of the pca

Calculating the center of the set of points amounts to computing the weighted mean of the \( x \) and \( y \) coordinates:

$$ \hat{C} = \begin{bmatrix} x_c && y_c \end{bmatrix} = \begin{bmatrix} \dfrac{ \sum_{i=1}^n w_i.x_i}{\sum_{i=1}^n w_i} && \dfrac{ \sum_{i=1}^n w_i.y_i}{\sum_{i=1}^n w_i} \end{bmatrix} $$

The following Matlab code computes the center of the set:

>> C_w = sum(w.*X) / sum(w) 

C_w =

    0.7999    6.9403

The y-coordinate of the center stays close to 7 (the offset defined previously). For the x-coordinate, the weights are higher near zero, so the center is pulled away from 2 toward the y-axis.
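To see how the weights move the center, here is a small worked example on three hypothetical points (toy values chosen for illustration, not taken from the data set above). Note that `w.*X` multiplies an N-by-1 column by an N-by-2 matrix, which relies on implicit expansion (Matlab R2016b or later).

```matlab
% Toy data: three points, the third one carries twice the weight
X = [0 0; 2 0; 4 6];
w = [1; 1; 2];

% Weighted center: each row of X is scaled by its weight before summing
C_w = sum(w.*X) / sum(w);    % [2.5 3]

% Unweighted center, for comparison
C = mean(X);                 % [2 2]
```

The weighted center is shifted toward the third point, exactly as the heavily weighted points near the y-axis attract the center in the example of this page.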

Covariance

The next step is to compute the covariance matrix of the set of points. Before computing the covariance, we need to center the set of points on the weighted center. Let's call \( X_c \) the centered set:

$$ X_c = \begin{bmatrix} x_1-x_c && y_1-y_c \\ x_2-x_c && y_2-y_c \\ x_3-x_c && y_3-y_c \\ ... && ...\\ x_n-x_c && y_n-y_c \end{bmatrix} $$

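The weighted covariance matrix is obtained by computing the following product, where \( \mathrm{diag}(W) \) is the diagonal matrix holding the weights \( w_1, ..., w_n \):

$$ \hat{\Sigma} = \dfrac{1}{\sum_{i=1}^n w_i} \times X_c^\top \times \mathrm{diag}(W) \times X_c $$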

In Matlab, the covariance is calculated with the following code:

>> sigma_w =  (w.*(X-C_w))'*(X-C_w) / sum(w)

sigma_w =

    0.6491   -0.0246
   -0.0246    1.0454
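Written out entry by entry, the Matlab expression above is a weighted average of the centered products. Using the same symbols as in the weighted center, with \( \hat{\Sigma} \) denoting the weighted covariance:

$$ \hat{\Sigma} = \dfrac{1}{\sum_{i=1}^n w_i} \begin{bmatrix} \sum_{i=1}^n w_i (x_i-x_c)^2 && \sum_{i=1}^n w_i (x_i-x_c)(y_i-y_c) \\ \sum_{i=1}^n w_i (x_i-x_c)(y_i-y_c) && \sum_{i=1}^n w_i (y_i-y_c)^2 \end{bmatrix} $$

The top-left entry is the weighted variance along \( x \). In the result above it drops to 0.6491, compared with 1 for the non-weighted set, because the weights concentrate on the points close to the y-axis. The variance along \( y \) stays close to 1 since the weights do not depend on \( y \).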

Singular value decomposition

Once the weighted covariance is calculated, the remaining steps are the same as for the non-weighted PCA.

In Matlab, the best option is to use the built-in svd() function:

>> [U,S,D] =  svd(sigma_w)

U =

   -0.0617    0.9981
    0.9981    0.0617

S =

    1.0469         0
         0    0.6475

D =

   -0.0617    0.9981
    0.9981    0.0617

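The columns of \( U \) are the principal axes (\( U \) acts as the rotation matrix of the PCA) and the diagonal of \( S \) contains the variances along these axes. Their square roots are the standard deviations.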
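Because `sigma_w` is a covariance matrix, the diagonal of `S` holds variances, so the standard deviations along the principal axes are obtained with a square root. The following sketch starts from the rounded values of `sigma_w` printed above (copied by hand, so the results only match to about four digits); the names `variances`, `std_dev` and `recon_err` are introduced here for illustration.

```matlab
% Weighted covariance, rounded values copied from the output above
sigma_w = [0.6491 -0.0246; -0.0246 1.0454];

[U,S,V] = svd(sigma_w);

variances = diag(S);          % variance along each principal axis
std_dev   = sqrt(variances);  % standard deviation along each principal axis

% For a symmetric positive semi-definite matrix, the SVD coincides with the
% eigendecomposition, so U*S*U' rebuilds sigma_w up to rounding errors
recon_err = norm(U*S*U' - sigma_w);
```

The standard deviations are typically what is needed to draw the axes of the PCA at the right scale on a figure.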

Results

Let's check if our PCA is correct:

Weighted PCA computed on set of points

Comparing with the non-weighted PCA shows the effect of the weights: the center moves toward the y-axis and the spread along the x-axis shrinks:

Weighted and non weighted pca

Download

Here is the Matlab script used for drawing the figures and checking the equations.

weighted_principal_component_analysis.m

Last update : 01/20/2023