PageRank Random Surfer

Introduction

In the Google original paper, Sergey Brin and Lawrence Page stated the PageRank can be thought of as a model of user behavior:

PageRank can be thought of as a model of user behavior. We assume there is a " random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.

This page compares the PageRank algorithm and the random surfer on some example to check if the results are similar. We will not detail the PageRank algorithm already explained on this page.

Adjacency matrix

The random surfer algorithm is implemented on MATLAB. The network is described with an adjacency matrix that represents links between pages:

$$ A = \left[ \begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{matrix} \right] $$

The above adjacency matrix describes the following directed graph:

Directed graph described by the adjacency matrix

Values on each node represent the PageRank already calculated with the Google PageRank algorithm presented on this page.

Random surfer algorithm

To implement the random surfer algorithm, let's start with the following vector for the number of visits for each page (node of the graph). N is the number of nodes (here N=3):

Visits = zeros(N,1);

Let's define the starting page the surfer picked randomly:

Page = randi(N);

Let's define \(d\) the damping factor, the probability at each page the random surfer will get bored and request another random page.

d=0.85;

The algorithm is quite simple, here we loop 1000 times. At each step:

%% Itearate 1000 times
for i=2:1000

    %% Increase visits for the current page
    Visits(Page,1) = Visits(Page,1) + 1;

    %% Randomly pick a new page from the links going out of the current page
    % L are pages linked by Page
    L=find (A(Page,:)==1);
    % Idx is the index of the link
    Idx = randi(length(L));
    % Next page
    Page = L(Idx);

    %% Damping factor, the surfer request another random page 15% of the time
    if (rand(1)>d)
        Page = randi(N);
    end
end

At the end, the PageRank is calculted by normalizing the visits :

$$ PageRank(i) = \dfrac{N*Visits(i)}{\sum_{i=1}^{N}{Visits(i)}} $$

Here is the MATLAB code:

PR = N*Visits/sum(Visits);

Results

Letting the random surfer travelling over 1000 pages provides the following results:

$$ Visits = \left[ \begin{matrix} 215 \\ 402 \\ 382 \end{matrix} \right] $$

The associated PageRank is:

$$ PageRank = \left[ \begin{matrix} 0.6456 \\ 1.2072 \\ 1.1471 \end{matrix} \right] $$

First observation, the PageRanks calculated with the random surfer is very close to the PageRank algorithm:

Node PageRank Random Surfer
1 0.6444 0.6456
2 1.1922 1.2072
3 1.1634 1.1471

Here is the evolution of the PageRanks with the random surfer algorithm:

PageRank evolution over steps with the random surfer algorithm

The algorithm needs about 200 steps to converge where the Google PageRank algorithm needs less than 20 steps as shown on the figure below:

Convergence of PageRank over steps

Discussion

No links out

Of course, the random surfer algorithm does not work is a page has no links going out. I tryed to implement a new version where the surfer pick a random page if the current page has no link. But the results are not consistent with the PageRank algorithm.

No incomming links

Another interesting fact is when a page has no incomming link:

Graph with a node without incomming link

As before, the value for each node are the PageRanks calculated with the Google PageRank algorithm presented on this page.

Node PageRank Random Surfer
1 0.1500 0.1411
2 1.4595 1.4595
3 1.3905 1.3994

Values of the random surfer algorithm are consistent.

Disconnected graph

The last case is a disconnected graph:

Random surfer algorithm on a disconnected graphe

Since the surfer jumps on a new page 15% of the time, the whole graph is thus visited:

Node PageRank Random Surfer
1 1 0.9690
2 1 0.9449
3 1 1.0370
4 1 1.0490

Note that the random surfer is less accurante than the Google PageRank algorithm.

Download

Random surfer algorithm has been implemented with Matlab. All results presented on this page has been calculated with the following script:

pagerankRandomSurfer.m

See also


Last update : 10/03/2022