This page explains how the PageRank algorithm works. There are several ways to calculate the PageRank of a graph. This page explains how to implement the PageRank algorithm according to the Google original paper.
Quoting from the Google original paper, PageRank is defined like this:
We assume page \(A\) has pages \(T1...Tn\) which point to it (i.e., are citations). The parameter \(d\) is a damping factor which can be set between 0 and 1. We usually set \(d\) to 0.85. There are more details about \(d\) in the next section. Also \(C(A)\) is defined as the number of links going out of page \(A\). The PageRank of a page \(A\) is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.
The PageRank of each page is then calculated with the following formula:
$$ PR(A) = (1-d) + d\left(\dfrac{PR(T1)}{C(T1)} + ... + \dfrac{PR(Tn)}{C(Tn} \right) $$
Damping factor can be seen as the following according to Sergey Brin and Lawrence Page:
PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.
Let's consdider the following simple web graph:
$$ \begin{align} C(1) &= 1 \\ C(2) &= 1 \\ C(3) &= 2 \\ \end{align} $$
One problem of the algorithm is that to compute the PageRank, we need an initial value of the PageRank. Actualy, no matter what initial values are taken at the begining, the algorithm will always converge. Since "the sum of all web pages' PageRanks will be one." we'll start with a inital PageRank for each page equal to 1 (not 1/3, you'll understand why in the last section):
$$ \begin{align} PR(1) &= 1 \\ PR(2) &= 1 \\ PR(3) &= 1 \\ \end{align} $$
Let's detail the first iteration for each page:
Page 1 has one backlink from page 3:
$$ PR(1) = (1-d) + d\left(\dfrac{PR(3)}{C(3)}\right) $$ $$ PR(1) = (1-0.85) + 0.85\left(\dfrac{1}{2}\right) $$ $$ PR(1) = 0.575 \label{eq:PR1} $$
Page 2 has two backlink from page 1 and 3:
$$ PR(2) = (1-d) + d\left(\dfrac{PR(1)}{C(1)} + \dfrac{PR(3)}{C(3)}\right) $$ $$ PR(2) = (1-0.85) + 0.85\left(\dfrac{0.575}{1} + \dfrac{1}{2}\right) $$ $$ PR(2) = 1.06375 $$
Note that PageRank of page 1 has already been updated at equation \eqref{eq:PR1}: \(PR(1)=0.575\).
Page 3 has two backlink from page 2:
$$ PR(3) = (1-d) + d\left(\dfrac{PR(2)}{C(2)}\right) $$ $$ PR(3) = (1-0.85) + 0.85\left(\dfrac{1.425}{1}\right) $$ $$ PR(3) = 1.0541 $$
After the first iteration the PageRanks are:
$$ \begin{align} PR(1) &= 0.5750 \\ PR(2) &= 1.0637 \\ PR(3) &= 1.0542 \\ \end{align} $$
If we repeat the algorithm over the next iterations, here are the PageRanks:
Iteration | \(PR(1)\) | \(PR(2)\) | \(PR(3)\) |
---|---|---|---|
1 | 0.575 | 1.064 | 1.054 |
2 | 0.598 | 1.106 | 1.090 |
3 | 0.613 | 1.135 | 1.115 |
4 | 0.624 | 1.154 | 1.131 |
5 | 0.631 | 1.167 | 1.142 |
6 | 0.635 | 1.175 | 1.149 |
7 | 0.638 | 1.181 | 1.154 |
8 | 0.640 | 1.185 | 1.157 |
9 | 0.642 | 1.187 | 1.159 |
10 | 0.643 | 1.189 | 1.160 |
Here is the evolution of PageRanks over 20 steps. We can see that the values converge assymptotically:
Over 100 steps, the final PageRanks are:
$$ \begin{align} PR(1) &= 0.6444 \\ PR(2) &= 1.1922 \\ PR(3) &= 1.1634 \\ \end{align} $$
As stated Sergey Brin and Lawrence Page: "the sum of all web pages' PageRanks will be one.". Let's check:
$$ 0.6444 + 1.1922 + 1.1634 = 3 $$
Not exactly what I call one. You have to read "the normalised sum of all web pages' PageRanks" instead of "the sum of all web pages' PageRanks", in other words the average:
$$ \dfrac{0.6444 + 1.1922 + 1.1634 }{3} = 1 $$
Here is why we started with a PageRank of 1 instead of 1/3.
PageRank algorithm has been implemented with Matlab. All results presented on this page has been calculated with the following script: