The web as we know it today didn’t look the same in its early days. Judging a page’s authority by the links pointing to it is now perceived as the norm, but it was revolutionary back in 1998, when Google introduced the PageRank algorithm and made the assessment of inbound links a valid ranking factor. While PageRank has definitely played a crucial role in the evolution of SEO and its techniques, it’s not clear whether it still matters after 2018, when the original patent expired. In this post, we’ll look into the history of PageRank, explain how it’s calculated, and find out whether it’s still applied to rankings.
What is PageRank
PageRank is an algorithm for ranking web pages based on the number and quality of links pointing to them. It was developed by Google founders Larry Page and Sergey Brin in 1998 and marked the first successful attempt by a search engine to assess the level of authority of a given web page. Basically, the more backlinks a page had, the higher it would rank.
As the engineers explain it in the original paper, PageRank aimed to “bring order to the web” by distributing weights across pages. They built the algorithm on the idea of a random internet surfer who visits a page and gets to other pages by clicking on links. The probability that a random surfer reaches a certain page is that page’s PageRank. The publicly displayed score was expressed on a scale from 0 to 10, widely believed to be logarithmic, where 10 represents the most trustworthy web source there can be.
PageRank is an objective measure that aligns with searchers’ subjective intentions: the more sources pointing to a page, the more valuable the information on that page and the more likely users are to visit it.
But the referring sources are not equal—the number of pages that link to them is measured as well: the more backlinks a referring page has, the more PageRank power it passes on a page it links out to. Let’s explore it in more detail.
How it’s calculated
Here’s the original PageRank formula:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
- A is the analyzed page
- T1…Tn are the pages pointing to the analyzed page
- C(T) is the number of outbound links placed on page T
- d is a damping factor that corresponds to the probability that a user keeps clicking on links rather than abandoning the page (usually set to 0.85)
When pages cast votes on other pages by citing them, they distribute their PageRank. For example, page A has a PageRank score of 5 and it links to pages B and C. In isolation from other links that pages B and C might have, pages B and C receive 85% of page A’s score (4.25) combined (the score multiplied by the damping factor). If page B cites page D, D’s PageRank score will include 85% of B’s score, and so on.
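As a rough illustration of how the formula above can be evaluated, here is a minimal Python sketch that applies it repeatedly to a tiny link graph until the scores settle. The three-page graph, the function name `pagerank`, the starting value of 1.0, and the iteration count are all assumptions made for the example; this is not how Google computes scores at scale.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    links maps each page to the list of pages it links out to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    scores = {page: 1.0 for page in pages}  # arbitrary starting value
    for _ in range(iterations):
        new_scores = {}
        for page in pages:
            # Sum PR(T) / C(T) over every page T that links to this page.
            incoming = sum(
                scores[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_scores[page] = (1 - d) + d * incoming
        scores = new_scores
    return scores


# Hypothetical three-page web: A links to B and C, B links to C, C links back to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(graph)
for page in sorted(scores):
    print(page, round(scores[page], 3))
```

In this toy graph, page C ends up with the highest score because it is the only page that receives links from two sources.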
Let’s examine a simple example of PageRank distribution made with a PageRank simulator:
Page 3 here has the highest PageRank score because it is linked out to the most. And because page 3 has the highest score, the PageRank it passes on pages 4 and 5 is also higher. Naturally, this calculation is done in isolation from a real-world scenario, assuming that only these 5 pages exist on the web, but it shows, in a simplified manner, how the value of PageRank is distributed across web pages.
Since PageRank is an authority metric, the power passed through links is calculated hierarchically: a citation from a PageRank 8 page weighs more than a citation from a PageRank 2 page. But your page can get a higher PageRank value through links from less authoritative pages if they generally use fewer citations. Say, your page is referred to from a PageRank 7 source that contains 10 outbound links and also from a PageRank 3 source that contains only 3 links. The first source will pass PageRank worth 0.595 (7 divided by 10 links, multiplied by the damping factor) and the second will bring your page 0.85 (3 divided by 3 links, multiplied by the damping factor). However, high-quality and popular pages don’t usually link out to lots of other pages, so it’s always best to concentrate on getting backlinks from the most trusted sites.
PageRank toolbar and link manipulation
In 2000, Google made the PageRank score of any website publicly visible in its browser toolbar. Such exposure led to ranking manipulation: website owners and SEOs would concentrate on getting more links from high-scoring pages, and whole link farms emerged to help people buy links. This understanding of the algorithm, which revolved around getting as many links as possible from pages with as high a score as possible without considering the context of links and many other aspects, was not a sustainable SEO practice.
Google made various attempts to stop PageRank-related ranking manipulation and eventually retired the toolbar in 2016. You can still find online services that calculate a PageRank score and offer PageRank badges to put on websites, even though it’s a completely outdated practice. The algorithm is still used in Google’s rankings, but the official scores are no longer publicly accessible.
The nofollow value
Link manipulation techniques were not only related to the public-facing PageRank toolbar. To address the issue of comment spam, Google together with other major search engines introduced the nofollow value of the rel attribute in 2005. This value tells search crawlers not to follow a link and prevents link equity from being passed through it. Before nofollow, people could flood the internet with comments mentioning their website’s address and increase the PageRank score.
This new attribute value spurred new link manipulation practices, often referred to as PageRank sculpting. Given that the weight PageRank passes to each linked page depends on how many links there are (the more links a page has, the smaller the share of its PageRank each one receives), SEOs would use nofollow to direct the flow of PageRank and pass more weight via followed links.
Say, a source with a PageRank score of 5 cited 10 other pages and marked 8 of those citations nofollow. Without nofollow, each cited page would get one tenth of the referring page’s score (0.425 with regard to the damping factor). With nofollow, only the 2 followed pages would each receive half of the referring page’s PageRank (2.125). Since this was a manipulative technique, the situation changed in 2009: in the same scenario, the two followed pages would receive PageRank worth 0.425 instead of 2.125. So, PageRank is now divided equally across all links on a page but is actually passed only through the links that are followed.
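The arithmetic in this example can be checked with a few lines of Python. This is a simplified sketch of the scenario described above; the function name `share_per_followed_link` and the `post_2009` flag are hypothetical and only serve the illustration.

```python
def share_per_followed_link(score, total_links, nofollow_links, d=0.85, post_2009=True):
    """Return the PageRank each followed link passes in this simplified model."""
    followed = total_links - nofollow_links
    if followed <= 0:
        return 0.0  # nothing is passed if every link is nofollowed
    # Before the 2009 change, nofollowed links were left out of the split;
    # afterwards the score is divided across all links, followed or not.
    divisor = total_links if post_2009 else followed
    return d * score / divisor


# A PageRank 5 source with 10 links, 8 of them marked nofollow.
before = share_per_followed_link(5, 10, 8, post_2009=False)
after = share_per_followed_link(5, 10, 8, post_2009=True)
print(round(before, 3), round(after, 3))
```

The two printed values correspond to the 2.125 and 0.425 figures from the example, which shows why nofollowing links stopped being a way to concentrate PageRank on the remaining ones.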
The UGC value
Compared to naturally placed relevant outbound links, comment links are often less trustworthy, and it’s not fair to give them the same credit. In 2019, Google added a new value of the rel attribute specifically designed for links in comments and other user-generated content: ugc. Now, many blogs and forums automatically set any links put in the comments section to ugc, while nofollow is used for a broader range of purposes.
The updated algorithm
In 2004, Google filed an updated PageRank patent based on a “reasonable surfer model,” which introduced the idea that links may have different values depending on how likely they are to be clicked. For example, links placed at the top of the page or links with sufficiently long, informative anchor text are usually more visible and attractive to users. From that moment on, the likelihood of being clicked has been considered when assessing authority and serving rankings.
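To make the idea more concrete, here is a small Python sketch in which a page’s PageRank is split in proportion to how likely each link is to be clicked, instead of equally. The click weights and the function name `weighted_link_shares` are invented for illustration; the actual features and weights Google uses are not public.

```python
def weighted_link_shares(score, click_weights, d=0.85):
    """Split a page's passable PageRank in proportion to each link's click weight."""
    total = sum(click_weights.values())
    if total <= 0:
        raise ValueError("at least one link needs a positive click weight")
    return {
        link: d * score * weight / total
        for link, weight in click_weights.items()
    }


# Hypothetical click-likelihood weights for three links on a PageRank 5 page.
weights = {"top-of-page link": 3.0, "in-content link": 2.0, "footer link": 0.5}
shares = weighted_link_shares(5, weights)
for link, share in shares.items():
    print(link, round(share, 3))
```

Under these assumed weights, the prominent link at the top of the page passes several times more PageRank than the footer link, even though both sit on the same page.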
In 2006, Google designed a new system that selects a few trusted sources called seed pages and assesses the quality of other pages based on links coming from seed pages. It was a response to PageRank being vulnerable to manipulations, and the new formula looked like this:
∀ p ∈ P, p ≠ si: Ri(p) = d × ∑(q→p) [ Ri(q) / qout × w(q→p) ]
- si are high-quality seed pages
- P represents all web pages
- qout is the out-degree of a page q
- w is the weight of the link (set to 1 by default)
Google names The New York Times as a good example of a seed page because it is diverse enough to cover a wide range of topics that interest users and features a lot of helpful outgoing links. Pages cited by seeds are considered to be high-quality as well, and the easier it is to reach a page from a seed, the more reliable it is and the higher score it has.
According to this updated patent, the process of ranking distribution based on links goes through the following steps:
- The system receives a set of pages open to be indexed and ranked
- The system knows a set of seed pages that link out to other pages
- The system calculates how far the analyzed pages are from the seeds based on the links between them
- The system determines the rankings based on the shortest distances to seed pages
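The steps above can be sketched in Python as a breadth-first search that starts from the seed pages. This is a simplification under a stated assumption: every link counts as a distance of 1, whereas the patent describes links that can carry different lengths. The page names and both function names are hypothetical.

```python
from collections import deque


def distances_from_seeds(links, seeds):
    """Return the shortest link distance from any seed to each reachable page."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path in BFS
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


def rank_by_seed_distance(links, seeds):
    """Order reachable pages by distance from the seeds (ties broken by name)."""
    distance = distances_from_seeds(links, seeds)
    return sorted(distance, key=lambda page: (distance[page], page))


# Hypothetical link graph: "orphan" links to the seed but is not linked from it.
links = {
    "seed": ["news", "blog"],
    "news": ["shop"],
    "blog": ["shop", "forum"],
    "shop": ["forum"],
    "orphan": ["seed"],
}
ranking = rank_by_seed_distance(links, ["seed"])
print(ranking)
```

Pages that cannot be reached from any seed, like "orphan" here, receive no distance at all in this sketch, so they are left out of the ranking.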
This new algorithm that replaced the original PageRank formula is faster to compute because it no longer progresses from one iteration to another. And even though the original PageRank patent expired in 2018, that doesn’t mean Google has stopped using PageRank. Replying to a tweet about authority, Google’s analyst John Mueller admitted that they use PageRank “among many other signals.”
Factors that influence PageRank
As we’ve mentioned, different aspects of linking affect the PageRank score:
- The number of links
- Link attributes
- Anchor text
- The likelihood of being clicked
Let’s see how you can get the most out of the links you place and those you acquire.
Optimizing the flow of link equity
Getting backlinks to cast votes in your website’s favor is still one of the most important things to establish authority on the web. Links pass link equity to the pages they cite under certain conditions:
- When they are relevant. Relevancy is key to SEO in many aspects. Google doesn’t like it when pages are interlinked randomly. Say, your page that contains a cooking recipe gets links from pages about cars—no matter how trusted the external source, this type of link won’t boost your page’s rankings.
- When they have natural anchor text. Meaningless anchor texts like “click here” or over-optimized ones that contain target keywords are not good for establishing relevance. Anchor text should describe what the linked source is about and serve as a hint to why a user should follow the link.
- When the sites they come from are trusted. It’s important to verify the domain and page quality of the sources you get backlinks from and to monitor harmful links coming from low-quality sources.
- When they are crawlable. Links matter if search crawlers can find them and they are not blocked in robots.txt or by other methods.
- When they don’t trigger a server error response. Both linked and linking pages should be open for indexing. Also, not every redirect may pass the full link equity: even though Google stated that all types of redirects pass PageRank, SEOs believe it may not be the case with non-301 redirects.
- When they are followed. We’ve already discussed how the nofollow value influences the distribution of ranking power: if your page is being cited but nofollowed, it won’t bring you much ranking benefit.
- When they are visible on a page. Hidden links might lead to penalties, and the more visible links are, the better for UX and SEO. It doesn’t mean that links should stand out sharply: they should be easily distinguishable but designed according to common link visualization principles.
Since PageRank assesses authority on a per-page rather than a per-site basis, internal links are as important as backlinks. With proper internal linking, you can distribute the link flow:
- The more internal links point to a page, the higher its PageRank
- The more links placed on a page, the less PageRank value each of them passes
- Links that are more likely to be clicked pass a higher PageRank
- Links marked as nofollow don’t pass any PageRank
Speaking of external links, they don’t impact the PageRank score of the pages they are placed on. They do serve as relevancy signals and help Google establish connections between different sources but they don’t directly influence search engine rankings.
Alternative authority metrics
PageRank was the first authority metric to influence the web and SEO practices. It is still used among Google’s ranking signals even though it’s not clear how exactly. It’s safe to say that relevant links from high-quality sources are crucial for both rankings and establishing authority.
Other SEO metrics aimed to assess website authority also revolve around backlink quantity and quality.
For example, SE Ranking’s Domain Trust and Page Trust are aggregated scores of domain and page quality that are based on the number and quality of backlinks and referring domains. You can get an idea of any website’s quality by running its analysis in the Competitive Research tool:
So, do you need to care about PageRank?
The value of links laid the foundation for Google’s ranking formula. Regardless of changes in the PageRank algorithm and its importance, links have always been and probably will be a major ranking factor. In a 2016 Q&A, Google representatives revealed that content and links are the top two factors influencing the rankings, and in the 2020 Twitter discussion we’ve already mentioned, John Mueller admitted that PageRank still matters for rankings.
What it means is that you need to prioritize working on a safe backlink profile and polish your internal linking once in a while. As Moz’s Rand Fishkin puts it, no matter how old PageRank theories are, it won’t hurt to check your links and clear out the rubbish ones. Make sure that your website’s structure makes it easy to navigate through different pages, and establish backlink relations with authoritative sources that are relevant to the topics you’re targeting.