Page Rank: A Billion Dollar Code

Let’s consider a group of users who are navigating through a connected network of web pages. At any given moment, each user randomly selects a link to visit. Today, we will be focusing on how to rank these pages based on a specific model called the Markov Chain. Specifically, we will be exploring the Page Rank algorithm, which is the backbone of Google Search.

🖼 Overview of a Markov Chain

🧠 What is a Markov Chain?

In layman's terms, a Markov Chain is a mathematical model that describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In other words, it predicts the likelihood of future events based solely on the current state, not on the full history before it. For example, if you are trying to predict the weather, a Markov Chain could give the probability of it raining tomorrow based only on whether it rained today.

The name “Markov Chain” comes from the Russian mathematician Andrey Markov, who first described them in the early 1900s.

🖼 Basic diagram of Markov Chain

Before we delve deeper into the algorithm, it is important to understand the properties of a Markov Chain. In the context of a web network, individual web pages can be thought of as states in a Markov Chain, represented by circles labelled A, B, and C. The transition from one state to another is determined by a probability, represented by P(i, j). For example, in this model, a user can move from state C to state A with a transition probability of P(C, A) = 1, and the same applies to other states.

In the context of a web network, the transition probability between states is determined by the number of outgoing links from a particular web page. For example, if a web page has 2 outgoing links (like page B), the probability of following each of those links is 1/2. In general, the probability of a user choosing any one outgoing link from their current page is 1 divided by that page's number of outgoing links.
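As a small sketch, these transition probabilities can be built directly from each page's outgoing links. The three-page graph below is a hypothetical example matching the A, B, C diagram:

```python
# Hypothetical 3-page web graph: each page maps to its outgoing links.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
pages = sorted(links)

# P[i][j] = probability of moving from page i to page j:
# 1 / (number of outgoing links of i) if i links to j, else 0.
P = {
    i: {j: (1 / len(links[i]) if j in links[i] else 0.0) for j in pages}
    for i in pages
}

print(P["B"]["A"])  # page B has 2 outgoing links, so P(B, A) = 0.5
print(P["C"]["A"])  # page C links only to A, so P(C, A) = 1.0
```

Note that every row of this matrix sums to 1, since the user must follow some link.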

You can gain a more thorough understanding of the Markov Chain and its underlying principles by reading a blog post that explains it in great detail. — Markov Chain by Brilliant

So the main problem we are trying to solve is how to rank these pages, and this is where the Page Rank algorithm comes into the picture!

⭐ The Page Rank Algorithm

The Page Rank algorithm is a method for determining the importance or “rank” of web pages in a search engine’s index. It was developed by Larry Page and Sergey Brin while they were PhD students at Stanford University.

The algorithm works by treating each page on the web as a node in a directed graph, with links between pages serving as edges. The PageRank of a page is determined by the number and quality of the pages that link to it, with the idea being that a page with many high-quality incoming links is more likely to be important than a page with few or low-quality links.

So how exactly does Page Rank use a Markov Chain to determine the ranks of pages in a web network?

🖼 Page Rank algorithm Trajectory

The PageRank algorithm uses a Markov Chain model to determine the importance or “rank” of web pages in a search engine’s index. In this model, each web page is treated as a state in a Markov Chain, and the links between pages are treated as transitions between states.

The algorithm calculates the probability of transitioning from one page to another based on the outgoing links of the source page, together with a small probability of jumping to a page directly. These probabilities are used to compute a score for each page, known as the PageRank score.

To compute the PageRank score, the algorithm uses an iterative approach: it starts with an initial estimate of the scores and repeatedly updates each page's score based on the scores of the pages that link to it. This process continues until the scores converge to a stable set of values. This stable set of values is called the stationary distribution.
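A minimal sketch of this iterative approach is shown below. The link structure is made up for illustration, and the damping factor of 0.85 (the probability of following a link rather than jumping to a random page) and the tolerance are common conventional choices, not values fixed by the article:

```python
def pagerank(links, damping=0.85, tol=1e-10):
    """Iteratively update scores until they converge (power iteration)."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # initial uniform estimate
    while True:
        # Every page gets a base share from random jumps...
        new = {p: (1 - damping) / n for p in pages}
        # ...plus a share of the rank of every page that links to it.
        for p in pages:
            share = damping * rank[p] / len(links[p])
            for q in links[p]:
                new[q] += share
        if max(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new

# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
print(scores)  # C collects the most link weight, so it ranks highest
```

On this graph the scores converge to roughly A ≈ 0.39, B ≈ 0.21, C ≈ 0.40, so C ranks first even though A has the same number of incoming links, because C is linked to by higher-ranked pages.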

🖼 Stationary Distribution Convergence Graph

Once the algorithm reaches a stationary distribution, the probability of each state no longer changes and remains constant, even as users continue to navigate from one page to another. For example, in the above image, imagine there are 4 states in a Markov chain and users navigate through every state. After some point in time, the probability values stabilise and stop changing. So how might we find this stationary distribution?

By the way, some Markov chains never converge. But how is that possible?

Before getting into it, let's first understand what Reducible and Irreducible Markov Chains are!

A Markov chain in which every state can be reached from every other state is called an irreducible Markov chain. If a Markov chain is not irreducible, a user's sequence of states may become trapped in some closed set of states and never escape from it. Such Markov chains are called Reducible Markov Chains.

In simple terms, in Reducible Markov Chains, users eventually get stuck in a subset of states and can never leave it. With Irreducible Markov Chains, however, users can traverse from any state to any other, resulting in a unique distribution across all linked states in the network.
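The trap can be seen in a tiny simulation. Both graphs below are hypothetical; in the reducible one, page B links only to itself, so a user who reaches B never leaves:

```python
import random

# Reducible chain: once a user reaches page B, every link leads
# back to B, so they can never return to A.
reducible = {"A": ["B"], "B": ["B"]}

state = "A"
for _ in range(10):
    state = random.choice(reducible[state])
print(state)  # always ends trapped in B

# Irreducible chain for comparison: every state can reach every other,
# e.g. A -> B -> C -> A, so no user ever gets stuck.
irreducible = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
```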

⛓ Periodic Markov Chains

If we start with an uneven distribution of users over the states and then apply transitions, the distribution can fluctuate between two extremes rather than converge. Markov chains with this behaviour are called Periodic Markov Chains. Such a chain can still be irreducible; what makes it periodic is that users return to each state only at regular intervals.

🖼 Periodic Markov Change Example

For example, if we have two states with initial distributions of 80 and 20, and every transition sends each state's users to the other state, the distribution will keep flipping between 80/20 and 20/80 rather than converging, because of the unequal split between the two states.
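This 80/20 example can be checked in a few lines. In a two-state chain where each state always jumps to the other, applying a transition simply swaps the two probabilities:

```python
# Two-state periodic chain: each state always jumps to the other,
# so one transition just swaps the two probabilities.
dist = (0.8, 0.2)
history = [dist]
for _ in range(4):
    dist = (dist[1], dist[0])  # apply the swap transition
    history.append(dist)
print(history)
# [(0.8, 0.2), (0.2, 0.8), (0.8, 0.2), (0.2, 0.8), (0.8, 0.2)]
```

The distribution oscillates forever and never settles, which is exactly the failure to converge described above.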

💡 Ergodic Theorem

If no such period exists for any state, the Markov chain is called an Aperiodic Markov Chain. Aperiodic and irreducible chains have an extraordinary property: regardless of the starting distribution, as users keep navigating from one state to another, the chain will always converge.

In a web network setting, whatever the initial distribution of visitors across web pages may be, as long as the web graph representing the Markov chain is irreducible and aperiodic, the long-term user distribution will converge to the stationary distribution, giving a sound basis for ranking websites. It does not matter whether the users started on one specific web page or were evenly spread across pages; the Markov Chain guarantees a unique stationary distribution. This result is called the Ergodic Theorem, and it forms the basis of most applications.
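A small sketch of this guarantee: take an irreducible, aperiodic two-state chain (the transition probabilities below are made up for illustration) and start two groups of users from opposite extremes. After enough steps, both end up at the same stationary distribution:

```python
def step(dist, P):
    """One transition: new[j] = sum over i of dist[i] * P[i][j]."""
    states = list(P)
    return {j: sum(dist[i] * P[i][j] for i in states) for j in states}

# Irreducible, aperiodic chain (hypothetical transition probabilities).
P = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.2, "B": 0.8},
}

d1 = {"A": 1.0, "B": 0.0}  # all users start on page A
d2 = {"A": 0.0, "B": 1.0}  # all users start on page B
for _ in range(100):
    d1, d2 = step(d1, P), step(d2, P)

print(d1)  # both converge to the same stationary distribution,
print(d2)  # approximately {'A': 2/7, 'B': 5/7}
```

Solving pi = pi * P by hand for this matrix gives pi(A) = 2/7 and pi(B) = 5/7, matching what both simulations converge to.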

📌 Problems in Markov Chains

In its implementation, the Page Rank algorithm uses Markov Chains very effectively. However, a few kinds of chains can cause problems for the algorithm. Consider a chain containing a state with no outgoing links. Users who land in such a state would be unable to change their state at all.

This problem has a very easy solution. Suppose a user moves through a large web network and ends up in a state with no outbound links. What does the user do next? They simply jump to a random state, and the distribution is adjusted accordingly.
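One common way to implement this fix is to treat a dangling page as if it linked to every page, so the "random jump" becomes an ordinary transition. A sketch with a hypothetical three-page graph:

```python
# Hypothetical graph where page C is a dangling page (no outgoing links).
links = {"A": ["B"], "B": ["C"], "C": []}
pages = sorted(links)

# Replace each empty link list with links to every page, modelling
# a user who jumps to a random page when stuck.
fixed = {p: (out if out else list(pages)) for p, out in links.items()}
print(fixed["C"])  # ['A', 'B', 'C'] -- C now "links" to every page
```

After this transformation every state has outgoing transitions again, so the chain has no absorbing dead ends and the iteration above can proceed normally.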


Technology, like our understanding of it, is always improving, and we cannot foresee whether such algorithms will one day be replaced as Artificial Intelligence advances. However, as Computer Science students and developers, we should be aware that algorithms like this form the foundation of most of today's advanced systems.

Anyway, that’s everything for now. I hope you enjoyed the article. Please leave any feedback or recommendations in the comments section below, and if you like my material, please check out my other posts. You are welcome to mail me if you want me to cover any tech-related topic. Thank you very much!