How a 100-Year-Old Math Feud Powers Google and Your Phone
The Randomness in Your Pocket
Have you ever wondered how your phone predicts what you are going to type next? Or how Google sifts through billions of web pages to surface the right one in a split second? The answer lies not in a recent Silicon Valley breakthrough, but in a bitter feud between two Russian mathematicians a century ago, a dispute that touched on free will, poetry, and the nature of randomness itself.
At the center of it all is a seemingly simple concept known as a Markov Chain, a tool that has risen silently to become one of the most significant algorithms in our contemporary world.
The Tsar, The Atheist, and a Theory of Everything
The tale begins in early 20th-century Russia, a period of great political and social unrest. On one side stood Pavel Nekrasov, a leading Moscow mathematician and member of the upper class, who wanted to build a mathematical justification for the Tsarist social order. His argument rested on the Law of Large Numbers: the average outcome of many independent random events (such as coin tosses) converges on a predictable value. Nekrasov claimed that human free will consisted of millions of such independent decisions, which, he argued, naturally produced a stable, God-ordained society.
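The Law of Large Numbers is easy to see for yourself. A minimal sketch (the flip counts are arbitrary choices for illustration):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def average_of_flips(n: int) -> float:
    """Flip a fair coin n times and return the fraction of heads."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# The more flips, the closer the average gets to the true value, 0.5.
for n in (10, 1_000, 100_000):
    print(n, average_of_flips(n))
```

With 10 flips the average can stray far from 0.5; by 100,000 flips it is pinned within a fraction of a percent. Nekrasov's claim hinged on the "independent" part of this law, which is exactly where Markov attacked.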
Opposing him was Andrey Markov, a modernist of the St. Petersburg school of mathematics. Markov was outraged at what he saw as the politicization of mathematics. To demolish Nekrasov's argument, he needed to prove that statistical regularity could emerge even when events depend on one another.
His stroke of genius was to study sequences in which each event depends on the one before it. He famously analyzed the vowels and consonants of Alexander Pushkin’s poem Eugene Onegin and showed that they follow a predictable pattern. With this he had not only delivered a pointed political rebuttal but also invented the Markov Chain.
What is a Markov Chain? (It’s Simpler Than It Sounds)
A Markov Chain is a model for a sequence of events in which the probability of the next event depends only on the current state. This “memoryless” character is known as the Markov Property.
Think about the weather. The chance of rain tomorrow depends heavily on whether it is raining today, but not on whether it also rained last week. All of the history is summarized by the current state (“it’s raining now”).
A Markov Chain has two crucial elements:
- States: The possible conditions or situations (e.g., ‘sunny’ or ‘rainy’, a vowel or a consonant, being on a specific webpage).
- Transitions: The probabilities of moving from one state to another (e.g., Markov measured roughly a two-thirds chance that a consonant is followed by a vowel in Pushkin’s text).
After running for a long time, such a chain often settles into an equilibrium called a stationary distribution, in which the probability of being in any given state stops changing. This long-run distribution is the secret sauce behind many technologies.
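Here is a minimal sketch of a two-state weather chain converging to its stationary distribution. The transition probabilities are made up for illustration, not taken from real weather data:

```python
import numpy as np

# States: 0 = sunny, 1 = rainy. P[i][j] is the probability of moving
# from state i today to state j tomorrow (illustrative numbers).
P = np.array([
    [0.8, 0.2],  # sunny -> sunny, sunny -> rainy
    [0.4, 0.6],  # rainy -> sunny, rainy -> rainy
])

# Start from an arbitrary state and repeatedly apply the chain.
dist = np.array([1.0, 0.0])  # certainly sunny today
for _ in range(100):
    dist = dist @ P

print(dist)  # converges to the stationary distribution, ~[0.667, 0.333]
```

No matter which state you start from, repeated application of the transition matrix drives the distribution to the same fixed point: about a 2/3 chance of sun on any far-future day. That insensitivity to the starting point is exactly the regularity Markov proved can survive dependence between events.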
From Playing Cards to Nuclear Bombs
Decades later, while working on the Manhattan Project, physicist Stanislaw Ulam faced a seemingly impossible task: calculating the behavior of trillions of neutrons in a nuclear core. The problem was too complex to solve head-on. While recovering from an illness and idly playing solitaire, he began wondering about his odds of winning. Realizing he could not calculate them directly, he had a sudden inspiration: why not simply play 100 games at random and count the wins?
This was the birth of the Monte Carlo Method, an approach that finds approximate solutions to complex problems through random sampling. Instead of measuring every tree in a forest, you measure 100 randomly chosen ones to get a good estimate of the average. This method, born of a card game, was crucial in simulating the neutron chain reactions used to develop the atomic bomb.
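Ulam's solitaire games are fiddly to simulate, but the same idea can be shown with a classic toy problem: estimating π by throwing random points at a square. This is a standard textbook illustration, not part of Ulam's original work:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def estimate_pi(n_samples: int) -> float:
    """Estimate pi by sampling random points in the unit square and
    counting how many land inside the quarter circle of radius 1."""
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Fraction inside approximates (area of quarter circle) / (area of
    # square) = pi / 4, so multiply by 4 to recover pi.
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # roughly 3.14; more samples, better estimate
```

Nobody "solves" anything here; we just sample randomly and count, and the answer emerges from the statistics, exactly as with Ulam's 100 games of solitaire.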
How a “Random Surfer” Built a Trillion-Dollar Company
In the late 1990s, the web was a chaotic, unstructured mess. Search engines existed, but they ranked websites by how often a keyword appeared, a system that was trivially easy to game.
Two Stanford students, Larry Page and Sergey Brin, saw things differently. They modeled the web as a gigantic Markov Chain.
- The states are the pages.
- The transitions are the hyperlinks you click to move from one page to another.
They imagined a “random surfer” who clicks on links forever. The PageRank of a page is simply the long-run probability that this random surfer lands on that page. A link from a well-known page is a stronger “vote” than one from an obscure one. The result was Google Search, an algorithm that finally brought order to the web’s chaos.
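The random surfer can be sketched in a few lines. The three-page "web" and its link structure below are invented for illustration; the damping factor of 0.85 is the value commonly cited for PageRank:

```python
import numpy as np

# A miniature web of 3 pages; links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0]}
n = 3
d = 0.85  # damping: the surfer follows a link with probability d,
          # and teleports to a random page with probability 1 - d

# Build the random surfer's transition matrix from the link structure.
M = np.zeros((n, n))
for page, outlinks in links.items():
    for target in outlinks:
        M[page][target] = 1.0 / len(outlinks)

# Power iteration: repeatedly let the surfer take one more step.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = (1 - d) / n + d * (rank @ M)

print(rank)  # long-run visit probabilities: the PageRank scores
```

Page 2 ends up ranked highest: it receives a full "vote" from page 1 plus half a vote from page 0, while page 1 gets only half a vote. The scores are exactly the stationary distribution of the surfer's Markov chain.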
Try the Concepts Out Yourself
Words only go so far. To really grasp these ideas, it helps to see them play out. This interactive page lets you experiment with the concepts discussed in this article:
Interactive Learning Hub: The Hidden Hand of Randomness
On that page, you can:
- Simulate flipping thousands of coins to see the Law of Large Numbers work.
- Train your own Markov Chain text predictor.
- Run a Monte Carlo simulation of a simplified nuclear core.
- Watch the PageRank algorithm rank a mini-internet in action.
Markov’s Ghost in the Machine: Modern AI
Markov’s work is a cornerstone of modern Artificial Intelligence. The predictive text on your phone is a direct application of Markov Chains, forecasting the next word based on the previous one or two.
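A word-level predictor of this kind fits in a dozen lines. The toy corpus below is invented; real keyboards train on vastly more text, but the mechanism is the same:

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the run is reproducible

# Train a word-level bigram Markov model on a tiny toy corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

# For each word, record every word that followed it (duplicates keep
# the counts, so sampling below is automatically frequency-weighted).
transitions = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current].append(nxt)

def predict_next(word: str) -> str:
    """Sample the next word in proportion to how often it followed
    `word` in the training text -- the Markov property in action."""
    return random.choice(transitions[word])

print(predict_next("the"))  # one of: "cat", "mat", "fish"
```

Note that the prediction looks only at the single current word, never further back, which is precisely the memorylessness that separates Markov models from the Transformers discussed next.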
But today’s Large Language Models, like ChatGPT, work differently. They are built on a more advanced architecture called the Transformer. Its key innovation is the attention mechanism, which lets the model look back at all the earlier words in a passage and weigh their importance, giving it a form of long-range memory that the “memoryless” Markov property cannot provide.
This has raised new concerns. One growing worry is “model collapse,” in which AIs trained on AI-generated text gradually lose touch with the variety of human language and produce repetitive, surreal output.
An Element of Truth
From a political feud in Tsarist Russia to the computer in your pocket, the history of the Markov Chain is a moving illustration of how a single abstract idea can echo down the ages. It reminds us that the world is full of hidden patterns, and that understanding the mathematics behind them is the key to understanding the complex, networked systems around us.
Credits
This article was heavily inspired by the great description in the video by Veritasium: The Math Behind Google’s Trillion Dollar Algorithm.