The Wikipedia Problem


You know all those blue links that fill a Wikipedia article? It turns out they all connect to one another. Researcher Stephen Dolan determined that every article connects to any other through, at most, six articles. The “six degrees of Wikipedia,” as it is known, is a modern version of the Bacon number, the idea that every person is connected to Kevin Bacon through six people.
Online games soon took advantage of this interesting phenomenon, tasking a player with navigating between two of Wikipedia’s 5 million articles in the fewest clicks. I never thought of myself as a cheater, but in this situation I was willing to make an exception. While my friends clicked around mindlessly, hoping to stumble upon the correct article, the programmer in me insisted on finding a way to compute the shortest path automatically. I began mapping out the program in my mind: how to gather every link from an article, how to navigate 5 million articles to reach one single destination. That is when I realized my program would not work.

The problem with opening every link on a Wikipedia article is that you would have to repeat the process six times over. Say, for example, each article contains an average of 10 links. The program would have to open those 10 links, then the 10 links on each of those 10 new articles, and so on; after six rounds the computer would have processed around 1 million articles. This is when the program shifted from a link extractor to a search program. I had to ask myself how to shrink an hour of processing 1 million articles into something a computer could handle almost instantly.
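To make that blow-up concrete, here is a minimal sketch of the back-of-the-envelope arithmetic, assuming the figures above (roughly 10 links per article and up to six hops); the numbers are illustrative, not measured:

```python
# Rough estimate of how many articles a naive link-following search touches,
# assuming ~10 links per article and up to 6 hops.
LINKS_PER_ARTICLE = 10
MAX_HOPS = 6

total = 0
frontier = 1  # start from a single article
for hop in range(1, MAX_HOPS + 1):
    frontier *= LINKS_PER_ARTICLE   # every article on the frontier opens ~10 more links
    total += frontier
    print(f"hop {hop}: frontier ~{frontier:,}, cumulative ~{total:,}")
# The last frontier alone is 10**6 = 1,000,000 articles.
```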

My first thought was to remove any duplicate articles, which worked, but it only shrank the processing time by a small fraction, nowhere near my target. Next, I considered different ways to sift through the articles. I rewrote the program to search forward from the starting article while simultaneously working backwards from the destination, so that when two articles matched I would know I had found a path. Unfortunately, this did not shrink the processing time at all, because the program was still looking through the same number of links, just in a different order. After these unsuccessful attempts, I drew out what the mass of articles looked like. It was almost like a kite: the bottom point was the starting article and the top was the final article, but the middle was an impossibly large number of articles.
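The essay doesn’t include the code, but the duplicate-removal step it describes usually amounts to a breadth-first search that keeps a “visited” set so no article is expanded twice. Here is a minimal sketch under that assumption, using a toy in-memory dictionary of article links (a simplification; the real program would have to fetch links from Wikipedia as it goes, and the article names below are made up):

```python
from collections import deque

def shortest_path(links, start, goal):
    """Breadth-first search over a dict mapping article -> list of linked articles.
    The 'visited' set is the duplicate-removal step: no article is expanded twice."""
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        article = path[-1]
        if article == goal:
            return path
        for neighbor in links.get(article, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path found

# Toy example with hypothetical article names:
links = {
    "Kevin Bacon": ["Footloose", "Hollywood"],
    "Footloose": ["Dance", "Kevin Bacon"],
    "Hollywood": ["Film", "California"],
    "California": ["United States"],
}
print(shortest_path(links, "Kevin Bacon", "United States"))
# ['Kevin Bacon', 'Hollywood', 'California', 'United States']
```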

That is when I decided to rewrite the program once more, this time comparing each article from the start against an article in the middle. By comparing against the middle, I could instantly eliminate the half of the articles that didn’t lead anywhere. I didn’t realize how efficient the program had become until I ran it a few times. Every time an article matched one in the middle, the remaining options were cut in half. The hour the program had needed to find the shortest path to the destination shrank to a few seconds. The next day, when my friends and I played the six degrees of Wikipedia game, I was undefeated.
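The essay doesn’t spell out the exact algorithm, but the payoff it describes, with the options cut in half on every match, is the familiar logarithmic speed-up: halving a pool of a million candidates takes only about twenty steps. A minimal sketch of that arithmetic, assuming the one-million figure from earlier:

```python
import math

# Roughly a million candidate articles (the estimate from earlier).
# Halving the pool on every comparison reaches a single article in
# about log2(1,000,000) ~= 20 steps, which is why an hour collapses to seconds.
candidates = 1_000_000
steps = 0
while candidates > 1:
    candidates /= 2   # each comparison against the "middle" discards half the pool
    steps += 1
print(steps, math.log2(1_000_000))  # 20 halvings vs. log2 of one million (~19.93)
```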
