Wikipedia Network Maps

Added November 2008

I have been working on a really exciting project with Ian Pearce and Max Darham that attempts to visualize Wikipedia.

Our idea was this:

How are articles on Wikipedia organically organized? For instance, the article on bicycles will likely link, within its text, to the inventor of the modern bicycle, to rubber, steel, and all the other parts, to the countries that were important in advancing bicycle technology, and so forth. If we treat each article as a vertex and each link as an edge connecting it to another article, could we write a program that visually maps this data? We basically wanted to build a prototype data-mining project and then use what we gathered to draw specific conclusions about how Wikipedia looks, works, and is organized, and whether it bears any similarity to other known types of information networks.
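To make the vertex-and-edge idea concrete, here is a minimal sketch in Python (the language Sage uses). The article titles and their links are made up for illustration, and the helper names `edge_list` and `vertices` are hypothetical, not taken from our actual code.

```python
# Toy adjacency map: each key is an article (vertex) and each value is the set
# of articles it links to (directed edges). Titles are invented for illustration.
links = {
    "Bicycle": {"Rubber", "Steel"},
    "Rubber": {"Steel"},
    "Steel": set(),
}


def edge_list(adjacency):
    """Flatten the adjacency map into sorted (source, target) pairs."""
    return sorted(
        (src, dst) for src, targets in adjacency.items() for dst in targets
    )


def vertices(adjacency):
    """Every article seen, whether as a source or only as a link target."""
    seen = set(adjacency)
    for targets in adjacency.values():
        seen.update(targets)
    return sorted(seen)


if __name__ == "__main__":
    print(vertices(links))
    print(edge_list(links))
```

An adjacency map like this keeps the direction of each link, which matters because one article linking to another does not mean the second links back.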

Our plan was this:

We wanted to build a Ruby program that visited a random article. For the English-language Wikipedia, you can do this with http://en.wikipedia.org/wiki/Special:Random.
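As a rough illustration of that first step, the sketch below follows the Special:Random redirect and reads the article title back out of the final URL. It is written in Python rather than Ruby, so it is not our actual crawler. The function names and the User-Agent string are assumptions made up for this example, and it assumes article URLs have the form /wiki/Title.

```python
import urllib.request
from urllib.parse import unquote, urlparse

RANDOM_URL = "http://en.wikipedia.org/wiki/Special:Random"


def title_from_url(url):
    """Turn an article URL such as .../wiki/Bicycle_frame into 'Bicycle frame'.

    Returns None when the path does not start with /wiki/.
    """
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        return None
    return unquote(path[len(prefix):]).replace("_", " ")


def random_article_url(user_agent="wiki-map-sketch/0.1 (hypothetical)"):
    """Request Special:Random and return the URL it redirects to.

    urlopen follows redirects on its own, so geturl() is the final article URL.
    The User-Agent value is a placeholder, not a real registered client.
    """
    request = urllib.request.Request(RANDOM_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.geturl()


if __name__ == "__main__":
    url = random_article_url()  # needs network access
    print(url, "->", title_from_url(url))
```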

Then we wanted the program to record the page along with all of its links, recurse through those links to visit each linked page, and so forth, until we had a huge array of articles, each paired with the articles it links to. We then wrote a little method that turned that information into a Python list and used Sage to create visuals of the data.

Below are a few of the graphs we have generated so far. These show the largest connected components, that is, the largest groups of connected articles. There are others, but they don’t link to anything and nothing links to them, so they are kind of boring. In the next few weeks we will wrap up the project and have more complete graphs and conclusions to boot. Check back later.
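The crawl and the "largest connected component" idea can be sketched as follows. This is a simplified Python stand-in, not our Ruby or Sage code: it uses a queue instead of recursion, it takes a hypothetical `get_links` callable in place of real page fetching, it adds a `max_pages` cap that is purely an assumption of this sketch, and it finds the component in plain Python by ignoring link direction.

```python
from collections import deque


def crawl(start, get_links, max_pages=1000):
    """Breadth-first crawl from `start`, returning {article: [linked articles]}.

    `get_links` is a hypothetical callable that takes an article title and
    returns the titles that article links to.
    """
    adjacency = {}
    queue = deque([start])
    while queue and len(adjacency) < max_pages:
        title = queue.popleft()
        if title in adjacency:
            continue  # already visited this article
        found = list(get_links(title))
        adjacency[title] = found
        queue.extend(link for link in found if link not in adjacency)
    return adjacency


def largest_component(adjacency):
    """Largest set of articles joined by links, ignoring link direction."""
    # Build an undirected neighbour map so that A -> B also connects B to A.
    neighbours = {}
    for src, targets in adjacency.items():
        neighbours.setdefault(src, set())
        for dst in targets:
            neighbours[src].add(dst)
            neighbours.setdefault(dst, set()).add(src)

    seen = set()
    best = set()
    for root in neighbours:
        if root in seen:
            continue
        # Depth-first walk that collects everything reachable from `root`.
        component = {root}
        stack = [root]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


if __name__ == "__main__":
    # Made-up link data standing in for real pages.
    toy = {"A": ["B", "C"], "B": ["A"], "C": [], "X": ["Y"], "Y": []}
    print(crawl("A", lambda title: toy.get(title, [])))
    print(sorted(largest_component(toy)))
```

Passing the link lookup in as a function keeps the traversal separate from the page fetching, so the same loop can be tried on made-up data before pointing it at real pages.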

Actual code used to make it all work:

(Click here to see the sage)

(Click here to see the Ruby)
