State Internet Statistics versus Twitter Geocodes
Added February 2010
Over the winter of 2010, I was a visiting researcher at RPI’s Tetherless World Constellation, a working group that deals with Semantic Web technologies. The group was organized fairly recently by its “project leader,” James Hendler, and the specific project I worked on was the data-gov wiki: the codification, organization, and research documentation of a Semantic triple store built from the Data.gov data sets.
One of the visualizations I created, and documented thoroughly as a tutorial, was the “State Internet Statistics versus Twitter Geocodes” Processing visualization.
I introduced Processing to the team, arguing that because it is both a Java-based framework and completely primitive in its visual components and functions, we could rapidly prototype our own visualizations to fit the data sets we had on hand.
To demonstrate this, I tackled a problem that seemed to affect several of the demos on the wiki: many used a Google visualization as their preferred format, even though other data sources could have been combined to add more dimensions of understanding to a given visualization. Dominic Difranzo’s Castnet demo, for instance, is highly sophisticated, yet it could easily have been built in a simplified format like the one below. That format combines the state-by-state comparison coloring of one Google visualization with the point-by-point geo-location system used by Google Maps, two tools that do not work together.
The demo below shows the percentage of each state’s population that has access to broadband internet. To add another dimension to the data, I overlaid tweets: as the demo explains, these were roughly 1,800 geo-coded tweets captured during the weeks I spent building it. Placed on the map alongside the internet access data, they show not only the overall totals per state (and, by extension, per region) but also the spread of actual use: most people using the internet, at least for Twitter, are clearly clustered in small pockets. Although this is a somewhat naïve example, the idea can be extended to other situations where the effect is far more pronounced. Next step: abstract this code so that any number of data sets can be drawn onto a graph, and clean it up so others can reuse this model again and again.
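The original Processing source is not reproduced here. As a rough illustration of the two layers the demo combines, the following plain-Java sketch shades a state by its broadband percentage and projects a tweet’s latitude and longitude onto the canvas. It is a hypothetical sketch, not the code behind the demo: the class `BroadbandMap`, the helpers `remap`, `shade`, and `project`, the light-to-dark gray scale, the approximate continental-US bounding box, and the simple equirectangular projection are all assumptions. `remap` reproduces the arithmetic of Processing’s built-in `map()` function so the sketch runs without the Processing library.

```java
/**
 * Hypothetical sketch (not the original Processing code): a choropleth fill
 * per state plus a point overlay for geo-coded tweets.
 */
public class BroadbandMap {

    // Assumed, approximate bounding box for the continental United States.
    static final double MIN_LON = -125.0, MAX_LON = -66.0;
    static final double MIN_LAT = 24.0, MAX_LAT = 50.0;

    /** Same arithmetic as Processing's map(): re-maps value from [start1, stop1] to [start2, stop2]. */
    static double remap(double value, double start1, double stop1, double start2, double stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    /**
     * Gray level (0-255) for a state's broadband percentage, assuming maxPercent > minPercent.
     * Low access maps to light gray, high access to dark gray.
     */
    static int shade(double percent, double minPercent, double maxPercent) {
        double t = (percent - minPercent) / (maxPercent - minPercent);
        t = Math.max(0.0, Math.min(1.0, t)); // clamp values outside the range
        return (int) Math.round(255 * (1.0 - t));
    }

    /** Equirectangular projection of a tweet's coordinates onto a width x height canvas. */
    static double[] project(double lat, double lon, int width, int height) {
        double x = remap(lon, MIN_LON, MAX_LON, 0, width);
        // Screen y grows downward, so the northern edge (MAX_LAT) maps to y = 0.
        double y = remap(lat, MAX_LAT, MIN_LAT, 0, height);
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // Hypothetical inputs for illustration only, not the demo's data set.
        int fill = shade(70.0, 50.0, 80.0);
        double[] p = project(42.73, -73.68, 960, 500);
        System.out.println("state fill gray level: " + fill);
        System.out.printf("tweet drawn at (%.1f, %.1f)%n", p[0], p[1]);
    }
}
```

A linear lat/long mapping distorts shapes at this scale, but it is enough to drop points onto a flat map; in practice the tweets presumably need the same projection as whatever base map supplies the state outlines, or the dots will drift off their states.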