Modeling a Re-tweet Influence Map
Academics, particularly social scientists from various fields (international relations, political economy, sociology, anthropology, ...), have become quite interested in the effect the internet is having on politics and society. Studying that effect starts with choosing a method. In many previous cases, the method has been qualitative: for the WTO protests of 1999 (which, it has been argued, were fomented by internet communication surrounding the protests), researchers from the University of Washington conducted extensive interviews with a large number of the activists, protestors, and other “leaders” involved (1). Through these interviews, researchers were able to draw out a basic understanding of the events.
As recently as late last year, academics were still applying these same methods to studies of the 2009 Iran election. Lacking either a programming background or an understanding of contemporary tools, they were unable to use any large data set to understand the context, or the ecosystem, in which any particular message was created. Instead, qualitative analysis of the assumed “leaderboard” users on Twitter for the #iranElection topic seemed to suffice (2) for a cursory, generalized study.
Using APIs, however, we can query large amounts of data and, given the right programming framework, query access, and machine capacity, create applications that draw novel insights on a given topic from massive amounts of data. More specifically, this iteration of the program has yielded approximately 766,200 tweets that each include the hashtag #iranElection. Visualizing network maps, particularly highly connected, many-edged network graphs, can quickly become a data-intensive operation. Whereas many network pruning algorithms seem to rely on keeping only a small number of “degrees” open (that is, you can see every node one degree away, but no further, as with the old WK.com site), there is a different route: since the data is very time-based, temporal pruning, which loads only the edges that fall within the selected window of time, should allow for a scalable and structurally simple implementation.
As Figure 1 hopefully shows, there is a fairly clear mapping of node data to time. In the actual database there are objects, Retweet nodes, that consist of four basic attributes: the time the message was published on Twitter, the start_node (the person being re-tweeted in the message), the end_node (the person actually doing the re-tweeting), and the edge_id (the id of the message in which this occurs). These are then organized into various sets and subsets of GraphML files, all following a similar naming syntax:
(Term)(Minute)(Hour)(Date).graphml
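To make the record and the time-based grouping concrete, here is a minimal Python sketch. It is not the project's actual code: the `Retweet` class, the `published_at` field name, the `ZOOM_FORMATS` table, the `bucket_by_time` helper, and the sample users are all assumptions made for illustration. Only `start_node`, `end_node`, and `edge_id` are names taken from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Retweet:
    """One directed edge in the retweet graph."""
    published_at: datetime  # assumed name: time the message was published
    start_node: str         # the person being re-tweeted
    end_node: str           # the person doing the re-tweeting
    edge_id: int            # id of the message in which this occurs


# Hypothetical zoom levels: how much of the timestamp identifies a bucket.
ZOOM_FORMATS = {
    "date": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}


def bucket_by_time(retweets, zoom="minute"):
    """Group retweets by time so that each bucket could become one graph file."""
    fmt = ZOOM_FORMATS[zoom]
    buckets = defaultdict(list)
    for rt in retweets:
        buckets[rt.published_at.strftime(fmt)].append(rt)
    return dict(buckets)


# Made-up sample data, not taken from the real data set.
sample = [
    Retweet(datetime(2009, 6, 13, 14, 5, 10), "alice", "bob", 1),
    Retweet(datetime(2009, 6, 13, 14, 5, 50), "alice", "carol", 2),
    Retweet(datetime(2009, 6, 13, 15, 0, 0), "bob", "dave", 3),
]
by_minute = bucket_by_time(sample, "minute")
by_date = bucket_by_time(sample, "date")
```

Grouping by a formatted timestamp keeps the pruning logic trivial: a coarser zoom level is just a shorter key, so the same records can be regrouped without touching the database schema.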
Each of these GraphML file names may or may not include the Minute, Hour, or Date. This makes it possible to switch between three different “zoom” levels of the data, from the minute-to-minute data up to the full day-to-day data. Alternatively, you could adopt your own structure for querying a given range of data directly from the database. One possible implementation: the user selects a given T-Step, a moment of data corresponding to a set of points (Figure 3 highlights this pruning scheme). At this point, the program eager-loads the T-Steps immediately before and after the selected one. To describe directedness and time-based changes, the program highlights the direction and relative volume of each edge towards a node, and as the user interacts with a time-scrubbing mechanism (shown as a big paddle at the bottom of the rough sketch in Figure 2), the program loads the appropriate sets of data.
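The T-Step scheme described above could be sketched roughly as follows. This is one possible reading rather than a finished design: the `TStepWindow` class, the `load_step` callable (standing in for whatever reads a GraphML file or queries the database), the `inbound_volume` helper, and the sample keys are all hypothetical names introduced for this example.

```python
from collections import Counter


class TStepWindow:
    """Keeps the selected T-Step and its two neighbours in memory."""

    def __init__(self, steps, load_step):
        self.steps = list(steps)    # ordered T-Step keys, e.g. minute buckets
        self.load_step = load_step  # hypothetical callable: key -> list of edges
        self.cache = {}

    def select(self, index):
        """Eager-load steps index-1, index, index+1 and evict all others."""
        wanted = {
            self.steps[i]
            for i in (index - 1, index, index + 1)
            if 0 <= i < len(self.steps)
        }
        for key in wanted:
            if key not in self.cache:
                self.cache[key] = self.load_step(key)
        for key in list(self.cache):
            if key not in wanted:
                del self.cache[key]
        return self.cache[self.steps[index]]


def inbound_volume(edges):
    """Count edges pointing at each node, e.g. to scale how edges are drawn."""
    return Counter(end for _start, end in edges)


# Stand-in loader with made-up data; edges are (start_node, end_node) pairs.
loaded = []

def fake_loader(key):
    loaded.append(key)
    if key == "14:05":
        return [("alice", "bob"), ("carol", "bob")]
    return []

window = TStepWindow(["14:04", "14:05", "14:06", "14:07"], fake_loader)
current = window.select(1)   # loads 14:04, 14:05, 14:06
volume = inbound_volume(current)
window.select(2)             # loads only 14:07 and evicts 14:04
```

Because only three T-Steps are ever held at once, memory use stays roughly constant however long the full timeline is, and scrubbing one step forward or backward costs a single new load.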
If you’re interested in talking more about the project, feel free to contact me via email or phone, and we can get going. The data set is very fresh, highly organized, and really interesting from almost any perspective. Building a proper visualization scheme to complement the actual data will prove exceedingly fruitful.
References
Burns, Alex and Eltham, Ben (2009) Twitter Free Iran: an Evaluation of Twitter’s Role in Public Diplomacy and Information Operations in Iran’s 2009 Election Crisis. In: Communications Policy & Research Forum 2009, 19th-20th November 2009, University of Technology, Sydney.
Smythe, Elizabeth, and Peter J. Smith. “New Technologies and Networks of Resistance.” Cyber-Diplomacy: Managing Foreign Policy in the Twenty-First Century. (2002): 48-82. Print.
WTO History Project. 16 Jul 2003. University of Washington, Web. 18 Nov 2009.