140kit Experimentation
Added April 2011
I have been working feverishly on 140kit, both on my own personal time and on my work time. As a result, all sorts of goodies are making their way into the package that could be released back into the wild some day. One thing that may come out of the recent work, and is too cool not to show either way, is a quick visualization of some graph data I am generating from 140kit.
Brief overview:
With 140kit analytics, we have always wanted to show the power of network theory. As such, from the very beginning, we have supported GraphML translations of re-tweet networks. Recently Gephi, the patron saint of network theory, started supporting a neat feature in Gexf, another file format, so we implemented that as well. With Gexf files from 140kit’s “retweet_graph” analytic, you can actually use a timeline scrubber in Gephi to watch the communication unfold over time. It’s slick as all hell, and it’s going to be available soon.
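To make the timeline idea concrete, here is a minimal sketch of what a time-aware Gexf export could look like. It is not 140kit's actual code: the function name `build_dynamic_gexf`, the `(retweeter, original_author, unix_timestamp)` tuple shape, and the choice of `timeformat="double"` are all my assumptions, and the attribute names follow my reading of the GEXF 1.2 draft, so check them against the spec before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace of the GEXF 1.2 draft (assumed target version for this sketch).
GEXF_NS = "http://www.gexf.net/1.2draft"


def build_dynamic_gexf(retweets):
    """Build a dynamic, directed GEXF tree.

    retweets: iterable of (retweeter, original_author, unix_timestamp).
    This tuple shape is hypothetical, not 140kit's schema.
    """
    retweets = list(retweets)  # we iterate twice below
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {
        "mode": "dynamic",             # tells the reader that elements carry time
        "defaultedgetype": "directed",  # retweeter -> original author
        "timeformat": "double",        # timestamps as plain numbers (assumption)
    })
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    # A node "appears" the first time it takes part in any retweet.
    first_seen = {}
    for source, target, ts in retweets:
        for user in (source, target):
            first_seen[user] = min(ts, first_seen.get(user, ts))

    for user, ts in sorted(first_seen.items()):
        ET.SubElement(nodes_el, "node", {
            "id": user, "label": user, "start": str(float(ts)),
        })

    # Each retweet becomes an edge that starts at the time it was posted.
    for i, (source, target, ts) in enumerate(retweets):
        ET.SubElement(edges_el, "edge", {
            "id": str(i), "source": source, "target": target,
            "start": str(float(ts)),
        })
    return gexf


if __name__ == "__main__":
    sample = [("alice", "bob", 1301616000), ("carol", "bob", 1301619600)]
    print(ET.tostring(build_dynamic_gexf(sample), encoding="unicode"))
```

The point of the `start` values is that a timeline scrubber only has something to scrub if every node and edge says when it came into existence.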
But anyways, on to this picture I drew today. For some recent work, I have needed to take a user account, scrape all the tweets of that account’s followers, and use that as the dataset. In short, I have 190 followers, who in turn have posted at least 181,350 times. That’s how many tweets I have in my dataset, but it’s limited by the 3,200-tweet pagination limit we know about.
ANOTHER cool thing our updated re-tweet graph analytic does is pack in all sorts of fun metadata. statuses_count, followers_count, and friends_count are the easy numerics, so we threw them in already, but I imagine we could add more, like time zone (color the nodes by the time zone they live in, anyone?).
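As a rough illustration of the time zone idea, here is a small sketch that assigns one color per time zone. None of this exists in 140kit as far as this post goes: the `time_zone` key, the palette, and the `color_by_time_zone` helper are all made up for the example.

```python
from itertools import cycle

# Arbitrary palette for the sketch; it repeats if there are more zones than colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]
UNKNOWN_COLOR = "#999999"  # for users with no time zone on record


def color_by_time_zone(users):
    """Map screen_name -> hex color, one color per distinct time zone.

    users: dict of screen_name -> metadata dict with an optional
    "time_zone" key (a hypothetical field name).
    """
    zones = sorted({
        meta.get("time_zone") for meta in users.values() if meta.get("time_zone")
    })
    zone_color = dict(zip(zones, cycle(PALETTE)))
    return {
        name: zone_color.get(meta.get("time_zone"), UNKNOWN_COLOR)
        for name, meta in users.items()
    }
```

Sorting the zones before assigning colors keeps the mapping stable from one run to the next, which matters if you want two pictures to be comparable.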
So, I created a new process that, given a user name, collects tweets and user data from all of that user’s followers and stores the result as a new dataset, which can in turn have analytics run on it. In practice, with retweet_graphs, we’re going to see nearly all the re-tweets and mentions my followers have made, since most people have posted under 3,200 times in their account lifetime. It’s awesome. The nodes are colored by statuses count (the higher, the more times they have tweeted) and sized by in-degree (how many people in this picture re-tweeted or mentioned them). The nodes my followers interact with that aren’t following me are, unfortunately, left with the basic data, without the statuses_count, followers_count, and friends_count metadata packed in. That’s a scaling decision: otherwise we could easily start making wayyyy too many calls to the database. But, with some more work, this could be something everyone could do with 20 minutes in front of Gephi and a screen name to throw into 140kit. This is one of the many reasons we need to keep holding faith in doing Twitter mining. So, without any more explanation, here’s the initial picture.
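For anyone who wants to reproduce the sizing and coloring outside Gephi, here is a sketch of the two measures described above. It is an illustration under my own assumptions rather than what 140kit or Gephi does internally: the function names, the linear scaling, and the choice to count each retweeter only once per target are all mine.

```python
from collections import Counter


def in_degree(edges):
    """Count how many distinct people retweeted or mentioned each user.

    edges: iterable of (source, target) pairs, source being the person
    who did the retweeting or mentioning.
    """
    distinct_pairs = set(edges)  # repeat interactions count once (assumption)
    return Counter(target for _source, target in distinct_pairs)


def style_nodes(edges, metadata, min_size=5.0, max_size=40.0):
    """Return per-node size (from in-degree) and color intensity (from statuses_count).

    metadata: dict of screen_name -> {"statuses_count": int, ...}. Users who
    are missing from it (non-followers) get color_intensity None.
    """
    edges = list(edges)
    degree = in_degree(edges)
    nodes = {user for edge in edges for user in edge}
    top_degree = max(degree.values(), default=0)
    top_statuses = max(
        (meta["statuses_count"] for meta in metadata.values()), default=0
    )

    styled = {}
    for user in nodes:
        d = degree.get(user, 0)
        # Linear scale from min_size (no in-links) to max_size (most in-links).
        share = d / top_degree if top_degree else 0.0
        size = min_size + (max_size - min_size) * share
        meta = metadata.get(user)
        if meta is None:
            intensity = None  # basic node: no metadata was packed in
        else:
            intensity = meta["statuses_count"] / top_statuses if top_statuses else 0.0
        styled[user] = {"in_degree": d, "size": size, "color_intensity": intensity}
    return styled
```

Returning `None` for the color of non-followers mirrors the trade-off in the picture: those nodes still get a size, because in-degree comes from the edges alone, but they have no statuses count to color by.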
Scalable Vector Version in PDF