Resume

Interests

Areas of research: social network topography, online activism, information and communications technologies, international relations, political economy, telemetry

Practical Experience: data scraping, distributed computing, semantic web technologies, SEO, content management, database management, data interpreters

Experience

Intern, Berkman Center for Internet and Society, Harvard Law School, Cambridge, MA — 2010

I was tasked with two things at the Berkman Center: building a relatively straightforward Rails app for use as an education tool (teaching kids about fair use and copyright), and collecting Twitter data. I can’t talk much about the actual data collected, as the paper is still in progress, but I can say this: we were able to grab the bulk of tweets from a specific country, over the entire existence of Twitter, and run analytics on this data set, all without any special access from Twitter. We were able to play disasters in reverse and see cool correlations in data that, as far as I’ve seen, has never had a parallel. Hopefully they find the right place to publish this information someday.

Managing Director, Web Ecology Project, Cambridge, MA — 2010

When he left Cambridge for California, Tim Hwang put Sam Gilbert and me in charge of Web Ecology in his stead. Currently, the Web Ecology Project works on a few fronts: generating interesting research papers not necessarily under the jurisdiction of traditional academics; creating useful programs, applications, and frameworks for better research into online phenomena; and meeting face to face for rapid prototyping of projects at our quarterly meetings, traditionally in either Boston or New York.

Visiting Researcher, Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY — 2010

Created demonstrations showcasing semantic web technologies, predominantly using US Government data sets available at http://data.gov. Created visualization templates in Processing and Flex, and adapted templates using the Google Visualization API. Additionally, created tutorials and walkthroughs showing how any user could adapt SPARQL queries and templates to create new visualizations, as part of the outreach component of the work done at TWC.


Intern, Rocketboom Inc; New York, NY — 2009

Responsible for development on http://KnowYourMeme.com and for developing a screen-scraping and data-collection library for http://mag.ma/. Collaborated with Jamie Wilkinson and others on developing an appropriate back end for http://mag.ma, an online video aggregator site that collects information about online videos and then ostensibly places them in a chart system to determine popularity.


Interactive Developer, Instrument Marketing; Portland, OR — 2007-2010

Duties at Instrument Marketing covered a wide range: content management, development of CSV parsers for proprietary CMS software, and implementation of various reservation systems. Daily work included updating, templating, and managing various client websites (viewable at http://www.weareinstrument.com) as well as creating in-house tools for the CMS.

Education

Bennington College, Bennington, VT — BA, 2010

Thesis:

Title: “#iranelection: Are the dynamics and structure of Web 2.0 shifting global policy in 140 characters or less?”


Abstract: Using a data set of 766,263 tweets captured between June 12th, 2009, and October 25th, 2009, matching the Twitter category (or “hashtag”) of #iranElection, what new insights can be gained? In previous studies surrounding online activism, qualitative, systemic, or at-a-distance analysis tends to predominate. In a Web 2.0 environment, and with new programming frameworks that stress rapid prototyping, how can we gain novel insights into the nature of online activism, citizen journalism, and the role of the internet in our society and politics at large?

Publications

(2010) #iranElection: quantifying online activism. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC: US. (In Press)

(2010) ChatRoulette: An Initial Survey. Web Ecology Project, March 1st, 2010.

Projects

TwitterGrab – A distributed computing network for recording live Twitter Search API data. A user inputs a term and a length of time to record data for that term, and is then sent an e-mail with a link to a zip file containing the raw data set of Users and Tweets, as well as some preliminary analysis that gives a general topography of the data set. Because of the distributed approach, the program can quickly scale up to large scrape requests, and can theoretically be used to quantitatively measure Twitter’s social topography. This program was developed for the Web Ecology Project, a group of programmers, researchers, and other collaborators loosely affiliated with Harvard University’s Berkman Center. Currently, the WEP is allocating permanent server space at Berkman in order to allow researchers to use the tool.
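As a rough illustration of how one recording request might be spread across several machines, here is a minimal Python sketch. It is not TwitterGrab's actual design, which is not described here: the function name plan_slices, the worker names, the 15-minute slice length, and the round-robin assignment are all assumptions made for this example, and no Twitter API calls are involved.

```python
from datetime import datetime, timedelta


def plan_slices(term, start, duration, workers,
                slice_length=timedelta(minutes=15)):
    """Split one recording request into time slices and assign them
    to workers in round-robin order (hypothetical scheme)."""
    if not workers:
        raise ValueError("at least one worker is required")
    if slice_length <= timedelta(0):
        raise ValueError("slice_length must be positive")

    end = start + duration
    plan = []
    slice_start = start
    index = 0
    while slice_start < end:
        # The final slice is shortened so the plan never runs past `end`.
        slice_end = min(slice_start + slice_length, end)
        plan.append({
            "term": term,
            "worker": workers[index % len(workers)],
            "start": slice_start,
            "end": slice_end,
        })
        slice_start = slice_end
        index += 1
    return plan


if __name__ == "__main__":
    jobs = plan_slices("#iranelection", datetime(2010, 1, 1),
                       timedelta(minutes=40), ["worker-a", "worker-b"])
    for job in jobs:
        print(job["worker"], job["start"], job["end"])
```

Adding more names to the workers list spreads the same request over more machines without changing the request itself, which is the scaling property the description above refers to.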

GPS Balloon – A physical computing project consisting of an Arduino project board that controlled a cell phone, took readings from a GPS sensor, read temperature sensor data, and committed all of this to SD card memory. A camera was attached to the bottom of the project (and was controlled by the Arduino) so that aerial images could be retrieved upon collection. The cell phone would then send a message to Twitter to transmit coordinates so that the project could be retrieved and the SD card read. The total cost for this project was under $200, and once retrieved, the project could be redeployed immediately.

Wikipedia Network Maps – A program that scrapes data from Wikipedia in order to create visual maps of linkages between articles. Essentially, the program seeds a random article, then recurses through that article’s entire connected component. When it runs out of nodes, it seeds another random article, and continues this process until it has mapped out an entire Wikipedia network. It was never tested for scalability to large Wikipedia networks such as the English, Japanese, French, German, or Spanish subdomains, but it does accurately plot Wikipedias with fewer than 5,000 articles (n < 5,000). In general, the idea was to compare 14 small Wikipedia networks to draw conclusions on how users organize data online.
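The seed-and-traverse loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original program: the name map_network is hypothetical, the link graph is a small in-memory dictionary rather than scraped Wikipedia data, links are treated as undirected, and an explicit queue replaces recursion.

```python
import random
from collections import deque


def map_network(links, rng=None):
    """Return the connected components of a link graph as sets of titles.

    `links` maps an article title to the titles it links to. Treating
    links as undirected is an assumption of this sketch.
    """
    rng = rng or random.Random()

    # Build an undirected adjacency map so a component can be reached
    # from whichever of its members happens to be chosen as the seed.
    neighbours = {title: set() for title in links}
    for source, targets in links.items():
        for target in targets:
            neighbours.setdefault(target, set())
            neighbours[source].add(target)
            neighbours[target].add(source)

    unvisited = set(neighbours)
    components = []
    while unvisited:
        # Seed a random article that has not been mapped yet.
        seed = rng.choice(sorted(unvisited))
        unvisited.discard(seed)
        component = {seed}
        queue = deque([seed])
        # Walk the seed's whole component; the explicit queue avoids
        # hitting the recursion limit on large components.
        while queue:
            article = queue.popleft()
            for linked in neighbours[article]:
                if linked in unvisited:
                    unvisited.discard(linked)
                    component.add(linked)
                    queue.append(linked)
        components.append(component)
    return components


if __name__ == "__main__":
    toy_links = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": [], "F": []}
    for component in map_network(toy_links, random.Random(0)):
        print(sorted(component))
```

The order in which components are found depends on the random seeds, but the set of components does not, so runs over the same link data are comparable.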

Skills

Programming Languages:

Fluent: Ruby

Proficient: C, Java, JavaScript

Conversational: ActionScript, Assembly, C#, Objective-C, Python

Technical knowledge:

Protocols: TCP/IP (can build implementations), SMTP (general work)

Markup/Object Notation/Templating languages: HTML, XML, JSON, CSS, Textile, RDF

Query Languages: SPARQL, SQL

Understanding of RPC systems, Hadoop, Amazon EC2 and S3, and the MapReduce algorithm

Media:

New Scientist: Exploring the Network without the Guesswork, 10 May 2010, issue 2759
