How To Use R to Analyse and Plot Your Twitter Use

Here is a little how-to if you want to use R to analyse Twitter.  This is the first of two posts: this one talks about the How, the second will talk about the Why.  

If you follow all the code you should be able to produce plots like this:

As with all analytics projects, it's split into four different aspects: 1. getting the data; 2. transformations; 3. analysing; 4. plotting.

All the code is available on my first public github project:

https://github.com/MarkEdmondson1234/r-twitter-api-ggplot2

I did this project to help answer a question: can I tell from my Twitter activity when I changed jobs or moved country?

I have the feeling that the more SEO I am doing, the more I rely on Twitter as an information source, whereas for Analytics it's more independent research that takes place on StackOverflow and Github. Hopefully this project can show whether that holds.

1. Getting the data

R makes getting tweets easy via the twitteR package.  You need to install that, register your app with Twitter, then authenticate to get access to the Twitter API.
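
As a rough sketch of those steps, the function below wraps the twitteR calls for authenticating and pulling a user's timeline into a dataframe. The name fetch_timeline() and its arguments are made up for illustration; the four credentials are the ones Twitter gives you when you register your app. Nothing is called here, so it only defines the function.

```r
# Hypothetical wrapper (not from the project): authenticate, then fetch
# a timeline and convert it to a dataframe.
# Requires install.packages("twitteR") before it can actually be called.
fetch_timeline <- function(user, consumer_key, consumer_secret,
                           access_token, access_secret, n = 3200) {
  # Register the credentials for this R session
  twitteR::setup_twitter_oauth(consumer_key, consumer_secret,
                               access_token, access_secret)
  # Pull up to n tweets, including retweets
  tweets <- twitteR::userTimeline(user, n = n, includeRts = TRUE)
  # Turn the list of status objects into one dataframe
  twitteR::twListToDF(tweets)
}
```

You would then call it with your own keys, e.g. fetch_timeline("yourname", key, secret, token, token_secret).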

An alternative to the API is Twitter's data export, which lets you go beyond the API's limit of 3,200 tweets. This gives you a CSV, which you can load into R using read.csv().
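
Loading the export might look something like this. The column names (tweet_id, timestamp, text), the timestamp format and the two rows are assumptions made up for the example, so check them against your own file. In practice you would pass the file path to read.csv() instead of the inline text.

```r
# Stand-in for the exported file; these rows are invented for illustration
csv_text <- 'tweet_id,timestamp,text
"1","2014-01-06 07:15:00 +0000","Reading about SEO"
"2","2014-01-06 21:40:00 +0000","Trying out ggplot2"'

# Keep the ids as text so large numbers are not rounded
tweets <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                   colClasses = c(tweet_id = "character"))

# Parse the timestamp strings into proper date-times (assumed format)
tweets$created <- as.POSIXct(tweets$timestamp,
                             format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
str(tweets)
```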

2. Transforming the data

For my purposes, I needed to read the timestamps of the tweets and put them into early, morning, afternoon and evening buckets, so I could then plot the data.  I also created a few aggregates of the data to suit what I needed to plot, and my function returns these dataframes in a list.

Again, as with most analytics projects, this section represents most of the work, with to and fro happening as I tweaked the data I wanted in the chart.  One tip I've picked up is to try and do these data transformations in a function that takes the raw data as input and outputs your processed data, as it makes it easier to repeat for different data inputs.
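
To illustrate the idea, here is a minimal sketch of such a function. It is not the project's code: the name bucket_day_parts(), the hour boundaries for each day part and the created column are all assumptions chosen for the example.

```r
# Hypothetical transformation: raw tweets in, list of dataframes out
bucket_day_parts <- function(tweets) {
  hour <- as.integer(format(tweets$created, "%H", tz = "UTC"))
  # Assumed boundaries: 0-5 early, 6-11 morning, 12-17 afternoon, 18-23 evening
  tweets$dayPart <- cut(hour,
                        breaks = c(-1, 5, 11, 17, 23),
                        labels = c("early", "morning", "afternoon", "evening"))
  tweets$yearMonth <- format(tweets$created, "%Y-%m", tz = "UTC")
  # Count tweets per month and day part
  monthly <- as.data.frame(table(yearMonth = tweets$yearMonth,
                                 dayPart = tweets$dayPart),
                           responseName = "tweets",
                           stringsAsFactors = FALSE)
  list(tweets = tweets, monthly = monthly)
}

# Made-up timestamps, one per day part
raw <- data.frame(
  created = as.POSIXct(c("2014-01-06 03:10:00", "2014-01-06 09:30:00",
                         "2014-02-10 14:00:00", "2014-02-11 20:45:00"),
                       tz = "UTC")
)
processed <- bucket_day_parts(raw)
processed$monthly
```

Because everything lives in one function, rerunning it on the API data or on the CSV export is a one-line change.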

3. Analysing the data

This will be covered in the second post, and is usually the point of the whole exercise - it only takes about 10% of the time on the project, but is the most important part.

4. Plotting the data

This part evolves as you go to and fro between steps 2 and 3, but what I ended up with were the functions below.

theme_mark() is a custom ggplot2 theme you can use if you want the plots to look exactly the same as above, or at the very least show how to customise ggplot2 to your own fonts/colours.  It also uses choosePalette() and installFonts(). "mrMustard" is my name for the colour scheme chosen.
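
If you just want the general shape of a custom theme, a minimal sketch could look like the following. This is not theme_mark(): the name theme_example(), the settings and the ColorBrewer palette standing in for a colour scheme are all assumptions for illustration.

```r
library(ggplot2)

# Hypothetical theme: start from a built-in theme and override a few elements
theme_example <- function(base_size = 12, base_family = "") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      plot.title = element_text(face = "bold"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

# Stand-in colour scheme, using a built-in ColorBrewer palette
example_fill <- scale_fill_brewer(palette = "YlOrBr")

# Try it on a built-in dataset
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar() +
  theme_example() +
  example_fill
```

Setting base_family to a font installed on your system is where your own fonts would come in.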

I use two layers in the plot - one is the area plot to show the total time spent per Day Part, the second is a smoother line to help pick out the trend better for each Day Part.
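
The two-layer idea can be sketched as below, using randomly generated counts rather than real tweets. The column names and the choice to facet by day part are assumptions for the example; the project's own plotting functions may be arranged differently.

```r
library(ggplot2)

# Made-up monthly counts for each day part
set.seed(1)
months <- seq(as.Date("2013-01-01"), by = "month", length.out = 24)
parts <- c("early", "morning", "afternoon", "evening")
demo <- data.frame(
  month = rep(months, times = length(parts)),
  dayPart = rep(parts, each = length(months))
)
demo$tweets <- rpois(nrow(demo), lambda = 20)

p <- ggplot(demo, aes(x = month, y = tweets)) +
  # Layer 1: area showing the totals per day part
  geom_area(aes(fill = dayPart), alpha = 0.6) +
  # Layer 2: smoother to pick out the trend
  geom_smooth(aes(colour = dayPart), method = "loess",
              formula = y ~ x, se = FALSE) +
  facet_wrap(~ dayPart) +
  labs(x = NULL, y = "Tweets per month")
```

Faceting keeps each smoother lined up with its own area; with a single stacked area chart the trend lines would not sit on top of the areas they describe.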

plotTweetsDP() takes the tweetTD (weekly) or tweetTDm (monthly) dataframe as input, and plots the daypart data produced by the transformations above.  The timeAxis parameter expects "yw" (yearWeek) or "ym" (yearMonth), which it uses to make the x-axis better suited to each.

plotLinksTweets() is the same, but works on the tweetLinks dataframe.


I hope this is of some use to someone, let me know in the comments!  Any ideas on where to go from here are also welcome - at the moment I'm working through some text mining packages to try and get something useful out of the tweet text.

Again the full project code is available on Github here: https://github.com/MarkEdmondson1234/r-twitter-api-ggplot2