How I made GA Effect - creating an online statistics dashboard using R

GA Effect is a webapp that uses Bayesian structural time-series to judge if events happening in your Google Analytics account are statistically significant.  It's been well received on Twitter, and how to use it is detailed in this guest post on Online Behaviour, but this post is about how to build your own or something similar.

Update 18th March: I've made a package that holds a lot of the functions below, shinyga.  That may be easiest to work with.

What R can do

Now is a golden time for the R community, as it gains popularity outside of its traditional academic background and hits business.  Microsoft has recently bought Revolution Analytics, a provider of enterprise R solutions, so we can expect a lot more integration with Microsoft products soon, such as the machine learning in their Azure platform.

Meanwhile RStudio are releasing more and more packages that make it quicker and easier to create interactive graphics, with tools for connecting and reshaping data and then plotting using attractive JavaScript visualisation libraries or native interactive R plots.  GA Effect is also being hosted using ShinyApps.io, an R server solution that enables you to publish straight from your console, or you can run your own server using Shiny Server.  

Packages Used

For the GA Effect app, the key components were these R packages:

Putting them together

Web Interaction

First off, using RStudio makes this all a lot easier, as it integrates closely with the other RStudio products used here.

shinydashboard is a custom theme for the more general Shiny.  As detailed in the getting started guide, creating a blank webpage dashboard with shinydashboard takes 8 lines of R code.  You can test or run everything locally first before publishing to the web via the “Publish” button at the top.
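
As a rough illustration of how little code a blank dashboard needs, here is a minimal sketch along the lines of the shinydashboard getting started guide.  The title text is a placeholder of mine, not taken from the GA Effect source.

```r
library(shiny)
library(shinydashboard)

# Three empty building blocks: header, sidebar and body
ui <- dashboardPage(
  dashboardHeader(title = "Blank dashboard"),  # placeholder title
  dashboardSidebar(),
  dashboardBody()
)

# No server logic is needed for a blank page
server <- function(input, output) { }

# shinyApp(ui, server)  # uncomment to run locally
```

Running the commented-out last line in RStudio should open the empty dashboard in a local browser window.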

Probably the most difficult concept to get your head around is reactive programming in a Shiny app.  This is how the interaction occurs: it sets up live relationships between inputs from your UI script (always called ui.R) and outputs from your server-side script (called server.R).  These are effectively your front-end and back-end in a traditional web environment.  The Shiny package takes your R code and turns it into HTML5 and JavaScript. You can also import JavaScript of your own if you need it to cover what Shiny can’t.

The Shiny code then creates the UI for the app, and creates reactive versions of the datatables needed for the plots.
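
To make the reactive idea concrete, here is a small sketch of an input in the UI driving a reactive data table on the server.  The data frame, input ID and output ID are made up for illustration and are not the ones used in GA Effect.

```r
library(shiny)

# Illustrative data only; a real app would hold fetched analytics data here
example_data <- data.frame(
  date     = as.character(as.Date("2015-02-01") + 0:4),
  sessions = c(120, 135, 128, 160, 155),
  users    = c(90, 101, 97, 118, 110)
)

# ui.R part: declares the input control and the output placeholder
ui <- fluidPage(
  selectInput("metric", "Metric", choices = c("sessions", "users")),
  tableOutput("metric_table")
)

# server.R part: links input$metric to output$metric_table
server <- function(input, output, session) {
  # Reactive data table: recalculated whenever input$metric changes
  metric_data <- reactive({
    example_data[, c("date", input$metric)]
  })

  # Anything that calls metric_data() is refreshed automatically
  output$metric_table <- renderTable(metric_data())
}

# shinyApp(ui, server)  # uncomment to run locally
```

Changing the dropdown re-runs only the code that depends on input$metric, which is the behaviour the rest of the app builds on.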

Google Authentication

The Google authentication flow uses OAuth2 and could be used for any Google API in the console, such as BigQuery, Gmail, Google Drive etc.  I include the code used for the authentication dance below so you can use it in your own apps:
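
The original code is not reproduced here, so the following is only a sketch of the first step of a generic OAuth2 flow: building the URL that sends the user to Google's consent screen.  The function name, the client ID and the redirect URI are placeholders of mine; the endpoint and the read-only Analytics scope are Google's published ones.

```r
# Build the Google OAuth2 consent URL (step one of the "dance").
# google_auth_url() is a hypothetical helper, not part of any package.
google_auth_url <- function(client_id, redirect_uri, state,
                            scope = "https://www.googleapis.com/auth/analytics.readonly") {
  params <- c(
    client_id     = client_id,
    redirect_uri  = redirect_uri,
    response_type = "code",   # ask for an authorisation code
    scope         = scope,
    state         = state     # echoed back by Google so the app can check it
  )
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0("https://accounts.google.com/o/oauth2/v2/auth?",
         paste0(names(params), "=", encoded, collapse = "&"))
}

# A random code per session; sample() is not cryptographically secure,
# so treat this as illustrative only
state <- paste(sample(c(letters, 0:9), 20, replace = TRUE), collapse = "")

url <- google_auth_url(
  client_id    = "YOUR_CLIENT_ID",         # placeholder
  redirect_uri = "http://localhost:1221",  # placeholder
  state        = state
)
```

After the user approves access, Google redirects back with a code and the same state value; the app would compare the state and then exchange the code for a token at Google's token endpoint, which is not shown here.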

Fetching Google Analytics Data

Once a user has authenticated with Google, the user token is then passed to rga() to fetch the GA data, according to which metric and segment the user has selected. 

This is done reactively, so each time you update the options a new data fetch to the API is made.  Shiny apps are on a per user basis and work in RAM, so the data is forgotten once the app closes down.
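
The reactive fetch can be sketched roughly as below.  fetch_ga_data() is a hypothetical stand-in that returns dummy rows, since the real API call is not shown in this post, and the input and output IDs are assumptions as well.

```r
library(shiny)

# Hypothetical stand-in for the real Google Analytics API call
fetch_ga_data <- function(token, metric, segment) {
  dates <- seq(Sys.Date() - 6, Sys.Date(), by = "day")
  data.frame(date = dates, value = seq_along(dates),
             metric = metric, segment = segment)
}

server <- function(input, output, session) {
  # Would be set once the OAuth2 flow has completed
  token <- reactiveVal(NULL)

  # Re-runs whenever the token, metric or segment changes
  ga_data <- reactive({
    req(token())  # do nothing until the user has authenticated
    fetch_ga_data(token(), input$metric, input$segment)
  })

  output$n_rows <- renderText(nrow(ga_data()))
}
```

Because ga_data() lives inside the server function, each user session gets its own copy in memory, and it disappears when that session ends.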

Doing the Statistics

You can now manipulate the data however you wish.  I put it through the CausalImpact package as that was the application goal, but there is a wealth of other R packages you could use instead, covering machine learning, text analysis and all the other statistical methods available in the R universe.  It really is only limited by your imagination.

Here is a link to the CausalImpact paper, if you really want to get in-depth with the methods used.  It includes some nice examples of predicting the impact of search campaign clicks.

Here is how CausalImpact was implemented as a function in GA Effect:
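
The original function is not reproduced here, so this is a sketch of one way to wrap CausalImpact for a single daily metric.  The function name, its arguments and the simulated data are my own assumptions; the weekly seasonality setting is a choice I made for daily web traffic, not something confirmed by the GA Effect source.

```r
library(CausalImpact)
library(zoo)

# Hypothetical wrapper: one daily series, split at the event date
run_impact <- function(dates, values, event_date) {
  series <- zoo(cbind(y = values), dates)

  # Assumes daily data, so the pre-period ends the day before the event
  pre.period  <- c(min(dates), event_date - 1)
  post.period <- c(event_date, max(dates))

  CausalImpact(series, pre.period, post.period,
               model.args = list(nseasons = 7))  # weekly pattern
}

# Simulated data: 90 days with a lift of 150 from day 61 onwards
set.seed(42)
dates  <- seq(as.Date("2015-01-01"), by = "day", length.out = 90)
values <- 1000 + 50 * sin(2 * pi * seq_along(dates) / 7) + rnorm(90, sd = 20)
values[61:90] <- values[61:90] + 150

impact <- run_impact(dates, values, event_date = dates[61])
summary(impact)
```

The returned object also carries the fitted series in impact$series, which is the kind of table you would hand on to a plotting function.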

Plotting

dygraphs is an R package that takes R input and outputs the JavaScript needed to display it in your browser, and as it's made by RStudio they also made it compatible with Shiny.  It is an application of HTMLwidgets, which lets you take any JavaScript library and make it compatible with R code.  Here is an example of how the main result graph was generated:
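
The original plotting code is not reproduced here, so the following is a sketch of a comparable chart: an observed line, a predicted line with a shaded interval, and a marker for the event.  The column names and the dummy numbers are assumptions for illustration.

```r
library(dygraphs)
library(xts)

# Dummy data standing in for the observed and predicted series
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 30)
fit   <- 100 + seq_along(dates)
plot_data <- xts(
  cbind(lower = fit - 10, predicted = fit, upper = fit + 10, observed = fit + 5),
  order.by = dates
)

graph <- dygraph(plot_data, main = "Observed vs predicted")
# Three columns (lower, value, upper) draw a line with a shaded band
graph <- dySeries(graph, c("lower", "predicted", "upper"), label = "Predicted")
graph <- dySeries(graph, "observed", label = "Observed")
# Vertical marker on the event date
graph <- dyEvent(graph, dates[20], "Event", labelLoc = "bottom")

graph  # prints the interactive chart in the RStudio viewer
```

In a Shiny app the same object would be returned from renderDygraph() on the server and placed in the page with dygraphOutput() in the UI.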

Publishing

I’ve been testing the alpha of shinyapps.io for a year now, but it is only this month (Feb 2015) that it is coming out of beta.  If you have an account, then publishing your app is as simple as pushing the “Publish” button above your script, after which it appears at a public URL.  With other paid plans, you can limit access to authenticated users only.

Next steps

This app only took me 3 days with my baby daughter on my lap during a sick weekend, so I’m sure you can come up with something similar given time and experience.  The components are all there now to make some seriously great apps for analytics.  If you make something, do please let me know!

35 responses
I see the future... Thanks a lot for this post. Amazing work, hope you keep sharing
This is so cool, Thanks for the post and the app.
First, thanks for sharing Mark, it is exactly what I was looking for. I've just tried the webapp with one of my Google analytics view but it doesn't seem to work. I can't select a segment which causes the following error: "undefined columns selected". Do you have any idea why ?
Dear Issam, ahh yes, I have come across this. Unfortunately the JSON parser for the API doesn't cope well with funny characters in some cases - do you have something like a "\" or a "&" in your segment name? The quickest way to get around it is to rename your segment. Let me know if this works. Yours sincerely, Mark
Well, it's highly probable since I'm French and god knows we have a lot of strange characters ^^ Anyway, I've now tried with a basic newly created website, paying attention to every character on GA, but I still have the same issue. Here is a link for the screenshot of my error : http://i60.tinypic.com/20rqof9.png
Dear Issam, as segments are tied to your user and not the website, you are probably still trying to load a segment with a "dodgy" name. Are you able to find a segment where this may be the case? If you edit the segment name in the GA interface then reload the app, it should then work. Sorry this is a hack solution, but until the JSON parser is better it's the quickest way to get you up and running.
I'm trying to recreate the problem but nothing works, so if you do find the offending name let me know!
To those who may be interested, I've released a package that offers a quick start to make apps like this: https://github.com/MarkEdmondson1234/shinyga
Hi Mark, this looks very cool. I'm having the same issue as Issam above, in that I get the "undefined columns selected" error message - despite the fact that I have no custom segments. This means I am only using the standard pre-packaged GA segments and I am still getting this error message.
Dear Ben (and Issam if you see this), thanks for commenting. I have just pushed up a patch that defaults to a standard set of GA segments if it can't find yours, and added some error logging to try and catch this pesky bug. Trouble is I can't replicate it, so it's a bit mysterious. If you could try it in about 10 mins from this post and let me know if it works, that would be very helpful :) Yours sincerely, Mark
Hi Mark, Apologies about the delay. I tried using it again but I'm not sure the patch worked - I still got the same 'undefined columns selected' in the segments section, rather than it defaulting to the standard segments. I appreciate your work trying to get this working, it looks like a very useful app.
Dear Ben, curious bug! Does the GA Rollup app work for you? Http://mark.shinyapps.io/ga-rollup/
Hi Ben and Issam, I think I got it, let me know. I managed to log the error and could see the fix - pretty sure it is squished now but only you can tell me for sure :)
Hi Mark (and Ben), sorry for the very late response. I tried again this morning but I still get the same issue from the "select segment" panel. Please Ben, confirm if you have the time to try. I am not sure how you extract your data from GA, but if you get a JSON file somewhere, try to validate one sample with tools like this one : http://jsonformatter.curiousconcept.com/ Keep going Mark, I'm sure it's worth it :)
Dammit ;) Ok, thanks, will take another look.
Hi Ben and Issam, I'm 99% sure I have it now as I found an old Google account I had with the same problem. It seems Google Analytics changed the format to include "created" and "updated" at some point, and my script was looking for those columns - even though I don't use them in the app itself. I removed them and it worked for my account, give it a go with yours.
Hi Mark, I just checked again - it looks like you've got it! Well done, it's all working perfectly for me now. Thanks very much. One final request - we're looking at using this with some confidential client data. I know you stated that the data is forgotten once the app closes down, but is there some documentation you can point me to describing the security/authorisation that this app uses to access GA data? Thanks again for your great work on getting this working, this app looks very useful!
Great Ben, glad the bug got quashed. I updated shinyga() with the fix too. I know the issues with client data privacy from my agency work, so I can give you some information; maybe it will be enough. The app itself runs on shinyapps.io, which uses SSL encryption, so a man-in-the-middle attack on the data is unlikely. Check out their website for details though. It also uses best practice OAuth2 for authentication, including a random code per user so others can't spoof your account. The mechanics of that are in the shinyga() package code, and Google's documentation on OAuth2 is here: https://developers.google.com/accounts/docs/OAuth2 You can also see the code for data fetches to the GA API in the shinyga() code. The authentication will drop a cookie so you can log in easily next time, but you can always revoke access by visiting https://security.google.com/settings/security/p... with the authorised account. No password details are kept, and no off-line access is possible. All that being said, the app could in theory write logs of your data, but it doesn't (hence making the bug hunting a bit more difficult!) You'll just have to take my word on that, but you know where I live on the web so hopefully that carries some kudos. But if that still isn't enough data protection (and for some of my agency clients it wouldn't be) then you could consider buying your own server and hosting the app in a private environment. That is something I am thinking of offering given the response to GA Effect: a premium version where you can get unsampled data, more predictors, support, security etc. Give me a few months and there may be some more details about that here soon. :)
Hi Mark, Thanks for taking the time to provide that info - reading that extra doco certainly satisfies my initial concerns. As you say, some clients data is more confidential than others, so I would also be interested in checking out a privately hosted version of the app if you offer that in the future. Thanks again for your work getting this working, and congrats on building a great app
Hi Mark, Sorry, just one more question on the app. In the CausalImpact documentation for R, it states that the model assumes you have a predictor variable, which is "a set of control time series that were themselves not affected by the intervention". I was just wondering how this operates in your app, as from the interface it appears we are only dealing with a response variable (sessions for example). Is there some underlying predictor variable, or am I missing something? Thanks,
Hi Ben, no worries, good question. This app uses the simplest implementation of CausalImpact, where it has just one predictor and one response, which are the same variable (this is the first example in the CausalImpact documentation). You can add more predictors, which is a lot more powerful, since the statistical confidence gets more robust. For example, brand search from Google Trends for SEO sessions, or PPC budget for paid search traffic. In the CausalImpact documentation they give the example of multi-regional paid traffic as the predictors, to judge the response on a test market. This is how I use it locally. I'm thinking about how to upgrade GA Effect to allow this while keeping the interface simple enough. One suggestion has been to connect to a GoogleDoc.
Thanks Mark, I'm trying to get a json file using a custom Google Api (not the Google analytics) . And then build my dataset out of that file. Would you help me find where to give api root and function name and parameters etc? to get the json file? Thanks,
Hi Nasim, thanks for dropping by. I'm not sure which Google API you are using, as there are several, above and beyond GA. Do you have any more details? It may be best to ask the question on Stackoverflow then I can answer there if I can, or others who know can help. Post the link here when you do.
is it possible to combine all the file auth, ui and server in app.R file or will this mess up anything? Thanks for sharing...
Hi Haytham, yes you could do that, I only separate them out for clarity. Also you may be interested in my shinyga() package that has been made since, that takes care of a few of the details.
Ya, I am looking at the package right now, you are awesome... I am fairly new to R but shiny is a big win.. thanks for sharing again
Is it possible in the future to make a video tutorial that walks people new to R through the installation process?
That's a good idea, although if you are new to Shiny then it's worth going through their tutorial first, to get used to what reactive programming is doing: http://shiny.rstudio.com/tutorial/
Hi Mark, thanks for your awesome blog! I'm still a newbie in building ShinyApps. But after reading your current post I'm still failing to understand the part about ui.R. In the actual app there is a sidebar with menuItem. How did you manage to create it without explicitly defining it in the upper code? Thank you! Eugen
Hi Eugen, thanks for your comment :) For the menuItem and general sidebar it comes with the package shinydashboard that you can read about here: https://rstudio.github.io/shinydashboard/