Automating Google Console search analytics data downloads with R and searchConsoleR

Yesterday I published version 0.1 of searchConsoleR, a package that interacts with Google Search Console (formerly Google Webmaster Tools) and in particular its search analytics.

I'm excited about the possibilities of this package, as this new, improved data can now be combined with the thousands of other R packages.

If you'd like to see searchConsoleR's capabilities, I have the package running in an interactive demo here (very bare bones, but it should demo the data well enough).

The first application I'll talk about in this post is archiving data into a .csv file, but expect more guides to come, in particular combining this data with Google Analytics.

Automatic search analytics data downloads

The 90-day limit still applies to the search analytics data, so one of the first applications should be archiving that data, enabling year-on-year and month-on-month comparisons and tracking the general development of your SEO rankings.

The R script below:

  1. Downloads and installs the searchConsoleR package if it isn't installed already.
  2. Lets you set the parameters you want to download.
  3. Downloads the data via the search_analytics function.
  4. Writes it to a .csv file in the same folder the script is run from.
  5. The .csv file can then be opened in Excel or similar.
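
A minimal sketch of such a script is below; the website URL, dimensions and date offsets are placeholders to adjust to your own site:

    ## 1. Install and load searchConsoleR if it isn't installed already
    if (!require(searchConsoleR)) {
      install.packages("searchConsoleR")
      library(searchConsoleR)
    }

    ## 2. Parameters for the download - replace with your own website
    website    <- "http://www.example.com"
    start_date <- Sys.Date() - 3   # search analytics data lags a few days
    end_date   <- Sys.Date() - 3
    download_dimensions <- c("date", "query")

    ## Authenticate - interactive in the browser the first time only
    scr_auth()

    ## 3. Download the data via search_analytics
    data <- search_analytics(siteURL    = website,
                             startDate  = start_date,
                             endDate    = end_date,
                             dimensions = download_dimensions)

    ## 4. Write it to a dated .csv in the folder the script runs in
    filename <- paste0("search_analytics_", Sys.Date(), ".csv")
    write.csv(data, filename, row.names = FALSE)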

This should give you nice juicy data.

Considerations

The first time, you will need to run scr_auth() yourself so you can grant the package access; afterwards, it will auto-refresh the authentication each time you run the script.

If you ever need to authenticate a new user, run scr_auth(new_user = TRUE).

You may want to modify the script so it appends to one file rather than creating a daily dump; I do the latter, keeping a folder of .csv files and importing them all into one R dataframe (which you could export again as one big .csv).
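
As a sketch, combining such a folder of daily .csv files into one data frame (assuming a hypothetical ./csv_archive folder where each file has identical columns):

    ## Read every .csv in the archive folder and bind into one data frame
    files    <- list.files("./csv_archive", pattern = "\\.csv$", full.names = TRUE)
    all_data <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))

    ## ...which you could export again as one big .csv
    write.csv(all_data, "all_search_analytics.csv", row.names = FALSE)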

Automation

You can now take the download script and use it in an automated batch job that runs daily.

In Windows, this can be done like this (adapted from Stack Overflow):

  • Open the scheduler: START -> All Programs -> Accessories -> System Tools -> Scheduler
  • Create a new task
  • Under the Actions tab, create a new action
  • Choose "Start a program"
  • Browse to Rscript.exe, which should be somewhere like:
    "C:\Program Files\R\R-3.2.0\bin\x64\Rscript.exe"
  • Input the name of your script file in the parameters field
  • Input the path where the script is found in the "Start in" field
  • Go to the Triggers tab
  • Create a new trigger
  • Choose whether the task should run each day, month, ..., repeated several times, or whatever you like

In Linux, you can probably work it out yourself :)

Conclusion

Hopefully this shows how, with a few lines of R, you can get access to this data set. I'll be doing more posts in the future using this package, so if you have any feedback, let me know and I may be able to post about it. If you find any bugs or have features you would like, please report an issue on the searchConsoleR issues page on GitHub.

Finding the ROI of Title tag changes using Google's CausalImpact R package

After a conversation on Twitter about this new package, and mentioning it in my recent MeasureCamp presentation, here is a quick demo of Google's CausalImpact applied to an SEO campaign.

CausalImpact is a package that puts some statistics behind changes you may have made in a marketing campaign. It examines the time series of data before and after an event, and gives you an idea of whether any changes were just down to random variation or whether the event actually made a difference.

You can now test this yourself in my Shiny app, which automatically pulls in your Google Analytics data so that you can apply CausalImpact to it. This way you can A/B test changes for all your marketing channels, not just SEO. However, if you want to try it manually yourself, keep reading.

Considerations before getting the data

Suffice to say, it should only be applied to time-series data (i.e. where there is a date or time on the x-axis), and it helps if the event rolled out at a single time point. This may influence your choice of time unit: if, say, the change rolled out over a week, it's probably better to use weekly data exports. Also consider the time period you choose: the package uses the time series before the event to construct what it thinks should have happened versus what actually happened, so anything unusual, such as spikes in the test period, may affect your results.

Metrics-wise, the example here uses visits. You could perhaps do it with conversions or revenue, but then you may be affected by factors outside of your control (the buy button breaking, etc.), so for clean results try to take out as many confounding variables as possible.

Example with SEO Titles

In my case, some title tag changes went live on one day, so I could compare the SEO traffic before and after to judge whether they had any effect and, more importantly, how much extra traffic they brought in.

I pulled in data with my go-to GA R import library, rga by Skardhamar.

Setup

First comes the setup: importing the libraries (installing them if you haven't got them) and authenticating against the GA account you want to pull data from.
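
A sketch of that setup; rga installs from GitHub as it is not on CRAN, and at the time of writing CausalImpact did too:

    ## Import the libraries, installing them if you haven't got them
    if (!require(rga)) {
      install.packages("devtools")
      devtools::install_github("skardhamar/rga")
      library(rga)
    }
    if (!require(CausalImpact)) {
      devtools::install_github("google/CausalImpact")
      library(CausalImpact)
    }

    ## Authenticate with Google Analytics - creates the "ga" object used below
    rga.open(instance = "ga")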

Import GA data

I then pull in the data for the time period covering the event: SEO visits by date.
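
A sketch of the import, where the view ID and date range are hypothetical placeholders and the filter restricts the data to organic (SEO) traffic:

    ## Hypothetical GA view ID - replace with your own
    id <- "12345678"

    ## SEO visits by date, covering before and after the event
    gadata <- ga$getData(id,
                         start.date = "2014-01-01",
                         end.date   = "2014-09-01",
                         metrics    = "ga:sessions",
                         dimensions = "ga:date",
                         filters    = "ga:medium==organic",
                         max        = 5000)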

Apply CausalImpact

In this example, the title tags got updated on the 200th day of the time period I pulled. I want to examine what happened over the next 44 days.
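
A sketch of the call, assuming the gadata data frame from the previous step:

    ## Day 200 is the title tag update; days 201-244 are the post period
    pre.period  <- c(1, 200)
    post.period <- c(201, 244)

    ## CausalImpact accepts a plain numeric vector as the response series
    impact <- CausalImpact(gadata$sessions, pre.period, post.period)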

Plot the Results

With the plot() function you can then visualise the results.
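
A minimal call, assuming the impact object created above:

    ## Three panels: original series, pointwise effect, cumulative effect
    plot(impact)

In the resulting chart: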

  1. The left vertical dotted line is where the estimate of what should have happened is calculated from.
  2. The right vertical dotted line is the event itself (the SEO title tag update).
  3. The top panel shows the original data you pulled.
  4. The middle panel shows the estimated impact of the event per day.
  5. The bottom panel shows the estimated cumulative impact of the event.

In this example, it can be seen that after 44 days there were an estimated 90,000 extra SEO visits from the title tag changes. This can then be used to work out the ROI over time for that change.

Report the results

The $report part of the output gives you a nice overview of the statistics in verbose form, to help qualify your results.
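
One way to print it, assuming the impact object from above:

    ## Verbose natural-language summary of the analysis
    summary(impact, "report")

Here is a sample output: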

"During the post-intervention period, the response variable had an average value of approx. 94. By contrast, in the absence of an intervention, we would have expected an average response of 74. The 95% interval of this counterfactual prediction is [67, 81]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 20 with a 95% interval of [14, 27]. For a discussion of the significance of this effect, see below.

Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 4.16K. By contrast, had the intervention not taken place, we would have expected a sum of 3.27K. The 95% interval of this prediction is [2.96K, 3.56K].

The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +27%. The 95% interval of this percentage is [+18%, +37%].

This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (20) to the original goal of the underlying intervention.

The probability of obtaining this effect by chance is very small (Bayesian tail-area probability p = 0.001). This means the causal effect can be considered statistically significant."

Next steps

This could then be repeated for things like UX changes, TV campaigns, etc. You just need the time of the event and the right metrics or KPIs to measure them against.

The above is just a brief intro; there is a lot more that can be done with the package, including custom models. For more, see the package help files and documentation.

My Google Webmaster Tools Downloader app

Here is a tool that I have used for SEO analytics, that I am now making publicly available. It extends Google Webmaster Tools to help answer common SEO questions more easily.

Visit the Google Webmaster Tools Downloader

Here are a few example questions it helps answer:

  • SEO keyword rankings that take into account personalisation and localisation for Google, in this age of (not provided)
  • SEO keyword performance beyond the 90 days available by default, e.g. year-on-year comparisons
  • How a segment of keywords has performed over time, e.g. brand vs non-brand
  • How click-through rates change over time, e.g. after a website migration
  • How new/old website sections perform in Google search, via the Top Pages reports

These things were a lot easier before (not provided) took keywords out of web analytics.  This left Google Webmaster Tools as the only reliable source of rankings, but it was not an ideal replacement, with limitations that needed to be worked around by downloading data via an API - an API that rarely gets updated.

I'm aware this app could quickly become obsolete if Google updated GWT, but it has also served as a great project for me to get to know App Engine, jinja2 templating, Google Charts, caching, Stripe, Bootstrap, etc., so it's all been worthwhile. I think I can safely say it's been the most educational project I've done, and it can serve as a template for more sophisticated APIs (the Google Tag Manager API is in my sights).

It's also my first app that will be charged for, simply because keeping a daily breakdown of keywords in a database carries a cost - which is probably why Google doesn't offer it for free at the moment. There are other web apps on the market that do downloads for free, but I am wary of those, going by the adage "if you don't pay for a service, you pay with your data".

I plan to follow it up with deeper features, including Tableau examples of what you can do with this data once you have it at such a detailed level.

For now, if you want to sign up to test the alpha, please check out the signup page here.

SEO Is So Boring

SEO is so boring, and you think so too, which is why you're reading this post. Let me validate your feelings with my personal reasons, gathered from eight years in the industry.

The main problem is that the SEO blogosphere talks about the same things every two years, with the same conclusions. These are:

  1. Paid links are evil/good. Actually, Google wouldn't care either way if its algorithm could surface content without paid links, but until then it uses FUD to make SEOs eat each other. The newish link disavow tool crowdsources this in a marvellous manner.
  2. A website starting with M and ending with Z will publish a "revolutionary" SEO tactic that will "transform" the industry, to help justify its subscription to its users. Those users and other vested interests will post things like "It's fucking amazing!!". Other SEOs will point out that it's crap. The publishers are happy just to be talked about, whatever. If they are lucky, Matt Cutts will comment, pointing out that what they say is indeed crap.
  3. A big brand will be penalised for some SEO tactic. They will come back again in a fairly short time - much shorter than if it happened to your website, for example. This will be put down to them spending lots on AdWords, despite Google's public denials. Outrage. Google penalties are political; deal with it.
  4. SEO is dead. People confuse an SEO tactic with SEO. Google discounts one method due to spammers taking the piss - see guest blogging, infographics, directories, etc. Those SEOs and non-SEOs who relied on that tactic, mostly link building to paper over unoptimised websites, find they have no more ideas and decry SEO's death.
  5. Rebranding of SEO. Every so often, SEO will have its name changed by industry leaders, to try to disassociate from the above. There will be discussion on why, how, and whether anyone cares other than the company trying to own the new keyword space.

Another major problem is that every SEO blogger/consultant/agency will at some point decide to run a content campaign, as "content is good for SEO". This means a proliferation of half-arsed reheatings of SEO content, which range from paraphrasings of Google help files to program manuals with "for SEO" tacked on the end ("Excel2012 for SEO", "Using Twitter for SEO", etc.), or perhaps just the old standard "X number of ways to do Y". Bite-sized content designed for amateurs, written by the unqualified, since those who have time to maintain a heavy schedule of SEO publishing don't have enough time to do actual SEO. The best SEOs I've met hardly had time to tweet once a week.

Finally, for a lot of companies that need SEO help, even these days it's still the fundamentals that need looking at - title tags, duplicate content, etc. - which for very large companies can be a nightmare to correct. A lot of SEO opinions on the web work fine if you're running a WordPress blog, but once you get past a certain level, SEO is mainly about prioritisation: what should you concentrate on to have the most impact on bottom-line revenue? 99% of the time it's not going to be some secret SEO tactic but getting an SEO fundamental correct, and it's very rare this prioritisation is talked about - there isn't much more to say.

Don't be so negative

Ok. 

There are some interesting developments fuelled by search engines, mainly Google, off which the SEO industry feeds for its scraps - another source of resentment, it seems, for some SEO bloggers.

SEO for non-Google engines is interesting. Yandex and Baidu have different models and philosophies, and optimising for the newer searches in, say, app stores, LinkedIn or Facebook offers new avenues.

Google's move away from the top-10-results search page towards its mission to be the Star Trek computer is exciting, and services like Google Now and Google Glass, with semantic technology combining into the Internet of Things, suggest SEOs will become more like data curators than data manipulators.

Likewise the move towards treating SEO holistically as part of a user journey, rather than a last touch channel, holds interest from an analytics viewpoint.

I don't mean to change anything with this post, and I'm probably contributing to the problem by putting it out there, but at least I will have something to point to in the future when asked about the latest SEO fad.