
Google Analytics + R

posted 27 Mar 2012, 01:29 by David Sherlock   [ updated 27 Mar 2012, 02:55 ]

Comparing your resources and generating cross-domain reports. Part 1: Grabbing the data.


If you are like me, you might have noticed that your list of Google Analytics profiles is starting to get out of hand.



My online presence is spread across many different resources, and choosing where and when to push different outputs for maximum impact is a challenge. Part of this challenge is understanding the different trends visitors show across my resources. I’ve found that installing Google Analytics is a good starting place, as it provides a rich collection of stats and makes it easy to identify user trends.

Because Google Analytics lets users add multiple administrator or user accounts to a profile, I quickly found myself with access to stats from lots of the different online resources that I push my work to. Google Analytics does a great job of giving me stats for each of these resources separately, and even lets me have multiple tracking codes per profile, but it becomes difficult where my different online personas cross over. For example, I want to keep work and personal sites separate in Google Analytics, yet there are occasions where I want to combine their stats or easily compare them against each other.

For a while I have been toying with the idea of using the Google Analytics API to build something that auto-generates reports which clearly show stats from different profiles side by side, but also give me the option to combine stats across profiles.

I’ve picked R as my weapon of choice, as I thought it would be a good way to do cross-domain analysis once I have the information in there. A Google Analytics package is available, and I’ve modified a script written by Fridolin Wild at the Knowledge Media Institute of the Open University that goes through all your profiles and puts the data into matrices. Here’s how to get started:


1) Grab R (plus RStudio, Eclipse or some other sensible IDE) and the Google Analytics package (it wasn’t in the CRAN repositories when I looked).
2) Copy the script below and fill in your own Google Account details. (Many, many thanks to Fridolin Wild, who did the heavy lifting here.)
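Hard-coding your password in a script makes it easy to leak when you share the script. As a minimal sketch, assuming you are happy to keep the details in environment variables (for example in your ~/.Renviron file, which R reads at startup), a small helper can read them instead. The helper name and the variable names GA_EMAIL and GA_PASSWORD are made up for this example; the package does not look for them itself.

```r
# Hypothetical helper: read Google account details from environment variables
# instead of hard-coding them. GA_EMAIL / GA_PASSWORD are names chosen for this
# sketch, not something RGoogleAnalytics knows about.
get_ga_credentials <- function(email_var = "GA_EMAIL", password_var = "GA_PASSWORD") {
  email <- Sys.getenv(email_var)
  password <- Sys.getenv(password_var)
  # Sys.getenv() returns "" for unset variables, so fail early with a clear message
  if (!nzchar(email) || !nzchar(password)) {
    stop("Set ", email_var, " and ", password_var, " before running the script")
  }
  list(email = email, password = password)
}

# Usage, in place of the hard-coded call in the script:
# creds <- get_ga_credentials()
# ga$SetCredentials(creds$email, creds$password)
```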




# load the RGoogleAnalytics sources (adjust these paths to wherever you unpacked the package)
source("packages/RGoogleAnalytics/R/RGoogleAnalytics.R")
source("packages/RGoogleAnalytics/R/QueryBuilder.R")

VERBOSE=TRUE

# 1. Create a new Google Analytics API object, authorise and get profiles
ga <- RGoogleAnalytics()
ga$SetCredentials("youremail", "yourpassword")
profiles <- ga$GetProfileData()

startdate = as.Date("2011-06-01")
enddate = as.Date("2012-03-24")

stats = list()
statsname = list()
no_profiles = length(profiles$profile$TableId)

# 2. Query daily stats for every profile the account can see
for (n in seq_len(no_profiles)) {
  query <- QueryBuilder()
  query$Init(
    start.date = as.character(startdate),
    end.date = as.character(enddate),
    dimensions = "ga:date",
    metrics = "ga:visitors,ga:pageviews,ga:timeOnSite,ga:visits,ga:pageviewsPerVisit",
    sort = "ga:date",
    table.id = as.character(profiles$profile$TableId[n])
  )

  # keep the profile name alongside its data
  statsname[[n]] = profiles$profile$ProfileName[n]
  stats[[n]] = ga$GetReportData(query)
}

# -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
# prep stuff

cs = c( "#FF9900", "#8AD71B", "#FFCC00", "#6ABFF5", "#8E4CE8", "#D02D2D", "#2AB8BD", "#BDA32A", "#1B2B8B", "#6A8B1B", "#1B728B", "#5A1B8B", "#828282", "#75A982" )


# -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
# aggregating daily stats into calendar months
# (assumes each profile returned exactly one row per day, in date order)
# note: summing daily visitor counts over a month counts a returning visitor
# once for every day they visit, so the monthly visitors figures overstate unique visitors
ndays = as.numeric(enddate - startdate)

visitors = matrix(0, nrow=no_profiles, ncol=0)
visits = matrix(0, nrow=no_profiles, ncol=0)
pages = matrix(0, nrow=no_profiles, ncol=0)

rownames(visitors) = profiles$profile$ProfileName
rownames(visits) = profiles$profile$ProfileName
rownames(pages) = profiles$profile$ProfileName

for (j in seq_len(no_profiles)) {
  lastmonth = ""
  mydate = startdate
  nmonth = 0

  for (i in 1:(ndays+1)) {
    month = strftime(mydate, "%m")
    year = strftime(mydate, "%Y")

    # start a new column whenever the month changes
    if (lastmonth != month) {
      nmonth = nmonth + 1

      # month columns are shared by all profiles, so only add them on the first pass
      if (j == 1) {
        label = paste(year, month, sep="-")
        visitors = cbind(visitors, 0)
        colnames(visitors)[nmonth] = label
        visits = cbind(visits, 0)
        colnames(visits)[nmonth] = label
        pages = cbind(pages, 0)
        colnames(pages)[nmonth] = label
      }
    }

    # add this day's figures to the current month
    visitors[j,nmonth] = visitors[j,nmonth] + stats[[j]]$data[i, 'ga:visitors']
    visits[j,nmonth] = visits[j,nmonth] + stats[[j]]$data[i, 'ga:visits']
    pages[j,nmonth] = pages[j,nmonth] + stats[[j]]$data[i, 'ga:pageviews']

    mydate = mydate + 1
    lastmonth = month
  }
}


#sum(visitors)
#sum(visits)
#sum(pages)


# -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
# visitors: one line per profile (use visits or pages instead to plot those)

pdf(file="stats.pdf", width=20/2.54, height=20/2.54)   # 20 cm x 20 cm
op = par(mar = c(5, 11, 8, 8)+0.1)
maxp = max(pages)+100 # or fixed if you want to compare against other domains e.g.: 13000
cols = rep_len(cs, no_profiles)   # reuse the palette if there are more than 14 profiles

# the first profile sets up the plot; month labels replace the default x axis
plot(visitors[1,], type="l", col=cols[1], xaxt="n", cex=2, cex.axis=1, cex.sub=2, cex.lab=2, xlab="month", ylab="visitors", ylim=c(0,maxp), frame.plot=FALSE, cex.main=2, lwd=3 )
axis(1, at=1:ncol(visitors), labels=colnames(visitors))

# the remaining profiles are added as extra lines
for (i in seq_len(no_profiles)[-1]) {
  lines(visitors[i,], col=cols[i], lwd=3)
}
legend("topleft", legend=rownames(visitors), col=cols, lwd=3, bty="n")
# labels


#text(ncol(visitors), round(sum(visitors[1,(ncol(visitors)-2):ncol(visitors)])/3), "visitors", col=cs[1], pos=4, cex=1.5, srt=0, adj=1, xpd = TRUE)
# text(ncol(visits), round(sum(visits[1,(ncol(visits)-2):ncol(visits)])/3), "visits", col=cs[2], pos=4, cex=1.5, srt=0, adj=1, xpd = TRUE)
# text(ncol(pages), round(sum(pages[1,(ncol(pages)-2):ncol(pages)])/3), "pages", col=cs[3], pos=4, cex=1.5, srt=0, adj=1, xpd = TRUE)



par(op)
dev.off()
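The day-by-day loop in the script works, but base R can do the same monthly roll-up in one call. Here is a sketch using rowsum() to sum daily rows by a "YYYY-MM" label. The data frame is made up, uses plain column names rather than the 'ga:' ones in the script, and assumes the date column has already been converted to class Date; the function names are hypothetical.

```r
# Hypothetical daily data for one profile (plain column names; the script's
# equivalents are 'ga:visitors', 'ga:visits' and 'ga:pageviews')
daily <- data.frame(
  date = seq(as.Date("2012-01-30"), as.Date("2012-02-02"), by = "day"),
  visitors = c(3, 4, 5, 6),
  visits = c(4, 5, 6, 7),
  pageviews = c(10, 11, 12, 13)
)

# Sum every metric by calendar month ("YYYY-MM"), giving one row per month
monthly_totals <- function(daily, metrics = c("visitors", "visits", "pageviews")) {
  month <- format(daily$date, "%Y-%m")
  rowsum(as.matrix(daily[, metrics]), group = month)
}

# Build a profiles x months matrix for one metric from a named list of
# per-profile daily data frames (assumes every profile covers the same months)
metric_by_month <- function(daily_list, metric) {
  rows <- lapply(daily_list, function(d) {
    m <- monthly_totals(d)
    setNames(m[, metric], rownames(m))
  })
  do.call(rbind, rows)
}

# The 2012-01 row holds 7 visitors, 9 visits and 21 pageviews
monthly_totals(daily)
```

Grouping on the actual date rather than on the row position also means a missing day in the API results cannot shift later days into the wrong month.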
  


Now you should have pages, visitors and visits matrices with stats for each domain, which should make it easier to compare them. I had a quick go and generated something like this (with the axes and descriptions removed, as the data is sensitive):
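Once the stats sit in a profiles x months matrix, combining or comparing profiles is mostly a matter of summing rows. A sketch on a made-up matrix shaped like the script's visits matrix (the profile names and figures are hypothetical):

```r
# Hypothetical profiles x months matrix (rows are profiles, columns are months)
visits_example <- matrix(
  c(120, 80, 30,    # 2012-02
    150, 60, 40),   # 2012-03
  nrow = 3,
  dimnames = list(c("work-blog", "project-site", "personal"),
                  c("2012-02", "2012-03"))
)

# Combined visits across every profile, per month: 230 and 250
combined <- colSums(visits_example)

# Each profile's share of that month's total (each column sums to 1)
share <- prop.table(visits_example, margin = 2)

# Combine just a chosen group of profiles, e.g. the work-related ones: 200 and 210
work_profiles <- c("work-blog", "project-site")
work_total <- colSums(visits_example[work_profiles, , drop = FALSE])

# Side-by-side comparison of that group against everything else
rbind(work = work_total, other = combined - work_total)
```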


Next steps are to:

1) Improve the script to include data.frames of other user stats, such as user location.
2) Start a visualisation script
3) Generate an auto report.
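For the first of these, one possible approach (a sketch, not something the script does yet) is to keep every profile's results in a single long data frame with a profile column, which then cross-tabulates easily. The per-country figures, profile names and the helper name below are made up.

```r
# Hypothetical per-profile results, e.g. visits broken down by country
per_profile <- list(
  "work-blog" = data.frame(country = c("United Kingdom", "Germany"), visits = c(50, 10)),
  "personal"  = data.frame(country = c("United Kingdom", "France"), visits = c(20, 5))
)

# Stack them into one long data frame, tagging each row with its profile name
stack_profiles <- function(per_profile) {
  tagged <- Map(function(df, name) cbind(profile = name, df),
                per_profile, names(per_profile))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

long <- stack_profiles(per_profile)

# Countries x profiles table of visits, with 0 where a profile had none
by_country <- xtabs(visits ~ country + profile, data = long)
```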





