Reporting on a group of customers' post-Black Friday performance

In my line of work, we tend to deal with a lot of different tasks. However, one day a year there is so much traffic that we need to run fast in order to give our clients insights before, during, and after – I am talking about Black Friday, of course!

To make sure our clients were able to see how they were performing during Black Friday, we built live dashboards showing their performance – something Google Analytics itself cannot offer on that day, given the sheer amount of traffic and its natural delay in processing data. To make this possible, we created a pipeline that takes the network call from the Google Analytics tracking code, transforms the call to grab the information, and sends it directly to BigQuery. From there it was easy to set up the live dashboards in Data Studio.
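The pipeline code itself is outside the scope of this post, but to give a feel for the receiving end, here is a minimal sketch in R using the plumber and bigrquery packages. It assumes the site's tracker has been configured (for instance via a customTask) to send a copy of each hit to this endpoint; the /collect path and the project, dataset and table names are hypothetical placeholders, and a production pipeline would also need batching and error handling.

library(plumber)
library(bigrquery)

#* Receive a copy of a Google Analytics hit and append it to BigQuery.
#* @post /collect
function(req) {
  # A GA hit carries its data as URL query parameters
  hit <- as.data.frame(req$argsQuery, stringsAsFactors = FALSE)
  hit$received_at <- Sys.time()
  # "my-project", "analytics" and "raw_hits" are placeholder names;
  # bq_auth() is assumed to have run already
  bq_table_upload(bq_table("my-project", "analytics", "raw_hits"),
                  hit,
                  write_disposition = "WRITE_APPEND")
  list(status = "ok")
}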

After that, we had a lot of reporting to do in order to benchmark against targets, against last year, and against other, similar clients.

To do this, we made a script in R!

For this walkthrough I have used the following libraries:

  1. googleAnalyticsR
  2. tidyverse
  3. future.apply

The first thing we will be doing is to load the libraries:

library(googleAnalyticsR)
library(future.apply)
# Authenticate against the Google Analytics API; the first run opens a
# browser and caches the token locally in a .httr-oauth file
googleAnalyticsR::ga_auth()

Once that is done, create a list of the views we need to grab data from. In our case, each of our customers' data lives in a specific Google Analytics view, so we find the right view to report from, copy its link, and name the variable after the customer. If you go into any view, you will see that the URL contains a reference to the Account (a), Property (w) and View (p):

https://analytics.google.com/analytics/web/#/report-home/a12345678w12345678p12345678

From here assign each GA link to a variable.

Customer1 <- "https://analytics.google.com/analytics/web/?authuser=2#/report-home/aAAAAAAAAwWWWWWWWWpPPPPPPPP"
Customer2 <- "https://analytics.google.com/analytics/web/?authuser=2#/report-home/aAAAAAAAAwWWWWWWWWpPPPPPPPP"
Customer3 <- "https://analytics.google.com/analytics/web/?authuser=2#/report-home/aAAAAAAAAwWWWWWWWWpPPPPPPPP"

Next thing is to take the link and strip out the piece that the googleAnalyticsR package needs in order to fetch data: the view ID. We do that with gsub, writing a small function(x) that keeps only what appears after the last "p" in the link.

library(tidyverse)

# Keep only what comes after the last "p" in the link: the view ID
extract_view <- function(x) {
  gsub(".*p", "", x)
}
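As a quick sanity check, running the function on the example link from earlier returns just the view ID; because gsub matches greedily, ".*p" removes everything up to and including the last "p" in the string:

extract_view("https://analytics.google.com/analytics/web/#/report-home/a12345678w12345678p12345678")
# [1] "12345678"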

Because we made a function, we can reuse it. We keep the same naming convention as before, but this time we overwrite the variables in our global environment with the actual view IDs instead of the links. Note that we still haven't requested any data yet; we are still building our data model.

Customer1 <- extract_view(Customer1)
Customer2 <- extract_view(Customer2)
Customer3 <- extract_view(Customer3)

After that, take all our customer values and gather them up. We will actually make two vectors: one that takes the view IDs from the code above, and one that takes the customer names.

Vector 1

all_views <- c(Customer1, Customer2, Customer3)

Vector 2

listname <- c("customer1","customer2","customer3")
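Note that the two vectors are only linked by position, so they have to stay in the same order. A variation that is not part of the original script, but reduces the risk of a mismatch, is to keep IDs and names together in one tibble:

# Hypothetical alternative: keep view IDs and brand names in one table
views <- tibble(brand = listname, view_id = all_views)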

Step 2

Now that we have the model done and ready, we will start getting the actual data from the Google Analytics API. To save time and be as efficient as possible, we fetch the views in parallel: plan(multisession) from future.apply sets that up, and we bundle our view IDs into gaids <- all_views. We then create a start date and an end date; keeping the range to a single week also helps make sure that our data is not heavily sampled.

## setup multisession R for your parallel data fetches 
plan(multisession)
## the ViewIds to fetch all at once
gaids <- all_views
date_start <- "2018-11-19"
date_end <- "2018-11-25"

Now we will write our fetch command to get the data. For this, we create a function called my_fetch in which we make our call. It starts with ga_auth(".httr-oauth"): the .httr-oauth file holds the cached token from our earlier browser authentication, and pointing at it directly lets each session re-authenticate without another browser round trip. We then make a Google Analytics call and pass in our date range, metrics, and dimensions. Note that we haven't defined the metrics and dimensions yet; we will get to that part later.

my_fetch <- function(x) {
  # Point at the cached token so each parallel worker can authenticate
  ga_auth(".httr-oauth")
  google_analytics(x,
                   date_range = c(date_start, date_end),
                   metrics = metrics,
                   dimensions = dimensions)
}
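A side note on sampling: besides keeping the date range short, googleAnalyticsR can split a request into smaller unsampled batches for you through the anti_sample argument of google_analytics(), at the cost of extra API calls. A variant of the call inside my_fetch would look like this:

google_analytics(x,
                 date_range = c(date_start, date_end),
                 metrics = metrics,
                 dimensions = dimensions,
                 anti_sample = TRUE)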

We now want to fetch the data for every view and label each result with its customer name. all_data <- future_lapply(gaids, my_fetch) runs my_fetch in parallel over the view IDs we stripped from the URLs and returns a list of data frames. We then copy it to a working variable, alldata <- all_data, and loop over the list, assigning each brand name (customer name) from listname[i]:

all_data <- future_lapply(gaids, my_fetch)
alldata <- all_data
# Write each customer's name into a brand column on its data frame
for(i in 1:NROW(alldata)){
  alldata[[i]][["brand"]] <- listname[i]
}

Once that is done, join the data frames into one table, so every row sits next to the right customer name.

brandTabel <- alldata[[1]]
for(i in 2:NROW(alldata)){
  df2 <- alldata[[i]]
  brandTabel <- full_join(brandTabel, df2)
}
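Since every element of alldata has the same columns, essentially the same table can be built in one line with dplyr's bind_rows, which simply stacks the data frames:

# One-line alternative to the join loop above
brandTabel <- bind_rows(alldata)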

Now take the brand column and make it lower case with tolower, so the names are consistent. Then calculate the average basket size by dividing transaction revenue by transactions; in R we refer to columns through the table, so it looks like this: brandTabel$transactionRevenue/brandTabel$transactions. We do the same to calculate the conversion rate: brandTabel$transactions/brandTabel$sessions. We finish by saving the result as a new data frame called brandTabel2018.

brandTabel$brand <- tolower(brandTabel$brand)
brandTabel$avgBasketSize <- brandTabel$transactionRevenue/brandTabel$transactions
brandTabel$conversionRate <- brandTabel$transactions/brandTabel$sessions
brandTabel2018 <- brandTabel
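If you prefer the pipe style used elsewhere in the tidyverse, the same columns can be derived with mutate:

# Equivalent derivation using dplyr
brandTabel2018 <- brandTabel %>%
  mutate(brand = tolower(brand),
         avgBasketSize = transactionRevenue / transactions,
         conversionRate = transactions / sessions)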

Since we also want to compare this with last year's numbers, we copy the entire R script, create a new one suffixed 2017, and of course change the dates to the matching week in 2017.
As mentioned earlier, we now have to define which metrics and dimensions we want in our data. For the next part to work, we need to save the year scripts as files; I have saved them as 2017_data.R and 2018_data.R.
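Copying the script works, but if you want to avoid the duplication, the fetch-and-tag steps could be wrapped in a helper that takes the dates as arguments. This function (and the raw2017/raw2018 names) are just a sketch, not part of the original setup:

# Hypothetical helper: fetch and tag one year's Black Friday week
fetch_bf_week <- function(date_start, date_end) {
  all_data <- future_lapply(gaids, function(x) {
    ga_auth(".httr-oauth")
    google_analytics(x,
                     date_range = c(date_start, date_end),
                     metrics = metrics,
                     dimensions = dimensions)
  })
  for (i in 1:NROW(all_data)) {
    all_data[[i]][["brand"]] <- listname[i]
  }
  bind_rows(all_data)
}

raw2018 <- fetch_bf_week("2018-11-19", "2018-11-25")
raw2017 <- fetch_bf_week("2017-11-20", "2017-11-26")  # the matching Mon-Sun week in 2017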

We now define a character vector called metrics and one called dimensions. Once these are loaded into the global environment, our scripts can use them, so we run the two year scripts with source("2018_data.R") and source("2017_data.R"). After that, we create a new data frame called data, which is a full join of the two year tables on all their shared columns.

And then we write the result to disk as a CSV; write.csv2 is the European-locale variant that uses semicolons as separators and commas as decimal marks.

# One possible set of metrics and dimensions (hourly/demographic breakdown)
metrics <- c("sessions","transactions", "transactionRevenue","itemQuantity")
dimensions <- c("date","hour","userAgeBracket","userGender","region")

# The second definition overwrites the first; it is the set this
# channel/device report actually uses
metrics <- c("sessions","transactions", "transactionRevenue")
dimensions <- c("date","channelGrouping","deviceCategory")

source("2018_data.R")
source("2017_data.R")

data <- full_join(brandTabel2017,brandTabel2018)

write.csv2(data, "bf_channel + Device")
