Part 2: Creating the Marketing Data Lake – Building a Marketing Data Lake and Data Warehouse on Google Cloud Platform

In the previous blog post in this series, we talked about why we decided to build a marketing data warehouse. That endeavor started with figuring out the first part: building the data lake. In this post, we'll go into more detail on how you can build a marketing data lake yourself. If you're looking for more inspiration on this topic, I'd recommend reading this post by the Google Cloud team on how to build a marketing data warehouse. A lot of the thoughts in their article align with our views and the process we followed.

This blog post is part of a series of four (we found enough content to add more articles ;-)), in which we dive into why we wanted to create a data warehouse, how we created the data lake, and how we used the data lake to create a data warehouse. It is written with the help of @RickDronkers and @hu_me / MarketLytics, whom we've worked alongside during this (ongoing) project.

The Process of Building a Data Lake

Four Options: BigQuery Transfer Service, Firebase, SuperMetrics, and StitchData

Loading data from over a dozen marketing vendors as efficiently as possible was important to us, as we wanted to save money and resources for other parts of the data warehouse integration. In the end, we believe we should let people do the jobs they're good at and leverage vendors instead of always following a 'build it ourselves' strategy. This makes it possible for us to focus on the actual use cases instead of worrying about the upkeep of integrations with over a dozen vendors, something that would take up a significant chunk of time in the long term.

That’s why we made a few decisions on how to load data:

  • Firebase: Google Analytics App + Web can automatically export to a BigQuery dataset. So we decided to run our regular Universal Analytics setup alongside App + Web for now, which gives us a cheap way to export Google Analytics data.
  • BigQuery Transfer Service: Google Cloud offers great integrations for YouTube and Google Ads that can be set up within minutes, making it possible to instantly backfill and load data.
  • StitchData: We went with Stitch because their pricing was more flexible and cheaper for our needs. There are many alternatives, but Stitch had integrations with most of the vendors we needed, and since we didn't require enterprise-level SLAs (yet), their standard integrations were good enough for us.
  • SuperMetrics: Some vendors weren't available in StitchData at the time (Facebook Insights, for example), so we got a SuperMetrics license for their BigQuery export.
  • 'Manual Import': As a last resort, we build connections to BigQuery ourselves. So far we've only had to do that for two vendors, which kept the workload lower than expected (a minimal sketch of what such a load job can look like follows this list).
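To give an idea of what a 'manual import' can look like, here's a minimal sketch in Python using the google-cloud-bigquery client. The vendor API, table names, and response shape are all hypothetical; treat it as a starting point rather than our exact implementation.

# Minimal sketch of a manual vendor-to-BigQuery load. Assumes default GCP credentials
# and a hypothetical vendor API; adjust names and authentication to your own setup.
import requests
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")
table_id = "your-gcp-project.prod_raw.example_vendor_daily_stats"  # hypothetical table

# Pull one day of stats from the (hypothetical) vendor API.
response = requests.get(
    "https://api.example-vendor.com/v1/stats",
    params={"date": "2020-01-01"},
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
)
rows = response.json()["data"]  # assumption: a list of flat dicts, one per record

# Append the rows to BigQuery, letting it detect the schema from the JSON keys.
job_config = bigquery.LoadJobConfig(autodetect=True, write_disposition="WRITE_APPEND")
client.load_table_from_json(rows, table_id, job_config=job_config).result()

Wrapping a script like this in a loop over a date range is also what a basic backfill for these manual imports can look like.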

Understanding platform APIs and Data Granularity

There were over a dozen vendors we wanted to import data from, in the different ways described above, into our data lake: Google BigQuery. We needed to ensure upfront that the data and platforms we would be using were up for the task, as a few important areas could have thrown problems our way. That meant building a good understanding of each platform and how we could leverage it. What we looked at:

  • Data freshness & backfill capabilities: How often would the data refresh, and how often would we need it refreshed? For some vendors a 24-hour sync was more than enough, but for others the sync itself would take many hours, basically making yesterday's data useless by the time it arrived. On the other side, we needed to make sure we could backfill data from vendors, which was possible with most of the loading options we chose (except for the manual imports, where we needed to write scripts that could go back in time and backfill the data).
  • Cutting collection time: Do we need all the data from an API, or only partial responses? And in combination with the previous point: do we need to load data hourly (real-time wasn't much of a use case), or can we gather the data daily to save collection time?
  • Hard to see everything end to end upfront: Google Ads exports dozens of tables through the Transfer Service into BigQuery, which meant it took a while to figure out which tables to use and how certain interface columns translated into API fields & tables.
  • Using the data to find gaps and roadblocks: What worked for us was visualizing the data as soon as possible, so we could quickly spot anomalies, for example days of data missing due to a broken backfill process, or data that didn't match the interface because metric X turned out to be calculated differently than we thought (a simple missing-days check is sketched right after this list).
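A quick way to spot those gaps is to generate a calendar of expected dates and anti-join it against the raw table. The sketch below assumes a hypothetical table with a DATE column named date; swap in your own project, dataset, and column names.

# Minimal sketch of a missing-days check on a raw table; table and column names are examples.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")
query = """
    SELECT day AS missing_day
    FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE())) AS day
    LEFT JOIN `your-gcp-project.prod_raw.example_vendor_daily_stats` AS t
           ON t.date = day
    WHERE t.date IS NULL
    ORDER BY day
"""
for row in client.query(query).result():
    print(row.missing_day)  # every date that has no rows loaded yet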

Associated Costs

  • BigQuery Storage: The lowest cost of all. While we store dozens of GBs daily, it's ridiculously cheap to store all of this long-term.
  • Stitch: Pricing can be found here. This is really dependent on how much data you’re exporting and from how many sources as you’re billed based on usage.
  • SuperMetrics: Pricing can be found here. Basically $190 per vendor monthly.
  • BigQuery Transfer Service: This has been free for a couple of months now for the Google Ads and YouTube exports. Previously it was $2.50 monthly – one less coffee at Starbucks.
  • BigQuery BI Engine Reservation: While it's not strictly related to the data lake, it deserves a mention. We're paying $60 monthly for a 2GB reservation. So far we've optimized queries as much as possible to stay under this limit; if needed, we might move up. It has been a game-changer for making dashboard queries lightning fast.

Cutting time to prove the value

We knew many of the use cases we wanted to solve with our data lake, for example: showing Google Analytics metrics next to the open and unsubscribe rates of our marketing newsletters (Marketo). This meant we had our work cut out for us and could quickly take a sample of the data and build out a use case to see whether the data could be matched (would we run into tracking issues somewhere? we sometimes did) and what possibilities we had for visualization. In short, we tried to cut as much time out of the initial process as possible, by not focusing on scalability yet, to figure out whether our use cases could actually provide value for the teams. Basically, we kept the end goals in mind.

We did that by:

  • Talking to the end stakeholders – interview & document: The goal of the project was to, at a minimum, provide every channel owner (PPC, SEO, Social, etc.) with dashboards covering their core metrics. That was the baseline we asked them to define; on top of that, we briefly discussed all the other use cases they knew about and wanted to repeat. By doing this we immediately had documentation on how the process currently worked for them, what data sources needed to be included, and what needed to be built. More importantly, it also created buy-in, as they were able to pick and choose what we built for them.
  • Taking a sample and building a use case: Load the data from the vendor into the data lake and investigate what the schemas look like and whether the granularity of the data is sufficient. As long as there was a key in multiple tables that we could leverage between vendors, we knew we were on the right track (example: UTM tags in emails needed to match Google Analytics conventions; see the join sketch after this list).
  • Automate & backfill later: We didn't focus on making sure this could scale over the coming years, or on backfilling multiple months of data right away. We knew the process was iterative and that new use cases would come up, which would potentially mean backfilling the same vendors many times anyway.
  • Backfill manually: Depending on the vendor and the type of automation we could put in place, we backfilled some of the data manually where needed. Sometimes manually connecting data points in a database makes more sense when you're trying to match vendors than overcomplicating things in a codebase. We figured this out quickly when we had 50 email templates that needed to be manually mapped to other data sources.
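The 'key between vendors' idea is easiest to show with a query. The sketch below joins a hypothetical email stats table to a hypothetical Google Analytics campaign table on the UTM campaign value; your table and column names will differ.

# Minimal sketch of matching email sends to Google Analytics data on UTM values.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")
query = """
    SELECT
      email.campaign_name,
      email.opens,
      email.unsubscribes,
      ga.sessions,
      ga.transactions
    FROM `your-gcp-project.prod_formatted.email_campaign_stats` AS email
    JOIN `your-gcp-project.prod_formatted.ga_campaign_stats` AS ga
      ON LOWER(email.utm_campaign) = LOWER(ga.utm_campaign)  -- the shared key between vendors
     AND email.send_date = ga.date
"""
for row in client.query(query).result():
    print(row.campaign_name, row.opens, row.unsubscribes, row.sessions, row.transactions)

If a join like this comes back empty or half-empty, that's usually the moment you find out the UTM conventions don't actually line up between vendors.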

The (unexpected) problems that we ran into?

Certain things always turn out differently from the expectations you set upfront; that's why I wanted to touch on some of the 'problems' we ran into and how we dealt with them.

Inaccurate data: Dashboard versus Dump

Exports are always different from the data you see in the dashboards, mainly because information is stored differently from how it's visualized. For example, cost data from Google Ads is saved in micros, which means you always have to divide it by 1,000,000 to get the actual amount. The CTRs from most vendors are just a calculation of clicks / impressions. And in some cases, specific data points aren't available for export through an API at all (looking at you, Facebook).
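As a tiny illustration (with hypothetical column names such as cost_micros), converting Google Ads cost and recomputing CTR looks like this:

# Recomputing dashboard metrics from a raw export.
def micros_to_currency(cost_micros: int) -> float:
    # Google Ads exports cost in micros: 1,000,000 micros = 1 unit of currency.
    return cost_micros / 1_000_000

def ctr(clicks: int, impressions: int) -> float:
    # Most vendors don't export CTR directly; recompute it as clicks / impressions.
    return clicks / impressions if impressions else 0.0

print(micros_to_currency(12_340_000))  # 12.34
print(ctr(50, 1_000))                  # 0.05, i.e. 5%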

Overall, this means you have to carefully assess upfront what data schema is available in the export and whether it matches the use cases you want to fulfill. If not, it might not be worth trying to automate it.

Duplication as a problem in Stitch

StitchData exports data into BigQuery differently from what we were used to, which meant we had to get used to deduplicating data before using it for analysis. In short: Stitch appends a new row to a table whenever it notices that a unique record has changed, so we always need to find the latest row for each record. Not a massive problem in itself, but it took us a few days to figure out, as the documentation around this was lacking at the time.
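A window function handles this nicely. The sketch below assumes an id primary key and Stitch's _sdc_sequence metadata column as the ordering field; check your own tables for the exact key and metadata column names before relying on it.

# Minimal sketch: keep only the latest version of each record in a Stitch-loaded table.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")
query = """
    SELECT * EXCEPT(row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY _sdc_sequence DESC) AS row_num
      FROM `your-gcp-project.prod_raw.stitch_example_table`
    )
    WHERE row_num = 1  -- the most recent row per record
"""
deduplicated_rows = list(client.query(query).result())

You could wrap logic like this in a view, so analyses and dashboards always query the deduplicated version instead of the raw table.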

Backfills

How much useful data do you have available per vendor? Some vendors are very structured: Google Ads has its own transfer service into Google BigQuery, so backfilling data there was easy. For other vendors, like Marketo and Sendgrid, it was harder, as we had made so many changes over time while professionalizing our setup. This meant that for some vendors we were able to backfill data back into 2018, whereas for others we could barely get data from the last six months. Knowing this upfront is essential, as it might make certain use cases irrelevant if you're trying to learn from YoY behavior and it turns out the data isn't comparable.

Exceptions: for some sources (read: Marketo) we had to rely on other solutions. There are limitations to the data we can get out of their (old-school) API, so for some of it we receive daily email reports that are parsed and sometimes 'manually' ingested into Google BigQuery. These are the kinds of workarounds we had to accept, because we want the data to be accessible through our data lake and some vendors just aren't that accessible.

Naming Conventions

While we discussed the naming conventions we wanted to follow early on, this turned out to be more food for thought than expected. We quickly had tons of datasets in BigQuery, each following its own conventions (defined by the data load providers). As we applied ETL to some of the tables, we needed to make it clear what every table and field in the schema meant. For example, some tables were only used temporarily, for backfills, for production, for development, for transformation, and so on, and most of them were also specific to a vendor. In our case, we settled on distinguishing datasets by 'prod/dev' + 'raw/formatted/dashboard', and then specifying the vendor name and purpose in the table names within each dataset (a small illustration of the pattern follows below).
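A hypothetical illustration of that pattern (the actual names at your company will obviously differ):

# Example of the dataset/table naming pattern described above.
env = "prod"        # or "dev"
layer = "raw"       # or "formatted" / "dashboard"
vendor = "google_ads"
purpose = "campaign_performance"

dataset = f"{env}_{layer}"         # e.g. prod_raw
table = f"{vendor}_{purpose}"      # e.g. google_ads_campaign_performance
print(f"{dataset}.{table}")        # prod_raw.google_ads_campaign_performance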


Again… This blog post was written with the help of @RickDronkers and @hu_me / MarketLytics, whom we've worked alongside during this (ongoing) project.


Part 1: Why Build a Marketing Data Warehouse? – Building a Marketing Data Lake and Data Warehouse on Google Cloud Platform

This blog post is part of a series of three, in which we dive into why we wanted to create a data warehouse, how we created the data lake, and how we used the data lake to create a data warehouse. It is written with the help of @RickDronkers and @hu_me / MarketLytics, whom we've worked alongside during this (ongoing) project.

Six months ago, we were looking at Google Sheets, Google Analytics, and many other tools/vendors daily, and had no marketing data living in a data warehouse (nor a lake). We wanted to get better at using (marketing) data: mostly to find a better way to connect data from multiple sources into insights, but also to make more data available to the rest of the internal teams (think: PPC, SEO, Email, Product).

Why?

  • Prove business value: Everything we do as Marketing needs to prove some kind of business value; we're not doing things just to keep ourselves busy. So we wanted to make sure upfront that the business cases we had in mind would gain an upside from this setup. Some of these were: detailed insights into social media performance, attribution-level data, and a way for us to connect transactions between data sources.
  • Access to 'our' raw data: As is often the case with marketing data, you're using the interface of the marketing vendor itself to pull your daily numbers. But since so much of the marketing data overlaps with internal data from other channels, we wanted access to all the raw data that is 'ours' (to begin with) and the ability to combine it. This would, for example, provide us with insights into the lifetime value of customers by channel, or the ability to export cohorts of users.
  • The analysis doesn't scale in a spreadsheet and isn't repeatable: Besides not wanting to create reporting monkeys, we didn't want our teams to spend time pulling data from different sources and manually combining it. The risk of mistakes is too high, the process doesn't scale, and valuable time of team members would be lost along the way.
  • More control over data quality: We want to control our data pipelines and be involved in the ETL (Extract, Transform, Load) process so we know how the data is shaped. There shouldn't be a black box of data being created in the background. With this process in place, we can better control the quality and usefulness of the data (and keep improving it over time).
  • More flexibility for visualization and 'deep analysis': With access to our raw data in a warehouse, we knew we would have to rely less on building analyses directly on top of vendors' APIs, and worry less about how to visualize data (on dashboards).

Looking at the complete picture

Marketing data is often confined within the platforms and tools used to distribute the marketing message. But your prospects don't confine themselves to just one source or medium to interact with your company (the path length is usually longer than one session before a transaction takes place). To genuinely measure the effectiveness of your marketing activities, extracting the data from these platforms and having the ability to analyze it is essential. This goes back to our first point under Why: it needs to solve cases that provide business value.

Beginning with the end in mind

One of the biggest challenges in working with data is that often you don’t actually know precisely what data you need to solve for your use case. Usually, you tend to go through the cycle of “capture – store – enrich – analyze” a couple of times before arriving at really valuable insights.

By creating your own data warehouse, you create the flexibility to transform and relate the data to each other in ways that aren’t possible when your data is locked within separate platforms.

Build versus Buy?

While it was clear that we wanted to own as much of the data as possible ourselves, the obvious next discussion was how to gain access to the actual data. Would we build the pipelines ourselves? Would we buy tools to do so, and how much of the data transformation would fall on the team? And what about maintenance?

Pros of Buying a solution

  • Plug and play: A strong pro for buying software is usually a decreased time to value, because you skip the building phase. You avoid a steep learning curve and don't have to deal with the hardware/software setup. A requirement for us was that we didn't want a long process of setting up resources.
  • Scalability: As we grow, our datasets will grow. While paying extra for additional volume isn't a problem, we wanted to make sure the infrastructure would hold up as we reach a certain scale.

Cons for Buy & Build

  • High costs: Buying tools would likely have higher direct costs over the long term, while the indirect costs of building the solutions ourselves (increased headcount) would obviously also be high.
  • Vendor buy-in: Regardless of buy or build, it's a strong con that we're essentially buying into a vendor's platform, so we wanted to make sure certain aspects of the setup could be moved over to another service if we decide that's the right choice at some point in time.
  • Standard data formatting: With some vendors, it was harder to change the data schema (often for a good reason) or to pull additional data that we wanted to collect from, for example, Facebook (the data was available in the API, just not in the vendor's export).
  • Time to first use: Building pipelines to extract data from certain platforms could take a long time, but the real business value in most cases only shows up once you're actively working with the data. Losing months on extraction (also taking into account our seasonality, which provides a ton of data) would delay that. So going with platforms that could directly provide us with extracted data was a great pro for buying and a strong con for building it ourselves.
  • Maintenance: We didn't want to maintain extraction pipelines for the marketing platforms we use. While they're a really important part of our data strategy, we wanted to avoid spending a significant amount of our time staying up to date and maintaining pipelines. With a buy solution, we rely on the vendor's resources to handle this part of the data flow, which is very much what we prefer at this point, mainly because the resources of our data team are minimal.

In the end, we went with a combination of the two strategies. We let most of the Extract processes be handled by software that we buy, or that is already available open source, as this is a task we wanted to make as cost-effective as possible. Most of the Transform is done through a set of tools that we build and maintain ourselves, as there are too many custom use cases and needs. That also means we mostly need to Load the data back into our data warehouse ourselves. More on the details of that in upcoming blog posts.

Costs?

A frequently asked question in this process is what kind of costs to expect. While we won't disclose the cost of the additional help we received, we'll quickly touch on the costs of the whole setup so far. Parts of this are variable based on the size of your data, but so far our costs are:

  • Google Cloud Platform: ~$400-700 monthly.
    • This includes costs for Google BigQuery, Cloud Composer, and other tools. We expect this to go up gradually over time.
    • Note: In another blog post we might dive deeper into Google Cloud Composer and how we leverage it for Airflow (processing data pipelines). Right now, Cloud Composer represents at least half of our Google Cloud Platform costs; if you decide not to use it, your GCP costs will be significantly lower.
  • StitchData: $350-500 monthly based on volume for now. We expect this to go up gradually over time.
  • SuperMetrics: $190 monthly per license per vendor (depending on the number of licenses you might need).

Strategy going forward

With some of these decisions made upfront, we decided to move forward with collecting the data channel by channel, starting with the most important channels. This way, instead of needing to collect all the data for every channel, we were able to directly start building dashboards on top of the data that would be useful for teams. The steps to take to get there were as follows, and we’ll dive into some of them in detail in upcoming blog posts:

  1. Extract all the data using Google BigQuery Transfer Service, Supermetrics and StitchData.
  2. Validate that the data in Google BigQuery is correct by comparing it to the regular data sets, existing dashboards, and the data we would look at in the vendor's platform (a quick example of such a check follows this list).
  3. Transform the data so data sources can be combined, and transform it so it's directly useful for reporting purposes.
  4. Load this new data into new reporting tables, backfill historical data where possible, and sync it back into Google BigQuery.
  5. Depending on the channel, build dashboarding in Google DataStudio or Tableau.
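For the validation step, a simple daily rollup is usually enough to spot discrepancies. The sketch below sums clicks and cost per day from a hypothetical Google Ads table so the totals can be compared against the numbers in the Google Ads interface or an existing dashboard.

# Minimal sketch of a validation query; table and column names are examples.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")
query = """
    SELECT date, SUM(clicks) AS clicks, ROUND(SUM(cost_micros) / 1e6, 2) AS cost
    FROM `your-gcp-project.prod_raw.google_ads_campaign_stats`
    WHERE date BETWEEN '2020-01-01' AND '2020-01-07'
    GROUP BY date
    ORDER BY date
"""
for row in client.query(query).result():
    # Compare these daily totals against the vendor's own reporting.
    print(row.date, row.clicks, row.cost)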

In the next blog post, we’ll go into depth on how we created a marketing data lake, where all our raw marketing data from our vendors (Google Search Console, Google Ads, Bing Ads, etc.) lives. Expect that to go live within the next two weeks. Until then, feel free to share your thoughts on this blog post on Twitter.
Again… This blog post was written with the help of @RickDronkers and @hu_me / MarketLytics, whom we've worked alongside during this (ongoing) project.


What books am I reading in 2020?

For the last four years, I've written blog posts (2019, 2018, 2017 & 2016) listing the books that I read in the past year and the books I wanted to read in the year ahead. As always, I didn't read all the books I listed in last year's post, as I discovered some new ones and shifted my focus on others. But I did read a lot: I finished ~25 books (and put two aside that weren't worth finishing).

What books I didn’t get to in 2019 and have re-added to the list for 2020:

My favorites from 2019:

  • High Growth Handbook: My favorite book from last year, as it touches exactly on the topics I care about on a daily basis: great interviews with practitioners on many topics related to high-growth companies, including a lot of tactical advice.
  • The Ride of a Lifetime: A fascinating read on how Robert Iger made his way to the top of Disney as its CEO. One of my favorites, as it shows the personal growth he had to go through to reach the top, and along the way it gives some insight into just how big a company Disney is.
  • High-Performance Habits: Nothing wrong with learning more about performance habits that could help you improve.
  • Loonshots: A different book from what I would usually read as it went into detail on projects that have changed the world and created innovations that we still benefit from these days (think radar, energy).
  • Secrets of Sand Hill Road: A great book if you want to know more about investing, money and the workings around startups related to investors, deal flows and structures.

What books I’d like to be reading in 2020:


Leave your recommendations on Twitter: @MartijnSch. I'd love to hear what you think I should be reading, what should be on the list, and what should be removed from it.


Starting & Growing SEO Teams

“Tell me, who/what do I need to hire to run our SEO program? What is the first hire for a new SEO team?” These are questions I often get, usually followed by: “Do you know anybody for our team?”. With so many companies around the Bay Area hiring, the questions make sense, and it also makes hiring harder. I've previously blogged about writing a better job description for SEO roles, but I also wanted to shed some light on what I'd suggest as good setups for an SEO team and what roles + seniority to hire for, depending on your company structure.

Why need SEO support?

Most startup founders or early employees don't have an extensive background in Marketing or specifically SEO (and they shouldn't). Most of the time, they've been too busy building the company, getting to product-market fit, and iterating on their product/service. In a lot of growing B2C companies, you need to establish plans for long-term growth. That's what SEO can usually bring to these companies: a sustainable long-term growth strategy. But in order to get there, you'll need to bring in extra help to make sure that it actually is sustainable, instead of employing short-term SEO tactics that might put your growth at risk if you approach them the wrong way (as many startups do).

Why create an SEO Team?

So why do you need to create an SEO team? For many of us this is common knowledge, as we're in it on a daily basis. But let's say you're just getting started; these could be some of the objectives:

  • Dedicated focus on SEO; there are too many other channels to take care of.
  • Need to grow a long-term channel to success.
  • Too many tasks; the channel needs its own dedicated, specialized IC/team.
  • Build out more brand awareness for the company (SEO is a great way of doing this long term).
  • Grow revenue & transactions at a low Customer Acquisition Cost.

Consultant versus Hiring Inhouse

Hiring an SEO Consultant versus an Inhouse SEO

Some teams can’t always hire talent right away (think about the Bay Area where basically all the bigger companies constantly have a need to hire SEO talent) or it might take too long to ramp up SEO. In some other cases, it made more sense for the company to hire a consultant in the short term to take care of some issues and figure out what’s actually needed instead of just hiring somebody with potentially the wrong skill set for the long term.

My take is usually that if you already know what you want your SEO team to work on and are able to wait another 2-3 months, you're better off hiring somebody in-house (if resources are available). In other cases (you have a short-term need, you need a technical SEO but want to hire a content person, etc.), you're likely better off starting with a consultant specialized in that area to make sure your issues are covered.

Also, read my blogpost on levels & seniority in SEO roles.

Hiring SEOs isn’t good enough: resources!

The Skills of an SEO Leader

Provide them with resources. When I joined Postmates, one of the things I wanted to make sure of was that they provided me not just with resources to set up some tools, but also with engineers available to run the actual implementations and a designer to create the new pages we needed.

  • Engineering: This is important because you, the SEO, can't make all the changes yourself; you'll need the team to make the actual changes. Most SEOs that I meet don't know their infrastructure well enough to push code themselves or design something that complies with brand guidelines.
  • Design: You need additional content and more blog posts, but they can't just be text. They need visual add-ons, so you need design support.
  • Content: In a bigger company there will be a huge need for content (either creating new content or editing existing content).

How have you been growing SEO teams, what is missing, what should SEO teams really focus on? Leave a comment!


Announcing my Technical SEO Course on CXL Institute

If there was one thing I could teach people in SEO, the technical side always came up first. Mostly because I think it's a skill that not enough SEOs are strong in, and there is already plenty of (good or bad, you be the judge of that) content out there about the international, link building, or content side of SEO. As technical SEO keeps getting more technical and in-depth, I'm excited to announce that I'm launching a new technical SEO course with the folks at CXL Institute.

The course will cover everything from structured data to XML sitemaps and back to some more basic on-page optimization. Along the way, I'll show you my process for auditing a site and coming up with improvements, and I'll try to cover as many different issues and solutions as I could think of.

It's not going to be 'the most complete' course ever on this topic; technical SEO evolves quickly, and some things will likely already be outdated now that it's published, even though we worked on it for months. But I'll do my best to keep you informed, here and on CXL Institute, about any changes or improvements we might make in a future version. If you have any questions about the course or want to cheer me on, reach out on Twitter: @MartijnSch.


Keyword Gap Analysis: Identifying your competitor’s keywords with opportunity (with SEMrush)

Keyword research can provide you with a lot of insight; no matter what tool you're using, they can all tell you a great deal about your own performance and that of others. While I was doing some keyword research, I thought I'd write a bit more about one specific part: gap analysis. It's an easy concept to understand in itself, but it can provide a skewed view of your competition (or not). To demonstrate this, we'll take a look at an actual example of some sites using SEMrush's data.

What does your competition look like?

You know who your competitors are, right? At least your direct ones. But people who work in SEO, or the level above (Marketing/Growth), don't always know who the actual players are. I worked in the food delivery industry, and more often than not I faced more competition from totally random sites or a few big ones than from our direct competitors (for good reason). So it's important to know what your overlap in the search rankings is with other sites (it's one of the reasons you should actually be tracking rankings, but that's a topic for another day). This way you know which competitors are rising or declining in your space and what you can learn from their strategies to apply to your own site. But this is exactly where the caveat lies: is that actually the case?!

So let's look at an example. As you can see in this screenshot from SEMrush, the playing field for Site A is quite large. They're 'ranking' for tens of thousands of keywords and playing in a decently sized industry. While they're ahead of their competition, it's also clear that there are some 'competitors' in the space that are behind them in search visibility.

So let's take the next competitor; we'll call them 'Competitor A'. What we see here is that they rank for 250,000 keywords. Still a significant number compared to what we're ranking for. It doesn't mean, though, that we're ranking for all of their keywords. So let's dive into gap analysis.

Keyword Gap Analysis

In short, there are three ways to look at keyword gap analysis:

  • What keywords am I ranking for, that my competitor is also ranking for (overlap)?
  • What keywords am I ranking for, that my competitor is not ranking for (competitive advantage)?
  • What keywords am I not ranking for, that my competitor is ranking for (opportunity)?

Today we'll only talk about the last one: what keywords could I be ranking for that my competitor already ranks for, in order to drive more growth? In SEMrush you can do this by creating a report like this (within the Keyword Gap Analysis feature):

Screenshot of: Creating a keyword gap analysis report in SEMrush

You always have the three options available to select. In this case, we’ll do the Common Keywords Option. And the result that we should see looks something like this:

Screenshot of the result, with a list of keywords.

What are the keywords with actual opportunity?

So there are apparently xx.xxx keywords that I'm not ranking for (and likely should be). That's significant, and it almost leads me to believe that we're not doing a good job. The catch, which is not a bad thing, is that the majority of these keywords are driven by the long tail (specific queries with very low volume). What ends up happening is that I'm looking at tons of keywords I don't want to focus on (and hopefully will benefit from anyway by creating slightly more generic, good content). So when I did this for a competitor and filtered down to keywords where they ranked at least in position <20 and that had a monthly volume >10, I had only 2,500 keywords left. That's just a few percent of the keywords we started with. To be clear, I'm not saying you should ignore the other keywords, but now you have the keywords where there's a real opportunity to drive actual results. In the end, you should be able to rank well: your competitor is already ranking (position <20), there is actual volume (>10), and you're not in there at all (a quick sketch of this filtering step follows below).
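If you export the gap-analysis keywords to a CSV, the filtering step is a few lines of pandas. The column names below (competitor_position, volume) are hypothetical; rename them to match your export.

# Minimal sketch of filtering a keyword gap export down to real opportunities.
import pandas as pd

keywords = pd.read_csv("keyword_gap_export.csv")

opportunities = keywords[
    (keywords["competitor_position"] < 20)  # the competitor already ranks reasonably well
    & (keywords["volume"] > 10)             # there is actual monthly search volume
]

print(len(opportunities), "keywords with real opportunity")
print(opportunities.sort_values("volume", ascending=False).head(25))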

This is just something I was playing with while exploring some industries, and it's a topic I haven't seen a lot of content about over the last few years. While the data is often readily available, it can both give you new content ideas and help you identify the actual value on the revenue side (keyword volume should turn into business results: impressions x CTR x conversion rate == $$$).


Calculating Click Through Rates for SEO, based on Google Search Console Data (in R)

Updated June 4, 2019: Added a YouTube video which will guide you through the setup process in RStudio and how to run the script yourself.


Averages lie, and average click-through rates aren't very helpful! There, I said it. What I do believe is that you can calculate click-through rates (CTR) for your own site in a great way, using data from Google Search Console, especially with R. So let's dive into how this works and what you can do with it!

Why ‘Averages’ Lie?

Look at the graph below: a great CTR for position 1. But this research, for example, shows that the average CTR for position 1 is 24% (AWR data, Feb 2019). Which one is correct? Neither of them, as it really depends on the industry and on which features show up in the search results that might decrease CTR (think rich snippets, local packs, news results). All of this makes it really hard to properly analyze what you could expect if you rank higher for a bunch of keywords in your industry. So while I was working at Postmates on ranking certain category pages better, we decided to calculate our own CTR and were intrigued by how far our CTRs were off from the research (the research isn't wrong! It's just generalized across industries). Eventually, with the data in hand, we were able to make better estimates/forecasts of how much traffic we could expect when rankings increased in that segment. In the rest of this post, I'll go more in-depth on how exactly we calculated this.

Using Google Search Console Data

Visual of Google Search Console Report (this is not RVshare data)

You've seen this report in Google Search Console; it gives you a detailed view of the performance of your keywords and their average position. In this graph, we see something positive: a CTR and position that go up slightly over time. But what if you want to know the average CTR per position for a certain segment of keywords? That's way harder to do in the interface. Because of that, I used the R script from Mark Edmondson that he wrote about here almost three years ago.

It helps you extract raw data from Google Search Console so you can digest it and create your own visualizations (like the one we'll talk about next).

Visualizing CTR Curves in R

CTR curve visualizing CTR per position (note: this is randomized data for an unknown site)

So let's dive right into how you can do this yourself. I'll provide you with the full R script, and you'll need to download RStudio yourself in step 1.

  1. Download and Install RStudio
  2. Download the following .r script from Gist
  3. Run these commands to install the right packages for RStudio:
    1. install.packages("googleAuthR")
    2. install.packages("searchConsoleR")
    3. install.packages("dplyr")
    4. install.packages("ggplot2") if necessary
  4. Line #21 – Change this to the property name from Google Search Console
  5. Line #25 – Not necessary: if you want the CTR curve for positions beyond 20, change the number.
  6. Line #40 – Recommended: Exclude the word(s) that are part of your brand name. So you get the right CTR curve for non-branded keywords only
  7. Line #41 – Not necessary: this script takes a 'sample' of 50,000 keywords to calculate your CTR curve from. You can increase this limit if needed; if you have fewer than 50,000 keywords, it's not an issue.
  8. Run the script! The output should be a visual as shown earlier in this post

Want to take a deep breath and let me help you go over this again? I’ve made a quick screen share video of what to do in RStudio and how to use the R script.

Hopefully you now have a better understanding of the actual CTR for your own site, and you can use this to visualize CTR curves for specific parts of your site or for pages that share a similar meta description. Over time you could use this, for example, to measure the impact of changes on CTR.
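And if you'd rather do the aggregation step outside of R: a rough Python equivalent of the core calculation could look like the sketch below. It assumes you've already exported keyword-level Search Console data to a CSV with columns named query, clicks, impressions, and position (your export may use different names); it's not a replacement for Mark's script, just the same idea in another language.

# Rough sketch: build a CTR-per-position curve from an exported Search Console CSV.
import pandas as pd

df = pd.read_csv("search_console_export.csv")

# Optional: drop branded queries so the curve reflects non-branded behavior only.
df = df[~df["query"].str.contains("yourbrand", case=False, na=False)]

# Bucket by rounded position and compute CTR as total clicks / total impressions.
df["position_bucket"] = df["position"].round().astype(int)
curve = (
    df[df["position_bucket"] <= 20]
    .groupby("position_bucket")[["clicks", "impressions"]]
    .sum()
)
curve["ctr"] = curve["clicks"] / curve["impressions"]
print(curve["ctr"])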

Credit where credit is due! There are many use cases for visualizing CTR data with R, and I'm grateful that Mark Edmondson opened my eyes to this a while ago. Credits also to Tim Wilson's documentation on using R and improving visualizations.

Want to read this article in Spanish? Read it here.


Adding additional site speed metrics to Google Analytics: measuring First Input Delay (FID)

Web analytics is still one of my favorite topics, and while I don't get to spend a ton of time on it these days, I still enjoy digging through blog posts and coming up with new ideas on what to track and how it can help with (speed) optimization. While looking through a big travel site's source code a couple of weeks ago, trying to figure out what we could improve, I noticed a Google Analytics event being fired that was 'new' to me. It was used to send information about 'interaction delays' as an event. After some digging, I figured out what it was, and as I couldn't find a ton of information about the topic in relation to Google Analytics, I think it's worth a blog post.

Disclaimer: There is not a lot of new original material in here, a lot of the information can be found in old updates on the Google Developers Web Updates section, and most credits go to Philip Walton. But I think it’s worth giving these topics some more attention and providing some additional solutions on how to integrate this with Google Analytics.

Site Speed Reports in Google Analytics

The site speed reports in Google Analytics have proven useful for optimizing average load times and identifying how long it takes for a certain page to load. You can use the data to figure out whether your server is slow (Server Connection Time and Server Response Time) or look at Domain Lookup Time to see how long a DNS lookup takes. But have you ever noticed that on some sites it takes a little while before you can actually start interacting with the page while it's still being painted (and the JS/CSS is still executing) on your screen? Especially on slow connections, like your phone's mobile network, this can be obvious from time to time. That's why the following metrics come in handy: they measure the time to first paint and the first input delay that can occur.

Why is this metric not already part of the reports? Google Analytics can only start measuring data once its script has loaded. The speed data it reports on is gathered through the browser's timing APIs, but the data for a metric like this isn't part of that. It's also a fairly technical metric, as you'll realize below, so for most basic users it would cause a lot of confusion, I'd imagine.

First Input Delay – FID

The important definitions:

  • FCP – First Contentful Paint: The time it takes (in milliseconds) for the first content to be painted (pixels) on the screen.
  • TTI – Time to Interactive: The time from when the page starts loading until it is fully interactive.
  • FID – First Input Delay: The time between the user's first interaction (e.g. a click or tap) and the moment the browser is actually able to respond to that input.

The FCP metric is already part of the reports you might have seen in Lighthouse. The obvious problem there is that it's a one-off measurement: it could be different for your actual users, and you likely want a higher sample size to measure this.

Measuring First Input Delays (FID)

So let’s talk about how useful this is actually going to be for your site, starting with the implementation for Google Analytics and Google Tag Manager. The ChromeLabs team has done an amazing job providing a small ‘library’ for tracking the performance metrics via JavaScript. So these are the steps to follow for tracking this in Google Analytics (Gtag/Analytics.js) and GTM:

Measuring First Input Delays in Google Analytics

The script you've just included provides a listener that can be used to fire an event or simply push the value to the dataLayer.

If you’re using analytics.js add this to your HEAD (under the minified script and after initializing GA):

perfMetrics.onFirstInputDelay(function(delay, evt) {
  ga('send', 'event', {
    eventCategory: 'SiteSpeed Metrics',
    eventAction: 'First Input Delay',
    eventLabel: evt.type,
    eventValue: Math.round(delay),
    nonInteraction: true
  });
});

If you're using gtag.js (GA's latest version), add this to your HEAD (under the minified script and after GA has been initialized):

perfMetrics.onFirstInputDelay(function(delay, evt) {
  gtag('event', 'First Input Delay', {
    'event_category': 'SiteSpeed Metrics',
    'event_label': evt.type,
    'value': Math.round(delay),
    'non_interaction': true
  });
});

Measuring First Input Delays in Google Tag Manager

The integration for Google Tag Manager is obviously a little bit more complex as you need to add a new Trigger and Variable.

window.dataLayer = window.dataLayer || [];
perfMetrics.onFirstInputDelay(function(delay, evt) {
  dataLayer.push({
    'event': 'first_input_delay', // Not necessarily needed if this loads before GTM is initialized.
    'first_input_delay_value': Math.round(delay)
  });
});

Create the Variable: add a Data Layer Variable in GTM that reads the first_input_delay_value key from the push above.

Then add it to the Google Analytics configuration so it's sent along with either your events or your pageviews (decide on whatever works best for your use case). In this case, I've added it as a page-level custom dimension, but you can also easily send it to a custom metric to calculate averages.

Custom Reporting on Speed Metrics

When you're using a custom metric to report on FID, you can easily create a page-level metric that shows the average first input delay per page type or template. In this case, I created an example report that shows this only for new visitors (who likely don't have assets like JS/CSS/images cached yet).

Adding other speed metrics

First Input Delay is just one of the speed-related metrics you could be adding to GA. If you do some digging and are interested in this topic, I'd recommend going through the rest of the events here. That will give you enough information, and a similar integration, to measure First Paint (FP) and Time to Interactive (TTI).

All the resources on this topic:

Like I said in the disclaimer, I mostly wanted to make this easier to implement; all the documentation on how to set it up and what it entails can be found in these resources.


Diversifying Channels for Actual Growth

I've worked with many companies that have shown exceptional growth, triple digits year over year, bringing them to the next level in their industries (music, education, marketplaces, etc.). But… in some cases, it wasn't as good as it should have been, because the main channels they relied on made up far too big a share of the whole. So let's dive a bit deeper into what that means.

When 80-90% becomes a problem

These top performers in their space all had one fundamental problem: they relied far too much on one specific channel to drive their growth. If one channel (often Social Media or Organic Search) is driving over 80% of your traffic and/or revenue, you might be growing, but you're also in immediate danger. As you can see in the following graph, this company had been growing strongly for years. What they didn't realize, however, was that they weren't particularly setting themselves up for SEO success. They weren't doing anything wrong, but they also weren't doing anything in a way I would consider a world-class SEO program. Then Google changed how its algorithm treats certain sites, and within the span of a few months they lost over 40% of their traffic, and with that approximately 20-30% of their business.

So ask yourself: does your site/business have a healthy split in traffic? Have you looked at the difference in channels for new versus returning visitors? Looking at it the wrong way (combined) will likely skew your approach.

What channels this applies to

  • SEO: If the majority of your traffic comes from SEO, you have a serious problem. Remember the company I was talking about at the beginning of this article: they saw their traffic drop by over 90% overnight! I repeat: overnight! This meant their revenue crashed by over 80% (they weren't solely relying on SEO as their revenue driver at the time).
  • Paid Search & Social: When you think about this channel, you're likely thinking about Google AdWords and Bing Ads, and for the social channels: Facebook, Twitter, LinkedIn, etc. That makes sense; I do the same, so I can't blame you for it. But most companies aren't even using Bing Ads, or only look at one of these channels. Even with Bing being just a few percent of your spend, it helps you diversify your strategy and protects you from Google changes that might hurt you in the long run. In addition, there are tons of other networks out there that could easily take a few percent of your PPC spend. We looked into this recently and decided to move part of our display budget over to another platform/vendor just to ensure this; they were able to drive the same ROAS, and it felt safer to move a few percent of traffic to a smaller player.
  • 'Email Marketing': In most cases, I don't think this is really an acquisition strategy at all, because from what source are these people actually signing up for your newsletter/mailing list? Likely one of the other channels mentioned in this article.
  • Social Media: Think about all the publishers who doubled down on social media a few years ago (Vice, Buzzfeed, etc.). Social media was a great driver of engagement, branding, and traffic for them. But have you noticed the trend over the last two or three years? Most of them have certainly not seen a traffic increase over that period. The reason why: the social networks changed the way they rank content and are less interested in sending clicks to publishers than in keeping users on their own platforms.

In all these cases, you should still invest in these channels; they're great drivers of growth for your business. Just don't rely on only one of them!

Why not other channels?

In my opinion, the problem doesn't apply in most cases to referral traffic, affiliate marketing, and multi-level marketing, as the majority of traffic from these is (or should be) spread across many different partners and sites.

Valid Traffic Model

To marketers and founders, I would say: think more about the split of your traffic and what is actually bringing in the revenue. Explore other channels. It's not bad to kickstart growth on one channel (social/community, SEO, paid acquisition); in most cases I would actually encourage founders & startups to focus on this. It's better to take a leap of faith and double down on one channel than to suck at a few of them.


Onboarding Marketers: What do you need to know or do in Month 1?

Recruiting, hiring, and building out marketing teams has been my focus for the last few years. When starting at RVshare in June 2018, my latest role, I wanted to have an impact right away, so I read up on how to best onboard myself in a new environment. But it was also important to have a direct impact on the rest of the organization; I was their marketing leader, after all.

On the flip side, I've also spent a lot of time over the years onboarding marketers onto my teams, and I've obviously learned a lot of lessons on that front as well. That's why today I want to spend some time talking about this and provide a framework for how to onboard marketers, and for what to look out for when you're the newest marketing hire on a team. For this blog post, I've only looked at the first month; in the end, you can make a lot of impact in the first 30 days on the job if you approach it the right way!

Before getting started & Day 1

  • Hardware/Software: Depending on whether your newest hire will be in an office or working remotely, it's important to figure out what they need.
    • Order a laptop
    • Order a screen and all the needed cables
  • Accounts: What tools does your company use, and what can you set them up with so they can immediately hit the ground running with the software and SaaS tools your team uses on a daily basis? Some examples:
    • Google Account/Google Drive
    • Communication: Slack, Email, Outlook?
    • HR & Expenses: Too many vendors in this space to mention
    • Google Sites/Atlassian/Confluence/JIRA/Github
    • Video Conferencing: BlueJeans/Google Hangouts/Zoom/etc.
    • Productivity: Calendly, Grammarly, TextExpander, Monosnap, 1Password/LastPass, etc.
  • HR, Paperwork & Introductions: In most bigger companies you'll spend a significant amount of time on the first day filling in paperwork, receiving your laptop, getting onboarded onto the company's IT systems, and getting an introduction/class on the company('s history).

Month 1: The first 30 days of a new marketing hire

The first month is important; a good start supports the future success of a new hire. That's why I started listing the things I try to help with once somebody new starts.

  • In the first week you can only get this started, but you want to make sure you meet as many of your direct colleagues as possible. In our case, as part of the company is remote (including myself), we try to get new people out to one of the offices (Ohio/Texas) to meet the teams there.
  • Schedule a regular one-on-one with your direct report/manager.
  • Explore all the content that is already out there for your function. Read what you need to know, but also don't feel bad about skipping things that are likely not going to help you right away.
  • Prioritize: After 2-3 days in the first week, you'll have an overload of new information landing in your lap. Figure out what information is urgent (you need to know it right now!) and what is merely a priority (important, but nobody will die if you don't touch it today) in the short term.
  • Introduce yourself to the rest of the company and meet with other teams to get to know them and learn what they're working on.
  • Make your first improvement/change: As a manager, I really like to make sure that in the first week a new hire has made a change that has an impact on the business. Usually it's a small thing, but it's often great to show it to the rest of the organization.

  • Vendors: Time to start connecting with the vendors you might already have in place for this function. If this is a new role, it's unlikely that you've already figured out what tools & vendors you'll need.
  • Industry research: You may have already done a bit of research into what the industry looks like, but nothing beats getting to know the big players and the industry facts by reading industry reports. The more context you have about the space you're operating in, the better.
  • For most marketers, it’s going to be important to get familiar with the Analytics infrastructure (Google Analytics, Mode Analytics, Looker, etc.) to get a good understanding of how they should be measuring their own performance, and that of a channel when they manage that.
  • Deep dives: If there was an existing team, start doing deep-dives on the work that they’ve already been doing. Otherwise, do deep dives with the other channel managers on the team.
  • Customer understanding: Have your new marketer talk to the right people internally that know everything about the customer journey and the biggest pain points that your customers are facing. At RVshare, for example, we have multiple experts that most new people talk to, to get a good understanding of how the product works in detail.
  • Your first 1-1: You’ll have your first actual one on one with your (new to you) manager. In the first weeks, it’s the right time to get any questions answered and get a better understanding of how the team around you will operate. For the manager, it’s a great time to make the goals for the role clear and guide you around the people that you need to work with.

  • Meet with the rest of the team: Depending on the size of the Marketing team you likely haven’t met the people that you won’t be working too closely with. As an example, it’s likely not the top priority for a PR manager to meet the PPC manager in the first hours of the new job.
  • Meet with all your important stakeholders: What other teams does this role work with? In most companies there will at least be overlap with roles in Product, Design, Engineering, Communications, Sales, and/or Customer Service.
  • Meet with vendors, part 2: After you have spent a couple weeks on the job and have met your existing vendors you likely have a better understanding of what you would need and what you’re currently missing. That means, that it’s likely time to find some additional vendors to fill in the gaps.
  • Customer research: Have you had a chance to talk to customers yet? We've already covered deep dives with other teams and getting good insight into customer pain points, but set up some meetings/calls with customers and you'll likely get a totally different view of what goes on in the market and how they approach your business.
  • First results: Think about the hands-on work you've done so far and the approach you've followed. Has it brought some early results that can help drive growth for the business (short or long term)? Evaluate what you've done and keep this in mind: it's a great win if you can show the results of your work early on.
  • Create a plan: After your first 30 days, you should have a better insight into what you can influence, what impact you have and what needs to be done. This plan shouldn’t be static at all and should constantly evolve over the next months on the job. But after 2-3 weeks you should have a good idea on what the low hanging fruit is to pick up.
  • Start rolling out your ideas and set everything in motion to start picking up your projects. Create a 60-90 day plan of what you’ll be working on for the longer term. It will provide the basis of what you’re doing in Month 2 & 3.
  • Write a job 'profile' with your manager: Something I like to do after 30-60 days is to sit down with a new hire and write down the responsibilities, goals, and support they're getting. Think of it as a longer & more detailed version of a job description in which you lay out what is expected of the role. After this many days on the job, those early insights give you both a much better idea of what the role actually entails.

What have you accomplished in Month 1?

A few things are important in the first month of a marketer:

  • Do they feel comfortable talking to you and their new coworkers about the problems that they face? In the end, it’s your job as a manager to make sure they feel OK with what they’re doing.
  • Do they have better insight into how your business operates and what your industry looks like? Have they explored the content you can provide them about the business itself and the industry you're operating in?

Execute, execute, execute! That's likely what the main theme will become after your first 30 days on the job. You've picked up enough knowledge to start being useful to the organization, and you can actively contribute to projects that are already running or that you'll start up yourself.

Overall, this list isn't complete yet. I'd like to keep it up to date with the new lessons I learn from my new hires and myself over the coming years, so I'll try to update this post regularly to hopefully provide a useful resource for other hiring managers and/or marketers.

What happens after Month 1?

You've done a great job so far and already made real progress that helps the team. Time to keep doing what you're doing and have an even bigger impact on progress/growth! Many things in the role will change, and you'll keep evolving. Hopefully, in a future blog post, I'll shed some light on the second and third months in a new role, as you're likely not yet fully ready to operate independently and to fully understand the organization in that period. Food for thought for another blog post in the future!

Other resources

Some books that I've read over the last few years that I think are great for this specific use case:

    • The First 90 Days: It's a classic, but it does its job.
    • The CMO Manifesto: It's focused on higher-level marketing executives, but it also touches on a lot of areas that are applicable to anyone starting in a new role.