Building a Templated Approach to SEO

SEO, in my opinion, is in many ways about scale. You want to rank for as many relevant keywords as possible, and content is the leading factor in achieving that scale. Because let’s face it: who can rank for thousands of keywords with one blog post or piece of content? You want to position your content in a way that makes that kind of scale possible. This is where the templated approach to SEO comes in. Whether it’s optimizing product descriptions or creating filtered pages, it’s often the way to go to target many keywords (both head terms and long-tail) simultaneously.

Getting Started

Step 1: Keyword Research

Especially in industries with lots of search volume, the big topics surface quickly. Some quick initial keyword research usually gives you a good idea of where to start with creating templates. At Postmates, it wasn’t rocket science to figure out that the most popular keyword themes were:

  • {food category} delivery, for example: sushi delivery, Chinese food delivery.
  • {food category} {city name} delivery, for example: burger San Francisco delivery.
  • {food category} near me, for example: sushi delivery near me.

Note: There were many more categories, but as the ambition was to compete for crucial high-volume head terms, this is where we got started.
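
To make the scale concrete, here’s a quick sketch (with made-up categories and cities, not the actual Postmates data) of how a handful of templates multiplies into a large keyword set:

from itertools import product

# Hypothetical inputs; real lists at a delivery marketplace run into the hundreds
food_categories = ["sushi", "chinese food", "burger"]
cities = ["san francisco", "los angeles", "new york"]

templates = [
    "{category} delivery",
    "{category} {city} delivery",
    "{category} delivery near me",
]

# Expand every template for every category/city combination
keywords = set()
for category, city in product(food_categories, cities):
    for template in templates:
        keywords.add(template.format(category=category, city=city))

# Even these tiny lists yield 15 unique keywords; hundreds of categories
# and cities quickly turn into hundreds of thousands of targets/pages.
print(len(keywords))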

Step 2: URL Structure

Decide on your URL structure; this isn’t too complicated either. That said, I’d recommend laying out what you’re planning long-term with other projects as well, so that individual projects don’t create overlap. You want to avoid pages being cannibalized over time by other terms served on similar URL structures.

In the case of our examples, we used:

Step 3a: Engineering Briefing

  • Internal Linking: Probably one of the most important pieces of a templated approach, as you want to make sure that enough other pages link to the new pages so they get discovered, crawled, and passed authority.
  • Headings, Titles & Content: Most of the content, including headings and titles, will be pre-formatted so that it can easily be replicated across all the pages you’re going live with (a sketch of what that can look like follows this list). Usually, templated pages contain listings (restaurants, homes, products, you name it).
  • Meta tags
    • META Description: An obvious one: should this be templated, or do you have a way to write these manually for many pages?
    • META Robots & Canonical: These likely always have a default, although you’ll want to override them in some cases.
  • XML & HTML Sitemaps
    • Build an XML sitemap for all the URLs that are part of your templated approach.
    • HTML Sitemap: Is your overall structure big enough for it to need an HTML sitemap? For a big rollout of potentially thousands or more pages, you might want to think about doing this.
    • Robots.txt: Add the sitemap to the robots.txt, and don’t forget to list it in the XML sitemap index file.
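
A minimal sketch of what those pre-formatted titles and meta elements can look like (the patterns and brand name are invented for illustration, not the actual templates we used):

def render_seo_tags(category, city, listing_count):
    """Fill the templated SEO elements for one {food category} x {city} page."""
    return {
        "title": "{} Delivery in {} | YourBrand".format(category.title(), city.title()),
        "h1": "{} Delivery in {}".format(category.title(), city.title()),
        "meta_description": (
            "Order {} online from {} restaurants in {}. "
            "Fast delivery and live order tracking."
        ).format(category, listing_count, city.title()),
    }

# One of thousands of pages rendered from the same template
print(render_seo_tags("sushi", "san francisco", 42)["title"])
# -> Sushi Delivery in San Francisco | YourBrand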

Step 3b: Content Briefing

What type of content do you want to show up on the page? What should the headings and titles be? As content also has to be templated to some degree, you need to make sure you can rely on a Content team to take care of this. Depending on the assets you need to gather, this briefing can be longer or shorter.

Step 4: Launch & Iteration

‘Launch early and often’ is my usual strategy around SEO, so that new pages can be picked up by search engines as soon as possible. The same applies to the templated approach: getting the pages indexed is the first priority, after which you can worry about optimizing crawl patterns and getting them to rank higher in the search engines.

Examples

The templated approach isn’t new for many SEO teams in the B2C world, where it’s common to have a large inventory of products or services that can be segmented in many ways. In B2B, the approach is a little less common (but still often good practice). Where I often see templated strategies fail is when teams treat it as a choice between creating content this way and writing blog posts. The two work hand in hand, but the long-term value often lies more with templated pages than with blog content, as they’re more scalable.

The caveat: for smaller sites, a templated approach might take longer to perform than an actual blog post, which can start racking up visits via other channels (social, email) right after you hit publish.

Let’s look at an example of good templated pages that have kept SEO in mind:

Airbnb’s Things to Do approach (especially for Places) is a great example (previously part of the Neighborhoods project, I believe) of a templated product approach that consistently works for SEO. While I bet it wasn’t made just for SEO in their case, it does have a lot of great features that are easy to call out as being important for SEO.

  • Content Structure: It’s clear what the essential headings & titles on the page are.
  • Internal Linking: Take a look at how it collects all the relevant links to Places to Stay, Experiences, etc. There is a reason why they’re there.
  • Find the other ones yourself ;-). It was always a great learning exercise for me to dive into category pages and figure out why every little element on the page is there and how it could add value for SEO.

How to do this yourself?

Are you running an e-commerce store, a media company, or a marketplace? Chances are you’re already set up to support this within your existing business model and organizational structure. So give it a shot if you haven’t explored this yet, and let me know how it works out for you!


What’s next after SEO?

Questions from industry peers often resonate with me right away, as they touch on where people are in their careers, where they should go next, what my role has been over the past years, and how that has shaped what I do today.

Lately, I got asked the following questions, and I thought it would be good to go a bit deeper into them:

  • How did you get out of SEO, and what skills do you need to get there?
  • How did your previous SEO roles get you where you are today (a marketing leadership role)?

SEO is a career for many. Plenty of (top) industry experts have climbed the career ladder on the agency or in-house side and are at the top of their field. When I left Postmates, though, I knew I didn’t want to go back into yet another SEO role. For me, the diversity of additional acquisition channels and marketing functions was more interesting than diving into another SEO playbook. But that’s not the case for everyone, and I want to highlight that this was my personal decision. Lately, I’ve been hiring for an SEO role once again, and it’s great to see so many people build their careers that way. In this post, I want to give some insight into what I think can help you diversify your skills toward other areas of Marketing & Growth.

How did you get out of SEO?

  • Build a more diverse skillset. In most cases, SEOs already have to build up additional skills that also apply to other functions: web analytics, technical skills (coding), copywriting, understanding of search behavior, CRO. You’re likely not an expert in them right away, but knowing what’s out there is just as important. It’s a matter of building up the T-shaped skills that many have written about before (this post from Rand Fishkin from many years ago covers the concept).
  • IC versus Management route: How do you want to grow as a professional in your career, and when do you make that choice? Neither option is right or wrong, as both can get you there. In many cases, after leading teams, you can also go back to being an IC.
  • The rise on the Management track: As you grow as a manager, if you choose that route (not saying it’s the best one to pick), you eventually lead more people. From being responsible for one function, you can end up leading many as an executive (either in Growth or Marketing).
  • Networking: Regardless of the route you take, building out your network is always valuable. Having moved across the globe once (NL > USA) made it even more evident to me that when you land somewhere new, you sometimes have to start from scratch and can’t entirely rely on your existing network.
  • Size of Organizations: What organizational size suits you best? If you want to grow into bigger companies, you might sometimes be better off picking a smaller company first and building out your skills there. The opposite works as well: big companies can give you insight into how things operate at scale, with many specialists on your teams. If you then move back to a smaller organization, that experience helps you get a better sense of what to focus on, as you’ll likely need to wear more hats.
  • Stage of an Organization: Having spent my whole career in tech companies, I can say there is a big difference between a company at the Seed stage and one past Series C. Teams are bigger and responsibilities are different, not just for SEO roles but for many others. Depending on the stage, you’re working on short- or long-term goals, and it helps you level-set what’s important for a career after SEO. In a company that transitions smoothly through the stages, you’ll likely have the ability to adjust your own role over time. When I was at TNW, the company’s different stages enabled me to start a marketing organization and help grow the business. Something similar applies to my current role at RVshare, which is quite different from how I started three years ago.
  • Performance versus Brand Marketing: I’ll take the stance that SEO clearly belongs with performance marketing. It can certainly help brands as well, but in most cases you’re going after business value by chasing keyword segments and user intent that lead to transactions.

What skills should you build? It’s up to you and the route you want to take. Leave a comment with insight into how you’re shaping your skillset and career for a potential exit out of SEO.


What books am I reading in 2021?

For the last five years, I’ve written blog posts (2020, 2019, 2018, 2017 & 2016) listing the books I read in the past year and the ones I wanted to read in the year ahead. 2020 was a good year for reading: I added many books to my list during the year and read some unexpected ones (four about pregnancies and babies, who would have thought!?).

This year (2021) will be slightly different, as I expect to read a bit less than in 2020 (when I hit over 30 books). As we welcomed our daughter into the world in December, I’ll likely have less time to read (and I’d also rather spend that time with her). So let’s jump into things…

What books I didn’t get to in 2020 and have re-added to the list for 2021:

My favorites from 2020:

  • HBR Strategic Thinking: Being an executive requires me to put more and more time aside to think about where a business/industry and organization is heading. Besides that, I always really like the format of HBR books with concise articles that quickly get to the core.
  • No Rules Rules: Netflix and the Culture of Reinvention: I’ve always been a bit skeptical about the ideas around the culture at Netflix from reading some previous posts. But this book very much surprised me, and I found myself agreeing with tons of the content. I would highly recommend this one to leaders/founders that want to improve their culture.
  • The McKinsey Way: Something I wanted to learn more about this year was consulting firms (not for any particular reason; I’m not becoming a consultant anytime soon). Reading two books about how McKinsey approaches its practice and sees the world was fascinating.

What books I’d like to be reading in 2021

This year will be a mix of books on Marketing, investing, and personal development. Let’s see how many I realistically get to.


Leave your recommendations via @MartijnSch as I’d love to know from others what I should be reading.


Deciding who to hire: an Agency versus a Contractor versus an In-House Hire?

While you’re scaling your team’s efforts, you run into bottlenecks as you grow. The faster you go, the more often you lack the resources to add new initiatives or improve existing channels and functions. Time after time, you find yourself identifying gaps in your marketing organization (or others) and trying to figure out how to close them. In the end, it likely comes down to this answer: you need more people/skills/experience/knowledge/time to go faster.

A few weeks ago, Rand Fishkin published a blog post on this topic: Why You Should Hire Agencies & Consultants (for everything you can). As you can tell from his post (it mentions a tweet I replied to), the topic resonated with me. My past is also similar to Rand’s in that we both, it seems, often chose the hiring (FTE) route over finding agencies or contractors.

I’m not going as far as Rand in suggesting that you shouldn’t hire; in many cases, in my opinion, hiring is the right answer. But there is more out there: agencies, consultants, interim hires, and crowdsourced tools that could fulfill the same needs.

This also came to mind during the process that we went through at RVshare leading up to the investment by KKR (read more about that here) a few months ago. One of their advisors asked this specific question while discussing our marketing strategy:

“To scale this function, would you outsource the execution or hire internally?”

There is no right or wrong answer to any of this, as it all depends on the situation you find yourself in as a manager/executive. What all strategies have in common is that they require more resourcing. You have a need for it that you currently can’t fulfill with the (extended) team that you have.

My experiences

Each company I’ve worked for had a slightly different strategy. At The Next Web, we hired people and filled the execution gaps with interns in certain periods (the internship system works differently in most of Europe than in the US: interns can support you throughout the whole year, whereas the majority of US internships take place in summer). At Postmates, at the time, the focus was primarily on hiring in-house (senior) experts, as the company was blitzscaling and there wasn’t much time to train people.

🌍 & 🌎 – Europe versus the United States

When the question I quoted earlier was asked, a few thoughts came to mind. I’ve been living and working in the US for close to four years now, after many years in Europe. As the US is a bigger country with a different educational system and different wage ranges (even within the US), the approach is often different. Some of the differences that came to mind:

  • Interns: I touched on this briefly, but Europe’s system makes it easier to train young people throughout the whole year, as most educational setups have year-round internship periods.
  • Wages: In general, wages are much higher in the United States than in Europe. This sometimes causes issues in hiring: somebody you could hire for a similar role in Europe for 70K might cost 100K in most of the United States (with exceptions reaching much higher).
  • Experience at Scaling: There are different approaches to this. In the wider Bay Area, more people have grown up in a tech ecosystem that has shown them how big tech companies operate. Europe, in general, is a bit behind on that, which sometimes impacts how people there can operate at scale.

Again, I’m not judging either Europe or the United States to be better. They both have a place in the overall ecosystem of hiring and extending your resources.

What’s the right approach? What to consider?

  • Short versus Long-term needs: For short-term needs like a copywriting project, designing a slide deck, or creating an explainer video, you won’t easily convince me they’re worth hiring for. You won’t find all those skills in one person, so it makes more sense to outsource them.
  • Cost: Let’s face it, the costs of a contractor/agency are higher right away, but don’t forget about all the additional costs an FTE brings with them (insurance, travel, office in a non-COVID world).
  • Depth of the Bench: Many sports teams have outstanding players sitting on the bench; this is a huge upside of agencies, for example. They often have well-trained teams that already have experience working with similar clients ready to roll directly onto your team and help out with efforts. Especially in functions like media buying, PR, and many creative services, I’m having a hard time seeing how you would be able to defend hiring for those positions solely internally.
  • Specialist versus Generalist: For smaller startups, it’s not always possible based on costs or the skillset to hire the right person right out of the gate. It’s the reason why many startups take off with a bunch of generalists and, while they grow, start adding more specialists to their teams. For example, I myself used to be a specialist as well (search and analytics). Over time while moving up the ladder, I became more of a generalist (welcome to executive life) than a specialist. For some roles that you’re looking for, it means that you might be better off with a consultant as they can provide the specialist skills that you’re not ready to hire for (just yet).
  • Range of skills / Many Hires: As a follow-up to this: as you scale, you face a range of needs that even a specialist in one area can’t solve for you. This is usually where agencies come into play, as they offer a range of skills, often for roughly the price of one FTE. I’ve blogged many times before about our working relationship around Analytics: we use MarketLytics as part of our setup, as they know their stuff incredibly well and have many skills on the team (analyst, engineer, project manager).
  • Scale fast: Hiring is slow. There is a reason why big organizations sometimes have hundreds of different roles open at the same time. They just can’t hire people fast enough. This is mainly a problem at the top of the funnel. You don’t know enough people or can’t reach them quickly enough. It’s one of the reasons why you should always be talking to people to get them potentially interested in joining your company long-term. So consultants/contractors could be your temporary fix as they can usually provide a quick specialist approach to your needs. In addition, if you need to prove a business case, they can provide temporary support.
  • Tunnel vision: It surely is a thing. If you’ve been staring at the same problems for years and working with the same people for a while, you’re likely to get stuck in it. A fresh pair of eyes or a new agency team probably has a different approach that could help bring additional growth.

What am I missing? What are the areas where you prefer to hire rather than find an agency or consultant? Leave a comment so we can discuss it. This will likely be one of those blog posts that I’ll keep up to date over time as I learn new things.


The top metrics & KPIs other teams care about for SEO

SEOs care about many metrics, often the wrong ones (rankings, DA, PA, you name it). What’s often forgotten are the metrics that matter to the other teams/departments within the organization. In the end, building a company isn’t done by one team alone. Over the years, it’s become clear that you work with many departments simultaneously on the same efforts, and they often care about your channel’s metrics, just not always the same ones. So in this blog post, I want to shine some additional light on the metrics you should think about for other departments. It’s the follow-up to this tweet that got quite a bit of attention, and this post gives me the chance to go a bit deeper into the whys.

This list is likely incomplete and is using some generic names. Your organization might have different names or have additional departments that might not be covered here. Hopefully, this gives you a better insight into how to think about various departments related to SEO.

🏢 C-Level

Depending on the type of organization you work in and how broad your C-suite is, you likely report at some level into a COO/CMO who cares about SEO metrics. But in 100+ person companies, they often no longer have the depth to really dive into the SEO cases your team faces on a day-to-day basis.

Metrics:

  • Revenue, average order value (in most cases, this number shouldn’t differ too much from the performance of other channels), and the number of transactions.
  • Sessions from organic search, both as an absolute number and as a percentage of total traffic. Primarily the latter, as you want to keep a healthy, diverse balance in your marketing mix (something I’ve blogged about before).

🛠 Product

What is Product building that you can benefit from, and how are you working with Product to prioritize the changes that drive additional growth from organic search? No product is ever finished, so there is always something you can help prioritize from an SEO point of view.

Metrics:

  • Load time: There has been enough buzz about the importance of site speed for good reason.
  • Number of Pages per Template
  • Growth in Sessions
  • Best Performing Page Segments
  • Conversion Rate from Organic Search, etc.

Not necessarily in that order, but these are usually the metrics that are impacted by, or together with, the Product organization.

💻 Engineering

Site speed, code velocity, and load times. Well, you get the point. It’s all about how fast the site is and how quickly you can work with an engineering team to get the changes you want implemented.

Metrics:

  • Load times/site speed, traffic to specific sections of the site.
  • Velocity of tickets/items that you want Engineering to implement.

💲 Finance

Metrics that show the potential for growth and the return on investment. In the end, in many companies, Finance is the gatekeeper of the money flowing in and out. They want better insight into what you’re spending and how that eventually contributes to the bottom line. If you’re able to produce a simple version of a P&L for SEO, it will likely earn you a happy smile.

Metrics:

  • ROI %: how much you’ve spent on SEO resourcing (team, tools, other expenses such as content) versus what it returns (a worked example follows this list).
  • Budget Spend, Returned Revenue, and future growth.
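
To make the ROI line concrete with purely hypothetical numbers: if SEO resourcing costs $300K per year (team, tools, content) and organic search drives $1.2M in attributable revenue, the simple ROI is (1,200,000 - 300,000) / 300,000 = 300%.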

🖼  Marketing

Likely your closest allies in the ‘battle of SEO’ together with Product. Depending on the organizational structure, you probably find the SEO team itself here or in Product. So having enough impact on the metrics that your marketing team cares about is important.

📝 Content: Do you have a separate content team? They’ll likely care about the organic traffic coming to their pages, and they should care about the impact on those business metrics too. Besides that, any insight into specific keywords (volume, CTR) is always useful for a team like this to help optimize existing content.

Metrics:

  • Impact on branded search terms: sessions.
  • Increase/decline so you can measure the uplift of other brand awareness campaigns.

🧳   Sales & Business Development

At what scale are you still able to set up partnerships yourself, and does that actually fit within the scope of SEO? Likely the answer is no. That’s why you want to partner with a sales/biz dev team that can help you solidify partnerships with companies in your space. They have better skills for it, and in the meantime you can likely provide them with more useful input on who to go after.

Metrics:

  • The number of big partnerships.
  • A shortlist of partners that you want them to go after, not just for dumb link building (preferably not, in my opinion). Instead, create lasting relationships that impact the industry, TAM (Total Addressable Market), and market presence.

📞  Customer Service

The better and faster you can answer your customers’ questions, the better your business will likely thrive in today’s environment. Often this means providing the answer directly in search (think featured snippets). You can’t do this alone as an SEO team: you need input from the people in Customer Service, as they’re the ones talking to your customers about the (mainly negative, but also positive) situations. The more you support them with the metrics they care about, the more comfortable both your lives become.

Metrics:

  • Organic traffic to support-related pages, the number of calls/chats you avoid through better-optimized pages, etc.
  • The top questions that you can answer directly via featured snippets.
  • The top 100 pages on your support portal, based on organic search segmentation.

📈 Growth

When I was on the Growth team at Postmates, the insane velocity produced there to grow faster was great to see. As SEO often isn’t the fastest-growing channel (especially not in the short term: I can throw $1M at PPC tomorrow and create near-instant results), it’s important to show how it contributes to a growth team’s mix of long- and short-term initiatives.

Metrics:

  • Growth % of the SEO channel: MoM, WoW, or YoY.
  • Long-term contributions to growth, as lots of SEO growth is evergreen and comes at relatively low cost.

👩‍💻 Human Resources

Admittedly, this is one of the departments most distanced from SEO. But if you’re a big organization recruiting for dozens or even hundreds of roles, think about how important it can be to help drive traffic to the Careers/Jobs section of your site. If that’s the case, showing the impact on job applicants can be incredibly helpful in making clear what SEO can do for them.

Metrics:

  • The number of job applicants that applied because they found the jobs via Search.
  • Traffic to a specific segment of the site from Organic Search: Careers/Jobs.
  • The number of pages marked up properly with structured data for Jobs.

What metrics are missing? What do you measure for your organization? There are so many different business models out there that this list is likely far from complete; B2B cases, for example, are largely missing here.


Saving Bing Search Query Data from the Bing Webmaster Tools’ API

Over the last year, we spent a lot of time getting data from several marketing channels into our marketing data warehouse. The series we did on this with the team has received lots of love from the community (thanks for that!). Retrieving search query data from Bing has proven to be one of the ‘harder’ data points: there is a lack of documentation, there are no real connectors directly to a data warehouse, and as it turns out, the returned data (quality) is… ‘interesting’, to say the least. That’s why I wanted to write this blog post: to provide the code to easily pull your search query data out of Bing Webmaster Tools and give more people the ability to evaluate their data. Hopefully, this provides the community with better insight into the data quality coming out of the API.

Getting Started

  1. Create an account on Bing Webmaster Tools.
  2. Add & Verify a site.
  3. Create an API Key within the interface (help guide).
  4. Save the API Key and the formatted site URL.

The code

These days I spend most of my time (whenever I get to write code) coding in Python; that’s why this example is written in Python.

import datetime
import requests
import csv
import json
import re

# Fill in your verified site URL and the API key from Bing Webmaster Tools
URL = "https://example.com"
API_KEY = ''

request_url = "https://ssl.bing.com/webmaster/api.svc/json/GetQueryStats?apikey={}&siteUrl={}".format(API_KEY, URL)

request = requests.get(request_url)
if request.status_code == 200:
    query_data = json.loads(request.text)

    # Write one dated CSV per run, so you can keep snapshots over time
    with open("bing_query_stats_{}.csv".format(datetime.date.today()), mode='w', newline='') as new_file:
        write_row = csv.writer(new_file, delimiter=',', quotechar='"')
        write_row.writerow(['AvgClickPosition', 'AvgImpressionPosition', 'Clicks', 'Impressions', 'Query', 'Created', 'Date'])

        for key in query_data["d"]:
            # Dates come back as "/Date(<epoch in milliseconds>)/", so parse out the epoch
            match = re.search('/Date\\((.*)\\)/', key["Date"])

            # Positions are returned multiplied by 10, hence the division
            write_row.writerow([key["AvgClickPosition"] / 10,
                                key["AvgImpressionPosition"] / 10,
                                key["Clicks"],
                                key["Impressions"],
                                key["Query"],
                                datetime.datetime.now(),
                                datetime.datetime.fromtimestamp(int(match.group(1)) // 1000)])
else:
    print("Request failed with status code {}: {}".format(request.status_code, request.text))

Or find the same code here in a Gist file on Github.

Steps to take

  • Make sure you have the one external dependency installed (json, re, and csv are part of Python’s standard library):
    • pip install requests
  • Enter the API Key and site URL in the constants at the top of the script, then run it: python bing_query_stats.py
  • If everything is successful, the data is saved to a file named bing_query_stats_YYYY-MM-DD.csv

Data Quality

As I mentioned in the intro, the data quality is questionable and leaves a lot up to the imagination. That’s one of the reasons I wanted to share this script: so others can get their data out and we can hopefully learn together what the data represents. The big caveat seems to be that the data is exported at the time of extraction with a trailing date range of XX days, and it’s not possible to select a date range yourself. This means you can only make this data useful if you save it over a longer period of time and calculate daily performance from that. This is all doable in our setup, where we use Airflow to save the data into our Google BigQuery data lake, but because it isn’t as straightforward, it might be harder for others.
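
As a sketch of that idea, assuming you keep the daily CSV snapshots the script above writes, you can deduplicate them into one row per query and date, keeping the most recent extraction:

import glob

import pandas as pd

# Load every daily snapshot the script has written so far
snapshots = [pd.read_csv(path) for path in glob.glob("bing_query_stats_*.csv")]
all_rows = pd.concat(snapshots, ignore_index=True)

# The same (Query, Date) pair shows up in multiple exports because every run
# returns the same trailing window; keep only the most recently extracted row
all_rows["Created"] = pd.to_datetime(all_rows["Created"])
daily = (all_rows.sort_values("Created")
                 .drop_duplicates(subset=["Query", "Date"], keep="last"))

print(daily.head())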

So please share your ideas on the data and what you ran into with me via @MartijnSch


Case Study: How Restructuring 6800 Content Pieces For SEO Worked

I presented the content of this post about a week ago to the Traffic Think Tank community (highly recommended), and after a Twitter thread on the topic as well, it’s time to turn it into a blog post.

Sometimes you have to take a stand and make something better even when it’s already performing well. Over the last few months, the RVshare marketing team worked on some great projects; one that I was involved in was restructuring 6800 pieces of content we created a while ago. The content, and the pages it lived on, was performing outstandingly (growing +100% YoY without any real effort), but we wanted to do more to help users and boost SEO traffic. So we got started…

Why restructure content?

A couple of years ago, we published the last WordPress page/post in a series of 600+. The intent: go after a category near and dear to the core of the RVshare business, helping more people rent an RV. We did that by creating tons of articles specifically for cities/areas. Now, over two and a half years later, the content drives millions of visitors yearly, mainly from SEO, but we knew there was more to gain, even though it’s not our core business. We also weren’t leveraging the SEO features that have become available in the last two years (think additional structured data like FAQs), nor the monetization we thought was important. All of these were improvements we’d have to go back into every post for if we wanted to take advantage of them.

What we did: leveraging Mechanical Turk

One of the biggest obstacles wasn’t necessarily rebuilding the pages or coming up with a better design; we have a great team that nails this on a daily basis. Dealing with 650 posts that each contained ten sub-elements, however, was a struggle. The content was structured in a similar way across posts, but some quick proofs of concept showed that scraping wasn’t the solution, as the error ratio was way too high. As with most projects, we wanted to ensure the content could be restructured at low cost so the project would still have a valid business case (does the opportunity outweigh the potential costs of restructuring the content?).

Scraping versus Mechanical Turk

As we had initially structured the content the same way (headline, description, etc.), we at least had a way to get the data out. But when we tested whether we would be able to scrape it, the results were disappointing: there were too many edge cases, as the surrounding HTML was barely structured enough to extract the actual content.

We looked into Mechanical Turk as the second option, as it gave us the ability to quickly get thousands of people on a task to look at the content and extract what we needed. We wrote the briefing, divided the project into a few chunks, and within 10-12 hours we had the content individualized per piece. We did our best to handle most of the data cleaning directly in the briefing and the form the workers filled out, but also had some cleaning scripts ready. After it was cleaned, we imported the data into our headless CMS, Prismic.

How to do this yourself?

  1. Create an account on Mechanical Turk.
  2. Create a project focused around content extraction.
  3. Identify what kind of content you want individualized; it works best if there is an existing structure (list format, table) the workers can follow. This way, you can tell them to pick up content pieces X, Y, and Z for a specific URL.
  4. Identify the fields that you want to be copied.
  5. Upload a list of the URLs you want them to cover, plus the position (#) of each item in the list (see the sketch after these steps for one way to prepare this file).
  6. Start the project and verify the results.
  7. Upload the data automatically back into your CMS (we used a script that could directly put the content as a batch into our headless CMS Prismic.io)
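
As a sketch of step 5, preparing the batch input file can be as simple as this (the column names and ten-items-per-post structure are illustrative; match them to the variables in your Mechanical Turk project template):

import csv

# Hypothetical inputs: the posts to process and the number of list items per post
urls = [
    "https://example.com/rv-rental/san-francisco",
    "https://example.com/rv-rental/austin",
]
ITEMS_PER_POST = 10

# One row (= one assignment) per URL + list position, so each worker
# extracts exactly one content piece from one post
with open("mturk_batch_input.csv", mode='w', newline='') as batch_file:
    writer = csv.writer(batch_file)
    writer.writerow(["url", "item_number"])
    for url in urls:
        for item_number in range(1, ITEMS_PER_POST + 1):
            writer.writerow([url, item_number])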

Rebuilding

We decided to build the content from the ground up, which meant:

  • Build out category pages with the top content pieces by state.
  • Build out the main index page with the top content from all states.
  • Build the ability to showcase this content on all of our other templated pages across RVshare.

Building out the specific templates gave us additional power to streamline internal linking, create better internal relevance, and build out structured data, but mainly to figure out the right way to leverage a headless CMS with all its capabilities, instead of just having raw (read: ‘dumb’) content that can’t be appropriately structured. We already use the headless CMS Prismic.io for this, in which you can create custom post types, as you can see in this screenshot. You define the custom post type and pick the kinds of fields you want, after which it behaves like just another CMS. The content can then be leveraged through their API.

How to do this yourself?

We were previously using WordPress ourselves, but all entities were saved as one post. If you can do this differently and save pieces individually, it’s many times easier to create overview pages using categories (and/or tags). That said, it’s not always something you can do without development support.

Results

Engagement increased by over 25% because of the new design and format. Monetization makes it more interesting to keep iterating on the results. Sessions were unfortunately really hard to measure: we launched the integrations a few weeks before the kick-off of COVID-19, resulting in a downward spiral and then a surge in demand right after. Hopefully, in the long term, we’ll be able to tell more about this. We are sure, though, that our SEO results didn’t suffer.


Want to see the new structure of the pages? You can find it here, in our effort on the top 10 campgrounds across the United States.


Part 5: Airflow on Google Cloud Composer – Building a Marketing Data Lake and Data Warehouse on Google Cloud Platform

In the previous blog posts (part 1, part 2, part 3, and part 4) in this series, we talked about why we decided to build a marketing data warehouse. This endeavor started by figuring out how to deal with the first part: making the data lake. In this fifth blog post, a more technical one, I’ll give some insight into how we’re leveraging Apache Airflow to build our more complicated data pipelines, along with some tips on how to get started.

This blog post is part of a series of five (maybe more, you never know), in which we dive into the details of why we wanted to create a data warehouse, how we created the data lake, and how we used the data lake to create a data warehouse. It is written with the help of @RickDronkers and @Hussain / MarketLytics, whom we’ve worked alongside during this (ongoing) project.

Getting Started with Cloud Composer

Cloud Composer is part of the Google Cloud Platform and brings you most of the upside of using Apache Airflow (open source) with barely any of the downsides (setup, maintenance, etc.). Or, to quote their main USP: “A fully managed workflow orchestration service built on Apache Airflow.” While we had worked with Airflow before, we weren’t looking forward to spending time worrying about its management, as we planned to spend most of our time setting up and maintaining the data pipelines. This way, you can stick to creating pipelines (DAGs).

What is it suitable for?

Say you want to load data from the Google Analytics API, store it locally, translate some values into something new, and have it available in Google BigQuery. However you build it, it’s multiple tasks and functions that depend on each other. You wouldn’t want to load the data into BigQuery before it has been cleaned (trash in, trash out; sound familiar?). With Airflow, the next task is only processed if the previous step was successful.
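
As a minimal sketch of what that pipeline could look like as a DAG (illustrative names and empty callables, not our production code; note that the operator import path differs between Airflow 1.10 and 2+):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # airflow.operators.python in Airflow 2+


def extract_ga_data(**context):
    """Pull raw data from the Google Analytics API and stage it locally."""


def clean_ga_data(**context):
    """Translate/clean values; only runs if extraction succeeded."""


def load_to_bigquery(**context):
    """Load the cleaned file into BigQuery; only runs if cleaning succeeded."""


default_args = {
    "owner": "marketing-data",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="ga_to_bigquery",
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_ga_data",
                             python_callable=extract_ga_data,
                             provide_context=True)
    clean = PythonOperator(task_id="clean_ga_data",
                           python_callable=clean_ga_data,
                           provide_context=True)
    load = PythonOperator(task_id="load_to_bigquery",
                          python_callable=load_to_bigquery,
                          provide_context=True)

    # Each task only runs when the previous one succeeded: trash stays out of BigQuery
    extract >> clean >> load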

Tasks

Tasks are, in almost every case, just one thing: get data from BigQuery, upload a file from GCS into BigQuery, download a file from Cloud Storage to local, process data. What makes Airflow very efficient to work with is that the majority of data processing tasks already have pre-built operators. Tasks like the ones I listed are covered by operators (GoogleCloudStorageDownloadOperator, GoogleCloudStorageToBigQueryOperator) that work as ready-made functions.

Versus Google Cloud Functions

If you mainly run very simple ‘pipelines’ that consist of just one function, or you have only a handful of use cases, Cloud Composer is likely overkill: the costs might be too high, and you still have the overhead of DAGs. In that case, you might be better off with Google Cloud Functions, as you can write similar scripts and trigger them with Google Cloud Scheduler to run at specific times.
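
For reference, a one-function ‘pipeline’ like that could look like the sketch below: an HTTP-triggered Cloud Function (Python runtime) that a Cloud Scheduler job calls once a day. The table and helper names are made up:

# main.py for an HTTP-triggered Cloud Function; a Cloud Scheduler job
# calls its URL once a day.
from google.cloud import bigquery


def fetch_vendor_data():
    """Hypothetical stand-in for a call to a vendor's reporting API."""
    return [{"date": "2021-01-01", "clicks": 123, "cost": 45.6}]


def load_daily_report(request):
    """Entry point: pull one day of vendor data and append it to BigQuery."""
    rows = fetch_vendor_data()

    client = bigquery.Client()
    errors = client.insert_rows_json("my-project.marketing_lake.vendor_daily", rows)
    if errors:
        return ("Insert failed: {}".format(errors), 500)
    return ("ok", 200)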

Costs

The costs for Google Cloud Composer are doable: for a basic setup, it’s around 450 dollars a month (if you run the instances 24/7), as you leverage multiple (a minimum of three) small instances. For more information on costs, I’d point you to this pricing example.

Building Pipelines

Above is an example data pipeline; in typical Airflow fashion, every task depends on the previous one. In other words: notify_slack_channel would not run if any of the previous tasks failed. All tasks happen in a particular order, from left to right. In most cases, data pipelines become more complicated, as you can have multiple flows going on at the same time that combine at the end.

Tips & Tricks

Google Cloud Build, Repositories

The files for Google Cloud Composer are saved in Google Cloud Storage, which is smart in itself. At the same time, you want them to live in a Git repository so you can work on them together efficiently. By following this blog post, you can connect the Cloud Storage bucket to a repository and set up a sync between the two. This essentially gives you a deployment pipeline and makes sure that only production-ready code from your master branch ends up in GCS.

Managing Dependencies

After working with it for a few months now, I’m still not sure whether managing dependencies through Google Cloud Composer is a good or bad thing, as it creates some obstacles when you want to run a deployment that adds Python libraries (your servers could be down for 10-30 minutes at a time). In other setups, this is usually a bit smoother and creates less downtime.

SendGrid for Email Alerts

One of the upsides of Apache Airflow is that it sends alerts upon task failure. Make sure to set up the SendGrid notifications while you’re setting up Google Cloud Composer. This is the most straightforward way of receiving email alerts (for free, as in most cases you shouldn’t get too many failure emails).

README

Document the crap out of your setup and DAGs. When I took over some of the pipelines used at Postmates for XML sitemap generation, it was a nightmare: hard to read, the code didn’t make a lot of sense, and we had to refactor certain things just because of that. As pipelines (just like regular code) can be left untouched/unviewed for months (they literally sometimes have only one job), you want to make sure you can come back and understand what happens inside the tasks.


Again… This blog post is written with the help of @RickDronkers and @Hussain / MarketLytics, whom we’ve worked alongside during this (ongoing) project.


Part 4: Visualization with Google DataStudio – Building a Marketing Data Lake and Data Warehouse on Google Cloud Platform

In the previous blog posts (part 1, part 2, and part 3) in this series, we talked about why we decided to build a marketing data warehouse. This endeavor started by figuring out how to deal with the first part: building the data lake. In this fourth blog post, we’ll chat about how we visualize all the data we saved in the previous steps using Google DataStudio.

This blog post is part of a series of four (maybe more, you never know), in which we dive into the details of why we wanted to create a data warehouse, how we created the data lake, and how we used the data lake to create a data warehouse. It is written with the help of @RickDronkers and @Hussain / MarketLytics, whom we’ve worked alongside during this (ongoing) project.

How we build dashboards

Try to think ahead about what you need: date ranges, data/date comparisons, filters, and the type of visualization. This helps you build a better first version right away. What that mainly looked like for us:

  • Date ranges: The business is so seasonal that our Year over Year growth is most important for RVshare, and since we often don’t get to see all the context on metrics on a weekly basis, we default to 30 days.
  • Filters: For some channels (PPC, Social), it’s more relevant to be able to filter the data down to a campaign or social-network level, because in most cases the aggregate level doesn’t tell the whole story right away.
  • Visualization: We need the top metrics (sessions and revenue) in view right away, with the YoY comparison, so we know within seconds what’s going on and where things can improve.

Talking to Stakeholders (Part Deux)

In the first blog post, we talked about connecting with our stakeholders (mainly our channel owners) and gathering their feedback to build the initial versions of their dashboards (beginning with the end in mind). We used this approach to put the first charts, tables, and graphs on the dashboards, after which we reconnected with the owners to see what data points were missing and, in some cases, to validate the data they were seeing. This gave us additional feedback for fast follow-ups and made for quick iterations on the data we had and could show. For social media, as an example, it turned out that we wanted to show additional metrics that we hadn’t thought of initially but that were in our data lake anyway. These sessions were a good way for us to build additional pieces into our data warehouse while we were at it. These days, some of these dashboards are used weekly to report to other teams in the organization or within the team itself.

Best Practices

Blended Data

Do you want to blend data in Google DataStudio, or do you want to create synced/aggregate tables in BigQuery? For most of our use cases, we opted for DataStudio’s blended data sources (essentially JOINs). It’s easier: we can quickly pull new data together without having to deal with the data structures and complicated queries. In some cases, we noticed while building dashboards that we were missing data in our warehouse tables (not the lake) and were able to make adjustments/improvements to them.

Single Account Owner

Because we work with Rick and Hussain as ‘third parties’, we opted for one shared owner account. Transferring owner access is incredibly hard when it’s a Google Apps account, so we made sure the dashboards are owned by an @rvshare.com account. It’s not a big topic, but it could cause tons of headaches in the long term.

Keep It Simple St*pid

Your stakeholders probably have less desire (and time) to look at dashboards than you think. Instead of having them jump through too many charts, start simple and add more based on feedback if they want to see more. Less is more in this case.

This has the added benefit of making them feel engaged and more interested in using the dashboards. In our own case, we leverage our reporting on a weekly basis for a team meeting, which already makes it a more frequently used report.

Calculated Fields – Yay or Nay?

As we built most of the tables we leverage in DataStudio from scratch during our ETL process, we had the opportunity to decide whether we wanted calculated fields in DataStudio or whether to do the work in the BigQuery queries themselves. Honestly, the answer wasn’t easy, and as we modified the dashboards, it became clear that setting calculated fields up in DataStudio wasn’t always scalable or easy, as they get removed when data or tables change.

Google BigQuery

Tables or queries? In our case, we often use the table information from BigQuery, and the specific columns in there, to drive the visualization in DataStudio. The alternative, for some of them, is to query the data in BigQuery directly; with the BI Engine reservation we have there, we can speed up intense queries rather easily.


Again… This blog post is written with the help of @RickDronkers and @Hussain / MarketLytics, whom we’ve worked alongside during this (ongoing) project.


Part 3: Transforming Into a Data Warehouse – Building a Marketing Data Lake and Data Warehouse on Google Cloud Platform

In the previous blog posts (part 1 and part 2) in this series, we talked about why we decided to build a marketing data warehouse. This endeavor started by figuring out how to deal with the first part: building the data lake. In this post, we’ll go into a bit more detail on how you can do this yourself, covering how we transformed our marketing data lake into an actual data warehouse.

This blog post is part of a series of four (we found enough content to add more articles ;-)), in which we dive into the details of why we wanted to create a data warehouse, how we created the data lake, and how we used the data lake to create a data warehouse. It is written with the help of @RickDronkers and @hu_me / MarketLytics, whom we’ve worked alongside during this (ongoing) project.

The Process of Building a Data Warehouse

In our endeavor to build a data warehouse, we had a couple of big initiatives that we wanted to get done first. We needed some reporting and visualization tables, and aligned with that, we needed to make sure we had data that was cleaned for other purposes (deduplication, standardization: typical ETL problems).

To streamline the process, we used three different ways of getting the data into shape:

Google Cloud Functions

We use Google Cloud Functions both for transforming our data and for loading our initial data in a few use cases. Early on, we noticed that not every vendor was available through regular data loading platforms, like StitchData. An example was Google Search Console: as we didn’t want to run additional infrastructure just for load scripts, we leveraged Cloud Functions to run a daily script (with support from Cloud Scheduler to trigger it daily).

After loading the data, we also transform some tables from our marketing data lake into new production tables using Cloud Functions. All of our scripts are currently written in Python or Node.js, but as Cloud Functions supports multiple languages, it gives us the flexibility to leverage others over time.

Backfill: As Functions can easily be rewritten and tested within the interface, they also give us a good way to backfill data, since we can simply adjust the dates a script needs to run for.

Scheduled Queries

In other cases, we leverage Google BigQuery’s scheduled queries. In a few instances, we just want to load data from raw data lake tables into a production table. Mainly because we don’t always need all the columns, we can limit what we pull in and clean the data in the query itself. Scheduled queries come in pretty handy there, as they run on a set schedule, can be easily updated, and already point at another dataset and table.
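
You’d normally set a scheduled query up in the BigQuery UI, but as a sketch it can also be created programmatically, assuming the google-cloud-bigquery-datatransfer client (the project, dataset, and query below are made up, and the exact client API varies by library version):

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="marketing_warehouse",  # hypothetical production dataset
    display_name="Daily raw-to-production rollup",
    data_source_id="scheduled_query",
    params={
        # Keep only the columns we need and clean the data in the query itself
        "query": """
            SELECT date, campaign, SUM(cost) AS cost, SUM(clicks) AS clicks
            FROM `my-project.marketing_lake.raw_ads`
            GROUP BY date, campaign
        """,
        "destination_table_name_template": "ads_daily_{run_date}",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)

client.create_transfer_config(
    parent="projects/my-project/locations/us",  # hypothetical project/location
    transfer_config=transfer_config,
)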

Airflow

For more complicated data flows, we currently use Airflow via Google Cloud Composer. Cloud Composer, as we mentioned in a previous blog post, means we don’t have to worry about maintaining the Airflow infrastructure while still getting all its other upside. This lets us focus on creating and maintaining the DAGs that drive the actual data structuring flows.

We mainly use Airflow to combine, clean, and enhance data from multiple sources and then upload it back into Google BigQuery for visualization in other tools. Singular use cases are more easily captured in one or two tasks, but in Airflow we run flows that usually have multiple tasks that need to be executed in a certain order, and not at all if one of them fails. This is what Airflow is meant to do, and that’s how we leverage it too. As an example, for our affiliate marketing campaigns we have a structure set up that only pays out once travel is concluded (a very standard approach in the travel industry). This means we need to retrieve orders from our partner > verify them against our database > create a new format to upload back to our vendor > run the actual upload. On top of that, we want to set up some alerting for the team as well. That results in six tasks in this case that need to be executed in the right order: the perfect use case for Airflow.
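
In DAG form, that flow is just a chain. A sketch with placeholder operators (illustrative task names; this would live inside a with DAG(...) block like the one in the previous post):

from airflow.operators.dummy_operator import DummyOperator  # placeholders for the real tasks

task_ids = ["retrieve_partner_orders", "verify_with_database", "clean_results",
            "build_vendor_format", "run_upload", "alert_team"]
tasks = [DummyOperator(task_id=task_id) for task_id in task_ids]

# Chain the six tasks so each one only runs if the previous one succeeded
for upstream, downstream in zip(tasks, tasks[1:]):
    upstream >> downstream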

Creating Structure

In the previous blog post, I touched on how we wanted to set up raw tables that are transformed once or multiple times. We decided to do this both to make the data more streamlined and to make it ready for visualization on our channel dashboards. The MarketLytics team did a great job documenting this, with a very visual result that you can see here:

As discussed previously, we take the data we get into the data lake through multiple stages as we transform it into the data warehouse.

Example of data enhancement: One of the most common scenarios we’ve tried to solve for is connecting existing data from a vendor back to the data we receive in our web analytics tool, Google Analytics. For example, if a specific newsletter campaign is properly tagged, we can identify it through its UTM parameters and then connect that data to what we have in (in our case) Marketo on deliverability and open rate (%).


Again… This blog post is written with the help of @RickDronkers and @hu_me / MarketLytics, whom we’ve worked alongside during this (ongoing) project.