Why & What alerts in SEO are becoming more important

We have all been there, haven’t we? Do quotes like “SH*T, my sitemaps are broken”, “I have no-indexed half my pages” or “I have been kicked out of a search engine with way too many pages” sound familiar? Honestly, I can’t blame you. It’s getting harder and harder to keep track of all the SEO-related changes being made on your site, and you’re likely the only person involved with SEO at your company while also trying to drive traffic through other channels. So let me give you a quick insight into what I usually track within bigger companies, where there is an actual team to react to issues that come up.



Growing as an SEO (1/4) – Writing better Job Descriptions for SEO Roles

Writing a resume isn’t fun (IMHO) and writing job descriptions is probably even less fun. Over the years I’ve written many of them, usually following a similar template to help us define what the role is about. That isn’t always a good thing: depending on the seniority of the role, you want to make sure you use the right approach to hiring and make it as personal as possible. That usually makes for better hiring; most of my best hires came through my network of people I was at least aware of. Over the last months I’ve received many requests: whether I wanted to take a look at an SEO job description, whether I knew people who were looking for a job and wanted to share it with my network, you get it. But what I started noticing is that most SEO job descriptions are incredibly generic and don’t really seem inviting to many people.

“We’re looking for somebody to set up our SEO strategy, we’re looking for somebody to work with our engineering and design teams to create content. You’ll pick the right keywords for us to focus on.” Yada yada yada. You’ve seen and heard it all before. Obviously, when you’re on a job search in SEO you’ll come across all of these requirements and responsibilities easily. But I think companies need to do better, definitely in an area like Silicon Valley, to hire the right SEO talent or even get them interested. There aren’t that many of us, but the information you give ‘us’ isn’t always great. That got me thinking about what information should be mentioned in job descriptions for SEOs. But I also wanted to take a look at what job descriptions look like right now:

Saving job descriptions

I must admit I have a weird obsession: if I see well-written (or really poor) job descriptions, for whatever type of role in digital, growth, marketing, you name it, I have a tendency to save them (in Evernote). Over the years that has built up to a nice archive (150+ JDs) that I can use when writing new job descriptions for hiring. The 16+ companies amongst them include Airbnb, Uber, Groupon, Booking, Zillow, Hulu, Porch and Tesla, and the descriptions range from SEO Assistant to more senior positions like Senior Director of SEO. Fill that up with all the job descriptions that you can easily find on most job sites (LinkedIn, Glassdoor) and you can get a good enough understanding of what managers and recruiters are thinking about while sourcing/hiring for SEO roles.

Almost unfortunately, Postmates didn’t have a job description for me, as my previous boss simply asked me to fill this need within the Growth team. Otherwise I would have loved to share that original one.

What are companies looking for?

The perfect job description doesn’t exist, even when you’re in the right position and might be able to write your own. Most of them have some issues, so I decided to look at all the SEO job descriptions that I could find and see if there are any patterns in what companies are looking for. Let’s look at the two main areas of job descriptions:

Responsibilities

Tag clouds are good for something, I guess: I threw all the requirements from a dozen job descriptions into one, and these were the main keywords that came up. Some of the ones that stood out to me:

    • Performance: This keyword was interesting to me, so I did some digging on the context; I expected it to be a requirement to know about performance marketing. Turns out the overwhelming majority of companies want better performance reporting around their SEO strategy.
    • Content: People in SEO need to have a solid understanding of content: know how to create it and, maybe even more important, know how to improve it.
    • Technical: Guess what, these days SEOs need to be technical. As most of the job descriptions are from Bay Area companies, that doesn’t surprise me at all, considering that they work with product managers (or in some orgs even are PMs) and engineers most of the day. This is also important for technical audits, which are usually performed in-house.
    • Strategies/Initiatives: SEOs need to be able to make strategic decisions. At most companies they’re among the people working on the biggest traffic channel for the site, so they need to be able to think strategically, as they can make changes to a platform that have a bigger impact than just SEO.
    • Team(s): They either need to be great at working in teams (aka a team player), or, in the more senior positions, great at building up their own teams.

Missing?

While analyzing this, there were a few things missing that I thought were interesting, so I at least wanted to mention them.

  • Agencies: A good portion of the SEOs that I know work with agencies, but there was barely a mention in job descriptions of working with agencies, finding them, etc.
  • ASO: Most companies that I went through have mobile apps, but App Store Optimization (ASO) was never really part of the job description.

Requirements / Qualifications

  • Experience in SEO: For starter roles this is usually not a requirement, as candidates may only have experience from work they’ve done on the side, not in an actual job/company.
  • Experience in Analysis: Most SEOs need to be familiar, at least at a basic level, with a web analytics tool like Google Analytics or Adobe Analytics (formerly Omniture) so they can analyze their performance (one of the core responsibilities).
  • Tools: I often see experience with Google Search Console being mentioned, but I’d love to see more companies mention the other tools in their toolset too. In the end, you won’t share that much information with your competition by telling them what tools you’re using.
  • Delivering results: Although you can’t guarantee that your work will help, you need to be able to show the progress that you’ve made on other sites and the work that you’ve done there. If it didn’t result in an uplift, you should at least be able to explain why not and what your original hypothesis was.

Missing?

A few things feel missing in the list of requirements & qualifications: what about the setup that you already have, or is the company diving into a new field of opportunity? Are you going to expand your business, are you operating in new niches? For some companies the future manager will already know what projects (s)he wants to get done.

  • Tools? What is your current toolset? If somebody has exceptional expertise with a certain tool, that would surely help. Anybody can learn more about a tool, but experience is important too.
  • How often have they played ‘this’ game before? How many sites have you worked on, and what were the scale and business models of those sites? I have way more experience than average with publishers and marketplace models, while I have barely worked for ecommerce sites and SaaS companies thus far. This also gives better insight into whether they have a certain ‘playbook’ for approaching certain issues.

Writing the Ultimate Job Description

I’m on a journey to change the world. OK, slowly. And one by one. But I believe we can do better: helping people find the right jobs will make them happier and increase productivity and output for the company. The first step would be to improve job descriptions so people have a better idea of what they’re getting into, rather than setting up a very generic one. Not all bullet points will apply to every job description, but you likely get the point:

Responsibilities

  • Define the SEO strategy: we want to grow (X metric) by approximately XX% this year. SEO is one of the channels that we depend on, so we’re looking for somebody who can build out the channel after an intensive audit and figure out what opportunities we really have.
  • Reporting: be able to use our analytics infrastructure to dive into customer & traffic data to find new insights and opportunities for us to grow SEO as a traffic channel.
  • Reporting Up: be able to talk to our stakeholders and peers in the company about the performance and opportunities that you see within SEO, and communicate the results of the work that we’ve done.
  • Analytical: be analytical and data-driven. Can you write SQL and work with large amounts of data? Great! We have analysts ready to work with you in gathering the insights you need.
  • Technical: we have developers ready to work with you, so it would help if you can code and are able to explain in detail what you need from implementations regarding SEO and new features.
  • Content: we’ve been wanting to create & produce more and better content. It would be great if you have worked with copywriters and are able to take our blog & content marketing efforts to the next level; we have copywriters that we work with, as well as PR specialists.
  • Build out the team: be a team leader and builder. Currently the team is 2 people who will be supporting you, but we hope to build out the team with your support, so we’d like to see experience leading people & teams.
  • Performance: you need to be able to identify opportunities, build out the resources needed and along the way have a ton of fun while always striving for better results.

Requirements / Qualifications

  • You have X years of working experience in online/digital marketing and you know what channels are important for our type of business to be successful.
  • You have worked on (multiple) big sites regarding SEO before; it is important to us that you can show experience building out a strategy for a bigger site (50,000+ pages).
  • You have worked with web analytics tools and understand how you can use these insights to further improve user experience and optimize pages for search engines. Preferred tools would be: Google Analytics, Amplitude Analytics, Adobe Analytics, …, etc.
  • Do you have experience writing, or have you worked with copywriters before? Great! This will help push forward our ideas on content marketing.
  • You have experience managing different products/projects at the same time; our teams are divided between products/projects and some are cross-functional (designers, engineers).
  • You have worked with tools that we already have in our toolset: Google Search Console, Bing Webmaster Tools, Majestic SEO, Screaming Frog, …, etc., but you’re free to look into other SEO tools (up to an enterprise budget) and evaluate needs for our organization.

This is by no means perfect, but hopefully a good start. In the job descriptions that I usually write I also try to give insights into the company, mention what the team looks like and what the perks & benefits of the role are. But most important: what type of person we’re looking for and how we think this role will help the bigger team grow. In the end it’s a two-way street and we want to make that clear from the start. You need somebody’s skills, but you also want them to feel welcome and appreciated!

What’s missing?

What do you think is really missing in job descriptions these days that should be reflected? What are you looking for in a next or first SEO role? Let me know; I’d love for this post to become the ultimate SEO job description for the rest of the world. Hit me up at @MartijnSch on Twitter with feedback!


Exporting Amplitude Data to Google BigQuery

I’ve written about using Amplitude on this blog before (in Dutch), but what if you want to combine the huge amount of data you have in Amplitude with your other big data? The Enterprise version gives you the ability to export your data to a Redshift cluster, but a lot of companies these days are on Google Cloud Platform and want to use Google BigQuery, which is similar in its setup to Redshift.

Amplitude

The Export API from Amplitude lets you download all your events data (regardless of your account plan) for free. There are some limits to the data that can be exported, but most startups/companies that use Amplitude are likely to stay under them on a daily/hourly basis, which means that you can export the data. So basically everything you need to do is set up a cronjob that retrieves the data every hour or day. It parses the data and prepares new files that are temporarily stored in Google Cloud Storage (this ensures that the data is easier/faster to load into Google BigQuery). The next step is loading the data from GCS into BigQuery.
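Here’s a minimal sketch of such a job, assuming hourly runs; the bucket and table names are placeholders, and the Export API returns a zip archive of gzipped, newline-delimited JSON files:

import gzip
import io
import zipfile

import requests
from google.cloud import bigquery, storage

AMPLITUDE_AUTH = ("AMPLITUDE_API_KEY", "AMPLITUDE_SECRET_KEY")
BUCKET, TABLE = "my-analytics-bucket", "analytics.amplitude_events"

def export_hour(hour):  # hour formatted like "20180501T00"
    resp = requests.get("https://amplitude.com/api/2/export",
                        params={"start": hour, "end": hour},
                        auth=AMPLITUDE_AUTH)
    resp.raise_for_status()
    # Unpack the zip archive and concatenate the gzipped JSON files inside.
    chunks = []
    with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
        for name in archive.namelist():
            chunks.append(gzip.decompress(archive.read(name)))
    return b"".join(chunks)

def load_to_bigquery(hour, data):
    # Stage the export in Google Cloud Storage first ...
    blob = storage.Client().bucket(BUCKET).blob("amplitude/%s.json" % hour)
    blob.upload_from_string(data)
    # ... then load it from there into BigQuery.
    config = bigquery.LoadJobConfig()
    config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    config.autodetect = True
    bigquery.Client().load_table_from_uri(
        "gs://%s/amplitude/%s.json" % (BUCKET, hour),
        TABLE, job_config=config).result()

if __name__ == "__main__":
    hour = "20180501T00"
    load_to_bigquery(hour, export_hour(hour))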

Google BigQuery

Over the last years I’ve wanted to do more with Google BigQuery, and this project was perfect for it. As BigQuery has connectors with multiple Google products (Analytics, Data Studio) and other vendors like Tableau, it gives companies the ability to analyze their data and connect it to other sources.

Schemas

Within Google BigQuery we’re going to save the data in two tables (a rough schema sketch follows the list):

  • Events: As everything in Amplitude is an event, that’s one of the tables you’ll need in Google BigQuery; every event ends up as its own row.
  • Properties: Every event can have properties of a few different kinds: event, user, group, group properties and an actual data property. We connect them to the data from the events table.
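As a rough sketch, and with purely illustrative column names (the real tables depend on which Amplitude fields you keep), the two tables could be defined like this:

from google.cloud import bigquery

events_schema = [
    bigquery.SchemaField("event_id", "INTEGER"),
    bigquery.SchemaField("event_type", "STRING"),
    bigquery.SchemaField("event_time", "TIMESTAMP"),
    bigquery.SchemaField("user_id", "STRING"),
]

properties_schema = [
    bigquery.SchemaField("event_id", "INTEGER"),      # joins back to events
    bigquery.SchemaField("property_type", "STRING"),  # event/user/group/...
    bigquery.SchemaField("key", "STRING"),
    bigquery.SchemaField("value", "STRING"),
]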

FAQ

  • Do I need to be a paying customer of Amplitude? No, you don’t. The free plan, which is what I started using this on, supports the Export API as well.
  • What does the integration cost? The costs associated with this setup are for Google Cloud Platform: you’ll have to pay for storage in Google Cloud Storage and in Google BigQuery. For the setup that I’m running we’re saving millions of rows monthly and the total costs are less than 10 dollars.
  • What do I need to do to get this up and running? Check out the README in the repository on GitHub; it has a getting-started checklist to ensure that you can run this script.


Feedback? / Contribute?

I’m not the first person to work on an integration with BigQuery; I’ve seen other solutions in Java and Python, but they all work a bit differently. If you have any feedback on the setup, leave an issue on GitHub or submit a pull request with your proposed changes. In the end, I can code, but I don’t consider myself to be an engineer 😉


Sitemaps: Setup, Monitoring & Metrics for Analysis

In my effort to write longer posts on a specific topic, I thought it was time to shed some light on something we’ve been working on over the last months at Postmates and something I never thought could become an interesting topic: sitemaps. They’re pretty boring in themselves: a technology where you basically give search engines all the URLs on a site that you want them to know about (and index), and you take it from there. Even more so as most sites these days run on a CMS like WordPress, where tons of plugins can take care of this for you. Don’t get me wrong, do use them if you are on one! But as I mainly work for companies that don’t have a ‘standard’ CMS, I’ve worked multiple times on creating sitemaps and making their integrations work flawlessly. Over time that taught me a ton, and recently we discovered that certain additional features can help speed up the process. That’s why I think it was time to write a detailed essay on sitemaps ;). (*barf: definitive guide).

TL;DR: How can sitemaps help you get better insights, and how do you set them up?

  1. Sitemaps will provide you with insights on what pages are submitted and which ones are indexed.
  2. You can create sitemap files by uploading XML or TXT files with dumps of URLs.
  3. All the different content on your pages can be added to sitemaps: images, video, news.
  4. Fields for priority, last modified and change frequency can give search engines insight into the priority for certain URLs to be crawled.
  5. Create multiple sitemaps with segments of pages, for example by product category.
  6. Add your sitemap index file to your robots.txt so it’s easy to find for a search engine.
  7. Submit your sitemap and ping sitemap files to search engines for quick discovery.
  8. Make sure all URLs in your sitemaps are working and returning a 200 status code, and think twice: do you want all of them to be discovered?
  9. Monitor your data and crawls through log files and Google Search Console.

Goals

When you start working on sitemaps there are a few things to keep in mind: the ideas that you have around them and the goal, i.e. what problem are they solving for you? For small sites (100 pages) I’m honestly not sure if I would bother with sitemaps; there are probably a lot of other projects that would have more impact on SEO/the business.

If you’re thinking about setting up sitemaps, there are a few goals they will help you accomplish:

  • Get better insights into what pages are valuable to your site.
  • Provide search engines with the URLs that you want them to index; it’s the fastest way to submit pages at scale.

Overall this means that you want to build the best sitemap infrastructure you can, as that will get you the best insights, in the quickest way, and most of all get your pages submitted and indexed as fast as possible.

Setup

Sitemap

Format: XML or text? Does the format matter? For most companies probably not, as they’re using a plugin to support their sitemaps. If you want to go more advanced and get better insights, I would go with the XML format myself. From time to time we use text file sitemaps where we just dump all the URLs; they’ll get you a sitemap quick and dirty if you don’t have the time or resources.
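For reference, a minimal XML sitemap per the sitemaps.org protocol looks like this (URL and values are placeholders), while a text sitemap is simply one URL per line:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/products/widget</loc>
    <lastmod>2018-05-01</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>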

Types: There are multiple formats for sitemaps to support different content types.

  • Pages: In there you’ll dump all the actual URLs that you have on the site and that you want a search engine to know about. You can add images for these specific pages to that schema as well, to ensure that the search engine understands which images are an important part of the page.
  • Images: Both for image search and for making an impact with the pages themselves, you can add sitemaps for images.
  • Videos: Video sitemaps used to have a bigger impact back in the day, when video listings were a more prominent part of the search results page. These days you mostly want to let search engines know about them, as they’re usually part of an individual page.
  • News: News is not really its own format, as news articles are just individual pages, but Google News sitemaps do have their own format: Creating a News Sitemap – Google.
  • HREFLang: This is not really a type of content, but it’s still important to think about. If your pages have a translated version, you want to make sure it’s listed as the alternate version. Read more about that in Google’s support documentation.

Fields

  • Frequency: Does the page change on a regular basis? Some pages are dynamic and will always change, but others change only daily, weekly or monthly. It’s likely worth including this, as it is a good signal in combination with the Last Modified field and header.
  • Last Modified: We do want to let a search engine know which pages have been updated/modified and which ones haven’t. That’s why I’d always recommend organizations include this in their sitemap. In combination with the Last-Modified header (we’ll talk about that in the next step) it is a good enough signal to assess whether the page has been modified or not.
  • Priority: This is a field that I wouldn’t spend too much time thinking about. On multiple occasions Google has mentioned that they don’t put any value or effort into understanding this field. Some plugins use it and it won’t hurt, but for custom setups it’s not something that I would recommend adding.

Last Modified

Has the actual sitemap changed since the last time it was generated, yes or no? In some cases your sitemap won’t change: you didn’t add any new products/articles. Have you ever run this in your terminal:

curl -I https://www.example.com/sitemap/sitemap_index.xml

Look at the headers: if you see a Last-Modified header, it’s a signal for when the page was last modified. We use it to tell search engines the last time the sitemap was updated, and we combine it with serving a Last-Modified header on the URLs that are in the sitemaps. This won’t always work, as pages can change momentarily (based on availability of products, for example).
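A quick sketch of checking that header programmatically (the URL is a placeholder):

import requests

resp = requests.head("https://www.example.com/sitemap/sitemap_index.xml")
# Only a useful signal if you actually keep the header accurate.
print(resp.headers.get("Last-Modified"))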

Segmenting Pages

For better insights it’s really useful to segment your sitemaps. The limit per sitemap file is 50,000 URLs, but there is basically no required minimum. You’ll see sitemaps being segmented in multiple ways; based on these segments you can get more granular insights: is one category of pages better indexed than another?

Categories: Most companies that I work with segment their pages by the categories they’ve defined themselves. This could be based on region, or for example on product categories for an ecommerce site.

Static Pages: Something most people with custom-built sites don’t realize is that there are usually still a ton of pages that aren’t backed by a database that you want insights on too. Think about: contact, homepage, about us, services, etc. List all these pages in a separate sitemap (static_sitemap.xml) and include this file in your sitemap index too.

Sitemap Index

If you have multiple sitemaps (10-25+) you want to look into creating a sitemap index file. With this you can submit just 1 file, and the search engine will be able to find all the underlying files that are part of the sitemap. This saves you adding multiple sitemap URLs to Google Search Console/Bing Webmaster Tools and also gives you the ability to add only 1 line to your robots.txt file. Technically it’s just another sitemap that lists the URLs of the other sitemaps.
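The index follows the same protocol; a minimal example (URLs and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap/products_1.xml.gz</loc>
    <lastmod>2018-05-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap/static_sitemap.xml</loc>
  </sitemap>
</sitemapindex>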

Robots.txt

You want to make sure that on first entry a search engine will know about your sitemaps. Usually one of the first files a search engine’s crawler will look at is the robots.txt file, as it needs to know what it can/can’t look at on a site. As we just talked about the sitemap index, we’re going to list that one in the robots.txt file for your site, which should live at https://www.domain.com/robots.txt. It’s as simple as adding this one line to it:

Sitemap: https://www.domain.com/sitemap/sitemap_index.xml

Obviously the URL can be different based on where you have hosted your sitemap index file.

GZIP

If you’re a big site you likely have servers that won’t go down and can take quite a hit, but if you have extensive sitemap files they could easily get up to 50MB+, which is not a file transfer that can be done in a matter of two seconds. It can also just slow things down on both your end and the search engine’s end. That’s why we’ve started gzipping our sitemap files to make for a faster download and speed up that process; at the same time you make it one step more complicated for people to copy-paste your data.
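Compressing a generated sitemap is a few lines in most languages; a minimal Python sketch (file names are placeholders):

import gzip
import shutil

# The .gz file is what you then reference in your sitemap index.
with open("sitemap_products_1.xml", "rb") as src:
    with gzip.open("sitemap_products_1.xml.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)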

PING Search Engines

Guess what, it has an effect. I thought it was crazy too, but we found a tiny bit of proof that pinging a search engine actually results in something. As you will mostly only care about Google and Bing, we still have a way of letting them know about a sitemap: both accept a simple GET request to their ping endpoint.
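A minimal sketch of those pings (the endpoints are the ones both engines documented at the time of writing; the sitemap URL is a placeholder):

from urllib.parse import quote

import requests

SITEMAP = "https://www.example.com/sitemap/sitemap_index.xml"

# One GET request per engine is all it takes.
for endpoint in ("https://www.google.com/ping?sitemap=",
                 "https://www.bing.com/ping?sitemap="):
    requests.get(endpoint + quote(SITEMAP, safe=""))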

Submit your sitemap

Probably not worth explaining: you need to make sure that you can get insights into your XML sitemaps and the URLs that are listed in there, so make sure to submit your sitemaps to Google Search Console and Bing Webmaster Tools.

PubSubHubbub

One project that is relatively unknown is PubSubHubbub: it lets subscribers, mostly useful for publishers, be instantly notified (through a specific push protocol) when new URLs are published in a feed. The protocol works through an Atom feed (remember that format?) that you provide. Once you have registered the feed with the right services, you make it easier for them to be notified of new pages.

XSLT

XML sitemaps aren’t easy to read for a regular person; if you’re not familiar with the XML format it might be uncomfortable. Luckily, a while back people invented XSLT, which lets you ‘style’ the output of XML files into something more readable, making it easier to see certain elements in the sitemaps that you’ve listed. If you want to make them more readable, I would advise looking into: https://www.w3schools.com/xml/xsl_intro.asp.
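Hooking a stylesheet up is a single processing instruction at the top of the sitemap file; the href below is a placeholder for wherever you host your XSL file:

<?xml-stylesheet type="text/xsl" href="/sitemap/sitemap.xsl"?>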

Quality Signals

Search engines like sites that are of high quality: the pages are the best, the URLs are always working and your site never goes down. Chances are high that this doesn’t always apply to your sitemaps, as some pages might not be great. Some things to consider when you’re working on this:

  • 301/302/404: Are all URLs in your sitemap responding, as they should, with a 200 response? In the best-case scenario none of your URLs should be responding with any other response code. In reality most sitemaps always contain some errors.
  • NoIndex: Have you included URLs in your sitemap that are actually excluded by a noindex meta tag or header? Make sure that’s not the case.
  • Robots.txt: An even bigger problem: are you telling the search engine about URLs that you actually don’t want them to look at?
  • Canonical Pages: Is the URL that you’re listing the canonical/original URL, or are you listing pages that are still ‘stealing’ content from another page, like a filter page? Do you really want to list those URLs in your sitemap?

Some of these signals might have a big impact, some a small one, and others won’t matter at all. But at least think about the implications they might have when you’re building out your sitemaps.

Airflow

Lately I’ve been working a ton with Apache Airflow, the framework we use at Postmates, invented by the great folks at Airbnb and mostly used for dealing with data pipelines: you want to do X, and if X succeeds you want it to go on to task Y. We’re using it for the generation of sitemaps: if we can generate all sitemaps, we want to ping the search engines; if that succeeds, we want to run some quality scripts; and when that is done, we want to be notified by both email and Slack of what time the run succeeded.

Some sitemaps we want to run every day; a specific segment we run on an hourly basis. Airflow gives us the details to see whether a run is failing and notifies us when it succeeds/fails. With this setup we have constant monitoring in place to ensure that sitemaps are being generated daily/hourly.
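A simplified sketch of what such a DAG could look like, loosely based on the Airflow API of that era (the task bodies and names are placeholders, not our actual pipeline):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Placeholder callables; in reality these would generate the files,
# fire the ping requests and crawl a sample of the URLs.
def generate_sitemaps(**kwargs):
    pass

def ping_search_engines(**kwargs):
    pass

def run_quality_checks(**kwargs):
    pass

dag = DAG("sitemaps_daily",
          start_date=datetime(2018, 1, 1),
          schedule_interval="@daily")

generate = PythonOperator(task_id="generate_sitemaps",
                          python_callable=generate_sitemaps, dag=dag)
ping = PythonOperator(task_id="ping_search_engines",
                      python_callable=ping_search_engines, dag=dag)
quality = PythonOperator(task_id="run_quality_checks",
                         python_callable=run_quality_checks, dag=dag)

# Each task only runs when the previous one succeeded; email/Slack
# notifications hang off the DAG's success/failure callbacks.
generate >> ping >> quality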

Monitoring

Eventually you just want to know whether your pages are of good enough quality to be indexed by the search engine. So let’s see how we can check this in Google Search Console.

Index coverage

A useful report in Google Search Console is the Index Status report (Google Index > Index Status). It shows, for the property that you’ve added, how many pages have been indexed and how many have been crawled. As the main goal of a sitemap is driving up the number of pages submitted to the Google index, the following step is making sure that they’re actually being indexed. This report gives you that first high-level overview.

Sitemap Validation: Errors & Number of URLs

But what about the specifics of the sitemap: are the URLs being crawled properly, and are they being submitted to the index? The sitemap reports give you this level of detail (in this case 98% is indexed, which makes sense: the missing 2% are some test products that Google seems to have ignored, luckily!). Remember what we talked about before regarding segmenting your pages? If you had done that, you would have seen in this particular example what percentage of pages in each sitemap was submitted/indexed. Very useful if you work on big sites where, for example, the internal link structure is lacking and you want to improve that. These reports can (though not always) give you insights into what the balance between them could be.

Quality Assurance

  • Are the URLs working (200 status code)? A little-known fact, but Google doesn’t like following redirects or finding broken URLs in your sitemaps. Spend some time on making sure that these pages aren’t in there, or add the right monitoring to prevent it from happening. Since we’ve started gzipping our sitemaps that’s become a tiny bit harder, as you first need to unpack them. But for quality testing we still have scripts in place that can run an on-demand crawl of the sitemap to check that all URLs in there are valid (a rough sketch of such a check follows this list).
  • Page Quality: Honestly, is this page really worth being indexed in Google? Some pages are just not of the quality that they should be, so sometimes you should take that into account when building up sitemaps. Did you filter out the right pages?
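A rough sketch of such an on-demand check, assuming a gzipped XML sitemap at a placeholder URL:

import gzip
import xml.etree.ElementTree as ET

import requests

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

resp = requests.get("https://www.example.com/sitemap/products_1.xml.gz")
urlset = ET.fromstring(gzip.decompress(resp.content))
for loc in urlset.findall("sm:url/sm:loc", NS):
    url = loc.text.strip()
    status = requests.head(url, allow_redirects=False).status_code
    if status != 200:
        print(status, url)  # anything but 200 deserves a closer look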

Metrics & Analysis

So far we’ve talked about the whole setup and how to monitor results. Let’s go a little step further before we close this subject and look at the information in log files, a topic that I’ve become more familiar with and have worked closely with over the last months:

Log Files

As log files can be stored on the web server that you’re also using for your regular pages, you can get additional insights into how often your sitemaps are being requested and whether there are any issues with them. As we work on them on a regular basis, it could happen that they break. That’s why we make sure, for example, to monitor the status codes for the sitemap URLs, so we can see when a certain sitemap doesn’t return a successful 200 status code.
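As an illustration, a rough sketch of pulling failing sitemap requests out of an access log, assuming the combined log format and sitemaps living under a /sitemap path:

import re

# Matches the request path and status code in: "GET /sitemap... HTTP/1.1" 200
pattern = re.compile(r'"[A-Z]+ (/sitemap[^ ]*) [^"]*" (\d{3})')

with open("access.log") as log:
    for line in log:
        match = pattern.search(line)
        if match and match.group(2) != "200":
            print(match.group(2), match.group(1))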

Proving that pinging works

A while back we started to ping our sitemaps to Google and Bing; both make it clear (Google) that if you have an existing sitemap and you want to resubmit it, this is a good way to do it. This sounds weird: Google got rid of their ‘submit a URL’ feature for the index years ago, so we were skeptical about whether this had any impact. It was really easy to implement; you just fire a GET request to a Google URL with the sitemap URL in there. What we noticed is that Google almost immediately tried to look at these URLs. As we refresh this specific sitemap every hour, we also ping it to Google every hour, and guess what happens: for the last weeks they have looked at the sitemap every hour. Who says you can’t influence crawlers? The result? If you want to ensure that Google is actually looking at a page and actively crawling it, pinging seems to make exactly that happen.

Screenshot of this from a Kibana dashboard where we log server requests

What if you can’t ping? Usually I would only recommend pinging a search engine if your whole sitemap generation process is fully automated; it doesn’t make sense to open your browser or run a tiny script by hand for this. If you still want basically the same effect, use the Resubmit button in Google Search Console > Sitemaps.

Future

This is not all of it, and I’ve gone over some topics briefly; I didn’t want to document everything, as there’s already a ton of information from Google and other sites about how to specifically set up sitemaps. In my case, we’re on a route to figure out how to make our sitemap setup near perfect. What I still want to investigate or analyze:

  • Adding a Last-Modified header to pages in the sitemap: what is the effect of pinging a sitemap, and does Google look at all pages or just the ones that are modified?
  • Segmenting them even further: let’s say I only add 100/1,000 pages to a sitemap and start creating more of them. Does that influence crawling, and do we get better insights?


Next steps?

When I started writing I didn’t plan for this to become everything I know about sitemaps. But what did I miss? What optimizations can we apply to sitemaps in order to get better insights and speed up the crawling of pages? This is just one area of technical SEO, but probably an important one if you’re looking for deeper insights into what Google or Bing think about your site. If you have questions or comments, feel free to give a shout on Twitter: @MartijnSch


Using Amplitude for Product & Web Analytics

I’ve previously published this blog post in Dutch on Webanalisten.nl.

What if you are looking for a web analytics product but have a lot of events, a complicated product, and are sending more and more data over time? Sometimes it just wouldn’t work to go with Google Analytics (360) or Adobe Analytics, and integrating a custom-built solution or Snowplow might be too complicated for your organization. In this post I’d like to show you another tool that might be interesting to you: Amplitude. I’ve been working with it for the last year and it provides some great flexibility over other products in the industry.

What is Amplitude?

“Analytics for modern product teams”

Or in normal language: you can track any event and connect it to a user. All the events that you send can have properties, just like the user can have properties. You can filter by all these data points and move your data around to turn it into multiple types of charts: funnels, event hits, revenue, etc. In this article we’ll run through how you can use Amplitude and what it’s good for. Let’s go ahead and dive in!
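To give a feel for the data model, here’s a rough sketch of sending one event with event and user properties through Amplitude’s HTTP API as documented at the time of writing (the API key and all values are placeholders):

import json

import requests

event = {
    "user_id": "user-123",                     # who did it
    "event_type": "checkout_started",          # what happened
    "event_properties": {"cart_value": 42.5},  # details of the event
    "user_properties": {"plan": "pro"},        # details of the user
}

# The HTTP API takes an api_key plus a JSON array of events.
requests.post("https://api.amplitude.com/httpapi",
              data={"api_key": "YOUR_API_KEY", "event": json.dumps([event])})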

Why would you use Amplitude?

You want to measure what is happening within your product and what users are doing, keeping in mind that all this data can help you improve their workflows and measure the impact certain changes have on their behaviour. In that respect, Amplitude competes mostly with tools outside of web analytics, like Google Analytics for Firebase, Snowplow, KISSmetrics, Mixpanel, etc. In the next section we’ll explain why, as a lot of features are interpreted differently from regular web analytics but can still help you a lot in your daily work:

What’s the difference?


  • Instant Reporting/DIY: While most analytics tools provide you with a lot of preconfigured dashboards, Amplitude lets you do this all on your own. That can be a time-consuming task, but in my opinion it also makes you think a bit more about the way you set up your analytics infrastructure.
  • No default metrics: Bounce rate doesn’t exist, as any event can be triggered to influence it (plus, would that be your most useful metric anyway?).
  • Funnels: Anything can be a funnel, and in my opinion that makes it very powerful, as it doesn’t require you to create any predefined goals and also makes sure you can create funnels retroactively (recognize this pain point in Google Analytics?). If you added events a few weeks ago and now is the time to start creating a funnel out of them, you’re able to. Want to change the funnel and add more/fewer events? You can.

  • User/Session: Sessions (on web) don’t really exist in Amplitude. While in my opinion this metric has a very loose definition anyway, it can come in handy from time to time to measure retention. Amplitude does provide this data on mobile, where sessions are way easier to define (app open/close).
  • Channel/Traffic Source: If you’re looking for an easy way to track where your visitors are coming from, with detailed reports that can be associated with costs, that’s just not what Amplitude is for. While it can easily save the UTM data that you’re sending along, it won’t offer you great reporting around it; their focus is mostly on product analytics.
  • Merging Events/Linking Events: At the beginning of this section we talked about the need to set up all the dashboards yourself. As you won’t have a very defined plan for your tracking for the next few years, it can be hard to follow a certain naming convention from scratch, which usually turns your analytics data into unstructured chaos over time. Within Amplitude you’re able to merge certain event names and link them together, so you can easily change your old event names to something new and still connect the data. This is one of the features I really miss in other tools when I’m trying to redefine naming conventions and clean them up.

Why data governance is even more important

The role of data governance is becoming more important when using tools like this, in combination with the need for good documentation. If you come into an organization that is already sending hundreds of different events, it can be really hard to get started on a deeper analysis, as you’re not always familiar with the following:

  • Naming conventions: You want to make sure that you’re using the right names for the events and that they’re logical in the order in which you send the data. It would be good to give this article on creating measurement plans, which I wrote for online-behavior.com, a read. We’ll talk later about how Amplitude can still help you if you’d like to make changes to the events you’ve sent.
  • Segments/Cohorts: As most of the data for users is saved in event or user properties, you need to make sure that the data in there doesn’t change too often, as it might affect how you’ve set up your segments and cohorts.
  • Funnels and many other reports can also be impacted by the way you save data.

Overview of features

  • Dashboarding/Charts: The flexibility that Amplitude provides mostly shows in the way you work with charts and add them to dashboards. You can create dozens of different charts and add them to a dashboard. The dashboards will then, for example, give you the ability to change the date range; if you don’t like that, you can still make all the changes from the actual chart.
  • A/B Testing – Significance Calculator: Are you running an A/B test on your site and sending the data to Amplitude? Within certain charts you can segment out the results and immediately calculate whether they’re significant for what you’re analyzing. It saves time trying to find a significance calculator.

  • Custom Metrics: Just as in many other web analytics tools, Amplitude gives you the ability to create custom formulas within a chart to calculate other metrics.

  • Retroactive reporting: You added tracking months ago but only today figured out that an event should be measured as a goal? You can set up a goal funnel really easily with old data and have all of it available to you.
  • Realtime: The fact that all the events you send to Amplitude are processed in real time makes it very powerful. Within minutes of launching a new feature you can start analyzing the data to see what’s going on or whether it’s broken. Gone are the times when you needed to wait for hours for all the data you’re collecting to be fully available.
  • Unlimited event/user properties & ‘dimensions’: Every event can have properties related to the event. In addition, a user can have properties that can be used too. So if I want to mark certain users with an action, I can easily send that along with an event to update the records.
  • CLTV: Measuring the lifetime value of users will obviously require you to start identifying users (relatively easy to set up). But this will enable you to look into how your users are performing over time, whether you have high retention, and what that means for their customer lifetime value. This example report would show me the performance of a segment of users over the last 12 weeks and what they’re worth to the business.

Chart for CLTV

What’s missing?

Google integrations? Obviously some things are missing. While the Cohort feature’s abilities are very powerful and Amplitude provides some cool integrations with other software, it still can’t make the connection with the audience data from Google, which is obviously always going to be a big upside of the Google Analytics suite.

Transactions/Purchases: The way Amplitude tracks a conversion is a bit weird: you send all the products that you purchase as separate revenue events. There is no concept of a purchase, which seems strange. It’s also really hard to identify what the Xth purchase was; these are events that you need to set up yourself.

UTM/Traffic Source Reporting: It does exist, but it isn’t great and definitely not as powerful as you’re used to in your regular web analytics tools. Does it get the job done for product analytics? Yes, I’d say it does. If you’re looking to do anything more advanced with the data, you should build additional capabilities on your own end.

Use Cases

  • Funnels: Every event can be part of a funnel, which makes it very flexible and useful if you want to compare user behaviour; for example, connecting certain user actions to a purchase funnel.
  • Customer Lifetime Value/Retention:
  • Cohorts: Where you would have segments & audiences in Google Analytics, you have the ability to create cohorts of users to measure the impact of certain properties/events on their behaviour over time. For example, at Postmates we often used a cohort of users who came in with a sign-up referrer that includes google, yahoo or bing (an organic search user). We would use these cohorts either to export them for other marketing purposes (email/push campaigns) or to analyze their behaviour against other cohorts:
    • How do organic search users from the last month behave differently if they have used feature X?
    • How do users who have touched feature X find the other features?

Segmenting users with its Cohort feature.

Conclusion

Overall I’m pretty satisfied with Amplitude and how its flexibility helps you add/create events and figure out later what kind of dashboards/charts you’ll build on top of them. But it’s likely (for now) not going to replace most of the data that you’re used to in web analytics, as that would require a lot of additional setup and tracking. You can use it very effectively within organizations to track certain aspects and user behaviour; all in all, a great addition to most analytics teams. I would advise most companies to use these tools together, as they can be very useful in providing more insight into what the user is doing specifically.

If you’ve worked with Amplitude and want to share more about your experiences, leave a comment! Currently I’m working on exporting Amplitude data to Google BigQuery for ‘big data’ analysis; in a future post I hope to show you how you can set that up yourself.