Duplicate content is (according to questions from new SEOs and people in online marketing) still one of the biggest issues in Search Engine Optimization. I’ve got news for you: it certainly isn’t, as there are plenty of other issues. But somehow it always comes up when talking about SEO. As I’ve been on both sides of the equation, having worked for comparison sites and a publisher, I want to reflect on both angles, and on why I think it’s really important to see both sides of the picture when looking into why sites have duplicate content and whether they do it on purpose or not.
When I started in SEO about 1211 years ago I worked for a company that listed courses from all around the globe on its website (Springest.com, let’s give them some credit), making it possible for people to compare them. By doing this we were able to create a really useful overview of training courses on, for example, the subject of SEO. One downside was that basically none of the content on our site was unique. Training courses often follow a very strict program and in certain cases are regulated by the government or institutions to provide the right qualification to attendees. That made it impossible to change any of the descriptions of contents, books or requirements, as they were provided by the institutions (read: copy-pasted).
I’ve also worked on the complete other side, at The Next Web, where I had the privilege of working with 10-15 full-time editors all around the globe who write unique, fresh (news) content on a daily basis. They were backed up by dozens of people willing to write for TNW, so we were presented with the opportunity to choose what kind of posts we published. It made some things easier, but even at TNW we ran into content issues. The tone of voice changes or loses value over time as editors come and go. And when you publish more content from guest authors, it’s hard to maintain the right balance.
These days I’m ‘back’ with duplicated content, working at Postmates, where we work on on-demand delivery. My previous experience makes it easier to deal with the duplicate content that we technically have from all of the restaurants (it’s published on their own sites and on some competitors’), and way easier to come up with many more ideas based on the (duplicate) content that you already have. It also made me realize that most of the time you’re working with something that is duplicate, whether it’s the product info you have in ecommerce or the industry that you operate in. It’s all about the way you slice and dice it to make it more unique.
In the end, search engine optimization is all about content, duplicated or not. We all want to make the best of it, and there is always a way to provide a unique angle. Although the angle of these businesses and the way of doing SEO for them are completely different, I think having worked with both gives you certain skills that provide a benefit over a lot of people.
Last year I blogged about using 855 properties to retrieve all your Search Analytics data. Luckily, just after that Google announced that the API limit of retrieving only the top 5000 results had been lifted. Since then it’s been possible to potentially pull all your keywords from Google Search Console via the API (hint: you’re not able to get all the data).
Since I started at Postmates, now well over two months ago, one of the biggest projects I started with was getting insights into which markets and product categories we’re already performing OK in from an SEO perspective. With over 150,000 unique keywords weekly (and working on increasing that), it’s hard to get a good grasp on what’s working and what isn’t, as we’re active in 50+ markets that influence the queries people are searching for. For example: show me all the queries over a longer period of time with only Mexican in the title across all markets. That’s impossible from the interface. So clicking through the Search Analytics feature in Google Search Console was nice for checking specific keywords quickly, but overall it wouldn’t help in getting detailed insights into what’s working and what’s not.
Some of the issues I was hoping to solve with this approach:
- Pull all your data on a daily basis so you can get an accurate picture of the number of clicks and how that changes over time for a query.
- Hopefully get some insights into the actual number of impressions. Google AdWords Keyword Tool data is still very valuable, but as it’s grouped it can be off on occasion. Google Search Console should be able to provide more accurate data on a specific keyword level.
- Use the data as a basis for further keyword research and categorization.
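Once the data is stored locally, a question like "how do clicks for queries containing Mexican change over time across all markets" becomes a simple filter and sum. Here is a minimal sketch in Python, assuming each stored row is a dict with `date`, `market`, `query` and `clicks` keys. That row shape and the sample rows are made up for illustration, not the actual setup.

```python
from collections import defaultdict


def clicks_per_day(rows, term):
    """Sum clicks per day for every stored query that contains `term`."""
    totals = defaultdict(int)
    needle = term.lower()
    for row in rows:
        if needle in row["query"].lower():
            totals[row["date"]] += row["clicks"]
    # ISO dates sort chronologically as plain strings.
    return dict(sorted(totals.items()))


# Hypothetical rows, as they might look after a few daily pulls.
stored_rows = [
    {"date": "2017-03-01", "market": "san francisco", "query": "mexican food delivery", "clicks": 12},
    {"date": "2017-03-01", "market": "new york", "query": "best mexican restaurant", "clicks": 7},
    {"date": "2017-03-02", "market": "new york", "query": "pizza delivery", "clicks": 9},
    {"date": "2017-03-02", "market": "los angeles", "query": "Mexican takeout", "clicks": 4},
]

print(clicks_per_day(stored_rows, "mexican"))
# {'2017-03-01': 19, '2017-03-02': 4}
```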
Having used the Google Search Console API a bit before, I was curious to see what I could accomplish by pulling in the data on a daily basis and making sense of it (and combining it with other data sets, maybe more on that in later blog posts).
- Pull in all the keywords daily, grouped by landing page, so you know for sure you get all the different keyword combinations and your data isn’t filtered by the API.
- Save the specific keyword if we haven’t saved it before, so we know when a keyword was a ‘first-hit’.
- For every keyword that is returned, do another call to the API to get the country, landing pages and metrics for that specific query.
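The three steps above could look roughly like this in Python. This is a sketch, not the code we run: the function names and the `ROW_LIMIT` value are my own assumptions, and only the request body fields (`startDate`, `endDate`, `dimensions`, `dimensionFilterGroups`, `rowLimit`, `startRow`) and the `keys` list in the response rows come from the Search Analytics API.

```python
from datetime import date

ROW_LIMIT = 5000  # assumed page size for this sketch; check the current API limit


def keywords_by_page_body(day, start_row=0):
    """Step 1: request body for all queries of one day, grouped by landing page."""
    return {
        "startDate": day.isoformat(),
        "endDate": day.isoformat(),
        "dimensions": ["page", "query"],
        "rowLimit": ROW_LIMIT,
        "startRow": start_row,  # raise this to page through larger result sets
    }


def record_first_hits(rows, first_seen, day):
    """Step 2: store the first day each keyword showed up, return the new ones."""
    new_keywords = []
    for row in rows:
        keyword = row["keys"][1]  # keys follow the order of `dimensions`
        if keyword not in first_seen:
            first_seen[keyword] = day
            new_keywords.append(keyword)
    return new_keywords


def keyword_detail_body(day, keyword):
    """Step 3: request body for country and landing page metrics of one query."""
    return {
        "startDate": day.isoformat(),
        "endDate": day.isoformat(),
        "dimensions": ["country", "page"],
        "dimensionFilterGroups": [
            {"filters": [
                {"dimension": "query", "operator": "equals", "expression": keyword}
            ]}
        ],
        "rowLimit": ROW_LIMIT,
    }
```

With google-api-python-client, each body would be sent with `service.searchanalytics().query(siteUrl=site_url, body=body).execute()`, where `service` and `site_url` are whatever you set up for your own property.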
In our case we categorize the keywords right after we pull them in, to see whether they match a certain market or product category. So far this has been really useful for us, as it makes for much better dashboarding.
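A simple way to do this kind of categorization is plain term matching. The sketch below assumes a hand-maintained list of markets and category terms. The lists and the `categorize` function are hypothetical examples, not our real taxonomy.

```python
import re

# Hypothetical examples; the real market and category lists are not shown here.
MARKETS = ["san francisco", "new york", "los angeles"]
CATEGORIES = {
    "mexican": ["mexican", "burrito", "taco"],
    "pizza": ["pizza"],
}


def categorize(keyword):
    """Return the first matching market and product category for a keyword."""
    text = keyword.lower()
    market = next((m for m in MARKETS if m in text), None)
    category = next(
        (
            name
            for name, terms in CATEGORIES.items()
            # Match whole words, allowing a simple plural "s".
            if any(re.search(rf"\b{re.escape(term)}s?\b", text) for term in terms)
        ),
        None,
    )
    return {"market": market, "category": category}


print(categorize("Mexican food delivery San Francisco"))
# {'market': 'san francisco', 'category': 'mexican'}
```

Keywords that match nothing come back with `None` for both fields, which makes it easy to see what the lists are still missing.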
Some of the things that I ran into while building this out:
What to look out for?
- The API heavily limits the keywords you get to see that only have impressions. I was able to retrieve some of the data, but on a daily basis the impression statistics are off by 50% from what I’m seeing in Google Search Console. Clicks, however, seem to differ only slightly. Win!
- Apparently they’re hiding some of the keywords as they qualify them as highly personal. So you’ll miss a certain percentage because of that.
- The rate limits of the Google Search Console API aren’t very nice: for over 5k keywords it takes quite long to pull in all the data.
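One common way to live with rate limits is to retry a failed call with an increasing delay. This is a generic sketch of exponential backoff, not something specific to the Search Console API: `call_with_backoff` and its parameters are my own names, and which exception signals a rate limit depends on the client library you use.

```python
import time


def call_with_backoff(fetch, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` and retry with a doubling delay when it raises `retry_on`."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Each per-keyword call then goes through `call_with_backoff(lambda: ...)`, so a temporary limit slows the pull down instead of breaking it.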
Most of these items aren’t really an issue for us, as we have better sources for volume data anyway. In the future we’re hoping to gather more data from different sources to extend that. I’m hoping to blog about that sometime in the future.