Over the last year, we spent a lot of time working on getting data from several marketing channels into our marketing data warehouse. The series that we did on this with the team has received lots of love from the community (thanks for that!). Retrieving Search Query data from Bing has proven to be one of the ‘harder’ data points: there is a lack of documentation, there a no real connectors directly to a data warehouse, and as it turns out the returned data (quality) is … ‘interesting’ to say the least. That’s why I wanted to write this blog post, to provide the code to easily pull out your search query data from Bing Webmaster Tools and give more people to evaluate their data. Hopefully, this provides the overall community with a better insight into the data quality coming out of the API.
Getting Started
- Create an account on Bing Webmaster Tools.
- Add & Verify a site.
- Create an API Key within the interface (help guide).
- Save the API Key and the formatted site URL.
The code
These days I spent most of my time (whenever I get to write code) coding in Python, that’s why these.
import datetime
import requests
import csv
import json
import re
URL = "https://example.com"
API_KEY = ''
request_url = "https://ssl.bing.com/webmaster/api.svc/json/GetQueryStats?apikey={}&siteUrl={}".format(API_KEY, URL)
request = requests.get(request_url)
if request.status_code == 200:
query_data = json.loads(request.text)
with open("bing_query_stats_{}.csv".format(datetime.date.today()), mode='w') as new_file:
write_row = csv.writer(new_file, delimiter=',', quotechar='"')
write_row.writerow(['AvgClickPosition', 'AvgImpressionPosition', 'Clicks', 'Impressions', 'Query', 'Created', 'Date'])
for key in query_data["d"]:
# Get date
match = re.search('/Date\\((.*)\\)/', key["Date"])
write_row.writerow([key["AvgClickPosition"] / 10,
key["AvgImpressionPosition"] / 10,
key["Clicks"],
key["Impressions"],
key["Query"],
datetime.datetime.now(),
datetime.datetime.fromtimestamp(int(match.group(1)) // 1000)])
Or find the same code here in a Gist file on Github.
Steps to take
- Make sure you have all the needed dependencies installed: json, re, requests, csv.
pip install requests json re csv
- Run the script: python bing_query_stats.py and enter the API Key and Site URL in the constants at the top of the script.
- If everything is successful the information is saved in this file: bing_query_stats_YYYY-MM-DD.csv
Data Quality
As I mentioned in the intro, the data quality is questionable and leaves very much up to the imagination. It’s one of the reasons why I wanted to share this script, so others can get their data out and we can hopefully learn more together on what the data represents. The big caveat seems that the data is exported at the time of extraction with a date range of XX days and it’s not possible to select a date range. This means that you can only make this data useful if you save it over a longer period of time and based on that calculate daily performance. This is all doable in the setup we have where we’re using Airflow to save the data into our Google BigQuery data lake, but because it isn’t as straight forward this might be harder for others.
So please share your ideas on the data and what you ran into with me via @MartijnSch