Part 4: Visualization with Google DataStudio – Building a Marketing Data Lake and Data Warehouse on Google Cloud Platform
By Martijn Scheijbeler Published May 5, 2020In the previous blog posts (part 1, part 2, and part 3) in this series, we talked about why we decided to build a marketing data warehouse. This endeavor started by figuring out how to deal with the first part: building the data lake. In the fourth blog post, we’ll chat about how we are visualizing all the data we saved in previous steps by using Google DataStudio.
This blog post is part of a series of four? (maybe more, you never know), in which we’ll dive into the details of why we wanted to create a data warehouse, how we created the data lake, how we used the data lake to create a data warehouse. It is written with the help of @RickDronkers and @Hussain / MarketLytics who we’ve worked with alongside during this (ongoing) project.
How we build dashboards
Try to think ahead about what you need: date ranges, data/date comparisons, filters, what type of visualization. This will help you build a better first version right away as it gives you the ability to have a good version right away. What that mainly looked like for us:
- Date ranges: The business is so seasonal that our Year over Year growth is most important for RVshare, and since we often don’t get to see all the context on metrics on a weekly basis, we default to 30 days.
- Filters: For some channels (PPC, Social), it’s more relevant to be able to filter down the data on a campaign or social network level. Because in most cases the aggregate level doesn’t tell the whole story right away.
- Visualization: We need the top metrics: sessions and revenue in view right away with the comparison YoY right away so we know within seconds what is going on and how that can improve things.
Talking to Stakeholders (Part Deux)
In the first blog post, we talked about connecting with our stakeholders (mainly our channel owners) and gathering their feedback to build the first initial versions of their dashboarding (beginning with the end in mind). We used this approach to put the first charts, tables, and graphs on the dashboards after which we connected back again with the owners to see what data points were missing and in some cases to validate the data that they were seeing on their dashboards. This helped us get additional feedback for fast follows and made for quick iterations on data that we had and could also show. For social media, as an example, it turned out that we wanted to show additional metrics that we hadn’t thought of initially but were in our data lake anyway. These sessions provided a good way for us to build additional pieces into our data warehouse while we were at it. These days some of these dashboards are used weekly to report to other teams in the organization or used within the team itself.
Best Practices
Blended Data
Do you want to add this to Google DataStudio, or do you want to create synced/aggregate tables in BigQuery? For most of our use cases, we have opted for using DataStudio to create JOIN blended data sources. It’s easier – we have the ability to quickly pull some new data together versus having to deal with the data structures and complicated queries. In some use cases, we noticed that we were missing data in our warehouse tables (not the lake) and were able to make adjustments/improvements to them by creating dashboards.
Single Account Owner
Because we work with Rick and Hussain as ‘third parties’, we opted for using 1 shared owner account, transferring owner access is incredibly hard when it’s a Google Apps account so we made sure that the dashboards are owned through an @rvshare.com account, it’s not a big topic but could cause tons of headache in the long term.
Keep It Simple St*pid
Your stakeholders probably have the desire / and the time to look at less than you think. Instead of having them jump through too many charts start simple and then add based on feedback if they want to see more, less is more in this case.
This has added benefit of them feeling engaged and more interested in using it. In our own use case, we leverage our reporting on a weekly basis for a team meeting, which makes it already a more often leveraged us case.
Calculated Fields – Yay or Nay?
As we made most of the tables that we leverage in DataStudio from scratch during our ETL process, we had the opportunity to decide if we wanted to leverage calculated fields in BigQuery or if we do the work in the queries itself. Honestly, the answer wasn’t easy, and as we made modifications in the dashboards, it became clear that having them set up in DataStudio wasn’t always scalable and easy as with data modifications or changes in tables they are removed.
Google BigQuery
Tables or Queries? In our case, we often used the table information from BigQuery and the specific columns in there to drive the visualization in DataStudio. The alternative for some of them is that we directly query the data in BigQuery, with the BI Engine reservation that we have in there we can speed up intense queries rather easily.
Again… This blog post is written with the help of @RickDronkers and @Hussain / MarketLytics who we’ve worked with alongside during this (ongoing) project.