Dogfooding at RudderStack: Tracking Plans Part 1
What are Tracking Plans, and why do you need one?
At RudderStack we talk a lot about the importance of owning your own data and the competitive advantage that can come from building robust analytics with complete data. Data trust is fundamental to this construct. In order to trust the data, you must trust the tools that are providing that data. That’s why we built our Data Governance API and the new Tracking Plans feature we are getting ready to release in beta.
RudderStack Tracking Plans are the latest offering within our Data Governance API and have been one of our most requested features to date. Unlike RudderStack Transformations, which allow you to transform your data in flight, Tracking Plans allow you to plan and prescribe what your data should look like in the first place. Tracking Plans address three fundamental issues related to streaming data:
- Missing or improperly configured data breaks downstream SaaS applications and data warehouses. This causes problems like poorly executed automated campaigns and broken dashboards.
- Poorly named and duplicative events and properties. This creates confusion and mismapping in downstream tools and data warehouses.
- Upstream data providers make changes resulting in altered event streams. This leads to the first two issues above, with little to no advance notice or ability to fix them.
How Tracking Plans solve these issues
RudderStack Tracking Plans allow you to define the specific event names and properties for each of your track, group and identify calls. In addition, you can assign the type of data associated with each property or attribute and specify whether that property or attribute is required. The Tracking Plans API also supports versioning for better control of your data streaming.
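To make the idea concrete, here is a minimal sketch of what a plan captures: event names, property names, property types, required flags, and a version. The class names (`PropertyRule`, `EventRule`, `TrackingPlan`) and the sample values are hypothetical and only illustrate the concept; this is not the payload format of the Tracking Plans API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical in-memory model of a tracking plan. It illustrates the
# concept only; it is NOT the RudderStack Tracking Plans API format.


@dataclass(frozen=True)
class PropertyRule:
    name: str
    type: str               # e.g. "string", "number", "boolean"
    required: bool = False


@dataclass(frozen=True)
class EventRule:
    call: str               # "track", "identify", or "group"
    name: str               # event name, e.g. "form_submit"
    properties: Tuple[PropertyRule, ...] = ()


@dataclass
class TrackingPlan:
    name: str
    version: int
    events: List[EventRule] = field(default_factory=list)

    def required_properties(self, event_name: str) -> List[str]:
        """Return the names of the required properties for one event."""
        for event in self.events:
            if event.name == event_name:
                return [p.name for p in event.properties if p.required]
        raise KeyError(f"event not in plan: {event_name}")


marketing_site = TrackingPlan(
    name="Marketing Site",
    version=1,
    events=[
        EventRule(
            call="track",
            name="form_submit",
            properties=(
                PropertyRule("form_id", "string", required=True),
                PropertyRule("page_title", "string", required=True),
                PropertyRule("utm_source", "string"),
            ),
        )
    ],
)

print(marketing_site.required_properties("form_submit"))
```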
With your Tracking Plans in place, you can use the existing Data Governance APIs to evaluate your inbound events, payload samples and metadata and compare them against your plans. You can also use the RudderTyper tool we’re releasing alongside Tracking Plans. RudderTyper is a tool for generating strongly-typed RudderStack analytics library wrappers based on your published tracking plan specs, meaning your data will conform to your defined schema upon capture.
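The value of a strongly-typed wrapper is easier to see in code. The sketch below is written by hand and is not RudderTyper output; `FormSubmit`, `form_submit`, and the `send` callable are hypothetical names, and `send` stands in for whatever track call your analytics client exposes.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Optional

# Hand-written illustration of a strongly typed wrapper. This is NOT
# generated by RudderTyper; every name here is hypothetical.


@dataclass(frozen=True)
class FormSubmit:
    form_id: str                      # required by the plan
    page_title: str                   # required by the plan
    utm_source: Optional[str] = None  # optional in the plan


def form_submit(
    send: Callable[[str, Dict[str, Any]], None], props: FormSubmit
) -> None:
    """Send a form_submit event after checking property types."""
    values = asdict(props)
    for name, value in values.items():
        if value is not None and not isinstance(value, str):
            raise TypeError(
                f"{name} must be a string, got {type(value).__name__}"
            )
    # Drop optional properties that were not supplied.
    payload = {k: v for k, v in values.items() if v is not None}
    send("form_submit", payload)


# Stand-in for an analytics client's track call, so the sketch runs offline.
sent = []
form_submit(
    lambda event, properties: sent.append((event, properties)),
    FormSubmit(form_id="newsletter", page_title="Pricing"),
)
print(sent)
```

Because the required properties are constructor arguments, leaving one out fails when the event object is built rather than later in a downstream tool.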
What does the future hold?
Well, that’s where we need your help. We are currently working with a few Alpha customers and using Tracking Plans ourselves on our own production instance of RudderStack. What we have on the roadmap are decisions about what types of errors or schema violations we want to track and then how to handle them. Although not set in stone, here is a sneak peek into what we’re thinking so far:
Violation Type | Description | Action Taken |
---|---|---|
Unplanned Events | An event for which no schema has been defined. | |
Unplanned Properties | The event is defined but the property or attribute does not exist in the schema. | |
Mismatched Data Type | Data type for a particular property does not match what is defined in the schema. | |
Required Field Missing | The payload is missing a property set as required in the schema. | |
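As a rough sketch of how the four violation types in the table could be detected, the function below checks one payload against a plan. The `PLAN` dictionary shape and the `find_violations` helper are assumptions made for illustration; they do not describe how RudderStack evaluates events internally.

```python
from typing import Any, Dict, List

# Hypothetical plan shape: event name -> property name -> (type, required).
PLAN = {
    "form_submit": {
        "form_id": (str, True),
        "page_title": (str, True),
        "utm_source": (str, False),
    }
}


def find_violations(event: str, properties: Dict[str, Any]) -> List[str]:
    """Return the violation types found in one track payload."""
    schema = PLAN.get(event)
    if schema is None:
        # No schema has been defined for this event at all.
        return ["Unplanned Events"]

    violations = []
    for name, value in properties.items():
        if name not in schema:
            violations.append(f"Unplanned Properties: {name}")
        elif not isinstance(value, schema[name][0]):
            violations.append(f"Mismatched Data Type: {name}")

    for name, (_, required) in schema.items():
        if required and name not in properties:
            violations.append(f"Required Field Missing: {name}")
    return violations


print(find_violations("form_submit", {"form_id": 42, "color": "blue"}))
```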
Once this feature is fully built, the actions taken on each of these violations could include one or more of the following:
- Rejecting the entire payload
- Accepting the entire payload and sending it to downstream destinations with a warning flag
- Rerouting the entire payload to an S3 bucket (aka “dead letter queue”)
- Removing the additional properties from the payload that are not defined in the schema
- Inserting default values for required fields missing from the payload
- More advanced options based on schema comparisons outlined here: https://ajv.js.org/options.html
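The options above can be sketched as a single dispatch function. Everything here is hypothetical: the action names, the `SCHEMA` shape, the `_violation` flag, and the in-memory list that stands in for an S3 bucket. None of it reflects a decided RudderStack behavior.

```python
from typing import Any, Dict, List, Optional

# Hypothetical schema for one event: property -> (required, default value).
SCHEMA = {
    "form_id": (True, "unknown"),
    "page_title": (True, ""),
    "utm_source": (False, None),
}

# Stand-in for an S3 "dead letter queue" so the sketch runs offline.
dead_letter_queue: List[Dict[str, Any]] = []


def apply_action(
    action: str, payload: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Apply one handling strategy; return the payload to forward, or None."""
    if action == "reject":
        return None
    if action == "flag":
        # Forward the payload unchanged apart from a warning marker.
        return {**payload, "_violation": True}
    if action == "dead_letter":
        dead_letter_queue.append(payload)
        return None
    if action == "strip_unplanned":
        return {k: v for k, v in payload.items() if k in SCHEMA}
    if action == "fill_defaults":
        defaults = {
            k: default
            for k, (required, default) in SCHEMA.items()
            if required and k not in payload
        }
        return {**payload, **defaults}
    raise ValueError(f"unknown action: {action}")


print(apply_action("strip_unplanned", {"form_id": "a", "color": "blue"}))
print(apply_action("fill_defaults", {"form_id": "a"}))
```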
So, where do you start?
Connect with us on Slack or shoot us an email if you are interested in participating.
In the meantime, let’s take a look at how your typical SaaS business would walk through the steps of designing and implementing Tracking Plans with RudderStack. As a part of the RudderStack Data Governance API, Tracking Plans are first and foremost managed through code, but we understand that designing the plan will be a collaborative effort involving developers and non-developers, so we designed a Tracking Plans Template Google Sheet to help get teams started.
The first step is to get your hands on a copy of the RudderStack Tracking Plans Template which will be available soon. This will help you and your team organize the various events and fields you want to capture from each of your RudderStack sources. The sheet does require that you have a user access token for your account. For help on how to create a user token, check out our Access Token user documentation.
The next step is to create a wish list of events and properties you think you might need. The goal of this first pass is not to create the be-all and end-all list, but primarily to see where data needs intersect among the various stakeholders and to begin building out the data architecture for your company. During this exercise, it can be helpful to start with existing higher-level paradigms like the sales and marketing funnel or executive summary reports, as the underlying metrics for these are generally already agreed upon. Starting with what you already know you need to measure is a great way to begin drilling into how you measure it and, more specifically, where the data comes from in the first place and what properties or attributes will be measured (i.e., required keys and data types).
For example, let’s take a sample SaaS business that has a funnel measuring the following:
Stage | Team |
---|---|
Unique Site Visitors | Marketing - Paid Digital |
Leads | Marketing - Engagement |
MQLs | Marketing - Engagement |
Opportunities / Free Trials | Sales - Outbound |
Product activation / POC | Sales - Sales Engineering |
Customers | Sales - Coffee Drinkers |
Product usage | Customer Success |
Now that we have each stage defined, let’s dive deeper into exactly which data elements need to be created and tracked to reproduce our funnel, and assign a source for each. Note that in some cases, such as defining a Marketing Qualified Lead (MQL), multiple sources of information may contribute to qualifying a particular lead. In this table we are defining which system retains that information, so that if we ever need to perform an audit, Salesforce (in this example) is the system where we would confirm whether a particular lead was flagged as an MQL. As we define each metric, we will assign it to a tracking plan in our Google Sheet.
Funnel Step | Source | Metric | Tracking Plan |
---|---|---|---|
Visitor | Marketing Website & App | Count of Distinct Anonymous ID | Page View (Marketing); Page View (Application) |
Lead | Marketing Website & App | Count of Distinct Email Addresses per domain | Form Submit (Marketing); App Signup (Application) |
MQL | Salesforce | Count of Salesforce Leads (not deleted) with MQL checked | N/A (SFDC ETL) |
Opportunity / Free Trial | Salesforce | Count of Opportunities where Opp Type = Initial | N/A (SFDC ETL) |
Product Activation | App | Has the User Created a Connection | Connections Created |
Customer | Salesforce | Opportunity = Closed Won | Opportunity Won |
Product Usage | App | Total Event Volume | N/A (aggregated from warehouse tables) |
Some of our metrics will come from RudderStack ETL sources or other non-RudderStack tables in our data warehouse and therefore will not be defined in our Tracking Plan for event data.
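For the metrics that do come from event data, the first two rows of the funnel map reduce to simple distinct counts. The sketch below computes them over a handful of made-up records; the record shape and field names (`anonymous_id`, `email`) are assumptions for illustration, not a description of RudderStack's warehouse schema.

```python
from collections import defaultdict

# Made-up sample records, shaped loosely like warehouse event rows.
events = [
    {"event": "page_view", "anonymous_id": "a1"},
    {"event": "page_view", "anonymous_id": "a1"},
    {"event": "page_view", "anonymous_id": "a2"},
    {"event": "form_submit", "anonymous_id": "a1", "email": "ann@example.com"},
    {"event": "app_signup", "anonymous_id": "a2", "email": "bob@example.com"},
    {"event": "form_submit", "anonymous_id": "a3", "email": "Ann@example.com"},
]

# Visitor: count of distinct anonymous IDs across page views.
visitors = len(
    {e["anonymous_id"] for e in events if e["event"] == "page_view"}
)

# Lead: count of distinct email addresses, grouped by email domain.
emails_by_domain = defaultdict(set)
for e in events:
    if e["event"] in ("form_submit", "app_signup"):
        email = e["email"].lower()  # treat addresses case-insensitively
        emails_by_domain[email.split("@")[1]].add(email)
leads = {domain: len(addrs) for domain, addrs in emails_by_domain.items()}

print(visitors, leads)
```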
Building out Tracking Plans
In the funnel map above we defined six different events and three different tracking plans that we want to build. This by no means defines the totality of your tracking plans but will be enough to get you started using the tools.
RudderStack Source (Tracking Plan) | User Action Name | RudderStack Event Name |
---|---|---|
Marketing Site | Page View | page_view |
Marketing Site | Form Submit | form_submit |
Application | Page View | page_view |
Application | App Signup | app_signup |
Application | Connection Created | connection_created |
Salesforce Webhook* | Opportunity Won | opp_won |
*Typically, Salesforce and other SaaS tools will have data extracted using RudderStack ETL every 24 hours. However, critical events like marking an Opportunity as won are important enough to trigger a real-time event passed back through a Webhook source.
With the sources and events defined, we now need to identify the properties and property types for each event. These should now be added to the Tracking Plans Google Sheet. Each Source should have its own tab copied from the “Import Template”. The tab below is a copy of the Marketing Site tab we created.
Event Name | Description | Property name | Property type | Property description | Req'd |
---|---|---|---|---|---|
page_view | User visits a page | link_source | string | Value of UTM parameter defined as ?link_source={value} | O |
form_submit | User submits a form | page_title | string | Title of the page | R |
- | - | page_URL | string | URL of the page | R |
- | - | form_id | string | The ID of the form (configured in Sanity) | R |
- | - | label | string | Label for Google Analytics events (if needed) | O |
- | - | category | string | Category for Google Analytics events (if needed) | O |
- | - | utm_source | string | Optional utm parameters | O |
- | - | utm_medium | string | Optional utm parameters | O |
- | - | utm_campaign | string | Optional utm parameters | O |
- | - | utm_content | string | Optional utm parameters | O |
- | - | utm_term | string | Optional utm parameters | O |
- | - | raid | string | Optional utm parameters | O |
- | - | search_text | string | The text the user typed into the search field | R |
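One way to picture what happens to a tab like this on upload is to convert its rows into one schema per event. The sketch below targets JSON Schema, which is an assumption suggested by the ajv link earlier in this post; the actual upload format has not been described here, and `rows_to_schemas` is a hypothetical helper. Only a subset of the rows is included, with the description columns omitted.

```python
import json

# Subset of the sheet rows above: (event, property, type, required flag).
# "-" in the first column means "same event as the previous row";
# "R" marks a required property and "O" an optional one.
ROWS = [
    ("page_view", "link_source", "string", "O"),
    ("form_submit", "page_title", "string", "R"),
    ("-", "page_URL", "string", "R"),
    ("-", "form_id", "string", "R"),
    ("-", "utm_source", "string", "O"),
]


def rows_to_schemas(rows):
    """Group sheet rows by event and build one JSON Schema per event."""
    schemas = {}
    current = None
    for event, prop, prop_type, req in rows:
        if event != "-":
            current = event
        if current is None:
            raise ValueError("the first row must name an event")
        schema = schemas.setdefault(
            current, {"type": "object", "properties": {}, "required": []}
        )
        schema["properties"][prop] = {"type": prop_type}
        if req == "R":
            schema["required"].append(prop)
    return schemas


print(json.dumps(rows_to_schemas(ROWS)["form_submit"], indent=2))
```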
With the basics of our Marketing Site source plan created, we can now upload it to RudderStack by configuring additional settings in the Google Sheet (more on this when we release the feature).
One exciting part of the Tracking Plans Google Sheet is that you can download the latest version of a tracking plan from the RudderStack Tracking Plan API, then upload any changes you make, ensuring everyone working on the plan has the most recent set of changes.
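Before uploading, it can help to see exactly what changed relative to the version you downloaded. The following is a minimal sketch of such a comparison; `diff_plans` and the `{event: {property: type}}` plan shape are hypothetical and are not features of the Google Sheet or the API.

```python
def diff_plans(remote, local):
    """Compare two plans shaped as {event: {property: type}}."""
    changes = []
    for event in sorted(set(remote) | set(local)):
        old = remote.get(event, {})
        new = local.get(event, {})
        for prop in sorted(set(old) | set(new)):
            if prop not in old:
                changes.append(("added", event, prop))
            elif prop not in new:
                changes.append(("removed", event, prop))
            elif old[prop] != new[prop]:
                changes.append(("retyped", event, prop))
    return changes


# The plan as last downloaded, and the same plan after local edits.
remote = {"form_submit": {"form_id": "string", "label": "string"}}
local = {
    "form_submit": {"form_id": "string", "search_text": "string"},
    "page_view": {"link_source": "string"},
}

for change in diff_plans(remote, local):
    print(change)
```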
Once a Tracking Plan has been uploaded to the API via the Google Sheet, you are ready to begin using RudderTyper. Download instructions and tutorials will be made available to beta participants.
Tracking Plans are only one piece of the puzzle
As useful as RudderStack Tracking Plans will be (and already are for our team and beta users), it should also be noted that there will always be scenarios where you still need to transform the data once it arrives from the source, either for enrichment, filtering or massaging based on the needs of the various downstream destination tools. Tracking Plans and Transformations go hand-in-hand to ensure a stable and trustworthy data feed.
There may also be times when you aren’t sure what to do with particular variations of events streamed from your sources. In these cases, sending them to a backup bucket such as Amazon S3 or Google Cloud Storage is an elegant solution. Check out our documentation for more information on how to leverage a variety of Cloud Storage Platforms.
Beta registration
As we continue our mission of giving developers full control over their data and their tools, we recognize and appreciate the commitments our customers have made to help improve the product and we thank you. If you would like more information on how to get signed up, please contact katie@rudderstack.com or hit us up on Slack.