Dogfooding at RudderStack: Tracking Plans Part 1

What are Tracking Plans, and why do you need one?

At RudderStack we talk a lot about the importance of owning your own data and the competitive advantage that can come from building robust analytics with complete data. Data trust is fundamental to this construct. To trust the data, you must trust the tools that provide that data. That’s why we built our Data Governance API and the new Tracking Plans feature we are getting ready to release in beta.

RudderStack Tracking Plans are the latest offering within our Data Governance API and have been one of our most requested features to date. Unlike RudderStack Transformations, which allow you to transform your data in flight, Tracking Plans allow you to plan and prescribe what your data should look like in the first place. Tracking Plans address three fundamental issues related to streaming data:

1. Missing or improperly configured data breaks downstream SaaS applications and data warehouses. This causes problems like poorly executed automated campaigns and broken dashboards.
2. Poorly named and duplicative events and properties create confusion and mismapping in downstream tools and data warehouses.
3. Upstream data providers make changes that alter event streams. This leads to issues 1 and 2 above, with little to no advance notice or ability to fix them.

How Tracking Plans solve these issues

RudderStack Tracking Plans allow you to define the specific event names and properties for each of your track, group and identify calls. In addition, you can assign the type of data associated with each property or attribute and specify whether that property or attribute is required. The Tracking Plans API also supports versioning for better control of your data streaming.
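To make the idea concrete, here is a minimal sketch of the information a plan carries: event names, the call type, and a type and required flag for each property, plus a version number. The class and field names (`PropertyRule`, `EventRule`, `TrackingPlan`) are hypothetical and chosen for illustration only; they are not the Tracking Plans API format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PropertyRule:
    """One property or attribute in the plan (hypothetical structure)."""
    name: str
    data_type: str          # e.g. "string", "number", "boolean"
    required: bool = False


@dataclass(frozen=True)
class EventRule:
    """One planned event and the properties it may carry."""
    call_type: str          # "track", "group" or "identify"
    name: str
    properties: Tuple[PropertyRule, ...] = ()


@dataclass
class TrackingPlan:
    """A named, versioned collection of event rules."""
    name: str
    version: int
    events: Dict[str, EventRule] = field(default_factory=dict)

    def add(self, rule: EventRule) -> None:
        self.events[rule.name] = rule

    def required_properties(self, event_name: str) -> List[str]:
        return [p.name for p in self.events[event_name].properties if p.required]


# Illustrative plan for a marketing site source.
plan = TrackingPlan(name="Marketing Site", version=1)
plan.add(EventRule("track", "form_submit", (
    PropertyRule("page_title", "string", required=True),
    PropertyRule("form_id", "string", required=True),
    PropertyRule("label", "string"),
)))
```

Calling `plan.required_properties("form_submit")` on this sample returns the two properties marked as required.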

With your Tracking Plans in place, you can use the existing Data Governance APIs to evaluate your inbound events, payload samples and metadata to compare them against your plans. You can also use the RudderTyper tool we’re releasing alongside Tracking Plans. RudderTyper is a tool for generating strongly-typed RudderStack analytics library wrappers based on your published tracking plan specs, meaning your data will conform to your defined schema upon capture.
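The following hand-written sketch illustrates what a strongly-typed wrapper means in practice. It is not RudderTyper output, and the `track` callable is a hypothetical stand-in for an analytics client's track method. The point is only that required properties become required arguments, so an incomplete event fails when it is constructed rather than downstream.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class FormSubmit:
    """Typed payload for a form_submit event (illustrative fields only)."""
    page_title: str                 # required
    page_URL: str                   # required
    form_id: str                    # required
    label: Optional[str] = None     # optional


def form_submit(track: Callable[[str, dict], None], props: FormSubmit) -> None:
    """Send a form_submit event, omitting optional properties left unset."""
    payload = {k: v for k, v in asdict(props).items() if v is not None}
    track("form_submit", payload)


# Usage with a fake client that just records what it was asked to send.
sent = []
form_submit(
    lambda name, payload: sent.append((name, payload)),
    FormSubmit(page_title="Home", page_URL="https://example.com", form_id="f1"),
)
```

Note that Python dataclasses do not check value types at runtime; a static type checker would be needed for that part. Leaving out a required field, however, raises a `TypeError` as soon as `FormSubmit` is constructed.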

What does the future hold?

Well, that’s where we need your help. We are currently working with a few Alpha customers and using Tracking Plans ourselves on our own production instance of RudderStack. What we have on the roadmap are decisions about what types of errors or schema violations we want to track and then how to handle them. Although not set in stone, here is a sneak peek into what we’re thinking so far:

Violation Type	Description	Action Taken
Unplanned Events	An event for which no schema has been defined.
Unplanned Properties	The event is defined but the property or attribute does not exist in the schema
Mismatched Data Type	Data type for a particular property does not match what is defined in the schema
Required Field Missing	The payload is missing a property set as required in the schema
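As a rough illustration of how the four violation types in the table could be detected, here is a minimal sketch. The schema layout and the `find_violations` function are assumptions made for this example and do not reflect how RudderStack implements the checks.

```python
# Hypothetical schema: event name -> property name -> rule.
SCHEMAS = {
    "form_submit": {
        "page_title": {"type": str, "required": True},
        "form_id": {"type": str, "required": True},
        "label": {"type": str, "required": False},
    }
}


def find_violations(event_name, properties, schemas):
    """Return a list of (violation type, offending name) tuples."""
    schema = schemas.get(event_name)
    if schema is None:
        # No schema has been defined for this event at all.
        return [("Unplanned Events", event_name)]

    violations = []
    for key, value in properties.items():
        rule = schema.get(key)
        if rule is None:
            violations.append(("Unplanned Properties", key))
        elif not isinstance(value, rule["type"]):
            violations.append(("Mismatched Data Type", key))

    for key, rule in schema.items():
        if rule["required"] and key not in properties:
            violations.append(("Required Field Missing", key))
    return violations
```

For example, `find_violations("form_submit", {"page_title": 5, "color": "red"}, SCHEMAS)` reports a mismatched type for `page_title`, an unplanned property `color`, and a missing required `form_id`.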

Once this feature is fully built, the actions taken on each of these violations could include one or more of the following:

Rejecting the entire payload
Accepting the entire payload and sending it to downstream destinations with a warning flag
Rerouting the entire payload to an S3 bucket (aka “dead letter queue”)
Removing the additional properties from the payload that are not defined in the schema
Inserting default values for required fields that are missing from the payload
More advanced options based on schema comparisons outlined here: https://ajv.js.org/options.html
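The sketch below shows how several of these actions might be combined. Everything in it is hypothetical: the schema layout, the function names, and the in-memory `dead_letters` list that stands in for an S3 bucket. The property stripping and default insertion loosely mimic what Ajv's `removeAdditional` and `useDefaults` options do, but this is not Ajv and not RudderStack's implementation.

```python
# Hypothetical schema: property name -> rule, with an optional default value.
SCHEMA = {
    "page_title": {"required": True},
    "form_id": {"required": True, "default": "unknown"},
    "label": {"required": False},
}


def remove_unplanned(properties, schema):
    """Drop properties that are not defined in the schema."""
    return {k: v for k, v in properties.items() if k in schema}


def insert_defaults(properties, schema):
    """Fill in required properties that are absent and have a default."""
    filled = dict(properties)
    for key, rule in schema.items():
        if rule.get("required") and key not in filled and "default" in rule:
            filled[key] = rule["default"]
    return filled


def missing_required(properties, schema):
    return [k for k, r in schema.items() if r.get("required") and k not in properties]


def handle(properties, schema, dead_letters):
    """Clean the payload; reroute it if it is still invalid, else forward it."""
    cleaned = insert_defaults(remove_unplanned(properties, schema), schema)
    missing = missing_required(cleaned, schema)
    if missing:
        # Stand-in for rerouting the original payload to a dead letter queue.
        dead_letters.append({"properties": properties, "missing": missing})
        return None
    # Forward with a warning flag if anything had to be stripped.
    warning = bool(set(properties) - set(schema))
    return {"properties": cleaned, "warning": warning}
```

Which actions to chain, and in what order, is a policy decision; this ordering is just one plausible choice.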

So, where do you start?

Connect with us on Slack or shoot us an email if you are interested in participating.

In the meantime, let’s take a look at how your typical SaaS business would walk through the steps of designing and implementing Tracking Plans with RudderStack. As a part of the RudderStack Data Governance API, Tracking Plans are first and foremost managed through code, but we understand that designing the plan will be a collaborative effort involving developers and non-developers, so we designed a Tracking Plans Template Google Sheet to help get teams started.

The first step is to get your hands on a copy of the RudderStack Tracking Plans Template, which will be available soon. This will help you and your team organize the various events and fields you want to capture from each of your RudderStack sources. The sheet does require that you have a user access token for your account. For help on how to create a user token, check out our Access Token user documentation.

The next step is to create a wish list of events and properties you think you might need. The goal of this first pass is not to create the be-all and end-all list, but primarily to see where data needs intersect among the various stakeholders and to begin building out the data architecture for your company. During this exercise, it can be helpful to start with existing higher-level paradigms like the sales and marketing funnel or executive summary reports, as the underlying metrics for these are generally already agreed upon. Starting with what you already know you need to measure is a great way to begin drilling into how you measure it and, more specifically, where the data comes from in the first place and what properties or attributes will be measured (i.e., required keys and data types).

For example, let’s take a sample SaaS business that has a funnel measuring the following:

Stage	Team
Unique Site Visitors	Marketing - Paid Digital
Leads	Marketing - Engagement
MQLs	Marketing - Engagement
Opportunities / Free Trials	Sales - Outbound
Product activation / POC	Sales - Sales Engineering
Customers	Sales - Coffee Drinkers
Product usage	Customer Success

Now that we have each stage defined, let’s dive deeper into exactly what data elements will need to be created and tracked to reproduce our funnel and assign a source for the data. It is important to note that in some cases, such as defining a Marketing Qualified Lead (MQL), there may be multiple sources of information that contribute to qualifying any one particular lead. In this table we are defining which system retains that information so that, should we ever need to perform an audit, Salesforce (in this example) is the system where we would confirm whether this particular lead was flagged as an MQL or not. As we define each metric, we will assign it to a tracking plan on our Google Sheet.

Funnel Step	Source	Metric	Tracking Plan
Visitor	Marketing Website & App	Count of Distinct Anonymous ID	Page View (Marketing) Page View (Application)
Lead	Marketing Website & App	Count of Distinct Email Addresses per domain	Form Submit (Marketing) App Signup (Application)
MQL	Salesforce	Count of Salesforce Leads (not deleted) with MQL checked	N/A (SFDC ETL)
Opportunity / Free Trial	Salesforce	Count of Opportunities where Opp Type = Initial	N/A (SFDC ETL)
Product Activation	App	Has the User Created a Connection	Connection Created
Customer	Salesforce	Opportunity = Closed Won	Opportunity Won
Product Usage	App	Total Event Volume	N/A (aggregated from warehouse tables)

Some of our metrics will come from RudderStack ETL sources or other non-RudderStack tables in our data warehouse and therefore will not be defined in our Tracking Plan for event data.
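For the metrics that do come from event data, the first two rows of the funnel map can be sketched as simple aggregations. The sample events below are made up, and the field names (`anonymousId`, `email`) and the helper functions are assumptions for illustration; in practice these counts would more likely be computed with queries against warehouse tables.

```python
from collections import defaultdict

# Made-up sample events for illustration only.
events = [
    {"event": "page_view", "anonymousId": "a1"},
    {"event": "page_view", "anonymousId": "a1"},
    {"event": "page_view", "anonymousId": "a2"},
    {"event": "form_submit", "anonymousId": "a1", "email": "ann@example.com"},
    {"event": "app_signup", "anonymousId": "a2", "email": "bob@example.com"},
    {"event": "form_submit", "anonymousId": "a3", "email": "cy@example.org"},
]


def count_visitors(events):
    """Visitor metric: count of distinct anonymous IDs across page views."""
    return len({e["anonymousId"] for e in events if e["event"] == "page_view"})


def leads_per_domain(events):
    """Lead metric: count of distinct email addresses, grouped by domain."""
    emails_by_domain = defaultdict(set)
    for e in events:
        if e["event"] in {"form_submit", "app_signup"} and "email" in e:
            email = e["email"].lower()
            emails_by_domain[email.split("@")[-1]].add(email)
    return {domain: len(emails) for domain, emails in emails_by_domain.items()}
```

Writing the metric down this precisely also makes it obvious which properties the tracking plan has to require, such as an email address on every lead event.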

Building out Tracking Plans

In the funnel map above we defined six different events and three different tracking plans that we want to build. This by no means defines the totality of your tracking plans but will be enough to get you started using the tools.

RudderStack Source (Tracking Plan)	User Action Name	RudderStack Event Name
Marketing Site	Page View	page_view
Marketing Site	Form Submit	form_submit
Application	Page View	page_view
Application	App Signup	app_signup
Application	Connection Created	connection_created
Salesforce Webhook*	Opportunity Won	opp_won

*Typically, Salesforce and other SaaS tools will have data extracted using RudderStack ETL every 24 hours; however, critical events like marking an Opportunity as won are important enough to trigger a real-time event that is passed back through a Webhook source.

With the sources and events defined, we now need to identify the properties and property types for each event. These should now be added to the Tracking Plans Google Sheet. Each Source should have its own tab copied from the “Import Template”. The tab below is a copy of the Marketing Site tab we created.

Event Name	Description	Property name	Property type	Property description	Req'd
page_view	User visits a page	link_source	string	Value of UTM parameter defined as ?link_source={value}	O
form_submit	User submits a form	page_title	string	Title of the page	R
-	-	page_URL	string	URL of the page	R
-	-	form_id	string	The ID of the form (configured in Sanity)	R
-	-	label	string	Label for Google Analytics events (if needed)	O
-	-	category	string	Category for Google Analytics events (if needed)	O
-	-	utm_source	string	Optional utm parameters	O
-	-	utm_medium	string	Optional utm parameters	O
-	-	utm_campaign	string	Optional utm parameters	O
-	-	utm_content	string	Optional utm parameters	O
-	-	utm_term	string	Optional utm parameters	O
-	-	raid	string	Optional utm parameters	O
-	-	search_text	string	The text the user typed into the search field	R
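A tab like this is straightforward to turn into a machine-readable structure. The sketch below parses a few rows of the sheet, exported as tab-separated text, into a per-event dictionary. It assumes that a "-" in the Event Name column means the row continues the event above it, and that "R" and "O" stand for required and optional; the output layout is hypothetical and is not the format the Tracking Plans API expects.

```python
import csv
import io

# A few rows from the Marketing Site tab, rebuilt as tab-separated text.
ROWS = [
    ["Event Name", "Description", "Property name", "Property type", "Property description", "Req'd"],
    ["page_view", "User visits a page", "link_source", "string", "Value of UTM parameter", "O"],
    ["form_submit", "User submits a form", "page_title", "string", "Title of the page", "R"],
    ["-", "-", "page_URL", "string", "URL of the page", "R"],
    ["-", "-", "form_id", "string", "The ID of the form", "R"],
    ["-", "-", "label", "string", "Label for Google Analytics events", "O"],
]
SHEET_TSV = "\n".join("\t".join(row) for row in ROWS)


def rows_to_plan(tsv_text):
    """Group property rows under their event; '-' continues the previous event."""
    plan = {}
    current = None
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["Event Name"] != "-":
            current = row["Event Name"]
            plan[current] = {"description": row["Description"], "properties": {}}
        if current is None:
            raise ValueError("first data row must name an event")
        plan[current]["properties"][row["Property name"]] = {
            "type": row["Property type"],
            "required": row["Req'd"] == "R",
        }
    return plan


plan = rows_to_plan(SHEET_TSV)
```

Here `plan["form_submit"]["properties"]` ends up holding `page_title`, `page_URL`, `form_id` and `label`, with the first three marked as required.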

With the basics of our Marketing Site source plan created, we can now upload it to RudderStack by configuring additional settings in the Google Sheet (more on this when we release the feature).

One exciting part of the Tracking Plans Google Sheet is that you can download the latest version of a tracking plan from the RudderStack Tracking Plan API, then upload any changes you make, ensuring everyone working on the plan has the most recent set of changes.

Once a Tracking Plan has been uploaded to the API via the Google Sheet, you are ready to begin using RudderTyper. Download instructions and tutorials will be made available to beta participants.

Tracking Plans are only one piece of the puzzle

As useful as RudderStack Tracking Plans will be (and already are for our team and beta users), it should also be noted that there will always be scenarios where you still need to transform the data once it arrives from the source, either for enrichment, filtering or massaging based on the needs of the various downstream destination tools. Tracking Plans and Transformations go hand-in-hand to ensure a stable and trustworthy data feed.

There may also be times when you aren’t sure what to do with particular variations of events streamed from your sources. In these cases, sending them to a backup bucket such as Amazon S3 or Google Cloud Storage is an elegant solution. Check out our documentation for more information on how to leverage a variety of Cloud Storage Platforms.

Beta registration

As we continue our mission of giving developers full control over their data and their tools, we recognize and appreciate the commitments our customers have made to help improve the product and we thank you. If you would like more information on how to get signed up, please contact katie@rudderstack.com or hit us up on Slack.

September 22, 2021

Benji Walvoord

Product
