Dogfooding at RudderStack: Tracking Plans Part 1

What are Tracking Plans, & why do you need one?

At RudderStack we talk a lot about the importance of owning your own data and the competitive advantage that can come from building robust analytics with complete data. Data trust is fundamental to this construct. In order to trust the data, you must trust the tools that are providing that data. That’s why we built our Data Governance API and the new Tracking Plans feature we are getting ready to beta.

RudderStack Tracking Plans are the latest offering within our Data Governance API and have been one of our most requested features to date. Unlike RudderStack Transformations, which allow you to transform your data in flight, Tracking Plans allow you to plan and prescribe what your data should look like in the first place. Tracking Plans address three fundamental issues related to streaming data:

  1. Missing or improperly configured data breaks downstream SaaS applications and data warehouses. This causes problems like poorly executed automated campaigns and broken dashboards.
  2. Poorly named and duplicative events and properties. This creates confusion and mismapping in downstream tools and data warehouses.
  3. Upstream data providers make changes resulting in altered event streams. This leads to 1 and 2 above with little to no advance notice or ability to fix it.

How Tracking Plans solve these issues

RudderStack Tracking Plans allow you to define the specific event names and properties for each of your track, group and identify calls. In addition, you can assign the type of data associated with each property or attribute and specify whether that property or attribute is required. The Tracking Plans API also supports versioning for better control of your data streaming.
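To make the pieces of such a definition concrete, here is a minimal sketch in plain Python. The class and field names (`PropertyRule`, `EventRule`, `TrackingPlan`) are hypothetical and chosen for illustration; this is not the Tracking Plans API format, only a model of the information a plan carries: event names per call type, property types, required flags and a version.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyRule:
    """One property on an event: its expected type and whether it is required."""
    name: str
    type: str              # e.g. "string", "number", "boolean"
    required: bool = False


@dataclass
class EventRule:
    """One planned event for a given call type (track, group or identify)."""
    call_type: str
    name: str
    properties: list[PropertyRule] = field(default_factory=list)

    def required_names(self) -> set[str]:
        """Names of the properties that every payload must include."""
        return {p.name for p in self.properties if p.required}


@dataclass
class TrackingPlan:
    """A named, versioned collection of event rules (hypothetical shape)."""
    name: str
    version: int
    events: list[EventRule] = field(default_factory=list)


plan = TrackingPlan(
    name="Marketing Site",
    version=1,
    events=[
        EventRule(
            call_type="track",
            name="form_submit",
            properties=[
                PropertyRule("page_title", "string", required=True),
                PropertyRule("form_id", "string", required=True),
                PropertyRule("utm_source", "string"),
            ],
        )
    ],
)
```

Keeping the version on the plan itself, rather than on individual events, is one simple way to reason about which set of rules a given payload was checked against.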

With your Tracking Plans in place, you can use the existing Data Governance APIs to evaluate your inbound events, payload samples and metadata and compare them against your plans. You can also use the RudderTyper tool we’re releasing alongside Tracking Plans. RudderTyper generates strongly-typed RudderStack analytics library wrappers based on your published tracking plan specs, meaning your data will conform to your defined schema upon capture.
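The idea behind a strongly-typed wrapper can be sketched by hand. The example below is not RudderTyper output and does not call any RudderStack SDK; `FormSubmit`, `track_form_submit` and the `track` callable are hypothetical stand-ins that show how a wrapper can refuse malformed data before it is ever sent.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Optional

# Stand-in for an analytics client's track call; a real SDK call would go here.
TrackFn = Callable[[str, dict], None]


@dataclass(frozen=True)
class FormSubmit:
    """Properties of a hypothetical form_submit event."""
    page_title: str
    page_URL: str
    form_id: str
    utm_source: Optional[str] = None

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations at runtime, so check explicitly.
        for name in ("page_title", "page_URL", "form_id"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"{name} must be a string")
        if self.utm_source is not None and not isinstance(self.utm_source, str):
            raise TypeError("utm_source must be a string when provided")


def track_form_submit(track: TrackFn, props: FormSubmit) -> None:
    """Send form_submit with only the properties that were actually set."""
    payload = {k: v for k, v in asdict(props).items() if v is not None}
    track("form_submit", payload)


# Usage: collect calls in a list instead of sending them anywhere.
sent = []
track_form_submit(
    lambda event, properties: sent.append((event, properties)),
    FormSubmit(page_title="Pricing", page_URL="/pricing", form_id="demo"),
)
```

Because the event name and property names live in one generated place, a typo such as `form_sumbit` or `pageTitle` becomes an error at the call site rather than a new, unplanned event downstream.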

What does the future hold?

Well, that’s where we need your help. We are currently working with a few Alpha customers and using Tracking Plans ourselves on our own production instance of RudderStack. Still on the roadmap are decisions about which types of errors or schema violations to track and how to handle them. Although not set in stone, here is a sneak peek into what we’re thinking so far:

| Violation Type | Description | Action Taken |
| --- | --- | --- |
| Unplanned Events | An event for which no schema has been defined. | |
| Unplanned Properties | The event is defined but the property or attribute does not exist in the schema | |
| Mismatched Data Type | Data type for a particular property does not match what is defined in the schema | |
| Required Field Missing | The payload is missing a property set as required in the schema | |
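The four violation types can be detected with a few lines of code. The sketch below is an illustration only, assuming a hypothetical schema shape (event name mapped to property name, Python type and required flag); it is not how RudderStack evaluates payloads internally.

```python
# Hypothetical schema shape: event name -> {property name: (python type, required)}
SCHEMA = {
    "form_submit": {
        "page_title": (str, True),
        "form_id": (str, True),
        "utm_source": (str, False),
    }
}


def find_violations(event: str, properties: dict, schema: dict = SCHEMA) -> list:
    """Return (violation_type, detail) pairs for one event payload."""
    if event not in schema:
        # No schema exists for this event, so nothing else can be checked.
        return [("unplanned_event", event)]

    rules = schema[event]
    violations = []
    for name, value in properties.items():
        if name not in rules:
            violations.append(("unplanned_property", name))
        elif not isinstance(value, rules[name][0]):
            violations.append(("mismatched_data_type", name))
    for name, (_, required) in rules.items():
        if required and name not in properties:
            violations.append(("required_field_missing", name))
    return violations
```

For example, `find_violations("form_submit", {"page_title": 42, "color": "red"})` reports a mismatched type for `page_title`, an unplanned property `color` and a missing required field `form_id`.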

Once this feature is fully built, the actions taken on each of these violations could include one or more of the following:

  1. Rejecting the entire payload
  2. Accepting the entire payload and sending it to downstream destinations with a warning flag
  3. Rerouting the entire payload to an S3 bucket (aka “dead letter queue”)
  4. Removing the additional properties from the payload that are not defined in the schema
  5. Inserting default values for required fields missing from the payload
  6. More advanced options based on schema comparisons outlined here: https://ajv.js.org/options.html
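A rough sketch of how the first five options might look in code follows. Since none of this is set in stone, every name here (`Action`, `handle`, the `violations` key used as a warning flag, the in-memory `dead_letters` list standing in for an S3 bucket) is an assumption made for illustration.

```python
from enum import Enum


class Action(Enum):
    REJECT = "reject"
    FLAG = "flag"
    DEAD_LETTER = "dead_letter"


def strip_unplanned(properties: dict, planned: set) -> dict:
    """Option 4: keep only properties that the schema defines."""
    return {k: v for k, v in properties.items() if k in planned}


def fill_defaults(properties: dict, defaults: dict) -> dict:
    """Option 5: add default values for required fields the payload omitted."""
    return {**defaults, **properties}


def handle(payload: dict, violations: list, action: Action, dead_letters: list):
    """Options 1-3: return the payload to forward downstream, or None if it is held back."""
    if not violations:
        return payload
    if action is Action.REJECT:
        return None
    if action is Action.DEAD_LETTER:
        dead_letters.append(payload)  # stand-in for writing to an S3 bucket
        return None
    # Action.FLAG: forward downstream with the violations attached as a warning
    return {**payload, "violations": list(violations)}
```

Options 4 and 5 repair a payload, while options 1 to 3 decide where it goes, so in practice they could be combined: repair what can be repaired, then apply a routing action to whatever violations remain.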

So, where do you start?

Connect with us on Slack or shoot us an email if you are interested in participating.

In the meantime, let’s take a look at how your typical SaaS business would walk through the steps of designing and implementing Tracking Plans with RudderStack. As a part of the RudderStack Data Governance API, Tracking Plans are first and foremost managed through code, but we understand that designing the plan will be a collaborative effort involving developers and non-developers, so we designed a Tracking Plans Template Google Sheet to help get teams started.

The first step is to get your hands on a copy of the RudderStack Tracking Plans Template, which will be available soon. This will help you and your team organize the various events and fields you want to capture from each of your RudderStack sources. The sheet requires a user access token for your account. For help creating a user token, check out our Access Token user documentation.

The next step is to create a wish list of events and properties you think you might need. The goal of this first pass is not to create the be-all-end-all list, but primarily to see where data needs intersect amongst the various stakeholders and to begin building out the data architecture for your company. During this exercise, it can be helpful to start with existing higher-level paradigms like the sales and marketing funnel or executive summary reports as the underlying metrics for these are generally already agreed upon. Starting with what you already know you need to measure is a great way to begin drilling into how you measure it and, more specifically, where the data comes from in the first place and what properties or attributes will be measured (i.e., required keys and data types).

For example, let’s take a sample SaaS business that has a funnel measuring the following:

| Stage | Team |
| --- | --- |
| Unique Site Visitors | Marketing - Paid Digital |
| Leads | Marketing - Engagement |
| MQLs | Marketing - Engagement |
| Opportunities / Free Trials | Sales - Outbound |
| Product activation / POC | Sales - Sales Engineering |
| Customers | Sales - Coffee Drinkers |
| Product usage | Customer Success |

Now that we have each stage defined, let’s dive deeper into exactly which data elements need to be created and tracked to reproduce our funnel, and assign a source to each. In some cases, such as defining a Marketing Qualified Lead (MQL), multiple sources of information may contribute to qualifying a particular lead. In this table, however, we are defining the system that retains that information: should we ever need to perform an audit, Salesforce (in this example) is where we would confirm whether a particular lead was flagged as an MQL. As we define each metric, we will assign it to a tracking plan on our Google Sheet.

| Funnel Step | Source | Metric | Tracking Plan |
| --- | --- | --- | --- |
| Visitor | Marketing Website & App | Count of Distinct Anonymous ID | Page View (Marketing), Page View (Application) |
| Lead | Marketing Website & App | Count of Distinct Email Addresses per domain | Form Submit (Marketing), App Signup (Application) |
| MQL | Salesforce | Count of Salesforce Leads (not deleted) with MQL checked | N/A (SFDC ETL) |
| Opportunity / Free Trial | Salesforce | Count of Opportunities where Opp Type = Initial | N/A (SFDC ETL) |
| Product Activation | App | Has the User Created a Connection | Connections Created |
| Customer | Salesforce | Opportunity = Close Won | Opportunity Won |
| Product Usage | App | Total Event Volume | N/A (aggregated from warehouse tables) |

Some of our metrics will come from RudderStack ETL sources or other non-RudderStack tables in our data warehouse and therefore will not be defined in our Tracking Plan for event data.
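To show how the first two event-based metrics in the table could be derived, here is a small sketch over made-up sample events. The field names (`anonymous_id`, `email`) and the sample data are assumptions for illustration; in practice these counts would more likely be computed with queries in the data warehouse.

```python
from collections import defaultdict

# Made-up sample events for illustration only.
events = [
    {"event": "page_view", "anonymous_id": "a1"},
    {"event": "page_view", "anonymous_id": "a1"},
    {"event": "page_view", "anonymous_id": "a2"},
    {"event": "form_submit", "anonymous_id": "a1", "email": "Ann@example.com"},
    {"event": "app_signup", "anonymous_id": "a2", "email": "bob@example.com"},
    {"event": "form_submit", "anonymous_id": "a1", "email": "ann@example.com"},
]

# Visitor: count of distinct anonymous IDs across page views.
visitors = len({e["anonymous_id"] for e in events if e["event"] == "page_view"})

# Lead: count of distinct email addresses per domain, across both lead events.
emails_by_domain = defaultdict(set)
for e in events:
    if e["event"] in {"form_submit", "app_signup"}:
        email = e["email"].lower()  # treat Ann@... and ann@... as one address
        emails_by_domain[email.split("@", 1)[1]].add(email)
leads_per_domain = {domain: len(emails) for domain, emails in emails_by_domain.items()}
```

Both metrics only work if every event reliably carries the property being counted, which is exactly what marking a property as required in the plan is meant to guarantee.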

Building out Tracking Plans

In the funnel map above we defined six different events and three different tracking plans that we want to build. This by no means defines the totality of your tracking plans but will be enough to get you started using the tools.

| RudderStack Source (Tracking Plan) | User Action Name | RudderStack Event Name |
| --- | --- | --- |
| Marketing Site | Page View | page_view |
| Marketing Site | Form Submit | form_submit |
| Application | Page View | page_view |
| Application | App Signup | app_signup |
| Application | Connection Created | connection_created |
| Salesforce Webhook* | Opportunity Won | opp_won |

*Typically, data from Salesforce and other SaaS tools is extracted using RudderStack ETL every 24 hours. However, critical events like marking an Opportunity as won are important enough to trigger a real-time event passed back through a Webhook source.
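Since poorly named and duplicated events are one of the problems a plan is meant to prevent, a quick check of the event names before uploading can help. The sketch below assumes the snake_case convention visible in the table; the regular expression and the `naming_problems` helper are hypothetical, not part of any RudderStack tool.

```python
import re

# Lowercase words separated by single underscores, e.g. page_view.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

PLANNED_EVENTS = {
    "Marketing Site": ["page_view", "form_submit"],
    "Application": ["page_view", "app_signup", "connection_created"],
    "Salesforce Webhook": ["opp_won"],
}


def naming_problems(planned: dict) -> list:
    """Return (source, event, problem) tuples for badly named or repeated events."""
    problems = []
    for source, names in planned.items():
        seen = set()
        for name in names:
            if not SNAKE_CASE.match(name):
                problems.append((source, name, "not snake_case"))
            if name in seen:
                problems.append((source, name, "duplicate within source"))
            seen.add(name)
    return problems
```

Note that `page_view` appearing in both the Marketing Site and Application plans is not flagged: duplicates are only checked within a single source, since each source has its own plan.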

With the sources and events defined, we now need to identify the properties and property types for each event. These should now be added to the Tracking Plans Google Sheet. Each Source should have its own tab copied from the “Import Template”. The tab below is a copy of the Marketing Site tab we created.

| Event Name | Description | Property name | Property type | Property description | Req'd |
| --- | --- | --- | --- | --- | --- |
| page_view | User visits a page | link_source | string | Value of UTM parameter defined as ?link_source={value} | O |
| form_submit | User submits a form | page_title | string | Title of the page | R |
| - | - | page_URL | string | URL of the page | R |
| - | - | form_id | string | The ID of the form (configured in Sanity) | R |
| - | - | label | string | Label for Google Analytics events (if needed) | O |
| - | - | category | string | Category for Google Analytics events (if needed) | O |
| - | - | utm_source | string | Optional utm parameters | O |
| - | - | utm_medium | string | Optional utm parameters | O |
| - | - | utm_campaign | string | Optional utm parameters | O |
| - | - | utm_content | string | Optional utm parameters | O |
| - | - | utm_term | string | Optional utm parameters | O |
| - | - | raid | string | Optional utm parameters | O |
| - | - | search_text | string | The text the user typed into the search field | R |
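Rows like these map naturally onto one schema per event. The following sketch converts a few of the rows into JSON Schema-style dictionaries; the row tuple layout and the `rows_to_schemas` helper are assumptions for illustration, and the actual format the Google Sheet uploads to the API may differ.

```python
# A few rows copied from the Marketing Site tab:
# (event name, property name, property type, "R" = required / "O" = optional).
# "-" in the event column means "same event as the row above".
ROWS = [
    ("page_view", "link_source", "string", "O"),
    ("form_submit", "page_title", "string", "R"),
    ("-", "page_URL", "string", "R"),
    ("-", "form_id", "string", "R"),
    ("-", "utm_source", "string", "O"),
]


def rows_to_schemas(rows):
    """Group sheet rows by event and emit one JSON Schema-style dict per event."""
    schemas = {}
    current = None
    for event, prop, prop_type, req in rows:
        if event != "-":
            current = event
        if current is None:
            raise ValueError("the first row must name an event")
        schema = schemas.setdefault(
            current, {"type": "object", "properties": {}, "required": []}
        )
        schema["properties"][prop] = {"type": prop_type}
        if req == "R":
            schema["required"].append(prop)
    return schemas


schemas = rows_to_schemas(ROWS)
```

Here `schemas["form_submit"]["required"]` ends up as `["page_title", "page_URL", "form_id"]`, while `page_view` has no required properties.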

With the basics of our Marketing Site source plan created, we can now upload it to RudderStack by configuring additional settings in the Google Sheet (more on this when we release the feature).

One exciting part of the Tracking Plans Google Sheet is that you can download the latest version of a tracking plan from the RudderStack Tracking Plan API, then upload any changes you make, ensuring everyone working on the plan has the most recent set of changes.

Once a Tracking Plan has been uploaded to the API via the Google Sheet, you are ready to begin using RudderTyper. Download instructions and tutorials will be made available to beta participants.

Tracking Plans are only one piece of the puzzle

As useful as RudderStack Tracking Plans will be (and already are for our team and beta users), it should also be noted that there will always be scenarios where you still need to transform the data once it arrives from the source, either for enrichment, filtering or massaging based on the needs of the various downstream destination tools. Tracking Plans and Transformations go hand-in-hand to ensure a stable and trustworthy data feed.

There may also be times when you aren’t sure what to do with particular variations of events streamed from your sources. In these cases, sending them to a backup bucket such as Amazon S3 or Google Cloud Storage is an elegant solution. Check out our documentation for more information on how to leverage a variety of Cloud Storage Platforms.

Beta registration

As we continue our mission of giving developers full control over their data and their tools, we recognize and appreciate the commitments our customers have made to help improve the product and we thank you. If you would like more information on how to get signed up, please contact katie@rudderstack.com or hit us up on Slack.

September 22, 2021
Benji Walvoord