Use Profiles’ predictive features to train machine learning models.
Predictions is part of our Early Access Program, where we work with early users and customers to test new features and get feedback before making them generally available. These features are functional but can change as we improve them. We recommend connecting with our team before running them in production.
Predictions extends Profiles’ standard feature development functionality. It lets you easily create predictive features in your warehouse and answer questions like:
Is a customer likely to churn in the next 30 days?
Will a user make a purchase in the next 7 days?
Is a lead going to convert?
How much is a user likely to spend in the next 90 days?
Further, you can add the predicted feature to user profiles in your warehouse automatically and deliver ML-based segments and audiences to your marketing, product, and customer success teams.
The following self-guided tour shows you how to build predictive features. You can also follow the Predictions sample project guide and build the project yourself, including sample data.
Use cases
Churn prediction: Predicting churn is a crucial initiative for many businesses. Without a predicted churn score, your actions are reactive; with a user trait like is_likely_to_churn, you can act proactively. Once you have such features, you can activate them through the appropriate outreach programs to prevent user churn.
Customer LTV prediction: Predictions helps you understand your customers’ purchasing behavior over time. You can predict how much a particular customer is likely to spend within the prediction window.
Python model
You can generate predictive features using a python_model, which involves two key steps: train and predict.
The following profiles.yaml file shows how to use a python_model:
```yaml
models:
  - name: &model_name shopify_churn
    model_type: python_model
    model_spec:
      occurred_at_col: insert_ts
      entity_key: user
      validity_time: 24h # 1 day
      py_repo_url: https://github.com/rudderlabs/rudderstack-profiles-classifier.git # Do not modify this value as the actual logic resides in this repo.
      train:
        file_extension: .json
        file_validity: 60m
        inputs: &inputs
          - packages/feature_table/models/shopify_user_features
        config:
          data:
            label_column: is_churned_7_days
            label_value: 1
            prediction_horizon_days: 7
            output_profiles_ml_model: *model_name
            eligible_users: lower(country) = 'us' and amount_spent_overall > 0
            inputs: *inputs
            entity_column: user_main_id
            recall_to_precision_importance: 1.0
          preprocessing:
            ignore_features: [name, gender, device_type]
      predict:
        inputs:
          - packages/feature_table/models/shopify_user_features
        config:
          outputs:
            column_names:
              percentile: &percentile_name percentile_churn_score_7_days
              score: churn_score_7_days
            feature_meta_data: &feature_meta_data
              features:
                - name: *percentile_name
                  description: 'Percentile of churn score. Higher the percentile, higher the probability of churn'
```
Model parameters
The parameters used in the python_model are listed below along with their descriptions:
Parameter
Description
py_repo_url Required
The actual logic for Predictions resides in this remote repository. DO NOT modify this value.
file_extension Required
Indicates the file type. This is a static value and does not need to be modified.
file_validity Required
If the last trained model is older than this duration, then the model is trained again.
inputs Required
Path to the base feature table project. You must add the &inputs anchor so it can be referenced elsewhere in the model via *inputs.
label_column Required
Name of the feature (entity_var) you want to predict. It is defined in the feature table model.
label_value
Expected label value for users who performed the event.
prediction_horizon_days Required
Number of days in the future for which you want to make the prediction.
eligible_users
Eligibility criteria for the users for which you want to define predictive features. You can set this criteria by defining a SQL statement that refers to the different entity_vars. To build a model for all available users, leave this parameter blank.
For example, if you want to train the model and make predictions only for the paying users from the US, define country='US' and is_payer=true.
config.data.inputs
Path to the referenced project.
entity_column
If you change the value of id_column_name in the ID stitcher model, you should specify it here. Otherwise, this field is optional.
recall_to_precision_importance
Also referred to as beta in the f-beta score, this field is used in classification models to fine-tune the model threshold, giving more weight to recall relative to precision.
Note: This is an optional parameter. If not specified, it defaults to 1.0, giving equal weight to precision and recall.
ignore_features
List of columns from the feature table which the model should ignore while training.
percentile Required
Name of the column in the output table that contains the percentile score.
score Required
Name of the column in the output table that contains the probabilistic score.
description Required
Custom description for the predictive feature.
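For context on recall_to_precision_importance: the f-beta score generalizes F1 by weighting recall relative to precision. The following is a purely illustrative sketch (not part of Profiles) showing how beta shifts the score:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: beta > 1 weights recall more heavily, beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A model with strong precision but weak recall:
balanced = f_beta(0.8, 0.4, beta=1.0)      # equal weight (the default)
recall_heavy = f_beta(0.8, 0.4, beta=2.0)  # recall counts more, so the score drops
```

Setting recall_to_precision_importance above 1.0 therefore pushes the tuned threshold toward catching more positives (higher recall) at the cost of precision.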
If you want to run your Python model locally using a CLI setup, you must set up a Python environment with the required packages and add the Python path to your siteconfig.yaml file.
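As an illustration, the relevant siteconfig.yaml section might look like the following sketch. The exact keys (py_models, python_path, allowed_git_urls_list) are assumptions based on a typical setup, so verify them against the CLI setup docs for your version:

```yaml
py_models:
  enabled: true
  python_path: /path/to/venv/bin/python # Python executable with the required packages installed
  allowed_git_urls_list:
    - https://github.com/rudderlabs/rudderstack-profiles-classifier.git
```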
Project setup
This section highlights the project setup steps for a sample churn prediction and LTV model.
Optional: If you are using Snowflake, you might need to create a Snowpark-optimized warehouse if your dataset is significantly large.
Churn prediction/LTV model
1. Create a Profiles project with Feature Table model
Follow the Feature table guide to create a Profiles project. Your project must include the definition of the feature you want to predict.
For example, to predict 30-day inactive churn, you should define it as a feature (entity_var) in the feature table so that the model knows how to compute this for historic users.
```yaml
entity_var:
  name: churn_30_days
  select: case when days_since_last_seen >= 30 then 1 else 0 end
```
2. Create a python model and train it
Create a python_model and pass the Feature table model as an input.
Add the following set of parameters in the train block:
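For the churn model, the train block can reuse the parameters from the full example earlier in this page. The sketch below is illustrative: it assumes the churn_30_days feature defined in step 1 and the sample project's input path; adjust both to your project.

```yaml
train:
  file_extension: .json
  file_validity: 60m
  inputs: &inputs
    - packages/feature_table/models/shopify_user_features
  config:
    data:
      label_column: churn_30_days # the entity_var defined in the feature table
      label_value: 1
      prediction_horizon_days: 30
      inputs: *inputs
```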
For the LTV model, unlike churn prediction, you should not specify the label_value and recall_to_precision_importance fields.
The LTV model also introduces a new parameter called task, which you must set to regression. Profiles assumes a classification model by default unless explicitly specified otherwise.
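Putting the LTV-specific differences together, a train block might look like the following sketch. The label_column name and horizon are illustrative assumptions; note the absence of label_value and recall_to_precision_importance, and the addition of task:

```yaml
train:
  file_extension: .json
  file_validity: 60m
  inputs: &inputs
    - packages/feature_table/models/shopify_user_features
  config:
    data:
      label_column: amount_spent_90_days # a numeric entity_var to predict
      prediction_horizon_days: 90
      task: regression # required for LTV; Profiles defaults to classification
      inputs: *inputs
```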
3. Define predictive features
Add the following set of parameters in the predict block:
Churn prediction:
```yaml
predict:
  inputs:
    - packages/feature_table/models/shopify_user_features
  config:
    data: *model_data_input_configs
    outputs:
      column_names:
        percentile: &percentile_name percentile_churn_score_7_days
        score: churn_score_7_days
      feature_meta_data: &feature_meta_data
        features:
          - name: *percentile_name
            description: 'Percentile of churn score. Higher the percentile, higher the probability of churn'
```
LTV model:
```yaml
predict:
  inputs:
    - packages/feature_table/models/shopify_user_features
  config:
    data: *model_data_configs
    preprocessing: *model_prep_configs
    outputs:
      column_names:
        percentile: &percentile_name percentile_predicted_amount_spent
        score: predicted_amount_spent
      feature_meta_data: &feature_meta_data
        features:
          - name: *percentile_name
            description: 'Percentile of predicted future LTV. Higher the percentile, higher the expected LTV.'
```
4. Run your project
Once you have created the project, you can run it in either of the following ways:
Using Profile CLI
If you have created your Predictions Profiles project locally, run it using the pb run CLI command to generate the output tables.
Using Profiles UI
Contact us to enable this feature for your account.