Sync real-time data streams from RudderStack to Snowflake using the Snowpipe Streaming API.
7 minute read
Snowpipe Streaming is a powerful tool for handling real-time data streams. It allows you to stream data rows directly in Snowflake tables with minimal latency.
Find the open source code for this destination in the GitHub repository.
When to use Snowpipe Streaming
Snowpipe Streaming is ideal for the following scenarios:
Continuous data streams: If your data sources produce a steady stream of data in small batches, for example, clickstream data, event logs, etc.
Low latency requirements: When your application requires immediate/frequent updates to the Snowflake table as new data arrives. Some use cases include real-time dashboards, streaming analytics, etc.
Cost optimization for streaming data: Snowpipe Streaming can potentially reduce costs for real-time pipelines as it allows micro-batch ingestion without needing a traditional Snowflake warehouse running constantly.
Prerequisites
Before you set up Snowpipe Streaming as a destination in RudderStack, make sure to set the correct user permissions in Snowflake to allow RudderStack to send the data correctly.
To enable RudderStack access, make sure to have the ACCOUNTADMIN role or an account with MANAGE GRANTS privilege.
The following sections show you how to set up a virtual warehouse, a database, a role, and an user in Snowflake:
Create warehouse
In your Snowflake console, create an X-Small warehouse.
RudderStack does not use the warehouse for data loading. It only uses the warehouse to adapt the schema and the update the tables in case of new properties in the events (or entirely new events).
Alternatively, run the following SQL commands to create a new warehouse:
Make sure to set AUTO_SUSPEND to ~10 minutes and enable AUTO_RESUME to avoid any extra costs.
Create database
RudderStack recommends creating a new database to avoid conflicts with your existing data. Note that RudderStack creates its own tables within this database while storing your events.
Alternatively, you can create a new database by running the following SQL command:
CREATEDATABASE"<DATABASE_NAME>";
Create role for RudderStack
Run the following SQL commands in the exact order to create a new role with the required permissions to load your data into the warehouse created above.
Make sure to replace the placeholder names with your preferred names.
In your RudderStack dashboard, add a source. Then select Snowpipe Streaming from the list of destinations.
Assign a name to your destination and click Continue.
Connection settings
Account: Enter the account ID of your Snowflake warehouse - this ID is part of the Snowflake URL. The below table illustrates the slight differences in the account IDs depending on the various cloud providers. See the Snowflake documentation for more information on the account locator formats depending on region or cloud provider.
Database: Enter the name of the database created in the Create database section.
Warehouse: Enter the name of the warehouse created in the Create warehouse section.
User: Enter the name of the user created in the Create user section.
Role: Specify the role to be assigned to the above user. If not specified, RudderStack uses the default role. Make sure your role has the necessary permissions for RudderStack to load the data into the warehouse.
Private Key: Generate private key and specify it in this field. Make sure to include the delimiters.
Private Key Passphrase: Specify the password you set while encrypting the private key. Leave this field blank if your private key is not encrypted.
The user authentication will fail if your private key is encrypted and you do not specify the passphrase.
Namespace: Enter the schema name for the warehouse where RudderStack creates all tables. If not specified, RudderStack sets the namespace to the source name by default. Note that you cannot change the namespace later.
If your sources send data to different schemas, RudderStack recommends creating multiple Snowpipe Streaming destinations (each with a different namespace) and connecting the relevant sources accordingly.
Advanced settings
RudderStack provides the following advanced settings:
Skip Tracks Table: Toggle on this setting to skip sending events to the tracks table.
JSON Columns: This setting lets you ingest semi-structured event data not defined by a fixed schema. You can specify the required JSON column paths in the dot notation, separated by commas. See the JSON Column Support guide for more information.
Consent settings
Consent management settings: Configure the consent management settings for the specified source by choosing the Consent management provider from the dropdown and entering the relevant consent category IDs. See Consent Management in RudderStack for more information on this feature.
IPs to be allowlisted
To enable network access to RudderStack, allowlist the following RudderStack IPs depending on your region and RudderStack Cloud plan:
Plan
Region
US
EU
Free, Starter, and Growth
3.216.35.97
18.214.35.254
23.20.96.9
34.198.90.241
34.211.241.254
52.38.160.231
54.147.40.62
3.123.104.182
3.125.132.33
18.198.90.215
18.196.167.201
Enterprise
3.216.35.97
34.198.90.241
44.236.60.231
54.147.40.62
100.20.239.77
3.66.99.198
3.64.201.167
3.123.104.182
3.125.132.33
All the outbound traffic is routed through these RudderStack IPs.
Migrate from Snowflake destination
This section highlights the steps for transitioning from the Snowflake destination to the Snowpipe Streaming destination for storing your events.
Pre-migration steps
Before you migrate to the Snowpipe Streaming destination, follow these steps to verify whether you can perform the migration successfully:
Set up a Snowpipe Streaming destination with a new schema (configured using the Namespace setting).
Connect an existing source to this destination. Do not disconnect this source from the Snowflake destination.
Send some events and check the existing schema (associated with Snowflake destination) and the new schema (associated with the Snowpipe Streaming destination).
If they are compatible, you can perform the migration successfully.
Case 1: Sources write to the same schema
This case is applicable in the below scenarios:
If all the sources writing to the Snowflake destination use the same schema (namespace).
If only one source is connected to that destination.
Create only one Snowpipe Streaming destination in the RudderStack dashboard and specify the same namespace as the one used in the Snowflake destination.
Connect all the relevant sources to the Snowpipe Streaming destination.
Disconnect the above sources from the Snowflake destination.
To avoid any event loss, make sure to connect all the relevant sources to the Snowpipe Streaming before disconnecting them from the Snowflake destination.
Note that there is a time window between the connection/disconnection process (steps 2 and 3 listed above) where both the destinations deliver events. This should be fine if the Snowflake destination was configured with the merge functionality (Warehouse Append setting turned off) that deduplicates data while merging. Otherwise, there is a possibility of duplicates.
Case 2: Sources write to different schemas
If your sources send data to different schemas, RudderStack recommends creating multiple Snowpipe Streaming destinations (each with a different namespace) and connecting the relevant sources accordingly.
FAQ
What is the difference between Snowpipe Streaming and traditional Snowflake warehouse?
The following table highlights the key differences between Snowpipe Streaming and the traditional Snowflake warehouse:
Feature
Snowpipe Streaming
Traditional Snowflake warehouse
Use case
Real-time data use
Analytical and batch processing
Latency
Real-time or near real-time
Higher latency (batch-oriented)
Data volume
Small, continuous data streams
Large, periodic batches
Cost efficiency
Optimized for streaming and real-time ingestion
Optimized for batch processing
Why am I not seeing the users table in the schema for identify events?
This integration sends the identify events exclusively to the identifies table and skips the users table entirely.
This site uses cookies to improve your experience while you navigate through the website. Out of
these
cookies, the cookies that are categorized as necessary are stored on your browser as they are as
essential
for the working of basic functionalities of the website. We also use third-party cookies that
help
us
analyze and understand how you use this website. These cookies will be stored in your browser
only
with
your
consent. You also have the option to opt-out of these cookies. But opting out of some of these
cookies
may
have an effect on your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This
category only includes cookies that ensures basic functionalities and security
features of the website. These cookies do not store any personal information.
This site uses cookies to improve your experience. If you want to
learn more about cookies and why we use them, visit our cookie
policy. We'll assume you're ok with this, but you can opt-out if you wish Cookie Settings.