Cursor Column Support Beta

Use cursor columns for efficient incremental upsert syncs.

RudderStack’s Cursor Column Support feature offers a more efficient way of syncing incremental data using the Reverse ETL sources in Upsert mode. With this feature, you can run cost-effective SQL queries to infer the changes to be synced by leveraging cursor columns.

This guide walks you through the cursor column feature in detail.

How the cursor column feature works

Previously, RudderStack identified the incremental data in upsert mode by comparing each column and row to detect changes. This led to longer sync times and increased warehouse costs, especially in case of large table size.

In the cursor mode, RudderStack leverages a cursor column specified by the user to track and identify the changes. It then syncs only those records with a cursor value greater than the last checkpoint (recorded internally by RudderStack), avoiding the need to compare all columns.

Note that:

  • For a full sync, all the records are synced. Generally, a full sync is triggered in the below cases:

    • First sync for a connection.
    • The data mapping is changed.
    • There is a change in the source schema, for example, the data type for one of the mapped columns is changed.
  • For the incremental syncs, RudderStack syncs all the rows whose cursor column value is greater than the last saved checkpoint from the cursor column.

  • RudderStack retries syncing any retryable records that failed to sync previously in the next sync - irrespective of their cursor column value. This includes failures due to errors with 429, 5xx, etc. status codes. See the Example for more information.

Use the cursor column feature

warning
Make sure to see Limitations before using this feature.
  1. Create a new Reverse ETL source and connect it to a destination.
  2. In the Data Mapping settings, set Sync mode to Upsert.
  3. Toggle on the Use cursor column setting.
  4. Specify the Cursor column from the dropdown.
info
RudderStack recommends choosing a cursor column that is comparable, that is, it supports operators like <, >, ORDER BY, etc.
Cursor column settings in RudderStack

Example

The following example illustrates how the cursor column feature works works:

Source table for first sync

emailscoreupdated_at
alex@example.com102024-10-03T05:36:30
alice@example.com202024-10-03T05:37:30
john@example.com302024-10-03T05:38:30

Cursor column selection

Suppose the user selects updated_at as the cursor column in the RudderStack dashboard. RudderStack uses this column to note the checkpoints.

First sync

Now, let’s assume two of the above records failed and one succeeds in the first sync:

emailscoreupdated_atstatus
alex@example.com102024-10-03T05:36:30Success
alice@example.com202024-10-03T05:37:30Failed
john@example.com302024-10-03T05:38:30Failed

Last checkpoint

In this case, RudderStack records the last checkpoint (2024-10-03T05:38:30) for the next incremental sync.

Source table for incremental sync

Next, suppose the source table is updated with new and updated records, as shown:

emailscoreupdated_atNotes
alex@example.com102024-10-03T05:36:30Unchanged record
john@example.com302024-10-03T05:38:30Unchanged record, previously failed
alice@example.com502024-10-03T05:39:30Changed record, previously failed

Next sync (incremental)

In this case, RudderStack leverages the previous checkpoint (2024-10-03T05:38:30) from the cursor column to sync the following records:

emailscoreupdated_atstatus
Notes
alice@example.com502024-10-03T05:39:30Successupdated_at value is greater than the previous checkpoint.

RudderStack does not sync the previously failed record for this email.
john@example.com302024-10-03T05:38:30SuccessRudderStack retries and syncs this record because it failed in the previous sync.

As seen above, RudderStack only syncs the records that are:

  • Updated after the previous checkpoint, or
  • Retryable records that failed in the previous sync
warning
RudderStack does not sync records that come in as a result of late arrival of data, that is, it will not sync the records even if they were updated but their cursor column value is less than the previous checkpoint.

Limitations

  • This feature does not support deletes. Since no diffing is involved, the sync does not track deleted records to perform any action on the deleted rows.
  • The feature only works for Reverse ETL sources of Table and SQL Model type - it does not support the Audience type.
  • This feature does not support the Amazon S3 source.
  • The users are entirely responsible for selecting an appropriate cursor column. RudderStack relies on the provided cursor column value to determine the incremental changes.

Questions? Contact us by email or on Slack