Incremental Replication and Schema Updates
Skyvia allows loading only records that were changed (created/edited/deleted) in the cloud source since the previous replication run and applies these changes to the target database. When using incremental updates, Skyvia also allows tracking source metadata changes and applying some of them to target database without re-creating tables and reloading all the data.
Incremental Replication Requirements
Requirements to Source Cloud App
In some sources Incremental updates are not supported for some of the objects. For incremental updates, an object must have fields, storing record creation and modification timestamps. See details for each source in the Connectors section. If the object allows updating data, at least one of the fields with the creation and modification time must be present.
If you select objects without both such fields in replication (or if you excluded both fields from replication), the result depends on whether you replicate data to a database (like SQL Server, SQL Azure, MySQL, etc.) or a cloud data warehouse — Amazon Redshift, Azure Synapse Analytics, or Google BigQuery. An integration, replicating data to a database, will not run. When running such an integration, you will get an error that the source object does not have the corresponding fields, or they are not selected.
An integration, replicating data to a data warehouse will run successfully and load data. However, if an object does not have both of the fields, Skyvia will display a warning for such object in replication editor. Only the first (full) replication run will load data from this object to a data warehouse. Subsequent replication runs won’t load any data from this object.
Please note that if an object doesn’t have one of these fields (or you excluded one of them from replication), Skyvia won’t apply all the source data changes to the database:
- If an object does not have a field, storing record creation time, but has a field, storing record modification time, subsequent replications will update the cloud data warehouse records modified in the cloud app, but won’t add new records created in the cloud app.
- If an object does not have a field, storing record modification time, but has a field, storing record creation time, subsequent replications will load new records created in the cloud app, but won’t apply any modifications to existing records for updates made in the cloud app.
Besides, an object must have a primary key to replicate it with incremental updates. However, most cloud objects have primary keys/ids.
If the source is a database, you can specify the corresponding fields for each table.
Requirements to Target database
Replication requires privileges for creating and dropping tables and loading data. Note that incremental replication creates temporary tables each time it runs, and after running drops them.
Replication does not create temporary objects if you replicate data to cloud apps, like Airtable or Elasticsearch.
What to Do When There Are No Required Fields in Object
If some objects in your replication integration do not have fields, storing record creation and modification time, probably it would be the best to create two separate replication integrations:
- An integration with objects that support incremental updates. It will use incremental replication and may be scheduled for more frequent runs.
- An integration with objects that do not support incremental updates. It will re-create tables and reload all the data every time it runs. To reduce the number of processed rows, it’s better to schedule it to run seldom.
This approach, however, has a drawback that foreign keys between the tables, created by different integrations, are not created (and it’s better not to create them manually to avoid errors).
Processing Deleted Records
For some data sources, incremental replication not just loads new and updated records to a database, but also deletes records, deleted in the cloud source, from the database. This is supported for data sources and objects that store information about deleted records and allow querying it. For example, Salesforce stores soft deleted records for 30 days, and Skyvia applies soft deletes from Salesforce to database during incremental replication. Hard deletes leave no information about deleted records in Salesforce, and thus, are not applied to the database.
This feature is useful if you want to have an exact copy of cloud data in the database, but it might not be convenient if you want to replicate data for archiving purposes and want to keep your records, that were deleted in the cloud app, in the database.
Metadata Changes and Updating Target Schema
Replication with incremental updates supports automatic updates of the target database schema without the need to re-create tables and reload all the data. This feature can be turned on with the Update Schema checkbox.
This feature does not mean that Skyvia automatically detects changes in the source. This rather means that it can reflect changes you make to the replication integration in the database without losing the data already replicated to the database.
How Updating Schema Works
When replication runs, it checks the target database schema and compares it to the schema it would build based on the selected source objects and fields. If there are differences, Skyvia automatically applies changes that add tables and columns to the database. It ignores any extra tables/columns in the database.
Such differences may appear when you either manually modify the target database or you edit the replication integration. However, if source metadata changes, you still need to edit your replication integration to apply the changes to the database.
Metadata Change Detection
Skyvia does not support automatic detection of metadata changes. If metadata are changed in your cloud data source, replication will continue to run knowing nothing about these changes. If a new field is added to a replicated object, the replication will continue working, but the new field will not be replicated. If a replicated field or a replicated object is deleted or renamed in the data source, replication will start failing until you perform the necessary changes to it.
Besides, if the source connector supports metadata cache, no changes are detected untill the cache is cleared.
To detect and apply source metadata changes in a replication integration, you need to do the following:
- If the source connector supports metadata cache, clear the metadata cache of the source connection.
- Open the replication integration for editing. When you edit it, Skyvia automatically excludes the objects, deleted in the source, from replication.
- If all the object fields are replicated, and some of them were deleted, they are also excluded from the replication automatically. However, if some of the object fields were manually excluded, and some of the replicated fields were deleted in source, you need to manually edit the task and exclude the deleted fields.
- If fields were added to a replicated object, you need to edit the corresponding replication task and select check boxes for the corresponding fields to add them to replication.
- If new objects were created in the source, you also need to select them manually to add them to replication.
How Changes Are Applied
Skyvia does not apply automatically all the metadata changes that it detects. If a Skyvia finds any extra tables and columns in the database, not participating in replication, it ignores them.
Source Change | How It Is Applied to Database |
---|---|
An object is added to replication | The corresponding table is created in the database. However, only the records created/updated since the previous replication run, are replicated. If you need to load legacy data from that object, you can do it manually with a separate integration. |
An object is excluded from replication or deleted | Skyvia ignores it. The corresponding table with all the data stays in the database. |
An object is renamed | Skyvia treats such case as an object deleting and creating a new one. It keeps the table with the old name in the database, and creates a new table with the new name. |
A column is added to replication | Skyvia creates a new column in the corresponding table, filled with NULL values. |
A column is excluded from replication or deleted | Skyvia ignores this deletion and keeps the database column and all the data. For the future rows, the column will have NULL values. |
A column is renamed | Skyvia treats such case as a column deleting and creating a new one. It keeps the column with the old name in the database, and adds a new one with the new name. Data for the new records is loaded to the new column. |
A column data type is changed | Skyvia applies changes that widen the column data type.* |
* Note that different databases support changes to existing tables differently. Not every change can be applied in all supported databases. For example, data type changes may be treated differently in different databases.