Replication Overview

Replication is used to create a copy of cloud application data in a relational database and keep it up-to-date. When performing replication, you define application objects and their fields to copy and configure replication options. Unlike import, replication does not support loading data of different structure in source and in target and using any custom mapping; however, it is much simpler to configure. Replication also does not require a database with a prepared schema — it can create database tables for cloud data automatically.

Replication can be used when you need to backup cloud application data to a relational database, archiving historical data that is no more a subject to change and is not needed in the cloud application. It can be useful in different data integration scenarios, or just for performing data analysis and reporting with powerful data analysis and reporting tools developed for relational databases.

Replication Source and Target

Skyvia’s replication loads data from a cloud application to a database or data warehouse service. So a replication source can only be a cloud application, like Salesforce or Mailchimp, etc. And a target can be a cloud data warehouse service or a database server, either cloud-based, like Amazon RDS or Google Cloud SQL, or on-premise one.

Note that if you want to use an on-premise database server, it must be accessible from the Internet. You can find some advice on configuring your database server in order to access it from Skyvia in the How to Configure Local Database Server to Access It from Skyvia topic.

For cloud data warehouses, Skyvia uses warehouse-specific optimized data uploading. It uploads data to a data warehouse-specific storage as CSV files and then commands the warehouse to import them. Thus, in order to perform replication to a cloud data warehouse, you need to specify not just mandatory connection parameters, but also the optional storage service-related parameters.

Automatic Schema Creation

Skyvia does not require already created database tables for data replication. It can automatically create the necessary database tables with the same structure as the replicated cloud objects. Tables creation is optional. Skyvia can use existing ones, but in this case their structure — columns, data types — must be the same as the structure of the cloud objects. Skyvia can also optionally delete the tables with the corresponding names before creating them.

Skyvia can also create foreign keys in the database, corresponding to the relations between cloud objects. This can be convenient, but for some data sources it is not recommended. Foreign key constraints are enforced in databases, but may not be enforced in some cloud sources, where child records may reference an already deleted, not existing parent record. If you receive foreign key violation errors during replication, try repeating a full replication with re-creating tables, but without creating foreign keys.

Foreign keys are not created for polymorphic relations between the cloud object (relations when the same field of a child object may reference different parent objects).

Keeping Database Up-to-date

Skyvia can keep the database up-to-date with the cloud application. For this, you can schedule your replication for automatic execution. There are two ways to keep the database up-to-date:

  1. Performing a full replication each time with dropping and re-creating tables.
  2. Performing a full replication only the first time, to load legacy data, and then performing incremental replication.

Incremental Replication

Incremental replication means that Skyvia queries only records that were changed (created/edited/deleted) in the cloud source since the previous replication run and applying these changes to the target database. As only changed records are queried and loaded, incremental replication allows using less records from your monthly subscription limit and in most cases less API calls to the cloud source. So it’s recommended to use incremental replication when possible to keep the database up-to-date.

The incremental replication is controlled by the Incremental Updates setting and the LastSyncTime parameter in your replication packages.

Incremental Replication Requirements

In some sources Incremental updates are not supported for some of the objects. For incremental updates, an object must have fields, storing its creation and modification time. See details for each source in the Connectors section. If the object allows updating data, both fields with the creation and modification time must be present. (The only exception is G Suite, for it one Updated field is enough).

If the object allows only inserting new records, but not updating existing ones, the field with record creation time is enough.

For Salesforce objects that support deleting records, the IsDeleted field is also required.

If you select objects without such fields in a replication package, the result depends on whether you replicate data to a database (like SQL Server, SQL Azure, MySQL, etc.) or a cloud data warehouse — Amazon Redshift, Azure SQL Data Warehouse, or Google BigQuery. A package, replicating data to a database, will not run. When running such a package, you will get an error that the source object does not have the corresponding column.

A package, replicating data to a data warehouse will run successfully and load data. However:

  • If an object does not have both of the fields, Skyvia will display a warning for such object in a replication package editor. Only the first (full) replication run will load data from this object to a data warehouse. Subsequent replication runs won’t load any data from this object.
  • If an object does not have a field, storing record creation time, but has a field, storing record modification time, no warnings are displayed. Subsequent replications will update the cloud data warehouse records modified in the cloud app, but won’t add new records created in the cloud app.
  • If an object does not have a field, storing record modification time, but has a field, storing record creation time, no warnings are displayed. Subsequent replications will load new records created in the cloud app, but won’t apply any modifications to existing records for updates made in the cloud app.

Besides, an object must have a primary key to replicate it with incremental updates. However, most cloud objects have primary keys/ids.

What to Do When There Are No Required Fields in Object

If some objects in your replication package do not have fields, storing record creation and modification time, probably it would be the best to create two separate replication packages:

  • A package with objects that support incremental updates. It will use incremental replication and may be scheduled for more often runs.
  • A package with objects, not supporting incremental updates. It will re-create tables and reload all the data every time it runs. To reduce the number of processed rows, it’s better to schedule it to run seldom.

This approach, however, has a drawback that foreign keys between the tables, created by different packages, are not created (and it’s better not to create them manually to avoid errors).

Processing Deleted Records

For some data sources, incremental replication not just loads new and updated records to a database, but also deletes records, deleted in the cloud source, from the database. This is supported for data sources and objects that store information about deleted records and allow querying it. For example, Salesforce stores soft deleted records for 30 days, and Skyvia applies soft deletes from Salesforce to database during incremental replication. Hard deletes leave no information about deleted records in Salesforce, and thus, are not applied to the database.

This feature is useful if you want to have an exact copy of cloud data in the database, but it might not be convenient if you want to replicate data for archiving purposes and want to keep your records, that were deleted in the cloud app, in the database.

Partial Replication

You can replicate all or only part of the data with Skyvia. You can select the objects to replicate data from when creating a replication package. For each of the selected objects, you may exclude any fields from replication. For example, if you don’t want to replicate personal information, like emails, names, etc. to the database, you may simply exclude these fields.

However, please note that you need to have at least the following fields selected in order to use incremental replication:

  • object key fields
  • fields storing record creation and modification time
  • fields, determining whether a record was deleted (for data sources and objects that have such fields)

In addition to excluding fields, you may also apply filters for each replicated object separately and replicate only records, matching certain conditions. This can be combined with both full and incremental replication — incremental replication in such case will load only records, matching the filter AND created/modified since the previous replication run.

Metadata Changes

Note that replication packages don’t detect metadata changes in the source. If metadata is changed in your cloud data source, a replication package will continue to run knowing nothing about these changes. If a new field is added in a replicated object, the replication will continue working, but the new field won’t be replicated. If a replicated field or a replicated object is deleted or renamed in the data source, the replication package will start failing until you perform the necessary changes to it.

Please read the instruction on what you should do when source metadata changes here.