Connecting to Snowflake
Snowflake is a scalable cloud-based data platform, which enables data storage, processing and analytic solutions that are faster, more flexible and easier to use. The Snowflake data platform is not built on any existing database technology or “big data” one, instead it combines a completely new SQL query engine with an innovative architecture natively designed for the cloud.
Snowflake-Specific Features and Limitations
Skyvia has the following limitations for Snowflake:
In the Data Integration product, Snowflake connections are supported only for packages, which use a new data integration runtime.
Make sure you select the Use new runtime check box on the tab bar if you want to use Snowflake in your package. You need to select this check box each time you create a new data integration package.
- Skyvia supports the following types of import operations for Snowflake: INSERT, UPDATE, DELETE. Please note that UPSERT operation is not supported.
- Replication is supported for Snowflake, but it works ONLY in the Bulk Load mode (the check box Use Bulk Import should be selected). To connect to Snowflake, Schema and Files Storage for Bulk parameters are also necessary.
- Synchronization is not supported for Snowflake.
To connect to Snowflake, you need to specify such required parameters as domain, database, user name and password. If you need to perform import or replication, you will also have to specify some additional parameters described below.
You need to specify the following required connection parameters:
- Domain is a Snowflake account domain.
- User is a user name to log in with.
- Password is a password to log in with.
- Database is a database name.
Snowlake also envisages the following optional connection parameters:
- Schema is a current schema name in the database. Although this parameter is optional, you need to specify it when using replication or import in the Bulk Load mode.
- Warehouse is a name of the warehouse used for a database.
- Role is a role name used to connect.
- Command Timeout — this parameter specifies the wait time before terminating an attempt to execute a command and generating an error. Note that it is the time to wait for any server reply since the command was sent to a server, and it doesn’t include the time necessary to fetch data if the command selects some data.
- STRING(36) As Guid — this parameter specifies whether to interpret values in the columns of the STRING (36) type as Guid values.
- Files Storage for Bulk — you need to specify an external temporary file storage service to perform replication or import in the Bulk Load mode in Snowflake. This parameter allows you to select such possible external file storage services as None, Amazon S3 and Azure. Depending on the selected storage service, you need to specify additional parameters, which are mandatory to connect to one or another storage service.
Use Bulk Import — this parameter appears when you select Amazon S3 or Azure as an external file storage. For bulk loading, it is necessary to specify an external service for storing temporary files. You can also disable bulk import by clearing the Use Bulk Import check box. This parameter affects only import packages with the INSERT, UPDATE and DELETE operation and with Snowflake as a target.
By default, such packages import data in the following way: Skyvia writes data into multiple temporary CSV files, uploads them to Amazon S3 or Azure (depending on which one you’ve selected) and then tells Snowflake to import data from these CSV files. These actions are performed simultaneously, providing the best performance. After the CSV files are imported, they are deleted. However, when data are imported this way, it’s not possible to obtain a per-record error log.
If you disable bulk import, Skyvia will use INSERT, UPDATE and DELETE statements for importing data. This allows you to obtain a per-record error log but provides far less performance. Thus, disabling bulk import for Snowflake is not recommended and should only be considered if you need to import not that many records and need to have a per-record error log or you don’t want to use external storage for temporary files.
When selecting Amazon S3 Storage from Files Storage for Bulk, you need to specify the following additional parameters:
- AWS Access Key ID — first part of your Amazon Web Services access key.
- AWS Secret Key — second part of your Amazon Web Services access key. Read more about AWS access keys here.
- AWS Security Token — an alternative option to AWS Access Key ID and AWS Secret Key pair. The Amazon Web Services security token is a temporary limited-privilege credential.
- S3 Region — AWS region, where your S3 storage is hosted.
- S3 Bucket Name — it is an optional parameter. You may specify the name of your S3 bucket to temporarily load the file with imported or replicated data to. If you leave it empty, Skyvia will create a new bucket when importing or replicating data and delete it after the operation is finished.
When selecting Azure Blob Storage from Files Storage for Bulk, you need to specify the following additional parameters:
- Azure Storage Account — Azure storage account name.
- Azure Storage Account Key — your 512-bit storage access key.
- Azure Storage Endpoints Protocol — this parameter determines the protocol to use (HTTPS or HTTP). You may optionally change the protocol to use from default https to http, but this is not recommended.