Connecting to Google BigQuery
Google BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data.
Google BigQuery-Specific Features and Limitations
Since Google BigQuery does not support primary or unique keys, Skyvia has the following limitations for Google BigQuery:
- Synchronization is not supported for Google BigQuery.
- UPSERT operation in Import is not supported for Google BigQuery.
- When performing import with the UPDATE or DELETE operation, you need to manually specify the columns that will be treated as a primary key.
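Because BigQuery tables have no declared primary keys, an UPDATE or DELETE must be matched against user-chosen key columns. The following sketch illustrates the idea by generating a BigQuery standard SQL UPDATE statement from a record and a manually specified key column list; the table, columns, and helper function are illustrative, not Skyvia's actual implementation.

```python
def build_update(table, record, key_columns):
    """Build a BigQuery standard SQL UPDATE using user-chosen key columns.

    Non-key columns go into the SET clause; key columns form the WHERE
    clause that identifies the row (the role a primary key would play).
    """
    def quote(value):
        # Minimal literal quoting for the sketch: strings get single quotes.
        return f"'{value}'" if isinstance(value, str) else str(value)

    set_clause = ", ".join(
        f"{col} = {quote(val)}"
        for col, val in record.items()
        if col not in key_columns
    )
    where_clause = " AND ".join(
        f"{col} = {quote(record[col])}" for col in key_columns
    )
    return f"UPDATE `{table}` SET {set_clause} WHERE {where_clause}"

# Hypothetical table and record, with order_id chosen as the key column.
sql = build_update(
    "my_project.my_dataset.orders",
    {"order_id": 42, "status": "shipped", "region": "EU"},
    key_columns=["order_id"],
)
print(sql)
# → UPDATE `my_project.my_dataset.orders` SET status = 'shipped', region = 'EU' WHERE order_id = 42
```

If no key columns were specified, the WHERE clause would be empty and the statement would touch every row, which is why the manual selection above is required.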
Google BigQuery Connections
To create a Google BigQuery connection, perform the following steps:
- Click +NEW in the top menu.
- Open the Select Connector page by clicking Connection in the menu on the left.
- In the opened pane, click Google BigQuery.
- On the Connection Editor page, in the Untitled field, specify a connection name that will be used to identify the connection.
- Click Sign In with Google. In the opened window, enter your email or phone number and click Next. Enter your password and click Next again, then click the Allow button.
- Specify the Project Id and DataSet Id to connect to.
- Specify the Cloud Storage Bucket name, which is used for the temporary upload of CSV files when performing Import or Replication to Google BigQuery. The uploaded files are deleted after the corresponding operations finish.
- Optionally, click Advanced Settings and set the Command Timeout interval: the time to wait before terminating an attempt to execute a command and generating an error. Note that this is the time to wait for any server reply after the command is sent to the server; it does not include the time needed to fetch data if the command selects data.
- Here you can also optionally enable legacy SQL syntax by selecting the Use Legacy SQL checkbox.
- In Advanced Settings, you can also disable bulk import by clearing the Use Bulk Import checkbox. This setting affects only import packages with the INSERT, UPDATE, or DELETE operation and with BigQuery as a target.
By default, such packages import data in the following way: Skyvia writes data into multiple temporary CSV files, uploads them to Google Cloud Storage, and then tells Google BigQuery to import data from these CSV files. These actions are performed in parallel, providing the best performance. After the CSV files are imported, they are deleted. However, when data is imported this way, it is not possible to obtain a per-record error log.
If you disable bulk import, Skyvia uses the corresponding DML statements to import data. This allows you to obtain a per-record error log but results in far lower performance. Thus, disabling bulk import for BigQuery is not recommended; consider it only if you need to import a small amount of data and require a per-record error log, or if you do not want to use Google Cloud Storage for temporary files.
- Click the Create Connection button to create the connection.
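The bulk-import flow described above can be sketched as follows. This is a minimal illustration of the staging step only: rows are written to a temporary CSV file, which in the real flow would then be uploaded to the configured Cloud Storage Bucket and loaded by a BigQuery load job (shown as comments here). The file names and row data are made up, and this is not Skyvia's actual code.

```python
import csv
import os
import tempfile

# Hypothetical rows to be imported into a BigQuery target table.
rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]

# Stage the rows into a temporary CSV file, as the bulk-import path does.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(rows)

# In the real flow, roughly:
#   1. upload `path` to the configured Cloud Storage Bucket
#   2. start a BigQuery load job that reads the CSV into the target table
#   3. the load job reports success or failure for the batch as a whole,
#      which is why no per-record error log is available in this mode

with open(path) as f:
    staged = f.read()
os.remove(path)  # mirrors the cleanup performed after the load finishes
print(staged)
```

Because the whole batch is handed to a single load job, errors surface per file rather than per record, which matches the trade-off described above.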
Supported Actions and Actions Specifics
The Google BigQuery connector supports the following actions:
- Execute Command in Source, Lookup, and Target Data Flow components and in Import and Export tasks in the Advanced mode.
- Execute Query in Source Data Flow components and in Import and Export tasks in the Advanced mode.
- Lookup in Lookup Data Flow components.
- Insert in Target Data Flow components.
- Update in Target Data Flow components.
- Delete in Target Data Flow components.
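For the query-based actions above, the SQL dialect depends on the Use Legacy SQL setting described earlier. The most visible difference is how table references are written: legacy SQL uses square brackets with a colon after the project, while standard SQL uses backticks and dots throughout. A minimal illustration with made-up project, dataset, and table names:

```python
# Illustrative names; substitute your own Project Id, DataSet Id, and table.
project, dataset, table = "my_project", "my_dataset", "orders"

# Legacy SQL table reference: [project:dataset.table]
legacy_sql = f"SELECT COUNT(*) FROM [{project}:{dataset}.{table}]"

# Standard SQL table reference: `project.dataset.table`
standard_sql = f"SELECT COUNT(*) FROM `{project}.{dataset}.{table}`"

print(legacy_sql)
print(standard_sql)
```

A query written in one dialect will generally fail in the other, so keep the Use Legacy SQL checkbox consistent with the syntax used in your Execute Query and Execute Command actions.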