Databricks

Databricks is a cloud-based data warehouse. It’s designed to let teams store, process, analyze, and model massive amounts of data in one place.

Establishing Connection

To create a connection to Databricks, select the authentication type and specify the required connection parameters. Skyvia supports the following authentication types for Databricks connection:

To connect to Databricks, enable Unity Catalog in your workspace. As of November 8, 2023, Databricks automatically enables it for new workspaces.

Getting Credentials

Personal Access Token

To connect to Databricks using Personal Access Token, you need to get the token first. Perform the following steps:

  1. In your Databricks workspace, click your Databricks username in the top bar, and then select Settings from the drop down.

  2. Click Developer.

  3. Next to Access tokens, click Manage.

    Access tokens

  4. Click Generate new token.

    Generate new token

  5. Add a comment and click Generate.

  6. Copy the token.

Client ID and Client Secret

Before creating a Skyvia connection to Databricks, register Skyvia as an OAuth 2.0 application in your Databricks Account Console and obtain a Client ID and a Client Secret.

  1. Open Databricks Account Console. Navigate to Settings.

  2. Select the App connections tab. Click Add connection.

    App connections page

  3. Enter any Application Name.

  4. Specify this exact Redirect URL:

    https://app.skyvia.com/oauthcallback/databricks

  5. Choose All APIs in Access scopes.

  6. Select the Client secret checkbox.

  7. Click Add and copy Client ID and Client Secret.

Creating Connection Using Personal Access Token authentication

To connect to Databricks using OAuth 2.0, specify these required parameters:

Personal Access Token Connection

  • Domain - a Databricks Workspace URL.
  • Personal Access Token - a user-generated token from your Databricks account.
  • Warehouse - the name of the Databricks SQL warehouse.
  • Catalog - the Unity Catalog to use as the default context for queries.

Creating Connection Using OAuth 2.0 authentication

To connect to Databricks using OAuth 2.0, specify these required parameters:

OAuth connection

  • Domain - a Databricks Workspace URL.
  • Client ID - a unique identifier assigned to your application.
  • Client Secret - an application’s secret key.
  • Token - an OAuth 2.0 bearer token. Click Sign in with Databricks to obtain it.
  • Warehouse - the name of the Databricks SQL warehouse.
  • Catalog - the Unity Catalog to use as the default context for queries.

Additional Connection Parameters

Schema is a current schema name in the database. Although this parameter is optional, you need to specify it when using replication or import in the Bulk Load mode.

Advanced Settings

Command Timeout

Specifies the wait time before terminating an attempt to execute a command and generating an error. Command Timeout doesn't affect the wait time for data fetching. Change this value if command execution takes too long, and this causes timeout errors.

CHAR(38) As Guid

Specifies whether values in columns of type CHAR(38) should be interpreted as GUIDs.

Use Bulk Import

When selected, Skyvia imports data by creating temporary CSV files, uploading them to the Databricks storage, and then instructing it to load the data from those files. This process runs in parallel for better performance. Once done, the CSV files are deleted. However, this method doesn't provide detailed error logs for individual records.

If you clear this checkbox, Skyvia will use standard SQL commands (INSERT, UPDATE, DELETE) to load data. Disabling bulk import provides a per-record error logs, but significantly reduces integration performance. You can disable it if you're importing a small number of records or need a detailed error log.

This option affects only import integrations with the INSERT, UPDATE and DELETE operation and with Databricks as a target.

Supported Actions

Databricks connector supports the following actions: