Skip to content

Programmatic Registration Reference

This document contains a reference for the programmatic DDL commands for connections and registration. To understand more about registering data sources see Connections Overview.

Connections

Create a Connection

Example: creating a connection to a snowflake database

CREATE DATACONNECTION snowflake_connection CXNPROPERTIES
(
  'connection_type'='JDBC',
  'driver'='snowflake',
  'host'='my-host.com',
  'port'='3306',
  'user'='awsps:///my-username',
  'password'='awsps:///my-password',
  'jdbc.db.name'='OKERA_DEMO',
  'jdbc.schema.name'='MARKETING' -- optional, depends on engine
)

Note: Some data sources are sensitive to case, so please specify your database and schema name in the correct case.

Edit a Connection

Example: Altering the default catalog for a connection. You will need to specify all the necessary parameters to create a connection, amending the ones you want to change.

ALTER DATACONNECTION snowflake_connection CXNPROPERTIES
(
  'connection_type' = 'JDBC',
  'driver' = 'snowflake',
  'host' = 'my-host.com',
  'port'= '3306',
  'user'= 'awsps:///my-username',
  'password' = 'awsps:///my-password',
  'jdbc.db.name' = 'NYTAXI',
  'jdbc.schema.name' = 'MARKETING' -- optional, depends on engine
)

Drop a Connection

Warning

Connections can only be dropped if there are no associated registered tables or crawlers.

DROP DATACONNECTION snowflake_connection;

Crawlers

Create a Crawler

Create an Object Storage Crawler

CREATE CRAWLER `<crawler_name>`
SOURCE '<cloud://file_path_to_crawl>' -- s3://..., abfs://..., gs:// 
CRAWLER_PROPERTIES (
  SINGLE_FILE_DATASETS = true; -- Configuring crawler to recognize single file datasets.
);

Note: Okera supports Azure Blob Filesystem Storage (abfs) dfs URIs (*.dfs.core.windows.net), but does not support blob URIs (*.blob.core.windows.net).

Create a Crawler on a Connection

CREATE CRAWLER `<crawler_name>`
SOURCE DATACONNECTION 'snowflake_connection'
CRAWLER_PROPERTIES (
  'jdbc.db.name'='<source_database>',
  'jdbc.schema.name'='<source_schema>'
);

Drop a Crawler

DROP CRAWLER <crawler_name>;

Programmatically Bulk Load Tables Into an Okera Database

If you wish to bulk load tables from a connection into an Okera Database, you can leverage some quick DDL commands to associate a connection with an Okera Database and load definitions from it.

Note

You will need permissions to CREATE a database and ALTER its properties.

Create a New Okera Database Associated With a Connection

CREATE DATABASE snowflake_db
DBPROPERTIES (
 'okera.connection.name' = 'snowflake_connection'
 );

Connect an Existing Okera Database

ALTER DATABASE snowflake_db SET DBPROPERTIES
(
 'okera.connection.name' = 'snowflake_connection'
);

Bulk Load Table Definitions

After creating the database, you will need to actually load the table definitions for the tables and views you want to bring over from your underlying database connection.

There are few different DDL commands that make it easy to register and alter metadata from database connections:

  • Load table definitions for the first time
  • Add additional table definitions into existing metadata info in Okera catalog.
  • Overwrite entire table definitions from the JDBC data source to Okera catalog.

Specify the Specific Tables

Note: These tables should be in the database specified in your connection. The connection leverages your default schema unless specified otherwise.

Example: Using a default schema

The command takes in a list of JDBC data source tablenames.

ALTER DATABASE snowflake_db LOAD DEFINITIONS (TRANSACTIONS, CUSTOMERS);

Note: Remember some databases expect table names to be case-sensitive.

Example: Load definitions from another schema

This is used if no default schema was specified in the connection, or if you wish to load tables from a different schema in the same database.

ALTER DATABASE <okera_db> LOAD DEFINITIONS (<table_name>)
FROM JDBC DATABASE <underlying_database_name> SCHEMA <underlying_schema_name>

Example:

ALTER DATABASE snowflake_db LOAD DEFINITIONS (CUSTOMER)
FROM JDBC DATABASE OKERA_DEMO SCHEMA CUSTOMER_INFO

Load All Tables From a Connection

ALTER DATABASE snowflake_db LOAD DEFINITIONS ();

Override Existing Table Definitions

Running the commands above repeatedly skips the existing table definitions. However, to refresh the entire table definition again from the JDBC source, use the OVERWRITE option.

ALTER DATABASE snowflake_db LOAD DEFINITIONS () OVERWRITE;

Automatically Refresh a Schema

You can refresh your table definition from the underlying data source using the ALTER TABLE <tableName> RECOVER COLUMNS command. This is specifically useful to add or replace existing columns.

Example: Add a new column to the existing definition

-- automatically refresh schema and add column
ALTER TABLE marketingdb.users RECOVER COLUMNS (country) ADD;
-- new column appears in table definition
DESCRIBE marketingdb.users;
  userid    int
  fname varchar(50)
  lname varchar(50)
  fullname  varchar(101)
  emailid   varchar(150)
  country varchar(100)

To remove existing columns and add specific columns, use the REPLACE option.

Example: Replace an existing column definition with a selective new one

ALTER TABLE marketingdb.users RECOVER COLUMNS (userid) REPLACE;
DESCRIBE marketingdb.users;
  userid    int

Passing an empty list of columns would behave similar to ALTER DATABASE LOAD DEFINITIONS OVERWRITE for that specific table.

Example: Replace existing columns with the ones available at the source

ALTER TABLE marketingdb.users RECOVER COLUMNS () REPLACE;
DESCRIBE marketingdb.users;
  userid    int
  fname varchar(50)
  lname varchar(50)
  fullname  varchar(101)
  emailid   varchar(150)
  country varchar(100)