Skip to content

Programmatic Registration Reference

This document contains a reference for the programmatic DDL commands for connections and registration. To understand more about registering data sources see Connections Overview.

Connections

Create a connection

Example: creating a connection to a snowflake database

CREATE DATACONNECTION snowflake_connection CXNPROPERTIES
(
  'connection_type'='JDBC',
  'driver'='snowflake',
  'host'='my-host.com',
  'port'='3306',
  'user'='awsps:///my-username',
  'password'='awsps:///my-password',
  'jdbc.db.name'='OKERA_DEMO',
  'jdbc.schema.name'='MARKETING' -- optional, depends on engine
)

Note

Some data sources are sensitive to case, so please specify your database and schema name in the correct case.

Edit a connection

Example: Altering the default catalog for a connection. You will need to specify all the necessary parameters to create a connection, amending the ones you want to change.

ALTER DATACONNECTION snowflake_connection CXNPROPERTIES
(
  'connection_type' = 'JDBC',
  'driver' = 'snowflake',
  'host' = 'my-host.com',
  'port'= '3306',
  'user'= 'awsps:///my-username',
  'password' = 'awsps:///my-password',
  'jdbc.db.name' = 'NYTAXI',
  'jdbc.schema.name' = 'MARKETING' -- optional, depends on engine
)

Drop a connection

Warning

Connections can only be dropped if there are no associated registered tables or crawlers.

DROP DATACONNECTION snowflake_connection;

Crawlers

Create a crawler

Create an object storage crawler

CREATE CRAWLER `<crawler_name>`
SOURCE '<cloud://file_path_to_crawl>' -- s3://..., abfs://..., gs:// 
CRAWLER_PROPERTIES (
  SINGLE_FILE_DATASETS = true; -- Configuring crawler to recognize single file datasets.
);

Create a crawler on a connection

CREATE CRAWLER `<crawler_name>`
SOURCE DATACONNECTION 'snowflake_connection'
CRAWLER_PROPERTIES (
  'jdbc.db.name'='<source_database>',
  'jdbc.schema.name'='<source_schema>'
);

Drop crawler

DROP CRAWLER <crawler_name>;

Programmatically Bulk Load Tables into an Okera Database

If you wish to bulk load tables from a connection into an Okera Database, you can leverage some quick DDL commands to associate a connection with an Okera Database and load definitions from it.

Note

You will need permissions to CREATE a database and ALTER its properties.

Create a new Okera database associated with a connection

CREATE DATABASE snowflake_db
DBPROPERTIES (
 'okera.connection.name' = 'snowflake_connection'
 );

Connect an existing Okera database

ALTER DATABASE snowflake_db SET DBPROPERTIES
(
 'okera.connection.name' = 'snowflake_connection'
);

Bulk load table definitions

After creating the database, you will need to actually load the table definitions for the tables and views you want to bring over from your underlying database connection.

There are few different DDL commands that make it easy to register and alter metadata from database connections:

  • Load table definitions for the first time
  • Add additional table definitions into existing metadata info in ODAS catalog.
  • Overwrite entire table definitions from the JDBC data source to ODAS catalog.

Specify the specific tables

Note

These tables should be in the database specified in your connection. The connection will leverage your default schema unless specified otherwise.

Example - using default schema The command takes in a list of JDBC data source tablenames.

ALTER DATABASE snowflake_db LOAD DEFINITIONS (TRANSACTIONS, CUSTOMERS);

Note

Remember some databases expect table names to be case-sensitive.

Example - load definitions from another schema This is used if no default schema was specified in the connection, or if you wish to load tables from a different schema in the same database.

Syntax

ALTER DATABASE <okera_db> LOAD DEFINITIONS (<table_name>)
FROM JDBC DATABASE <underlying_database_name> SCHEMA <underlying_schema_name>

Example

ALTER DATABASE snowflake_db LOAD DEFINITIONS (CUSTOMER)
FROM JDBC DATABASE OKERA_DEMO SCHEMA CUSTOMER_INFO

Load all tables from connection

ALTER DATABASE snowflake_db LOAD DEFINITIONS ();

Override existing table definitions

Running the above commands repeatedly would skip the existing table definitions. However, if you wish to refresh the entire table definition again from the JDBC source, use the OVERWRITE option.

ALTER DATABASE snowflake_db LOAD DEFINITIONS () OVERWRITE;

Automatically refresh a schema

You can refresh your table definition from the underlying data source by using the ALTER TABLE <tableName> RECOVER COLUMNS command. This is specifically useful to add or replace existing columns.

Example: Add a new column to the existing definition

-- automatically refresh schema and add column
ALTER TABLE marketingdb.users RECOVER COLUMNS (country) ADD;
-- new column appears in table definition
DESCRIBE marketingdb.users;
  userid    int
  fname varchar(50)
  lname varchar(50)
  fullname  varchar(101)
  emailid   varchar(150)
  country varchar(100)

To remove existing columns and add specific columns, use the REPLACE option.

Example: Replace an existing column definition with a selective new one

ALTER TABLE marketingdb.users RECOVER COLUMNS (userid) REPLACE;
DESCRIBE marketingdb.users;
  userid    int

Passing an empty list of columns would behave similar to ALTER DATABASE LOAD DEFINITIONS OVERWRITE for that specific table.

Example: Replace existing columns with the ones available at the source

ALTER TABLE marketingdb.users RECOVER COLUMNS () REPLACE;
DESCRIBE marketingdb.users;
  userid    int
  fname varchar(50)
  lname varchar(50)
  fullname  varchar(101)
  emailid   varchar(150)
  country varchar(100)