Programmatic Registration Reference¶
This document contains a reference for the programmatic DDL commands for connections and registration. To understand more about registering data sources see Connections Overview.
Connections¶
Note
Some data sources are sensitive to case, so please specify your database and schema name in the correct case.
Edit a connection¶
Example: Altering the default catalog for a connection. You will need to specify all the necessary parameters to create a connection, amending the ones you want to change.
ALTER DATACONNECTION snowflake_connection CXNPROPERTIES
(
'connection_type' = 'JDBC',
'driver' = 'snowflake',
'host' = 'my-host.com',
'port'= '3306',
'user'= 'awsps:///my-username',
'password' = 'awsps:///my-password',
'jdbc.db.name' = 'NYTAXI',
'jdbc.schema.name' = 'MARKETING' -- optional, depends on engine
)
Drop a connection¶
Warning
Connections can only be dropped if there are no associated registered tables or crawlers.
DROP DATACONNECTION snowflake_connection;
Crawlers¶
Create a crawler¶
Create an object storage crawler¶
CREATE CRAWLER `<crawler_name>`
SOURCE '<cloud://file_path_to_crawl>' -- s3://..., abfs://..., gs://
CRAWLER_PROPERTIES (
SINGLE_FILE_DATASETS = true; -- Configuring crawler to recognize single file datasets.
);
Create a crawler on a connection¶
CREATE CRAWLER `<crawler_name>`
SOURCE DATACONNECTION 'snowflake_connection'
CRAWLER_PROPERTIES (
'jdbc.db.name'='<source_database>',
'jdbc.schema.name'='<source_schema>'
);
Drop crawler¶
DROP CRAWLER <crawler_name>;
Programmatically Bulk Load Tables into an Okera Database¶
If you wish to bulk load tables from a connection into an Okera Database, you can leverage some quick DDL commands to associate a connection with an Okera Database and load definitions from it.
Note
You will need permissions to CREATE
a database and ALTER
its properties.
Create a new Okera database associated with a connection¶
CREATE DATABASE snowflake_db
DBPROPERTIES (
'okera.connection.name' = 'snowflake_connection'
);
Connect an existing Okera database¶
ALTER DATABASE snowflake_db SET DBPROPERTIES
(
'okera.connection.name' = 'snowflake_connection'
);
Bulk load table definitions¶
After creating the database, you will need to actually load the table definitions for the tables and views you want to bring over from your underlying database connection.
There are few different DDL commands that make it easy to register and alter metadata from database connections:
- Load table definitions for the first time
- Add additional table definitions into existing metadata info in ODAS catalog.
- Overwrite entire table definitions from the JDBC data source to ODAS catalog.
Specify the specific tables¶
Note
These tables should be in the database specified in your connection. The connection will leverage your default schema unless specified otherwise.
Example - using default schema The command takes in a list of JDBC data source tablenames.
ALTER DATABASE snowflake_db LOAD DEFINITIONS (TRANSACTIONS, CUSTOMERS);
Note
Remember some databases expect table names to be case-sensitive.
Example - load definitions from another schema This is used if no default schema was specified in the connection, or if you wish to load tables from a different schema in the same database.
Syntax
ALTER DATABASE <okera_db> LOAD DEFINITIONS (<table_name>)
FROM JDBC DATABASE <underlying_database_name> SCHEMA <underlying_schema_name>
Example
ALTER DATABASE snowflake_db LOAD DEFINITIONS (CUSTOMER)
FROM JDBC DATABASE OKERA_DEMO SCHEMA CUSTOMER_INFO
Load all tables from connection¶
ALTER DATABASE snowflake_db LOAD DEFINITIONS ();
Override existing table definitions¶
Running the above commands repeatedly would skip the existing table definitions.
However, if you wish to refresh the entire table definition again from the JDBC source, use the OVERWRITE
option.
ALTER DATABASE snowflake_db LOAD DEFINITIONS () OVERWRITE;
Automatically refresh a schema¶
You can refresh your table definition from the underlying data source by using the ALTER TABLE <tableName> RECOVER COLUMNS
command.
This is specifically useful to add or replace existing columns.
Example: Add a new column to the existing definition
-- automatically refresh schema and add column
ALTER TABLE marketingdb.users RECOVER COLUMNS (country) ADD;
-- new column appears in table definition
DESCRIBE marketingdb.users;
userid int
fname varchar(50)
lname varchar(50)
fullname varchar(101)
emailid varchar(150)
country varchar(100)
To remove existing columns and add specific columns, use the REPLACE
option.
Example: Replace an existing column definition with a selective new one
ALTER TABLE marketingdb.users RECOVER COLUMNS (userid) REPLACE;
DESCRIBE marketingdb.users;
userid int
Passing an empty list of columns would behave similar to ALTER DATABASE LOAD DEFINITIONS OVERWRITE
for that specific table.
Example: Replace existing columns with the ones available at the source
ALTER TABLE marketingdb.users RECOVER COLUMNS () REPLACE;
DESCRIBE marketingdb.users;
userid int
fname varchar(50)
lname varchar(50)
fullname varchar(101)
emailid varchar(150)
country varchar(100)