Connecting to Data Sources¶
This document outlines Okera's support for data sources and how to get started with connecting to data.
Okera supports connecting to these data sources out of the box:
- Object Storage (e.g. S3, ADLS, GS)
- Amazon Redshift
- AWS Athena
- Google BigQuery
- SQL Server
You can also connect to a custom JDBC data source by providing your own JDBC driver.
There are two main steps to get started with JDBC data sources:
- Create a connection to the data source
- Create a crawler to automatically register data from a connection
Who has access to manage connections?¶
| Action | CATALOG (all connections) | DATACONNECTION | DATABASE |
|---|---|---|---|
| Ability to see connections page in UI | ALL, CREATE_DATACONNECTION_AS_OWNER | Any access level | |
| Create connections | ALL, CREATE_DATACONNECTION_AS_OWNER | | |
| Full administrative actions on specified connection | ALL | ALL | |
| Use specified connection for data registration | ALL | ALL, USE | Need ALL or CREATE_AS_OWNER on DATABASE or CATALOG to actually register tables inside a database. |
| Test connections | ALL, CREATE_DATACONNECTION_AS_OWNER | ALL, USE | |
| List connection | ALL | Any access level | |
| View connection details | ALL | ALL, USE | |
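As an illustration only, the access table can be read as a lookup from action to the privileges that suffice at each scope. The sketch below is one interpretation of the table, not Okera's implementation: the action keys, the `ANY` marker, and the `is_allowed` helper are hypothetical names, and the additional DATABASE requirement for registering tables is not modeled.

```python
from typing import Dict, Set

ANY = "*"  # hypothetical marker for "Any access level"

# action -> scope -> privileges that are sufficient at that scope
# (transcribed from the table above; action names are illustrative)
REQUIRED: Dict[str, Dict[str, Set[str]]] = {
    "see_connections_page": {
        "CATALOG": {"ALL", "CREATE_DATACONNECTION_AS_OWNER"},
        "DATACONNECTION": {ANY},
    },
    "create_connection": {
        "CATALOG": {"ALL", "CREATE_DATACONNECTION_AS_OWNER"},
    },
    "administer_connection": {"CATALOG": {"ALL"}, "DATACONNECTION": {"ALL"}},
    "use_for_registration": {"CATALOG": {"ALL"}, "DATACONNECTION": {"ALL", "USE"}},
    "test_connection": {
        "CATALOG": {"ALL", "CREATE_DATACONNECTION_AS_OWNER"},
        "DATACONNECTION": {"ALL", "USE"},
    },
    "list_connection": {"CATALOG": {"ALL"}, "DATACONNECTION": {ANY}},
    "view_connection_details": {"CATALOG": {"ALL"}, "DATACONNECTION": {"ALL", "USE"}},
}


def is_allowed(action: str, grants: Dict[str, Set[str]]) -> bool:
    """Return True if the grants held at any scope are sufficient for the action."""
    for scope, sufficient in REQUIRED[action].items():
        held = grants.get(scope, set())
        if not held:
            continue  # nothing granted at this scope
        if ANY in sufficient or held & sufficient:
            return True
    return False


# A user holding only USE on a connection can test it but cannot create new ones.
user_grants = {"DATACONNECTION": {"USE"}}
print(is_allowed("test_connection", user_grants))
print(is_allowed("create_connection", user_grants))
```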
Create a connection¶
- Click Create Connection on the Connections page in the UI
- Give your connection a name and select your Connection Type
- Enter the relevant connection properties for your data source. For examples of the specific properties for each data source, see the documentation for that data source. The example below uses Snowflake.
Providing secure credentials¶
Sensitive credentials cannot be provided in plain text. Instead, they must be provided through a secrets file, either from a local secret source such as Kubernetes secrets or from a cloud secrets manager service. For auditability, it is recommended that you create a new system user for Okera in your underlying database and use those credentials in your Okera connection. Note that this system user needs read access on your data.
Supported secure credential stores:
- awsps:// - AWS Systems Manager Parameter Store
- awssm:// - AWS Secrets Manager
- azurekv:// - Azure Key Vault
- gcpsm:// - GCP Secret Manager
- file:// - local files (using Kubernetes mounted secrets)
Please ensure you provide the correct permissions for Okera to access your secrets.
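As a rough sketch of how the scheme prefixes above distinguish a secret reference from a plain-text value, the following check maps a credential reference to its store. It is not part of Okera; the `credential_store` helper and the example paths are hypothetical, and only the scheme names come from the list above.

```python
from urllib.parse import urlparse

# Scheme prefixes taken from the list of supported secure credential stores
SUPPORTED_SCHEMES = {
    "awsps": "AWS Systems Manager Parameter Store",
    "awssm": "AWS Secrets Manager",
    "azurekv": "Azure Key Vault",
    "gcpsm": "GCP Secret Manager",
    "file": "local files (Kubernetes mounted secrets)",
}


def credential_store(reference: str) -> str:
    """Return the store a credential reference points at, or raise ValueError."""
    scheme = urlparse(reference).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"{reference!r} does not use a supported secret scheme; "
            "plain-text credentials are not accepted"
        )
    return SUPPORTED_SCHEMES[scheme]


# Hypothetical paths, for illustration only
print(credential_store("awsps:///okera/snowflake/password"))
print(credential_store("file:///etc/secrets/db-password"))
```

A check like this could be run over connection properties before they are submitted, so that a plain-text password is caught early rather than at connection test time.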
- Test your connection to verify the properties are correct
- Click Create connection to finish creating your connection
- Next you can move on to registering data from your connection.
Registering Data from a connection¶
Once you've created a connection you can crawl datasets from that connection and register them inside the Okera catalog. See Crawlers.
Viewing Connection Details¶
You can see connection details by clicking into a connection on the connections page.
Editing your connection¶
Altering a connection will impact access to all tables that have been registered via that connection, and should be done carefully.
You can edit a connection's properties by clicking the Edit Connection button on the connections list. After editing, test your connection to ensure the new properties still work.
Registered datasets from a connection¶
You can see datasets that have been registered from a particular connection by clicking on the Registered Datasets tab on a particular connection.
You will only see datasets in this list that you have access to.
Deleting your connection¶
Dropping a connection will impact the ability to access any tables or crawlers registered from that connection. Therefore, connections can only be dropped if there are no associated tables or crawlers.
You can delete a connection by clicking the Delete Connection button on the connection details page.
If you see the error below, but there are no registered datasets in that connection, please ensure you have dropped all associated crawlers from that connection.
You can search for your connection name on the Registration page to quickly find all associated crawlers.
To see the programmatic DDL commands for registration see Programmatic Registration.
Querying data from Relational Data Sources¶
Data from JDBC data sources can be queried in the same way as data from any other data source. When querying via SQL tools through the Okera Gateway Access Pattern, policy enforcement happens through the Pushdown enforcement pattern to ensure optimum performance. For advanced concepts such as in-line SQL views, read here.
Okera pushes down predicates for JDBC-backed data sources by default.
To disable predicate pushdown for a particular JDBC-backed database or table, you can specify
'jdbc.predicates.pushdown.enabled' = 'false' in the