
OkeraFS Deployment on Databricks (Preview Feature)

Okera provides an authorization layer that intercepts Databricks requests for data in S3 and determines whether the requesting user has access to the files and data involved. If the user has access, the request is passed through to S3 for processing. If not, the request is rejected and returned to the caller without reaching S3.
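To make the allow-or-reject flow concrete, here is a minimal sketch in Python. It is a simplified, hypothetical model and not Okera's implementation. The `Request` type, the `grants` table of per-user S3 path prefixes, and the `handle` function are all illustrative assumptions. Okera evaluates its own access policies, which are richer than prefix matching.

```python
from dataclasses import dataclass

# Hypothetical model of the OkeraFS request flow: intercept, authorize,
# then forward to S3 or reject. Names and grant structure are illustrative.

@dataclass(frozen=True)
class Request:
    user: str
    path: str  # e.g. "s3://example-bucket/sales/part-0000.parquet"


def is_authorized(request: Request, grants: dict[str, list[str]]) -> bool:
    """Return True if a path prefix granted to the user covers the requested path."""
    for prefix in grants.get(request.user, []):
        base = prefix.rstrip("/")
        # Match the prefix itself or anything beneath it, but not sibling
        # paths that merely share a leading string (e.g. "sales_private").
        if request.path == base or request.path.startswith(base + "/"):
            return True
    return False


def handle(request: Request, grants: dict[str, list[str]]) -> str:
    # Authorized requests continue to S3; all others are rejected.
    if is_authorized(request, grants):
        return "forwarded to S3"
    return "rejected"
```

For example, with `grants = {"alice": ["s3://example-bucket/sales/"]}`, a request from `alice` for a file under `sales/` is forwarded, while a request from any other user, or for a path outside that prefix, is rejected.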

Important

OkeraFS on Databricks is file-format dependent. Only Parquet, Delta, and Hive table formats are currently supported; no other file formats are supported.

To enable Okera file access control (OkeraFS) for Databricks in AWS S3 environments, enable the Okera file system driver and enforce path signing by setting the OKERA_ENABLE_OKERA_FS and OKERA_FS_REQUIRE_SIGNED_PATHS environment variables to true. You must also set OKERA_DBX_PATH_SIGN_KEY to reference the Databricks secret that holds the path-signing key. Set all three variables in the Environment Variables section under Clusters -> Advanced Options -> Spark:

OKERA_ENABLE_OKERA_FS=true
OKERA_FS_REQUIRE_SIGNED_PATHS=true
OKERA_DBX_PATH_SIGN_KEY=<path>
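If you want to confirm these settings from a notebook or init script, the following Python sketch checks a mapping of environment variables such as `os.environ`. The helper `okera_env_problems` is hypothetical and not part of Okera or Databricks. It assumes the two flags must be exactly the lowercase string `true`, as shown above, and it checks only that the sign-key variable is non-empty, because it cannot validate the key itself.

```python
import os
from typing import Mapping

# Hypothetical pre-flight check for the OkeraFS environment variables.
REQUIRED_TRUE = ("OKERA_ENABLE_OKERA_FS", "OKERA_FS_REQUIRE_SIGNED_PATHS")
SIGN_KEY_VAR = "OKERA_DBX_PATH_SIGN_KEY"


def okera_env_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the settings look complete."""
    problems = []
    for name in REQUIRED_TRUE:
        value = env.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif value != "true":  # assumes the documented lowercase value
            problems.append(f"{name} should be 'true', got {value!r}")
    # Only presence is checked; the secret's contents cannot be verified here.
    if not env.get(SIGN_KEY_VAR, "").strip():
        problems.append(f"{SIGN_KEY_VAR} is not set")
    return problems


if __name__ == "__main__":
    # Print any problems found in the current process environment.
    for problem in okera_env_problems(os.environ):
        print(problem)
```

Passing a plain dictionary instead of `os.environ` makes it easy to test a proposed configuration before you apply it to a cluster.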

The OKERA_ENABLE_OKERA_FS environment variable installs the Okera file system driver.

The OKERA_FS_REQUIRE_SIGNED_PATHS environment variable requires paths to be signed for authorization purposes.

The OKERA_DBX_PATH_SIGN_KEY environment variable identifies the Databricks secret that holds the sign key used to sign the URLs shared between Databricks and Okera. To create this secret, see Databricks Secrets. Once the secret exists, replace <path> with a reference to it, using the Databricks secret reference syntax {{secrets/<scope>/<key>}}. For example:

OKERA_DBX_PATH_SIGN_KEY={{secrets/your/sign-key}}
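When generating cluster configurations programmatically, a small helper can build and check the secret reference. This Python sketch assumes the Databricks reference form {{secrets/<scope>/<key>}}. The function names and the placeholder scope `your` and key `sign-key` are hypothetical and for illustration only.

```python
import re

# Databricks secret references take the form {{secrets/<scope>/<key>}}.
# Scope and key must be non-empty and may not contain "/" or braces.
SECRET_REF = re.compile(r"^\{\{secrets/([^/{}]+)/([^/{}]+)\}\}$")


def secret_reference(scope: str, key: str) -> str:
    """Build the value to place after OKERA_DBX_PATH_SIGN_KEY=."""
    ref = "{{secrets/%s/%s}}" % (scope, key)
    if not SECRET_REF.match(ref):
        raise ValueError(f"invalid scope or key: {scope!r}, {key!r}")
    return ref


def parse_secret_reference(value: str) -> tuple[str, str]:
    """Split a secret reference back into (scope, key)."""
    match = SECRET_REF.match(value)
    if match is None:
        raise ValueError(f"not a secret reference: {value!r}")
    return match.group(1), match.group(2)
```

For example, `secret_reference("your", "sign-key")` produces `{{secrets/your/sign-key}}`, which you can append to `OKERA_DBX_PATH_SIGN_KEY=` in the cluster's environment variables.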