OkeraFS Deployment on Databricks (Preview Feature)
Okera provides an authorization layer that intercepts Databricks data requests to S3 and determines whether the user has access to the files and data in each request. If the user has access, the request is passed to S3 for processing. If the user does not have access, the request is rejected.
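The allow-or-reject flow described above can be pictured with a small sketch. This is purely illustrative and is not Okera's implementation: the function name `handle_request`, the `grants` table, and the `forward_to_s3` callback are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Set


def handle_request(
    user: str,
    path: str,
    grants: Dict[str, Set[str]],
    forward_to_s3: Callable[[str], bytes],
) -> bytes:
    """Forward the request only if `user` has been granted `path`.

    `grants` is a hypothetical in-memory permission table standing in
    for whatever policy store the real authorization layer consults.
    """
    if path in grants.get(user, set()):
        # Authorized: pass the request through for processing.
        return forward_to_s3(path)
    # Not authorized: reject without ever contacting storage.
    raise PermissionError(f"{user} is not authorized to read {path}")
```

The point of the sketch is only the ordering: the access decision happens before the request reaches storage, so an unauthorized request never touches S3.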
OkeraFS on Databricks is file-format dependent. At this time, only Parquet, Delta, and Hive table file formats are supported. The default port used by OkeraFS is 5010.
To enable Okera file access control (OkeraFS) for Databricks in AWS S3 environments, you must enable the Okera file system driver and enforce path signing using two environment variables: OKERA_ENABLE_OKERA_FS and OKERA_FS_REQUIRE_SIGNED_PATHS. In the Databricks Environment Variables section under Clusters -> Advanced Options -> Spark, set the following environment variables:
OKERA_ENABLE_OKERA_FS=true
OKERA_FS_REQUIRE_SIGNED_PATHS=true
OKERA_DBX_PATH_SIGN_KEY=<path>
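As a quick sanity check, the three settings above could be verified from a Python notebook attached to the cluster. The sketch below assumes that cluster environment variables are visible to the notebook's Python process through `os.environ`. The helper name `check_okera_fs_env` is hypothetical and is not part of Okera or Databricks. It only checks that the variables are present and never prints the sign key value.

```python
import os
from typing import List, Mapping

# Variable names are taken from the settings above; the checks themselves
# are illustrative assumptions, not an official validation procedure.
REQUIRED_TRUE = ("OKERA_ENABLE_OKERA_FS", "OKERA_FS_REQUIRE_SIGNED_PATHS")
REQUIRED_NONEMPTY = ("OKERA_DBX_PATH_SIGN_KEY",)


def check_okera_fs_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means all three look set."""
    problems = []
    for name in REQUIRED_TRUE:
        if env.get(name, "").strip().lower() != "true":
            problems.append(f"{name} should be set to true")
    for name in REQUIRED_NONEMPTY:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    return problems


if __name__ == "__main__":
    # Report only variable names, never the secret value itself.
    for problem in check_okera_fs_env(os.environ):
        print(problem)
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary before running it against a live cluster.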
The OKERA_ENABLE_OKERA_FS environment variable installs the Okera file system driver.
The OKERA_FS_REQUIRE_SIGNED_PATHS environment variable requires paths to be signed for authorization purposes.
The OKERA_DBX_PATH_SIGN_KEY environment variable identifies the location of the Databricks secret sign key used to sign the URLs shared between Databricks and Okera. To set up your Databricks secret sign key, see Databricks Secrets. Once you have defined your secret sign key, specify the path to it in the OKERA_DBX_PATH_SIGN_KEY environment variable. The <path> is usually specified within double braces. For example: