
Okera Version 2.11 Release Notes

This topic provides Release Notes for all 2.11 versions of Okera.

2.11.8 (1/18/2023)

Bug Fixes and Improvements

  • Optimized the performance of Okera's getPartitions() API endpoint, resulting in lower latency and load on the catalog database.
  • Fixed page errors that occurred when permission creation encountered conflicts.

2.11.7 (1/9/2023)

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes and Improvements

  • Fixed a bug where a crawler ignored the default schema specified for an Athena connection.

2.11.6 (12/9/2022)

Athena Connection Changes

When creating or editing an Athena connection, the default source schema field is now optional.

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes and Improvements

  • Fixed an issue where the property okera.external.view in Databricks environments did not always match the value of the cerebro.external.view property.

2.11.5 (11/15/2022)

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes and Improvements

  • Fixed an issue where users were required to have write privileges to view metadata.

2.11.4 (10/23/2022)

Bug Fixes and Improvements

  • Updated the Okera Spark3 connector to address an incompatibility resulting from a Databricks library change in Okera-supported Databricks runtime versions 9.1 LTS and later.

2.11.3 (10/20/2022)

Support for Referencing Amazon S3 Objects in Okera Configuration Parameters in nScale Amazon EMR Deployments

With this release, you can reference objects stored in Amazon S3 in Okera configuration parameters for odas-emr-bootstrap. Okera pulls the referenced objects and mounts them in the nScale container, making them available to Okera for processing. This is helpful, for example, when configuring the SSL certificate and key required to start the OkeraEnsemble Amazon EMR access proxy in TLS/SSL mode:

--external-objects-to-container SSL_CERTIFICATE_FILE=s3://bucket/certificate-object, SSL_KEY_FILE=s3://bucket/key-object

Here, SSL_CERTIFICATE_FILE specifies the Amazon S3 path to the SSL certificate file and SSL_KEY_FILE specifies the Amazon S3 path to the SSL key file. The OkeraEnsemble access proxy can then use these files for any necessary TLS/SSL processing.
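The following Python sketch shows one way to split an --external-objects-to-container value like the one above into name/path pairs. It is a hypothetical illustration: the parse_external_objects helper, the comma separator, and the requirement that every value be an s3:// path are assumptions, not the actual odas-emr-bootstrap parser.

```python
def parse_external_objects(spec: str) -> dict:
    """Split 'NAME=s3://bucket/key, NAME2=s3://...' into {NAME: s3_path}.

    Hypothetical helper: the comma separator, whitespace trimming, and the
    s3:// requirement are assumptions, not odas-emr-bootstrap's real parser.
    """
    objects = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        name, sep, path = entry.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=s3://..., got {entry!r}")
        if not path.startswith("s3://"):
            raise ValueError(f"{name} must reference an s3:// path, got {path!r}")
        objects[name] = path
    return objects


if __name__ == "__main__":
    spec = ("SSL_CERTIFICATE_FILE=s3://bucket/certificate-object, "
            "SSL_KEY_FILE=s3://bucket/key-object")
    for name, path in parse_external_objects(spec).items():
        print(f"{name} -> {path}")
```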

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes

The following bugs were fixed in this release:

  • Fixed an issue where some sensitive environment variables were not always redacted in logs.
  • You no longer need to specify the transport protocol (http:// or https://) in the --rest-server-hostports option of the odas-emr-bootstrap script.

2.11.2 (9/23/2022)

Blocking Access to the Okera UI for Tablets

This release introduces the ability to block access to the Okera UI on tablets. The configuration parameter, BLOCK_WEB_UI_FOR_MOBILE_CLIENTS, controls this behavior. Valid values for this parameter are true (the UI is blocked on mobile devices and tablets) and false (the UI is not blocked on mobile devices and tablets). The default is false.

Updated Input to LDAP Filtering

This release updates Okera's input for LDAP filtering. Two new configuration parameters let you specify separate base DNs (distinguished names) for users and for groups, which Okera uses when searching the LDAP server during authentication processing.

  • Use GROUP_RESOLVER_LDAP_USER_BASE_DN to specify the base DN for users.
  • Use GROUP_RESOLVER_LDAP_GROUP_BASE_DN to specify the base DN for groups.

See Okera Configuration Parameter Reference for a complete list of the configuration parameters available to you for Okera configurations.

OkeraEnsemble nScale Amazon S3 Bucket Access Enhancements

This release enhances OkeraEnsemble nScale mode deployment in Amazon EMR Spark environments by supporting access control both for Amazon S3 buckets defined for Amazon's assume secondary role feature and, optionally, for buckets that the Okera cluster can access through its IAM permissions.

Because nScale deploys the OkeraEnsemble access proxy with least-privilege access to Amazon EMR, the proxy has no IAM permissions of its own; it retrieves the credentials it uses to sign Amazon S3 requests from the Okera Policy Engine (Planner). Consequently, when you deploy OkeraEnsemble in nScale mode, you must provide access to Amazon S3 buckets using either of two methods:

  1. Using Amazon S3's assume secondary role feature. For Amazon S3 buckets that use assume secondary roles (bucket role map), the OkeraEnsemble access proxy retrieves the AWS Security Token Service (STS) credentials associated with the Amazon Resource Name (ARN) for the Amazon S3 bucket.

  2. By setting the new OKERA_SYSTEM_IAM_ROLE_ARN configuration parameter in the Okera configuration file to the Amazon Resource Name (ARN) of the IAM role associated with the Okera cluster. When this parameter is set, Okera can grant OkeraEnsemble nScale users access to buckets that the Okera cluster can access through its IAM role.

For more information about OkeraEnsemble nScale mode deployment in Amazon EMR environments, see OkeraEnsemble nScale Mode Deployment in Amazon EMR Environments.

OkeraEnsemble nScale System Token Duration Controls

You can now specify the duration, in minutes, of the JWT system token used for OkeraEnsemble nScale processing. Set the new SYSTEM_TOKEN_DURATION_MIN environment variable on the nScale container using the Okera Amazon EMR odas-emr-bootstrap script. Valid values are positive integers; the default is one day (1440 minutes). For example, passing the following argument to the odas-emr-bootstrap.sh script sets the system token duration to 300 minutes:

--local-worker-env-vars "-e SYSTEM_TOKEN_DURATION_MIN=300"

This setting takes effect only when the nScale proxy is configured with JWT_PRIVATE_KEY, not with SYSTEM_TOKEN. When configured with JWT_PRIVATE_KEY, the nScale access proxy generates its own token, and SYSTEM_TOKEN_DURATION_MIN determines how long that token remains valid. When configured with SYSTEM_TOKEN, SYSTEM_TOKEN_DURATION_MIN has no effect, because the JWT identified by the SYSTEM_TOKEN path includes an embedded expiration time that the setting cannot override. If both JWT_PRIVATE_KEY and SYSTEM_TOKEN are specified, JWT_PRIVATE_KEY is used and SYSTEM_TOKEN is ignored.
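As a sketch of these precedence rules, the Python below models which token source the nScale proxy uses and whether SYSTEM_TOKEN_DURATION_MIN applies. The plan_system_token helper, the TokenPlan type, and the file paths in the demo are hypothetical; the code only restates the documented behavior and is not Okera code.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Default from the release notes: one day.
DEFAULT_SYSTEM_TOKEN_DURATION_MIN = 1440


@dataclass(frozen=True)
class TokenPlan:
    source: str                  # "JWT_PRIVATE_KEY" or "SYSTEM_TOKEN"
    duration_min: Optional[int]  # None: expiry comes from the supplied JWT


def plan_system_token(env: Mapping[str, str]) -> TokenPlan:
    """Model the documented precedence rules (hypothetical helper, not Okera code)."""
    if env.get("JWT_PRIVATE_KEY"):
        # The proxy mints its own token, so SYSTEM_TOKEN_DURATION_MIN applies
        # and any SYSTEM_TOKEN setting is ignored.
        duration = int(env.get("SYSTEM_TOKEN_DURATION_MIN",
                               str(DEFAULT_SYSTEM_TOKEN_DURATION_MIN)))
        if duration <= 0:
            raise ValueError("SYSTEM_TOKEN_DURATION_MIN must be a positive integer")
        return TokenPlan("JWT_PRIVATE_KEY", duration)
    if env.get("SYSTEM_TOKEN"):
        # Pre-issued JWT: its embedded expiry wins; the duration setting has no effect.
        return TokenPlan("SYSTEM_TOKEN", None)
    raise ValueError("set JWT_PRIVATE_KEY or SYSTEM_TOKEN")


if __name__ == "__main__":
    # Hypothetical paths; both sources set, so JWT_PRIVATE_KEY takes precedence.
    print(plan_system_token({"JWT_PRIVATE_KEY": "/etc/okera/jwt.key",
                             "SYSTEM_TOKEN": "/etc/okera/token",
                             "SYSTEM_TOKEN_DURATION_MIN": "300"}))
```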

Security Vulnerabilities (CVEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes

The following bugs were fixed in this release:

  • The Okera 2.11.0 fix that caused SHOW CREATE TABLE to return its output in string format instead of tabular format has been reverted.
  • Upgraded Okera's version of Jackson to 2.13.3.

  • Upgraded Okera's base Alpine version to 3.15.6.

  • The policy synchronization enforcement mechanism used for Snowflake connections is no longer enabled by default. You must enable it manually using the POLICY_SYNC_SCHEDULER_ENABLED configuration parameter or the okera.policy_sync.enabled advanced parameter in your Snowflake connection.

  • Upgraded Okera's version of Apache Shiro to 1.9.1.

  • Upgraded Okera's version of OpenJDK to 8.345.01-r0.

  • Fixed an error that occurred when querying Athena tables using JDBC pushdown processing. The error received was: An error has been thrown from the AWS Athena client. 1 validation error detected: Value '' at 'queryExecutionContext.catalog' failed to satisfy constraint: Member must have length greater than or equal to 1 [Execution ID not available].

2.11.1

Okera Version 2.11.1 was never distributed. Its updates were rolled into Okera 2.11.2.

2.11.0 (8/11/2022)

Snowflake Policy Synchronization Changes

This release introduces several changes to Okera's Snowflake policy synchronization enforcement.

  • Policy synchronization now supports all access levels that both Okera and Snowflake support. In past releases, only SELECT access was supported; this release adds support for ALL, INSERT, DELETE, and UPDATE access. For more information about Snowflake policy synchronization, see Policy Synchronization Enforcement Overview.
  • The Snowflake connection dialog in the UI has been updated in this release. Users are now required to choose one of the following user options for policy synchronization when they set up a Snowflake connection in the UI:

    1. They can select a checkbox indicating that synchronization should occur for all users.

    2. They can specify a comma-separated list of users or a tag in a provided entry box.

    You should no longer specify the okera.policy_sync.user_allowed_list advanced connection property in the Advanced properties box in the UI dialog. The list is now managed by the new checkbox and entry box. However, you can continue to use the property when setting up a Snowflake connection using the API.

  • The connection details for Snowflake connections (Connection Details tab for a connection) now more closely match the details provided for other connections.

  • Instructions and a sample script are now provided for creating a tag in Snowflake for Okera policy synchronization and applying it to your Snowflake user definitions. See Tag Users in Snowflake.

For complete information about Snowflake policy synchronization see Policy Synchronization Enforcement Overview. For information about setting up a Snowflake connection, see Create a Snowflake Connection.

Tag Restrictions

This release introduces restrictions for tagging.

  • Users who do not have permissions to create tags can no longer see the button for creating tags on the Tags page in the UI.

  • Users who do not have permissions to create tag namespaces can no longer create them on the Create new tag dialog.

  • Users who do not have permissions to remove tags can no longer see the option to remove tags on the Tags page in the UI.

  • When creating or removing a tag, users can only select the namespaces for which they have privileges.

See Managing Tags for more information about tags.

Deleting Databases From the UI

This release introduces the ability to delete an Okera database in the UI. For more information, see Delete a Database.

Workspace and Preview Changes in the UI

This release introduces the following changes to the Workspace page and to the dataset preview pages, which are available both for datasets registered to a database and for a crawler's dataset details on the Registration page.

  • The Workspace page, when accessed from a dataset details page, now defaults to using the Presto API.
  • The dataset previews now default to using the Presto API for the preview queries.

In past releases, the Okera API was used.

Blocking Access to the Okera UI for Mobile Devices

This release introduces the ability to block access to the Okera UI on mobile devices. A new configuration parameter, BLOCK_WEB_UI_FOR_MOBILE_CLIENTS, has been introduced to control this behavior. Valid values for this parameter are true (the UI is blocked on mobile devices) and false (the UI is not blocked on mobile devices). The default is false.

OkeraEnsemble Amazon EMR nScale Mode Deployment

With this release, you can deploy the OkeraEnsemble access proxy in nScale mode in Amazon EMR environments, so the OkeraEnsemble workload is distributed across your cluster nodes and scales up and down with your clusters. In this mode, the OkeraEnsemble access proxy retrieves AWS credentials from the Okera Policy Engine (Planner). To communicate with the Okera cluster, the access proxy generates its own system token if it is configured with the JWT private key used by the Okera cluster (via the JWT_PRIVATE_KEY configuration property). The odas-emr-bootstrap.sh script does this for you when you run it with the --install-jwt-key argument, specifying the Amazon S3 path to the key.

For more information, see OkeraEnsemble nScale Mode Deployment in Amazon EMR Environments.

Databricks 10 and 11 Support

This release introduces support for Databricks 10.0 through 10.5 and 11.0. In past versions, Okera only supported versions 8.3, 8.4, 9.0 and 9.1. For more information, see Databricks Integration Steps.

Note: With this release, Okera drops support for Databricks 7.3.

Sample GCP Group Resolution Script

This release introduces a sample script to resolve groups in Okera when using Google Cloud Platform (GCP). The script requires that the following configuration parameters be specified in the Okera configuration file.

  • Parameter GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS must provide the fully qualified path to a credentials JSON file for a GCP service account with appropriate admin privileges. The path can be a container path (the JSON file is mounted to the container by Kubernetes), an Amazon S3 (s3:) path, or an ADLS (adl:) path.

  • Parameter GSUITE_GROUP_ADMIN_EMAIL must provide the email of a GCP user with appropriate administrative privileges.

  • Parameter GROUP_RESOLVER_SCRIPTS must specify the fully qualified path /opt/scripts/resolve_groups_gcp_example.py.

When all configuration parameters are specified correctly, GCP group resolution is performed for Okera. See Sample GCP Group Resolution Script.
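Before deploying, a quick pre-flight check of these three parameters can catch typos. The Python below is a hypothetical sketch, not part of Okera: it treats container paths as absolute paths, applies only a minimal email check, and assumes GROUP_RESOLVER_SCRIPTS holds exactly the sample script path. The bucket and email values in the demo are placeholders.

```python
REQUIRED_SCRIPT = "/opt/scripts/resolve_groups_gcp_example.py"


def gcp_group_resolution_problems(config: dict) -> list:
    """Return human-readable problems with the GCP group-resolution settings.

    Hypothetical pre-flight check; the path and email tests are simplified.
    """
    problems = []
    creds = config.get("GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS", "")
    if not creds:
        problems.append("GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not (creds.startswith("/") or creds.startswith(("s3:", "adl:"))):
        # Container paths are assumed to be absolute paths.
        problems.append("credentials path must be a container, s3:, or adl: path")
    if "@" not in config.get("GSUITE_GROUP_ADMIN_EMAIL", ""):
        problems.append("GSUITE_GROUP_ADMIN_EMAIL must be an email address")
    if config.get("GROUP_RESOLVER_SCRIPTS") != REQUIRED_SCRIPT:
        problems.append(f"GROUP_RESOLVER_SCRIPTS must be {REQUIRED_SCRIPT}")
    return problems


if __name__ == "__main__":
    example = {
        "GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS": "s3://my-bucket/gcp-sa.json",
        "GSUITE_GROUP_ADMIN_EMAIL": "admin@example.com",
        "GROUP_RESOLVER_SCRIPTS": REQUIRED_SCRIPT,
    }
    print(gcp_group_resolution_problems(example) or "configuration looks complete")
```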

Configuring Parquet File Resolution Types

Table property parquet.resolve-by.type can now be used to configure, per table, how columns in the table's Parquet data files are resolved against the table schema. Valid values are ordinal (positional resolution) and name (name resolution). In past releases, resolution was configured only globally, and by default columns were resolved by name.

For example:

  ALTER TABLE nation SET TBLPROPERTIES('parquet.resolve-by.type'='name')
  ALTER TABLE nation SET TBLPROPERTIES('parquet.resolve-by.type'='ordinal')
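To illustrate the difference between the two values, the Python sketch below maps a row read from a Parquet file onto a table schema whose column order differs from the file's. It is a simplified model rather than Okera's reader: it assumes exact, case-sensitive name matching and returns None (NULL) for columns it cannot find.

```python
from typing import Sequence


def resolve_row(table_columns: Sequence[str], file_columns: Sequence[str],
                file_row: Sequence[object], resolve_by: str = "name") -> dict:
    """Map one row read from a Parquet file onto the table schema.

    Simplified model of 'name' vs 'ordinal' resolution; missing columns
    become None (SQL NULL). Name matching here is exact (an assumption).
    """
    if resolve_by == "ordinal":
        # Positional: the i-th table column reads the i-th file column.
        return {col: file_row[i] if i < len(file_row) else None
                for i, col in enumerate(table_columns)}
    if resolve_by == "name":
        # By name: look each table column up in the file's own column list.
        position = {name: i for i, name in enumerate(file_columns)}
        return {col: file_row[position[col]] if col in position else None
                for col in table_columns}
    raise ValueError("resolve_by must be 'name' or 'ordinal'")


if __name__ == "__main__":
    table = ["n_nationkey", "n_name"]
    file_cols = ["n_name", "n_nationkey"]  # file written with a different column order
    row = ["ALGERIA", 0]
    print(resolve_row(table, file_cols, row, "name"))
    print(resolve_row(table, file_cols, row, "ordinal"))
```

With the table defined as (n_nationkey, n_name) and the file written as (n_name, n_nationkey), name resolution returns the values under the right columns, while ordinal resolution swaps them.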

AWS Athena Upgrade and Performance Improvements

This release upgrades Okera to use the AWS Athena 2.0.30 JDBC driver. With this upgrade, Okera no longer provides the Athena JDBC JAR file in the Maven repository, so you must download it from https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html. Okera does not need the AWS SDK bundled with the driver, so download the version without the AWS SDK. In addition, Okera connections to Athena now require the path to the JDBC JAR file and its class name, specified in the driver.jar.path and driver.class.name connection properties. If you create an Athena connection in the Okera UI, specify these properties in the Driver file path and Driver class name fields. See Athena Data Source Connections.

Starting with version 2.0.5 of the Athena JDBC driver, the connector uses the result set streaming API to improve performance when fetching query results. To use this feature:

  1. Include and allow the athena:GetQueryResultsStream action in your IAM policy statement. For details on managing Athena IAM policies, see https://docs.aws.amazon.com/athena/latest/ug/security-iam-athena.html.

  2. If you are connecting to Athena through a proxy server, make sure that the proxy server does not block port 444. The result set streaming API uses port 444 on the Athena server for outbound communications.
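For step 1, a minimal policy statement that allows the streaming action might be generated as follows. This is a sketch: the Sid value and the wildcard Resource are placeholder assumptions, and the statement is meant to be added alongside the Athena permissions your policy already grants.

```python
import json

# Statement granting the streaming action used by the Athena JDBC driver.
# "Resource": "*" is a placeholder assumption; narrow it to your workgroup
# ARNs, and keep whatever other Athena actions your policy already allows.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaResultStreaming",  # hypothetical identifier
            "Effect": "Allow",
            "Action": ["athena:GetQueryResultsStream"],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```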

Glue Enhancements

The following changes were made in this release to Okera's integration with AWS Glue:

  • A new configuration parameter, OKERA_GLUE_SILENCE_TBL_PAGINATOR_500, has been introduced. It controls whether Okera silences unknown Glue errors that can affect the dataset counts on the Data page in the Okera UI. Valid values are true (silence the errors) and false (don't silence the errors). The default is false. You only need to set this parameter if you receive an InternalServiceException 500 from Glue when opening the Data page in the Okera UI.

  • Additional log messages have been added to aid debugging in Glue environments.

For more information about Okera's integration with AWS Glue, see Using Glue as a Third-Party Metadata Catalog.

SSL/TLS Enabled for the Okera Catalog

This release introduces configurable SSL and TLS support for Okera MySQL catalog databases and enhanced SSL support for Okera Postgres catalog databases. In past releases, Okera provided only basic, non-configurable SSL support for Postgres catalogs and did not support TLS for either MySQL or Postgres catalogs. To enable this configurable encryption, set the CATALOG_DB_SSL parameter to "true" (the default is "false") in the Okera configuration file and set the following new base64-encoded Okera parameters:

  • CATALOG_DB_SERVER_CERT: Specifies the SSL/TLS certificate for the MySQL or Postgres catalog database server.
  • CATALOG_DB_CLIENT_CERT: Specifies the TLS certificate for the MySQL catalog database client. This parameter is only needed for TLS support.
  • CATALOG_DB_CLIENT_CERT_KEY: Specifies the private key for the MySQL catalog client TLS certificate. This parameter is only needed for TLS support.

Okera can determine which protocol (SSL or TLS) to use based on the certificates provided.
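Because the certificate parameters are base64-encoded, a small helper can produce their values from PEM file contents. This Python sketch is hypothetical: catalog_ssl_settings and the server-ca.pem file name are illustrative, and printing the results as KEY: value lines assumes the same configuration-file layout as the RESTRICTED_UDFS example in these notes.

```python
import base64
from typing import Dict, Optional


def b64(data: bytes) -> str:
    """Base64-encode bytes into a single-line ASCII string."""
    return base64.b64encode(data).decode("ascii")


def catalog_ssl_settings(server_cert: bytes,
                         client_cert: Optional[bytes] = None,
                         client_key: Optional[bytes] = None) -> Dict[str, str]:
    """Build the catalog SSL/TLS parameters from PEM file contents.

    Hypothetical helper. The client certificate and key are only needed
    for TLS with MySQL catalogs, and must be supplied together.
    """
    if (client_cert is None) != (client_key is None):
        raise ValueError("provide the client certificate and its key together")
    settings = {"CATALOG_DB_SSL": "true", "CATALOG_DB_SERVER_CERT": b64(server_cert)}
    if client_cert is not None and client_key is not None:
        settings["CATALOG_DB_CLIENT_CERT"] = b64(client_cert)
        settings["CATALOG_DB_CLIENT_CERT_KEY"] = b64(client_key)
    return settings


if __name__ == "__main__":
    from pathlib import Path
    # Hypothetical file name; point this at your own server certificate.
    server = Path("server-ca.pem")
    if server.exists():
        for key, value in catalog_ssl_settings(server.read_bytes()).items():
            print(f"{key}: {value}")
```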

Notes: This change only affects Okera connections to its MySQL or Postgres catalog; it does not add configurable SSL/TLS support elsewhere in the Okera cluster.

Okera only supports TLS for MySQL catalogs at this time. It does not support Cloud SQL Auth proxy functionality.

For more information, see Configure SSL/TLS for Okera Metadata Storage.

Active/Active In-Parallel Policy Loading

This release introduces the ability to load Okera policies in parallel when active/active environments start up. This speeds up service start time, particularly for slower RDBMS environments or environments in which many roles must be loaded. Okera uses two thread pools for active/active in-parallel policy loading: one for the initial load and one for loading in the background. By default, 12 roles are loaded in parallel during the initial load and 2 roles are loaded in parallel in the background. Two new configuration parameters control these settings:

  • SENTRY_INITIAL_LOAD_THREADS can be used to override the initial in-parallel load default of 12 roles. Specify the number of roles that should be loaded in parallel when an active/active environment is initially started.
  • SENTRY_BACKGROUND_LOAD_THREADS can be used to override the background in-parallel load default of 2 roles. Specify the number of roles that should be loaded in parallel in the background of an active/active environment.
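The two-pool design can be sketched with Python's concurrent.futures. This illustrates bounded parallel loading under the documented defaults and is not Okera's implementation: load_roles, thread_count, and the stand-in loader are hypothetical, and how Okera splits roles between the initial and background pools is not described here.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# Defaults stated in the release notes.
DEFAULT_INITIAL_LOAD_THREADS = 12
DEFAULT_BACKGROUND_LOAD_THREADS = 2


def thread_count(env_var: str, default: int) -> int:
    """Read a positive thread count from the environment, falling back to default."""
    value = int(os.environ.get(env_var, default))
    if value < 1:
        raise ValueError(f"{env_var} must be at least 1")
    return value


def load_roles(roles: Iterable[str], load_role: Callable[[str], dict],
               threads: int) -> Dict[str, dict]:
    """Load each role's policies on a thread pool bounded to `threads` workers."""
    role_list = list(roles)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so they line up with role_list.
        return dict(zip(role_list, pool.map(load_role, role_list)))


def fake_load(role: str) -> dict:
    """Hypothetical stand-in for reading one role's grants from the catalog."""
    return {"role": role, "grants": []}


if __name__ == "__main__":
    startup = load_roles(["analyst", "steward", "admin"], fake_load,
                         thread_count("SENTRY_INITIAL_LOAD_THREADS",
                                      DEFAULT_INITIAL_LOAD_THREADS))
    print(sorted(startup))
```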

For more information about active/active environments, see Active/Active Deployment in Aurora RDS. For more information about Okera configuration parameters, see Configuration and Okera Configuration Parameter Reference.

Restricting Use of Privacy Functions

This release introduces the ability to restrict use of Okera's privacy functions and user-defined functions (UDFs) to Okera administrators. To activate this feature, add the RESTRICTED_UDFS configuration parameter to your Okera configuration file, set to a comma-separated list of function names. Using any function in the list requires administrator privileges. In the following example, only administrators can use the aes_decrypt and nfp_ref_tokenize privacy functions.

RESTRICTED_UDFS: aes_decrypt,nfp_ref_tokenize
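A minimal Python sketch of the check this parameter implies: parse the comma-separated list, then allow a call only if the user is an administrator or the function is not listed. The helper names are hypothetical, and case-insensitive matching of function names is an assumption.

```python
from typing import FrozenSet


def parse_restricted_udfs(value: str) -> FrozenSet[str]:
    """Parse a RESTRICTED_UDFS value such as 'aes_decrypt,nfp_ref_tokenize'."""
    # Case-insensitive matching is an assumption made for this sketch.
    return frozenset(name.strip().lower() for name in value.split(",") if name.strip())


def may_call(function_name: str, is_admin: bool, restricted: FrozenSet[str]) -> bool:
    """Admins may call anything; other users may not call restricted functions."""
    return is_admin or function_name.lower() not in restricted


if __name__ == "__main__":
    restricted = parse_restricted_udfs("aes_decrypt,nfp_ref_tokenize")
    print(may_call("aes_decrypt", is_admin=False, restricted=restricted))
    print(may_call("mask", is_admin=False, restricted=restricted))
```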

For complete information about the privacy functions supported by Okera, see Privacy and Security Functions.

Dropping Attributes From Nested Fields

This release introduces the ability to drop attributes from nested fields. See Nested Field Tags.

Transformation Priority Defaults

This release introduces default transformation priorities, which determine the outcome when more than one transformation is applied to a single column. The default priority order, from lowest to highest, is:

  1. null
  2. zero
  3. sha2
  4. hash
  5. fnv_hash
  6. aes_decrypt
  7. aes_encrypt
  8. tokenize
  9. fp_ref_tokenize
  10. nfp_ref_tokenize
  11. mask
  12. mask_ccn
  13. diff_privacy
  14. phi_age
  15. phi_date
  16. phi_dob
  17. phi_zip3
  18. fp_random
  19. nfp_random
  20. random_ccn

Transformations later in this list have higher priority and override earlier, lower-priority transformations (for example, mask_ccn overrides zero). However, you can specify your own prioritization. See Prioritization of Transformations.
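The default ordering can be expressed as a lookup in which a later list position wins. This Python sketch (winning_transformation is a hypothetical name) restates the list above and accepts a custom ordering to mimic user-defined prioritization; it does not model how Okera stores custom priorities.

```python
from typing import Sequence

# Default order from lowest to highest priority, as listed in these notes.
DEFAULT_PRIORITY = (
    "null", "zero", "sha2", "hash", "fnv_hash", "aes_decrypt", "aes_encrypt",
    "tokenize", "fp_ref_tokenize", "nfp_ref_tokenize", "mask", "mask_ccn",
    "diff_privacy", "phi_age", "phi_date", "phi_dob", "phi_zip3",
    "fp_random", "nfp_random", "random_ccn",
)


def winning_transformation(applied: Sequence[str],
                           priority: Sequence[str] = DEFAULT_PRIORITY) -> str:
    """Return the transformation that takes effect when several target one column."""
    if not applied:
        raise ValueError("no transformations applied")
    rank = {name: i for i, name in enumerate(priority)}
    unknown = [name for name in applied if name not in rank]
    if unknown:
        raise ValueError(f"no priority defined for: {unknown}")
    # A later position in the priority list means higher priority.
    return max(applied, key=rank.__getitem__)


if __name__ == "__main__":
    print(winning_transformation(["zero", "mask_ccn"]))
```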

Native Delta Lake Table Support (Preview Feature)

This release introduces manifest-less native support for files in Delta Lake tables as an Okera preview feature. Previously, Okera could read Delta Lake tables only if a manifest was explicitly created. With native support, a manifest is no longer needed. If you currently use the manifest method, Okera recommends switching to native support.

Native support is disabled by default, but can be enabled for individual Delta Lake tables or databases by specifying okera.delta.native-support=true as a table or database property. You can also enable it for the entire Okera cluster using the new DELTA_TABLE_NATIVE_SUPPORT configuration parameter in the Okera configuration file. Valid values for these properties are true (use native support, not manifest support) and false (use manifest support, not native support). The default is currently false for the cluster, but will be changed to true in a future release.
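The following Python sketch models the decision for a single table. Note the assumption: these notes do not state how table, database, and cluster settings interact, so the sketch assumes the most specific setting wins (table, then database, then DELTA_TABLE_NATIVE_SUPPORT), falling back to the current default of false. uses_native_delta is a hypothetical helper.

```python
from typing import Mapping, Optional

NATIVE_SUPPORT_PROPERTY = "okera.delta.native-support"


def _as_bool(value: Optional[str]) -> Optional[bool]:
    """Parse 'true'/'false'; None means 'not set at this level'."""
    if value is None:
        return None
    lowered = value.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected true or false, got {value!r}")
    return lowered == "true"


def uses_native_delta(table_props: Mapping[str, str], db_props: Mapping[str, str],
                      cluster_setting: Optional[str] = None) -> bool:
    """Decide between native and manifest-based reads for one Delta Lake table.

    Assumption: the most specific setting wins (table, then database, then the
    cluster-wide DELTA_TABLE_NATIVE_SUPPORT value).
    """
    for candidate in (table_props.get(NATIVE_SUPPORT_PROPERTY),
                      db_props.get(NATIVE_SUPPORT_PROPERTY),
                      cluster_setting):
        decided = _as_bool(candidate)
        if decided is not None:
            return decided
    return False  # current cluster default per the release notes


if __name__ == "__main__":
    print(uses_native_delta({}, {NATIVE_SUPPORT_PROPERTY: "true"}))
```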

Note: Okera currently only supports querying the latest snapshot of a Delta Lake table.

For more information about Delta Lake file support, see Databricks Delta Lake Table Support.

Notable Changes

  • This release drops Okera's support for Kubernetes v1beta1. Okera now only supports Kubernetes v1. The v1beta1 version is deprecated and should no longer be used. This change affects okctl and the Helm charts used by your Okera clusters. The change was necessary because without it, Okera cannot support newer Kubernetes clusters. However, because of this change, Okera cannot support clusters older than 2017. If your cluster uses Kubernetes v1beta1 or is older than 2017, please upgrade your Okera environment or contact Okera for assistance.
  • The authorize-query REST server API now requires that authorize_for be set to your user name when you submit an Okera query unless you are an Okera admin. Okera admins do not need to specify authorize_for. See Okera Policy Engine Integration.

  • This release drops support for Databricks 7.3.

  • Cloudera CDH is no longer supported in Okera.

  • Amazon EMR versions earlier than 5.24 are no longer supported in Okera.
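The authorize_for rule for the authorize-query API can be restated as a small check. This Python sketch is hypothetical and only models the rule as described; it does not reproduce the authorize-query request format or Okera's server-side validation.

```python
from typing import Optional


def check_authorize_for(requester: str, is_admin: bool,
                        authorize_for: Optional[str]) -> None:
    """Apply the authorize_for rule (hypothetical helper, not Okera code).

    Non-admins must set authorize_for to their own user name; admins may omit it.
    """
    if is_admin:
        return
    if authorize_for is None:
        raise PermissionError("authorize_for is required for non-admin users")
    if authorize_for != requester:
        raise PermissionError("authorize_for must match your own user name")


if __name__ == "__main__":
    check_authorize_for("alice", is_admin=False, authorize_for="alice")
    print("request accepted")
```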

Security Vulnerabilities (CVEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes

  • Okera's base Ubuntu image has been upgraded to bionic-20220801.

  • Audit log entries are now added for all CREATE_AS_OWNER implied operations.

  • Upgraded the base Alpine image used by Okera to 3.15.5.
  • Fixed an issue in which Okera failed to start when using an Azure database for Postgres.
  • Tooltips in the UI now have an updated, more legible, look.
  • Resolved a problem in which dataset previews for BigQuery tables failed because the row limit was not applied to the preview queries, which consequently produced very large result sets.

  • Fixed the CSP headers for the REST API documentation.

  • Cloned tables with applied policies in Snowflake no longer break policy synchronization when the original table used for the clone is removed.
  • Upgraded Scala in recordservice-spark-2.0.jar to version 2.11.12.
  • Fixed an issue in which database passwords with special characters were not properly encoded when establishing a connection.
  • Corrected a problem with string data when using Avro complex data types for certain unnesting queries.
  • Users who do not have permissions to create Okera databases now receive an authorization error if they attempt to create a database that already exists.
  • Fixed an issue in which the Presto configuration was written with mismatched closing tags.
  • The Snowflake connection synchronization details tab now provides a dropdown link you can select to show the last error message stored during connection synchronization.
  • Fixed a bug in which users without the correct privileges (for example, granted only SELECT privileges) could add groups to a role using the REST API. This is no longer possible without the correct privileges.

  • Snowflake connection details have been reordered and headings have changed in the Web UI.

  • Fixed a bug in which users without the correct permissions were able to update dataset and dataset column descriptions directly using the REST API.
  • Snowflake pushdown processing now supports queries using column tags (tag-based row filtering).
  • A Content-Security-Policy header is now applied to the REST server resources used by the Okera UI.
  • Fixed a bug in which a crawler's details could not be viewed the first time after deleting another crawler.

  • Upgraded to the latest version (2.1.0.7) of the Redshift JDBC driver.

  • Improved the performance of Okera metadata queries when calling Databricks with JDBC.

  • Improved Okera performance when evaluating ABAC policies.

  • The Google Cloud CLI (gcloud) is now removed from the Okera core image.

  • Improved performance when authorizing datasets from Databricks.

  • Improved Okera performance when authorizing table access from Databricks.

  • Fixed a layout issue with a dataset's Yes this dataset dialog, where the copyable text extended beyond its containing dialog.
  • Improved performance when using policies with transform clauses.

  • Improved Okera performance when loading table metadata.

  • Improved the performance of AuthorizeQuery RPC calls.