Okera Version 2.8 Release Notes

This topic provides Release Notes for all 2.8 versions of Okera.

2.8.6 (3/17/2022)

Bug Fixes and Improvements

  • Google OAuth is now supported. To enable it, add the following Okera configuration setting to the required OAuth configuration settings: OAUTH_PROVIDER: google. When this setting is omitted, Okera treats the OAuth provider as a generic OAuth 2.0 provider, which works for many OAuth 2.0 providers without additional configuration but does not work for Google or G Suite (Google Workspace). The only valid value for this setting is google. See Authenticate Using Open Authorization (OAuth).
  • Performance is now improved when evaluating grants of the form HAVING ATTRIBUTE not in <attribute name> and HAVING ATTRIBUTE not <attribute name>.

  • Added support for all Okera AWS CLI versions through the latest version, 1.22.71. Support before this change may have been limited to the default installation of Amazon Linux 2 (AWS CLI version 1.18.147).

  • Added configuration settings PRESTO_HTTP_CLIENT_MAX_CONNECTIONS_PER_SERVER and PRESTO_HTTP_CLIENT_MAX_REQUESTS_QUEUED_PER_SERVER. These settings customize the Presto configuration properties that control the maximum number of concurrent connections for a server (http-client.max-connections-per-server) and the maximum number of requests queued per destination (http-client.max-requests-queued-per-destination). See Configuration. If these values are not specified in Okera, the values specified in your Presto environment are used (20 for http-client.max-connections-per-server and 1024 for http-client.max-requests-queued-per-destination).

  • Fixed a race condition that occurred when invoking Hive UDFs.

  • Partition columns are no longer autotagged.

2.8.5 (2/22/2022)

New Prioritization of Transformations (Preview Feature)

This release introduces support for the prioritization of transformations when multiple transformations are applied to a single column. If you use autotagging, multiple tags can be assigned to a single column, each with its own transformations. When multiple transformations apply to a single column, they conflict and the column becomes inaccessible.

To resolve this problem, you can now prioritize the transformation functions. The prioritization applies across the entire system. Wherever multiple transformations occur for any column, the prioritization is applied.

To prioritize transformations, add the new TRANSFORM_UDF_PRIORITIES configuration property to your configuration settings. Values for this property are the transformation function names, specified in sequence, separated by commas. The last function name in the list has priority over all the preceding functions in the list as well as priority over functions not included in the list.
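
The ordering rule above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not Okera's implementation. The function name pick_transformation and the example function names (mask, tokenize, sha2) are assumptions chosen for the example, as is the exact-match comparison of names.

```python
def pick_transformation(candidates, priorities_setting):
    """Return the candidate that wins under a TRANSFORM_UDF_PRIORITIES-style list.

    candidates: transformation function names that apply to one column.
    priorities_setting: comma-separated function names, lowest priority first.
    """
    # Parse the comma-separated setting into an ordered list of names.
    ordered = [name.strip() for name in priorities_setting.split(",") if name.strip()]
    # A later position in the list means a higher priority.
    rank = {name: position for position, name in enumerate(ordered)}
    listed = [c for c in candidates if c in rank]
    if listed:
        # Any listed function outranks every unlisted one.
        return max(listed, key=rank.__getitem__)
    # The release note does not say how several unlisted functions are
    # resolved, so this sketch only succeeds when one candidate remains.
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError("conflict: none of the candidates appears in the priority list")


# Hypothetical example values: sha2 is last in the list, so it wins.
winner = pick_transformation(["mask", "sha2"], "mask, tokenize, sha2")
```

With the hypothetical list above, a column tagged for both mask and sha2 resolves to sha2, and a column tagged for mask and an unlisted function resolves to mask.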

For complete information, see Prioritization of Transformations.

Notable Changes

  • In older versions of Okera, the token might be included as a query argument in the URL (e.g. for /api/scan). In such cases, the token was escaped. With this release, we have disabled escaping tokens by default. Optionally, you can reenable the legacy behavior by specifying OKERA_LEGACY_TOKEN_ESCAPE=true in the configuration file.

Bug Fixes and Improvements

  • The default internal RPC timeouts used by the Okera client and server have been increased. After SASL authentication, the RPC timeout has been increased from 600,000 ms (10 minutes) to 3,600,000 ms (60 minutes). The server RPC timeout has been increased from 120,000 ms (2 minutes) to 1,800,000 ms (30 minutes).
  • Added support for longer values for table properties when using Hive Metastore (HMS) 2.3 and later. To use these versions of HMS, set the ENABLE_HMS_2_SCHEMA configuration property to true. When set, it allows longer table property values to be specified (up to 16M characters, an increase from 4K in earlier HMS releases). Only set this property if you have HMS 2 or later installed.

  • For Spark 2 environments, pushdown of date filters can now be performed without the DATE function if you use SparkConf to set spark.recordservice.spark.parser to CatalystScan. By default, this setting is set to PrunedFilteredScan. See Spark Notes.

  • To resolve a bug, domain optimization is now skipped for decimal column filters in Parquet and optimized row column (ORC) tables.
  • Added faster, lightweight API support for show partitions with Spark.
  • When the JWT_JWKS_URL configuration parameter is specified and the RFC key type (kty) parameter is set to RSA, but no algorithm (alg) parameter is supplied, Okera now defaults to RSA256.

  • Fixed an issue that involved scanning partitioned tables backed by Google Storage (GS).

  • Fixed a bug in which the preview of registered tables did not work.
  • Fixed an issue in which an evicted worker pod caused node membership registration to fail.
  • Fixed a bug on the Insights page so that the View SQL button now shows the appropriate SQL statements.
  • Fixed an issue that could cause excessive memory usage when reading many small Avro files.
  • Fixed an issue with Okera's client libraries for Databricks Spark, where some tables could not be queried due to missing metadata.

  • Optimized the memory used when planning queries on tables with many partitions.

  • Corrected a problem in which the View as SQL option for a policy produced an empty POLICYPROPERTIES() SQL statement.
  • Fixed an issue in which tags that were named the same as Okera reserved keywords (for example, pii.date) caused errors. Tag names that match Okera reserved words must be escaped using backtick (`) characters.

2.8.4 (1/12/2022)

Bug Fixes and Improvements

  • Fixed an issue introduced in 2.8.3 that caused some policy permissions to fail.

  • Fixed an issue that occurred when using toPandas() in Databricks.

2.8.3 (1/10/2022)

This release contains an additional upgrade of log4j to resolve additional instances of the Log4Shell vulnerability.

DDL Improvements

Okera now supports dropping multiple partitions in a single SQL command. For example:

ALTER TABLE page_view DROP 
PARTITION (dt='2008-08-08', country='us')
PARTITION (dt='2008-08-09', country='us');

See Support for Adding or Dropping Multiple Partitions in a Single ALTER TABLE Statement.

In addition, Okera now supports dropping partitions by specifying only part of the partition specification. If a table is partitioned on year=INT/month=INT/day=INT, you can specify only some of the columns, and all matching partitions are dropped. For example, the following statement drops all partitions that have year set to 2020:

ALTER TABLE my_table DROP PARTITION(year=2020)

See Support for Dropping Partial Partitions in an ALTER TABLE Statement.
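
To make the matching rule for partial partition specifications concrete, the following Python sketch selects the partitions that a partial specification would cover. It models partitions as plain dictionaries and is only an illustration of the semantics described above. The function name matching_partitions and the sample partition values are assumptions, not part of Okera.

```python
def matching_partitions(partitions, partial_spec):
    """Return the partitions that agree with every column named in partial_spec."""
    return [
        partition
        for partition in partitions
        # Columns missing from partial_spec are unconstrained and match anything.
        if all(partition.get(col) == val for col, val in partial_spec.items())
    ]


# Hypothetical partitions of a table partitioned on year/month/day.
partitions = [
    {"year": 2019, "month": 12, "day": 31},
    {"year": 2020, "month": 1, "day": 1},
    {"year": 2020, "month": 2, "day": 15},
]

# Equivalent in spirit to: ALTER TABLE my_table DROP PARTITION(year=2020)
to_drop = matching_partitions(partitions, {"year": 2020})
```

Here to_drop holds both 2020 partitions, while the 2019 partition is left alone. Adding month to the specification would narrow the match further.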

Bug Fixes and Improvements

  • Fixed an issue when providing an invalid BUCKET_TO_ROLE_MAP_FILE configuration.
  • Arrays with duplicate names are now unnested successfully.
  • Okera no longer attempts to autotag external views, as this is an undefined operation.
  • Improved performance for queries that require many metadata calls.
  • The database Permissions tab is no longer cut off when a safeguard policy message displays.

2.8.2 (12/20/2021)

Bug Fixes and Improvements

  • This release contains an additional upgrade of log4j to resolve additional instances of the Log4Shell vulnerability.
  • Several enhancements were made in this release to support Databricks file access control. Specifically, support was added for configuring the signing key used by the integration, which can be configured using Databricks Secrets. Errors will occur if a signing key is not found and no static secret is provided. See Enable Okera File Access Control (OkeraEnsemble) for more information.
  • With this release, Okera now supports Databricks versions 8.3, 8.4, 9.0 and 9.1. See Supported Versions.

  • Fixed several issues related to connecting to Databricks via JDBC/ODBC.

  • Fixed an issue with adding the full Spark query to the Okera audit log when issuing queries on Databricks. For more information about enabling this capability, see Activate Spark query audit logging.

Notable and Incompatible Changes

  • In past versions, Okera's operational logs did not use a partitioning scheme when uploading. This made it hard to locate the logs you needed and, in some environments, increased the time to list the log files. With this release, a new configuration option, WATCHER_LOG_PARTITIONED_UPLOADS, has been added to the configuration file to enable partitioned log uploads. Valid values are true and false. When enabled (true), operational log file uploads use the ymd=YMD/h=H/component=C partitioning scheme. By default, this setting is disabled (false) so older clusters are not affected. However, in a future version of Okera, it will be enabled by default, so Okera recommends that users adopt this in new deployments.

  • For Databricks version 8.0 and up, Okera now only supports the native Databricks integration (where Okera is not on the data path).
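
As a rough illustration of the ymd=YMD/h=H/component=C scheme used by partitioned log uploads, the Python sketch below builds such a prefix from a timestamp. The release note does not specify how YMD and H are rendered, so the YYYY-MM-DD date format and two-digit hour used here are assumptions, as are the function name partitioned_prefix and the component name planner. Check an actual upload location for the real layout.

```python
from datetime import datetime, timezone


def partitioned_prefix(timestamp, component):
    """Build a ymd=.../h=.../component=... prefix for a log upload.

    Assumption: YMD is rendered as YYYY-MM-DD and H as a zero-padded hour.
    The release note names the scheme but not these formats.
    """
    return f"ymd={timestamp:%Y-%m-%d}/h={timestamp:%H}/component={component}"


# Hypothetical timestamp and component name, for illustration only.
prefix = partitioned_prefix(
    datetime(2021, 12, 20, 9, 30, tzinfo=timezone.utc), "planner"
)
```

Grouping uploads under prefixes like this lets you list only the day, hour, and component you need instead of scanning every log file.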

2.8.1 (12/13/2021)

This release contains an upgrade of log4j to resolve the Log4Shell vulnerability.

2.8.0 (12/10/2021)

File Access Control (OkeraEnsemble) on Databricks (Preview Feature)

In past versions, Okera supported authorization for data access via SQL. However, big data environments and cloud object storage environments allow users to access files directly, which introduces the need to control who should be able to perform operations on files and what operations should be allowed. This release of Okera introduces the ability to perform file access control.

Okera's first implementation of file access control is for Amazon S3 in a Databricks environment. With this feature, Okera provides an authorization layer that intercepts Databricks data requests to Amazon S3 to determine whether users have access to the file and data in the request. If they do, the request is passed to Amazon S3 for processing. If they do not have access to the file, the request is rejected and returned.

This feature is implemented automatically, except for two environment variables that must be set in Databricks: OKERA_ENABLE_OKERA_FS and OKERA_FS_REQUIRE_SIGNED_PATHS. For information on how to set these environment variables, see OkeraEnsemble Deployment on Databricks.

Important

OkeraEnsemble on Databricks is file-format dependent. At this time only Parquet, Delta, and Hive table file formats are supported. No other file formats are supported.

Timebound Permissions

You can now add start and end dates and times for permissions. The permission definitions are only enforced during the specified date and time range. You can also specify only a start date and time or only an end date and time. If you specify both, the end date and time must be later than the start date and time. See Set Time-Based Conditions.
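
The rules for timebound permissions can be sketched as follows. This is a minimal Python illustration of the behavior described above, not Okera's implementation. The names validate_window and is_enforced are hypothetical, and treating the start and end instants themselves as inside the range is an assumption.

```python
from datetime import datetime
from typing import Optional


def validate_window(start: Optional[datetime], end: Optional[datetime]) -> None:
    """Reject a window whose end is not later than its start."""
    if start is not None and end is not None and end <= start:
        raise ValueError("end date and time must be later than start date and time")


def is_enforced(now: datetime,
                start: Optional[datetime] = None,
                end: Optional[datetime] = None) -> bool:
    """Return True if a permission with this window is enforced at `now`."""
    validate_window(start, end)
    # Either bound may be omitted; a missing bound leaves that side open.
    if start is not None and now < start:
        return False
    # Assumption: the boundary instants count as inside the window.
    if end is not None and now > end:
        return False
    return True


# A permission valid for January 2022 only (hypothetical dates).
active = is_enforced(datetime(2022, 1, 15),
                     start=datetime(2022, 1, 1),
                     end=datetime(2022, 2, 1))
```

A permission with only a start is enforced from that moment onward, and one with only an end is enforced until that moment.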

Enable and Disable Permissions

You can now enable and disable permissions for a role. By default, a permission is enabled when it is created. Disabled permissions remain assigned to the role but are not enforced. A new toggle has been added to the Permissions dialog that you can use to enable and disable the permission. In addition, a new Enabled column has been added to Permissions lists.

See Disable Permissions and Enable Permissions.

Databricks 8 Support

This release introduces support for Databricks 8 through 8.3. It mostly provides the same scope of functionality as Okera's support for Databricks 7. However, in Databricks 8 integrations that use Spark 3 or later, client-side compression is not currently supported. See Supported Versions.

Improved OAuth Authentication Using a JSON Web Key Set (JWKS) Endpoint

You can now configure Okera with a JWKS endpoint that will be used to dynamically fetch the appropriate public key needed for OAuth authentication from the JWKS content supplied by OAuth services.

We recommend that all OAuth users configure this endpoint to improve OAuth authentication in Okera. This is an improvement over past releases in which you had to manually configure Okera with the appropriate public key from OAuth services that did not provide it in an easily consumable format.

Use the JWT_JWKS_URL configuration setting to supply the URL of your OAuth identity provider (for example, Okta, Auth0, or AzureAD). For more information, see OAuth Authentication.
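
For readers unfamiliar with JWKS, the Python sketch below shows the general idea of selecting a public key from JWKS content by matching the key ID (kid) carried in a token header. It is a generic illustration of the JWKS format, not a description of how Okera performs the lookup. The function name select_jwk, the sample document, and its placeholder key values are all hypothetical.

```python
import json


def select_jwk(jwks_document, kid):
    """Return the key in a JWKS document whose "kid" matches the given ID."""
    for key in json.loads(jwks_document).get("keys", []):
        if key.get("kid") == kid:
            return key
    raise KeyError(f"no key with kid={kid!r} in JWKS")


# Hypothetical JWKS content; "n" values are placeholders, not real moduli.
SAMPLE_JWKS = json.dumps({
    "keys": [
        {"kty": "RSA", "kid": "key-1", "use": "sig", "n": "<modulus-1>", "e": "AQAB"},
        {"kty": "RSA", "kid": "key-2", "use": "sig", "n": "<modulus-2>", "e": "AQAB"},
    ]
})

# A token whose header carries kid "key-2" would be verified with this key.
selected = select_jwk(SAMPLE_JWKS, "key-2")
```

Because the key is looked up at verification time, a provider can rotate its signing keys without the public key being reconfigured by hand.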

Pushdown Processing for Optimized Row Column (ORC) Table Query Predicates

Numeric data type query predicates for optimized row column (ORC) tables are now pushed down to ORC libraries for processing.

Novice User Experience

Tooltips have been added to the Okera Portal (the UI) in this release. The first time you access a page, the tooltips display automatically. Thereafter, the tooltips do not automatically display, but are available by selecting the tooltip icon.

Contextual User Inactivity Analysis

The User Inactivity Report has been retitled User Inactivity Analysis and is now located as a fourth tab on the Databases page and on the dataset details page. It is no longer available as the second tab on the Users page. See User Inactivity Analysis.

Permission List Consistency Enhancements

The permission lists in the Okera Portal (UI) on the Roles page and on the Data page now look and work consistently. See View and Manage Permissions in the UI.

Schema Edit Usability Improvement

This release improves the user experience while editing a dataset schema. You are no longer required to select the checkmark icon after making an edit. Instead, clicking anywhere else on the screen saves your changes. Selecting the checkmark icon continues to save your changes as well.

Okera Portal Table Consistency Enhancements

This release improves the usability and consistency of table behavior in the Okera Portal (the UI). Specifically:

  • The user experience when creating a new table object (such as when creating a new database, a new role, or a new connection) has been made consistent across the UI.
  • The headers are formatted consistently for all tables in the UI.
  • The search (filter) bar and table header are static as you scroll through tables. They remain in a consistent location while scrolling.
  • The look and feel of all tables in the UI are consistent.
  • You can now filter the Roles page table by multiple groups, which was not possible in earlier releases. In addition, the group and user filter boxes on the Roles page now include dropdown lists from which you can select groups and users for the filter.

Bug Fixes

  • Fixed several issues when rewriting queries (e.g., for Snowflake) related to casing and escaping.
  • Fixed an issue when connecting to an external HMS using Thrift where the Databases page would be blank.
  • Fixed an issue that prevented revoking permissions on tables that no longer existed.
  • Fixed an issue when running SHOW CREATE TABLE when a column name was a reserved keyword.
  • Fixed an issue when creating a crawler over a Hadoop Distributed File System (HDFS) path.
  • Fixed an issue when creating connections where the username or password paths were prefixed with white space.