Okera Version 2.9 Release Notes

This topic provides Release Notes for all 2.9 versions of Okera.

2.9.5 (5/11/2022)

Notable Changes

  • Support is deprecated for the following versions of Amazon Elastic MapReduce (EMR): 5.20, 5.21, 5.22, and 5.23. The earliest version of Amazon EMR currently supported by Okera is version 5.24.

Bug Fixes and Improvements

  • Fixed an issue where Snowflake policy synchronization failed with a connectivity error if the Snowflake connection host name included more than two periods (".").

  • Postgres, which was included in previous Okera builds, has been removed from Okera images.

  • Fixed a bug in which a complex policy with many conditions could not be saved. The character limit for Okera Snowflake policies has increased from 1000 characters to 5000 characters in this release.
  • Optimized metadata loading when analyzing ALTER TABLE DROP PARTITION statements.
  • Okera now supports UUID data types for JDBC data sources. They are mapped to the STRING data type in Okera. See JDBC Data Types Mapping.

2.9.4 (5/2/2022)

OkeraEnsemble Assume Secondary Role Support

Assume secondary role (also referred to as bucket role mapping) is now supported in OkeraEnsemble Amazon S3 environments. This support introduces two new configuration parameters: OKERA_ASSUME_ROLE_DURATION_SECONDS and GO_ACCESS_PROXY_CACHE_LOG_PERIOD. For more information about bucket role mapping and these new parameters, see Amazon S3 Assume Secondary Role Support. For information about OkeraEnsemble, see OkeraEnsemble Deployment on Amazon S3.

Bug Fixes and Improvements

  • Fixed a problem in which a user encountered an error when accessing a URI after being granted catalog/server-scope permissions that have associated attributes.

2.9.3 (4/28/2022)

Active/Active Deployment (Preview Feature)

You can now run multiple Okera clusters in separate regions that share data using an Amazon Aurora relational database service (RDS) with write-forwarding enabled.

The cluster in each region communicates with its local read-only database. Aurora RDS synchronizes updates between the primary instance and its replicas in each region. This environment allows multiple active Okera environments to share data and is called an active/active environment. See Active/Active Deployment in Aurora RDS.

Presto Resource Group Support

This release introduces the ability to enable Presto resource groups that can be used for quality of service and admission control management. These resource groups can be used to constrain the resources used by a single user or query. For more information about Presto resource groups, see Resource Groups.

To enable these Presto resource groups, add the following configuration parameters to your Okera configuration file and restart Okera.

  • PRESTO_SHOULD_USE_RESOURCE_GROUPS: Set this parameter to true to enable the file-based configuration manager for Presto. Valid values are true and false. The default is false.

  • PRESTO_RESOURCE_GROUP_FILE_LOCATION: This parameter identifies the location of a JSON file that contains the Presto resource group definition. Information about the contents of this file can be found in Resource Groups.
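
As a rough illustration of what the JSON file referenced by PRESTO_RESOURCE_GROUP_FILE_LOCATION might contain, the sketch below builds a small resource group definition and renders it as JSON. The layout (rootGroups, subGroups, selectors) follows the Presto Resource Groups documentation; the group names, limits, and the etl_ user pattern are hypothetical values chosen only for this example, not Okera defaults.

```python
import json

# Hypothetical resource group definition. The keys follow the Presto
# Resource Groups documentation; every name and limit is illustrative.
resource_groups = {
    "rootGroups": [
        {
            "name": "global",
            "softMemoryLimit": "80%",
            "hardConcurrencyLimit": 100,
            "maxQueued": 1000,
            "subGroups": [
                {
                    # Scheduled pipeline queries get most of the capacity.
                    "name": "etl",
                    "softMemoryLimit": "60%",
                    "hardConcurrencyLimit": 20,
                    "maxQueued": 100,
                },
                {
                    # Everything else is constrained more tightly.
                    "name": "adhoc",
                    "softMemoryLimit": "20%",
                    "hardConcurrencyLimit": 5,
                    "maxQueued": 50,
                },
            ],
        }
    ],
    "selectors": [
        # Selectors are evaluated in order; the first match wins.
        {"user": "etl_.*", "group": "global.etl"},
        {"group": "global.adhoc"},
    ],
}


def render_resource_groups(definition: dict) -> str:
    """Serialize the definition to the JSON text stored in the file."""
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(render_resource_groups(resource_groups))
```

The printed JSON could be saved to a file, and that file's location supplied as the value of PRESTO_RESOURCE_GROUP_FILE_LOCATION.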

Presto Query Logging Support

This release introduces support that logs additional Presto query information. Presto provides high-level information about queries that does not ordinarily get logged in Okera. When you enable this new support, additional information is logged in Okera logs, allowing you to better understand which queries are running, which ones succeed and fail, and how failures occur.

To enable Presto query logging, add the new PRESTO_ENABLE_QUERY_LOGGING configuration parameter to the Okera configuration file, set its value to true, and restart Okera. Valid values for this new configuration parameter are true and false. The default is false.

New Group Resolution Script Support

This release introduces support for a group resolution script. You can use bespoke systems, such as custom REST APIs or data stores, to identify the group to which a user belongs. To support this functionality, a new configuration parameter has been added: GROUP_RESOLVER_SCRIPTS. For information on how to set up your script and use this configuration parameter, see Custom Script-Sourced Group Resolution.

Notable Changes

  • Support for AWS IMDSv2, previously announced in Okera version 2.9.2, has been removed and the Okera 2.9.2 release has been archived.

Bug Fixes and Improvements

  • By default, Okera connections now use autocommit=true when connecting to the RDBMS for system state actions.
  • Upgraded the base Alpine image used by Okera to 3.14.6.
  • Reverted the formatted result change made for SHOW CREATE TABLE and SHOW CREATE VIEW statements in version 2.9.0 because it caused errors in some external tests. Plans are in place to reinstate this update in another way.

2.9.2 (4/26/2022)

This release is archived and is no longer available. Support for AWS IMDSv2, previously announced in Okera version 2.9.2, has been removed.

2.9.1 (3/31/2022)

Bug Fixes and Improvements

  • Fixed an issue in which connections to Azure Keyvault failed.
  • Fixed an issue with null handling in BigQuery user-defined functions (UDFs). After upgrading, the UDF creation script needs to be rerun.

  • Fixed an issue in which the Snowflake audit log sync ran on both Okera Policy Engine (planner) and Okera Enforcement Fleet worker instances. It should only happen on the Policy Engine instance.

  • Fixed a bug in which the date offset in Snowflake did not account for daylight saving time.

  • Fixed an issue where nScale Enforcement Fleet workers would not start.

  • OAuth with OpenID Connect now allows a way to configure which part of the id_token is used to identify the user. In previous releases, sub was always used. Now, the OAUTH_OIDC_USER_CLAIM_KEY configuration parameter can be used to identify the id_token key to use as the user identifier. For example, to use email instead of sub, specify OAUTH_OIDC_USER_CLAIM_KEY: email in the Okera configuration file. If not specified, the default is sub.

  • Added support for decimal types in core tokenize user-defined functions (UDFs).
  • Google OAuth is now supported. To enable it, include the following Okera configuration setting in addition to the required OAuth configuration settings: OAUTH_PROVIDER: google. Without this setting, Okera treats the OAuth provider as a generic OAuth 2.0 provider, which works for many OAuth 2.0 providers without additional configuration but does not work for Google or G Suite (Google Workspace). The only valid value for this setting is google. See Authenticate Using Open Authorization (OAuth).
  • Added support for all Okera AWS CLI versions through the latest version, 1.22.71. Support before this change may have been limited to the default installation of Amazon Linux 2 (AWS CLI version 1.18.147).

  • Okera's Databricks cluster bootstrap script now supports downloading client libraries from any Apache Maven artifactory.

  • Performance is now improved when evaluating grants of the form HAVING ATTRIBUTE not in <attribute name> and HAVING ATTRIBUTE not <attribute name>.
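
The OAUTH_OIDC_USER_CLAIM_KEY behavior described above can be pictured with a small sketch: the user identifier is read from one claim in the id_token payload, falling back to sub when no key is configured. This is only an illustration of the idea, not Okera's implementation. The function names are hypothetical, the token is an unsigned sample built for the demo, and signature verification is deliberately omitted.

```python
import base64
import json


def jwt_payload(id_token: str) -> dict:
    """Decode the payload segment of a JWT (no signature verification)."""
    payload_b64 = id_token.split(".")[1]
    # JWTs strip base64url padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def resolve_user(id_token: str, claim_key: str = "sub") -> str:
    """Return the claim used as the user identifier (default: sub)."""
    return jwt_payload(id_token)[claim_key]


def make_unsigned_token(claims: dict) -> str:
    """Build an unsigned sample token for demonstration only."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{b64({'alg': 'none'})}.{b64(claims)}."


if __name__ == "__main__":
    token = make_unsigned_token({"sub": "12345", "email": "alice@example.com"})
    print(resolve_user(token))           # default claim
    print(resolve_user(token, "email"))  # as with OAUTH_OIDC_USER_CLAIM_KEY: email
```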

2.9.0 (3/11/2022)

New Snowflake Policy Synchronization

In recent releases, Snowflake introduced several advanced security and governance features such as row access policies, column-level security, and dynamic data masking. To support these Snowflake features, Okera 2.9 introduces a new policy enforcement mechanism called policy synchronization.

Using policy synchronization enforcement, Okera functions as the central policy manager, pushing universal data access policies into Snowflake. This applies Okera's fine-grained access controls onto Snowflake objects, such as roles, permissions, and row access policies, allowing Snowflake to enforce policies defined and managed in Okera, while removing Okera from the Snowflake query execution path. Your Snowflake users can continue to use the full suite of Snowflake features, including Snowflake SQL, drivers, and tools, but the data they can access is governed by Okera. Prior versions of Okera did not allow Snowflake to be used natively and had limited support for Snowflake’s feature suite.

Authoring and managing access policies in Okera is easier than defining access policies in Snowflake. Okera's constructs are simple and easy to understand. And Okera is platform-agnostic; a single policy can be applied to multiple data platforms, including Snowflake.

Policy synchronization enforcement requires special configuration steps in Snowflake and in Okera. Some new Okera configuration parameters are introduced in this release as part of these configuration steps. In addition, when you change the Okera permissions for a Snowflake connection, the Snowflake connection must be synchronized so the Okera policy is applied to the corresponding Snowflake databases. This synchronization occurs automatically at a specified interval, but can also be instigated manually, as needed.

For more information on how to integrate and use Okera's policy synchronization enforcement with Snowflake, see Snowflake Data Source Connections and Configure and Use Snowflake Policy Synchronization Enforcement.

OkeraEnsemble: File Access Control (Preview Feature)

This release introduces OkeraEnsemble, file-based access control, in Amazon Web Services (AWS) S3 environments.

Okera file access control (OkeraEnsemble) extends Okera’s access control to files and objects in cloud object stores, such as Amazon S3. Administrators can now grant team members access to create, modify, copy, or delete objects under a URI. See OkeraEnsemble Overview.

DDL Improvements

The following DDL improvements were made in this release:

  • Okera now supports dropping multiple partitions in a single SQL command. See Adding or Dropping Multiple Partitions in a Single ALTER TABLE Statement.

  • Okera now supports dropping:

    • Partitions by specifying only part of a partition specification.
    • Nested partitions within a parent partition when you drop the parent partition.

    See Dropping Partial or Nested Partitions in an ALTER TABLE Statement.

  • The CREATE TABLE statement no longer requires an ALL grant for the URI. If the table is EXTERNAL, only a READ grant is required. If a table is not external, it requires READ and WRITE grants. Similar grant requirements are also now applied to the ALTER TABLE SET LOCATION statement. To revert to the previous behavior that required an ALL grant, change the new configuration setting called ENABLE_LEGACY_URI_CHECKS to true. By default this setting is false.

    Note: Because Okera required ALL grants in previous versions, this should not impact any current grants as ALL implies both READ and WRITE.

  • You can now reference dynamic parameters in a URI when doing a permission grant. The only currently supported parameter is ${user}. This feature introduces the use of new configuration settings called ENABLE_PARAMETRIZED_URI_GRANTS and ENABLE_PARAMETRIZED_GRANTS, which must be set to true to enable this capability. See Support for Referencing Dynamic Parameters in a URI.

  • You can also now use the ${user} dynamic parameter for database and table grants. This capability uses a new configuration setting called ENABLE_PARAMETRIZED_GRANTS, which must be set to true to enable it. See Support for Referencing Dynamic Parameters in Database and Table GRANTs.
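
The URI grant rules for CREATE TABLE and ALTER TABLE SET LOCATION described above can be summarized as a small decision function. The sketch below models only the rules as stated in these notes. The function names are hypothetical and this is not how Okera performs the check internally.

```python
def required_uri_grants(external: bool, legacy_uri_checks: bool = False) -> set:
    """Grants needed on the URI, per the rules in these release notes.

    legacy_uri_checks mirrors ENABLE_LEGACY_URI_CHECKS (default false).
    """
    if legacy_uri_checks:
        return {"ALL"}            # pre-2.9 behavior
    if external:
        return {"READ"}           # EXTERNAL tables only need READ
    return {"READ", "WRITE"}      # non-external tables need both


def is_allowed(held: set, external: bool, legacy_uri_checks: bool = False) -> bool:
    """Check the grants a user holds against the requirement."""
    if "ALL" in held:
        return True               # ALL implies both READ and WRITE
    return required_uri_grants(external, legacy_uri_checks) <= held


if __name__ == "__main__":
    print(is_allowed({"READ"}, external=True))           # READ suffices
    print(is_allowed({"READ"}, external=False))          # WRITE is missing
    print(is_allowed({"READ", "WRITE"}, external=False,
                     legacy_uri_checks=True))            # legacy needs ALL
```

Because ALL satisfies every case, users who held ALL grants under earlier versions pass the check with or without the legacy setting.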

Customizing Group Resolution

You can now customize the classes used when resolving the groups to which a user belongs. Okera's default order is to use JSON web tokens (JWTs) when both JWT and LDAP are configured. To customize the classes used and the order in which they are used, you can use a new configuration setting called CUSTOM_GROUP_RESOLVERS. For more information, see Group Resolution.

Databricks Support

With this release, Okera now supports Databricks versions 8.3, 8.4, 9.0 and 9.1. See Supported Versions.

New Prioritization of Transformations

This release introduces support for the prioritization of transformations when multiple transformations are applied to a single column. If you use autotagging, multiple tags can be assigned to a single column, each with its own transformations. Without prioritization, multiple transformations applied to a single column conflict and make the column inaccessible.

To resolve this problem, you can now prioritize the transformation functions. The prioritization applies across the entire system. Wherever multiple transformations occur for any column, the prioritization is applied.

To prioritize transformations, add the new TRANSFORM_UDF_PRIORITIES configuration property to your configuration settings. Values for this property are the transformation function names, specified in sequence, separated by commas. The last function name in the list has priority over all the preceding functions in the list as well as priority over functions not included in the list.

For complete information, see Prioritization of Transformations.
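
The ordering rule for TRANSFORM_UDF_PRIORITIES can be sketched as follows. This is an illustration under stated assumptions rather than Okera's implementation: the function names mask, tokenize, sha2, and zero are placeholders, the sketch assumes that later positions in the list always outrank earlier ones, and it returns None when none of the conflicting functions is listed, because these notes do not say how that case is handled.

```python
def parse_priorities(value: str) -> list:
    """Split a comma-separated TRANSFORM_UDF_PRIORITIES value into names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def pick_transform(candidates: list, priorities: list):
    """Choose which of the conflicting transformations applies.

    Later entries in priorities win over earlier ones, and any listed
    function wins over unlisted ones. Returns None if no candidate is
    listed (assumption: the conflict then remains unresolved).
    """
    rank = {name: position for position, name in enumerate(priorities)}
    listed = [name for name in candidates if name in rank]
    if not listed:
        return None
    return max(listed, key=lambda name: rank[name])


if __name__ == "__main__":
    priorities = parse_priorities("mask, tokenize, sha2")
    print(pick_transform(["mask", "sha2"], priorities))      # sha2 is last in the list
    print(pick_transform(["tokenize", "zero"], priorities))  # listed beats unlisted
```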

Notable Changes

  • In older versions of Okera, the token might be included as a query argument in the URL (e.g. for /api/scan). In such cases, the token was escaped. With this release, we have disabled escaping tokens by default. Optionally, you can reenable the legacy behavior by specifying OKERA_LEGACY_TOKEN_ESCAPE=true in the configuration file.

  • In this release, Okera upgraded its supported version of MySQL from 8.0.27 to 8.0.28 and the version of Java required to support this later version of MySQL (from Java 3.3.1 to 4.0.3). These MySQL changes also require TLS 1.2. Okera recommends that you upgrade your MySQL databases to a newer version (that supports TLS 1.2) or that you disable SSL in Okera client configurations. If you do not, TLS errors will result.

  • In past versions, Okera's operational logs did not use a partitioning scheme when uploading. This made it hard to locate the logs you needed and, in some environments, increased the time to list the log files. A new configuration option, WATCHER_LOG_PARTITIONED_UPLOADS, has been added to the configuration file to enable partitioned log uploads. Valid values are true and false. When enabled (true), operational log file uploads use the ymd=YMD/h=H/component=C partitioning scheme. By default, this setting is disabled (false) so older clusters are not affected. However, in a future version of Okera, it will be enabled by default, so Okera recommends that users adopt this in new deployments.

  • The AUDIT_LOGS_SYNC_FREQUENCY configuration parameter has been renamed AUDIT_LOGS_SYNC_FREQUENCY_MINS.
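
To show how the ymd=YMD/h=H/component=C scheme narrows a log search, the sketch below builds the partition prefix for a given timestamp and component. The exact formats are assumptions, since these notes give only the placeholders: the date is rendered as YYYY-MM-DD, the hour as an unpadded 24-hour value, and planner is used as a sample component name.

```python
from datetime import datetime, timezone


def partitioned_log_prefix(ts: datetime, component: str) -> str:
    """Build a ymd=YMD/h=H/component=C prefix for one log upload.

    Assumed formats: YMD as YYYY-MM-DD and H as an unpadded hour (0-23).
    """
    ymd = ts.strftime("%Y-%m-%d")
    return f"ymd={ymd}/h={ts.hour}/component={component}"


if __name__ == "__main__":
    when = datetime(2022, 3, 11, 14, 5, tzinfo=timezone.utc)
    # Listing only this prefix avoids scanning every uploaded log file.
    print(partitioned_log_prefix(when, "planner"))
```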

Bug Fixes

  • Fixed an issue where a very large query run from the UI Workspace page would block a user from logging in again.

  • Fixed a bug in the Registration page where the SQL command for creating the table needed an EXTERNAL qualifier.

  • Fixed an issue where the REST server returned content even when the HTTP response code was 204 No Content.

  • Added configuration settings PRESTO_HTTP_CLIENT_MAX_CONNECTIONS_PER_SERVER and PRESTO_HTTP_CLIENT_MAX_REQUESTS_QUEUED_PER_SERVER. These settings customize the Presto configuration properties that control the maximum number of concurrent connections for a server (http-client.max-connections-per-server) and the maximum number of requests queued per destination (http-client.max-requests-queued-per-destination). See Configuration. If these values are not specified in Okera, the values specified in your Presto environment are used (20 for http-client.max-connections-per-server and 1024 for http-client.max-requests-queued-per-destination).

  • Fixed a race condition that occurred when invoking Hive UDFs.

  • Fixed an issue where evicted pods caused Enforcement Fleet worker registration to fail.

  • Fixed an error that occurred when unnesting array columns with duplicate column names.

  • Improved performance when listing roles.

  • For Spark 2 environments, pushdown of date filters can now be performed without the DATE function if you use SparkConf to set spark.recordservice.spark.parser to CatalystScan. By default, this setting is set to PrunedFilteredScan. See Spark Notes.

  • Reduced Okera's memory footprint when working with partitioned tables.

  • Fixed a bug in which the preview of registered tables did not work.

  • Fixed a bug in which the View SQL button did not work for reports on the Insights page.

  • Fixed a bug in which the Database Permissions tab was cut off when the safeguard policies message displayed.

  • Fixed a bug in which the Registration page did not load for a newly created crawler with many datasets.

  • Fixed a bug in which the unregistered and registered dataset counts showed as undefined.

  • To resolve a bug, domain optimization is now skipped for decimal column filters in Parquet and optimized row column (ORC) tables.

  • Fixed a UI bug in which a non-string column value was implicitly cast to string in row filters.

  • Corrected a problem in which the View as SQL option for a policy produced an empty POLICYPROPERTIES() SQL statement.

  • Fixed an issue in which tags that were named the same as Okera reserved keywords (for example, pii.date) caused errors. Tag names that match Okera reserved words must be escaped using backtick (`) characters.

  • Added support for all Okera AWS CLI versions through the latest version, 1.22.71. Support before this change may have been limited to the default installation of Amazon Linux 2 (AWS CLI version 1.18.147).

  • When adding partitions, partition keys can now be of type CHAR(n).

  • Optimized the memory used when planning queries on tables with many partitions.

  • Added support for longer values for table properties when using Hive Metastore (HMS) 2.3 and later. To use these versions of HMS, set the ENABLE_HMS_2_SCHEMA configuration property to true. When set, it allows longer table property values to be specified (up to 16M characters, an increase from 4K in earlier HMS releases). Set this property only if you have HMS 2 or later installed.

  • When the JWT_JWKS_URL configuration parameter is specified and the RFC key type (kty) parameter is set to RSA, but no algorithm (alg) parameter is supplied, Okera now defaults to RSA256.

  • Increased the Kubernetes healthcheck timeout setting (timeoutSeconds) specified in Okera's 06-catalog.yaml files from one second to 15 seconds.

  • The default internal RPC timeouts used by the Okera client and server have been increased. After SASL authentication, the RPC timeout has been increased from 600,000 ms (10 minutes) to 3,600,000 ms (60 minutes). The server RPC timeout has been increased from 120,000 ms (2 minutes) to 1,800,000 ms (30 minutes).

  • Fixed a bug in which the SHOW CREATE TABLE statement did not display the OpenCsvSerde processing specified for a table.

  • Disabled multi-column tagging in tables.

  • Resolved a problem so that uppercase characters in tag names referenced in ALTER ADD or ALTER DROP statements no longer cause those statements to fail. Okera tags are now case-insensitive.

  • A formatted result is now returned for SHOW CREATE TABLE and SHOW CREATE VIEW statements.

  • Okera no longer attempts to autotag external views, as this is an undefined operation.

  • Improved the speed at which partitions are recovered when two-tier partitioning is used.

  • Fixed a problem in which ALTER TABLE DROP PARTITION statements failed for delta tables with empty or missing inferred partitions.