Okera Version 2.14 Release Notes

This topic provides Release Notes for all 2.14 versions of Okera.

2.14.3 (1/20/2023)

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes and Improvements

  • Optimized the performance of Okera's getPartitions() API endpoint, resulting in lower latency and load on the catalog database.

  • Improved the performance of SHOW CREATE TABLE statements.

  • Fixed a bug that caused null pointer exceptions after an upgrade from Okera 2.11.x and caused problems for non-admin users logging in to the UI.

  • Fixed page errors that occurred when there were conflicts creating permissions.

2.14.2 (12/19/2022)

OkeraEnsemble Updates

Okera has updated how you should deploy OkeraEnsemble nScale mode with Amazon EMR 5 and Amazon EMR 6. Differences between the two Amazon EMR versions require different deployment settings.

When deploying OkeraEnsemble nScale in an Amazon EMR 5 environment, set the core-site.xml property fs.s3a.s3.client.factory.impl to org.apache.hadoop.fs.s3a.OkeraS3ClientFactory. When deploying OkeraEnsemble nScale in an Amazon EMR 6 environment, set the same property to com.okera.recordservice.hadoop.OkeraS3ClientFactory.
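
For illustration, the minimal sketch below launches an Amazon EMR 6 cluster with boto3 and passes the factory class through a core-site configuration classification. The cluster name, sizing, region, and IAM roles are placeholders, not Okera requirements; for Amazon EMR 5, substitute org.apache.hadoop.fs.s3a.OkeraS3ClientFactory.

    # Hypothetical sketch: launch an Amazon EMR 6 cluster with the OkeraEnsemble
    # nScale S3 client factory set via a "core-site" configuration classification.
    # Names, sizing, region, and IAM roles are placeholders.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

    core_site = {
        "Classification": "core-site",
        "Properties": {
            # EMR 6: com.okera.recordservice.hadoop.OkeraS3ClientFactory
            # EMR 5: org.apache.hadoop.fs.s3a.OkeraS3ClientFactory
            "fs.s3a.s3.client.factory.impl":
                "com.okera.recordservice.hadoop.OkeraS3ClientFactory",
        },
    }

    emr.run_job_flow(
        Name="okera-nscale-emr6",                 # placeholder cluster name
        ReleaseLabel="emr-6.5.0",
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Configurations=[core_site],
        Instances={
            "MasterInstanceType": "m5.xlarge",    # placeholder sizing
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",        # default EMR roles assumed
        ServiceRole="EMR_DefaultRole",
    )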

For more information, see OkeraEnsemble nScale Mode Deployment in Amazon EMR Environments.

2.14.1 (12/10/2022)

Amazon EMR 6.5.0 and Spark 3.1.2 Support

With this release, Okera supports Amazon EMR 6.5 and Spark 3.1.2 environments, with one current limitation: you cannot perform an insert operation on a non-TEXT-format partitioned table (for example, ORC, Parquet, or Avro) when the Hive recordservice.spark.client-bypass configuration setting is set to true (currently a requirement when writing to a SQL table using spark.sql on an Okera-integrated Amazon EMR cluster).
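
For context, the sketch below shows the pattern affected by this limitation, assuming an Okera-integrated Amazon EMR 6.5 cluster with recordservice.spark.client-bypass set to true. The database, table, and column names are illustrative only.

    # Illustrative PySpark sketch of the limitation above; table and column
    # names are placeholders, not part of the Okera product.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("okera-emr65-insert").getOrCreate()

    # Inserting into a TEXT-format partitioned table works with client-bypass=true.
    spark.sql("""
        INSERT INTO demo_db.events_text PARTITION (dt='2022-12-10')
        SELECT event_id, event_name FROM demo_db.events_staging
    """)

    # Inserting into an ORC, Parquet, or Avro partitioned table is not supported
    # in this configuration and is expected to fail at this time.
    # spark.sql("""
    #     INSERT INTO demo_db.events_parquet PARTITION (dt='2022-12-10')
    #     SELECT event_id, event_name FROM demo_db.events_staging
    # """)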

OkeraEnsemble Updates

OkeraEnsemble now supports RSA256 as a JWT algorithm. In past releases, only RSA512 was supported, although Okera itself has always supported both RSA256 and RSA512. The algorithm type used in your environment should be set using the JWT_ALGORITHM configuration parameter.
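
If you are unsure which value to use, one approach is to inspect the algorithm header of a token already issued in your environment. The sketch below uses PyJWT; mapping the standard RS256/RS512 header values to Okera's RSA256/RSA512 settings is an assumption, not documented behavior.

    # Sketch: inspect a JWT's signing algorithm to choose the JWT_ALGORITHM value.
    # The RS256 -> RSA256 / RS512 -> RSA512 mapping is an assumption.
    import jwt  # PyJWT

    token = "<paste an existing JWT issued in your environment>"  # placeholder
    alg = jwt.get_unverified_header(token)["alg"]   # e.g. "RS256" or "RS512"
    print(f"Token is signed with {alg}; set JWT_ALGORITHM accordingly.")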

BigQuery Updates

The following updates have been made for BigQuery connections in this release:

  • You can now inject the Okera connection query ID into BigQuery history. Because the same ID appears in the Okera audit logs, it can be used to correlate the BigQuery project history with Okera audit log entries.

    To support this functionality, a new connection configuration parameter, inject.query-id, has been added. Valid values are true (enable Okera ID injection) and false (disable Okera ID injection). When enabled for a connection, the ID is injected as a comment in the Okera-generated SQL sent to the connection and appears in BigQuery history. For most connections, the default for inject.query-id is false, but for BigQuery connections, the default is true. See Inject the Okera Connection Query ID Into BigQuery History. A correlation sketch follows this list.

  • You can now register cross-project BigQuery tables from the same Okera connection. For example, using a single connection that references one BigQuery project, you can create a second Okera crawler to crawl the same connection using a second BigQuery project. This means you no longer need to define multiple BigQuery connections in Okera, which allows Dataproc cross-project join queries to complete successfully. It also enables cross-project joins using Presto pushdown, which moves the compute to the BigQuery engine and away from the Okera Enforcement Fleet (workers). Finally, it reduces your BigQuery chargeback complexity because all queries are consolidated into a single Okera connection.
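
As a rough illustration of the query ID correlation described in the first bullet, the sketch below scans recent BigQuery job history for a given Okera connection query ID, assuming the ID appears as a comment in the Okera-generated SQL. The google-cloud-bigquery client, project name, and ID value are placeholders.

    # Sketch: find BigQuery jobs whose SQL contains a given Okera connection
    # query ID (assumed to be injected as a comment when inject.query-id=true).
    from google.cloud import bigquery

    okera_query_id = "<okera-connection-query-id>"      # placeholder value
    client = bigquery.Client(project="my-bq-project")   # placeholder project

    for job in client.list_jobs(max_results=200, all_users=True):
        sql = getattr(job, "query", None)                # only query jobs carry SQL
        if sql and okera_query_id in sql:
            print(job.job_id, job.created, job.user_email)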

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

2.14.0 (11/30/2022)

OkeraEnsemble (OkeraFS) General Updates

The following updates were made to OkeraEnsemble in this release.

OkeraEnsemble UI and ABAC Support (Preview Feature)

OkeraEnsemble extends Okera's pre-existing fine-grained access controls to unstructured data (URIs). Unstructured data is data that cannot be mapped to a tabular structure, such as a library of images (for example, medical X-rays) or an individual video or sound file. With this release, you can register and apply permissions to your unstructured data using OkeraEnsemble and the Okera UI. Your unstructured data can be tagged, and access to it can be controlled using Okera's attribute-based access control (ABAC). For more information, see Register Unstructured Data URIs.

OkeraEnsemble System Token Generation Changes in Default Mode

With this release, the system token required by OkeraEnsemble in default (non-nScale) mode is no longer automatically generated from a private key.

When deployed in nScale mode, the OkeraEnsemble access proxy requires a JWT token to authenticate to the Okera cluster. This token can be provided using either the JWT_PRIVATE_KEY or SYSTEM_TOKEN configuration parameter. When the JWT_PRIVATE_KEY configuration parameter is specified, the OkeraEnsemble access proxy automatically generates its own JWT token with the provided private key. When the SYSTEM_TOKEN configuration parameter is specified, it defines the location of the system JWT token file. If both are specified, JWT_PRIVATE_KEY takes precedence and is used, by default, to generate the required JWT token.

However, when OkeraEnsemble is deployed in default (non-nScale) mode on the Okera cluster, the access proxy now defaults to using the token defined by the SYSTEM_TOKEN configuration parameter. It no longer generates the token from a private key.

Apache Ranger Migration Script (Preview Feature)

With this release, Okera provides a Python script you can use to extract Hive, Hadoop Distributed File System (HDFS), and Starburst Enterprise (Trino) policies from an Apache Ranger policy server. The script connects to Ranger, queries for the policies, and generates equivalent Okera DDL in a JSON file. The script can also automatically run the resulting Okera DDL against a running Okera cluster. For more information, see Apache Ranger Migration.
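
This is not the Okera-provided script, but purely as an illustration of the kind of extraction it performs, the sketch below lists Hive policies from Apache Ranger's public REST API. The endpoint path, basic-authentication credentials, and policy field names are assumptions and may vary by Ranger version.

    # Illustration only (not Okera's migration script): list Hive policies from
    # an Apache Ranger server via its public v2 REST API. The endpoint path,
    # credentials, and field names below are assumptions that may vary by version.
    import requests

    RANGER_URL = "https://ranger.example.com:6182"      # placeholder URL
    AUTH = ("admin", "<password>")                       # placeholder credentials

    resp = requests.get(f"{RANGER_URL}/service/public/v2/api/policy", auth=AUTH)
    resp.raise_for_status()

    for policy in resp.json():
        if policy.get("serviceType") == "hive":          # field name is an assumption
            print(policy.get("name"), policy.get("resources"))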

Tag Updates in the UI

This release introduces a new search bar on the Tags page in the UI. Use this search bar to locate a tag listed on the tags page. For more information, see Search for Tags.

UI Changes

  • To support OkeraEnsemble's use in the UI, a new Files tab has been added to the Databases page. From this tab you can register unstructured data (URIs or files), tag it, and apply access permissions to it. See Register Unstructured Data (URIs and Files).

  • The title of the Databases page changed to Data because both structured data (tables) and unstructured data (files and URIs) are now supported.

  • The names of the following buttons or options changed:

    • The Create new connection button changed to Create connection.
    • One of the options shown when creating autotags was renamed.
    • The Save button on the Create auto tagging rule dialog changed to Create.
    • The Add button on the Create tag dialog changed to Create.
  • The following dialog titles have changed:

    • The Create new connection dialog title changed to Create connection.
    • The New automatic tagging rule dialog title changed to Create auto tagging rule.
    • The Editing automatic tagging rule dialog title changed to Editing auto tagging rule.

Casing Changes for Table Name Lookups

With this release, Okera performs table name lookups in a case-insensitive manner. For example, if you query user_Table_1 when the Okera crawler found user_TABLE_1, Okera now generates a query against user_TABLE_1 rather than returning a "no object" response.

Warning

If you have multiple objects with the same name that are distinguished only by casing, Okera provides no means to differentiate between them. For example, if TABLE_1 and table_1 are defined in the same database, a query against table_1, tABLE_1, TABLE_1, or any other casing permutation maps to either TABLE_1 or table_1; which table it maps to is determined by the original database scan and cannot be controlled. Further note that Okera may generate queries with quoted identifiers, so an input query that does not specify casing may generate a more case-specific query. For example, if Okera has table_1 defined (unquoted) and is presented with a query that reads select id from TabLe_1, Okera may generate and issue the more specific query: select "id" from "table_1".

Athena Connection Changes

When creating or editing an Athena connection, the default source schema field is now optional.

API Updates

  • A new API endpoint, /api/v2/tags/{name}/tagging-rules, with four methods (GET, POST, PUT, and DELETE), was introduced in this release. Use this endpoint to list, create, update, and delete tagging rules.

  • A new API endpoint, /api/v2/uri/, with GET and POST methods, has been added in this release. Use this endpoint to list, fetch, and register unstructured data URIs.

For information about any Okera API endpoint, see the Okera API documentation, available after you log in to the web UI by appending /api/v2-docs/api/ after the web UI port number (8083). For example: https://my.okera.installation:8083/api/v2-docs/api/.
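
For example, the following sketch calls the two new endpoints with Python's requests library. The host, port, bearer-token authentication, and tag name are assumptions about your deployment, not documented requirements.

    # Sketch: call the new endpoints with the requests library. Host, port,
    # token, and tag name are placeholders; bearer-token auth is an assumption.
    import requests

    BASE = "https://my.okera.installation:8083/api/v2"
    HEADERS = {"Authorization": "Bearer <your-JWT-token>"}   # placeholder token

    # List tagging rules for a tag (GET /api/v2/tags/{name}/tagging-rules).
    rules = requests.get(f"{BASE}/tags/sensitive.pii/tagging-rules", headers=HEADERS)
    print(rules.status_code, rules.json())

    # List registered unstructured data URIs (GET /api/v2/uri/).
    uris = requests.get(f"{BASE}/uri/", headers=HEADERS)
    print(uris.status_code, uris.json())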

Security Vulnerabilities (CVEs/CWEs) Addressed

Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.

Bug Fixes and Improvements

  • Fixed an issue where some user attributes were missing in the Okera UI after they were added using the DDL.
  • Fixed the pagination of the list of roles on the Roles page.
  • Fixed a bug that occurred during local worker startup in a Google Cloud Platform Dataproc environment running in nScale mode.

  • Fixed a bug in Okera's phi_zip3 transformations so that string zip codes are correctly deidentified to their first three digits. In addition, numeric zip codes that fall outside the range of 0 through 99999 now return a null value rather than the original value. In other words, phi_zip3 transformations are only supported for five-digit numeric zip codes (if the zip codes are in string format, this limitation does not apply). See the sketch after this list.

  • Fixed an issue where users were required to have write privileges to view metadata.
  • Fixed an issue where the property okera.external.view in Databricks environments did not always match the value of the cerebro.external.view property.
  • Fixed an issue in which a crawler failed even when abort on error was set to false.
  • Fixed an out-of-resources error that occurred with the OkeraEnsemble access proxy.
  • Fixed the no module named ez_setup errors you might have encountered when installing the OkeraEnsemble plugin.
  • Corrected a bug in error handling during dataset renames.
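
For reference, the following is a plain-Python model of the phi_zip3 behavior described in the bullet above, not Okera's implementation; it is only meant to make the string-versus-numeric distinction concrete, and the zero-padding of numeric values to five digits is an assumption.

    # Plain-Python model of the described phi_zip3 behavior (not Okera's code):
    # numeric zips outside 0..99999 return null; in-range numeric zips keep the
    # first three digits; string zips are truncated to their first three characters.
    from typing import Optional, Union

    def phi_zip3_model(zip_code: Union[int, str, None]) -> Optional[str]:
        if zip_code is None:
            return None
        if isinstance(zip_code, int):
            if zip_code < 0 or zip_code > 99999:
                return None                   # out-of-range numeric zips return null
            return f"{zip_code:05d}"[:3]      # zero-padding to five digits is an assumption
        return str(zip_code)[:3]              # string zips: no range restriction

    print(phi_zip3_model(2134))     # -> "021" (with the zero-padding assumption)
    print(phi_zip3_model(123456))   # -> None
    print(phi_zip3_model("02134"))  # -> "021"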