
Okera Version 2.0 Release Notes

This topic provides Release Notes for all 2.0 versions of Okera.

2.0.2

Bug Fixes and Improvements

  • Fixed an issue where many concurrent CREATE TABLE or CREATE VIEW statements could be slowed down waiting on a shared resource.
  • Fixed an issue when authorizing queries on views with complex types.
  • Added an option to use the SYSTEM_TOKEN, rather than ZooKeeper, as the shared HMAC secret for signing and validating tasks in the Okera Policy Engine (planner) and Okera Enforcement Fleet worker services. To enable this option, set SYSTEM_TOKEN_HMAC: true in the configuration file.
  • Fixed an issue where it was not possible to connect to a Postgres instance that did not have public in the default search_path.
  • Added the ability to specify whether the connection to the database should use SSL. This was typically auto-discovered, but auto-discovery failed in some cases. To enable SSL, set CATALOG_DB_SSL: true in the configuration file.
  • Fixed an issue where schema upgrades did not work for remote Postgres instances.
  • Fixed an issue where the Workspace UI would scroll beyond the window if there was a long error.

2.0.1

Bug Fixes and Improvements

  • Added the ability to edit dataset and column descriptions in the Okera UI.
  • Fixed an issue in which datasets could not be registered if they had columns with type definitions that exceeded 4,000 characters.
  • Added configuration options for controlling LDAP group resolution:
    • GROUP_RESOLVER_LDAP_POSIX_GID_FIELD_NAME
    • GROUP_RESOLVER_LDAP_POSIX_UID_FIELD_NAME
    • GROUP_RESOLVER_LDAP_MEMBEROF_FIELD_NAME
  • Fixed an issue where Avro datasets that had a union type with a single child (e.g., union(int)) threw an error. These types of unions are now fully supported.
  • Fixed an issue where decimals that were stored as a byte_array in Parquet files were not read correctly.
  • Added configuration options to control the maximum number of allowed Sentry and HMS connections:
    • CATALOG_HMS_MAX_THREADS
    • CATALOG_SENTRY_MAX_THREADS
  • Fixed an issue where the description of a view (or of a column in it) could not be changed via DDL.
  • Fixed an issue where columns that contained arrays or maps with embedded null values were not handled correctly in the Java-based clients.
  • Fixed an issue in PyOkera where it incorrectly decoded negative decimal values with a precision higher than 18.
  • Fixed an issue where, when --allow_nl_in_csv=True was set and the CSV file used a quote character other than ", Okera would incorrectly use " to escape line breaks.
  • Improved the crawler's ability to automatically use the OpenCSV SerDe when necessary.
  • Fixed issues with handling complex types containing several nested arrays, structs, or maps with interspersed null values.
  • Fixed an issue where reserved keywords could not be used as attribute namespaces or attribute keys (e.g., myns.true), because escaping them did not work.
  • Added the ability to use CREATE TABLE LIKE TEXTFILE, which automatically deduces the schema from a CSV file (this assumes the first line contains the headers).
  • Improved handling of non-parseable SQL statements when accessing a view that was created outside Okera (e.g., in Hive). To enable this capability, set ALLOW_NONPARSEABLE_SQL_IN_VIEWS: true in the cluster's configuration file.
  • Fixed an issue where the same tag could appear twice in the UI.
  • Fixed an issue where dropping an external table that referenced a non-existent bucket failed.
  • Fixed an issue where the Data Registration page for a given crawler would incorrectly show tables as "Registered" if their path was a simple prefix of the crawler root path.
  • Added support for using a dedicated Postgres server (e.g., on RDS) as the backing metadata database.
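With the reserved-keyword fix above, an attribute key such as true can be escaped when it is assigned. The following is a minimal sketch, assuming Impala/Hive-style backtick quoting for identifiers and the ALTER TABLE ... ADD ATTRIBUTE operation introduced in 2.0.0; the database, table, and namespace names are hypothetical.

```sql
-- Hypothetical names: mydb.mytable and the namespace myns.
-- `true` is a reserved keyword, so it is escaped with backticks
-- (backtick quoting is an assumption based on Impala/Hive conventions).
CREATE ATTRIBUTE myns.`true`;
ALTER TABLE mydb.mytable ADD ATTRIBUTE myns.`true`;
```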

2.0.0

New Features

Bucketed Tables

Okera now supports bucketed tables and can perform efficient joins on them. You can find more details here.
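As an illustration only, a bucketed table is typically declared with Hive-style CLUSTERED BY ... INTO n BUCKETS DDL. The sketch below assumes Okera accepts that Hive syntax; the table, columns, bucket count, and S3 path are all hypothetical. Joining two tables bucketed on the same key with the same bucket count is the usual precondition for a bucketed join.

```sql
-- Hypothetical table, columns, and location.
-- Assumes Hive-style bucketing DDL: rows are hashed on customer_id
-- into a fixed number of buckets (files).
CREATE EXTERNAL TABLE mydb.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
CLUSTERED BY (customer_id) INTO 16 BUCKETS
STORED AS PARQUET
LOCATION 's3://my-bucket/orders/';
```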

AWS Glue

Okera now supports using AWS Glue as the metastore storage, allowing you to connect Okera to an existing Glue catalog. You can read more about this support, and how to enable it, on the Glue Integration page.

Autotagging Improvements

  • Okera now employs an ML-based engine for some of the out-of-the-box autotagging rules, such as address and phone number detection.

  • You can now create and manage, in the UI, the regular expression-based rules used by the autotagging engine. You can read more about this on the Tags page.

  • The number of datasets tagged with a tag is now shown in the UI.

  • Okera can continuously autotag your existing catalog in the background. To enable this, set ENABLE_CATALOG_MAINTENANCE in your configuration file.

  • Okera now autotags the data inside nested complex types and applies the discovered tag(s) at the root column level.

Azure ADLS Gen2 Support

Okera now supports ADLS Gen2 data storage for both querying and data crawling. You can register these data sources by specifying a path with either the abfs:// or abfss:// prefix.

Note: Okera supports Azure Blob File System (ABFS) URIs that use the dfs endpoint (*.dfs.core.windows.net), but does not support blob endpoint URIs (*.blob.core.windows.net).
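For example, a Parquet dataset in ADLS Gen2 might be registered as shown below. This is a minimal sketch: the container, storage account, path, and columns are hypothetical, and the Hive-style CREATE EXTERNAL TABLE ... LOCATION form is assumed. The key point is the abfss:// path on the dfs endpoint, in the standard abfss://<container>@<account>.dfs.core.windows.net/<path> form.

```sql
-- Hypothetical container (mycontainer), storage account (myaccount), and path.
-- Uses the dfs endpoint; blob endpoint URIs (*.blob.core.windows.net) are not supported.
CREATE EXTERNAL TABLE mydb.sales (
  sale_id BIGINT,
  amount  DOUBLE
)
STORED AS PARQUET
LOCATION 'abfss://mycontainer@myaccount.dfs.core.windows.net/data/sales/';
```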

Web UI Updates

  • The Okera Web UI has been revamped to make it easier to use and to update its look and feel.

  • A Roles page has been added, allowing you to fully manage roles (create/update/delete) and their group and permission assignments. You can read more about this on the Roles page.

  • The 'About' dialog has been replaced by a System page.

JDBC Data Sources

  • Redshift External Tables are now supported for JDBC data sources of type redshift.

ABAC Updates

  • New SQL commands are available for working with tags:

    • DESCRIBE <table>, DESCRIBE FORMATTED <table>, DESCRIBE DATABASE <database> will now output tag assignments.
    • CREATE ATTRIBUTE <attr> and DROP ATTRIBUTE <attr> will create/remove attributes.

      Note: Namespaces will be automatically created if they don't already exist.

    • SHOW ATTRIBUTE will show the list of currently existing attributes.
    • ALTER TABLE and ALTER VIEW now support the operations ADD ATTRIBUTE <attr>, REMOVE ATTRIBUTE <attr>, ADD COLUMN ATTRIBUTE <col> <attr>, and REMOVE COLUMN ATTRIBUTE <col> <attr> to add/remove attributes at the table/view level and the column level, respectively.
    • ALTER DATABASE now supports the operations ADD ATTRIBUTE <attr> and REMOVE ATTRIBUTE <attr> to add/remove attributes at the database level.
    • CREATE TABLE and CREATE VIEW can now take an optional set of attributes at creation time. For example:
      CREATE TABLE mydb.mytable (
          col1 int COMMENT "some comment1" ATTRIBUTE myns.myattr1,
          col2 int COMMENT "some comment2" ATTRIBUTE myns.myattr2,
          col3 int COMMENT "some comment3" ATTRIBUTE myns.myattr3
      )
      
  • Rule definitions now accept a "name" field. For backwards compatibility and convenience, the "name" is auto-generated if not specified.
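The attribute commands above can be combined into a short tagging workflow. This is a sketch assembled from the command forms listed above; the database, table, column, and attribute names are hypothetical, and details such as whether a trailing semicolon is required have not been verified.

```sql
-- Hypothetical names throughout (mydb, mytable, col1, myns.pii).
-- Create an attribute; the namespace myns is created automatically if needed.
CREATE ATTRIBUTE myns.pii;

-- List the attributes that currently exist.
SHOW ATTRIBUTE;

-- Assign the attribute at the database, table, and column levels.
ALTER DATABASE mydb ADD ATTRIBUTE myns.pii;
ALTER TABLE mydb.mytable ADD ATTRIBUTE myns.pii;
ALTER TABLE mydb.mytable ADD COLUMN ATTRIBUTE col1 myns.pii;

-- DESCRIBE output now includes the tag assignments.
DESCRIBE mydb.mytable;
DESCRIBE DATABASE mydb;

-- Remove every assignment, then drop the attribute itself.
ALTER TABLE mydb.mytable REMOVE COLUMN ATTRIBUTE col1 myns.pii;
ALTER TABLE mydb.mytable REMOVE ATTRIBUTE myns.pii;
ALTER DATABASE mydb REMOVE ATTRIBUTE myns.pii;
DROP ATTRIBUTE myns.pii;
```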

Bug Fixes and Improvements

  • Okera's Docker images have been updated to refresh many dependencies, including the base OS, Python, OpenSSL, and more.
  • Added a way to configure the structure of the data files the crawler will use while crawling. See Create a Crawler for more.
  • Added a crawler search box to the data registration page.
  • Added additional validation for the crawler name and path when creating a new crawler.
  • You can now re-run the autotagging rules on an individual dataset from the Datasets page by using the Re-autotag button.
  • Fixed an issue where datasets with complex types that had a MAP embedded in a STRUCT embedded in an ARRAY were not handled correctly.
  • Added the ability to revoke grants on objects that no longer exist.
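As a sketch, revoking a grant on a table that has since been dropped could look like the following. The database, table, and role names are hypothetical, and the REVOKE ... ON TABLE ... FROM ROLE form is the Sentry-style syntax assumed here (the REVOKE SELECT ON TABLE form also appears under Incompatible Changes).

```sql
-- mydb.dropped_table no longer exists, but the grant that references it
-- can still be removed. All names are hypothetical.
REVOKE SELECT ON TABLE mydb.dropped_table FROM ROLE analyst_role;
```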

Incompatible Changes

  • Previously, by default, users only saw reports for datasets they had ALL access to. Because many stewards do not have ALL access to the data, this has been changed so that they see reports for all data they have SELECT access to. If necessary, you can restore the ALL requirement by editing the view definition of the okera_system.steward_audit_logs dataset.
  • Starting in 2.0.0, Okera only supports Amazon EMR versions later than 5.11.0, up to 5.28.0.

    Note: Versions of Amazon EMR earlier than 5.10.0 continue to work, but Okera recommends upgrading to a recent Amazon EMR version for the latest Okera compatibility.

  • The behavior of using REVOKE on permissions (e.g., REVOKE SELECT) has been changed to not cascade by default. For example, in 1.5.x and earlier versions, REVOKE SELECT ON TABLE mytable would also revoke any
  • Starting in 2.0.0, the published Okera client libraries for PrestoDB support PrestoDB versions 0.225 and above. To support earlier PrestoDB versions, you can use the published Okera client libraries from prior Okera versions, which continue to work against Okera 2.0.x and later clusters.
  • The Permissions page has been removed; existing links to it (e.g., bookmarks) no longer work.
  • Private tags on datasets have been removed. Datasets can no longer be filtered by private tags.

SQL Keywords

The following terms are now keywords, starting in 2.0.0:

  • EXECUTE
  • INHERIT
  • TRANSFORM

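Existing queries that use any of these words as identifiers (for example, as column names) may need to quote them. The following is a minimal sketch, assuming Impala/Hive-style backtick quoting; the table and column names are hypothetical.

```sql
-- Hypothetical table whose columns are named after words that are
-- keywords as of 2.0.0; backtick quoting is assumed.
SELECT `execute`, `inherit`, `transform`
FROM mydb.legacy_table;
```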
Deprecation Notice

  • Starting in 2.0.0, we are deprecating the ocadm and odb CLI utilities. If you want to continue using odb, the binaries from 2.0.x and prior releases should continue to work. However, future releases will not ship new binaries of these utilities.