Okera Version 2.0 Release Notes
This topic provides Release Notes for all 2.0 versions of Okera.
Bug Fixes and Improvements
- Fixed an issue where many concurrent CREATE VIEW statements could be slowed down waiting on a shared resource.
- Fixed an issue when authorizing queries on views with complex types.
- Added an option to use the SYSTEM_TOKEN as the shared HMAC secret for signing and validating tasks (in the Okera Policy Engine (planner) and Okera Enforcement Fleet worker services) rather than using ZooKeeper. This option can be enabled by setting SYSTEM_TOKEN_HMAC: true in the configuration file.
- Fixed an issue where it was not possible to connect to a Postgres instance that did not have public in the default
- Added the ability to specify whether the connection to the database should use SSL (this was typically auto-discovered, but in some cases auto-discovery failed). This can be enabled by setting CATALOG_DB_SSL: true in the configuration file.
- Fixed an issue where schema upgrades did not work for remote Postgres instances.
- Fixed an issue where the Workspace UI would scroll beyond the window if there was a long error.
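With SYSTEM_TOKEN_HMAC enabled, the planner and the workers share a single secret for signing and validating tasks. The following Python code is a minimal sketch of that general sign-and-validate pattern. It assumes HMAC-SHA256, a hypothetical serialized task payload, and a placeholder byte string in place of SYSTEM_TOKEN. Okera's actual task format and hash algorithm are not described in these notes.

```python
import hashlib
import hmac


def sign_task(secret: bytes, task: bytes) -> str:
    """Planner side: compute a hex HMAC-SHA256 tag over the serialized task (assumed algorithm)."""
    return hmac.new(secret, task, hashlib.sha256).hexdigest()


def validate_task(secret: bytes, task: bytes, signature: str) -> bool:
    """Worker side: recompute the tag with the same shared secret and compare in constant time."""
    expected = sign_task(secret, task)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    # Hypothetical placeholder standing in for the cluster's SYSTEM_TOKEN.
    shared_secret = b"example-system-token"
    # Hypothetical serialized task; the real payload format is not documented here.
    task = b'{"table": "mydb.mytable", "split": 0}'

    sig = sign_task(shared_secret, task)
    print(validate_task(shared_secret, task, sig))         # True
    print(validate_task(shared_secret, task + b" ", sig))  # False: the task was altered
    print(validate_task(b"other-secret", task, sig))       # False: a different secret was used
```

The sketch uses hmac.compare_digest instead of == so that the comparison takes the same time regardless of where a forged signature first differs.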
Bug Fixes and Improvements
- Added the ability to edit dataset and column descriptions in the Okera UI.
- Fixed an issue in which datasets could not be registered if they had columns with type definitions that exceeded 4,000 characters.
- Added more control options for LDAP group resolution configuration:
- Fixed an issue where Avro datasets that had a union type with a single child (e.g., union(int)) would throw an error. These types of unions are now fully supported.
- Fixed an issue where decimals that were stored as a byte_array in Parquet files were not read correctly.
- Added a configuration option to control the maximum number of allowed Sentry and HMS connections:
- Fixed an issue in which changing the description of a view (or a column in it) via DDL was not supported.
- Fixed an issue where columns that contained arrays or maps with embedded null values were not handled correctly in the Java-based clients.
- Fixed an issue in PyOkera where it would incorrectly decode negative decimal values with a precision higher than 18.
- Fixed an issue where, if --allow_nl_in_csv=True was set and the CSV file used a quote character other than ", the " character was improperly used to escape line breaks.
- Improved the crawler's ability to automatically use the OpenCSVSerDe when necessary.
- Fixed issues with handling complex types that had several nested arrays/structs/maps with
- Fixed an issue where reserved keywords could not be used as attribute namespaces and attribute keys, because escaping them did not work (e.g.,
- Added the ability to use CREATE TABLE LIKE TEXTFILE, which automatically deduces the schema from the CSV file (this assumes the first line contains the column headers).
- Improved handling of non-parseable SQL statements when accessing a view that was created outside Okera (e.g., in Hive). This capability is enabled by setting the environment flag ALLOW_NONPARSEABLE_SQL_IN_VIEWS: true in the configuration file for the cluster.
- Fixed an issue where the same tag could appear twice in the UI.
- Fixed an issue in which dropping an external table referencing a non-existent bucket would fail.
- Fixed an issue where the crawler Data Registration page for a given crawler would incorrectly show tables as "Registered" if their path was a simple prefix of the crawler root path.
- Added support for using a dedicated Postgres server (e.g., on RDS) as the backing metadata database.
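Two of the fixes above involve decoding decimals. The Parquet format specification stores a DECIMAL backed by a byte array as its unscaled value in big-endian two's-complement form. Values with a precision above 18 may not fit in a signed 64-bit integer, so negative values have to be sign-extended at arbitrary width. The following Python code sketches that decoding in general terms. It is not Okera's or PyOkera's code, and decode_decimal is a hypothetical helper name.

```python
from decimal import Decimal


def decode_decimal(unscaled: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's-complement unscaled value into an exact Decimal.

    signed=True sign-extends from the top bit of the first byte, so negative
    values of any width (including precision above 18) decode correctly.
    """
    value = int.from_bytes(unscaled, byteorder="big", signed=True)
    sign = 1 if value < 0 else 0
    digits = tuple(int(d) for d in str(abs(value)))
    # The (sign, digits, exponent) form is exact; no context rounding is applied.
    return Decimal((sign, digits, -scale))


if __name__ == "__main__":
    # 0xFF85 is -123 in 16-bit two's complement; with scale 2 this is -1.23.
    print(decode_decimal(bytes.fromhex("ff85"), scale=2))
    # A 20-digit negative unscaled value, wider than any signed 64-bit integer.
    raw = (-12345678901234567890).to_bytes(16, "big", signed=True)
    print(decode_decimal(raw, scale=4))
```

The sketch builds the Decimal from a (sign, digits, exponent) tuple so that the result stays exact. Scaling with Decimal arithmetic instead would round to the active context's precision, which is 28 digits by default.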
Okera now supports bucketed tables and applying efficient joins to them. You can find more details here.
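Bucketing can make joins cheaper. When both tables are bucketed on the join key into the same number of buckets, rows with equal keys always land in the same bucket number, so each pair of matching buckets can be joined independently. The following Python code is a minimal in-memory sketch of that general idea. It is not how Okera executes bucketed joins, and the helper names are hypothetical. Okera's bucketing hash function is not described here; zlib.crc32 is used only to keep the example deterministic.

```python
import zlib
from collections import defaultdict


def bucket_of(key, num_buckets: int) -> int:
    """Deterministically map a join key to a bucket number (crc32 is illustrative)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_buckets


def bucketize(rows, key_index: int, num_buckets: int):
    """Split rows into num_buckets lists by hashing the join-key column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets, left_key: int, right_key: int):
    """Inner-join two tables bucketed on the join key into the same number of buckets.

    Equal keys always hash to the same bucket number, so bucket i on the left
    only has to be compared with bucket i on the right.
    """
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both tables must use the same number of buckets")
    result = []
    for left_bucket, right_bucket in zip(left_buckets, right_buckets):
        # Build a small hash table per bucket pair rather than one for the whole table.
        index = defaultdict(list)
        for right_row in right_bucket:
            index[right_row[right_key]].append(right_row)
        for left_row in left_bucket:
            for right_row in index.get(left_row[left_key], []):
                result.append(left_row + right_row)
    return result


if __name__ == "__main__":
    # Hypothetical sample tables: (customer_id, order) and (customer_id, name).
    orders = [(1, "order-a"), (2, "order-b"), (1, "order-c")]
    customers = [(1, "alice"), (2, "bob"), (3, "carol")]
    n = 4
    joined = bucketed_join(bucketize(orders, 0, n), bucketize(customers, 0, n), 0, 0)
    print(sorted(joined))
    # [(1, 'order-a', 1, 'alice'), (1, 'order-c', 1, 'alice'), (2, 'order-b', 2, 'bob')]
```

In practice, this approach pays off because the buckets are written once when the tables are created. Each later join only has to pair up bucket files instead of reshuffling both tables.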
Okera now supports using AWS Glue as the metastore storage, allowing you to connect Okera to an existing Glue catalog. You can read more about this support and enabling it in the Glue Integration page.
Okera now employs an ML-based engine for some of the out-of-the-box autotagging rules, such as address and phone number detection.
You can now create and manage the regular expression-based rules that are used by the autotagging engine in the UI. You can read more about this in the Tags page.
The number of datasets tagged with a tag is now shown in the UI.
Okera can continuously autotag your existing catalog in the background. You can enable this with the ENABLE_CATALOG_MAINTENANCE setting in your configuration file.
Okera will now autotag the data inside nested complex types and apply the discovered tag(s) at the root column-level.
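The regular expression-based rules and the tagging of nested complex types described above can be illustrated together. The following Python sketch walks every leaf value of a nested column, treating structs and maps as dicts and arrays as lists. It tests each string leaf against a set of regex rules and attaches any matching tag to the top-level column. The tag names and patterns are hypothetical placeholders, not Okera's built-in rules, and Okera's ML-based detectors are not modeled.

```python
import re
from typing import Any, Dict, Iterator, List, Set

# Hypothetical regex rules: tag name -> compiled pattern.
# These are illustrative placeholders, not Okera's built-in autotagging rules.
RULES: Dict[str, re.Pattern] = {
    "pii.phone_number": re.compile(r"^\+?\d{3}[-. ]\d{3}[-. ]\d{4}$"),
    "pii.email_address": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
}


def leaf_values(value: Any) -> Iterator[Any]:
    """Yield every scalar inside a nested value (dicts for structs/maps, lists for arrays)."""
    if isinstance(value, dict):
        for v in value.values():
            yield from leaf_values(v)
    elif isinstance(value, list):
        for v in value:
            yield from leaf_values(v)
    else:
        yield value


def tag_root_columns(rows: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return, for each top-level column, the tags whose rule matched any nested leaf."""
    tags: Dict[str, Set[str]] = {}
    for row in rows:
        for column, value in row.items():
            column_tags = tags.setdefault(column, set())
            for leaf in leaf_values(value):
                if not isinstance(leaf, str):
                    continue
                for tag, pattern in RULES.items():
                    if pattern.search(leaf):
                        # The tag is recorded on the root column, not on the nested field.
                        column_tags.add(tag)
    return tags


if __name__ == "__main__":
    rows = [{
        "id": 1,
        "contact": {"phones": ["555-123-4567"], "emails": {"work": "a@example.com"}},
    }]
    print(tag_root_columns(rows))
```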
Azure ADLS Gen2 Support
Okera now supports ADLS Gen2 data storage for both querying and data crawling.
You can register these data sources by specifying a path with either the
Note: Okera supports Azure Blob Filesystem Storage (abfs)
*.dfs.core.windows.net), but does not support
Web UI Updates
The Okera Web UI has been revamped to make it easier to use and to update its look-and-feel.
A Roles page has been added, allowing you to fully manage roles (create/update/delete) and their group and permission assignments. You can read more about this on the Roles page.
The 'About' dialog has been replaced by a
JDBC Data Sources
- Redshift External Tables are now supported for JDBC data sources of type
There are now SQL commands to work with tags, namely:
- DESCRIBE FORMATTED <table>, DESCRIBE DATABASE <database> will now output tag assignments.
- CREATE ATTRIBUTE <attr> and DROP ATTRIBUTE <attr> will create/remove attributes.
  Note: Namespaces will be automatically created if they don't already exist.
- SHOW ATTRIBUTE will show the list of currently existing attributes.
- ALTER VIEW now have new operations of ADD ATTRIBUTE <attr>, REMOVE ATTRIBUTE <attr>, ADD COLUMN ATTRIBUTE <col> <attr> and REMOVE COLUMN ATTRIBUTE <col> <attr> to add/remove attributes at the table-/view- and column-levels respectively.
- ALTER DATABASE now has new operations of ADD ATTRIBUTE <attr> and REMOVE ATTRIBUTE <attr> to add/remove attributes at the database-level.
- CREATE VIEW can now take an optional set of attributes during table creation. For example:
  CREATE TABLE mydb.mytable ( col1 int COMMENT "some comment1" ATTRIBUTE myns.myattr1, col2 int COMMENT "some comment2" ATTRIBUTE myns.myattr2, col3 int COMMENT "some comment3" ATTRIBUTE myns.myattr3 )
Rule definitions now accept a "name" field. For backwards compatibility and convenience, the "name" is auto-generated if not specified.
Bug Fixes and Improvements
- Okera has updated its Docker images, updating many dependencies including the base OS, Python, OpenSSL, and more.
- Added a way to configure the structure of the data files the crawler will use while crawling. See Create a Crawler for more.
- Added a crawler search box on the data registration page.
- Added additional validation for the crawler name and path when creating a new crawler.
- Added the ability to re-run the autotagging rules on an individual dataset from the Datasets page, by using the
- Fixed an issue where datasets with complex types that had a MAP embedded in an ARRAY would not be handled correctly.
- Added the ability to revoke grants on objects that no longer exist.
- Previously, by default, users would only see reports for datasets they had ALL access to. Because many stewards may not have ALL access to the data, this has been changed so that they see reports for all data they have SELECT access to. If necessary, this can be configured back to ALL by editing the view definition of
- Starting from 2.0.0, Okera only supports Amazon EMR versions greater than 5.11.0 up to 5.28.0.
Note: Versions of Amazon EMR less than 5.10.0 continue to work, but Okera recommends that you upgrade to a recent Amazon EMR version for the latest Okera compatibility.
- The behavior of REVOKE on permissions (e.g., REVOKE SELECT) has been changed to not cascade by default. For example, in 1.5.x and earlier versions, REVOKE SELECT ON TABLE mytable would also revoke any
- Starting in 2.0.0, the published Okera client libraries for PrestoDB support PrestoDB versions greater than 0.225 and above. You can use published Okera client libraries from prior Okera versions (which will continue to work against an Okera 2.0.x and higher cluster) to support earlier PrestoDB versions.
- The Permissions page has been removed; all links to it (e.g., in bookmarks) will no longer work.
- Private tags on datasets have been removed. Datasets can no longer be filtered by private tags.
The following terms are now keywords, starting in 2.0.0:
- Starting in 2.0.0, we are deprecating the odb CLI utilities. If you want to continue using odb, the binary from 2.0.x and prior releases should continue to work against. However, in future releases we will not ship new binaries of these utilities.