Okera Version 2.0 Release Notes
This topic provides Release Notes for all 2.0 versions of Okera.
2.0.2
Bug Fixes and Improvements
- Fixed an issue where many concurrent CREATE TABLE or CREATE VIEW statements could be slowed down waiting on a shared resource.
- Fixed an issue when authorizing queries on views with complex types.
- Added an option to use the SYSTEM_TOKEN as the shared HMAC secret for signing and validating tasks in the Okera Policy Engine (planner) and Okera Enforcement Fleet worker services, rather than using ZooKeeper. You can enable this option by setting SYSTEM_TOKEN_HMAC: true in the configuration file.
- Fixed an issue where it was not possible to connect to a Postgres instance that did not have the public schema in its default search_path.
- Added the ability to specify whether the connection to the database should use SSL. This was typically auto-discovered, but in some cases auto-discovery failed. You can enable this by setting CATALOG_DB_SSL: true in the configuration file.
- Fixed an issue where schema upgrades did not work for remote Postgres instances.
- Fixed an issue where the Workspace UI would scroll beyond the window if there was a long error.
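The SYSTEM_TOKEN_HMAC option changes where the shared secret for task signing comes from. The Python sketch below shows generic HMAC signing and validation with a shared secret. It is not Okera's task format or code. The JSON task payload, the use of HMAC-SHA256, and the placeholder secret standing in for the SYSTEM_TOKEN are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical placeholder standing in for the shared SYSTEM_TOKEN secret.
# Never hard-code a real token.
SHARED_SECRET = b"example-system-token"


def sign_task(task: dict, secret: bytes = SHARED_SECRET) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the task."""
    payload = json.dumps(task, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def validate_task(task: dict, signature: str, secret: bytes = SHARED_SECRET) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_task(task, secret), signature)


if __name__ == "__main__":
    # Hypothetical task: signed by the planner, then checked by a worker.
    task = {"task_id": 42, "table": "mydb.mytable", "columns": ["col1", "col2"]}
    signature = sign_task(task)
    print(validate_task(task, signature))                            # True
    print(validate_task(dict(task, table="mydb.other"), signature))  # False
```

Both sides hold the same secret, so any worker can verify a task signed by any planner. Sorting the keys makes the encoding deterministic, so the signer and the verifier hash identical bytes.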
2.0.1
Bug Fixes and Improvements
- Added the ability to edit dataset and column descriptions in the Okera UI.
- Fixed an issue in which datasets could not be registered if they had columns with type definitions that exceeded 4,000 characters.
- Added more control options for LDAP group resolution configuration:
  - GROUP_RESOLVER_LDAP_POSIX_GID_FIELD_NAME
  - GROUP_RESOLVER_LDAP_POSIX_UID_FIELD_NAME
  - GROUP_RESOLVER_LDAP_MEMBEROF_FIELD_NAME
- Fixed an issue where Avro datasets that had a union type with a single child (e.g., union(int)) would throw an error. These unions are now fully supported.
- Fixed an issue where decimals stored as a byte_array in Parquet files were not read correctly.
- Added configuration options to control the maximum number of allowed Sentry and HMS connections:
  - CATALOG_HMS_MAX_THREADS
  - CATALOG_SENTRY_MAX_THREADS
- Fixed an issue where changing the description of a view (or of a column in it) via DDL was not supported.
- Fixed an issue where columns that contained arrays or maps with embedded null values were not handled correctly in the Java-based clients.
- Fixed an issue in PyOkera where it incorrectly decoded negative decimal values with a precision higher than 18.
- Fixed an issue where, when --allow_nl_in_csv=True was set and the CSV file used a quote character other than ", Okera would incorrectly use " to escape line breaks.
- Improved the crawler's ability to automatically use the OpenCSV SerDe when necessary.
- Fixed issues with handling complex types that had several nested arrays, structs, or maps with interspersed null values.
- Fixed an issue where reserved keywords could not be used as attribute namespaces or attribute keys (e.g., myns.true), because escaping them did not work.
- Added the ability to use CREATE TABLE LIKE TEXTFILE, which automatically deduces the schema from a CSV file. This assumes that the first line contains the column headers.
- Improved handling of non-parseable SQL statements when accessing a view that was created outside Okera (e.g., in Hive). You can enable this capability by setting ALLOW_NONPARSEABLE_SQL_IN_VIEWS: true in the cluster's configuration file.
- Fixed an issue where the same tag could appear twice in the UI.
- Fixed an issue where dropping an external table that referenced a non-existent bucket failed.
- Fixed an issue where a crawler's Data Registration page would incorrectly show tables as "Registered" if their path was a simple prefix of the crawler root path.
- Added support for using a dedicated Postgres server (e.g., on RDS) as the backing metadata database.
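Two of the fixes above concern decimal decoding. In Parquet, a DECIMAL stored as a BYTE_ARRAY holds the unscaled value as a big-endian two's-complement integer. Any value with a precision of 18 or less fits in a signed 64-bit integer, so larger precisions need arbitrary-precision handling, and the sign must come from the high bit. The Python sketch below shows that general decoding. It is not Okera's or PyOkera's implementation, and the function name is hypothetical.

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's-complement unscaled value into a Decimal.

    This is the layout Parquet uses for DECIMAL values stored as BYTE_ARRAY.
    Building the Decimal from a string keeps every digit, even beyond the
    default 28-digit context precision.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(f"{unscaled}E{-scale}")


if __name__ == "__main__":
    # A negative value with precision 23, i.e. more than the 18 digits
    # that are guaranteed to fit in a signed 64-bit integer.
    unscaled = -12345678901234567890123
    raw = unscaled.to_bytes(16, byteorder="big", signed=True)
    print(decode_decimal(raw, scale=3))              # -12345678901234567890.123
    # Reading the same bytes as unsigned silently yields a large positive number.
    print(int.from_bytes(raw, byteorder="big") > 0)  # True
```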
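The --allow_nl_in_csv fix above depends on a basic CSV rule: a line break belongs to a field only when the file's own quote character encloses it. Python's standard csv module is used here purely to illustrate that rule. The sample data is hypothetical, and this is not how Okera parses CSV.

```python
import csv
import io

# Hypothetical sample whose fields are quoted with ' instead of the default ".
DATA = "id,comment\n1,'first line\nsecond line'\n2,'no line break'\n"


def parse(text: str, quotechar: str) -> list:
    """Parse CSV text; newline='' lets the csv module see embedded line breaks."""
    return list(csv.reader(io.StringIO(text, newline=""), quotechar=quotechar))


# With the file's real quote character, the line break stays inside the field.
correct = parse(DATA, quotechar="'")
# Assuming " is the quote character splits the first data record in two.
wrong = parse(DATA, quotechar='"')

if __name__ == "__main__":
    print(len(correct), len(wrong))  # 3 4
```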
2.0.0
New Features
Bucketed Tables
Okera now supports bucketed tables and can apply efficient joins to them. You can find more details here.
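In a bucketed table, rows are assigned to a fixed number of buckets by hashing the bucketing column. Two tables bucketed the same way on their join key can therefore be joined bucket by bucket, instead of comparing every row with every other row. The Python sketch below illustrates only that idea. The bucket count and the crc32-based bucket function are assumptions for the example (real engines define their own bucketing hash), and this is not Okera's join implementation.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical; both tables must use the same bucket count and hash


def bucket_of(key, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map a key to a bucket (crc32 of its string form)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, key_index: int, num_buckets: int = NUM_BUCKETS) -> dict:
    """Group rows (tuples) into buckets by the value at key_index."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets, left_key: int, right_key: int) -> list:
    """Inner join that only pairs bucket i of the left with bucket i of the right.

    Equal keys always hash to the same bucket, so no match can be missed.
    """
    result = []
    for bucket, left_rows in left_buckets.items():
        # Build a hash table for this bucket of the right-hand table only.
        index = defaultdict(list)
        for right_row in right_buckets.get(bucket, []):
            index[right_row[right_key]].append(right_row)
        for left_row in left_rows:
            for right_row in index.get(left_row[left_key], []):
                result.append(left_row + right_row)
    return result


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(1, 10.0), (1, 5.0), (3, 7.5)]
    joined = bucketed_join(bucketize(users, 0), bucketize(orders, 0), 0, 0)
    print(sorted(joined))
    # [(1, 'alice', 1, 5.0), (1, 'alice', 1, 10.0), (3, 'carol', 3, 7.5)]
```

Each bucket pair fits in a much smaller working set than the full tables, which is where the efficiency of bucketed joins comes from.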
AWS Glue
Okera now supports using AWS Glue as the metastore storage, allowing you to connect Okera to an existing Glue catalog. You can read more about this support and how to enable it on the Glue Integration page.
Autotagging Improvements
- Okera now employs an ML-based engine for some of the out-of-the-box autotagging rules, such as address and phone number detection.
- You can now use the UI to create and manage the regular expression-based rules that the autotagging engine uses. You can read more about this on the Tags page.
- The UI now shows the number of datasets tagged with each tag.
- Okera can continuously autotag your existing catalog in the background. You can enable this with the ENABLE_CATALOG_MAINTENANCE setting in your configuration file.
- Okera now autotags the data inside nested complex types and applies the discovered tag(s) at the root column level.
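The last item above describes how tags found anywhere inside a nested column are reported at the root column. The Python sketch below illustrates that behavior with regular-expression rules. The two rules, the tag names, and the representation of STRUCT/MAP values as dicts and ARRAY values as lists are all hypothetical. Okera's built-in rules, including the ML-based ones, are not reproduced here.

```python
import re

# Hypothetical regex rules: tag name -> pattern.
RULES = {
    "pii.email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "pii.us_zip": re.compile(r"^\d{5}(?:-\d{4})?$"),
}


def leaf_values(value):
    """Yield scalars nested inside dicts (STRUCT/MAP values) and lists (ARRAY)."""
    if isinstance(value, dict):
        for child in value.values():
            yield from leaf_values(child)
    elif isinstance(value, (list, tuple)):
        for child in value:
            yield from leaf_values(child)
    elif value is not None:
        yield value


def autotag_column(sample_values) -> set:
    """Return every tag matched anywhere inside the column, reported for the root column."""
    tags = set()
    for value in sample_values:
        for leaf in leaf_values(value):
            for tag, pattern in RULES.items():
                if pattern.match(str(leaf)):
                    tags.add(tag)
    return tags


if __name__ == "__main__":
    contacts = [{"name": "A", "emails": ["a@example.com"], "address": {"zip": "94105"}}]
    print(sorted(autotag_column(contacts)))  # ['pii.email', 'pii.us_zip']
```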
Azure ADLS Gen2 Support
Okera now supports ADLS Gen2 data storage for both querying and data crawling. You can register these data sources by specifying a path with either the abfs:// or abfss:// prefix.
Note: Okera supports Azure Blob File System (ABFS) dfs URIs (*.dfs.core.windows.net), but does not support blob URIs (*.blob.core.windows.net).
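ABFS paths have the form abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>. The hypothetical helper below is not part of Okera. It sketches how a path could be checked against the rule in the note: a path with an abfs:// or abfss:// scheme and a *.dfs.core.windows.net host is accepted, and a *.blob.core.windows.net host is rejected.

```python
from urllib.parse import urlparse


def is_supported_adls_path(path: str) -> bool:
    """Hypothetical check mirroring the note: abfs/abfss scheme and a dfs endpoint host."""
    parsed = urlparse(path)
    if parsed.scheme not in ("abfs", "abfss"):
        return False
    # .hostname drops the "<file_system>@" part and lower-cases the host.
    host = parsed.hostname or ""
    return host.endswith(".dfs.core.windows.net")


if __name__ == "__main__":
    print(is_supported_adls_path("abfss://sales@myaccount.dfs.core.windows.net/2020/"))   # True
    print(is_supported_adls_path("abfss://sales@myaccount.blob.core.windows.net/2020/"))  # False
    print(is_supported_adls_path("s3://my-bucket/sales/2020/"))                           # False
```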
Web UI Updates
- The Okera Web UI has been revamped to be easier to use and to update its look and feel.
- A Roles page has been added, allowing you to fully manage roles (create, update, and delete) and their group and permission assignments. You can read more about this on the Roles page.
- The About dialog has been replaced by a System page.
JDBC Data Sources
- Redshift external tables are now supported for JDBC data sources of type redshift.
ABAC Updates
- There are now SQL commands to work with tags, namely:
  - DESCRIBE <table>, DESCRIBE FORMATTED <table>, and DESCRIBE DATABASE <database> now output tag assignments.
  - CREATE ATTRIBUTE <attr> and DROP ATTRIBUTE <attr> create and remove attributes.
    Note: Namespaces are automatically created if they don't already exist.
  - SHOW ATTRIBUTE shows the list of currently existing attributes.
  - ALTER TABLE and ALTER VIEW now support the new operations ADD ATTRIBUTE <attr>, REMOVE ATTRIBUTE <attr>, ADD COLUMN ATTRIBUTE <col> <attr>, and REMOVE COLUMN ATTRIBUTE <col> <attr>, which add or remove attributes at the table/view level and the column level, respectively.
  - ALTER DATABASE now supports the new operations ADD ATTRIBUTE <attr> and REMOVE ATTRIBUTE <attr>, which add or remove attributes at the database level.
  - CREATE TABLE and CREATE VIEW can now take an optional set of attributes at creation time. For example:

        CREATE TABLE mydb.mytable (
          col1 int COMMENT "some comment1" ATTRIBUTE myns.myattr1,
          col2 int COMMENT "some comment2" ATTRIBUTE myns.myattr2,
          col3 int COMMENT "some comment3" ATTRIBUTE myns.myattr3
        )

- Rule definitions now accept a "name" field. For backwards compatibility and convenience, the "name" is auto-generated if not specified.
Bug Fixes and Improvements
- Okera's Docker images have been updated, upgrading many dependencies, including the base OS, Python, and OpenSSL.
- Added a way to configure the structure of the data files that the crawler uses while crawling. See Create a Crawler for more information.
- Added a crawler search box to the Data Registration page.
- Added additional validation for the crawler name and path when creating a new crawler.
- You can now re-run the autotagging rules on an individual dataset from the Datasets page by using the Re-autotag button.
- Fixed an issue where datasets with complex types that had a MAP embedded in a STRUCT embedded in an ARRAY were not handled correctly.
- Added the ability to revoke grants on objects that no longer exist.
Incompatible Changes
- Previously, by default, users saw reports only for datasets they had ALL access to. Because many stewards may not have ALL access to the data, this has been changed so that they see reports for all data they have SELECT access to. If necessary, you can change this back to ALL by editing the view definition of the okera_system.steward_audit_logs dataset.
- Starting in 2.0.0, Okera only supports Amazon EMR versions greater than 5.11.0, up to 5.28.0.
  Note: Versions of Amazon EMR less than 5.10.0 continue to work, but Okera recommends that you upgrade to a recent Amazon EMR version for the latest Okera compatibility.
- REVOKE on permissions (e.g., REVOKE SELECT) no longer cascades by default. For example, in 1.5.x and earlier versions, REVOKE SELECT ON TABLE mytable would also revoke any
- Starting in 2.0.0, the published Okera client libraries for PrestoDB support PrestoDB versions greater than 0.225. To support earlier PrestoDB versions, you can use the published Okera client libraries from prior Okera versions, which continue to work against Okera 2.0.x and later clusters.
- The Permissions page has been removed; links to it (e.g., in bookmarks) no longer work.
- Private tags on datasets have been removed. Datasets can no longer be filtered by private tags.
SQL Keywords
Starting in 2.0.0, the following terms are keywords:
- EXECUTE
- INHERIT
- TRANSFORM
Deprecation Notice
- Starting in 2.0.0, we are deprecating the ocadm and odb CLI utilities. If you want to continue using odb, the binaries from 2.0.x and prior releases should continue to work. However, we will not ship new binaries of these utilities in future releases.