Okera Version 2.1 Release Notes

This topic provides Release Notes for all 2.1 versions of Okera.

2.1.10

Bug Fixes and Improvements

  • Fixed a forward-compatibility issue with 2.2.0.

2.1.9

Bug Fixes and Improvements

  • Fixed an issue where a user could create external views in any database using Presto's CREATE VIEW DDL, even though they may not have the appropriate grant on that database.

2.1.8

Bug Fixes and Improvements

  • Fixed an issue where schema inference (used in Data Registration and CREATE TABLE LIKE FILE) for JSON-based tables would incorrectly remove leading underscores and double underscores from column names.

2.1.7

Bug Fixes and Improvements

  • Added the ability to specify additional Presto configuration values using the PRESTO_ARGS configuration value, e.g., PRESTO_ARGS: "task.concurrency=16 task.http-response-threads=100". Using this capability should be done in coordination with Okera Support.
  • Fixed an issue where the REST Server pod would not restart quickly enough if a failure happened on startup.
  • Fixed an issue where the error dialog in the Data Registration page could not be closed.
  • Improved Presto behavior on creating and closing connections to the Okera Enforcement Fleet workers.
  • Changed the default Presto maximum stage count to 400.
  • Fixed an issue where an ABAC policy that included row filters would generate a WHERE clause without the surrounding parentheses.
  • Fixed an issue where newer Parquet files that included the INT_32 and INT_64 logical types would cause a Parquet read error.
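
To illustrate why the missing parentheses in the row-filter fix above matter, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Okera itself. The table, the row filter, and the user predicate are all hypothetical. The only point is that AND binds more tightly than OR, so an unparenthesized filter combined with a query predicate can return rows the policy meant to exclude.

```python
import sqlite3

# Hypothetical data; SQLite stands in for the engine purely to show operator precedence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, tier TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("us", "gold", 10), ("eu", "gold", 500), ("us", "basic", 700), ("apac", "basic", 900)],
)

row_filter = "region = 'us' OR tier = 'gold'"  # hypothetical policy row filter
user_predicate = "amount > 100"                 # predicate from the user's own query


def count_rows(where_clause: str) -> int:
    """Count the rows of the sales table that match the given WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM sales WHERE {where_clause}").fetchone()[0]


# Parsed as: region = 'us' OR (tier = 'gold' AND amount > 100)
unparenthesized = count_rows(f"{row_filter} AND {user_predicate}")
# Parsed as: (region = 'us' OR tier = 'gold') AND amount > 100
parenthesized = count_rows(f"({row_filter}) AND {user_predicate}")
```

In this sketch the unparenthesized form matches one more row than the parenthesized form, because the user's predicate is applied to only one branch of the filter.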

2.1.6

Bug Fixes and Improvements

  • Fixed an issue in Data Registration UI that made pagination behave erratically when using Glue as the backing metastore.
  • Fixed an issue in Data Registration UI where auto-discovered tags would not show up if the column was not editable.
  • Fixed an issue where the in-memory group cache would be overridden with empty groups.
  • Fixed an issue where CSV files that had empty strings would not be automatically converted to NULL values.
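
The empty-string-to-NULL conversion mentioned in the last item can be pictured with a small Python sketch. It is not Okera's reader: it uses the standard csv module, treats None as the stand-in for NULL, and does not distinguish quoted from unquoted empty fields (the notes do not say how Okera treats that case).

```python
import csv
import io
from typing import List, Optional


def read_csv_with_nulls(text: str) -> List[List[Optional[str]]]:
    """Parse CSV text, mapping empty fields to None (the Python stand-in for NULL)."""
    reader = csv.reader(io.StringIO(text))
    return [[field if field != "" else None for field in row] for row in reader]


# Hypothetical sample data with two empty fields.
sample = "id,name,city\n1,,Berlin\n2,Ana,\n"
rows = read_csv_with_nulls(sample)
```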

2.1.5

Bug Fixes and Improvements

  • Added the ability to increase the REST and UI timeouts to arbitrary values (previously limited to 60 seconds).
  • Removed a restriction when unnesting nested types that did not allow WHERE clauses to be used in those queries.

2.1.4

Bug Fixes and Improvements

  • The HMS length restriction removal will now run at startup for all clusters (unless disabled), not just upgraded clusters.
  • Fixed an issue where keywords were not always escaped in ABAC transforms and filters.
  • Fixed an issue in the UI where the privacy function dropdown in the Visual Policy Builder had the wrong default.
  • Fixed an issue where Okera errors were not propagating to Presto when creating an external view from Presto.

2.1.3

Bug Fixes and Improvements

  • Updated Presidio so that it does not require network connectivity in any case.
  • Fixed an issue where the Datasets UI would render table headers over some dropdowns.
  • Improved the performance of the Datasets page when loading individual datasets.

2.1.2

Bug Fixes and Improvements

  • Fixed an issue where creating a crawler with single-file datasets caused the registered datasets to use the directory path instead of the file path.
  • Fixed an issue where editing policies in the Policy Builder could in some cases cause an error on saving the edited policy.
  • Fixed an issue where using restricted keywords in Policy Builder would not be escaped properly in some cases.
  • Fixed an issue where using MySQL as the backing database could, in some cases, cause some data types to not be converted correctly via JDBC, resulting in exceptions.

2.1.1

Bug Fixes and Improvements

  • Several improvements to handling of S3 errors and failure conditions for very large files.
  • Fixed an issue where in some cases (typically large) Parquet files would cause an error when being queried.
  • Fixed an issue in the Databricks connector where a table would be missing the SerDe path parameter when the table was not cluster local.
  • Fixed an issue in policies where if you had two ABAC policies, one which included a transform and one which did not, they would not compose correctly (this resulted in giving less access than desired in all cases).
  • Fixed an issue when upgrading from 1.5.x where the DB schema upgrade could fail under certain conditions.
  • Fixed an issue in the Presto connector where if a JDBC client issued a query against INFORMATION_SCHEMA with underscores, Presto would error out.

2.1.0

New Features

Extending Attribute-Based Access Control Policies to Support Data Transformation Functions and Row Filtering

Attribute-based access control policies now support data transformation functions and row filtering. This is supported with an extension to the current ABAC grant syntax. Read more here.

This can significantly simplify how policies are managed, reduce or eliminate the need to create views, and make it much easier to manage complex policies. You can easily create these policies in the UI by specifying ABAC access conditions in the policy builder. See examples of the different policies you can create using Okera's Policy Engine here.

Tag Cascading and Inheritance

Attributes assigned on tables/views and their columns will automatically cascade to all descendant views. Read more about this capability here.
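
As a rough mental model of cascading (not Okera's implementation), a view's effective tags can be thought of as its own tags plus everything inherited from the datasets it is built on. The dataset names, tag names, and dictionaries below are hypothetical, and the sketch covers only dataset-level tags, not column-level ones.

```python
from typing import Dict, List, Set

# Hypothetical lineage: each view maps to the datasets it is defined on.
PARENTS: Dict[str, List[str]] = {
    "sales.orders_v": ["sales.orders"],
    "sales.orders_eu_v": ["sales.orders_v"],
}

# Hypothetical tags assigned directly to each dataset.
DIRECT_TAGS: Dict[str, Set[str]] = {
    "sales.orders": {"pii.email"},
    "sales.orders_v": {"marketing.approved"},
}


def effective_tags(dataset: str) -> Set[str]:
    """Return the dataset's own tags plus every tag inherited from its ancestors."""
    tags = set(DIRECT_TAGS.get(dataset, set()))
    for parent in PARENTS.get(dataset, []):
        tags |= effective_tags(parent)
    return tags


inherited = effective_tags("sales.orders_eu_v")
```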

View Lineage

Okera now maintains the lineage of datasets created after 2.1. For a given dataset (table or view), you can now see all the views that descend from it, and for a view, all of its ancestors. This information is also exposed in the UI. Read more here.
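
The two lineage questions described here (which views descend from a dataset, and which datasets a view descends from) amount to graph traversals. The sketch below is only an illustration under assumed data: the CHILDREN mapping and dataset names are hypothetical, and it does not query Okera's catalog.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: dataset -> views created directly on top of it.
CHILDREN: Dict[str, List[str]] = {
    "db.events": ["db.events_clean"],
    "db.events_clean": ["db.events_daily", "db.events_eu"],
}


def descendants(dataset: str) -> Set[str]:
    """Breadth-first walk collecting every view that descends from the dataset."""
    seen: Set[str] = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child in CHILDREN.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def ancestors(view: str) -> Set[str]:
    """Collect every dataset the view is derived from, directly or indirectly."""
    found: Set[str] = set()
    for parent, kids in CHILDREN.items():
        if view in kids:
            found.add(parent)
            found |= ancestors(parent)
    return found
```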

Improved Privacy Functions

Okera has a revamped set of privacy-related functions to aid in anonymization with different guarantees. Read more about Okera's privacy and security functions here.

Users Page and Inactivity Report

The web UI now includes a new Users page, where all the users that have authenticated in the system can be viewed, as well as their groups as per the last time they made a request through Okera. This makes it easier to understand if a user should have access to something or not.

The Users page also lets you generate a User Inactivity Report, which shows all users who have any level of access on a database but have not queried that database within a given timeframe. This report helps identify users who may no longer need access to data because they are not using it, which supports the principle of least privilege.
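
The logic behind such a report can be sketched in a few lines of Python. This is an illustration only, not how Okera computes the report: the access map, the last-query map, and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple


def inactive_users(
    access: Dict[str, Set[str]],
    last_query: Dict[Tuple[str, str], datetime],
    now: datetime,
    window: timedelta,
) -> List[Tuple[str, str]]:
    """Return (user, database) pairs with access but no query inside the window."""
    cutoff = now - window
    report = []
    for user, databases in sorted(access.items()):
        for database in sorted(databases):
            seen = last_query.get((user, database))
            if seen is None or seen < cutoff:
                report.append((user, database))
    return report


# Hypothetical inputs: who has access to what, and when they last queried it.
now = datetime(2020, 6, 1)
access = {"alice": {"sales", "hr"}, "bob": {"sales"}}
last_query = {
    ("alice", "sales"): datetime(2020, 5, 25),
    ("bob", "sales"): datetime(2020, 1, 10),
}
report = inactive_users(access, last_query, now, timedelta(days=30))
```

Here alice has never queried hr and bob last queried sales months ago, so both pairs appear in the report, while alice's recent use of sales keeps that pair out.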

Enable access to the Users page in the UI by granting a user or group access to the okera_access_review_role.

GRANT ROLE okera_access_review_role TO GROUP marketing_steward_group;

Read more about this capability here.

Access Control for Attribute Namespaces

You can now control access to the management of tags by namespace. ATTRIBUTE NAMESPACE has been added as a new object type, and the CREATE, ADD ATTRIBUTE, and ALL access levels are supported on it. For example, to give a role access to create, drop, and assign attributes from a particular attribute namespace, use the following:

GRANT ALL ON ATTRIBUTE NAMESPACE marketing TO ROLE marketing_steward;

In addition to this, if you wish to grant access to the Tags page in the UI so that a user can create and manage tags there, grant okera_tags_role to that user's group.

Note: To assign attributes on data you will still need to have the correct privileges for the data you are trying to assign on. See Controlling Who Can Assign Tags on Objects for more details.

Other Tag Management Updates

  • Only editable tags (the ones a user has CREATE or ALL on) show up on the Tags page.
  • Adding/removing tags from a dataset will ignore tags a user does not have privileges on.
  • Tags page now requires SELECT access on okera_system.ui_tags. The built-in okera_tags_role has this privilege by default.

VIEW_AUDIT Privilege to Control Access to Audit Logs

You can now grant VIEW_AUDIT privilege on data, to enable a user to view audit log information for that object. For example, if the user only had VIEW_AUDIT on two databases, they would only see reports for those two databases in the UI or when querying the okera_system.reporting_audit_logs view. To see the Insights page in the UI, you also need the okera_reports_role. See Access to the Insights Page for more information.

Note: The default privilege required to view audit log records for objects has been changed from SELECT to VIEW_AUDIT.
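
The effect of this change can be pictured with a small Python sketch. It is not Okera's authorization logic: the grants, the simplified audit records, and the function name are hypothetical, and the sketch does not model whether broader privileges such as ALL imply VIEW_AUDIT.

```python
from typing import Dict, List, Set

# Hypothetical grants: database -> privileges the user holds on it.
grants: Dict[str, Set[str]] = {
    "sales": {"SELECT", "VIEW_AUDIT"},
    "hr": {"SELECT"},  # SELECT alone no longer exposes audit records
    "finance": {"VIEW_AUDIT"},
}

# Hypothetical, heavily simplified audit records.
audit_records: List[Dict[str, str]] = [
    {"database": "sales", "user": "dana"},
    {"database": "hr", "user": "eli"},
    {"database": "finance", "user": "dana"},
    {"database": "ops", "user": "eli"},
]


def visible_audit_records(
    records: List[Dict[str, str]], user_grants: Dict[str, Set[str]]
) -> List[Dict[str, str]]:
    """Keep only records for databases on which the user holds VIEW_AUDIT."""
    return [r for r in records if "VIEW_AUDIT" in user_grants.get(r["database"], set())]


visible = visible_audit_records(audit_records, grants)
```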

Presto SQL in Workspace

The workspace now features Presto SQL mode, which allows executing queries against an Okera cluster using Presto. See Workspace for more details.

Creating Views Using Presto

It is now possible to create and delete external views via Presto directly. These views will be stored in the Okera catalog (as external views) and be accessible via Presto.

To do this, execute a DDL like this in Presto (e.g., via the Okera Workspace or an application such as SQL Workbench or DBeaver):

CREATE VIEW some_db.some_view AS SELECT ....

To support this, Okera has added extensions to the CREATE VIEW DDL statement when executed in Okera:

CREATE EXTERNAL VIEW <db>.<view> (
    <col name> <col type>,
    ...
) SKIP_ANALYSIS USING VIEW DATA AS 'SELECT ...'

This DDL requires the user to specify the full set of columns that the view statement produces (including types), as the view statement is not parsed or analyzed.
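
Because the column list must be spelled out in full, it can help to generate the statement from a list of (name, type) pairs. The helper below is a hypothetical convenience, not part of any Okera client library. It only fills in the template shown above, the column types in the usage line are illustrative, and it refuses SELECT text containing single quotes rather than guessing at quoting rules that these notes do not describe.

```python
from typing import List, Tuple


def build_external_view_ddl(
    db: str, view: str, columns: List[Tuple[str, str]], select_sql: str
) -> str:
    """Fill in the CREATE EXTERNAL VIEW template with an explicit column list."""
    if not columns:
        raise ValueError("the full column list is required; the view statement is not analyzed")
    if "'" in select_sql:
        # Quoting rules for embedded single quotes are not covered by this sketch.
        raise ValueError("select_sql must not contain single quotes in this sketch")
    column_lines = ",\n".join(f"    {name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL VIEW {db}.{view} (\n"
        f"{column_lines}\n"
        f") SKIP_ANALYSIS USING VIEW DATA AS '{select_sql}'"
    )


# Hypothetical database, view, columns, and query.
ddl = build_external_view_ddl(
    "some_db",
    "some_view",
    [("id", "BIGINT"), ("name", "STRING")],
    "SELECT id, name FROM some_db.some_table",
)
```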

Improved JSON File Format Support

  • Starting from 2.1.0, Okera uses simdjson to read JSON file format data.
  • Made several improvements to auto-inference of JSON file formats, with support for appropriate data types. These were tested extensively on auto-generated JSON files and on files from several internet sources.

Oracle Data Source Support

Oracle is now supported as a JDBC data source. The Oracle JDBC driver will need to be configured as a custom driver. Read more on how to configure this here.

More Metadata Available on Dataset Details

There are several improvements to the dataset details view in the UI:

  • Much more detailed technical metadata is included
  • It is now possible to edit the description of a dataset
  • It is now possible to edit column comments in the dataset schema
  • View parent/child lineage information is available for views created in Okera
  • Column-level tags are included in the details view along with table-level tags
  • Dataset schema can be filtered by data with column-level tags

Ability to Create Views From the UI

Admin users can now create an internal view based on an existing view or table from the datasets page. Choosing the destination database and view name and selecting the columns to be included in the new view are supported. For more information, see Create a View of Data.

Permission Management Improvements

  • A Permissions tab has been added to the Details tab for a dataset. Like on the Roles page, you'll be able to fully manage permissions associated with the specified dataset. You can read more about this on the Datasets page.
  • Data transforms and row filtering added to Policy Builder UI.
  • Ability to edit existing policies in the UI. To learn more about editing and managing policies, go to Editing Permissions.
  • An admin user can now create a view from a dataset

Reports Page Improvements

The Reports page has a number of major improvements, including:

  • New reports for Activity overview, Active users over time, Top accessed tags, and Recent queries.
  • SQL used to generate the reports is available in-page and can be run in Workspace.
  • Custom time ranges are available within the last 90 days.
  • Reports queries use human-readable times instead of unix timestamps.
  • Reports can now be filtered by dataset and tag as well as database.
  • Reports filters now allow for multi-selection.
  • Visual updates.

For more details, see the Reports page documentation.

UI Visual and Interaction Updates

  • There are small visual updates and improvements throughout the UI focused on clarity and better use of screen real estate.
  • The output of the workspace has been reworked to better retain a user's context and show history.

Updates to Reporting and Audit Views

  • A new audit table and a new view, analytics_audit_logs and reporting_ui_analytics, have been added to the okera_system database. These are populated by the cdas-rest-server container and are primarily used to track and analyze usage of the UI. For now, the UI only writes to them on page visits. The data is stored in the same logging directory as the regular audit logs, in its own subfolder.

  • The view used by reports, okera_system.reporting_audit_logs, now includes start_time_utc and end_time_utc columns of type TIMESTAMP_NANOS for better readability.

Improved REST Server Diagnostics Logging

  • Logs now include timestamp and log level.
  • Log level can be set using the REST_SERVER_LOG_LEVEL configuration parameter. Valid values are DEBUG, INFO, WARNING, ERROR, and CRITICAL. The default is DEBUG.
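
The valid values listed above happen to match the level names in Python's standard logging module, so the behavior can be sketched as follows. This is not the REST Server's code. In particular, raising an error on an invalid value is an assumption of the sketch, since the notes do not say how Okera handles that case.

```python
import logging
import os
from typing import Mapping, Optional

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(env: Optional[Mapping[str, str]] = None) -> int:
    """Map REST_SERVER_LOG_LEVEL to a logging level, defaulting to DEBUG."""
    env = os.environ if env is None else env
    name = env.get("REST_SERVER_LOG_LEVEL", "DEBUG").upper()
    if name not in VALID_LEVELS:
        # Assumption: reject unknown values instead of silently falling back.
        raise ValueError(f"invalid REST_SERVER_LOG_LEVEL: {name!r}")
    return getattr(logging, name)


logging.basicConfig(
    level=resolve_log_level({"REST_SERVER_LOG_LEVEL": "WARNING"}),
    format="%(asctime)s %(levelname)s %(message)s",  # timestamp and log level
)
```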

Bug Fixes and Improvements

  • Upon renaming tables, the attributes from the old tables are now carried to the new renamed table.
  • Improved performance by parallelizing queries that contain UNION ALL. Such queries now run as multiple tasks across Enforcement Fleet workers, rather than as a single task as before.
  • Performance improvements on dropping tables with a large number of partitions.
  • Performance improvements on DROP DATABASE CASCADE to drop all tables under the database.
  • For JDBC data sources, large numeric/decimal types (precision greater than 38) are now handled. Precision is capped at 38 when the source type has a larger precision or does not specify precision/scale. When the scale is unspecified, it defaults to 6 as of Okera 2.1.0.
  • Fixed handling of negative decimals in JDBC data scans. Large-scale decimals are now rounded using HALF_DOWN.
  • For the CREATE VIEW command, if no database is specified, the default database is now used for the view.
  • Fix for parse errors on views with JDBC tables in the view definition (Joins between JDBC and non-JDBC tables).
  • Arrays of arrays are now supported in Okera.
  • log4j2 support: Okera now uses log4j2 as the default for logging. A backwards-compatibility bridge, as recommended by the Apache project, is used for libraries that still use log4j, such as certain Hadoop/Hive libraries.
  • Support for LIMIT on JDBC data sources. This improves the preview of data from JDBC data sources where the data is limited to 100 by default from the Okera Web UI.
  • Better error handling on JDBC data source auto-inference errors on unsupported datatypes.
  • Fix for a regression on authorization on CTEs (WITH Clause) with aggregations in the query.
  • For views over Avro files whose column definitions (such as complex structs) can exceed 4000 characters, Okera now uses the schema from the Avro file instead of creating the physical columns in the database.

    Note: DESCRIBE FORMATTED for such tables/views with column definitions over 4000 characters still does not show the column details. DESCRIBE <table/view> shows the correct definitions.

  • Several bug fixes to handle parquet file format issues gracefully. For example, parquet files with unsupported DataPageHeaderV2 would crash the Enforcement Fleet workers. These are now handled with a graceful error message.
  • Reduced the log level of the Sentry/Hive pinger from error to warn. This improves error diagnostics for real catalog exceptions. Previously, the pinger flooded the logs with spurious errors.
  • A bug fix for count(*) on a JDBC view to return results instead of a failure.
  • Ability to specify Glue AWS region which can be separate from the cluster default region.
  • The recordservice catalog in Presto is disabled by default starting from 2.1.0.
  • Added controls for the JDBC (PrestoDB) -> Okera configuration. For example, the RPC timeouts can now be set through the OKERA_PRESTO_PLANNER_RPC_MS and OKERA_PRESTO_WORKER_RPC_MS environment settings.
  • Minor improvement to remove SerDe info from SHOW CREATE TABLE command. Prior to this fix, re-running the output from SHOW CREATE TABLE command would error out due to the duplication of SerDe info and the FILE FORMAT info. Post this fix, the SHOW CREATE TABLE command would not have the SerDe info and hence re-running the output would work as is.
  • Fixed an error when reading Avro files whose schema contains a union with default values.
  • UI: Better row hover state highlighting on grouped table rows.
  • UI Error boundaries introduced for increased stability in JavaScript.
  • Policy Builder layout and formatting improvements.
  • Contextual restrictions on Policy Builder UI including conditional disabled create/edit/delete.
  • More nuanced permission conflict reasons.
  • Upgraded node to 12.15.0.
  • The Presto connector has several improvements for performance, utilizing more efficient APIs and serialization/deserialization formats.
  • Several performance improvements for queries over Parquet files and queries with joins.
  • In the Okera Policy Engine (planner) and Okera Enforcement Fleet worker debug UI, the number of queries displayed has been increased to 256.
  • The audit log has a new field added to it, ae_attribute, which captures all attributes accessed as part of this query.
  • Fixed an issue in the /scan API where some Decimal values would not be serialized correctly.
  • Several improvements to schema detection for TEXT-based files (especially CSV).
  • Added support for md5() (based on the Hive UDF).
  • The has_access() built-in function now supports checking against all privilege levels (previously it only supported ALL and SELECT).
  • Fixed an issue where it was not checked whether an attribute existed or not in some DDL statements that modified attributes.
  • Fixed an issue where the CREATE_AS_OWNER privilege at the catalog level incorrectly gave the SHOW privilege at that scope as well.
  • Improvements to error handling and recovery of metadata operations.
  • Improved default tuning parameters in large memory environments.
  • PyOkera now properly converts all values to JSON-serializable types when scan_as_json is used.
  • Improved admission control when Enforcement Fleet workers are over-subscribed on either active connections or memory metrics.
  • For Gravity-based deploys, Gravity has been upgraded to 6.1.16 LTS.
  • Improved error handling and recovery of the data registration crawler in case of failures.
  • Added the ability to increase the timeout for initializing the catalog on cluster startup by setting the CATALOG_INIT_STARTUP_TIMEOUT configuration value.
  • Fixed an issue where some system tables were not dropped prior to creating them on startup, which can cause an issue on upgrades.
  • Fixed an issue where the audit logs would have incorrect values in case of an error during initialization of an incoming request.
  • Added the ability to specify a column list when executing ALTER VIEW, in the same manner as CREATE VIEW.
  • Improved error message when using non-absolute S3 bucket paths.
  • Improved error handling when parsing a view definition that Okera cannot parse for an external view.
  • Fixed an issue where service discovery would consider Kubernetes objects in a different namespace.
  • Fixed an issue where the system would generate unnecessary baseline queries, creating log noise.
  • Added the ability to specify a privilege level filter for the GetTables and GetDatabases APIs.
  • Fixed an issue in PyOkera when handling the CHAR type when there are null values in the data.
  • Fixed an issue where the ae_role column was not always populated for some role-related DDLs.
  • Improved the logging in the Okera REST Server.
  • Added the ability to configure the Okera Policy Engine (planner) and Enforcement Fleet worker RPC timeouts in Okera's Presto, using the OKERA_PRESTO_PLANNER_RPC_MS and OKERA_PRESTO_WORKER_RPC_MS configuration values respectively. The defaults are 300000ms and 1800000ms respectively.
  • Improved retry handling for retriable S3 errors (such as Server Busy, etc.).
  • Fixed a bug where database names were not escaped when created in the registration UI.
  • The 're-autotag' button on the datasets page now causes the new tags to be fetched upon completion.
  • The UI has several new icons.
  • Workspace now includes an execution timer for queries.
  • Improved errors are reported for bad schema found during registration.
  • Fixed a bug where the UI allowed users to 'tag' partitioning columns, but such tags had no effect.
  • Now all dataset views show their view string.
  • "Queries by duration of planner request" is no longer part of the Reports page.
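
The JDBC decimal handling described in the list above (precision capped at 38, a default scale of 6, and HALF_DOWN rounding) can be illustrated with Python's decimal module. This is a sketch of the stated rules only, not Okera's code: the function names are hypothetical, and the sketch does not model what happens when a declared scale exceeds the capped precision.

```python
from decimal import Decimal, ROUND_HALF_DOWN
from typing import Optional, Tuple

MAX_PRECISION = 38  # cap stated in the release notes
DEFAULT_SCALE = 6   # default scale when the source leaves it unspecified


def capped_decimal_type(precision: Optional[int], scale: Optional[int]) -> Tuple[int, int]:
    """Return the (precision, scale) to use for a source numeric/decimal type."""
    p = MAX_PRECISION if precision is None else min(precision, MAX_PRECISION)
    s = DEFAULT_SCALE if scale is None else scale
    return p, s


def round_to_scale(value: Decimal, scale: int) -> Decimal:
    """Round to the given scale; exact ties go toward zero (HALF_DOWN)."""
    return value.quantize(Decimal(f"1e-{scale}"), rounding=ROUND_HALF_DOWN)


wide = capped_decimal_type(50, 10)            # declared precision above the cap
unspecified = capped_decimal_type(None, None)  # no precision/scale in the source
tie = round_to_scale(Decimal("1.2345675"), 6)
negative_tie = round_to_scale(Decimal("-1.2345675"), 6)
```

With HALF_DOWN, a value exactly halfway between two representable results is rounded toward zero, which is why both the positive and the negative tie keep the digit 7 in the sixth decimal place.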

Notable and Incompatible Changes

  • Starting from 2.1.0, the published Okera client libraries for PrestoDB support PrestoDB versions 0.234.2 and above.
  • ZooKeeper has been removed as a system component - Okera will now leverage Kubernetes to maintain the Okera Enforcement Fleet worker membership list.
  • The default per-user okera_sandbox database has been removed.
  • When creating Okera views (i.e., internal/secure views), it is now required for the creator to have the ALL privilege on all referenced datasets. This is done to ensure that these tables cannot be incorrectly exposed by users with lesser permissions.
  • Removed the 4000-character limitation on column types.

    Note: This changes the underlying HMS schema, and if connected to a shared HMS, should be disabled by setting the HMS_REMOVE_LENGTH_RESTRICTION configuration value to false. This is only done for new HMS databases - if you have an existing one from a prior installation, contact Okera Support for migration procedures.

  • The default privilege required to view audit log records for objects has been changed from SELECT to VIEW_AUDIT. This means some users may no longer be able to see audit logs for their data (if they previously only had SELECT access to it) and will need to be granted VIEW_AUDIT on data they wish to view audit logs for.
  • ML and decision-tree-based autotagging is now enabled by default.
  • OKERA_REPORTING_TIME_RANGE can no longer be used to restrict the available time range in Okera reports.
  • In 2.1.x, many data correctness issues will now fail queries as opposed to silently ignoring them (e.g., converting data into NULL, etc.) as in previous versions. To revert the behavior, add --abort_on_error=false to RS_ARGS.

SQL Keywords

The following terms are now keywords, starting in 2.1.0:

  • CXNPROPERTIES
  • DATACONNECTION
  • DIAGNOSTICS
  • DO
  • EXCEPT
  • TIMESTAMP_NANOS
  • VIEW_AUDIT
  • VIEW_COMPLETE_METADATA
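
Before upgrading, it may be worth checking whether any existing database, table, or column names collide with these new keywords. The sketch below is a hypothetical helper, not an Okera tool. It assumes keyword matching is case-insensitive and only flags collisions, leaving the choice of how to escape or rename them to you.

```python
from typing import Iterable, List

# Keywords listed in these release notes as new in 2.1.0.
NEW_KEYWORDS_2_1 = frozenset({
    "CXNPROPERTIES",
    "DATACONNECTION",
    "DIAGNOSTICS",
    "DO",
    "EXCEPT",
    "TIMESTAMP_NANOS",
    "VIEW_AUDIT",
    "VIEW_COMPLETE_METADATA",
})


def keyword_collisions(identifiers: Iterable[str]) -> List[str]:
    """Return the identifiers that match a new 2.1.0 keyword, ignoring case."""
    return sorted(name for name in identifiers if name.upper() in NEW_KEYWORDS_2_1)


# Hypothetical column names taken from an existing schema.
collisions = keyword_collisions(["id", "do", "diagnostics", "amount"])
```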

Known Issues

  • The Okera PrestoDB Connector shipped with this version is compatible with PrestoDB 0.233 and higher. This connector is currently not compatible with any released version of PrestoDB on EMR, as the version of PrestoDB shipped is older than 0.233. This will be fixed in a subsequent 2.1.x maintenance release.