Okera Version 2.14 Release Notes¶
This topic provides Release Notes for all 2.14 versions of Okera.
2.14.3 (1/20/2023)¶
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- CVE-2022-41946 Information Exposure
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes and Improvements¶
- Optimized the performance of Okera's getPartitions() API endpoint, resulting in lower latency and load on the catalog database.
- Improved the performance of SHOW CREATE TABLE statements.
- Fixed a bug that caused null pointer exceptions after an upgrade from Okera 2.11.x. This bug caused problems when logging in to the UI as a non-admin user.
- Fixed page errors that occurred when there were conflicts creating permissions.
2.14.2 (12/19/2022)¶
OkeraEnsemble Updates¶
Okera has updated how you should deploy OkeraEnsemble nScale mode support with Amazon EMR 5 and Amazon EMR 6. Differences in the two Amazon EMR versions require that OkeraEnsemble nScale mode be deployed differently, based on the version of Amazon EMR you are using.
When deploying OkeraEnsemble nScale in an Amazon EMR 5 environment, set the core-site.xml flag called fs.s3a.s3.client.factory.impl to org.apache.hadoop.fs.s3a.OkeraS3ClientFactory. When deploying OkeraEnsemble in an Amazon EMR 6 environment, set the core-site.xml flag called fs.s3a.s3.client.factory.impl to com.okera.recordservice.hadoop.OkeraS3ClientFactory.
For more information, see OkeraEnsemble nScale Mode Deployment in Amazon EMR Environments.
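As a rough illustration of the version-dependent setting, the following Python sketch renders the core-site.xml property for a given Amazon EMR major version. The property name and class names come from the note above; the function name and the idea of generating the element programmatically are assumptions made for this example, not part of Okera's tooling.

```python
import xml.etree.ElementTree as ET

# Property name and factory classes as stated in the release note.
FACTORY_PROPERTY = "fs.s3a.s3.client.factory.impl"
FACTORY_BY_EMR_MAJOR = {
    5: "org.apache.hadoop.fs.s3a.OkeraS3ClientFactory",
    6: "com.okera.recordservice.hadoop.OkeraS3ClientFactory",
}


def core_site_property(emr_major_version: int) -> str:
    """Return a core-site.xml <property> element for the EMR major version.

    Hypothetical helper: only EMR 5 and EMR 6 are documented, so any other
    version is rejected rather than guessed.
    """
    try:
        factory = FACTORY_BY_EMR_MAJOR[emr_major_version]
    except KeyError:
        raise ValueError(
            f"no factory class documented for EMR {emr_major_version}"
        ) from None
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = FACTORY_PROPERTY
    ET.SubElement(prop, "value").text = factory
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(core_site_property(6))
```

The resulting element would be placed inside the configuration element of core-site.xml on the cluster.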
2.14.1 (12/10/2022)¶
Amazon EMR 6.5.0 and Spark 3.1.2 Support¶
With this release, Okera supports Amazon EMR 6.5 and Spark 3.1.2 environments, with one current limitation: you cannot perform an insert operation on a non-TEXT-type partitioned table (for example, ORC, Parquet, Avro) if the Hive recordservice.spark.client-bypass configuration setting is set to true (a requirement today when writing to a SQL table using spark.sql on an Okera-integrated Amazon EMR cluster).
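The limitation combines three conditions, which the following Python sketch spells out as a predicate. The function name, its parameters, and the use of the string "TEXT" as a format label are assumptions for illustration; this is a reading of the note above, not Okera code.

```python
def insert_is_supported(file_format: str, partitioned: bool, client_bypass: bool) -> bool:
    """Model the EMR 6.5 / Spark 3.1.2 insert limitation described above.

    An insert is treated as unsupported only when all three conditions hold:
    the table is partitioned, its format is not TEXT, and
    recordservice.spark.client-bypass is set to true.
    """
    is_text = file_format.strip().upper() == "TEXT"
    return not (partitioned and not is_text and client_bypass)


if __name__ == "__main__":
    for fmt in ("TEXT", "ORC", "PARQUET", "AVRO"):
        print(fmt, insert_is_supported(fmt, partitioned=True, client_bypass=True))
```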
OkeraEnsemble Updates¶
OkeraEnsemble now supports RSA256 as a JWT algorithm. In past releases, only RSA512 was supported, although Okera itself has always supported both RSA256 and RSA512. Set the algorithm type used in your environment with the JWT_ALGORITHM configuration parameter.
BigQuery Updates¶
The following updates have been made for BigQuery connections in this release:
- You can now inject the Okera connection query ID into BigQuery history and the Okera audit logs. This ID can be used to correlate the BigQuery project history with the logging in Okera audit logs.
  To support this functionality, a new connection configuration parameter, inject.query-id, has been added. Valid values are true (enable Okera ID injection) and false (do not enable Okera ID injection). When enabled for a connection, the ID is injected as a comment in the Okera-generated SQL sent to the connection and appears in BigQuery history. For most connections, the default for inject.query-id is false, but for BigQuery connections, the default is true. See Inject the Okera Connection Query ID Into BigQuery History.
- You can now register cross-project BigQuery tables from the same Okera connection. For example, using a single connection that references one BigQuery project, you can create a second Okera crawler to crawl the same connection using a second BigQuery project. With this functionality, you no longer need to define multiple BigQuery connections in Okera, and Dataproc cross-project join queries can complete successfully. It also enables cross-project joins using Presto pushdown, which moves the compute actions to the BigQuery engine and away from the Okera Enforcement Fleet (workers). Finally, it reduces your BigQuery chargeback complexity because all queries get consolidated into a single Okera connection.
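The default behavior of inject.query-id described above can be sketched in Python as follows. The connection type strings and the exact layout of the injected SQL comment are assumptions made for this example; the release note only states that the ID is injected as a comment.

```python
from typing import Optional


def inject_query_id_enabled(connection_type: str, configured: Optional[bool] = None) -> bool:
    """Resolve the effective inject.query-id value for a connection.

    An explicit setting wins. Otherwise the default is true for BigQuery
    connections and false for all other connection types.
    """
    if configured is not None:
        return configured
    return connection_type.strip().lower() == "bigquery"


def with_query_id_comment(sql: str, query_id: str) -> str:
    """Prefix SQL with a comment carrying the query ID.

    The comment layout is hypothetical and only illustrates the idea.
    """
    if "*/" in query_id:
        raise ValueError("query ID must not contain a comment terminator")
    return f"/* okera_query_id={query_id} */ {sql}"


if __name__ == "__main__":
    if inject_query_id_enabled("bigquery"):
        print(with_query_id_comment("SELECT 1", "example-id"))
```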
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- Alpine-13661 Alpine314: Alpine-13661
- CVE-2018-25032 Alpine314: Out-of-bounds Write
- CVE-2021-46828 Alpine314: Allocation of Resources Without Limits or Throttling
- CVE-2022-0778 Alpine314: Loop with Unreachable Exit Condition ('Infinite Loop')
- CVE-2022-1097 Alpine314: OpenJDK
- CVE-2022-1271 Alpine314: Improper Input Validation
- CVE-2022-2097 Alpine314: Inadequate Encryption Strength
- CVE-2022-2309 Alpine314: NULL Pointer Dereference
- CVE-2022-21540 Alpine315: OpenJDK
- CVE-2022-21541 Alpine315: OpenJDK
- CVE-2022-21549 Alpine315: OpenJDK
- CVE-2022-21619 Alpine315: OpenJDK
- CVE-2022-21624 Alpine315: OpenJDK
- CVE-2022-21626 Alpine315: OpenJDK
- CVE-2022-21628 Alpine315: OpenJDK
- CVE-2022-22576 Alpine314: Improper Authentication
- CVE-2022-25647 Alpine315: Deserialization of Untrusted Data
- CVE-2022-27404 Alpine314: Out-of-bounds Write
- CVE-2022-27405 Alpine314: Out-of-bounds Read
- CVE-2022-27406 Alpine314: Out-of-bounds Read
- CVE-2022-27774 Alpine314: Insufficiently Protected Credentials
- CVE-2022-27775 Alpine314: Curl
- CVE-2022-27776 Alpine314: Insufficiently Protected Credentials
- CVE-2022-27781 Alpine314: Loop with Unreachable Exit Condition ('Infinite Loop')
- CVE-2022-27782 Alpine314: Improper Certificate Validation
- CVE-2022-28391 Alpine314: BusyBox
- CVE-2022-29458 Alpine314: Out-of-bounds Read
- CVE-2022-29824 Alpine314: Integer Overflow or Wraparound
- CVE-2022-32205 Alpine314: Allocation of Resources Without Limits or Throttling
- CVE-2022-32206 Alpine314: Allocation of Resources Without Limits or Throttling
- CVE-2022-32207 Alpine314: Incorrect Default Permissions
- CVE-2022-32208 Alpine314: Out-of-bounds Write
- CVE-2022-34169 Alpine315: Incorrect Conversion between Numeric Types
- CVE-2022-35252 Alpine314: Curl
- CVE-2022-37434 Alpine314: Out-of-bounds Write
- CVE-2022-39399 Alpine315: OpenJDK
- CVE-2022-40303 Alpine314: Integer Overflow or Wraparound
- CVE-2022-40304 Alpine314: XML External Entity (XXE) Injection
- CVE-2022-40674 Alpine314: Use After Free
- CVE-2022-42898 Alpine315: KRB5
- CVE-2022-43680 Alpine314: Use After Free
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
2.14.0 (11/30/2022)¶
OkeraEnsemble (OkeraFS) General Updates¶
The following updates were made to OkeraEnsemble in this release.
- OkeraFS was renamed OkeraEnsemble.
- Python 2.7 is no longer supported for OkeraEnsemble installations.
- You can now determine the version of the OkeraEnsemble Amazon S3 plugin. See Determine the OkeraEnsemble Amazon S3 Plugin Version.
OkeraEnsemble UI and ABAC Support (Preview Feature)¶
OkeraEnsemble extends Okera's pre-existing fine-grained access controls to unstructured data (URIs). Unstructured data is data that cannot be mapped in a tabular structure, such as a library of images (for example, medical X-rays) or an individual video or sound file. With this release, you can now register and apply permissions to your unstructured data using OkeraEnsemble and the Okera UI. Your unstructured data can be tagged, and its access can now be controlled using Okera's attribute-based access control (ABAC). For more information, see Register Unstructured Data URIs.
OkeraEnsemble System Token Generation Changes in Default Mode¶
With this release, the system token required by OkeraEnsemble in default (non-nScale) mode is not automatically generated from a private key.
When deployed in nScale mode, the OkeraEnsemble access proxy requires a JWT to authenticate to the Okera cluster. This token can be provided using either the JWT_PRIVATE_KEY or SYSTEM_TOKEN configuration parameter. When the JWT_PRIVATE_KEY configuration parameter is specified, the OkeraEnsemble access proxy automatically generates its own JWT with the provided private key. When the SYSTEM_TOKEN configuration parameter is specified, it defines the location of the system JWT file. If both are specified, JWT_PRIVATE_KEY takes precedence and is used, by default, to generate the required JWT.
However, when OkeraEnsemble is deployed in default (non-nScale) mode on the Okera cluster, the access proxy now defaults to using the token defined by the SYSTEM_TOKEN configuration parameter. It no longer generates the token from a private key.
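The precedence rules above can be summarized in a short Python sketch. The function name and return labels are hypothetical, and raising an error when no usable parameter is set is an assumption, since the note does not say what happens in that case.

```python
from typing import Mapping


def token_source(config: Mapping[str, str], nscale: bool) -> str:
    """Decide where the access proxy's system token would come from.

    nScale mode: JWT_PRIVATE_KEY takes precedence over SYSTEM_TOKEN.
    Default (non-nScale) mode: only SYSTEM_TOKEN is used.
    """
    has_key = bool(config.get("JWT_PRIVATE_KEY"))
    has_token = bool(config.get("SYSTEM_TOKEN"))
    if nscale and has_key:
        return "generate-from-private-key"
    if has_token:
        return "read-system-token-file"
    # Assumption: treat a missing usable parameter as a configuration error.
    raise ValueError("no usable token configuration for this mode")


if __name__ == "__main__":
    both = {"JWT_PRIVATE_KEY": "/path/to/key.pem", "SYSTEM_TOKEN": "/path/to/token"}
    print(token_source(both, nscale=True))
    print(token_source(both, nscale=False))
```

With both parameters set, the sketch picks the private key in nScale mode and the system token file in default mode, matching the behavior described for this release.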
Apache Ranger Migration Script (Preview Feature)¶
With this release, Okera provides a Python script you can use to extract Hive, Hadoop Distributed File System (HDFS), and Starburst Enterprise (Trino) policies from an Apache Ranger policy server. The script connects to Ranger, queries for the policies, and generates equivalent Okera DDL in a JSON file. The script can also automatically run the resulting Okera DDL against a running Okera cluster. For more information, see Apache Ranger Migration.
Tag Updates in the UI¶
This release introduces a new search bar on the Tags page in the UI. Use this search bar to locate a tag listed on the Tags page. For more information, see Search for Tags.
UI Changes¶
- To support OkeraEnsemble's use in the UI, a new Files tab has been added to the Databases page. From this tab you can register unstructured data (URIs or files), tag it, and apply access permissions to it. See Register Unstructured Data (URIs and Files).
- The title of the Databases page changed to Data because both structured data (tables) and unstructured data (files and URIs) are now supported.
- The names of the following buttons or options changed:
- The Create new connection button changed to Create connection.
- The
option changed to
when creating autotags.
- The Save button on the Create auto tagging rule dialog changed to Create.
- The Add button on the Create tag dialog changed to Create.
- The following dialog titles have changed:
- The Create new connection dialog title changed to Create connection.
- The New automatic tagging rule dialog title changed to Create auto tagging rule.
- The Editing automatic tagging rule dialog title changed to Editing auto tagging rule.
Casing Changes for Table Name Lookups¶
With this release, Okera ignores case for table name lookups and performs table lookups in a case-insensitive manner. For example, if you query user_Table_1 when the Okera crawler found user_TABLE_1, Okera now generates a query against user_TABLE_1 rather than returning a "no object" response.
Warning
If you have multiple objects with the same name but distinguished by different casing, Okera provides no means to differentiate between them. For example, if TABLE_1 and table_1 are defined in the same database, a query against table_1, tABLE_1, TABLE_1, or any other permutation maps to either TABLE_1 or table_1, but which table it maps to is a function of the original database scan and cannot be controlled. Further note that Okera may generate queries with quoted identifiers, so an input query that does not specify casing may generate a more specific query. For example, if Okera has table_1 defined (unquoted) and is presented with a query that reads select id from TabLe_1, Okera may generate and issue the more specific query: select "id" from "table_1".
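To make the lookup behavior and the warning concrete, here is a minimal Python model of a case-insensitive catalog. The class and method names are hypothetical. Returning the first scanned name on a collision is an assumption of this sketch only; in Okera, the table chosen depends on the original database scan and cannot be controlled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class TableCatalog:
    """Toy catalog that resolves table names without regard to case."""

    def __init__(self, scanned_names: Iterable[str]) -> None:
        # Group the names found by the scan under their case-folded form.
        self._by_folded: Dict[str, List[str]] = defaultdict(list)
        for name in scanned_names:
            self._by_folded[name.casefold()].append(name)

    def resolve(self, requested: str) -> str:
        """Return the scanned name that a requested name maps to."""
        candidates = self._by_folded.get(requested.casefold())
        if not candidates:
            raise KeyError(f"no object named {requested!r}")
        # Sketch-only choice: the first name seen during the scan wins.
        return candidates[0]

    def collisions(self) -> List[List[str]]:
        """List groups of scanned names that differ only by case."""
        return [names for names in self._by_folded.values() if len(names) > 1]


if __name__ == "__main__":
    catalog = TableCatalog(["user_TABLE_1", "TABLE_1", "table_1"])
    print(catalog.resolve("user_Table_1"))
    print(catalog.collisions())
```

Checking for collisions before relying on case-insensitive lookups is one way to find the ambiguous names the warning describes.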
Athena Connection Changes¶
When creating or editing an Athena connection, the default source schema field is now optional.
API Updates¶
- A new API endpoint, /api/v2/tags/{name}/tagging-rules, with four methods (GET, POST, PUT, and DELETE), was introduced in this release. Use this endpoint to list, create, update, and delete tagging rules.
- A new API endpoint, /api/v2/uri/, with GET and POST methods, has been added in this release. Use this endpoint to list, fetch, and register an unstructured data URI.
For information about any Okera API endpoint, see the Okera API documentation, available after you log in to the web UI by appending /api/v2-docs/api/ after the web UI port number (8083). For example: https://my.okera.installation:8083/api/v2-docs/api/.
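The sketch below shows how requests to the two new endpoints might be constructed in Python with the standard library. It only builds the requests and does not send them. The host is the example host used above, the tag name is made up, treating {name} as a tag name is an assumption, and authentication and request bodies are omitted because they are not described here; consult the Okera API documentation for the actual contract.

```python
from urllib.parse import quote
from urllib.request import Request

# Example host and port taken from the documentation URL above.
BASE_URL = "https://my.okera.installation:8083"


def tagging_rules_request(tag_name: str, method: str = "GET") -> Request:
    """Build a request for /api/v2/tags/{name}/tagging-rules."""
    if method not in {"GET", "POST", "PUT", "DELETE"}:
        raise ValueError(f"unsupported method for tagging rules: {method}")
    # Percent-encode the name so it is safe as a single path segment.
    url = f"{BASE_URL}/api/v2/tags/{quote(tag_name, safe='')}/tagging-rules"
    return Request(url, method=method)


def uri_request(method: str = "GET") -> Request:
    """Build a request for /api/v2/uri/."""
    if method not in {"GET", "POST"}:
        raise ValueError(f"unsupported method for URIs: {method}")
    return Request(f"{BASE_URL}/api/v2/uri/", method=method)


if __name__ == "__main__":
    req = tagging_rules_request("pii.email")  # hypothetical tag name
    print(req.get_method(), req.full_url)
```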
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- CVE-2020-16156 Ubuntu 18.04 - Perl (Improper Verification of Cryptographic Signature)
- CVE-2021-43618 Ubuntu 18.04 - gmp (Integer Overflow or Wraparound)
- CVE-2021-46848 Alpine Curl - Out-of-bounds Read
- CVE-2022-21589 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-21592 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-21608 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-21617 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-32221 Ubuntu 18.04 - Curl
- CVE-2022-39253 Ubuntu 18.04 - git (Link Following)
- CVE-2022-39260 Ubuntu 18.04 - git (Out-of-bounds Write)
- CVE-2022-42915 Alpine Curl - Double Free
- CVE-2022-42916 Alpine Curl - Cleartext Transmission of Sensitive Information
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes and Improvements¶
- Fixed an issue where some user attributes were missing in the Okera UI after they were added using the DDL.
- Fixed the pagination of the list of roles on the Roles page.
- Fixed a bug that occurred during local worker startup in a Google Cloud Platform Dataproc environment running in nScale mode.
- Fixed a bug in Okera's phi_zip3 transformations to correctly deidentify string zip codes to the first three digits. In addition, numeric zip codes that fall outside the range of 0 through 99999 now return a null value, rather than the original value. In other words, phi_zip3 transformations are only supported for five-digit numeric zip codes (however, if the zip codes are in string format, this limitation does not apply).
- Fixed an issue where users were required to have write privileges to view metadata.
- Fixed an issue where the property okera.external.view in Databricks environments did not always match the value of the cerebro.external.view property.
- Fixed an issue in which a crawler failed even when abort on error was set to false.
- Fixed an out-of-resources error that occurred with the OkeraEnsemble access proxy.
- Fixed the no module named ez_setup errors you might have encountered when installing the OkeraEnsemble plugin.
- Corrected a bug in error handling during dataset renames.
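The phi_zip3 behavior described in the fix above can be approximated with the following Python sketch. It is not Okera's implementation. The function name is hypothetical, and two details are assumptions: in-range numeric values are zero-padded to five digits before truncation, and the result is returned as a string.

```python
from typing import Optional, Union


def zip3(value: Union[str, int, None]) -> Optional[str]:
    """Approximate the documented phi_zip3 behavior.

    Strings are reduced to their first three characters. Integers outside
    0 through 99999 yield None (null) instead of the original value.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return value[:3]
    if not 0 <= value <= 99999:
        return None
    # Assumption: pad to five digits so 501 is treated as zip code 00501.
    return f"{value:05d}"[:3]


if __name__ == "__main__":
    for sample in ("94105", 94105, 501, 123456):
        print(repr(sample), "->", zip3(sample))
```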