Okera Version 2.11 Release Notes¶
This topic provides Release Notes for all 2.11 versions of Okera.
2.11.8 (1/18/2023)¶
Bug Fixes and Improvements¶
- Optimized the performance of Okera's getPartitions() API endpoint, resulting in lower latency and load on the catalog database.
- Fixed page errors that occurred when there were conflicts creating permissions.
2.11.7 (1/9/2023)¶
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- CVE-2022-41946 Information Exposure
- CVE-2022-42898 Integer Overflow or Wraparound
- CVE-2022-45061 Resource Exhaustion
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes and Improvements¶
- Fixed a bug where a crawler ignored the default schema specified for an Athena connection.
2.11.6 (12/9/2022)¶
Athena Connection Changes¶
When creating or editing an Athena connection, the default source schema field is now optional.
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- Alpine-13661 Alpine314: Alpine-13661
- CVE-2018-25032 Alpine314: Out-of-bounds Write
- CVE-2021-46828 Alpine314: Allocation of Resources Without Limits or Throttling
- CVE-2022-0778 Alpine314: Loop with Unreachable Exit Condition ('Infinite Loop')
- CVE-2022-1097 Alpine314: OpenJDK
- CVE-2022-1271 Alpine314: Improper Input Validation
- CVE-2022-2309 Alpine314: NULL Pointer Dereference
- CVE-2022-3970 Numeric Errors
- CVE-2022-21540 Alpine315: OpenJDK
- CVE-2022-21541 Alpine315: OpenJDK
- CVE-2022-21549 Alpine315: OpenJDK
- CVE-2022-21619 Alpine315: OpenJDK
- CVE-2022-21624 Alpine315: OpenJDK
- CVE-2022-21626 Alpine315: OpenJDK
- CVE-2022-21628 Alpine315: OpenJDK
- CVE-2022-27404 Alpine314: Out-of-bounds Write
- CVE-2022-27405 Alpine314: Out-of-bounds Read
- CVE-2022-27406 Alpine314: Out-of-bounds Read
- CVE-2022-27774 Alpine314: Insufficiently Protected Credentials
- CVE-2022-27776 Alpine314: Insufficiently Protected Credentials
- CVE-2022-28391 Alpine314: BusyBox
- CVE-2022-29824 Alpine314: Integer Overflow or Wraparound
- CVE-2022-35252 Alpine314: Curl
- CVE-2022-39399 Alpine315: OpenJDK
- CVE-2022-40303 Alpine314: Integer Overflow or Wraparound
- CVE-2022-40304 Alpine314: XML External Entity (XXE) Injection
- CVE-2022-40674 Alpine314: Use After Free
- CVE-2022-42898 Alpine315: KRB5
- CVE-2022-43680 Alpine314: Use After Free
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes and Improvements¶
- Fixed an issue where the property okera.external.view in Databricks environments did not always match the value of the cerebro.external.view property.
2.11.5 (11/15/2022)¶
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- CVE-2020-16156 Ubuntu 18.04 - Perl (Improper Verification of Cryptographic Signature)
- CVE-2021-43618 Ubuntu 18.04 - gmp (Integer Overflow or Wraparound)
- CVE-2021-46848 Alpine Curl - Out-of-bounds Read
- CVE-2022-21589 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-21592 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-21608 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-21617 Ubuntu 18.04 - MySQL Server Vulnerability
- CVE-2022-32221 Ubuntu 18.04 - Curl
- CVE-2022-39253 Ubuntu 18.04 - git (Link Following)
- CVE-2022-39260 Ubuntu 18.04 - git (Out-of-bounds Write)
- CVE-2022-42915 Alpine Curl - Double Free
- CVE-2022-42916 Alpine Curl - Cleartext Transmission of Sensitive Information
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes and Improvements¶
- Fixed an issue where users were required to have write privileges to view metadata.
2.11.4 (10/23/2022)¶
Bug Fixes and Improvements¶
- Updated the Okera Spark3 connector to address an incompatibility resulting from a Databricks library change in Okera-supported Databricks runtime versions 9.1 LTS and later.
2.11.3 (10/20/2022)¶
Support for Referencing Amazon S3 Objects in Okera Configuration Parameters in nScale Amazon EMR Deployments¶
With this release, you can now reference objects stored in Amazon S3 as Okera configuration parameters for odas-emr-bootstrap. Okera pulls the objects referenced in the configuration parameters and mounts them in the nScale container, making the Amazon S3 paths available to Okera for processing. For example, this is helpful when configuring the SSL certificate and key required to start the OkeraEnsemble Amazon EMR access proxy in TLS/SSL mode:
--external-objects-to-container SSL_CERTIFICATE_FILE=s3://bucket/certificate-object, SSL_KEY_FILE=s3://bucket/key-object
In this example, SSL_CERTIFICATE_FILE specifies the path to the SSL certificate file in Amazon S3 and SSL_KEY_FILE specifies the path to the SSL key file in Amazon S3. The OkeraEnsemble access proxy can then use these paths for any necessary TLS/SSL processing.
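As a rough illustration, the option value could be assembled from a mapping of parameter names to Amazon S3 paths. The helper name build_external_objects_arg is hypothetical and is not part of odas-emr-bootstrap; the sketch assumes the comma-and-space separator shown in the example above.

```python
def build_external_objects_arg(objects):
    """Format NAME=s3://... pairs for --external-objects-to-container.

    Hypothetical helper: the separator mirrors the release-note example.
    """
    pairs = []
    for name, s3_path in objects.items():
        # Only Amazon S3 references are described for this option.
        if not s3_path.startswith("s3://"):
            raise ValueError(f"{name} must reference an Amazon S3 path: {s3_path}")
        pairs.append(f"{name}={s3_path}")
    return "--external-objects-to-container " + ", ".join(pairs)


print(build_external_objects_arg({
    "SSL_CERTIFICATE_FILE": "s3://bucket/certificate-object",
    "SSL_KEY_FILE": "s3://bucket/key-object",
}))
```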
Security Vulnerabilities (CVEs/CWEs) Addressed¶
- CVE-2021-4209 Null Pointer Dereference
- CVE-2022-2509 Double Free
- CVE-2022-2526 Use After Free
- CVE-2022-36944 Remote Code Execution (RCE)
- CVE-2022-37434 Out-of-Bounds Write
- CVE-2022-40664 Improper Authentication
- CVE-2022-41828 Use of Function with Inconsistent Implementations
- CVE-2022-42003 Deserialization of Untrusted Data
- CVE-2022-42004 Deserialization of Untrusted Data
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes¶
The following bugs were fixed in this release:
- Fixed an issue where some sensitive environment variables were not always redacted in logs.
- The default for the Databricks environment variable OKERA_PREFER_ACTIVE_SESSION_FOR_ID is now true, so the workaround described in Problem 2: Partitioned Temporary Tables Generated Cannot Be Accessed is no longer necessary.
- You no longer need to specify the transport protocol (http:// or https://) in the bootstrap script option --rest-server-hostports of odas-emr-bootstrap.
2.11.2 (9/23/2022)¶
Blocking Access to the Okera UI for Tablets¶
This release introduces the ability to block access to the Okera UI on tablets. The configuration parameter, BLOCK_WEB_UI_FOR_MOBILE_CLIENTS, controls this behavior. Valid values for this parameter are true (the UI is blocked on mobile devices and tablets) and false (the UI is not blocked on mobile devices and tablets). The default is false.
Updated Input to LDAP Filtering¶
In this release, we have updated Okera's input for LDAP filtering. Two new configuration parameters allow you to specify separate base DNs (distinguished names) for users and groups for LDAP server searches during authentication processing.
- Use GROUP_RESOLVER_LDAP_USER_BASE_DN to specify the base DN for users.
- Use GROUP_RESOLVER_LDAP_GROUP_BASE_DN to specify the base DN for groups.
See Okera Configuration Parameter Reference for a complete list of the configuration parameters available to you for Okera configurations.
OkeraEnsemble nScale Amazon S3 Bucket Access Enhancements¶
This release enhances OkeraEnsemble nScale mode deployment in Amazon EMR Spark environments by supporting access control for both Amazon S3 buckets defined for Amazon's assume secondary role feature and, optionally, buckets to which the Okera cluster is granted access through its IAM permissions.
Because nScale deploys the OkeraEnsemble access proxy with least-privilege access to Amazon EMR, the proxy has no IAM permissions of its own and retrieves the credentials it uses to sign Amazon S3 requests from the Okera Policy Engine (Planner). Consequently, when you deploy OkeraEnsemble in nScale mode, you must provide access to the Amazon S3 buckets using either of two methods:
- Using Amazon S3's assume secondary role feature. For Amazon S3 buckets that use assume secondary roles (bucket role map), the OkeraEnsemble access proxy retrieves the AWS Security Token Service (STS) credentials associated with the Amazon Resource Name (ARN) for the Amazon S3 bucket.
- By setting a new OKERA_SYSTEM_IAM_ROLE_ARN configuration parameter in the Okera configuration file to the IAM Amazon Resource Name (ARN) associated with the Okera cluster. When this is activated, Okera can grant OkeraEnsemble nScale users access to buckets to which the Okera cluster has access by permission through its IAM role.
For more information about OkeraEnsemble nScale mode deployment in Amazon EMR environments, see OkeraEnsemble nScale Mode Deployment in Amazon EMR Environments.
OkeraEnsemble nScale System Token Duration Controls¶
You can now specify the duration, in minutes, of the JWT system token for OkeraEnsemble nScale processing. A new environment variable, SYSTEM_TOKEN_DURATION_MIN, can be set on the nScale container using the Okera Amazon EMR odas-emr-bootstrap script to configure the duration of the Okera system token. For example, passing the following arguments with the odas-emr-bootstrap.sh script will configure the system token duration time to 300 minutes. Valid values are positive integers. The default value is equivalent to one day (1440 minutes).
--local-worker-env-vars "-e SYSTEM_TOKEN_DURATION_MIN=300"
This configuration setting only works when the nScale proxy is configured using JWT_PRIVATE_KEY and not with SYSTEM_TOKEN. When configured using JWT_PRIVATE_KEY, the nScale access proxy generates its own token and the SYSTEM_TOKEN_DURATION_MIN setting determines how long that token remains valid. When configured with SYSTEM_TOKEN, the SYSTEM_TOKEN_DURATION_MIN setting has no effect because the JWT token identified by the SYSTEM_TOKEN path includes an embedded expiration time that cannot be governed by the SYSTEM_TOKEN_DURATION_MIN setting. If both JWT_PRIVATE_KEY and SYSTEM_TOKEN are specified, the JWT_PRIVATE_KEY is used and the SYSTEM_TOKEN is ignored.
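The precedence rules above can be summarized in a small sketch. The function name resolve_system_token and the dictionary-based environment are illustrative assumptions, not Okera code, and raising an error when neither setting is present is likewise assumed behavior.

```python
DEFAULT_DURATION_MIN = 1440  # one day, the documented default


def resolve_system_token(env):
    """Return (token_source, duration_min) for the settings in env.

    duration_min is None when SYSTEM_TOKEN_DURATION_MIN has no effect.
    """
    if env.get("JWT_PRIVATE_KEY"):
        # The proxy generates its own token, so the duration setting applies.
        duration = int(env.get("SYSTEM_TOKEN_DURATION_MIN", DEFAULT_DURATION_MIN))
        if duration <= 0:
            raise ValueError("SYSTEM_TOKEN_DURATION_MIN must be a positive integer")
        return "JWT_PRIVATE_KEY", duration
    if env.get("SYSTEM_TOKEN"):
        # The token at this path carries its own embedded expiration time.
        return "SYSTEM_TOKEN", None
    # Assumption: the sketch treats a missing token configuration as an error.
    raise ValueError("neither JWT_PRIVATE_KEY nor SYSTEM_TOKEN is configured")
```

With both settings present, the sketch reports JWT_PRIVATE_KEY as the source, matching the rule that SYSTEM_TOKEN is ignored in that case.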
Security Vulnerabilities (CVEs) Addressed¶
- CVE-2017-7525 Incomplete Blacklist
- CVE-2017-15095 Deserialization of Untrusted Data
- CVE-2017-17485 Deserialization of Untrusted Data
- CVE-2018-5968 Deserialization of Untrusted Data
- CVE-2018-7489 Incomplete Blacklist
- CVE-2018-11307 Deserialization of Untrusted Data
- CVE-2018-12022 Deserialization of Untrusted Data
- CVE-2018-12023 Deserialization of Untrusted Data
- CVE-2018-14718 Deserialization of Untrusted Data
- CVE-2018-14719 Deserialization of Untrusted Data
- CVE-2018-14720 XML External Entity (XXE) Injection
- CVE-2018-14721 Server-Side Request Forgery (SSRF)
- CVE-2018-19360 Deserialization of Untrusted Data
- CVE-2018-19361 Deserialization of Untrusted Data
- CVE-2018-19362 Deserialization of Untrusted Data
- CVE-2019-12086 Deserialization of Untrusted Data
- CVE-2019-12384 Deserialization of Untrusted Data
- CVE-2019-12814 Deserialization of Untrusted Data
- CVE-2019-14379 Improperly Controlled Modification of Dynamically-Determined Object Attributes
- CVE-2019-14439 Deserialization of Untrusted Data
- CVE-2019-14540 Deserialization of Untrusted Data
- CVE-2019-14892 Deserialization of Untrusted Data
- CVE-2019-14893 Deserialization of Untrusted Data
- CVE-2019-16335 Deserialization of Untrusted Data
- CVE-2019-16942 Deserialization of Untrusted Data
- CVE-2019-16943 Deserialization of Untrusted Data
- CVE-2019-17267 Deserialization of Untrusted Data
- CVE-2019-17531 Deserialization of Untrusted Data
- CVE-2019-20330 Deserialization of Untrusted Data
- CVE-2020-8840 Deserialization of Untrusted Data
- CVE-2020-9546 Deserialization of Untrusted Data
- CVE-2020-9547 Deserialization of Untrusted Data
- CVE-2020-9548 Deserialization of Untrusted Data
- CVE-2020-10650 Deserialization of Untrusted Data
- CVE-2020-10672 CVE-2020-10672
- CVE-2020-10673 CVE-2020-10673
- CVE-2020-10968 Deserialization of Untrusted Data
- CVE-2020-10969 Deserialization of Untrusted Data
- CVE-2020-11111 Deserialization of Untrusted Data
- CVE-2020-11112 Deserialization of Untrusted Data
- CVE-2020-11113 Deserialization of Untrusted Data
- CVE-2020-11619 Deserialization of Untrusted Data
- CVE-2020-11620 Deserialization of Untrusted Data
- CVE-2020-14060 Deserialization of Untrusted Data
- CVE-2020-14061 Deserialization of Untrusted Data
- CVE-2020-14062 Deserialization of Untrusted Data
- CVE-2020-14195 Deserialization of Untrusted Data
- CVE-2020-17523 Authentication Bypass
- CVE-2020-24616 Deserialization of Untrusted Data
- CVE-2020-24750 Deserialization of Untrusted Data
- CVE-2020-25649 XML External Entity (XXE) Injection
- CVE-2020-35490 Deserialization of Untrusted Data
- CVE-2020-35491 Deserialization of Untrusted Data
- CVE-2020-35728 Deserialization of Untrusted Data
- CVE-2020-36179 Deserialization of Untrusted Data
- CVE-2020-36180 Deserialization of Untrusted Data
- CVE-2020-36181 Deserialization of Untrusted Data
- CVE-2020-36182 Deserialization of Untrusted Data
- CVE-2020-36183 Deserialization of Untrusted Data
- CVE-2020-36184 Deserialization of Untrusted Data
- CVE-2020-36185 Deserialization of Untrusted Data
- CVE-2020-36186 Deserialization of Untrusted Data
- CVE-2020-36187 Deserialization of Untrusted Data
- CVE-2020-36188 Deserialization of Untrusted Data
- CVE-2020-36189 Deserialization of Untrusted Data
- CVE-2020-36518 Out-of-bounds Write
- CVE-2021-20190 Deserialization of Untrusted Data
- CVE-2022-21434 Oracle Java SE Vulnerability
- CVE-2022-34169 Incorrect Conversion between Numeric Types
- CVE-2022-37434 Out-of-bounds Write
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes¶
The following bugs were fixed in this release:
- Privacy functions encrypt and aes_encrypt can now only be used by Okera admins.
- The fix in Okera version 2.11.0 that returned string format instead of tabular format in the output of show create table has been reverted.
- Upgraded Okera's version of Jackson to 2.13.3.
- Upgraded Okera's base Alpine version to 3.15.6.
- The policy synchronization enforcement mechanism used for Snowflake connections is no longer enabled by default. You must enable it manually using the POLICY_SYNC_SCHEDULER_ENABLED configuration parameter or the okera.policy_sync.enabled advanced parameter in your Snowflake connection.
- Upgraded Okera's version of Apache Shiro to 1.9.1.
- Upgraded Okera's version of OpenJDK to 8.345.01-r0.
- Fixed an error that occurred when querying Athena tables using JDBC pushdown processing. The error received was:
  An error has been thrown from the AWS Athena client. 1 validation error detected: Value '' at 'queryExecutionContext.catalog' failed to satisfy constraint: Member must have length greater than or equal to 1 [Execution ID not available].
2.11.1¶
Okera Version 2.11.1 was never distributed. Its updates were rolled into Okera 2.11.2.
2.11.0 (8/11/2022)¶
Snowflake Policy Synchronization Changes¶
This release introduces multiple changes to Okera's Snowflake policy synchronization enforcement.
- All the access levels that are supported in both Okera and Snowflake are now supported. In past releases, only SELECT access was supported. This release extends Okera support to ALL, INSERT, DELETE, and UPDATE access as well. For more information about Snowflake policy synchronization, see Policy Synchronization Enforcement Overview.
- The Snowflake connection dialog in the UI has been updated in this release. Users are now required to choose one of the following user options for policy synchronization when they set up a Snowflake connection in the UI:
  - They can select a checkbox indicating that synchronization should occur for all users.
  - They can specify a comma-separated list of users or a tag in a provided entry box.
  You should no longer specify the okera.policy_sync.user_allowed_list advanced connection property in the Advanced properties box in the UI dialog. The list is now managed by the new checkbox and entry box. However, you can continue to use the property when setting up a Snowflake connection using the API.
- The connection details for Snowflake connections (Connection Details tab for a connection) now more closely match the details provided for other connections.
- Instructions and a sample script are now provided for creating a tag in Snowflake for Okera policy synchronization and applying it to your Snowflake user definitions. See Tag Users in Snowflake.
For complete information about Snowflake policy synchronization, see Policy Synchronization Enforcement Overview. For information about setting up a Snowflake connection, see Create a Snowflake Connection.
Tag Restrictions¶
This release introduces restrictions for tagging.
- Users who do not have permissions to create tags can no longer see the button on the Tags page in the UI.
- Users who do not have permissions to create tag namespaces can no longer create them on the Create new tag dialog.
- Users who do not have permissions to remove tags can no longer see the option on the Tags page in the UI.
- When creating or removing a tag, users can only select the namespaces for which they have privileges.
See Managing Tags for more information about tags.
Deleting Databases From the UI¶
This release introduces the ability to delete an Okera database in the UI. For more information, see Delete a Database.
Workspace and Preview Changes in the UI¶
This release introduces the following changes to the Workspace page and to the dataset preview pages available for datasets registered to a database and for the dataset details of a crawler on the Registration page.
- The Workspace page, when accessed from a dataset details page, now defaults to using the Presto API.
- The dataset previews now default to using the Presto API for the preview queries.
In past releases, the Okera API was used.
Blocking Access to the Okera UI for Mobile Devices¶
This release introduces the ability to block access to the Okera UI on mobile devices. A new configuration parameter, BLOCK_WEB_UI_FOR_MOBILE_CLIENTS, has been introduced to control this behavior. Valid values for this parameter are true (the UI is blocked on mobile devices) and false (the UI is not blocked on mobile devices). The default is false.
OkeraEnsemble Amazon EMR nScale Mode Deployment ¶
With this release, you can elect to deploy the OkeraEnsemble access proxy in nScale mode in Amazon EMR environments, so the OkeraEnsemble workload is distributed across your cluster nodes and scales up and down with your clusters. To do this, the OkeraEnsemble access proxy retrieves AWS credentials from the Okera Policy Engine (planner). To communicate with the Okera cluster, the access proxy generates its own system token if it is configured with the JWT private key used by the Okera cluster (via the JWT_PRIVATE_KEY configuration property). This is done for you if you use the odas-emr-bootstrap.sh script with the --install-jwt-key argument (specifying the Amazon S3 path to the key).
For more information, see OkeraEnsemble nScale Mode Deployment in Amazon EMR Environments.
Databricks 10 and 11 Support ¶
This release introduces support for Databricks 10.0 through 10.5 and 11.0. In past versions, Okera only supported versions 8.3, 8.4, 9.0 and 9.1. For more information, see Databricks Integration Steps.
Note: With this release, Okera drops support for Databricks 7.3.
Sample GCP Group Resolution Script¶
This release introduces a sample script to resolve groups in Okera when using Google Cloud Platform (GCP). The script requires that the following configuration parameters be specified in the Okera configuration file.
- Parameter GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS must provide the fully qualified path to a credentials JSON file for a GCP service account with appropriate admin privileges. The path can be a container path (the JSON file is mounted to the container by Kubernetes), an Amazon S3 (s3:) path, or an ADLS (adl:) path.
- Parameter GSUITE_GROUP_ADMIN_EMAIL must provide the email of a GCP user with appropriate administrative privileges.
- Parameter GROUP_RESOLVER_SCRIPTS must specify the fully qualified path /opt/scripts/resolve_groups_gcp_example.py.
When all configuration parameters are specified correctly, GCP group resolution is performed for Okera. See Sample GCP Group Resolution Script.
Configuring Parquet File Resolution Types¶
Table property parquet.resolve-by.type can now be used to configure how a Parquet data file is resolved. Valid values are ordinal (positional resolution) and name (name resolution). In past releases, resolution was configured globally and, by default, resolved by name.
For example:
ALTER TABLE nation SET TBLPROPERTIES('parquet.resolve-by.type'='name')
ALTER TABLE nation SET TBLPROPERTIES('parquet.resolve-by.type'='ordinal')
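To illustrate the difference between the two settings, the following sketch maps one row of file data onto a table schema both ways. It is a conceptual model only, not Okera's reader: the function resolve_columns, the sample column names, and the choice to return None for columns missing from the file are all assumptions.

```python
def resolve_columns(table_columns, file_columns, row, resolve_by="name"):
    """Map one Parquet row (a list of values) onto the table's columns.

    table_columns: column names in table-schema order
    file_columns:  column names in the order stored in the data file
    """
    if resolve_by == "ordinal":
        # Positional: the i-th table column reads the i-th file column.
        return {col: (row[i] if i < len(row) else None)
                for i, col in enumerate(table_columns)}
    if resolve_by == "name":
        # By name: each table column reads the file column with the same name.
        by_name = dict(zip(file_columns, row))
        return {col: by_name.get(col) for col in table_columns}
    raise ValueError(f"unsupported parquet.resolve-by.type: {resolve_by}")


table = ["n_nationkey", "n_name"]
file_order = ["n_name", "n_nationkey"]  # file stores the columns in another order
print(resolve_columns(table, file_order, ["FRANCE", 6], "name"))
print(resolve_columns(table, file_order, ["FRANCE", 6], "ordinal"))
```

When the file's column order differs from the table's, name resolution keeps values aligned with their columns, while ordinal resolution reads them strictly by position.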
AWS Athena Upgrade and Performance Improvements¶
This release upgrades Okera to use the AWS Athena 2.0.30 JDBC driver. With this upgrade, the Athena JDBC JAR file is no longer provided by Okera in the Maven repository, so you must download it from https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html. Okera does not require a JDBC driver with the AWS SDK, so download the one without the AWS SDK. In addition, Okera connections to Athena now also require the path to the JDBC JAR file and its class name, specified in the driver.jar.path and driver.class.name properties in the connection. If you are creating an Athena connection in the Okera UI, these properties can be specified in the Driver file path and Driver class name fields. See Athena Data Source Connections.
Starting with Athena 2.0.5, the Athena JDBC connector uses the result set streaming API to improve its performance when fetching query results. To use this new Athena feature:
- Include and allow the athena:GetQueryResultsStream action in your IAM policy statement. For details on managing Athena IAM policies, see https://docs.aws.amazon.com/athena/latest/ug/security-iam-athena.html.
- If you are connecting to Athena through a proxy server, make sure that the proxy server does not block port 444. The result set streaming API uses port 444 on the Athena server for outbound communications.
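As a minimal sketch of the first requirement, the snippet below builds an IAM policy document containing a statement that allows the streaming action. The helper name athena_streaming_statement is hypothetical, and the wildcard resource is a placeholder assumption; a real policy would normally be scoped to specific resources and combined with your other Athena permissions.

```python
import json


def athena_streaming_statement(resource="*"):
    """Build an IAM policy statement that allows result set streaming."""
    return {
        "Effect": "Allow",
        "Action": ["athena:GetQueryResultsStream"],
        # Assumption: "*" is a placeholder; narrow this for real deployments.
        "Resource": resource,
    }


policy = {"Version": "2012-10-17", "Statement": [athena_streaming_statement()]}
print(json.dumps(policy, indent=2))
```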
Glue Enhancements¶
The following changes were made in this release to Okera's integration with AWS Glue:
- A new configuration parameter, OKERA_GLUE_SILENCE_TBL_PAGINATOR_500, has been introduced. This parameter enables and disables Okera's silencing of unknown Glue errors that can affect the dataset counts on the Data page in the Okera UI. Valid values are true (silence the errors) and false (don't silence the errors). The default is false. You only need to set this configuration parameter if you receive an InternalServiceException 500 from Glue while trying to open the Data page in the Okera UI.
- Additional log messages have been added to improve any debugging that might be needed in Glue environments.
For more information about Okera's integration with AWS Glue, see Using Glue as a Third-Party Metadata Catalog.
SSL/TLS Enabled for the Okera Catalog¶
This release introduces configurable SSL and TLS support for Okera MySQL catalog databases and enhanced SSL support for Okera Postgres catalog databases. In past releases, Okera only provided very basic, non-configurable SSL support for Postgres catalogs and did not support TLS for either MySQL or Postgres catalogs. You can enable this enhanced configurable encryption by setting the CATALOG_DB_SSL parameter to "true" (the default is "false") in the Okera configuration file as well as setting the following new Okera parameters (encoded in base64) as described below:
- CATALOG_DB_SERVER_CERT: Specifies the SSL/TLS certificate for the MySQL or Postgres catalog database server.
- CATALOG_DB_CLIENT_CERT: Specifies the TLS certificate for the MySQL catalog database client. This parameter is only needed for TLS support.
- CATALOG_DB_CLIENT_CERT_KEY: Specifies the private key for the MySQL catalog client TLS certificate. This parameter is only needed for TLS support.
Okera can determine which protocol (SSL or TLS) to use based on the certificates provided.
Notes:
- This change only impacts Okera connections to its MySQL or Postgres catalog and does not establish SSL/TLS configurable support throughout the Okera cluster.
- Okera only supports TLS for MySQL catalogs at this time. It does not support Cloud SQL Auth proxy functionality.
For more information, see Configure SSL/TLS for Okera Metadata Storage.
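Because the certificate parameters are supplied in base64, a small helper can produce the encoded values. This is a sketch under the assumption that the whole PEM file content is what gets encoded; the function name encode_cert_for_config and the placeholder certificate bytes are illustrative only.

```python
import base64


def encode_cert_for_config(pem_bytes):
    """Return the base64 text form of a PEM certificate or key."""
    return base64.b64encode(pem_bytes).decode("ascii")


# Placeholder content; in practice, read the bytes from your certificate file.
sample_pem = b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
print("CATALOG_DB_SERVER_CERT:", encode_cert_for_config(sample_pem))
```

The same helper would apply to the client certificate and its private key when TLS is used with a MySQL catalog.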
Active/Active In-Parallel Policy Loading¶
This release introduces the ability to load Okera policies in parallel when active/active environments start up. This speeds up service start time, particularly for slower RDBMS environments or environments in which many roles must be loaded. Okera uses two thread pools to perform active/active in-parallel policy loading, one for the initial load and one that occurs in the background. The default number of roles loaded in parallel for an initial load is 12; the default number of roles loaded in the background is 2. To control these settings, two new configuration parameters have been introduced:
- SENTRY_INITIAL_LOAD_THREADS can be used to override the initial in-parallel load default of 12 roles. Specify the number of roles that should be loaded in parallel when an active/active environment is initially started.
- SENTRY_BACKGROUND_LOAD_THREADS can be used to override the background in-parallel load default of 2 roles. Specify the number of roles that should be loaded in parallel in the background of an active/active environment.
For more information about active/active environments, see Active/Active Deployment in Aurora RDS. For more information about Okera configuration parameters, see Configuration and Okera Configuration Parameter Reference.
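The two-pool idea can be pictured with a generic thread pool sketch. This models the concept only and is not Okera's implementation: load_role is a hypothetical stand-in for fetching one role's policies, and the pool sizes simply reuse the documented defaults.

```python
from concurrent.futures import ThreadPoolExecutor

INITIAL_LOAD_THREADS = 12    # documented default for SENTRY_INITIAL_LOAD_THREADS
BACKGROUND_LOAD_THREADS = 2  # documented default for SENTRY_BACKGROUND_LOAD_THREADS


def load_role(role):
    """Hypothetical stand-in for fetching one role's policies from the catalog."""
    return f"policies-for-{role}"


def load_roles(roles, threads):
    """Load roles in parallel, at most `threads` at a time, preserving order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(roles, pool.map(load_role, roles)))


roles = [f"role_{i}" for i in range(30)]
initial = load_roles(roles, INITIAL_LOAD_THREADS)       # wide pool at startup
refreshed = load_roles(roles, BACKGROUND_LOAD_THREADS)  # narrow pool afterwards
```

A wider initial pool shortens startup when many roles must be read, while a narrow background pool limits ongoing load on the catalog database.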
Restricting Use of Privacy Functions¶
This release introduces the ability to restrict use of Okera's privacy functions and user-defined functions (UDFs) to Okera administrators only. To activate this feature, add the RESTRICTED_UDFS configuration parameter to your Okera configuration file. The value of this parameter is a comma-separated list of function names. Use of any function listed in the parameter requires administrator privileges. In the following example, the aes_decrypt and nfp_ref_tokenize privacy functions can only be used by administrators.
RESTRICTED_UDFS: aes_decrypt,nfp_ref_tokenize
For complete information about the privacy functions supported by Okera, see Privacy and Security Functions.
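The effect of a RESTRICTED_UDFS value can be sketched as a simple membership check. The function names parse_restricted_udfs and may_call are hypothetical, and trimming whitespace around list entries is an assumption rather than documented behavior.

```python
def parse_restricted_udfs(value):
    """Split a comma-separated RESTRICTED_UDFS value into a set of names."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {name.strip() for name in value.split(",") if name.strip()}


def may_call(function_name, is_admin, restricted):
    """Return True if the caller may use the function."""
    return is_admin or function_name not in restricted


restricted = parse_restricted_udfs("aes_decrypt,nfp_ref_tokenize")
print(may_call("aes_decrypt", is_admin=False, restricted=restricted))
print(may_call("mask", is_admin=False, restricted=restricted))
```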
Dropping Attributes From Nested Fields¶
This release introduces the ability to drop attributes from nested fields. See Nested Field Tags.
Transformation Priority Defaults¶
This release introduces defaults for transformation priorities when more than one transformation is applied to a single column. The default priority order, from lowest to highest, is:
null
zero
sha2
hash
fnv_hash
aes_decrypt
aes_encrypt
tokenize
fp_ref_tokenize
nfp_ref_tokenize
mask
mask_ccn
diff_privacy
phi_age
phi_date
phi_dob
phi_zip3
fp_random
nfp_random
random_ccn
Transformations later in this list have higher priority and override earlier transformations with lower priority (for example, mask_ccn overrides zero). However, you can specify your own prioritization. See Prioritization of Transformations.
Native Delta Lake Table Support (Preview Feature)¶
This release introduces manifest-less native support for files in Delta Lake tables. This is introduced as an Okera preview feature. Previously, Okera only read Delta Lake tables if a manifest was explicitly created. With native support, this is no longer needed. Okera recommends switching to (manifest-less) native support, if you currently use the manifest method.
Native support is disabled by default, but can be enabled for individual Delta Lake tables or databases by specifying okera.delta.native-support=true as a table or database property. You can also enable it for the entire Okera cluster using the new DELTA_TABLE_NATIVE_SUPPORT configuration parameter in the Okera configuration file. Valid values for these properties are true (use native support, not manifest support) and false (use manifest support, not native support). The default is currently false for the cluster, but will be changed to true in a future release.
Note: Okera currently only supports querying the latest snapshot of a Delta Lake table.
For more information about Delta Lake file support, see Databricks Delta Lake Table Support.
Notable Changes¶
- This release drops Okera's support for Kubernetes v1beta1. Okera now only supports Kubernetes v1. The v1beta1 version is deprecated and should no longer be used. This change affects the okctl and Helm charts used by your Okera clusters. The change was necessary because without it, Okera cannot support newer Kubernetes clusters. However, because of this change, Okera cannot support clusters older than 2017. If your cluster uses Kubernetes v1beta1 or is older than 2017, please upgrade your Okera environment or contact Okera for assistance.
- The authorize-query REST server API now requires that authorize_for be set to your user name when you submit an Okera query unless you are an Okera admin. Okera admins do not need to specify authorize_for. See Okera Policy Engine Integration.
- This release drops support for Databricks 7.3.
- Cloudera CDH is no longer supported in Okera.
- Amazon Web Services EMR versions lower than version 5.24 are no longer supported in Okera.
Security Vulnerabilities (CVEs) Addressed¶
- CVE-2020-28483 HTTP Response Splitting
- CVE-2022-2097 Inadequate Encryption Strength
- CVE-2022-22576 Improper Authentication
- CVE-2022-22747 NSS Issue
- CVE-2022-23437 XML Injection
- CVE-2022-24765 Uncontrolled Search Path Element
- CVE-2022-25647 Deserialization of Untrusted Data
- CVE-2022-27775 Curl
- CVE-2022-27781 Loop with Unreachable Exit Condition ('Infinite Loop')
- CVE-2022-27782 Improper Certificate Validation
- CVE-2022-29155 Ubuntu USN-5424-1 OpenLDAP Vulnerability
- CVE-2022-29187 Uncontrolled Search Path Element
- CVE-2022-29361 HTTP Request Smuggling
- CVE-2022-29458 Out-of-Bounds Read
- CVE-2022-31197 SQL Injection
- CVE-2022-32205 Allocation of Resources Without Limits or Throttling
- CVE-2022-32206 Allocation of Resources Without Limits or Throttling
- CVE-2022-32207 Incorrect Default Permissions
- CVE-2022-32208 Out-of-Bounds Write
- CVE-2022-34480 NSS Issue
- CVE-2022-34903 Arbitrary Code Injection
Okera uses Snyk and GitHub Advanced Security for security vulnerability scanning.
Bug Fixes¶
- Okera's base Ubuntu image has been upgraded to bionic-20220801.
- Audit log entries are now added for all CREATE_AS_OWNER implied operations.
- Upgraded the base Alpine image used by Okera to 3.15.5.
- Fixed an issue in which Okera failed to start when using an Azure database for Postgres.
- Tooltips in the UI now have an updated, more legible, look.
- Resolved a problem in which the dataset previews for a BigQuery table failed because the row limit was not applied on queries, and consequently produced very large result sets.
- Fixed the CSP headers for the REST API documentation.
- Cloned tables with applied policies in Snowflake no longer break policy synchronization when the original table used for the clone is removed.
- Upgraded Scala in recordservice-spark-2.0.jar to version 2.11.12.
- Fixed an issue in which database passwords with special characters were not properly encoded when establishing a connection.
- Corrected a problem with string data when using Avro complex data types for certain unnesting queries.
- Users who do not have permissions to create Okera databases now receive an authorization error if they attempt to create a database that already exists.
- Fixed an issue in which the Presto configuration was written with mismatched closing tags.
- The Snowflake connection synchronization details tab now provides a dropdown link you can select to show the last error message stored during connection synchronization.
- Fixed a bug in which users without the correct privileges (for example, granted only SELECT privileges) could add groups to a role using the REST API. This is no longer possible without the correct privileges.
- Snowflake connection details have been reordered and headings have changed in the Web UI.
- Fixed a bug in which users without the correct permissions were able to update dataset and dataset column descriptions directly using the REST API.
- Snowflake pushdown processing now supports queries using column tags (tag-based row filtering).
- A Content-Security-Policy header is now applied to the REST server resources used by the Okera UI.
- Fixed a bug in which a crawler's details could not be viewed the first time after deleting another crawler.
- Upgraded to the latest version (2.1.0.7) of the Redshift JDBC driver.
- Improved the performance of Okera metadata queries when calling Databricks with JDBC.
- Improved Okera performance when evaluating ABAC policies.
- The Google Cloud CLI (gcloud) is now removed from the Okera core image.
- Improved performance when authorizing datasets from Databricks.
- Improved Okera performance when authorizing table access from Databricks.
- Fixed a layout issue with a dataset's Yes this dataset dialog, where the copyable text extended beyond its containing dialog.
- Improved performance when using policies with transform clauses.
- Improved Okera performance when loading table metadata.
- Improved the performance of AuthorizeQuery RPC calls.