1.0.0 onwards

Release notes for 1.0.0 onwards.

0.9.0 (April 2018)

The 0.9.0 release introduces two major new features, with no significant changes to existing features or functionality.

Support for struct types

ODAS and the client libraries now support struct as a data type. This allows the engine to handle nested data. For details, see the docs.

Support for scanning last partition of a table

ODAS’s query language has been extended to make it easy to read just the last partition of a partitioned table. This is a common pattern in multiple use cases. See the supported SQL docs for more details.
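Conceptually, the "last partition" is the partition whose key values sort greatest. The actual SQL syntax is covered in the supported SQL docs; the helper below is only an illustrative sketch of that selection, not an Okera API:

```python
def last_partition(partitions):
    """Return the partition spec with the greatest partition-key values.

    Each spec is a list of (key, value) tuples, e.g. [("year", "2018"), ("month", "03")].
    Comparing the value tuples picks the most recent partition for
    date-style partition layouts.
    """
    return max(partitions, key=lambda spec: tuple(value for _key, value in spec))

parts = [
    [("year", "2017"), ("month", "11")],
    [("year", "2018"), ("month", "03")],
    [("year", "2018"), ("month", "01")],
]
print(last_partition(parts))  # [('year', '2018'), ('month', '03')]
```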

Incompatible changes

The REST API now by default does not format the JSON result (it does not indent it or insert new lines). Clients often consume this via a tool, making the formatting an unnecessary cost. Users can pass the ?format option to enable server-side formatting.

For example:

> curl localhost:11050/scan/okera_sample.sample
[{"record":"This is a sample test file."},{"record":"It should consist of two lines."}]
> curl localhost:11050/scan/okera_sample.sample?format
[
    {
        "record": "This is a sample test file."
    },
    {
        "record": "It should consist of two lines."
    }
]

0.8.1 (March 2018)

0.8.1 is a minor release that addresses a few specific issues in 0.8.0, as well as updates to the pyokera Python client library.

PyCerebro

The initial release in 0.8.0 was an opportunity to get feedback on the client API. (Thanks to everyone who did!) We’ve updated the API in response. The APIs in this version are expected to be compatible going forward. Note that this does require the server to also be updated to 0.8.1. Significant changes:

  • Renamed mentions of ‘planner’ to ‘connection’. We expect typical end users to not need to know about internal services.
  • Moved some APIs into the connection object instead of module-wide APIs.
  • Removed the explicit need to differentiate dataset vs. sql arguments in the scan APIs.
  • Fixed issues with null handling in some cases.
  • Renamed create_context() to context().

For more details, see the docs.

Users can upgrade to this by upgrading to the latest package with pip.

Bug Fixes

  • Fixed a bug in configuring the LDAP domain and LDAP base DN. Configuring both led to the base DN config always being used instead of the domain configuration. Starting in this release, users wanting to set a default domain should not set LDAP_BASE_DN.

  • Fixed a server crash when executing some unsupported DDL. This only affected these DDL statements: CREATE TABLE AS SELECT and COMPUTE STATS. These now fail with a ‘not supported’ error.

  • Allow revoking permissions on S3 bucket URIs without a trailing slash. In general we do not treat bucket URIs without a trailing slash as valid URIs. For example, s3://bucket is invalid and should be s3://bucket/. Note that this is specific to how buckets behave; subdirectories do not need to end in a slash. This has always been the behavior in ODAS, but other tools or existing metadata may have allowed invalid URIs to be persisted. In this release we allow revoking both kinds of URIs so that the invalid entries can be corrected.
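The URI rule above (bucket roots need a trailing slash, subdirectories do not) can be sketched as a small validation check. This is an illustration of the rule only, not Okera's actual validation code:

```python
def is_valid_grant_uri(uri):
    """Bucket roots must end in '/'; paths below the root need no trailing slash."""
    if not uri.startswith("s3://"):
        return False
    rest = uri[len("s3://"):]
    return "/" in rest  # a bare bucket like s3://bucket has no '/' after the scheme

print(is_valid_grant_uri("s3://bucket"))         # False: bucket root needs a slash
print(is_valid_grant_uri("s3://bucket/"))        # True
print(is_valid_grant_uri("s3://bucket/subdir"))  # True: subdirectories need no slash
```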

0.8.0 (Feb 2018)

0.8.0 is a major release. It includes all fixes from the prior releases.

New Features

Datasets with errors in Web UI

The datasets list page now shows datasets that have errors associated with them, possibly due to misconfiguration or incorrect definition.

New layout for datasets list page in Web UI

The datasets list page in the Web UI has been updated to allow for faster browsing of datasets, with an in-page view of the dataset’s details.

Native python client (beta)

Added a beta release of a native Python client, the pycerebro library. The goal is to provide a native Python experience with performance and functionality similar to what is possible through the Java client.

In this initial release, we primarily want to ensure the library is easy to install, and supports the required authentication mechanisms in a variety of environments. It has support for executing DDL and scan statements against the ODAS servers. In the next release, we will optimize the scan APIs further.

For more details, see the docs.

Improved support for partitioned tables when using them from EMR

In previous releases, Okera would return the tables as unpartitioned tables to Hive. While this allowed the tables to be scanned and partition pruning worked correctly (via the more general predicate pushdown mechanism), it meant that commands such as SHOW PARTITIONS did not work. In this release, we more faithfully return the partitioning information to Hive, and Okera partitioned tables behave much more similarly to standard Hive partitioned tables.

Remove cases where ODAS may need write permissions to underlying storage system

In typical usage patterns, ODAS only needs read access to the underlying storage system. For example, it should not be necessary to give ODAS write access to the S3 buckets where the data is stored, just read access. While ODAS has never written or modified data files it manages, there were specific cases where ODAS would try to create the directory structure. For example, creating a table over a non-existent path could result in ODAS trying to create that path. This could result in some DDL commands failing if ODAS only had read access. In this release, we removed those cases, and these DDL commands should now complete successfully.

Support for CREATE TABLE AS SELECT (CTAS) when run from Hive in EMR

This is supported if the Hive warehouse has been configured to use S3, instead of the EMR-local HDFS cluster. See the cluster-local DBs section of the EMR docs for more details.

Support for ALTER TABLE RECOVER PARTITIONS

This command behaves identically to the HiveQL command to reconstruct partitions automatically from the file system structure. It must be run from an Okera native API, such as the REST API or via odb. It is not possible to run this from Hive on EMR.
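RECOVER PARTITIONS works by inferring partition key/value pairs from the Hive-style key=value directory layout on the file system. A rough sketch of that inference (illustrative only, not Okera's implementation):

```python
def infer_partitions(paths):
    """Parse Hive-style key=value directory components into partition specs."""
    specs = []
    for path in paths:
        spec = {}
        for component in path.strip("/").split("/"):
            if "=" in component:
                key, value = component.split("=", 1)
                spec[key] = value
        if spec:
            specs.append(spec)
    return specs

paths = [
    "s3://bucket/tbl/year=2018/month=01/file0.parquet",
    "s3://bucket/tbl/year=2018/month=02/file0.parquet",
]
print(infer_partitions(paths))
```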

Support for user defined functions (UDFs) registered in ODAS to be available to EMR

UDFs that have been registered in ODAS are now accessible to Hive. Previously these UDFs would have to be registered once in ODAS and then once again for each EMR cluster. For more details, see this.

Support for specifying an existing Hive Metastore RDBMS database

Users with an existing Hive Metastore can now configure the ODAS catalog to use the same underlying (i.e. RDS) database. This is useful for migrating to the ODAS catalog or bootstrapping it. See the advanced install docs for more details.

Spark predicate pushdown improvements

Predicate pushdown has been enhanced to include the following:

  • is null
  • is not null
  • LIKE predicates (e.g. name like '%ber%')
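The LIKE predicate semantics being pushed down can be sketched with a small matcher ('%' matches any run of characters, '_' matches exactly one). This is an illustration of the predicate's meaning, not the pushdown implementation:

```python
import re

def sql_like(value, pattern):
    """Evaluate a SQL LIKE pattern by translating it to a regex."""
    regex = re.escape(pattern).replace("%", ".*").replace("_", ".")
    return re.fullmatch(regex, value) is not None

print(sql_like("october", "%ber%"))  # True: contains 'ber'
print(sql_like("alice", "%ber%"))    # False
print(sql_like("abc", "a_c"))        # True: '_' matches the single 'b'
```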

Java 9 is now supported

Planner APIs now support SELECT without FROM clause

This supports, for example, select UDF('test-value'). These types of queries are typically used just to verify connectivity or the behavior of builtins and UDFs.

Support for token auto-renewal

The Hive and Spark ODAS client libraries have been updated to call a user-defined token acquisition script to refresh an expired token. Configuration details and script requirements can be found in the EMR documentation.
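The refresh mechanism amounts to shelling out to the configured acquisition script and using its output as the new token. A minimal sketch, with a stand-in command rather than a real acquisition script (the function name and output convention here are assumptions for illustration):

```python
import subprocess

def refresh_token(acquire_cmd):
    """Run the user-defined token acquisition command; its stdout is the new token."""
    result = subprocess.run(acquire_cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Stand-in command; a real deployment would invoke its configured script here.
print(refresh_token(["echo", "fresh-jwt-token"]))  # fresh-jwt-token
```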

Performance improvements for DDL statements on highly partitioned tables

Significant performance improvements for ALTER TABLE RECOVER PARTITIONS and SHOW PARTITIONS on highly partitioned tables.

Bug Fixes

  • Fixed an issue where DbCli, the REST server, and the Web UI were reporting field values as 0 when they were in fact null.

  • Fixed an issue where in some cases users were unable to see databases if they only had partial access (i.e. to a subset of columns) to all objects in the database. Databases are now visible to a user if they have any access to any object in them.

  • Fixed an issue in the Web UI where if the user logged in via OAuth, the UI showed an Okera Token instead of an SSO Token.
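The visibility rule from the partial-access fix above can be sketched as: a database is listed if the user can access anything inside it, even a single column. A hedged illustration (the data model here is invented for the example):

```python
def visible_databases(grants):
    """`grants` maps (db, table) -> set of accessible columns (possibly partial).

    A database is visible if the user has any access to any object in it.
    """
    return {db for (db, _table), cols in grants.items() if cols}

grants = {
    ("sales", "orders"): {"id", "total"},  # partial, column-level access
    ("hr", "people"): set(),               # no access at all
}
print(visible_databases(grants))  # {'sales'}
```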

Incompatible and Breaking Changes

  • The REST server, DbCli and the Web UI now correctly report null field values, which were previously incorrectly reported as 0. This may cause incompatibilities for users reading directly from the REST API who depended on the previous behavior.

  • OKERA_DB_NAME has been deprecated and replaced by OKERA_DM_DB_NAME. The Deployment Manager will fail to start if both env variables are set with conflicting values. While we maintain compatibility of this flag in this release, the old setting (‘OKERA_DB_NAME’) will be removed in a future release. Custom scripts referencing OKERA_DB_NAME will need to be updated eventually.

  • In 0.8.0 and beyond the Okera web UI no longer supports IE 11.

  • Deployments using unix-based group resolution now treat group names as case sensitive. For example, granting roles to GROUP is no longer the same as granting to group. Note that this is specific to deployments configured to get user group information from the host unix VM.

  • OKERA_OAUTH_SUB_ENDPOINT is no longer required nor supported. It can be safely removed from any configuration locations.

  • For EMR clusters, we have made internal changes to how tokens are managed. This requires that a given user’s home directory has the appropriate permissions set prior to a token being written to it. To support this, we have added another bootstrap script that should be run after the existing bootstrap script. See the EMR documentation for details.

Known issues

  • The Tableau WDC connector does not support custom signed SSL certificates. If SSL is enabled and the certificate is self-signed, the Tableau connector will fail with an SSL handshake error. The issue is that Tableau is unable to find the self-signed certificate in all cases. Potential workarounds are to use a certificate signed by a CA, or to connect to ODAS through a JDBC-enabled framework such as Presto.

0.7.3 (Feb 2018)

0.7.3 is a patch release. It contains a fix for a critical Presto client issue, and we recommend all Presto users upgrade. Versions prior to 0.7.3 are no longer supported for EMR Presto. Note that it is perfectly fine to run a 0.7.3 client (e.g. EMR) against a 0.7.2 ODAS cluster.

Bug Fixes

  • Fixed a session leak in the Presto client. In some cases, sessions started by the Presto client were not properly closed. This can cause issues, as those connections will not close for a very long time, which can starve out other clients.

  • Fixed a server crash when trying to load unsupported schemas in some cases. In prior versions, the server could crash when trying to load an Avro schema whose schema file had been deleted from storage after the table was created. This has been fixed.

  • Fixed some issues related to HTTP proxy settings. We now properly handle proxy settings that block access to S3. Previously, the Deployment Manager agent running on all the ODAS cluster VMs would not use the proxy settings in all cases. This issue manifests if access to the storage system needs to happen through an HTTP proxy.

0.7.2 (Jan 2018)

0.7.2 includes stability fixes. We recommend all 0.7.0/0.7.1 users upgrade.

Bug Fixes

  • Improved REST server scalability in the presence of long-running scan requests. Previously, long-running scans could block out other requests, including the service health check, causing the Deployment Manager to report the service as unhealthy. This has been fixed; there are now separate request handlers for long requests.

  • Fixed the planner refusing all connections due to race conditions when logging. Specific request patterns to ODAS could get the planner into a state where it refused all subsequent connections. This caused the services to report as 5/7 healthy, and the REST server to go into a restart loop. The root cause was race conditions resulting in logs not being drained properly. This has been resolved by upgrading our bundled Docker version (to 1.12.6) as well as by log handling improvements.

  • Fixed deployment in environments which require HTTP proxy configurations. Some of the new validations added in 0.7 were not properly using HTTP proxy configurations in all cases, causing those calls to fail. No configuration changes are required. We now properly use the proxy configurations if set.

  • Fixed deployment when the cluster’s private IP range conflicts with what ODAS requires. Specifically, in 0.7.2, we resolved the conflict on the IP range 10.32.0.0/12.

  • Fixed the token expiration display for JSON Web Tokens (JWT) in the UI. The value displayed in the previous version was incorrect and much longer than the actual expiration. Note that this was a presentation issue only; the tokens would have expired correctly.

  • Fixed an issue where the catalog did not load properly when it contained invalid catalog objects. These objects remain unreadable by Okera, but they no longer cause other catalog objects to be skipped.

  • Support LDAP default domains and SSL-enabled servers. Previously, ODAS only supported distinguished names. Now it is possible to log in with a domain name, for example USERS\user. It is also possible to configure a default domain and just log in with user.

  • Updated the packaged Docker version from 1.12.2 to 1.12.6, which includes multiple stability fixes.

0.7.1 (Dec 2017)

New Features

  • Support for JSON Web Token (JWT) authentication using both a public key and an external server. In previous releases, ODAS could only be configured to use one or the other; it is now possible to configure both at the same time. For more information see here.

Bug Fixes

  • Support for non-renewable keytabs. All ODAS services have been updated to be more robust to non-renewable keytabs. Previously, services could become unstable when Kerberos tickets expired when using these kinds of keytabs.

  • UI no longer shows zero datasets in the case that some failed to load. In 0.7.0 and earlier releases, the UI would show no datasets in the Datasets page list, even if only one failed to load. In 0.7.1, the UI shows all datasets that were loaded without error.


0.6.2 (Dec 2017)

0.6.2 contains a single fix for Kerberized clusters. All ODAS services have been updated to be more robust to non-renewable keytabs. Previously, services could become unstable when Kerberos tickets expired when using these kinds of keytabs.

0.7.0 (Nov 2017)

0.7.0 is a major release.

New Features

JSON structured audit logs

By default, the planner audit log is now output as JSON in the planner logs. The new format contains more information, is much easier to parse, and will be stable over time. For more information on its schema and how to use it, see the docs.

Support for Hive SerDes

Prior versions of ODAS had limited support for tables which require a custom Hive SerDe to read. In this release, we extended the support to all ODAS-supported types, added the ability to read SerDe libraries from S3, and added built-in, HiveQL-compatible support in the various ODAS hive-ddl APIs. For more information, see here.

Support for external views

In prior versions of ODAS, views created in Okera had to be evaluated in ODAS. This is critical for views which enforce data security related transformations. For example, if a view anonymizes user data, the view must be evaluated in ODAS before it is returned to the client. In this release, we added support for external views, which store data transformations and have no security implications. These views are used strictly for tracking what can be evaluated in ODAS or in the compute application. For more details, see here.

Support to drop permissions when dropping catalog objects

Dropping a database, table or view does not drop the permissions on the object. This means that if a new object with the same name is created, it will retain the permissions from the dropped object. This pattern is sometimes used, for example in ETL, where the permissions and catalog object (re)creation are decoupled. In other cases this behavior is not ideal. In this release we extended the DROP [DATABASE|TABLE|VIEW] DDL commands to optionally drop the associated permissions as well. If not specified, the command is backwards compatible and keeps the previous semantics. For more information see this.

Dynamic REST API scan page size

Previously the REST scan page size was fixed (by default 10000 records). This could be problematic for tables with very many columns, where requests would time out or otherwise misbehave. In this release, the page size is dynamic (up to 10000 by default) and adjusts based on how long the requests are taking.
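The adjustment can be sketched as a feedback loop on request latency: shrink the page when a request runs long, grow it back toward the cap otherwise. The constants and policy below are illustrative, not ODAS's actual tuning:

```python
MAX_PAGE_SIZE = 10000

def next_page_size(current, elapsed_secs, target_secs=5.0):
    """Halve the page when a request overruns the target; otherwise grow it."""
    if elapsed_secs > target_secs:
        return max(current // 2, 1)
    return min(current * 2, MAX_PAGE_SIZE)

print(next_page_size(10000, elapsed_secs=12.0))  # 5000: too slow, shrink
print(next_page_size(2500, elapsed_secs=1.0))    # 5000: fast, grow back
```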

Improved client connection related errors

Previously, the errors related to failed connections from the Java client libraries required looking at the ODAS service logs. These errors should now be returned to the client. Note that this requires updating the client library to the latest version (beta-9) as well. Older clients are compatible and will continue to work, but may not see the improved error reporting in all cases.

Filtered dataset search in the UI

The Okera UI now has enhanced dataset search capabilities, allowing users to search for datasets by name and/or database name, as well as filter by any set of databases.

Dataset access inspection in the UI

Dataset stewards (or anyone with ALL access to a dataset) can inspect which groups or sets of groups have access to their datasets, as well as which fields in those datasets.

Bug Fixes

  • Fixed an issue where the 30-second timeout for session_ids issued by the /api/scanpage endpoint was starting from the beginning of a query, resulting in the user being “charged” for system time. This was corrected and the timer now starts when ODAS begins returning data to the user.

  • Fixed the /api/scanpage endpoint so that when fetching more than 10,000 records, the total number of records to be fetched only needs to be specified on the first call. Subsequent calls can use the session_id to return the remaining records in batches of up to 10,000 entries. If the record number is specified in subsequent calls, it will be ignored.

  • Fixed being unable to see a database if the user had only been granted access to that database’s columns. The user can now properly see a database even if they only have partial access to objects in the database.

  • Allow dropping views even if their metadata becomes invalid. In previous versions, it was sometimes not possible to drop views whose metadata had become invalid. This can happen, for example, if the base tables for the views are deleted. These views can now be dropped.

  • Fixed an issue where permission granted to a top-level S3 bucket was not being propagated to sub-directories.

  • Fixed an issue where detailed error messages were not being returned to clients for server-side errors (they were being overwritten with generic error messages).

  • Fixed an issue where Hive in EMR 5.8 and later would not work with ODAS. EMR 5.8.0 upgraded Hive from 2.1.1 to 2.3.0, which introduced a backwards-incompatible API change; this has been addressed.
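Client-side, the /api/scanpage flow described above amounts to: state the total on the first call, then keep calling with the returned session_id until all records arrive in batches of up to 10,000. A sketch against an injected fetch function (the fetch signature is an assumption for illustration, not the exact REST contract):

```python
BATCH_LIMIT = 10000

def scan_all(fetch, total):
    """Drain a paged scan: state the total once, then reuse the session_id."""
    records, session_id = fetch(records=total, session_id=None)
    while len(records) < total:
        batch, session_id = fetch(records=None, session_id=session_id)
        records.extend(batch)
    return records

def make_fake_fetch(total_available):
    """Stand-in for the REST endpoint: serves batches of up to BATCH_LIMIT rows."""
    state = {"sent": 0}
    def fetch(records=None, session_id=None):
        end = min(state["sent"] + BATCH_LIMIT, total_available)
        batch = list(range(state["sent"], end))
        state["sent"] = end
        return batch, "session-1"
    return fetch

print(len(scan_all(make_fake_fetch(25000), total=25000)))  # 25000
```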

Incompatible and Breaking Changes

  • The records parameter for the /api/scanpage REST endpoint now indicates the total number of records that the query should return. This had previously indicated the batch size to be used for each page. Results are now returned in batches of up to 10,000 entries.

  • Number of services in a standalone cluster reduced from 8 to 7. In this release, the earlier version of the UI was completely removed, reducing the number of services by 1. In 0.6.x, this service was running but not externally exposed by default (for example, even in 0.6.x, endpoints did not report the earlier UI).

  • Root user required to run kubectl commands on kubernetes master. Previously, kubectl could be run as any user, for example, ec2-user. It is now required to be root to run kubectl.

  • Existing launch scripts will not work with 0.7.0. We have enhanced our validation of launch scripts when they are registered during the okera_cli “environments create” call. The changes required to perform that validation mean that all launch scripts must be replaced with ones based on the new template provided in the 0.7.0 release. Launch scripts based on earlier templates will cause the “environments create” call to fail. All of the values that need to be configured have been moved to a dedicated section towards the top of the script in the new template, so porting older launch files should be fairly quick. Launch scripts based on the 0.7.0 template are backwards compatible with older ODAS releases.

Known issues

After installing a new version of the DM, upgrading a component in an existing cluster does not work

The workaround is to upgrade the existing cluster component(s) to the newer version before newer DM is installed and restarted.

Java 9 is not supported

There is an issue with the new module changes in Java 9 that will be addressed in future versions of ODAS.

0.5.3 (November 2017)

0.5.3 is a minor release consisting of backports of select fixes from the 0.6.1 release. Those patches are:

  • Invalid JSON encountered when retrieving cluster status
  • Incorrect user when using /api/scanpage endpoint
  • add_date function does not work with view creation
  • REST server scaling issues

0.6.1 (Oct 2017)

0.6.1 is a minor release and we recommend all 0.6.0 users upgrade.

New Features

OAUTH integration

The Web UI now supports authentication using OAUTH. If configured, users will be redirected to the identity provider’s login page, for example logging in with their Gmail account.

Improved SSL support

Some clients (e.g. the latest version of Chrome) require the REST server to have a DNS domain name (instead of an IP address) if SSL is enabled, as additional security. In this release, we added a configuration to specify the DNS name for the REST server. This is not required for SSL to be enabled, and not all clients require the server to be configured this way.

This configuration is OKERA_SSL_FQDN. For example:

export OKERA_SSL_FQDN=cluster1.cloud.com

Note that due to our traffic routing, this can be the DNS name of any machine in the cluster, for example, the CNAME for the cluster.

Bug Fixes

Support for EMR up to 5.9

Hive in EMR 5.8 introduced a backwards-incompatible change which caused issues for older versions (0.6 and earlier) of Okera EMR clients. This has been fixed in 0.6.1, which now supports all EMR versions from 5.3 to 5.9.

0.6.0 (Sep 2017)

0.6.0 is a major release. It includes major new features and numerous improvements across the Okera services.

New Features

New Web UI

The Web UI has been revamped with a new look-and-feel, enhanced stability, and several new features, including improvements to dataset discoverability, metadata, and account information.

Datasets are now searchable from the dataset browser; search for datasets by name.

Dataset information has been expanded. We now explicitly display which columns the current user has access to and which groups grant access to those columns without access. We also show which columns are partitioning columns. Finally, a description of how to integrate a dataset with R has been added.

On the home page, account details are displayed, including the token for the current user, which groups the current user belongs to, the roles the current user has, and the groups granting those roles.

See the docs for more information.

Significantly improved integration with EMR

EMR integration has been significantly improved, allowing better pushdown into ODAS across the engines, improved multi-tenant user experience, work scheduling, etc. See EMR docs for more details.

Support for SSL

This release adds SSL support to the REST server and Web UI. If configured, users should switch to HTTPS whenever interacting with either of these services.

Minor Features

  • Added CLI commands to specify the number of planners. See docs for more details.
  • Added CLI commands to specify additional arguments for the planners and workers.
  • Improved load balancing for worker tasks. Users should see more consistent load on workers, particularly when there is a small number of total tasks.
  • Significantly reduced memory usage when executing joins.
  • Improved install times. Okera binary sizes have been significantly reduced.

Incompatible and breaking changes

Deprecating specifying user token as part of URL for REST server

We will deprecate specifying user tokens as part of the URL in a subsequent release, and recommend users of the REST server start switching now. For example, instead of querying rest-server-host:port/api/databases?user=<TOKEN>, clients should instead specify the token as part of the Authorization header. See these docs for more details.
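With Python's standard library, the header form looks like the following. The host, port, and token are placeholders, and the "Bearer" scheme is an assumption; consult the linked docs for the exact header format. No request is actually sent here:

```python
import urllib.request

token = "YOUR-TOKEN"  # placeholder; obtain a real token from Okera
req = urllib.request.Request(
    "http://localhost:11050/api/databases",
    headers={"Authorization": "Bearer " + token},  # "Bearer" scheme is an assumption
)
print(req.get_header("Authorization"))  # Bearer YOUR-TOKEN
```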

WebUI port renamed

The webui port has been renamed from okera_catalog_ui:webui to okera_web:webui. This value is used when configuring ports (OKERA_PORT_CONFIGURATION) as well as in the output from listing the service endpoints.

Permission roles are now consistently case insensitive

Case sensitivity in roles was inconsistent; we have now made roles consistently case insensitive. This means that commands such as CREATE ROLE admin_role and CREATE ROLE ADMIN_ROLE are now identical.
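The new behavior amounts to normalizing role names before comparison, e.g.:

```python
def same_role(a, b):
    """Role names now compare case-insensitively."""
    return a.lower() == b.lower()

print(same_role("admin_role", "ADMIN_ROLE"))  # True
print(same_role("admin_role", "analyst"))     # False
```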

Output of okera_cli clusters nodes changed

The output is now space separated instead of comma separated, to make it easier to interoperate with ecosystem tools. Scripts that consumed the old output may need to be updated.

Known issues

Dataset preview in Web UI or Catalog REST API shows 0s instead of NULLs

When navigating to a particular dataset in the Web UI and clicking “Show Preview”, if the value of a particular cell is NULL, it is displayed as 0. This issue additionally exists when querying the Catalog REST API at /api/scanpage/<dataset>.

The workaround to determine the correct value of the cell is to query the dataset records via alternate Okera clients, like odb.

Unable to upgrade an existing cluster from 0.5 to 0.6

The install binary format has changed in 0.6, meaning clusters prior to 0.6 will not be able to handle the binary images. Note that the metadata stored in a 0.5 cluster can be read by a 0.6 cluster. Users can create a new 0.6 cluster instead. If upgrading an existing cluster is important, contact us and we can manually do this.

Unable to see databases if user has only been granted columns to objects in database

If a user has been granted only partial access to all objects (table or views) in a database, they are not able to see the database or any of the contents in it. Users are only able to properly see the objects if they’ve been granted full access to at least one table or view in that database (at which point the access controls work as expected).

Workaround: create a dummy table in these databases and grant users full select on this table.

Hive in EMR does not support all DDL commands

The Hive Okera integration for EMR does not currently support all DDL commands. It does not support GRANT/REVOKE statements or ALTER TABLE statements.

Workaround: use the DbCli or connect through a kerberized Hive installation.

0.4.3 and 0.5.1 release notes (August 2017)

0.4.3 and 0.5.1 are minor patch releases that contain significant performance fixes, as well as critical fixes for the Hive EMR integration.

It is recommended that all 0.4.x and 0.5.x users upgrade.

In particular:

  • Significant speedups handling tables with a large number of partitions.
  • Improved column pruning and predicate pushdown when using Hive in EMR.

New Features

The EMR integration docs have been updated with the configs required for better Spark and Hive integration. In particular, we recommend specifying the Okera planner.hostports config in Spark’s hive-site.xml. This has been updated in the EMR docs.

We’ve also updated the client version to 0.5.1; EMR clusters should be bootstrapped with this version (up from 0.5.0).

0.5.0 release notes (July 2017)

0.5.0 is a major release. It contains all the bug fixes from 0.4.1 and 0.4.2 and numerous other bug fixes.

New Features

JSON Web Token and SSO support

This release adds support for authentication with ODAS using JSON Web Tokens (JWT). These tokens can be verified by ODAS using either public/private keys or via an external SSO server. This can be used in addition to, or instead of, Kerberos authentication. If an external SSO server is configured, Okera can also be configured to generate SSO tokens in our Web UI.

Support for joins with views

Support has been added to create views which contain joins. Previously, views only supported filters and projections. This enables use cases where sensitive data needs to be joined against a (dynamic) whitelist. After creating the view, the identical grant/revoke statements can be used to control access to it. The kinds of joins we enable are limited; see docs for more details.

Improved support for EMR, including Hive and Presto

While previous versions could support EMR, we’ve improved the integration experience, providing a bootstrap action which enables deeper integration. See the EMR Integration docs for details.

Simplified Install Process

Deployment Manager config verification has been improved significantly to catch more configuration issues as well as report the issues more intuitively. Please let us know how to improve this further. We’ve also removed the steps to upload any config files to S3 manually as part of the install.

Hadoop client updates

This release corresponds to the beta-6 release of the Hadoop client libraries. While previous clients are API compatible, we advise upgrading to this version.

Incompatible and breaking changes

Planner_worker service port renamed

The okera_planner_worker service ports have been split into okera_planner and okera_worker. Specifically:

  • okera_planner_worker:planner is now okera_planner:planner
  • okera_planner_worker:worker is now okera_worker:worker

This will need to be updated in the OKERA_PORT_CONFIGURATION value as well as the output of ‘okera_cli clusters list.’

Java client token authentication change

Previously, Hadoop Java clients (e.g. MapReduce, Spark, Pig, etc) needed to specify the service name if they were using token authentication. This service name had to match the principal of the Okera cluster. For example, if the Okera cluster had the Kerberos principal ‘odas/service@REALM’, the service name had to be ‘odas’. This value could change from ODAS cluster to ODAS cluster, which made configuration difficult.

In 0.5.0, we’ve updated it so this value should always be ‘okera’, and we recommend that clients not set it at all (it is the default in the updated client jars). Clients that still set the old value will see connection failures.

For example, in spark, applications should remove specifying: ‘spark.recordservice.delegation-token.service-name’.

Environment variable name change

The environment variable OKERA_JWT_SERVICE_TOKEN_FILE has been replaced with OKERA_SYSTEM_TOKEN. Users upgrading from 0.4.5 will need to update this config.

Known issues

Unable to see databases if user has only been granted columns to objects in database

If a user has been granted only partial access to all objects (table or views) in a database, they are not able to see the database or any of the contents in it. Users are only able to properly see the objects if they’ve been granted full access to at least one table or view in that database (at which point the access controls work as expected).

Workaround: create a dummy table in these databases and grant users full select on this table.

Hive in EMR does not support all DDL commands

The Hive Okera integration for EMR does not currently support all DDL commands. It does not support GRANT/REVOKE statements or ALTER TABLE statements.

Workaround: use the DbCli or connect through a kerberized Hive installation.

0.4.2 release notes (July 2017)

0.4.2 contains a critical bug fix for users running a newer kernel with Stack Guard protection. Without this fix, the ODAS services (planner and worker) crash on start.

This kernel version is known to be affected: kernel-3.10.0-514.21.2.el7.x86_64. We recommend all 0.4.x users upgrade.

For more information, see this

0.4.5 release notes

June-2017

0.4.5 contains support for JSON Web Tokens (JWT) for authentication. Users not using JWTs do not need to upgrade to this version. For details on how to configure JWT support, see the install docs.

Incompatible and breaking changes

Planner_worker service ports renamed

The okera_planner_worker service ports have been split into okera_planner and okera_worker. Specifically:

  • okera_planner_worker:planner is now okera_planner:planner
  • okera_planner_worker:worker is now okera_worker:worker

These names will need to be updated in the OKERA_PORT_CONFIGURATION value; the new names also appear in the output of ‘okera_cli clusters list’.

0.4.1 release notes

June-2017

0.4.1 contains bug fixes for 0.4.0; we recommend upgrading all 0.4.0 clusters.

Upgrading

First, upgrade the Deployment Manager. While the upgrade is happening, existing ODAS clusters will continue to be operational.

cd /opt/okera # Or wherever your existing install directory is.


## Get the tarball from S3.
curl -O https://s3.amazonaws.com/cerebrodata-release-useast/0.4.1/deployment-manager-0.4.1.tar.gz


## Extract the bits.
rm -f deployment-manager
tar xzf deployment-manager-0.4.1.tar.gz && rm deployment-manager-0.4.1.tar.gz && ln -s deployment-manager-0.4.1 deployment-manager


## Restart the Deployment Manager.
/opt/okera/deployment-manager/bin/deployment-manager


## Upon restarting, the new Deployment Manager will take a few seconds to
## health check the existing services.

When all the existing clusters report READY, upgrade those clusters one by one. For each cluster, run:

okera_cli clusters upgrade --version=0.4.1 <CLUSTER_ID>

This will take a few minutes to download the updated binaries and restart the cluster afterwards. The existing services will be operational while the download is occurring. See the cluster admin docs for more details.

Bug Fixes

  • Fixes to the Tableau WDC connector to be tolerant to catalog errors.
  • Fixes for users specifying a custom ‘core-site.xml’ or ‘hive-site.xml’ config. In 0.4.0, we were not picking up these config files correctly.

New Features

In addition, we’ve added a new feature that allows the user to configure CNAMEs for service endpoints. This is useful, for example, if end users should only access the odas_rest_server API endpoint behind a CNAME. In the example below, the service endpoint odas_rest_server:api maps to the CNAME cname1.example.com.

Update the OKERA_SERVICE_CNAME_MAP config in your env.sh file as follows:

$ export OKERA_SERVICE_CNAME_MAP="<service_endpoint_host>:<service_endpoint_port>:<CNAME>"

In the case of the above example this becomes:

$ export OKERA_SERVICE_CNAME_MAP="odas_rest_server:api:cname1.example.com"

Once set, restart your Deployment Manager.

NOTE: If you upgrade your Deployment Manager to 0.4.1, make sure you upgrade ODAS to 0.4.1 as well.

0.4.0 release notes

May-2017

New Features

Cluster Administration

Cluster administration has been significantly enhanced: you can now protect your cluster from accidental termination, scale an existing cluster, and upgrade to newer versions of ODAS components. See Cluster Administration for further details.

SQL Statement Processing

The Record Service daemon and the catalog-admin REST endpoint scan and scanpage APIs now process SQL statements through a POST interface.
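A hedged sketch of such a request. The JSON field name and exact endpoint path are assumptions — consult the REST API docs for the real request shape; only the host:port follows the examples elsewhere in these notes:

```shell
# Illustrative only: POST a SQL statement to the scan endpoint.
curl -X POST -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM okera_sample.sample"}' \
  localhost:11050/scan
```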

Database Command Line Interface (CLI)

End-user database and dataset functionality is made available through a command line (CLI) tool, odb. The tool enables users to acquire tokens, list databases, list datasets in a database, show the schema for a dataset (describe), view a sample of data, create tables and grant permissions through Hive DDL.

See Database CLI for details.

Basic Authentication using LDAP

With this release, Basic authentication using LDAP is introduced, allowing Okera users to authenticate using their Active Directory credentials.

The user now has an option to either use the REST API or the new web-based login UI to get their Okera token using their Active Directory username and password.

See the LDAP Basic Auth Document for details.

Changes

okera_cli utility

The following subcommands were added:

  • okera_cli agents state
    • lists the state of all agents, grouped by cluster
  • okera_cli clusters nodes <cluster id>
    • lists the nodes that constitute the indicated cluster
  • okera_cli clusters set_default_version --version <version> --components <component:version,component:version> <clusterID>
    • Sets the version to be used by clusters that are subsequently created. The version flag sets the version of ODAS to install, and the components flag allows specific components to run a different, likely higher, version of that component. This command wipes out any existing version and component settings. The --version argument must be a valid version; --components is also required but can be an empty string.
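For example (the version, component, and cluster ID below are placeholders), a typical invocation that sets the default version with no component overrides might look like:

```shell
# Set the default ODAS version for subsequently created clusters.
# --components is required but may be an empty string.
okera_cli clusters set_default_version --version 0.4.0 --components "" 17
```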

AWS Region Configuration

Deployment Manager will now detect the AWS region that it is running in if one is not configured via the AWS_DEFAULT_REGION value. The result is cached, requiring a Deployment Manager restart (following a change in your configuration) if you want to manage a cluster in another AWS region.

Kerberos

Deployment Manager allows a Kerberos principal for the REST API to be explicitly specified. Previously this was assumed to be derived from the service principal (i.e. HTTP/). See [Kerberos](kerberos-cluster-setup) docs for more details.

Deployment Manager

The following improvements were made:

  • The S3_STAGING_DIR environment variable is now validated during Deployment Manager startup.
  • Reporting of issues that arise during agent startup has been improved.
  • Configured service port uniqueness is now enforced during Deployment Manager startup.

Using Encrypted S3 buckets

Writes to S3 buckets now set the server-side encryption flag on the S3 write or copy request if the S3_STAGING_ENCRYPTION configuration is set to true.
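To enable this, set the flag in your env.sh before starting the Deployment Manager (a minimal sketch):

```shell
# Enable server-side encryption on S3 staging writes and copies.
export S3_STAGING_ENCRYPTION=true
```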

Incompatible and Breaking Changes

Change to OKERA_KERBEROS_KEYTAB_FILE config

Previously, this config was the basename of the keytab file, and the user was required to upload the file to Okera’s S3 staging directory. In this release, the config has been updated to be the full path to the keytab. The path can be on the Deployment Manager machine (e.g. /etc/keytabs/okera.keytab) or on S3. No steps are now required to upload the keytab to the staging directory.

The prior config will no longer validate and the Deployment Manager will not start up. Users coming from a previous release will need to update their configs. For example, by changing:

export OKERA_KERBEROS_KEYTAB_FILE=okera.keytab

## to
export OKERA_KERBEROS_KEYTAB_FILE=/path/on/deployment-manager/okera.keytab

Rename database to db

Some REST APIs used the field name ‘database’ and others used the field name ‘db’. All APIs were changed to consistently use the term ‘db’. This impacts the following APIs:

  • api/datasets [POST]
  • api/datasets/{name} [POST]

For details, see: Catalog REST API.
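An illustrative before/after for the rename. The host and payload fields are placeholders (the full request format is in the Catalog REST API docs); only the renamed field is the point:

```shell
# Before: the field was named 'database' (illustrative payload):
curl -X POST -H "Content-Type: application/json" \
  -d '{"database": "sales", "name": "txns"}' https://<rest-server>/api/datasets

# After: the field is consistently named 'db':
curl -X POST -H "Content-Type: application/json" \
  -d '{"db": "sales", "name": "txns"}' https://<rest-server>/api/datasets
```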

Known issues

Errors during Deployment Manager configuration file writes prevent restart

If the configuration file for Deployment Manager is not correctly written to RDS (due to an issue occurring during the write), then the Deployment Manager will not be able to start again. The workaround is to manually update (correct) the underlying configuration file.

Unable to delete a launching cluster

In some cases, it is not possible to immediately terminate a cluster that is launching; the delete only goes into effect after launching completes. To delete the cluster immediately, manually terminate or shut down the launching machines.
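For example, on AWS the launching instances can be terminated directly (the instance IDs below are placeholders):

```shell
# Terminate the stuck launching machines so the cluster delete can proceed.
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0 i-0fedcba9876543210
```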

0.3.0 release notes

February-2017

This is the feature-complete release candidate of ODAS.

New Features

Tableau Okera Catalog Integration

You can access data stored in Okera using Tableau. See Tableau WDC for details.

Catalog UI

Beta release of the catalog web UI. You can see the datasets that are in the system and how to read them from a variety of integration points. Just navigate to the okera_catalog_ui:webui endpoint and log in with your user token.

Catalog REST API Integration

Changes were made to the Catalog REST API. See Catalog API and the tutorial for further details.

Installation Process

The installation process has been enhanced by providing customizable templates for launching EC2 instances and initializing cluster nodes. See: Installation Guide, “Starting up a ODAS cluster” for details.

Authentication

With this release, all Okera services can run with authentication enabled end-to-end. See: Authentication for further details. This includes non-kerberized clients (for example, the catalog web UI) using tokens.

For information on setting up a Kerberized cluster, see: Kerberized Cluster Setup

Changes

Admin Dashboard

The Kubernetes admin dashboard has been upgraded to version 1.5.1 from version 1.4.2. See Kubernetes Quickstart for details.

Kubernetes

Kubernetes has been upgraded to version 1.5.3 from version 1.4.2.

Incompatible and Breaking Changes

Renamed okera_catalog_ui to odas_rest_server. This is a port configuration change and will require users to update their env file. Note that this port will also need to be exposed.

Installation instructions moved components from /var/run/okera to /etc/okera. Prior versions of the install script recommended placing various files (on the Deployment Manager machine) in /var/run/okera. If you have built scripts and automation following those steps, they should be adapted to use /etc/okera instead.

Known issues

Catalog UI sometimes does not refresh databases correctly. Refresh from the browser as a workaround.

0.2.0 release notes

02-03-2017

The 0.2.0 Okera ODAS release makes significant improvements to usability, security, and reliability.

New Features

Installation

The install process has been further simplified, with fewer steps and faster deployment. Configuration steps now include examples; see the install docs. Logging improvements assist in faster problem determination.

Server

The Deployment Manager (DM) server has evolved to an agent architecture. Each of the cluster nodes now runs a DM agent to deploy and launch Kubernetes services.

See install for details.

Security

With this release, the REST API to the Deployment Manager can be secured using Kerberos. Along with Kerberos authentication, authorization may be configured for admin access to the Deployment Manager. See secure-deployment-manager-rest-api for additional details.

Admin Dashboard

You may now use the Kubernetes admin dashboard for managing the Okera cluster. See kubernetes-dashboard-quickstart for details.

CLI interface

okera_cli has new commands to interact with the DM REST API and the agents. See OkeraCLI.md for details; the install docs have a few examples. Run okera_cli help to see the much improved command-line options.

Incompatible Changes

No known incompatibilities exist.

Known issues

Cluster create fails occasionally

This manifests with an error message like: Unreachable external machine: 10.1.10.101:8085. Expecting Okera agent to be running at: 10.1.10.101:8085.

The workaround is to rerun the command a few seconds later.

Duplicate IP address in cluster machine list will cause launch failures

If an IP address appears more than once in the machines list, the install process will fail.

The workaround is to ensure that there are no duplicates.
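A quick way to check a machines list for duplicates before launching, assuming one IP address per line in a file (the filename machines.txt is illustrative):

```shell
# Print any IP address that appears more than once; empty output means no duplicates.
sort machines.txt | uniq -d
```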

release notes

12-6-2016

Multiple updates were made on how to interact with the Okera catalog. Some of these changes are not backwards compatible.

New Features

CLI interface for catalog

The REST API was intended for programmatic access. While it is possible to use it interactively (using curl), it is not very user friendly. We have added a CLI that sits on top of the REST API. It provides the same capabilities as using the REST API directly.

Permissions API

We added a permissions API to both the REST API and CLI which can be useful to examine the aggregate result of the policies that have been set. It is useful to answer questions such as:

  • What datasets (and which pieces of them) does this particular user/group have access to?
  • What are all the users/groups that have access to this dataset?

Incompatible Changes

Policy API has changed.

The endpoints are different (/api/grant-policy, /api/revoke-policy). The arguments are largely the same. It is no longer necessary to delete policies, and it is not currently possible to view them.

Known issues

Creating a dataset with ‘storage_url’ currently requires the dataset to be Parquet.

The workaround is to specify the request using HiveQL.

Granting a policy with filters is disabled

This causes compatibility issues with clients such as Impala. The workaround is to register the dataset with the filter and grant the policy on the new dataset.