Okera Platform Architecture Overview
The Catalog Overview and ODAS Overview documents introduce the major components of the Okera Active Data Access Platform. This document takes the next step, explaining the Platform's architecture and showing the components in action for various client requests.
On a more technical level, the Okera Platform is divided into services that are accessible to clients, and a few other services that are only accessible internally by administrators or other services. This split is intentional: it reduces complexity and concentrates client access in just a few locations. The following diagram shows all of the platform services in context.
First we will introduce how clients access the Okera Platform, and then break up the services shown in the diagram based on their scope.
The diagram starts in the top right corner with clients of various kinds communicating with the platform. There are three major access points:
Okera Portal Web UI
Clients can use Okera Portal to interact with the underlying platform. The web UI provides access to the catalog services, such as the Schema Registry and the Policy Engine, and enables self-service dataset discovery. It also offers a workspace to issue SQL statements against the platform. Behind the scenes, this web application uses the REST API gateway service to interact with your platform's services, just the way your bespoke or customized applications can.
Many clients support the ubiquitous REST protocol, which avoids most of the issues that arise when dealing with custom protocols. The REST gateway service gives access to all necessary client API calls. This includes both the Schema Registry, for creating, altering, or dropping various objects, and the Policy Engine, for granting or revoking access to all registered objects. See the Catalog API documentation for the list of supported REST calls.
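As a rough illustration of a client talking to the REST gateway, the following Python sketch prepares an authenticated request without sending it. The host name, the endpoint path, and the use of a bearer token are all assumptions made for this example; the actual calls and authentication scheme are listed in the Catalog API and Authentication documentation.

```python
import urllib.request


def build_catalog_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request for the REST gateway.

    base_url, path, and token are supplied by the caller. Nothing here is an
    actual Okera endpoint; bearer-token authentication is an assumption.
    """
    # Join base URL and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

To issue the call, the returned request can be passed to `urllib.request.urlopen`, with the path replaced by one of the supported REST calls.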
Finally, Okera provides a set of libraries for interfacing directly with the Catalog and ODAS services, including the Planners and Workers. This is the recommended way to integrate with the Catalog and ODAS services.
Client Accessible Components
The Platform architecture diagram has a dashed line that separates the external services, which provide the client connectivity, from the internal support and management services. The following discusses the external and internal components separately, and how they work together to provide the services to all clients alike.
The external components in the architecture provide the major functionality of the platform, namely data access and metadata management, as needed by clients ranging from interactive users to automated applications.
The description above refers to services such as the REST API, Planner, and Catalog services. For failover and continuous availability, it is common in practice to run multiple instances of each service. This is explained in greater detail in the High Availability documentation. In short, you can use the provided administrative functionality to instruct the platform to add redundant instances of the services (shown as multiple, stacked boxes in the diagram) so that the system continues to operate if a node fails.
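One way to picture what redundant instances mean for a client is a simple try-the-next-instance loop. The Python sketch below is only a conceptual model: the endpoint names and the `request` callable are hypothetical, and an actual deployment may route traffic differently (for example through a load balancer), as described in the High Availability documentation.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def call_with_failover(endpoints: Iterable[str], request: Callable[[str], T]) -> T:
    """Try each service instance in order and return the first successful result.

    `endpoints` are hypothetical instance addresses; `request` is any callable
    that performs the call against one endpoint and raises ConnectionError
    when that instance is unreachable.
    """
    last_error: Optional[ConnectionError] = None
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except ConnectionError as exc:
            # This instance is down; remember the error and try the next one.
            last_error = exc
    raise ConnectionError("no reachable service instance") from last_error
```

With two Planner instances configured, a failure of the first simply causes the call to be retried against the second.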
The Planner service is also responsible for exposing the Catalog services: the Schema Registry and Policy Engine. In effect, the clients use the REST API or client libraries as proxies (as shown in the diagram) to these two Catalog services, provided by the Planner instances. The clients also communicate with the Planner as part of a query execution (see Communication below).
The ODAS client libraries are used by many of the provided higher-level client integrations, including Hive, Presto, Spark, and others. This integration support is needed to hide Okera as much as possible behind common tools with which users are already familiar. For example, enabling the Hive integration allows existing Hive setups to switch from using the underlying data sources directly to using ODAS instead. There is no need to alter any metadata at all.
For Python, there are two different ways of communicating with an ODAS cluster: the (legacy) REST API or the (newer) native PyOkera library.
Wherever possible, use PyOkera, as it performs markedly better than REST.
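A minimal PyOkera read might look like the following sketch. It assumes the `okera` Python package is installed, that the cluster accepts token authentication, and that the caller knows the Planner host and port; the function name `read_dataset` and its parameters are invented for this example. Check the PyOkera documentation for the exact signatures in your version.

```python
def read_dataset(host: str, port: int, token: str, dataset: str):
    """Scan a registered dataset through the Planner and return its records.

    host, port, token, and dataset are placeholders supplied by the caller.
    """
    # Imported lazily so this module loads even where PyOkera is not installed.
    from okera import context

    ctx = context()
    # Assumption: the cluster is configured for token-based authentication.
    ctx.enable_token_auth(token_str=token)
    with ctx.connect(host=host, port=port) as conn:
        # Returns the scanned rows as JSON-style Python objects.
        return conn.scan_as_json(dataset)
```

A call such as `read_dataset("planner.example.com", 12050, my_token, "okera_sample.sample")` would then return the rows the authenticated user is permitted to see; the host, port, and dataset name here are illustrative only.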
Once the cluster is operational, a client can communicate with the cluster services. Refer to the Authentication documentation for more details on how to secure the interactions between clients and an ODAS cluster.
The following sequence diagram shows how Presto, as an example, interacts with ODAS after the Okera integration is enabled (see Hadoop ecosystem tools integration).
Depicted is a read request issued by Zeppelin (an interactive notebook service), showing the various calls made between the client and the services. Once a query is submitted in Zeppelin, Zeppelin uses the Presto JDBC driver to plan and execute the request. The ODAS Planner is used to retrieve the dataset schema and to instruct Presto how to distribute the query across its workers. Each worker is handed a list of tasks to perform, each of which queries a subset of the resulting records.
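The plan-then-distribute flow described above can be modeled with a small, self-contained Python sketch. It is not Okera's or Presto's actual planning logic: the `Task` type, the fixed records-per-task split, and the round-robin assignment are all assumptions chosen to keep the illustration simple.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Task:
    """A hypothetical unit of work covering records [first_record, last_record)."""
    task_id: int
    first_record: int
    last_record: int  # exclusive


def plan_tasks(total_records: int, records_per_task: int) -> List[Task]:
    """Split a result set into tasks that each cover a subset of the records."""
    if records_per_task <= 0:
        raise ValueError("records_per_task must be positive")
    tasks = []
    for task_id, start in enumerate(range(0, total_records, records_per_task)):
        end = min(start + records_per_task, total_records)
        tasks.append(Task(task_id, start, end))
    return tasks


def assign_tasks(tasks: List[Task], workers: List[str]) -> Dict[str, List[Task]]:
    """Hand each worker a list of tasks, distributing them round-robin."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[Task]] = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        assignment[workers[index % len(workers)]].append(task)
    return assignment
```

In this model, the planning step corresponds to `plan_tasks` and the distribution step to `assign_tasks`; each worker then processes only the record ranges in its own task list.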