Prerequisites

For the sake of simplicity, Okera recommends setting it up on a managed container orchestration service, such as AWS EKS, Azure AKS, or Google GKE. If this is not an option, you can use the provided Gravity installation procedure to set up a managed container service from scratch. This applies to setups on virtual machines in cloud environments, such as AWS EC2, Azure VMs, or Google GCE, as well as to setups on bare-metal machines in your data centers.

The main difference between a Gravity-based installation and one using a managed Kubernetes service is that, in the case of Gravity, there are two types of nodes in the Okera setup: master and worker nodes. On the master nodes, two sets of services are installed: the Kubernetes Control Plane and the Okera services, such as the Okera Policy Engines (planners) or the Okera Enforcement Fleet workers. Conversely, the worker nodes only host the Okera services, reducing the load on those machines to some extent.

For non-Gravity installations, where the Kubernetes Control Plane is set up and managed by the service provider, there are only worker nodes as far as Okera is concerned, because all of its services are located on those worker nodes. In other words, the master nodes, which are inaccessible from a user's perspective, only serve the Kubernetes services.

The following provides general guidelines for the Gravity-based installation of Okera on bare-metal machines. In addition to these guidelines, some form of nameserver (DNS) lookup is also required.

Networking

Okera will run within a virtual private network (VPC), virtual network (VNet), or virtual local area network (VLAN) and subnet that you configure in your local on-premises or cloud off-premises environment. It is not required to create a dedicated network and/or subnet for Okera, but it can be helpful to isolate access. If you do create a dedicated network and subnet, you should follow your normal procedures and guidelines for doing so.

The following should be taken care of when setting up the network connectivity:

  • All Okera nodes can talk to each other on all ports
    • Disable or configure the intra-cluster firewall services accordingly
  • All Okera nodes are accessible by client applications on specific ports
    • Configure the external firewall services accordingly
    • Inbound traffic should be limited to the ports specified below
    • All outbound traffic for both TCP and UDP should be unrestricted

For incoming connections, the rules for the external traffic should have the following definitions:

Protocol  Port   Source          Description
TCP       8083   <service CIDR>  Okera Web UI
TCP       5010   <service CIDR>  OkeraFS (AWS CLI, Spark, and Databricks)
TCP       12050  <service CIDR>  Okera Policy Engine (planner) API
TCP       12052  <service CIDR>  Okera HiveServer2 proxy for Hive (optional)
TCP       12053  <service CIDR>  Okera HiveServer2 proxy for Impala (optional)
TCP       13050  <service CIDR>  Okera Enforcement Fleet worker API
TCP       14050  <service CIDR>  Okera Presto/JDBC API
TCP       22     <admin CIDR>    SSH
TCP       12051  <admin CIDR>    Okera Policy Engine (planner) diagnostics (optional)
TCP       13051  <admin CIDR>    Okera Enforcement Fleet worker diagnostics (optional)
TCP       32009  <admin CIDR>    Okera diagnostics (optional)

Notes:

  • The above ports are all configurable in Okera; for example, you could expose the Web UI on port 6000 instead. This requires a corresponding change in the firewall definition, that is, you need to update the rules to allow access on port 6000. You can read more about this on the Ports page.

  • <service CIDR> refers to the CIDR range that your Okera cluster should be accessible from. This should encompass all users of the web UI and any client (such as Tableau, Databricks, etc.) that will need to access Okera.

  • <admin CIDR> refers to the CIDR range that administrative functions should be accessible from, typically via a browser. In many cases, the service and admin CIDR ranges can be the same.

  • The ports for the HiveServer2 proxies, by default 12052 and 12053, only need to be opened when Okera is configured as a proxy for Hive and/or Impala.
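
Once the firewall rules are in place, it can be useful to verify from a client machine that the required ports are actually reachable. The following is a minimal Python sketch of such a check; the host name is a placeholder and the port list is a subset of the defaults above, so adjust both to your setup.

```python
import socket

# Hypothetical host name of an Okera node; replace with your own.
OKERA_HOST = "okera.example.com"

# Default Okera ports from the table above (adjust if you changed them).
PORTS = {
    8083: "Okera Web UI",
    5010: "OkeraFS",
    12050: "Okera Policy Engine (planner) API",
    13050: "Okera Enforcement Fleet worker API",
    14050: "Okera Presto/JDBC API",
}

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port, description in PORTS.items():
    status = "open" if check_port(OKERA_HOST, port) else "unreachable"
    print(f"{port:>5} ({description}): {status}")
```

Run the check from a machine inside the <service CIDR> range; an "unreachable" result usually points to a firewall rule that still needs to be opened.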

Shared Storage

An Okera cluster uses a shared storage system (such as HDFS, S3, or ADLS) to store audit and system log files and (optionally) to stage intermediate configuration files. This path must be readable and writable by all nodes running Okera services.

The following applies to the shared storage configuration:

  • Okera needs a location where it can persist log files, including the audit log. This requires at least write permissions to the configured location.

  • For the Insights page to work properly, the Okera nodes also require read access to the configured location. The Insights page uses the Okera audit logs as its data source.
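
As a quick sanity check of these permissions, you can write and read back a small test object from one of the Okera nodes. The sketch below assumes S3 as the shared storage backend and uses boto3; the bucket and prefix are placeholders for the location you configure for Okera.

```python
import uuid

import boto3  # assumes S3 is the chosen shared storage backend

# Placeholder bucket and prefix; replace with the location configured for Okera.
BUCKET = "my-okera-logs"
PREFIX = "okera/system"

s3 = boto3.client("s3")
key = f"{PREFIX}/write-test-{uuid.uuid4()}.txt"

# Verify write access (needed for persisting audit and system logs).
s3.put_object(Bucket=BUCKET, Key=key, Body=b"okera shared storage write test")

# Verify read access (needed for the Insights page to read the audit logs).
body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
assert body == b"okera shared storage write test"

# Clean up the test object.
s3.delete_object(Bucket=BUCKET, Key=key)
print("Shared storage location is readable and writable.")
```

For HDFS or ADLS the same check applies, only with the respective client library.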

Cluster Nodes

For reliable software installation, Okera requires a clean install on bare-metal machines or equally performant virtual machine instances. The following subsections address the various aspects of providing the right cluster nodes.

Hardware

Okera on-premises deployments are recommended to run on machines with the following minimum hardware specifications (or larger if needed):

Environment  CPU Cores  Memory (RAM)  Networking
Development  8          32+ GB        1 GigE
UAT/Test     16         16 GB         1-10 GigE
Production   32+        32+ GB        10+ GigE

In addition, a minimum 120 GB data drive is needed.

Operating System (OS)

Okera has been tested on the following operating systems:

  • CentOS 7.2 or newer
  • RHEL 7.4 or newer
  • Ubuntu

The following should be taken care of when configuring the operating system:

  • Root access is required.

    • Using sudo is sufficient
    • This is only needed during installation and for low-level cluster management (such as upgrades)
  • When using XFS, the file system must be formatted with Docker support enabled (a verification sketch follows this list).

    Docker requires that the d_type option is enabled, which requires the file system to be formatted with the ftype=1 option. See the Docker Docs and this blog post for details.

  • The bridge netfilter kernel module (br_netfilter) is required for Kubernetes to work.

    See the network troubleshooting pages.

  • The Docker packages should not be installed before attempting a Gravity-based Okera install.
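
The XFS and bridge netfilter requirements above can be verified before the installation. The following Python sketch assumes the root file system is XFS and that the xfs_info utility is available on the node; adjust the mount point as needed.

```python
import pathlib
import subprocess

def xfs_has_ftype(mount_point: str = "/") -> bool:
    """Check that an XFS file system was formatted with ftype=1 (d_type support)."""
    # xfs_info prints the format options of the file system backing the mount point.
    output = subprocess.run(
        ["xfs_info", mount_point], capture_output=True, text=True, check=True
    ).stdout
    return "ftype=1" in output

def br_netfilter_loaded() -> bool:
    """Check that the bridge netfilter kernel module is loaded."""
    # This sysctl path only exists once br_netfilter has been loaded.
    return pathlib.Path("/proc/sys/net/bridge/bridge-nf-call-iptables").exists()

if __name__ == "__main__":
    print("XFS ftype=1:", xfs_has_ftype("/"))
    print("br_netfilter loaded:", br_netfilter_loaded())
```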

Per-Node Storage

While an Okera cluster does not use local disks to store any user data, it requires local storage for operational purposes. This includes the Okera container images; the Gravity, Docker, and Kubernetes related packages; as well as log file space.

The following should be taken care of when configuring the nodes with local storage:

  • Okera requires at least 150 GB of storage on each node

  • The 150 GB of storage should be assigned to the root of the file system (that is, /)

  • All working directories should be able to use the total space available

    • In particular, /opt and /var should not be mounted as separate volumes but should share the main storage
    • Ideally, the nodes have a single volume and mount point at the root
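
A quick way to confirm that a node satisfies the local storage requirement is to check the capacity of the root volume, for example with the following Python sketch.

```python
import shutil

# Okera requires at least 150 GB of local storage, available under /.
REQUIRED_GB = 150

usage = shutil.disk_usage("/")
total_gb = usage.total / 1024**3
free_gb = usage.free / 1024**3

print(f"/: total={total_gb:.0f} GB, free={free_gb:.0f} GB")
if total_gb < REQUIRED_GB:
    print(f"WARNING: the root volume is smaller than {REQUIRED_GB} GB")
```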

Database Management System (RDBMS)

Okera is backed by a relational database, which persists the metadata of the cluster, including schemas, policies, and attributes. You can use a shared database system or a dedicated one. All database names an Okera cluster uses are configurable, which allows sharing an RDBMS while keeping the metadata separate.

The following database systems are supported:

Database System  Version       Notes
MySQL/MariaDB    5.6 and 5.7   Supported since Okera 1.0
PostgreSQL       9.5 and 10.7  Supported since Okera 2.0.1

The following notes apply:

  • All Okera nodes will require administrative permissions on the configured databases

    • It is possible to pre-create the databases to avoid giving Okera CREATE permissions at the catalog level (a sketch follows this list)
  • For test or proof-of-concept (POC) instances of Okera, there is an option to leave the database system unconfigured

    • In this case, an embedded PostgreSQL service is used (WARNING: this may result in loss of metadata!)
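
The following is a minimal sketch of pre-creating the databases on a MySQL/MariaDB backend, assuming the PyMySQL client library. The host, the administrative account, the Okera service account, and the database names are placeholders; use the names configured for your Okera cluster, and note that the sketch assumes the Okera service account already exists.

```python
import pymysql  # assumed client library; any MySQL/MariaDB client works

# Placeholder database names; use the names configured for your Okera cluster.
OKERA_DATABASES = ["okera_catalog", "okera_audit"]

# Connect as an administrative user that is allowed to create databases.
conn = pymysql.connect(host="mysql.example.com", user="admin", password="change-me")
try:
    with conn.cursor() as cur:
        for db in OKERA_DATABASES:
            # Pre-create the database so Okera does not need CREATE
            # permissions at the catalog level.
            cur.execute(f"CREATE DATABASE IF NOT EXISTS `{db}`")
            # Grant the (already existing) Okera service account full
            # control over this database only.
            cur.execute(f"GRANT ALL PRIVILEGES ON `{db}`.* TO 'okera'@'%'")
finally:
    conn.close()
```

For a PostgreSQL backend, the same approach applies with the corresponding CREATE DATABASE and GRANT statements issued through a PostgreSQL client.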