Prerequisites

Okera is provided as a set of Docker images that run as containers in managed Kubernetes environments, such as AWS Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or Google Kubernetes Engine (GKE). You can use Helm, a Kubernetes package manager, to deploy and configure your Okera cluster in any of these Kubernetes environments. See Deploy Okera Using Helm Charts.
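
For illustration only, the following minimal sketch shows one way to drive such a Helm deployment from Python. The release name, chart path, namespace, and values file are hypothetical placeholders; the authoritative procedure is documented in Deploy Okera Using Helm Charts.

```python
# Minimal sketch: invoke Helm from Python, assuming the helm CLI and kubectl
# are already configured against the target EKS/AKS/GKE cluster.
import subprocess

subprocess.run(
    [
        "helm", "upgrade", "--install",
        "okera",                          # hypothetical release name
        "./okera-chart",                  # hypothetical path to the Okera Helm chart
        "--namespace", "okera",           # hypothetical namespace
        "--create-namespace",
        "--values", "okera-values.yaml",  # hypothetical cluster configuration file
    ],
    check=True,
)
```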

Networking

Okera runs within a virtual private cloud (VPC), virtual network (VNet), or virtual local area network (VLAN) and subnet that you configure in your on-premises or cloud environment. You are not required to create a dedicated network or subnet for Okera, but doing so can help isolate access. If you do create a dedicated network and subnet, follow your normal procedures and guidelines for doing so.

Take care of the following when setting up network connectivity:

  • All Okera nodes must be able to communicate with each other on all ports
    • Disable or configure the intra-cluster firewall services accordingly
  • All Okera nodes must be accessible to client applications on specific ports
    • Configure the external firewall services accordingly
    • Inbound traffic should be limited to the ports specified below
    • All outbound traffic, for both TCP and UDP, should be unrestricted

For incoming connections, the rules for external traffic should be defined as follows:

| Protocol | Port  | Source         | Description |
|----------|-------|----------------|-------------|
| TCP      | 8083  | <service CIDR> | Okera Web UI |
| TCP      | 5010  | <service CIDR> | OkeraEnsemble: AWS CLI, Spark, and Databricks |
| TCP      | 12050 | <service CIDR> | Okera Policy Engine (planner) API |
| TCP      | 12052 | <service CIDR> | Okera Hive HiveServer2 proxy (optional) |
| TCP      | 12053 | <service CIDR> | Okera Impala HiveServer2 proxy (optional) |
| TCP      | 13050 | <service CIDR> | Okera Enforcement Fleet worker API |
| TCP      | 14050 | <service CIDR> | Okera Presto/JDBC API |
| TCP      | 22    | <admin CIDR>   | SSH |
| TCP      | 12051 | <admin CIDR>   | Okera Policy Engine (planner) diagnostics (optional) |
| TCP      | 13051 | <admin CIDR>   | Okera Enforcement Fleet worker diagnostics (optional) |
| TCP      | 32009 | <admin CIDR>   | Okera diagnostics (optional) |

Notes:

  • The above ports are not currently configurable in Okera.

  • <service CIDR> refers to the CIDR range in which your Okera cluster should be accessible. This should encompass all users of the web UI and any client (such as Tableau, Databricks, etc.) that will need to access Okera.

  • <admin CIDR> refers to the CIDR range from which administrative functions should be accessible, typically via a browser. In many cases, the service and admin CIDR ranges can be the same.

  • The ports for the HiveServer2 proxies, by default 12052 and 12053, only need to be opened when Okera is configured as a proxy for Hive or Impala.
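
As a quick sanity check of the firewall rules above, a minimal sketch like the following can probe the client-facing ports from a machine inside the service CIDR. It is not an Okera tool, and the hostname is a hypothetical placeholder for your cluster address.

```python
# Minimal sketch: verify that client-facing Okera ports are reachable.
import socket

OKERA_HOST = "okera.example.internal"  # hypothetical hostname; replace with your cluster address

CLIENT_PORTS = {
    8083: "Web UI",
    5010: "OkeraEnsemble (AWS CLI, Spark, Databricks)",
    12050: "Policy Engine (planner) API",
    13050: "Enforcement Fleet worker API",
    14050: "Presto/JDBC API",
}

for port, name in sorted(CLIENT_PORTS.items()):
    try:
        # create_connection raises OSError (timeout, connection refused) when unreachable
        with socket.create_connection((OKERA_HOST, port), timeout=5):
            print(f"{port:>5}  {name}: reachable")
    except OSError as exc:
        print(f"{port:>5}  {name}: NOT reachable ({exc})")
```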

Shared Storage

An Okera cluster uses a shared storage system (such as HDFS, Amazon S3, or ADLS) to store audit and system log files and, optionally, to stage intermediate configuration files. This path must be readable and writable by all nodes running Okera services.

The following applies to the shared storage configuration:

  • Okera needs a location where it can persist log files, including the audit log. This requires at least write permissions to the configured location.

  • For the Insights page to work properly, the Okera nodes also require read access to the configured location. The Insights page uses the Okera audit logs as its data source.
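
If your shared storage is Amazon S3, a minimal sketch like the following can confirm that a node has both the write access needed for logging and the read access needed by the Insights page. It assumes boto3 is installed with valid credentials; the bucket and prefix are hypothetical placeholders for your configured log location.

```python
# Minimal sketch: check read and write access to the shared log location on S3.
import boto3

BUCKET = "my-okera-logs"   # hypothetical bucket
PREFIX = "okera/audit"     # hypothetical prefix for audit/system logs

s3 = boto3.client("s3")

# Write check: Okera needs at least write access for audit and system logs.
s3.put_object(Bucket=BUCKET, Key=f"{PREFIX}/write-test.txt", Body=b"ok")

# Read check: the Insights page also reads the audit logs back from this location.
body = s3.get_object(Bucket=BUCKET, Key=f"{PREFIX}/write-test.txt")["Body"].read()
print("shared storage is readable and writable:", body == b"ok")
```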

Cluster Nodes

Okera requires a clean install on bare-metal machines, or virtual machine instances of equivalent performance, for reliable software installation. The following subsections address the various aspects of provisioning suitable cluster nodes.

Hardware

For on-premises deployments, Okera recommends machines with the following minimum hardware specifications (or larger, as needed):

| Environment | CPU Cores | Memory (RAM) | Networking |
|-------------|-----------|--------------|------------|
| Development | 8         | 32+ GB       | 1 GigE     |
| UAT/Test    | 16        | 16 GB        | 1-10 GigE  |
| Production  | 32+       | 32+ GB       | 10+ GigE   |

In addition, a minimum 120 GB data drive is needed.

Operating System (OS)

Okera has been tested on the following operating systems:

  • CentOS 7.2 or newer
  • RHEL 7.4 or newer
  • Ubuntu

Take care of the following when configuring the operating system:

  • Root access is required.

    • Using sudo is sufficient
    • This is only needed during installation and for low-level cluster management (such as upgrades)
  • When using XFS, the file system must be formatted with Docker support enabled.

    Docker requires that the d_type option is enabled, which requires the file system to be formatted with the ftype=1 option. See the Docker Docs and this blog post for details; a verification sketch follows this list.

  • The bridge netfilter kernel module is required for Kubernetes to work.

    See the network troubleshooting pages.
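
The following minimal sketch checks the two OS prerequisites above on a node: the XFS ftype=1 (d_type) option and the bridge netfilter module. It assumes a Linux host with the xfsprogs utilities installed and that the relevant file system is mounted at /; adjust the paths to your layout.

```python
# Minimal sketch: verify XFS d_type support and the bridge netfilter module.
import subprocess
from pathlib import Path

# XFS must be formatted with ftype=1 for Docker's d_type requirement.
xfs = subprocess.run(["xfs_info", "/"], capture_output=True, text=True)
if xfs.returncode == 0:
    print("ftype=1 present:", "ftype=1" in xfs.stdout)
else:
    print("/ is not an XFS file system (or xfs_info is unavailable)")

# The bridge netfilter module must be loaded and enabled for Kubernetes.
bridge_nf = Path("/proc/sys/net/bridge/bridge-nf-call-iptables")
print("bridge netfilter enabled:",
      bridge_nf.exists() and bridge_nf.read_text().strip() == "1")
```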

Per-Node Storage

While an Okera cluster does not use local disks to store any user data, it requires local storage for operational purposes. This includes the Okera container images, the Docker and Kubernetes packages, and log file space.

Take care of the following when configuring the nodes with local storage:

  • Okera requires at least 150 GB of storage on each node

  • The 150 GB of storage should be assigned to the root of the file system (that is, /)

  • All working directories should be able to use the total space available

    • In particular, /opt and /var should not be mounted as separate volumes but should share the main storage
    • Ideally, the nodes have a single volume and mount point at the root
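
A minimal sketch like the following can confirm that a node meets these local storage expectations; it assumes a standard Linux layout with /opt and /var present.

```python
# Minimal sketch: check the root volume size and that /opt and /var share it.
import os
import shutil

total_gb = shutil.disk_usage("/").total / 1024**3
print(f"/ total size: {total_gb:.0f} GB (need >= 150 GB)")

root_dev = os.stat("/").st_dev
for path in ("/opt", "/var"):
    # A different st_dev means the directory is on a separate volume/mount.
    same = os.stat(path).st_dev == root_dev
    print(f"{path} shares the root volume: {same}")
```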

Database Management System (RDBMS)

Okera is backed by a relational database, which persists the metadata of the cluster, including schemas, policies, and attributes. You can use a shared database system or a dedicated one. All database names used by an Okera cluster are configurable, which allows you to share an RDBMS while keeping the metadata separate.

The following database systems are supported:

| Database System | Version      | Notes                        |
|-----------------|--------------|------------------------------|
| MySQL/MariaDB   | 5.6 and 5.7  | Supported since Okera 1.0    |
| PostgreSQL      | 9.5 and 10.7 | Supported since Okera 2.0.1  |

The following notes apply:

  • All Okera nodes will require administrative permissions on the configured databases

    • It is possible to pre-create the databases to avoid giving Okera CREATE permissions at the catalog level (see the sketch after this list)
  • For test or proof-of-concept (POC) instances of Okera, there is an option to leave the database system unconfigured

    • In this case, an embedded PostgreSQL service is used (WARNING: this may result in loss of metadata!)
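
If you choose to pre-create the databases on MySQL/MariaDB, a minimal sketch like the following (using the PyMySQL driver) illustrates the idea. The host, credentials, and database names are hypothetical; Okera's database names are configurable, so use the names your cluster is configured with, and note that the okera service account is assumed to already exist.

```python
# Minimal sketch: pre-create Okera databases so Okera does not need
# CREATE permission at the catalog level.
import pymysql

DATABASES = ["okera_catalog", "okera_audit"]  # hypothetical database names

conn = pymysql.connect(
    host="rdbms.example.internal",  # hypothetical RDBMS host
    user="admin",                   # administrative account used for setup
    password="...",                 # placeholder
)
try:
    with conn.cursor() as cur:
        for db in DATABASES:
            cur.execute(f"CREATE DATABASE IF NOT EXISTS `{db}`")
            # Grant the (pre-existing, hypothetical) Okera service account
            # full control of just these databases.
            cur.execute(f"GRANT ALL PRIVILEGES ON `{db}`.* TO 'okera'@'%'")
    conn.commit()
finally:
    conn.close()
```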