Prerequisites

For the sake of simplicity, Okera recommends setting it up on a managed container orchestration service, such as AWS EKS, Azure AKS, or Google GKE. If this is not an option, you can use the provided Gravity installation procedure to set up a managed container service from scratch. This applies to setups on virtual machines in cloud environments, such as AWS EC2, Azure VMs, or Google GCE, as well as to setups on bare-metal machines in your data centers.

The main difference between a Gravity-based installation and one using a managed Kubernetes service is that, in the case of Gravity, there are two types of nodes in the Okera setup: master and worker nodes. On the master nodes, two sets of services are installed: the Kubernetes Control Plane and the Okera services, such as the Okera Policy Engines (planners) or the Okera Enforcement Fleet workers. Conversely, the worker nodes only host the Okera services, reducing the load on those machines to some extent.

For non-Gravity installations, where the Kubernetes Control Plane is set up and managed by the service provider, there are only worker nodes as far as Okera is concerned, because all of its services are located on those worker nodes. In other words, the master nodes, which are inaccessible from a user's perspective, only serve the Kubernetes services.

The following provides general guidelines for the Gravity-based installation of Okera on bare-metal machines. In addition to these guidelines, some form of nameserver (DNS) lookup is also required.

Networking

Okera will run within a virtual private network (VPC), virtual network (VNet), or virtual local area network (VLAN) and subnet that you configure in your local on-premises or cloud off-premises environment. It is not required to create a dedicated network and/or subnet for Okera, but it can be helpful to isolate access. If you do create a dedicated network and subnet, you should follow your normal procedures and guidelines for doing so.

The following should be taken care of when setting up the network connectivity:

  • All Okera nodes can talk to each other on all ports
    • Disable or configure the intra-cluster firewall services accordingly
  • All Okera nodes are accessible by client applications on specific ports
    • Configure the external firewall services accordingly
    • Inbound traffic should be limited to the ports specified below
    • All outbound traffic for both TCP and UDP should be unrestricted

For incoming connections, the rules for the external traffic should have the following definitions:

Protocol  Port   Source          Description
TCP       8083   <service CIDR>  Okera Web UI
TCP       5010   <service CIDR>  OkeraFS (AWS CLI, Spark, and Databricks)
TCP       12050  <service CIDR>  Okera Policy Engine (planner) API
TCP       12052  <service CIDR>  Okera HiveServer2 proxy for Hive (optional)
TCP       12053  <service CIDR>  Okera HiveServer2 proxy for Impala (optional)
TCP       13050  <service CIDR>  Okera Enforcement Fleet worker API
TCP       14050  <service CIDR>  Okera Presto/JDBC API
TCP       22     <admin CIDR>    SSH
TCP       12051  <admin CIDR>    Okera Policy Engine (planner) diagnostics (optional)
TCP       13051  <admin CIDR>    Okera Enforcement Fleet worker diagnostics (optional)
TCP       32009  <admin CIDR>    Okera diagnostics (optional)

Notes:

  • The above ports are all configurable in Okera; for example, you could expose the Web UI on port 6000 instead. This requires a corresponding change in the firewall definition, that is, you need to update the rules to allow access on port 6000. You can read more about this on the Ports page.

  • <service CIDR> refers to the CIDR range that your Okera cluster should be accessible from. This should encompass all users of the web UI and any client (such as Tableau, Databricks, etc.) that will need to access Okera.

  • <admin CIDR> refers to the CIDR range that administrative functions should be accessible from, typically via a browser. In many cases, the service and admin CIDR ranges can be the same.

  • The ports for the HiveServer2 proxies, by default 12052 and 12053, only need to be opened when Okera is configured as a proxy for Hive and/or Impala.
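
Once the firewall rules are in place, it can be useful to verify from a client machine that the required ports are actually reachable. The following is a minimal Python sketch of such a check; the host name is a placeholder and the port list is a subset of the defaults above, so adjust both to your setup.

```python
import socket

# Hypothetical host name of an Okera node; replace with your own.
OKERA_HOST = "okera.example.com"

# Default Okera ports from the table above (adjust if you changed them).
PORTS = {
    8083: "Okera Web UI",
    5010: "OkeraFS",
    12050: "Okera Policy Engine (planner) API",
    13050: "Okera Enforcement Fleet worker API",
    14050: "Okera Presto/JDBC API",
}

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port, description in PORTS.items():
    status = "open" if check_port(OKERA_HOST, port) else "unreachable"
    print(f"{port:>5} ({description}): {status}")
```

Run the check from a machine inside the <service CIDR> range; an "unreachable" result usually points to a firewall rule that still needs to be opened.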

Shared Storage

An Okera cluster uses a shared storage system (such as HDFS, S3, or ADLS) to store audit and system log files and (optionally) to stage intermediate configuration files. This path must be readable and writable by all nodes running Okera services.

The following applies to the shared storage configuration:

  • Okera needs a location where it can persist log files, including the audit log. This requires at least write permissions to the configured location.

  • For the Insights page to work properly, the Okera nodes also require read access to the configured location. The Insights page uses the Okera audit logs as its data source.
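
As a quick sanity check of these permissions, you can write and read back a small test object from one of the Okera nodes. The sketch below assumes S3 as the shared storage backend and uses boto3; the bucket and prefix are placeholders for the location you configure for Okera.

```python
import uuid

import boto3  # assumes S3 is the chosen shared storage backend

# Placeholder bucket and prefix; replace with the location configured for Okera.
BUCKET = "my-okera-logs"
PREFIX = "okera/system"

s3 = boto3.client("s3")
key = f"{PREFIX}/write-test-{uuid.uuid4()}.txt"

# Verify write access (needed for persisting audit and system logs).
s3.put_object(Bucket=BUCKET, Key=key, Body=b"okera shared storage write test")

# Verify read access (needed for the Insights page to read the audit logs).
body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
assert body == b"okera shared storage write test"

# Clean up the test object.
s3.delete_object(Bucket=BUCKET, Key=key)
print("Shared storage location is readable and writable.")
```

For HDFS or ADLS the same check applies, only with the respective client library.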

Cluster Nodes

For reliable software installation, Okera requires a clean install on bare-metal machines or equally performant virtual machine instances. The following subsections address the various aspects of providing the right cluster nodes.

Hardware

Okera on-premises deployments are recommended to run on machines with the following minimum hardware specifications (or larger if needed):

Environment  CPU Cores  Memory (RAM)  Networking
Development  8          32+ GB        1 GigE
UAT/Test     16         16 GB         1-10 GigE
Production   32+        32+ GB        10+ GigE

In addition, a minimum 120 GB data drive is needed.

Operating System (OS)

Okera has been tested on the following operating systems:

  • CentOS 7.2 or newer
  • RHEL 7.4 or newer
  • Ubuntu

The following should be taken care of when configuring the operating system:

  • Root access is required.

    • Using sudo is sufficient
    • This is only needed during installation and for low-level cluster management (such as upgrades)
  • When using XFS, the file system must be formatted with Docker support enabled (a verification sketch follows this list).

    Docker requires that the d_type option is enabled, which requires the file system to be formatted with the ftype=1 option. See the Docker Docs and this blog post for details.

  • The bridge netfilter kernel module (br_netfilter) is required for Kubernetes to work.

    See the network troubleshooting pages.

  • The Docker packages should not be installed before attempting a Gravity-based Okera install.
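
The XFS and bridge netfilter requirements above can be verified before the installation. The following Python sketch assumes the root file system is XFS and that the xfs_info utility is available on the node; adjust the mount point as needed.

```python
import pathlib
import subprocess

def xfs_has_ftype(mount_point: str = "/") -> bool:
    """Check that an XFS file system was formatted with ftype=1 (d_type support)."""
    # xfs_info prints the format options of the file system backing the mount point.
    output = subprocess.run(
        ["xfs_info", mount_point], capture_output=True, text=True, check=True
    ).stdout
    return "ftype=1" in output

def br_netfilter_loaded() -> bool:
    """Check that the bridge netfilter kernel module is loaded."""
    # This sysctl path only exists once br_netfilter has been loaded.
    return pathlib.Path("/proc/sys/net/bridge/bridge-nf-call-iptables").exists()

if __name__ == "__main__":
    print("XFS ftype=1:", xfs_has_ftype("/"))
    print("br_netfilter loaded:", br_netfilter_loaded())
```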

Per-Node Storage

While an Okera cluster does not use local disks to store any user data, it requires local storage for operational purposes. This includes the Okera container images; the Gravity, Docker, and Kubernetes related packages; as well as log file space.

The following should be taken care of when configuring the nodes with local storage:

  • Okera requires at least 150 GB of storage on each node

  • The 150 GB of storage should be assigned to the root of the file system (that is, /)

  • All working directories should be able to use the total space available

    • In particular, /opt and /var should not be mounted as separate volumes but should share the main storage
    • Ideally, the nodes have a single volume and mount point at the root
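
A quick way to confirm that a node satisfies the local storage requirement is to check the capacity of the root volume, for example with the following Python sketch.

```python
import shutil

# Okera requires at least 150 GB of local storage, available under /.
REQUIRED_GB = 150

usage = shutil.disk_usage("/")
total_gb = usage.total / 1024**3
free_gb = usage.free / 1024**3

print(f"/: total={total_gb:.0f} GB, free={free_gb:.0f} GB")
if total_gb < REQUIRED_GB:
    print(f"WARNING: the root volume is smaller than {REQUIRED_GB} GB")
```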

Database Management System (RDBMS)

Okera is backed by a relational database, which persists the metadata of the cluster, including schemas, policies, and attributes. You can use a shared database system or a dedicated one. All database names an Okera cluster uses are configurable, which allows sharing an RDBMS while keeping the metadata separate.

The following database systems are supported:

Database System  Version       Notes
MySQL/MariaDB    5.6 and 5.7   Supported since Okera 1.0
PostgreSQL       9.5 and 10.7  Supported since Okera 2.0.1

The following notes apply:

  • All Okera nodes will require administrative permissions on the configured databases

    • It is possible to pre-create the databases to avoid giving Okera CREATE permissions at the catalog level (a sketch follows this list)
  • For test or proof-of-concept (POC) instances of Okera, there is an option to leave the database system unconfigured

    • In this case, an embedded PostgreSQL service is used (WARNING: this may result in loss of metadata!)
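
The following is a minimal sketch of pre-creating the databases on a MySQL/MariaDB backend, assuming the PyMySQL client library. The host, the administrative account, the Okera service account, and the database names are placeholders; use the names configured for your Okera cluster, and note that the sketch assumes the Okera service account already exists.

```python
import pymysql  # assumed client library; any MySQL/MariaDB client works

# Placeholder database names; use the names configured for your Okera cluster.
OKERA_DATABASES = ["okera_catalog", "okera_audit"]

# Connect as an administrative user that is allowed to create databases.
conn = pymysql.connect(host="mysql.example.com", user="admin", password="change-me")
try:
    with conn.cursor() as cur:
        for db in OKERA_DATABASES:
            # Pre-create the database so Okera does not need CREATE
            # permissions at the catalog level.
            cur.execute(f"CREATE DATABASE IF NOT EXISTS `{db}`")
            # Grant the (already existing) Okera service account full
            # control over this database only.
            cur.execute(f"GRANT ALL PRIVILEGES ON `{db}`.* TO 'okera'@'%'")
finally:
    conn.close()
```

For a PostgreSQL backend, the same approach applies with the corresponding CREATE DATABASE and GRANT statements issued through a PostgreSQL client.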