Advanced Installation Procedures

This document describes advanced Okera installation options. It assumes the reader is familiar with the base installation and describes changes relative to that process.

This document covers the following categories of additional installation options:

  • Deployment Manager Configuration
  • Special Hive Metastore Management
  • JWT
  • External Kubernetes Cluster
  • Advanced Networking
  • Okera Port Allocation

Deployment Manager Configuration

Basic Deployment Manager (DM) setup is covered in the main Installation Guide. This section describes additional options.

Changing DM Configuration Values

The environment variables set in the env.sh script are stored by the deployment manager when it starts up. When a deployment manager creates an Okera cluster, the current values of those environment variables are applied to the new cluster.

To change the values used by a deployment manager, either update the env.sh script (assuming it is in the default location, /etc/okera/) or update your environment variables. Then restart the deployment manager by running:

/opt/okera/deployment-manager/bin/deployment-manager

Clusters created after the deployment manager is restarted use the new configuration values.
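For example, a minimal sketch of the environment-variable route, assuming the variable is not already set in /etc/okera/env.sh and using OKERA_EXAMPLE_SETTING as a placeholder for the variable you want to change:

# set (or override) the desired value in the Deployment Manager's environment
export OKERA_EXAMPLE_SETTING=new-value

# restart the Deployment Manager so newly created clusters pick up the value
/opt/okera/deployment-manager/bin/deployment-manager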

Changing Configuration Values in a Running Cluster

Note Certain infrastructure-specific configurations are immutable after cluster creation and cannot be updated without creating a completely new cluster. If a configuration value does not update after following these steps, it may be an immutable configuration.

To change configuration values on a running cluster, do the following (see the example after this list):

  1. Update the desired configuration values on the deployment manager (described above)
  2. Restart the deployment manager (described above)
  3. Restart the cluster: ocadm cluster <id> restart
  4. Wait for the restart to complete (approximately 5 minutes).
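As a sketch, assuming the configuration lives in /etc/okera/env.sh, the cluster ID is 1, and OKERA_EXAMPLE_SETTING is a placeholder variable, the full sequence might look like:

# 1. update the desired value on the deployment manager
echo 'export OKERA_EXAMPLE_SETTING=new-value' >> /etc/okera/env.sh

# 2. restart the deployment manager
/opt/okera/deployment-manager/bin/deployment-manager

# 3. restart the running cluster, then wait for it to come back up
ocadm cluster 1 restart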

Working with OKERA_INSTALL_DIR

The value of OKERA_INSTALL_DIR has no effect on where the Deployment Manager loads the /etc/okera/env.sh file. The Deployment Manager always attempts to read configuration values from /etc/okera/env.sh if it exists; if it does not, the Deployment Manager falls back to the configuration values in its environment variables. The OKERA_INSTALL_DIR setting, defined either as an environment variable or in /etc/okera/env.sh, is used only after the Deployment Manager is successfully configured and running, to determine where additional files are written.
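
Example: Setting the directory where the Deployment Manager writes additional files (the path shown is illustrative)

export OKERA_INSTALL_DIR=/var/okera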

OKERA_SERVER_HOSTPORT

This is the host:port on which the Deployment Manager runs. By default, the host listens on all interfaces and uses port 8085 (i.e., 0.0.0.0:8085). This port does not need to be accessible to typical users who access data, but it is required by those who administer Okera clusters.

Example: Setting the DM’s host and port

export OKERA_SERVER_HOSTPORT=10.10.1.123:8085

OKERA_DM_DB_NAME

This is the name of the relational database, at the OKERA_DB_URL location (the RDBMS instance), that the Deployment Manager uses. If the RDBMS instance backs only a single Deployment Manager installation, OKERA_DM_DB_NAME does not need to be set. Otherwise, give each installation a different database name. The database does not need to exist before starting the DM as long as its host instance is available.

Example: Setting the relational database name

export OKERA_DM_DB_NAME=okera_db

The OKERA_DM_DB_NAME value has format restrictions. It must begin with an alphabetic character and subsequent characters are restricted to alphanumerics and the underscore character (‘_’).

The following command checks the validity of the database name once the environment variable is set:

[[ "$OKERA_DM_DB_NAME" =~ ^[A-Za-z][A-Za-z0-9_]*$ ]] && echo "Is valid." || echo "Is not valid."

Special Hive Metastore Management

In a typical Okera Data Access Service install, ODAS will start up and manage a service compatible with Hive Metastore (HMS). You are not restricted to this default arrangement.

External Hive Metastore

If you prefer to have ODAS use an existing, externally managed HMS instance, you can do so. In this configuration, ODAS will simply read and write from the external HMS, but all other behavior is unchanged.

ODAS supports Hive 1.1.0 and up.

To use an external HMS, add this configuration to your Deployment Manager configuration, typically /etc/okera/env.sh:

export OKERA_EXTERNAL_HMS_CONFIG=<path to hive-site.xml>

The value should be a full path to the hive-site.xml client config for the external HMS. The path must be accessible from the Deployment Manager process but can be either local or remote (for example, S3). After setting the config and restarting the Deployment Manager, newly created clusters will use the external HMS. When restarted, existing clusters will update to use the external HMS config.
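
Example: Pointing ODAS at an externally managed HMS whose client config is stored in S3 (the bucket and path are illustrative)

export OKERA_EXTERNAL_HMS_CONFIG=s3://my-bucket/configs/hive-site.xml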

It is possible to have multiple ODAS clusters use the same external HMS, subject to the concurrency settings of your HMS.

Sharing Existing Hive Metastore or Sentry RDBMS

It is also possible to start an Okera catalog which shares the same RDBMS database as an existing Hive Metastore. This is useful, for example, during migration to have the Okera catalog use the same underlying database as the existing Hive Metastore. This allows the existing catalog information to be automatically visible through Okera.

To do so, configure OKERA_DB_URL (when starting up the Deployment Manager) to point at the same database instance (MySQL server) as the one used by the existing Hive Metastore. Then, when creating the cluster from the CLI, specify --hmsDbName and/or --sentryDbName. Either or both can be specified; if not set, the values are derived from the cluster name.

Example: Pointing the Deployment Manager at the existing Hive Metastore database instance in env.sh

export OKERA_DB_URL=hms.db.mycompany.com:3306

Example: Creating a cluster with metadata in hive_db

ocadm clusters create --hmsDbName=hive_db --name=fintech_prod --numNodes=1 --type=STANDALONE_CLUSTER --environmentid=1
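
Example: Creating a cluster that shares both the Hive Metastore and Sentry databases (the database names are illustrative)

ocadm clusters create --hmsDbName=hive_db --sentryDbName=sentry_db --name=fintech_prod --numNodes=1 --type=STANDALONE_CLUSTER --environmentid=1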

JWT

JWT-enabled clusters can optionally set these configurations to fine-tune service scalability. These configs can be set using the Deployment Manager advanced configuration environment variable SERVICE_ENVIRONMENT_CONFIGS.

  • OKERA_EXTERNAL_JWT_AUTH_SERVER_CACHE_TTL_MS

    Time, in milliseconds, to cache valid tokens. If the token is presented again within this window, it is accepted without being reauthenticated. This can dramatically reduce load on the JWT authentication service. By default, this is disabled (set to 0).

  • OKERA_EXTERNAL_JWT_AUTH_SERVER_TIMEOUT_MS

    Time, in milliseconds, to wait before timing out a JWT token authentication call if configured to authenticate against an external server. This value defaults to 5000 (5 seconds).

As an example, to cache tokens for 300 seconds and set the timeout to 10 seconds, set the following in /etc/okera/env.sh:

export SERVICE_ENVIRONMENT_CONFIGS="$SERVICE_ENVIRONMENT_CONFIGS;OKERA_EXTERNAL_JWT_AUTH_SERVER_CACHE_TTL_MS=300000"
export SERVICE_ENVIRONMENT_CONFIGS="$SERVICE_ENVIRONMENT_CONFIGS;OKERA_EXTERNAL_JWT_AUTH_SERVER_TIMEOUT_MS=10000"

External Kubernetes Cluster

Note: This feature is currently in beta.

The Deployment Manager supports installing ODAS services on an externally managed Kubernetes cluster. In this case, the user provides the Deployment Manager with that cluster's Kubernetes configuration file, and the Deployment Manager no longer participates in managing the cluster's machines in any way. This can be used, for example, when there is an existing Kubernetes cluster that may also be running non-ODAS services. The Kubernetes cluster can be managed or not; for example, it can run in your datacenter, on Amazon Elastic Kubernetes Service (EKS), or on Azure Kubernetes Service (AKS).

These clusters can be created using ocadm. In this configuration, it is not necessary to first create an environment.

ocadm clusters create [standard arguments] --existingCluster=[Path to kubernetes cluster conf file]

For example:

ocadm clusters create --name odas --type STANDALONE_CLUSTER --existingCluster /etc/k8s.conf

Note The conf files must end in .conf.

Advanced Networking

Configuring the IP range that an ODAS cluster should use for internal routing

Each Okera cluster configures a private network for communication within the cluster. By default, Okera will use the 172.30.0.0/16 range for internal communication. The environment variable OKERA_CDAS_INTERNAL_NETWORK_IP_RANGE on the Deployment Manager can be used to configure this setting.
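
Example: Using a different internal IP range (the range shown is illustrative; choose one that does not overlap your existing networks)

export OKERA_CDAS_INTERNAL_NETWORK_IP_RANGE=10.200.0.0/16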

Note

This IP address range is used only for communication within each Okera cluster. External clients never connect to it, and it does not have to be open on the network. It is important that this range does not conflict with any other ranges; for example, it should not be a subset of the EC2 machines' IP range.

Note

AWS attaches an HTTP server to every instance at IP address 169.254.169.254. You should never configure an internal network range that would overlap with that IP address. If that occurs, then AWS libraries will not be able to query the HTTP server for the instance’s IAM credentials, precluding access to AWS resources.

Okera Port Allocation

Okera utilizes network ports for external (user-facing) and internal system communication. This section outlines the default minimum port requirements for successful cluster functionality.

Warning

Please note that for optimal cluster performance, it is assumed that all internal ports between cluster node instances are open. If environment policy requires that only essential ports remain open, be aware that the cluster may perform sub-optimally as the underlying Kubernetes and Docker layers are optimized over time.

External ports to the Deployment Manager and other ODAS services are used for ODAS cluster and data administration. It is assumed that these ports are available to administrative users.

Static Ports

Okera reserves the following ports for Okera service functionality. These ports are not user configurable and are required for cluster creation, management, and operation.

Port    Usage
443     Kubernetes Master Communication
6443    Kubernetes API Server
6783    Kubernetes Weave
8085    Okera Master Host

Customizable Ports

Okera supports changing the following port assignments in the /etc/okera/env.sh configuration file for your environment’s Deployment Manager instance.

The table below lists the suggested port assignments as defined by Okera. All port assignments listed below are optional, and any that are not explicitly assigned will be assigned by Kubernetes during cluster creation.

If no ports are assigned, Kubernetes assigns ports in the 30000-32000 range. If ports are partially assigned, Kubernetes assigns ports to the unassigned services in the range between (1) the lowest-numbered port assignment and (2a) the highest-numbered assignment or (2b) 2000 ports above the lowest-assigned port, whichever range is larger.

Example 1:

  • Lowest configured port: 5000 (ODAS REST Server)
  • Highest configured port: 10020 (Planner)
  • Port range for possible assignment of non-assigned ports: 5000-10020

Example 2:

  • Lowest configured port: 5000 (ODAS REST Server)
  • Highest configured port: 5010 (Planner)
  • Port range for possible assignment of non-assigned ports: 5000-7000

External ports are required for user administration as well as cluster creation, maintenance, and usage.

Port         Usage                      Access
5000-13050   Kubernetes Dynamic Range   external
5000         ODAS REST Server           external
11050        Okera Planner Web UI       external
10001        Kubernetes Web UI          external
10000        Grafana Dashboard UI       external
8083         Okera Web UI               external
9098         Okera Canary               internal
11060        Sentry                     internal
11061        Okera HMS                  internal
12050        Planner                    external
13050        Okera Worker               external