Deploying ODAS on EC2

This document will guide you through installing ODAS on EC2 machines. You will walk through the following steps:

  1. Provision an EC2 instance for ODAS.
  2. Deploy a single-node, unconfigured base cluster.
  3. Update the configuration of your cluster.
  4. Join more instances to your cluster manually.
  5. Create a Launch Configuration and Auto-Scaling Group to scale your cluster.

Prerequisites

  1. Required: Security groups
  2. Required: IAM role
  3. S3 location to store logs (optional for single-node clusters; required for multi-node clusters)
  4. Optional: RDS instance

Provisioning an EC2 Instance

In this step we will provision an EC2 instance to install our ODAS cluster on. Our instance will have the following configuration:

  1. Amazon Linux 2 64-bit AMI.
  2. t3.2xlarge instance type.
  3. Use the security group configured in the Prerequisites section.
  4. Have the IAM role configured in the Prerequisites section attached.
  5. Have a 120GB EBS volume attached.

To create the instance, navigate to the EC2 Launch Instance wizard:

  1. Choose the Amazon Linux 2 AMI (HVM), SSD Volume Type AMI and press "Select".
  2. Choose the t3.2xlarge instance type and press "Configure Instance Details".
  3. Choose your desired VPC and Subnet (and any other normal EC2 configuration you use).
  4. Select the IAM role created for the ODAS cluster as per the prerequisites section and press "Add Storage".
  5. Configure the volume to be 120GB in size of type gp2 and press "Add Tags".
  6. Add any tags you need for this instance and press "Configure Security Group".
  7. Select the Security Group created for the ODAS cluster as per the prerequisites section and press "Review and Launch".
  8. Ensure all the settings are correct and press "Launch".
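For scripted provisioning, the console steps above can be approximated with the AWS CLI. This is a sketch only: every resource ID below is a placeholder and the instance profile name is hypothetical, so substitute the values from your prerequisites. The command is assembled into a variable and printed for review rather than executed.

```shell
# Hypothetical AWS CLI equivalent of the Launch Instance wizard steps above.
# All resource IDs are placeholders -- substitute your own.
AMI_ID="ami-REPLACE_ME"                  # Amazon Linux 2 64-bit AMI for your region
SUBNET_ID="subnet-REPLACE_ME"            # subnet in your chosen VPC
SG_ID="sg-REPLACE_ME"                    # security group from the prerequisites
INSTANCE_PROFILE="odas-instance-profile" # hypothetical name for the IAM instance profile

CMD="aws ec2 run-instances \
  --image-id $AMI_ID \
  --instance-type t3.2xlarge \
  --subnet-id $SUBNET_ID \
  --security-group-ids $SG_ID \
  --iam-instance-profile Name=$INSTANCE_PROFILE \
  --block-device-mappings 'DeviceName=/dev/xvda,Ebs={VolumeSize=120,VolumeType=gp2}'"

# Print the command for review; run it once the placeholders are filled in.
echo "$CMD"
```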

Once the instance is created, SSH into it. The instructions below are all run within this SSH session. All commands can be run as the default ec2-user unless they explicitly use sudo.

Choosing a Region

Okera provides the installation files in three locations: US West, US East, and EU West. Modify the links shown below to use the region closest to you by replacing the S3 base URL, while leaving the rest of the URL path as given in the examples on this page.

The base links for the available regions are:

Region     Base URL
US West    https://okera-release-uswest.s3-us-west-2.amazonaws.com
US East    https://okera-release-useast.s3.amazonaws.com
EU West    https://okera-release-euwest.s3.eu-west-2.amazonaws.com

For example, to form the download link for the install script (explained in the next section) in the US East region, combine the region's S3 base URL with the path of the installation resource, including the ODAS version number:

https://okera-release-useast.s3.amazonaws.com/2.1.0/gravity/install.sh

|---------------- Base URL -----------------||---- Resource Path ----|
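In shell terms, the full download URL is just the base URL joined with the version and resource path. A minimal sketch, using the US East values from the example above:

```shell
# Build a full download URL from a region base URL, an ODAS version,
# and a resource path (values from the US East example above).
BASE_URL="https://okera-release-useast.s3.amazonaws.com"
VERSION="2.1.0"
RESOURCE="gravity/install.sh"

DOWNLOAD_URL="${BASE_URL}/${VERSION}/${RESOURCE}"
echo "$DOWNLOAD_URL"
```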

Deploy a Single-Node Unconfigured Cluster

From your SSH session on the instance, download and run the ODAS installer with the following command:

$ curl https://okera-release-uswest.s3-us-west-2.amazonaws.com/2.1.0/gravity/install.sh | sh -

The output should look similar to this:

Created /home/ec2-user/okera-2.1.0
Preparing for ODAS v2.1.0 Installation...
Downloading https://okera-release-uswest.s3.amazonaws.com/2.1.0/gravity/odas.tar to /home/ec2-user/okera-2.1.0/odas.tar...
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 6539M  100 6539M    0     0  41.0M      0  0:02:39  0:02:39 --:--:-- 40.9M
Untarring /home/ec2-user/okera-2.1.0/odas.tar...
Preparing for cluster bootstrap...
2020/05/07 06:09:20 [validate:diskspace] passed
2020/05/07 06:09:20 [validate:mysql] skipped
2020/05/07 06:09:20 [validate:ldap] skipped
2020/05/07 06:09:20 [validate:s3-read/AUTOTAGGER_CONFIGURATION/json] skipped
2020/05/07 06:09:20 [validate:s3-read/JWT_PUBLIC_KEY/pubkey] skipped
2020/05/07 06:09:20 [validate:s3-read/JWT_PRIVATE_KEY/privkey] skipped
2020/05/07 06:09:20 [validate:s3-read/SYSTEM_TOKEN/token] skipped
2020/05/07 06:09:20 [validate:adls-gen1-read/AUTOTAGGER_CONFIGURATION/json] skipped
2020/05/07 06:09:20 [validate:adls-gen1-read/JWT_PUBLIC_KEY/pubkey] skipped
2020/05/07 06:09:20 [validate:adls-gen1-read/JWT_PRIVATE_KEY/privkey] skipped
2020/05/07 06:09:20 [validate:adls-gen1-read/SYSTEM_TOKEN/token] skipped
2020/05/07 06:09:20 [validate:s3-write/WATCHER_S3_REGION/WATCHER_LOG_DST_DIR] skipped
2020/05/07 06:09:20 [validate:s3-write/WATCHER_S3_REGION/WATCHER_AUDIT_LOG_DST_DIR] skipped
2020/05/07 06:09:20 [validate:adls-gen1-write/WATCHER_LOG_DST_DIR] skipped
2020/05/07 06:09:20 [validate:adls-gen1-write/WATCHER_AUDIT_LOG_DST_DIR] skipped
2020/05/07 06:09:20 [validate:local-dns] passed
2020/05/07 06:09:20 Created 'cluster-config.yaml'
2020/05/07 06:09:20 Created 'create.cmd'
2020/05/07 06:09:20 Created 'join.cmd'
2020/05/07 06:09:20 Created 'join.asg.cmd'
2020/05/07 06:09:20 If you are creating a new cluster, you can execute:
2020/05/07 06:09:20
2020/05/07 06:09:20   cd /home/ec2-user/okera-2.1.0 && sudo ./gravity install --token=kymPnYuvEW --advertise-addr=10.1.10.168 --cloud-provider=generic --config=cluster-config.yaml --pod-network-cidr="172.23.0.0/16" --service-cidr="172.34.0.0/16"
2020/05/07 06:09:20
2020/05/07 06:09:20 If you later want to join a new node to this cluster, you can execute the following on that node:
2020/05/07 06:09:20
2020/05/07 06:09:20   sudo ./gravity join --token=kymPnYuvEW --role=worker 10.1.10.168 --cloud-provider=generic
2020/05/07 06:09:20
2020/05/07 06:09:20 You can reference 'create.cmd', 'join.cmd' and 'join.asg.cmd' for future use

Note

The install.sh script simply downloads odas.tar from the same location, unpacks it, and runs the installation commands. You can inspect the contents of the script and run the steps manually if you prefer.

This prepares the node for the ODAS installation and outputs the instructions for adding more nodes to this cluster (which we will use in later sections).

As instructed by the above output, we can now install the cluster. Note that the following command is copied from the earlier output and will look different for your installation:

$ cd /home/ec2-user/okera-2.1.0 && sudo ./gravity install --token=kymPnYuvEW \
    --advertise-addr=10.1.10.168 --cloud-provider=generic --config=cluster-config.yaml \
    --pod-network-cidr="172.23.0.0/16" --service-cidr="172.34.0.0/16"

Note

If you did not save the original command, you can retrieve it from the create.cmd file in the installation directory. You can also execute ./create.cmd directly to start the cluster creation process.

The output should be similar to this (slightly abbreviated):

Thu May  7 06:10:25 UTC Starting installer

To abort the installation and clean up the system,
press Ctrl+C two times in a row.

If the you get disconnected from the terminal, you can reconnect to the installer
agent by issuing 'gravity resume' command.

If the installation fails, use 'gravity plan' to inspect the state and
'gravity resume' to continue the operation.
See https://gravitational.com/gravity/docs/cluster/#managing-an-ongoing-operation for details.

Thu May  7 06:10:25 UTC Connecting to installer
Thu May  7 06:10:39 UTC Connected to installer
Thu May  7 06:10:40 UTC Successfully added "master" node on 10.1.10.168
Thu May  7 06:10:40 UTC All agents have connected!
Thu May  7 06:10:41 UTC Operation has been created
Thu May  7 06:10:42 UTC Executing "/checks" locally
Thu May  7 06:10:42 UTC Execute preflight checks
Thu May  7 06:10:42 UTC Running pre-flight checks
Thu May  7 06:10:45 UTC Executing "/configure" locally
Thu May  7 06:10:45 UTC Configuring cluster packages
Thu May  7 06:10:45 UTC Configure packages for all nodes
Thu May  7 06:10:49 UTC Executing "/bootstrap/ip-10-1-10-168.us-west-2.compute.internal" locally
Thu May  7 06:10:49 UTC Bootstrap master node ip-10-1-10-168.us-west-2.compute.internal
Thu May  7 06:10:50 UTC Configuring system directories
Thu May  7 06:10:52 UTC Configuring application-specific volumes
Thu May  7 06:10:54 UTC Executing "/pull/ip-10-1-10-168.us-west-2.compute.internal" locally
Thu May  7 06:10:54 UTC Pulling user application
Thu May  7 06:10:54 UTC Pull packages on master node ip-10-1-10-168.us-west-2.compute.internal
Thu May  7 06:11:04 UTC         Still pulling user application (10 seconds elapsed)
...
Thu May  7 06:11:36 UTC Pulling configured packages
Thu May  7 06:11:44 UTC Unpacking pulled packages
Thu May  7 06:11:54 UTC         Still unpacking pulled packages (10 seconds elapsed)
Thu May  7 06:12:03 UTC Install system software on master nodes
Thu May  7 06:12:04 UTC Executing "/masters/ip-10-1-10-168.us-west-2.compute.internal/teleport" locally
Thu May  7 06:12:04 UTC Installing system service teleport:3.2.7
Thu May  7 06:12:04 UTC Install system package teleport:3.2.7 on master node ip-10-1-10-168.us-west-2.compute.internal
Thu May  7 06:12:06 UTC Executing "/masters/ip-10-1-10-168.us-west-2.compute.internal/planet" locally
Thu May  7 06:12:06 UTC Install system package odas-planet:6.1.3-2.1.0-planet on master node ip-10-1-10-168.us-west-2.compute.internal
Thu May  7 06:12:06 UTC Installing system service odas-planet:6.1.3-2.1.0-planet
Thu May  7 06:12:16 UTC         Still installing system service odas-planet:6.1.3-2.1.0-planet (10 seconds elapsed)
Thu May  7 06:12:25 UTC Executing "/wait" locally
Thu May  7 06:12:25 UTC Wait for Kubernetes to become available
Thu May  7 06:12:35 UTC         Still executing "/wait" locally (10 seconds elapsed)
Thu May  7 06:12:39 UTC Executing "/rbac" locally
Thu May  7 06:12:40 UTC Creating Kubernetes RBAC resources
Thu May  7 06:12:40 UTC Bootstrap Kubernetes roles and PSPs
Thu May  7 06:12:42 UTC Executing "/coredns" locally
Thu May  7 06:12:42 UTC Configure CoreDNS
Thu May  7 06:12:43 UTC Configuring CoreDNS
Thu May  7 06:12:43 UTC Executing "/system-resources" locally
Thu May  7 06:12:43 UTC Create system Kubernetes resources
Thu May  7 06:12:44 UTC Configuring system Kubernetes resources
Thu May  7 06:12:44 UTC Executing "/user-resources" locally
Thu May  7 06:12:45 UTC Creating user-supplied Kubernetes resources
Thu May  7 06:12:45 UTC Create user-supplied Kubernetes resources
Thu May  7 06:12:46 UTC Executing "/export/ip-10-1-10-168.us-west-2.compute.internal" locally
Thu May  7 06:12:46 UTC Populate Docker registry on master node ip-10-1-10-168.us-west-2.compute.internal
Thu May  7 06:12:47 UTC Unpacking application rbac-app:6.1.4
Thu May  7 06:12:47 UTC Exporting application rbac-app:6.1.4 to local registry
Thu May  7 06:12:47 UTC Unpacking application dns-app:0.3.0
Thu May  7 06:12:48 UTC Exporting application dns-app:0.3.0 to local registry
Thu May  7 06:12:49 UTC Unpacking application bandwagon:6.0.1
Thu May  7 06:12:50 UTC Exporting application bandwagon:6.0.1 to local registry
Thu May  7 06:12:51 UTC Unpacking application logging-app:6.0.2
Thu May  7 06:12:52 UTC Exporting application logging-app:6.0.2 to local registry
Thu May  7 06:12:55 UTC Unpacking application monitoring-app:6.0.4
Thu May  7 06:12:57 UTC Exporting application monitoring-app:6.0.4 to local registry
Thu May  7 06:13:04 UTC Unpacking application tiller-app:6.0.0
Thu May  7 06:13:04 UTC Exporting application tiller-app:6.0.0 to local registry
Thu May  7 06:13:05 UTC Unpacking application site:6.1.4
Thu May  7 06:13:06 UTC Exporting application site:6.1.4 to local registry
Thu May  7 06:13:07 UTC Unpacking application odas:1.0.0-2.1.0
Thu May  7 06:13:17 UTC         Still unpacking application odas:1.0.0-2.1.0 (10 seconds elapsed)
...
Thu May  7 06:14:19 UTC Exporting application odas:1.0.0-2.1.0 to local registry
Thu May  7 06:14:29 UTC         Still exporting application odas:1.0.0-2.1.0 to local registry (10 seconds elapsed)
...    
Thu May  7 06:15:18 UTC Executing "/health" locally
Thu May  7 06:15:18 UTC Waiting for the planet to start
Thu May  7 06:15:18 UTC Wait for cluster to pass health checks
Thu May  7 06:15:19 UTC Executing "/runtime/dns-app" locally
Thu May  7 06:15:19 UTC Install system application dns-app:0.3.0
Thu May  7 06:15:20 UTC Executing install hook for dns-app:0.3.0
Thu May  7 06:15:30 UTC         Still executing install hook for dns-app:0.3.0 (10 seconds elapsed)
Thu May  7 06:15:34 UTC Executing "/runtime/logging-app" locally
Thu May  7 06:15:34 UTC Executing install hook for logging-app:6.0.2
Thu May  7 06:15:34 UTC Install system application logging-app:6.0.2
Thu May  7 06:15:40 UTC Executing "/runtime/monitoring-app" locally
Thu May  7 06:15:40 UTC Executing install hook for monitoring-app:6.0.4
Thu May  7 06:15:40 UTC Install system application monitoring-app:6.0.4
Thu May  7 06:15:50 UTC         Still executing install hook for monitoring-app:6.0.4 (10 seconds elapsed)
Thu May  7 06:15:56 UTC Executing "/runtime/tiller-app" locally
Thu May  7 06:15:56 UTC Install system application tiller-app:6.0.0
Thu May  7 06:15:57 UTC Executing install hook for tiller-app:6.0.0
Thu May  7 06:16:07 UTC         Still executing install hook for tiller-app:6.0.0 (10 seconds elapsed)
Thu May  7 06:16:11 UTC Executing "/runtime/site" locally
Thu May  7 06:16:11 UTC Executing install hook for site:6.1.4
Thu May  7 06:16:11 UTC Install system application site:6.1.4
Thu May  7 06:16:21 UTC         Still executing install hook for site:6.1.4 (10 seconds elapsed)
...
Thu May  7 06:17:40 UTC Executing postInstall hook for site:6.1.4
Thu May  7 06:17:50 UTC         Still executing postInstall hook for site:6.1.4 (10 seconds elapsed)
Thu May  7 06:18:00 UTC         Still executing postInstall hook for site:6.1.4 (20 seconds elapsed)
Thu May  7 06:18:01 UTC Executing "/runtime/kubernetes" locally
Thu May  7 06:18:01 UTC Install system application kubernetes:6.1.4
Thu May  7 06:18:02 UTC Executing "/app/odas" locally
Thu May  7 06:18:02 UTC Install application odas:1.0.0-2.1.0
Thu May  7 06:18:03 UTC Executing install hook for odas:1.0.0-2.1.0
Thu May  7 06:18:13 UTC         Still executing install hook for odas:1.0.0-2.1.0 (10 seconds elapsed)
...
Thu May  7 06:18:47 UTC Executing "/connect-installer" locally
Thu May  7 06:18:47 UTC Connect to installer
Thu May  7 06:18:48 UTC Connecting to installer
Thu May  7 06:18:50 UTC Executing "/election" locally
Thu May  7 06:18:50 UTC Enable leader elections
Thu May  7 06:18:50 UTC Enable cluster leader elections
Thu May  7 06:18:51 UTC Executing "/gravity-resources" locally
Thu May  7 06:18:51 UTC Create user-supplied Gravity resources
Thu May  7 06:18:52 UTC Creating user-supplied cluster resources
Thu May  7 06:18:53 UTC Executing operation finished in 8 minutes
Thu May  7 06:18:54 UTC Operation has completed
Thu May  7 06:18:54 UTC Installation succeeded in 8m13.47619093s
Thu May  7 06:18:57 UTC
Cluster endpoints:
    * Authentication gateway:
        - 10.1.10.168:32009
    * Cluster management URL:
        - https://10.1.10.168:32009

Application endpoints:
    * odas:1.0.0-2.1.0:
        - ODAS Presto:
            - https://10.1.10.168:14050
        - ODAS Planner:
            - tcp://10.1.10.168:12050
        - ODAS Worker:
            - tcp://10.1.10.168:13050
        - ODAS REST:
            - http://10.1.10.168:8089
            - http://10.1.10.168:8083

Congratulations!
The cluster is up and running. Please take a look at "cluster management" section:
https://gravitational.com/gravity/docs/cluster/

After around 10-15 minutes, the command will complete. When it does, your cluster performs its final setup steps and should be ready within 2-3 minutes. You can check whether your cluster is ready by running:

$ ./okctl status
ready
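If you want to block until the cluster reports ready (for example in a provisioning script), a small polling loop works. This is a sketch: wait_for_ready is a hypothetical helper, and the real ./okctl status invocation is passed in as the command to poll.

```shell
# Hypothetical helper: poll a status command until it prints "ready",
# giving up after a fixed number of attempts.
wait_for_ready() {
  status_cmd="$1"   # command to run, e.g. "./okctl status"
  max_tries="$2"    # how many times to poll before giving up
  delay="$3"        # seconds to sleep between polls
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    if [ "$($status_cmd)" = "ready" ]; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for cluster" >&2
  return 1
}

# Against a real cluster, from the install directory, you would run:
#   wait_for_ready "./okctl status" 30 30
```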

Once the cluster is ready, you can use the okctl endpoints command to find the URL for the web UI:

$ ./okctl endpoints
2020/05/07 09:51:25 cdas-rest-server:api (type: NodePort)
2020/05/07 09:51:25   10.1.10.168:8089
2020/05/07 09:51:25 cdas-rest-server:webui (type: NodePort)
2020/05/07 09:51:25   10.1.10.168:8083
2020/05/07 09:51:25 cerebro-planner:planner (type: NodePort)
2020/05/07 09:51:25   10.1.10.168:12050
2020/05/07 09:51:25 cerebro-worker:worker (type: NodePort)
2020/05/07 09:51:25   10.1.10.168:13050
2020/05/07 09:51:25 presto-coordinator:api (type: NodePort)
2020/05/07 09:51:25   10.1.10.168:14050

Open your browser at the address listed under cdas-rest-server:webui, in this example http://10.1.10.168:8083. Since this is an unconfigured cluster, there is no authentication; type root to log in to the web UI as an administrator.

Configuring Your ODAS Cluster

ODAS clusters use a YAML configuration file, which is described in more detail in the configuration documentation.

You can update the configuration of your cluster using okctl update.

To deploy the Quickstart configuration, which will add authentication and SSL:

$ ./okctl update --config configs/config-quickstart.yaml

You can copy this file (or any of the other example configuration files in the configs/ directory), modify the copy for your deployment, and then apply it using okctl update.

Growing your ODAS Cluster Manually

You can join more nodes to your cluster by provisioning additional EC2 instances (following the same EC2 provisioning instructions as above). Once the instance is created, run the following command to download the minimal installer:

$ curl -O https://okera-release-uswest.s3-us-west-2.amazonaws.com/2.1.0/gravity/gravity

Once downloaded, we can use the join command printed during the original cluster installation to join our node to the cluster:

$ chmod +x gravity
$ sudo ./gravity join --token=kymPnYuvEW --role=worker 10.1.10.168 --cloud-provider=generic
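The download and join steps can be gathered into a single bootstrap script for new worker nodes. A sketch only: the commands are echoed for review rather than executed, and the token and master IP are the example values from this page, so substitute the ones from your own join.cmd.

```shell
# Hypothetical bootstrap script for a new worker node. Commands are echoed
# for review; remove the echos to run them for real.
BASE_URL="https://okera-release-uswest.s3-us-west-2.amazonaws.com"
VERSION="2.1.0"
JOIN_TOKEN="kymPnYuvEW"     # example token from this page -- use your own
MASTER_IP="10.1.10.168"     # example master address -- use your own

echo curl -O "${BASE_URL}/${VERSION}/gravity/gravity"
echo chmod +x gravity
echo sudo ./gravity join --token="$JOIN_TOKEN" --role=worker "$MASTER_IP" --cloud-provider=generic
```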

Note

If you did not save the original command, you can always retrieve it by going to the installation directory and finding it in the join.cmd file.

The output should be similar to this:

Thu May  7 09:58:19 UTC Starting agent

To abort the agent and clean up the system,
press Ctrl+C two times in a row.

If the you get disconnected from the terminal, you can reconnect to the installer
agent by issuing 'gravity resume' command.
See https://gravitational.com/gravity/docs/cluster/#managing-an-ongoing-operation for details.

Thu May  7 09:58:19 UTC Connecting to agent
Thu May  7 09:58:20 UTC Connected to agent
Thu May  7 09:58:20 UTC Connecting to cluster
Thu May  7 09:58:20 UTC Connected to existing cluster at 10.1.10.168
Thu May  7 09:58:21 UTC Executing "/configure" locally
Thu May  7 09:58:21 UTC Configuring cluster packages
Thu May  7 09:58:25 UTC Executing "/bootstrap" locally
Thu May  7 09:58:26 UTC Configuring system directories
Thu May  7 09:58:27 UTC Configuring application-specific volumes
Thu May  7 09:58:27 UTC Executing "/pull" locally
Thu May  7 09:58:27 UTC Pulling user application
Thu May  7 09:58:37 UTC         Still pulling user application (10 seconds elapsed)
...
Thu May  7 09:59:26 UTC Pulling configured packages
Thu May  7 09:59:32 UTC Unpacking pulled packages
Thu May  7 09:59:42 UTC         Still unpacking pulled packages (10 seconds elapsed)
Thu May  7 09:59:50 UTC Executing "/system/teleport" locally
Thu May  7 09:59:50 UTC Installing system service teleport:3.2.7
Thu May  7 09:59:51 UTC Executing "/system/planet" locally
Thu May  7 09:59:51 UTC Installing system service odas-planet:6.1.3-2.1.0-planet
Thu May  7 10:00:01 UTC         Still installing system service odas-planet:6.1.3-2.1.0-planet (10 seconds elapsed)
Thu May  7 10:00:10 UTC Executing "/wait/planet" locally
Thu May  7 10:00:10 UTC Waiting for the planet to start
Thu May  7 10:00:20 UTC         Still waiting for the planet to start (10 seconds elapsed)
...
Thu May  7 10:00:50 UTC Executing "/wait/k8s" locally
Thu May  7 10:00:50 UTC Waiting for the Kubernetes node to register
Thu May  7 10:00:50 UTC Executing "/elect" locally
Thu May  7 10:00:50 UTC Enabling leader elections
Thu May  7 10:00:51 UTC Operation completed

Growing your ODAS Cluster using Auto-Scaling Groups

Besides manually scaling your cluster, you can also configure an Auto-Scaling Group to automatically scale your cluster up and down.

First, create a Launch Configuration by going to the Launch Configuration wizard:

  1. Choose the Amazon Linux 2 AMI (HVM), SSD Volume Type AMI and press "Select".
  2. Choose the t3.2xlarge instance type and press "Configure Details".
  3. Give your Launch Configuration a name (e.g. odas-launch-config).
  4. Expand "Advanced Details", and in the "User Data" section, put the contents of the join.asg.cmd file from your cluster install directory.
  5. Select the IAM role created for the ODAS cluster as per the prerequisites section and press "Add Storage".
  6. Configure the volume to be 120GB in size of type gp2 and press "Configure Security Group".
  7. Select the Security Group created for the ODAS cluster as per the prerequisites section and press "Review".
  8. Ensure all the settings are correct and press "Create Launch Configuration".
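The console steps above can also be approximated with the AWS CLI. This is a sketch: the AMI and security group IDs are placeholders, the instance profile name is hypothetical, and join.asg.cmd must be copied from your cluster's install directory. The command is assembled into a variable and printed for review rather than executed.

```shell
# Hypothetical AWS CLI equivalent of the Launch Configuration wizard steps
# above. All IDs are placeholders -- substitute your own values.
AMI_ID="ami-REPLACE_ME"
SG_ID="sg-REPLACE_ME"
INSTANCE_PROFILE="odas-instance-profile"   # hypothetical profile name

CMD="aws autoscaling create-launch-configuration \
  --launch-configuration-name odas-launch-config \
  --image-id $AMI_ID \
  --instance-type t3.2xlarge \
  --iam-instance-profile $INSTANCE_PROFILE \
  --security-groups $SG_ID \
  --user-data file://join.asg.cmd \
  --block-device-mappings 'DeviceName=/dev/xvda,Ebs={VolumeSize=120,VolumeType=gp2}'"

# Print the command for review; run it once the placeholders are filled in.
echo "$CMD"
```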

Once your Launch Configuration is created, create an Auto-Scaling Group that uses it by navigating to the Auto-Scaling Group wizard:

  1. Select your previously created Launch Configuration (e.g. odas-launch-config).
  2. Give your ASG a name (e.g. odas-asg) and set your initial desired number of instances (1 is fine to start with).
  3. Choose which VPC and subnet to create this ASG in. This should be the same as your original cluster node. Press "Configure Scaling Policies".
  4. Leave the default option of "Keep this group at its initial size" selected and press "Configure Notifications".
  5. Don't create any notifications, and press "Configure Tags".
  6. Add any tags you need for this instance and press "Review".
  7. Ensure all the settings are correct and press "Create Auto Scaling Group".
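The ASG creation steps above can likewise be sketched with the AWS CLI. The subnet ID is a placeholder (use the subnet of your original cluster node), and the max-size of 4 is an arbitrary assumption for illustration. As before, the command is printed for review rather than executed.

```shell
# Hypothetical AWS CLI equivalent of the Auto-Scaling Group wizard steps
# above. The subnet ID is a placeholder; max-size 4 is an assumed cap.
SUBNET_ID="subnet-REPLACE_ME"

CMD="aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name odas-asg \
  --launch-configuration-name odas-launch-config \
  --min-size 1 --max-size 4 --desired-capacity 1 \
  --vpc-zone-identifier $SUBNET_ID"

# Print the command for review; run it once the placeholder is filled in.
echo "$CMD"
```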

The number of instances you selected as the initial desired capacity will be created and automatically joined to your ODAS cluster. You can grow and shrink the cluster as your scaling needs change, and you can also add scaling policies (e.g. based on CPU utilization) to scale your cluster up or down automatically.