Prerequisites

VPC and Subnet

ODAS runs within a VPC and subnet that you configure in your AWS account. You do not need to create a dedicated VPC and subnet for ODAS, but doing so can help isolate access. If you do create a dedicated VPC and subnet, follow your normal AWS procedures and guidelines for doing so.

S3

ODAS requires an S3 path where it can store audit and operational logs, e.g. s3://company/odas. We will refer to this as the ODAS S3 Storage.

Note

For production use cases, we suggest enabling encryption and versioning on this path to guard against data leaks and accidental deletion of data.

If you already have a bucket created, you can simply supply the S3 path and ODAS will leverage it. If you do not already have a bucket created, you can create it via the S3 UI.
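AWS APIs and the console generally take the bucket name and the key prefix separately, so it can help to split the ODAS S3 Storage path up front. The following is a minimal sketch; `split_s3_path` is a hypothetical helper and not part of ODAS.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split an S3 path such as s3://company/odas into (bucket, prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 path: {path!r}")
    # The prefix is everything after the bucket, without surrounding slashes.
    return parsed.netloc, parsed.path.strip("/")


bucket, prefix = split_s3_path("s3://company/odas")
print(bucket, prefix)
```

Rejecting anything without an `s3://` scheme catches a local path or a bare bucket name early, before it ends up in a policy or configuration file.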

IAM

ODAS requires an IAM policy that grants three sets of permissions:

  1. Read data from your data lake.
  2. Read and write data to the ODAS S3 Storage path.
  3. Read keys from KMS (this is only used when you have server-side encryption configured for your S3 buckets).

First, create the IAM policy by navigating to the Create Policy Wizard:

  1. Choose the JSON view.
  2. Paste in the following policy, replacing <Your Data Lake Storage> with the path to the root of your data lake and <ODAS S3 Storage> with the location created in the S3 section, and press "Review Policy".

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Okera_Datalake_Readonly",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucketByTags",
                    "s3:GetLifecycleConfiguration",
                    "s3:GetBucketTagging",
                    "s3:GetInventoryConfiguration",
                    "s3:GetObjectVersionTagging",
                    "s3:ListBucketVersions",
                    "s3:GetBucketLogging",
                    "s3:ListBucket",
                    "s3:GetAccelerateConfiguration",
                    "s3:GetBucketPolicy",
                    "s3:GetObjectVersionTorrent",
                    "s3:GetObjectAcl",
                    "s3:GetEncryptionConfiguration",
                    "s3:GetBucketRequestPayment",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectTagging",
                    "s3:GetMetricsConfiguration",
                    "s3:HeadBucket",
                    "s3:GetBucketPublicAccessBlock",
                    "s3:GetBucketPolicyStatus",
                    "s3:ListBucketMultipartUploads",
                    "s3:GetBucketWebsite",
                    "s3:GetBucketVersioning",
                    "s3:GetBucketAcl",
                    "s3:GetBucketNotification",
                    "s3:GetReplicationConfiguration",
                    "s3:ListMultipartUploadParts",
                    "s3:GetObject",
                    "s3:GetObjectTorrent",
                    "s3:GetAccountPublicAccessBlock",
                    "s3:ListAllMyBuckets",
                    "s3:GetBucketCORS",
                    "s3:GetAnalyticsConfiguration",
                    "s3:GetObjectVersionForReplication",
                    "s3:GetBucketLocation",
                    "s3:GetObjectVersion"
                ],
                "Resource": [
                    "arn:aws:s3:::<Your Data Lake Storage>"
                ]
            },
            {
                "Sid": "Okera_Storage_ReadWrite",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucketByTags",
                    "s3:GetLifecycleConfiguration",
                    "s3:GetBucketTagging",
                    "s3:GetInventoryConfiguration",
                    "s3:GetObjectVersionTagging",
                    "s3:ListBucketVersions",
                    "s3:GetBucketLogging",
                    "s3:ListBucket",
                    "s3:GetAccelerateConfiguration",
                    "s3:GetBucketPolicy",
                    "s3:GetObjectVersionTorrent",
                    "s3:GetObjectAcl",
                    "s3:GetEncryptionConfiguration",
                    "s3:GetBucketRequestPayment",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectTagging",
                    "s3:GetMetricsConfiguration",
                    "s3:HeadBucket",
                    "s3:GetBucketPublicAccessBlock",
                    "s3:GetBucketPolicyStatus",
                    "s3:ListBucketMultipartUploads",
                    "s3:GetBucketWebsite",
                    "s3:GetBucketVersioning",
                    "s3:GetBucketAcl",
                    "s3:GetBucketNotification",
                    "s3:GetReplicationConfiguration",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectTorrent",
                    "s3:GetAccountPublicAccessBlock",
                    "s3:ListAllMyBuckets",
                    "s3:GetBucketCORS",
                    "s3:GetAnalyticsConfiguration",
                    "s3:GetObjectVersionForReplication",
                    "s3:GetBucketLocation",
                    "s3:GetObjectVersion"
                ],
                "Resource": [
                    "arn:aws:s3:::<ODAS S3 Storage>"
                ]
            },
            {
                "Sid": "Okera_KMS_ReadOnly",
                "Effect": "Allow",
                "Action": [
                    "kms:ListKeys",
                    "kms:Decrypt",
                    "kms:DescribeKey"
                ],
                "Resource": "*"
            }
        ]
    }
    
  3. Choose a name for your policy (e.g. odas-iam-policy) and press "Create Policy".
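If you prefer to fill in the placeholders programmatically rather than by hand, the sketch below shows one way to do it. It uses a shortened, hypothetical template with only a few of the actions from the full policy, and the bucket names are made up. Because both placeholders follow `arn:aws:s3:::`, the helper strips the `s3://` scheme, which S3 ARNs do not include.

```python
import json

# Hypothetical, shortened template: the full policy lists many more actions.
POLICY_TEMPLATE = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Okera_Datalake_Readonly",
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": ["arn:aws:s3:::<Your Data Lake Storage>"]
        },
        {
            "Sid": "Okera_Storage_ReadWrite",
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::<ODAS S3 Storage>"]
        }
    ]
}
"""


def arn_suffix(s3_path: str) -> str:
    """Turn s3://bucket/prefix into bucket/prefix, the form an S3 ARN uses."""
    suffix = s3_path.strip()
    if suffix.startswith("s3://"):
        suffix = suffix[len("s3://"):]
    return suffix.strip("/")


def render_policy(template: str, data_lake: str, odas_storage: str) -> dict:
    """Substitute both placeholders and parse the result as JSON."""
    text = template.replace("<Your Data Lake Storage>", arn_suffix(data_lake))
    text = text.replace("<ODAS S3 Storage>", arn_suffix(odas_storage))
    # json.loads raises ValueError if the edited text is not valid JSON.
    return json.loads(text)


# Both paths below are made-up examples.
policy = render_policy(POLICY_TEMPLATE, "s3://company-datalake", "s3://company/odas")
print(json.dumps(policy, indent=4))
```

Parsing the result before pasting it into the wizard catches a stray quote or leftover placeholder that would otherwise surface as a policy validation error.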

Once your policy is created, create an IAM role by navigating to the IAM Role Wizard:

  1. Select "AWS Service" as your trusted entity type and "EC2" as the service that will use this role, and press "Next: Permissions".
  2. Select the policy you created above (e.g. odas-iam-policy) and press "Next: Tags".
  3. Add any tags you need for this role and press "Next: Review".
  4. Give your role a name, such as odas-iam-role, and press "Create Role".
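Choosing "AWS Service" and "EC2" in the wizard attaches a trust policy that lets EC2 instances assume the role. If you create the role outside the console, you supply that document yourself. The sketch below builds what such a trust policy typically looks like; treat it as an illustration, not as output captured from the wizard.

```python
import json

# Trust policy allowing the EC2 service to assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

trust_policy_json = json.dumps(EC2_TRUST_POLICY, indent=4)
print(trust_policy_json)
```

With boto3, for example, this JSON string would be passed as the `AssumeRolePolicyDocument` argument of `create_role`.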

Security Groups

You should create a dedicated security group for ODAS, which will be used to control who can access the ODAS cluster.

For outgoing connections, it should allow all connections for both TCP and UDP.

For incoming connections, it should have the following definitions:

Type            | Protocol | Port Range | Source         | Description
Custom TCP Rule | TCP      | 8083       |                | ODAS Web UI
Custom TCP Rule | TCP      | 8089       |                | ODAS REST API
Custom TCP Rule | TCP      | 12050      |                | ODAS Planner API
Custom TCP Rule | TCP      | 13050      |                | ODAS Worker API
Custom TCP Rule | TCP      | 14050      |                | ODAS Presto/JDBC API
SSH             | TCP      | 22         |                | SSH
Custom TCP Rule | TCP      | 12051      |                | ODAS Planner Diagnostics (Optional)
Custom TCP Rule | TCP      | 13051      |                | ODAS Worker Diagnostics (Optional)
Custom TCP Rule | TCP      | 32009      |                | ODAS Diagnostics (Optional)
All traffic     | All      | All        | (e.g. odas-sg) | ODAS SG Internal Traffic

Note

The above ports are all configurable in ODAS (e.g. you can expose the Web UI on port 6000). Changing a port requires a corresponding change in the security group definition (e.g. you will need to update the security group to allow access on port 6000). You can read more about this in the Ports page.

Note

<vpc cidr> refers to the CIDR range that your ODAS cluster should be accessible from. This should encompass all users of the web UI and any client (such as EMR, Databricks, etc.) that will need to access ODAS. <admin cidr> refers to the CIDR range that administrative functions should be accessible from, typically via a browser. In many cases, these can be the same.

You can see an example of a security group definition below:

[Figure: Example Security Group]

To create this security group, follow the general steps to create a Security Group, as outlined in the AWS Documentation:

  1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
  2. In the navigation pane, choose "Security Groups".
  3. Choose "Create Security Group".
  4. Enter a name for the security group, such as odas-sg, and provide a description of your choice.
  5. Select the ID of your VPC from the "VPC" menu.
  6. Use the "Add Rule" button to add the above "Inbound" rules that pertain to machines outside of the security group. Replace <vpc cidr> and <admin cidr> with the appropriate values (e.g. 10.0.0.0/8).
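If you script the security group instead of using the console, the inbound rules can be expressed as data. The sketch below assumes a single CIDR range for every port, which matches the case where <vpc cidr> and <admin cidr> are the same. It omits the optional diagnostics ports and the internal security-group rule. `build_ingress_rules` is a hypothetical helper; the dictionaries follow the `IpPermissions` shape used by EC2's AuthorizeSecurityGroupIngress API.

```python
import ipaddress

# (port, description) pairs taken from the required inbound rules.
ODAS_TCP_PORTS = [
    (8083, "ODAS Web UI"),
    (8089, "ODAS REST API"),
    (12050, "ODAS Planner API"),
    (13050, "ODAS Worker API"),
    (14050, "ODAS Presto/JDBC API"),
    (22, "SSH"),
]


def build_ingress_rules(cidr: str, ports=ODAS_TCP_PORTS) -> list:
    """Build one TCP ingress rule per port, all open to the same CIDR range."""
    # Raises ValueError for a malformed range or one with host bits set.
    network = ipaddress.IPv4Network(cidr)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network), "Description": description}],
        }
        for port, description in ports
    ]


rules = build_ingress_rules("10.0.0.0/8")
for rule in rules:
    print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

If you change a port in ODAS, update the matching entry in `ODAS_TCP_PORTS` so the security group stays in sync.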

EC2 Instances

Instance Types

We recommend deploying ODAS on one of the following instance types (or larger if necessary):

  • t3.2xlarge (for POCs only)
  • m5.2xlarge
  • c5.4xlarge

Supported Operating Systems

ODAS has been tested on:

  • CentOS 7
  • Amazon Linux 2

Storage

ODAS requires at least 100 GB of storage on an EBS volume, so that data persists across restarts.

Database (RDS)

ODAS is backed by a relational database, and we strongly recommend using RDS. ODAS supports MySQL 5.6 and 5.7, in either standard RDS or Aurora configurations.

The RDS instance type should be at least db.t3.medium.

Note

RDS limits the number of connections to the DB based on the instance type. If you have a large ODAS cluster, you may need to move to a larger DB instance type.
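To get a rough sense of the limit, the sketch below applies the default RDS for MySQL formula for `max_connections`, `DBInstanceClassMemory/12582880`, to an instance's nominal memory. Several assumptions apply: `DBInstanceClassMemory` is somewhat smaller than the nominal memory, so the result is an upper bound; Aurora uses a different formula; and a custom parameter group can override the value. `estimate_max_connections` is a hypothetical helper.

```python
# Divisor used by the default RDS for MySQL max_connections formula.
RDS_MYSQL_BYTES_PER_CONNECTION = 12582880


def estimate_max_connections(memory_gib: float) -> int:
    """Upper-bound estimate of default max_connections from nominal memory."""
    memory_bytes = int(memory_gib * 1024 ** 3)
    return memory_bytes // RDS_MYSQL_BYTES_PER_CONNECTION


# db.t3.medium has 4 GiB of memory.
print(estimate_max_connections(4))
```

The authoritative value is the one reported by the database itself, for example through `SHOW VARIABLES LIKE 'max_connections'`.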

Elastic Kubernetes Service (EKS)

EKS is not required to run ODAS in AWS, but if you do choose to use EKS to run a Kubernetes cluster, there are a few requirements to keep in mind:

  1. The security group for the cluster should be the same as described in the Security Groups section above.
  2. The IAM role for the cluster should be the same as described in the IAM section above.
  3. EKS clusters are often created in dedicated VPCs/subnets, and as such, you should ensure that the EKS cluster nodes have access to your RDS cluster as well.
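One way to confirm the last requirement is a plain TCP connection test run from an EKS node or pod. The sketch below is a generic reachability check and not an ODAS tool; `can_reach` is a hypothetical helper.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, calling `can_reach` with your RDS endpoint and port 3306 (the default MySQL port) returns False if a security group, route table, or network ACL blocks the path. A True result only shows that the port is reachable, not that the database credentials are valid.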