OkeraFS Deployment on AWS S3 (Preview Feature)

This document describes the deployment and use of the OkeraFS access proxy service in AWS S3 environments. The OkeraFS access proxy service can be deployed using one of two modes:

  • Default mode. This is the classic method of deploying OkeraFS in AWS S3 environments. The OkeraFS access proxy service is deployed in the Okera cluster using the Okera cluster system token. This token lets the service authenticate to the Okera cluster and authorizes its use of the object store. See OkeraFS Default Mode Deployment.
  • nScale mode. This mode is applicable only in Amazon EMR environments. The OkeraFS access proxy service is deployed in each EMR node, so its workload is distributed across your cluster nodes and scales up and down with your clusters. To do this, the OkeraFS access proxy needs IAM credentials to sign requests bound for AWS S3 from Okera. See OkeraFS nScale Mode Deployment in EMR Environments.

To understand how OkeraFS access permissions map to S3 actions, AWS CLI commands, and Spark actions, see Map Okera Access Permissions.

System Requirements

The following system requirements must be met.

  • For default mode deployment, Okera 2.9 or later must be installed. For nScale mode deployment, Okera 2.11 or later must be installed.

  • When running on Amazon EMR, EMR versions 5.2 and 6.1 are supported.

  • AWS CLI version 1 must be installed, running on either Python 2.7 or Python 3.4 or later.
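The version requirements above are easy to get wrong when compared as strings, because "2.11" sorts before "2.9" alphabetically. The following is a minimal sketch of a preflight check that compares versions numerically. The function and constant names are hypothetical and are not part of any Okera tooling; only the version thresholds come from the requirements listed above.

```python
# Hypothetical preflight helpers; names are illustrative, not an Okera API.

def parse_version(text):
    """Turn a dotted version string such as '2.11' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Minimum Okera release per deployment mode, taken from the requirements above.
MIN_OKERA = {"default": (2, 9), "nscale": (2, 11)}

def okera_supports(mode, okera_version):
    """Return True if the given Okera version meets the minimum for the mode."""
    return parse_version(okera_version) >= MIN_OKERA[mode]

def python_ok_for_cli_v1(python_version):
    """Return True for Python 2.7.x or Python 3.4 and later."""
    version = parse_version(python_version)
    return version[:2] == (2, 7) or version >= (3, 4)
```

For example, `okera_supports("nscale", "2.10")` is false while `okera_supports("default", "2.10")` is true, which a plain string comparison would not report correctly.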

Note: The default port used by OkeraFS is 5010.
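As an illustration of where the default port might appear, the sketch below builds an endpoint URL and an AWS CLI argument list. It assumes that S3 clients reach the OkeraFS proxy by overriding the S3 endpoint, which this page does not state; the host name, the `https` scheme, and the helper names are all hypothetical. The `--endpoint-url` option itself is a standard AWS CLI global option.

```python
# Hypothetical helpers; only the port number 5010 comes from this document.
OKERAFS_DEFAULT_PORT = 5010

def okerafs_endpoint(host, port=OKERAFS_DEFAULT_PORT, scheme="https"):
    """Build an endpoint URL for a proxy host (scheme is an assumption)."""
    return "{}://{}:{}".format(scheme, host, port)

def aws_s3_ls_command(bucket, endpoint):
    """Return AWS CLI arguments that list a bucket through the given endpoint."""
    return ["aws", "s3", "ls", "s3://" + bucket, "--endpoint-url", endpoint]

if __name__ == "__main__":
    # okera.example.com and my-bucket are placeholders, not real resources.
    endpoint = okerafs_endpoint("okera.example.com")
    print(" ".join(aws_s3_ls_command("my-bucket", endpoint)))
```

If your deployment uses a different port or scheme, pass them explicitly rather than relying on the defaults in this sketch.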

S3 Bucket Role Mapping Support

OkeraFS running in default mode supports the ability to assume secondary roles to read S3 data, with different roles mapped to different buckets. For more information, see Amazon S3 Bucket Role Mapping Support.

Note: S3 bucket role mapping is not currently supported in nScale mode.