
Amazon S3 Assume Secondary Role Support

When reading data from Amazon S3, Okera can assume secondary roles, using a different role for each bucket. This feature is also referred to as bucket role mapping. For example, you can configure Okera to use role-a when reading data from s3://bucket-a and role-b when reading data from s3://bucket-b.

To configure this capability, specify the BUCKET_TO_ROLE_MAP_FILE configuration setting in Okera's YAML configuration file. The value should be the path to a JSON file (e.g., s3://path/to/mapping.json or file:///path/to/mapping.json) that has the following structure:

{
  "version": "v1",
  "buckets": {
    "bucket-a": {
      "role": "arn:aws:iam::<account>:role/role-a"
    },
    "bucket-b": {
      "role": "arn:aws:iam::<account>:role/role-b"
    }
  }
}
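
Before pointing Okera at a mapping file, it can help to confirm that the file parses and maps buckets to the roles you expect. The following is a minimal Python sketch of such a check, not Okera's own loader. The function names load_bucket_role_map and role_for_uri are hypothetical, and returning None for an unmapped bucket is an assumption made for illustration; this page does not state how Okera treats buckets that are missing from the map.

```python
import json
from urllib.parse import urlparse

# Sample mapping in the structure shown above; <account> is a placeholder.
MAPPING_JSON = """
{
  "version": "v1",
  "buckets": {
    "bucket-a": {"role": "arn:aws:iam::<account>:role/role-a"},
    "bucket-b": {"role": "arn:aws:iam::<account>:role/role-b"}
  }
}
"""


def load_bucket_role_map(text):
    """Parse a mapping document and return a {bucket_name: role_arn} dict."""
    doc = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if doc.get("version") != "v1":
        raise ValueError(f"unsupported version: {doc.get('version')!r}")
    buckets = doc.get("buckets")
    if not isinstance(buckets, dict):
        raise ValueError('"buckets" must be a JSON object')
    roles = {}
    for name, entry in buckets.items():
        if not isinstance(entry, dict) or "role" not in entry:
            raise ValueError(f'bucket {name!r} has no "role" entry')
        roles[name] = entry["role"]
    return roles


def role_for_uri(roles, uri):
    """Return the role ARN mapped to the bucket of an s3:// URI.

    Returns None for an unmapped bucket (an assumption of this sketch).
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return roles.get(parsed.netloc)  # netloc is the bucket name


if __name__ == "__main__":
    roles = load_bucket_role_map(MAPPING_JSON)
    print(role_for_uri(roles, "s3://bucket-a/data/part-0.parquet"))
```

A check like this catches structural mistakes, such as an unbalanced brace, before the file is deployed.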

If the BUCKET_TO_ROLE_MAP_FILE configuration parameter is defined, bucket role mapping is activated and Okera loads the bucket-to-role map file at that path. The following additional configuration parameters also affect processing:

  • OKERA_ASSUME_ROLE_DURATION_SECONDS determines the duration, in seconds, that assumed role credentials are valid. For the access proxy service (OkeraEnsemble), the default is 3600 seconds. For regular Amazon S3 processing, the default is 900 seconds.

  • GO_ACCESS_PROXY_CACHE_LOG_PERIOD defines the period, in seconds, at which OkeraEnsemble logs assumed role credential cache statistics. The default is 0 (zero), which disables logging. Any value greater than zero enables logging at that interval.
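
The defaults described in the two parameters above can be summarized in a short Python sketch. It is illustrative only and does not reflect how Okera reads its configuration. The function names are hypothetical, the settings are passed in as a plain dictionary of strings, and treating a negative log period as disabled is an assumption, since this page only defines the behavior for zero and for values greater than zero.

```python
# Defaults as stated in the parameter descriptions above.
DEFAULT_ACCESS_PROXY_SECONDS = 3600  # OkeraEnsemble access proxy
DEFAULT_S3_SECONDS = 900             # regular Amazon S3 processing


def assume_role_duration(settings, access_proxy):
    """Return the assumed-role credential lifetime in seconds.

    `settings` is a hypothetical dict of configuration values (strings).
    """
    raw = settings.get("OKERA_ASSUME_ROLE_DURATION_SECONDS")
    if raw is None or raw == "":
        # Parameter not set: fall back to the per-service default.
        return DEFAULT_ACCESS_PROXY_SECONDS if access_proxy else DEFAULT_S3_SECONDS
    return int(raw)


def cache_log_period(settings):
    """Return the cache statistics logging interval in seconds, or None if disabled."""
    period = int(settings.get("GO_ACCESS_PROXY_CACHE_LOG_PERIOD", "0"))
    # 0 disables logging; negative values are treated the same way here (assumption).
    return period if period > 0 else None


if __name__ == "__main__":
    print(assume_role_duration({}, access_proxy=True))
    print(assume_role_duration({}, access_proxy=False))
    print(cache_log_period({"GO_ACCESS_PROXY_CACHE_LOG_PERIOD": "60"}))
```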