Amazon S3 Assume Secondary Role Support
When reading data from Amazon S3, Okera supports the ability to assume secondary roles to read data, with different roles for different buckets. This feature is also referred to as bucket role mapping.
For example, you can configure Okera to use role-a when reading data from s3://bucket-a and role-b when reading data from s3://bucket-b.
To configure this capability, specify the BUCKET_TO_ROLE_MAP_FILE configuration setting in Okera's configuration YAML file. The value should be the path to a file (e.g., s3://path/to/mapping.json or file:///path/to/mapping.json) that has the following structure:
{
  "version": "v1",
  "buckets": {
    "bucket-a": {
      "role": "arn:aws:iam::<account>:role/role-a"
    },
    "bucket-b": {
      "role": "arn:aws:iam::<account>:role/role-b"
    }
  }
}
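To illustrate how a mapping of this shape can be read, the following is a minimal Python sketch, not Okera's implementation. The helper names load_bucket_role_map and role_for_uri are hypothetical, and returning None for a bucket that has no entry is an assumption made for the example; this page does not describe how Okera treats unmapped buckets.

```python
import json
from urllib.parse import urlparse

# Sample mapping with the same structure as the file described above.
# "<account>" is kept as the placeholder used in the documentation.
MAPPING_JSON = """
{
  "version": "v1",
  "buckets": {
    "bucket-a": {"role": "arn:aws:iam::<account>:role/role-a"},
    "bucket-b": {"role": "arn:aws:iam::<account>:role/role-b"}
  }
}
"""


def load_bucket_role_map(text):
    """Parse the mapping JSON and return a {bucket name: role ARN} dict."""
    doc = json.loads(text)
    if doc.get("version") != "v1":
        raise ValueError("unsupported mapping version: %r" % doc.get("version"))
    return {bucket: entry["role"] for bucket, entry in doc["buckets"].items()}


def role_for_uri(role_map, s3_uri):
    """Return the mapped role ARN for an s3:// URI, or None if unmapped.

    Returning None for unmapped buckets is an assumption of this sketch.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: %r" % s3_uri)
    # For s3://bucket/key, the bucket name is the URI's network location.
    return role_map.get(parsed.netloc)


role_map = load_bucket_role_map(MAPPING_JSON)
print(role_for_uri(role_map, "s3://bucket-a/path/to/data.csv"))
```

Running a mapping file through a check like this before deployment can catch a malformed file (for example, a missing closing brace) early.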
If the BUCKET_TO_ROLE_MAP_FILE configuration parameter is defined, bucket role mapping is activated and Okera ingests the bucket-to-role map file at that path. The following additional configuration parameters also affect processing:
- OKERA_ASSUME_ROLE_DURATION_SECONDS determines the duration, in seconds, that assumed role credentials are valid. For the access proxy service (OkeraEnsemble), the default is 3600 seconds. For regular Amazon S3 processing, the default is 900 seconds.
- GO_ACCESS_PROXY_CACHE_LOG_PERIOD defines the period, in seconds, at which OkeraEnsemble logs assumed role credential cache statistics. The default is 0 (zero) seconds, which disables logging. When this parameter is set to a value greater than zero, logging occurs at the interval it specifies.
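The defaults described above can be summarized in a short Python sketch. It is illustrative only: the function effective_settings and its config dictionary are hypothetical stand-ins for however the settings are actually read, and only the parameter names and default values come from this page.

```python
# Defaults as stated in the text above.
DEFAULT_DURATION_ACCESS_PROXY = 3600  # access proxy service (OkeraEnsemble)
DEFAULT_DURATION_REGULAR_S3 = 900     # regular Amazon S3 processing
DEFAULT_CACHE_LOG_PERIOD = 0          # 0 disables cache statistics logging


def effective_settings(config, access_proxy):
    """Resolve the two parameters against their documented defaults.

    config is a plain dict of configured values (a hypothetical stand-in);
    access_proxy selects which duration default applies.
    """
    if access_proxy:
        default_duration = DEFAULT_DURATION_ACCESS_PROXY
    else:
        default_duration = DEFAULT_DURATION_REGULAR_S3
    duration = int(config.get("OKERA_ASSUME_ROLE_DURATION_SECONDS", default_duration))
    log_period = int(config.get("GO_ACCESS_PROXY_CACHE_LOG_PERIOD", DEFAULT_CACHE_LOG_PERIOD))
    return {
        "duration_seconds": duration,
        "cache_log_period_seconds": log_period,
        # Cache statistics are logged by OkeraEnsemble only when the period is > 0.
        "cache_logging_enabled": access_proxy and log_period > 0,
    }


print(effective_settings({}, access_proxy=True))
print(effective_settings({"GO_ACCESS_PROXY_CACHE_LOG_PERIOD": "60"}, access_proxy=True))
```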