
Dynamic Bypass Querying on Amazon S3 (Preview Feature)

Important

This feature impacts the production of file-level access audit logs, because it bypasses the planner when authorizing access to individual files. Table-level and all non-bypass access events are unaffected.

This document introduces dynamic bypass querying (DBQ), a new query-execution mode for OkeraEnsemble nScale clusters. DBQ selectively bypasses in-data-plane operations when the requesting user has full table metadata access; bypassing has been shown to produce faster query times in that case. OkeraEnsemble can now scale alongside your EMR cluster with minimal impact on ODAS.

Once enabled, bypass dynamically routes incoming queries according to the user's privileges, with no additional configuration. As noted above, bypass applies only when the requesting user has full table metadata access; otherwise, in-data-plane processing is used. Privileged queries are routed through the access proxy instead of directly to the planner.
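As an illustration, the routing decision reduces to a simple predicate. The sketch below uses hypothetical names; the actual decision is made inside ODAS from the user's resolved privileges, not a boolean flag:

```python
def route_query(user_has_full_table_access: bool, bypass_enabled: bool) -> str:
    """Pick an execution path for an incoming query (illustrative sketch).

    Bypass is applied only when it is enabled AND the requesting user
    has full table metadata access; every other case falls back to
    in-data-plane processing via the planner.
    """
    if bypass_enabled and user_has_full_table_access:
        # Privileged queries are routed through the access proxy.
        return "access-proxy (bypass)"
    # Otherwise the query goes directly to the planner as before.
    return "planner (in-data-plane)"
```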

To adapt to this new traffic pattern, several read-through caches have been added to the Access Proxy, the most notable being the user permission cache. When running in bypass mode, the Access Proxy temporarily caches file authorization results to avoid repeated authorization work. More information can be found in the caching deep dive section below.

Configuring Bypass in AWS EMR

To preview the dynamic bypass querying functionality, you need to make some slight modifications to the existing OkeraEnsemble nScale deployment configuration: grant an additional permission to the roles that will be used, and add a few parameters to the Spark configurations.

Role Permission Configuration

Currently, table-level permissions cannot be directly translated into URI permissions for DBQ queries. This means that to leverage the bypass feature, both the table and its associated bucket URI must be granted to the requesting user's role. The need for this redundant grant will be removed as part of the official rollout of the feature in future versions. Here's an example of how to set both of those permissions:

GRANT ALL ON TABLE nytaxi.parquet TO ROLE role;
GRANT ALL ON URI 's3://bucket/' TO ROLE role;

The following sections review the various combinations of permissions and the expected execution mode they yield. In this context, Full indicates that a user has full table metadata access, and Restricted indicates there are conditions on the user's access to the resource.

Single Resource Grants

This section assumes no other permissions have been granted to the requesting user besides the one indicated.

Level      Granted      Bypass Disabled   Bypass Enabled
Database   Full         Collocated        X
Database   Restricted   Collocated        Collocated
Table      Full         Collocated        X
Table      Restricted   Collocated        Collocated
URI        Full         X                 X
URI        Restricted   X                 X
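For reference, the single-resource table above can be encoded as a small lookup. This is only a sketch; the mode strings, including `X`, are reproduced verbatim from the table:

```python
# Execution mode for a single granted resource, keyed by
# (level, access, bypass_enabled). Rows mirror the table above.
SINGLE_GRANT_MODES = {
    ("database", "full", False): "Collocated",
    ("database", "full", True): "X",
    ("database", "restricted", False): "Collocated",
    ("database", "restricted", True): "Collocated",
    ("table", "full", False): "Collocated",
    ("table", "full", True): "X",
    ("table", "restricted", False): "Collocated",
    ("table", "restricted", True): "Collocated",
    ("uri", "full", False): "X",
    ("uri", "full", True): "X",
    ("uri", "restricted", False): "X",
    ("uri", "restricted", True): "X",
}

def execution_mode(level: str, access: str, bypass_enabled: bool) -> str:
    """Return the expected execution mode for a single granted resource."""
    return SINGLE_GRANT_MODES[(level, access, bypass_enabled)]
```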


Database Level Grants

This assumes the user has been granted at least some level of database access in conjunction with the listed permission.

DB Grant     URI Path   Access Level   Bypass Disabled   Bypass Enabled
Full         None       -              Collocated        X
Full         Bucket     ALL            Collocated        Bypass
Full         Bucket     SELECT         Collocated        X
Full         Table      ALL            Collocated        X
Full         Table      SELECT         Collocated        X
Restricted   None       -              Collocated        Collocated
Restricted   Bucket     ALL            Collocated        Bypass
Restricted   Bucket     SELECT         Collocated        X
Restricted   Table      ALL            Collocated        X
Restricted   Table      SELECT         Collocated        X


Table Level Grants

This assumes the user has been granted at least some level of table access in conjunction with the listed permission. Note that the URI permission granted to the requesting user must be on the bucket the table is stored in, not the table location itself.

Table Grant   URI Path   Access Level   Bypass Disabled   Bypass Enabled
Full          None       -              Collocated        X
Full          Bucket     ALL            Collocated        Bypass
Full          Bucket     SELECT         Collocated        X
Full          Table      ALL            Collocated        X
Full          Table      SELECT         Collocated        X
Restricted    None       -              Collocated        Collocated
Restricted    Bucket     ALL            Collocated        Bypass
Restricted    Bucket     SELECT         Collocated        X
Restricted    Table      ALL            Collocated        X
Restricted    Table      SELECT         Collocated        X


EMR Cluster Configuration

First, be sure to have familiarized yourself with deploying OkeraEnsemble nScale clusters. Enabling DBQ only requires a few additional parameters to be passed when creating your EMR cluster.

Configuring Spark

Add these additional parameters while creating your Spark configurations in step 2:

  • add "okera.hms.enable-dynamic-bypass":"true" to spark-hive-site.xml and hive-site.xml

  • add "okera.hive.allow-original-metadata-on-all-tables":"true" to hive-site.xml

Here's an example configuration of an nScale OkeraEnsemble cluster with bypass enabled:

[
    {
      "Classification": "hive-site",
      "Properties": {
        "okera.hms.enable-dynamic-bypass": "true",
        "okera.hive.allow-original-metadata-on-all-tables": "true",
        "cerebro.hms.log-to-stdout": "true",
        "hive.fetch.task.conversion": "minimal",
        "hive.metastore.rawstore.impl": "com.cerebro.hive.metastore.CerebroObjectStore",
        "okera.hms.log-detailed-metadata": "true",
        "recordservice.planner.hostports": "10.0.0.1:12050",
        "recordservice.workers.local-port": "13050"
      }
    }, {
      "Classification": "core-site",
      "Properties": {
        "fs.s3a.aws.credentials.provider": "com.okera.recordservice.hadoop.OkeraCredentialsProvider",
        "fs.s3a.connection.ssl.enabled": "false",
        "fs.s3a.endpoint": "localhost:5010",
        "fs.s3a.path.style.access": "true",
        "fs.s3a.s3.client.factory.impl": "com.okera.recordservice.hadoop.OkeraS3ClientFactory",
        "okerafs.default.region": "us-west-2",
        "recordservice.token-provisioner": "http://10.0.0.1:8083",
        "recordservice.workers.local-port": "13050"
      }
    }, {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.extraListeners": "com.okera.recordservice.spark.OkeraSparkListener",
        "spark.recordservice.planner.hostports": "10.0.0.1:12050",
        "spark.recordservice.workers.local-port": "13050"
      }
    }, {
      "Classification": "spark-hive-site",
      "Properties": {
        "okera.hms.enable-dynamic-bypass": "true",
        "cerebro.hms.log-to-stdout": "true",
        "okera.hms.log-detailed-metadata": "true",
        "recordservice.planner.hostports": "10.0.0.1:12050",
        "recordservice.workers.local-port": "13050"
      }
    }
  ]

Configuring Bootstrap Script

As part of step 4, add the following DBQ-specific flags to the Okera bootstrap script's arguments:

--enable-bypass
--ap-user-permission-ttl <ttl in seconds, optional - defaults to 1 hour>

The --enable-bypass flag enables bypass on the access proxy instances running inside the external workers. The optional --ap-user-permission-ttl flag sets the TTL, in seconds, on the access proxy's user permission cache.

Access Proxy Caching Deep Dive

There are three main stages of file authorization in the Access Proxy:
1. Verify the request
2. Authorize the request
3. Generate a signed URL with proper credentials

As part of the bypass effort, in-memory caches have been added to each of these stages. As these caches are in-memory, restarting/recreating these pods will clear the saved state.
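Conceptually, each of these stage caches behaves like a small in-memory read-through cache with a TTL. The sketch below is illustrative only (hypothetical class and parameter names, not the actual implementation), and shows why restarting a pod clears the state: the entries live in a plain in-process dictionary.

```python
import time

class ReadThroughCache:
    """Minimal in-memory read-through cache with a per-entry TTL.

    Illustrative sketch. Because the store is a plain dict, restarting
    the process discards all cached state.
    """

    def __init__(self, loader, ttl_seconds):
        self._loader = loader     # called on a cache miss
        self._ttl = ttl_seconds
        self._store = {}          # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]       # fresh hit: no backend call
        value = self._loader(key) # miss or expired: read through
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```

With this shape, the second identical request within the TTL is served from memory and the backing service (e.g. the Planner) is consulted only once.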

[Figure: Access Proxy caches — the Access Proxy viewed in isolation with respect to the ODAS cluster. When running in nScale mode, the Access Proxy is deployed inside the external workers.]


Request Verification Cache

This cache is responsible for verifying the credentials of the requesting user. The cache's TTL is based on the expiry defined at the creation of the user's credentials. To change the expiry of the credentials being generated, set the environment variable GO_REST_AWS_TOKEN_EXPIRY to the desired TTL when deploying the Rest Server.

Request Authorization (User Permission) Cache

This cache stores recently seen file authorization results, keyed by user, action, and the storage location provided by the associated URI permission. When a file operation request comes into the Access Proxy, it first looks for the widest scope of permissions granted to the user. If the search exhausts the file path without a match, it makes an authorize-file request to the Planner and then populates the cache only if the user has authorized access.
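The widest-scope-first search can be sketched as walking the storage path from the bucket down to the full file path. The helper and cache shape below are hypothetical; the real cache is keyed on user, action, and the storage location of the URI permission:

```python
def cached_authorization(cache: dict, user: str, action: str, path: str):
    """Look up a cached file-authorization result, widest scope first.

    Illustrative sketch: `cache` maps (user, action, prefix) -> True for
    authorized prefixes. Returns None when no prefix matches, in which
    case the Access Proxy would ask the Planner and, on success,
    populate the cache.
    """
    assert path.startswith("s3://")
    # Build prefixes from widest to narrowest, e.g. for
    # "s3://nytaxi/parquet/file.pq":
    #   s3://nytaxi/ -> s3://nytaxi/parquet/ -> s3://nytaxi/parquet/file.pq
    bucket, _, rest = path[len("s3://"):].partition("/")
    prefix = f"s3://{bucket}/"
    candidates = [prefix]
    parts = [p for p in rest.split("/") if p]
    for i, part in enumerate(parts):
        prefix += part + ("/" if i < len(parts) - 1 else "")
        candidates.append(prefix)
    for candidate in candidates:          # widest scope first
        if cache.get((user, action, candidate)):
            return True
    return None                           # exhausted: fall through to the Planner
```

Note how a single bucket-level entry answers every file request underneath it, which is why the cache warms up after only a few operations.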

A common concern is how this cache performs when cold or when used in ephemeral clusters. Remember that the cache is not indexed on a specific file or query; it uses the storage path of the table, so the file-level operations performed by Spark immediately populate it.

[Figure: Execution of a query.] The cache here would only remain cold for the first few file operations, until the in-memory cache records that the requesting user has access to the URI `s3://nytaxi/parquet/`. That propagation usually takes less than ~500 ms.


Assumed Role Cache

This cache manages the credentials of the assumed roles used when re-signing the proxied request with valid AWS credentials. These credentials can be bucket-specific and can be defined as part of Okera's assume secondary role feature. They are generated by assuming the associated role and come with a session expiration, which is used as the cache's TTL. These values can be fine-tuned as described in that feature's documentation.
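The idea can be sketched as follows. All names here are hypothetical; `assume_role` stands in for an STS AssumeRole call returning credentials plus an expiration timestamp, which is used directly as the entry's TTL:

```python
import time

class AssumedRoleCache:
    """Cache assumed-role credentials keyed by role (illustrative sketch).

    `assume_role` is a stand-in for the real credential provider; each
    entry expires at the session expiration returned with the
    credentials, so no separate TTL configuration is needed.
    """

    def __init__(self, assume_role):
        self._assume_role = assume_role
        self._store = {}  # role identifier -> (credentials, expiration)

    def credentials_for(self, role_arn):
        entry = self._store.get(role_arn)
        if entry is not None and entry[1] > time.time():
            return entry[0]                       # still valid: reuse
        creds, expiration = self._assume_role(role_arn)
        self._store[role_arn] = (creds, expiration)
        return creds
```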