Presto User Impersonation

Note: This page refers to Presto as a placeholder for all Presto variations, including PrestoDB, PrestoSQL, and Trino. The links provided are for PrestoDB, but the same can be found for PrestoSQL and Trino.

Okera supports the ability to handle operations that impersonate other users. This is implemented by the Okera-provided Presto client library, since the API for the Presto service provider interface (SPI) can separate the user authenticating the query from the user who runs it. Okera requires two things to be enabled for this Presto integration to work correctly:

  • Presto must be configured with a password authenticator that ensures a user running queries is providing valid credentials.

  • The connection to the Presto HTTP endpoint must be protected by TLS/SSL, since password authentication uses the HTTP Basic authentication scheme, which transports user credentials in request headers that are encoded but not encrypted.
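To illustrate why TLS is required, the following minimal sketch shows that a Basic Authorization header is only Base64-encoded and can be reversed without any key. The password value and both helper function names are placeholders made up for this illustration.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic 'Authorization' header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(value: str) -> Tuple[str, str]:
    """Recover the username and password from a Basic 'Authorization' header."""
    scheme, _, encoded = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authentication header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Only the first colon separates the username from the password.
    username, _, password = decoded.partition(":")
    return username, password


# Placeholder credentials, for illustration only.
header = basic_auth_header("presto-service-user", "example-secret")

# Anyone who can read the header on the wire can reverse the encoding.
print(decode_basic_auth_header(header))
```

Without TLS, any party on the network path could run the equivalent of the decode step on a captured request.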

Common clients submitting queries to Presto on behalf of the user include analytical applications, such as Tableau and MicroStrategy, as well as generic query tools, like DBeaver or SQL Workbench/J. They use a type-4 JDBC driver, which is a thin driver that runs in the client application and talks HTTP(S) directly to the Presto endpoint.

These clients may or may not have the ability to support user impersonation.

Configuration

On the Okera side, an administrator must configure which users are allowed to impersonate other users. These are commonly non-personal accounts (NPAs), that is, specific service accounts that are issued a username and matching Okera JSON web token (JWT). Their sole purpose is to send Presto requests to Okera on behalf of Presto-authenticated users, reducing the complexity of managing Okera tokens for each separate end-user.

Two configuration parameters must be set on the Okera cluster, in the common section:

OKERA_ALLOW_IMPERSONATION: "true"
OKERA_ALLOWED_DELEGATION_USERS: "presto-service-user"

Important

Be sure to set the values using quoted strings, especially for the Boolean true value. Many tools that read YAML interpret an unquoted true as a YAML Boolean, whereas the Okera services expect to read a string parameter.

  • The first parameter, OKERA_ALLOW_IMPERSONATION, enables the feature and ensures that a distinction is made internally between a service user sending a query on someone else’s behalf and the authenticating user who owns the query.

  • The second parameter, OKERA_ALLOWED_DELEGATION_USERS, is a comma-separated list of usernames allowed to impersonate other users. Here is where the administrator must configure the Okera cluster with all known and approved service accounts, or NPAs.
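The interplay of the two parameters can be sketched as follows. This is a hypothetical illustration of the described behavior, not Okera's actual implementation; the function names and the error messages are assumptions.

```python
def parse_delegation_users(value: str) -> set:
    """Split a comma-separated list of usernames, ignoring blanks and spaces."""
    return {name.strip() for name in value.split(",") if name.strip()}


def effective_user(config: dict, authenticated_user: str, requested_user: str) -> str:
    """Return the user a request should run as (hypothetical logic)."""
    if requested_user == authenticated_user:
        return authenticated_user
    # The flag is compared as a string, mirroring the quoted "true" setting.
    if config.get("OKERA_ALLOW_IMPERSONATION", "false").lower() != "true":
        raise PermissionError("impersonation is disabled")
    allowed = parse_delegation_users(config.get("OKERA_ALLOWED_DELEGATION_USERS", ""))
    if authenticated_user not in allowed:
        raise PermissionError(f"{authenticated_user} may not impersonate other users")
    return requested_user


config = {
    "OKERA_ALLOW_IMPERSONATION": "true",
    "OKERA_ALLOWED_DELEGATION_USERS": "presto-service-user",
}
print(effective_user(config, "presto-service-user", "janedoe"))  # janedoe
```

In this sketch, a request authenticated as any account that is not in the delegation list raises an error as soon as it names a different user.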

After restarting the Okera cluster, a user can use a client that supports sending a separate set of authentication credentials along with the username responsible for the request.

Since the request only contains the originating username but no other user information, it is important to note that the Okera cluster must be configured with a secondary group lookup mechanism, such as LDAP. Otherwise, the user will have no groups associated with them and therefore no Okera roles will apply!
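The effect of a missing group lookup can be sketched with two in-memory tables. Both tables, the group and role names, and the roles_for helper are hypothetical stand-ins for a directory service such as LDAP and for Okera role grants.

```python
# Hypothetical stand-ins for an LDAP directory and for Okera role grants.
DIRECTORY_GROUPS = {"janedoe": ["analysts"]}
ROLE_GRANTS = {"analysts": ["analyst_role"]}


def roles_for(username: str) -> list:
    """Resolve roles from the groups that the lookup returns for a username."""
    groups = DIRECTORY_GROUPS.get(username, [])
    roles = []
    for group in groups:
        roles.extend(ROLE_GRANTS.get(group, []))
    return sorted(set(roles))


print(roles_for("janedoe"))  # ['analyst_role']
print(roles_for("johndoe"))  # [] because the lookup found no groups
```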

Example: Access via Python

In the following example client, a simple Python script uses the Presto-supplied Python client library. A service user token is generated using another Python script (not shown here), which is provided with the private key matching the public key configured on the Okera cluster and the service username presto-service-user. The expiry of the token in this example is set to 2160 hours, or 90 days.

$ python3 ~/gen_token.py -k jwt-key.priv -u presto-service-user -g presto-service-user -e 2160
eyJ0eXAiOiJKV1QiLCJhbGciOiJS...jeJrwQHjBzGBM-Lt7RjbC-QQDaggdCS39PxDOoVMv7KkYc
$ TOKEN=eyJ0eXAiOiJKV1QiLCJhbGciOiJS...jeJrwQHjBzGBM-Lt7RjbC-QQDaggdCS39PxDOoVMv7KkYc
$ echo $TOKEN | awk -F. '{print $2}'| base64 -d
{"iss": "gen_token.py", "sub": "presto-service-user", "exp": 1674123165, "groups": ["presto-service-user"]}
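The same payload inspection can be done in Python, which also handles the unpadded base64url encoding that JWT segments use. The sketch below does not verify the signature. Because the real token is elided above, it builds a stand-in token from the payload shown; the header content, the fake signature segment, and the helper names are assumptions.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying it."""
    segment = token.split(".")[1]
    # JWT segments are base64url-encoded with the trailing '=' padding removed.
    segment += "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(segment))


def b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Stand-in token built from the payload shown above; the signature is fake.
claims = {"iss": "gen_token.py", "sub": "presto-service-user",
          "exp": 1674123165, "groups": ["presto-service-user"]}
token = ".".join([b64url({"typ": "JWT"}), b64url(claims), "signature"])

payload = jwt_payload(token)
expires = datetime.fromtimestamp(payload["exp"], tz=timezone.utc)
print(payload["sub"], expires.isoformat())
# presto-service-user 2023-01-19T10:12:45+00:00
```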

For the following script to work, you first must install the Presto client library for Python:

$ pip3 install --user presto-python-client

Next, configure the Python script itself with the cluster details and the service account credentials, as well as the name of the impersonated user:

import prestodb

HOST='presto.internal.okera.rocks'
PORT=30530
SERVICE_USER='presto-service-user'
SERVICE_USER_TOKEN='eyJ0eXAiOiJKV1QiLCJhbGciOiJSU...RjbC-QQDaggdCS39PxDOoVMv7KkYc'
REQUEST_USER='janedoe'

conn = prestodb.dbapi.connect(
    host=HOST,
    port=PORT,
    user=REQUEST_USER,
    catalog='okera',
    auth=prestodb.auth.BasicAuthentication(SERVICE_USER, SERVICE_USER_TOKEN),
    http_scheme='https')
conn._http_session.verify = True

print('Connected...')
cursor = conn.cursor()
cursor.execute('SELECT * FROM okera_sample.whoami')
print(cursor.fetchall())
cursor.execute('SELECT * FROM okera_sample.sample')
print(cursor.fetchall())

Note how the service user is set using the extra auth parameter. The username for the owner of the query is set with the user parameter.

The following output might be produced when the script is run:

$ python3 presto-test.py
Connected...
[['janedoe']]
[['This is a sample test file.'], ['It should consist of two lines.']]

The third line of the output shows that janedoe is the owner of the request to Presto, and the fourth line shows that this user can access the example dataset as expected.

While this script works as expected, it does not further check that the user, janedoe, is allowed to run the query. In practice, it is more common for a client to have its own authentication step. An example is Presto with password authentication enabled.

Audit Logs

When the request is handled on the cluster side, the matching audit log message shows both the user and the connected_user:

{"request_time":"2022-10-21 11:59:42.492210000","request_id":"84460800d6e49f84:70c12785d00000", \
"session_id":"124cff5b21e5bd9f:8e27832ad4fcc288","start_unix_time":1666353582492, \
"end_unix_time":1666353582565,"auth_failure":false,"status":"ok","user":"janedoe", \
"connected_user":"okera_system_user","client_network_address":"10.1.203.128:55358", \
"client_application":"okera-presto (2.11.0)","client_request_id":"20221021_115942_00003_5sr9m", \
"num_results_returned":1,"num_results_read":0,"peak_memory_usage":0,"bytes_scanned":0, \
"server_total_time_ms":0,"queue_time_ms":0,"planning_time_ms":0,"execution_time_ms":0, \
"statement_type":"PLAN","for_reporting":true,"default_db":"default", \
"statement":"SELECT `record` FROM `okera_sample`.`sample`", \
"rewritten_statement":"SELECT record FROM okera_sample.sample","ae_attribute":"", \
"ae_data_source":"local_fs","ae_database":"okera_sample","ae_function":"","ae_path":"", \
"ae_role":"","ae_table":"okera_sample.sample","ae_view":""}

In this example, they are:

... "user":"janedoe", "connected_user":"okera_system_user", ...

The user is as expected, but perhaps the connected_user’s name is not. This is because the service account, here presto-service-user, is not allowed to do anything but submit the request on the user's behalf. Internally, the cluster uses the configured system token, okera_system_user, to submit the call.
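When reviewing audit logs, the two fields can be compared programmatically. The following minimal sketch parses a shortened copy of the record above, keeping only a few of its fields. The runs_on_behalf helper is a hypothetical name, and the sketch assumes each audit record is available as a single JSON object.

```python
import json

# Shortened audit record using field values from the example above.
record_line = (
    '{"request_id":"84460800d6e49f84:70c12785d00000","status":"ok",'
    '"user":"janedoe","connected_user":"okera_system_user",'
    '"statement_type":"PLAN","ae_table":"okera_sample.sample"}'
)


def runs_on_behalf(record: dict) -> bool:
    """True when the query owner differs from the connected account."""
    return record.get("user") != record.get("connected_user")


record = json.loads(record_line)
if runs_on_behalf(record):
    print(f'{record["user"]} via {record["connected_user"]}')
# janedoe via okera_system_user
```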