Okera Data Access Service Authentication

Every request to an ODAS service performs the following steps:

  1. Authenticates the username.
  2. Looks up the set of groups that the user belongs to.
  3. Using those groups and the permissions database, authorizes the request.

The above sequence occurs regardless of the configured authorization.

User and group management occurs outside of Okera, and that information is accessed via integrations with supported identity services like Active Directory (AD) or LDAP.

A user has the union of the permissions of all the groups that they are in. Okera supports multiple methods for authenticating users and for resolving the set of groups that a user belongs to. Details about those methods, as well as limitations on which can be used in conjunction, follow.
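The union rule above can be illustrated with a short Python sketch. The grants table and the permission strings are hypothetical and do not reflect Okera's actual permissions database; the sketch only shows that a user's effective permissions are the union across every resolved group.

```python
def effective_permissions(user_groups, grants_by_group):
    """Return the union of the permissions granted to each of the user's groups."""
    permissions = set()
    for group in user_groups:
        # Groups with no grants contribute nothing.
        permissions |= set(grants_by_group.get(group, ()))
    return permissions


# Hypothetical grants, for illustration only.
GRANTS = {
    "analysts": {"select:sales"},
    "admins": {"select:sales", "alter:sales"},
}

print(sorted(effective_permissions(["analysts", "admins", "unknown"], GRANTS)))
```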

User Authentication

Okera can authenticate a user via:

  1. Kerberos (SASL)
  2. Microsoft Active Directory username and password (SASL)
  3. Tokens, either Okera-managed or JSON Web Tokens (JWT), with multiple ways to authenticate the token (SASL)
  4. OAuth

Okera expects that multiple methods are enabled in a typical configuration. For example, batch applications may prefer tokens or Kerberos, while end users may prefer AD or OAuth.

Group Resolution

Currently, Okera resolves the groups that a user belongs to in one of the following ways:

  1. By asking the host machine (for example, an EC2 instance) for the Unix groups for the user. An example would be the output of id <username>. In this case, group names are case-sensitive, as they are in Unix.

  2. By reading the user’s groups from the supplied JWT. In this case, group names are case-insensitive.

  3. By querying the external REST service specified by the OKERA_GROUP_RESOLVER_URL configuration. In this case, group names are case-insensitive.

Note: This approach currently assumes that the configured REST endpoint supports GET requests and that the expected format is <url>/<username>.

Approaches 2 and 3 are used only if JWT is the sole configured authentication method (that is, a system token is specified). In that case, if OKERA_GROUP_RESOLVER_URL is set, approach 2 is used first and approach 3 is used as a fallback (only after approach 2 has been attempted). If OKERA_GROUP_RESOLVER_URL is unset, approach 2 is used exclusively. In all other situations, including when both JWT and Kerberos are configured, approach 1 is used for all users, including those that authenticate via a JWT.
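The selection rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not Okera's implementation; the function name, the jwt_only flag, and the approach labels are hypothetical, and only OKERA_GROUP_RESOLVER_URL comes from this document.

```python
import os

UNIX_GROUPS = "unix_groups"      # approach 1
JWT_CLAIM = "jwt_claim"          # approach 2
REST_RESOLVER = "rest_resolver"  # approach 3


def group_resolution_order(jwt_only, environ=None):
    """Return the approaches tried, in order, under the rules described above.

    jwt_only: True when JWT is the only configured authentication method.
    """
    environ = os.environ if environ is None else environ
    if not jwt_only:
        # Any other setup, including JWT together with Kerberos, uses Unix groups.
        return [UNIX_GROUPS]
    if environ.get("OKERA_GROUP_RESOLVER_URL"):
        # Groups in the token first, then the REST resolver as a fallback.
        return [JWT_CLAIM, REST_RESOLVER]
    return [JWT_CLAIM]
```

For example, group_resolution_order(False) yields only the Unix-group approach, even for users who present a JWT.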

Case Sensitivity

Here, case sensitivity pertains to the process of comparing the names of groups that a user is a member of to the names of groups that were granted a given role. A case-insensitive comparison treats the names admin, Admin, and ADMIN as equivalent, whereas a case-sensitive comparison considers each of them unique.
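The difference between the two comparison modes can be shown in a few lines of Python. The function below is a hypothetical illustration and is not part of any Okera API; it uses str.casefold() as one reasonable way to compare names without regard to case.

```python
def has_granted_group(user_groups, granted_groups, case_sensitive):
    """Return True if any of the user's groups matches a group granted the role."""
    if case_sensitive:
        return bool(set(user_groups) & set(granted_groups))
    # casefold() is a stronger form of lower() intended for caseless matching.
    folded_user = {group.casefold() for group in user_groups}
    folded_granted = {group.casefold() for group in granted_groups}
    return bool(folded_user & folded_granted)
```

With user_groups=["Admin"] and granted_groups=["admin"], the case-insensitive comparison matches and the case-sensitive comparison does not.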

Supported Configurations

Okera uses access service authentication. All Okera services are authenticated. The following mechanisms are used for authentication:

  • Kerberos and Okera Tokens
  • JSON Web Tokens (JWT)

You can configure Kerberos and Okera Tokens alone, use JWTs for all authentication, or configure Kerberos and Okera Tokens along with JWTs.

If both Kerberos and Okera Tokens are configured with JWTs, clients can authenticate with either (1) Kerberos and Okera Tokens or (2) JSON Web Tokens.

Kerberos and Okera Tokens

Clusters configured to use Okera tokens require Kerberos authentication, as Kerberos is used to bootstrap Okera tokens. Token-based authentication is optional, depending on the needs of the client application. We recommend using Kerberos authentication if possible (as with Hadoop integration) and using token-based authentication for non-Kerberized clients (for example, Python).

Note: To get an Okera token, you are required to connect with a Kerberized connection. You cannot get a user token in any other way.

JSON Web Tokens

Okera can use JSON Web Tokens (JWT) for authentication. These tokens can either be generated externally and provided to ODAS, or else ODAS can be configured to work with an external service (via REST) to acquire and validate JWTs. If only JWTs are used for authentication, ODAS also requires that a system token be generated for use in authentication with internal services. This is typically a token with okera as the subject.

Okera supports the standard JWT claims including:

  • sub
  • exp
  • nbf

Okera requires that JWTs have a sub claim. For specifying groups that the token subject is a member of, Okera suggests using the claim groups and storing the associated value as a list of strings.

Example JWT payload:

{
  "sub": "John Doe",
  "iss": "okera.com",
  "groups": [
    "web_user",
    "philatelist",
    "cat_person"
  ],
  "exp": 1590510807
}
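When debugging, it can help to inspect the claims in a token before handing it to ODAS. The following Python sketch decodes the payload segment of a JWT and checks the sub, exp, and nbf claims. It does not verify the signature, so it is suitable for inspection only; the function names are hypothetical.

```python
import base64
import json
import time


def decode_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore the padding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def check_claims(payload, now=None):
    """Return a list of problems found in the sub, exp, and nbf claims."""
    now = time.time() if now is None else now
    problems = []
    if not payload.get("sub"):
        problems.append("missing sub")
    if "exp" in payload and now >= payload["exp"]:
        problems.append("expired")
    if "nbf" in payload and now < payload["nbf"]:
        problems.append("not yet valid")
    return problems
```

An empty list from check_claims means that the three claims look usable at the given time; it says nothing about whether the signature is valid.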

Hadoop Integration

When using Hadoop analytics tools, in the mapred-site.xml or yarn-site.xml file, as part of client configurations, set recordservice.kerberos.principal to the value of OKERA_KERBEROS_PRINCIPAL.

Kerberos Authentication

There are multiple clients available for use with Okera’s REST API. As indicated in this document, using curl is a good choice, because it is widely available. However, when possible, it is recommended that a language-specific library be used. As an example, for Python, it is recommended that you use the requests-kerberos package.

Testing Authentication

Testing authentication assumes the user has already logged in to Kerberos by way of kinit.

To connect from curl, add --negotiate -u : to the command.

Example: Curl connection to the health-authenticated endpoint

$ curl --negotiate -u : ODAS_REST_SERVER_HOST:PORT/api/health-authenticated
{
  "health": "ok",
  "token": null,
  "user": "YOU@REALM"
}

Kerberos Principals

Many REST clients, including curl, assume the REST server’s Kerberos principal is: HTTP/<hostname>@REALM. Okera does not have this requirement and can use any hostname as the principal. If you have not configured Okera to have the expected principal, you should specify additional arguments for curl.

The command above should instead be:

$ curl --negotiate -u : --resolve <okera_principal_service_host>:<port>:<IP_address_of_REST_server> http://<okera_principal_service_host>:<port>/api/health-authenticated

For example, if the Okera service Kerberos principal is HTTP/okera-service@REALM, and the server is using port 7000 on the host with the IP address 1.1.1.1, the connection string would be:

$ curl --negotiate -u : --resolve okera-service:7000:1.1.1.1 http://okera-service:7000/api/health-authenticated

If using the requests-kerberos Python library, this can be achieved by passing the hostname_override argument to HTTPKerberosAuth. In this example, specify okera-service for the value.

Okera Token Authentication

Okera tokens are suitable when accessing through a client that may not have a Kerberized connection to ODAS. In this case, the user can request a token and make requests using the token. ODAS resolves the token to the user that originally requested it.

The token is used to authenticate all calls to the REST server by additionally providing it to the REST API.

Getting an Okera Token

To get an Okera token, call the get-token REST API.

Note: You must be Kerberos authenticated.

$ curl --negotiate -u : -X POST http://okera-service:7000/api/get-token
{
  "token": "AARub25nABFub25nQENFUkVCUk8uVEVTVIoBWoZGWxKKAVqqUt8SAQI$.pklsqRlTrFFyEPSHVjItxqBrZ28$"
}

You can verify the token with:

Note: This does not require a Kerberized connection.

$ curl -H 'authorization: Bearer AARub25nABFub25nQENFUkVCUk8uVEVTVIoBWoZGWxKKAVqqUt8SAQI$.pklsqRlTrFFyEPSHVjItxqBrZ28$' http://okera-service:7000/api/get-user

This should return your username, among other elements.

{
  "user": "your_username"
}

Using the Token

To use the token, specify the token in the auth header.

For example, to scan data:

$ curl -H 'authorization: Bearer <token>' <ODAS_REST_host:port>/api/scan/<dataset>

For example, to get the databases:

$ curl -H 'authorization: Bearer <token>' <ODAS_REST_host:port>/api/databases
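The same calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands above; the helper names are hypothetical, and the host, port, and token are placeholders that must be replaced before a request is sent.

```python
import json
import urllib.request


def bearer_request(base_url, path, token):
    """Build a GET request that carries the token in the authorization header."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def get_json(request, timeout=10):
    """Send the request and parse the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder host, port, and token; nothing is sent until get_json() is called.
request = bearer_request("http://okera-service:7000", "/api/databases", "<token>")
print(request.full_url)
```

Calling get_json(request) against a reachable ODAS REST server would then return the parsed response.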

JWT

JWTs can be used in two different approaches:

  1. Providing services with both the public key used to verify the tokens and the algorithm that was used (RSA256, RSA512, etc.).

  2. Configuring two remote endpoints – one for acquiring tokens, the other for validating tokens.

For either of these approaches, if you are using JWT for authenticating communication between services, generate a token with the subject okera that can be read by the method you set up.

For example:

export SYSTEM_TOKEN=/etc/okera.token

Public Key Approach

To configure the public key, the environment variable, JWT_PUBLIC_KEY, should be a full path to the public key.

Note: This key must be in OpenSSL PKCS#8 format.

To configure the algorithm, the environment variable JWT_ALGORITHM must be set to a string indicating the algorithm used. Currently, the supported algorithms are RSA256 and RSA512.

For example:

export JWT_PUBLIC_KEY=/etc/id_rsa.512.pub
export JWT_ALGORITHM=RSA512

ODAS supports configuring multiple keys for validating JWTs passed in by users. This is accomplished by specifying the keys in a comma-delimited list. When a token is passed in, each key is used in turn to attempt to validate the token, and the token is considered valid as soon as one of the specified keys matches.

Note: There must be the same number of algorithms specified as keys, and the algorithm order must correspond to the key order.

For example:

export JWT_PUBLIC_KEY=/etc/id_rsa.512.pub,/etc/external_vendor.256.pub
export JWT_ALGORITHM=RSA512,RSA256
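The pairing rule and the first-match behavior can be sketched in Python as follows. This illustrates the documented semantics and is not ODAS code; both function names are hypothetical, and verify stands in for whatever signature check is actually performed.

```python
def pair_keys_with_algorithms(keys_csv, algorithms_csv):
    """Pair each key path with the algorithm at the same position in the list."""
    keys = [key.strip() for key in keys_csv.split(",") if key.strip()]
    algorithms = [alg.strip() for alg in algorithms_csv.split(",") if alg.strip()]
    if len(keys) != len(algorithms):
        raise ValueError(
            "expected %d algorithms for %d keys, got %d"
            % (len(keys), len(keys), len(algorithms))
        )
    return list(zip(keys, algorithms))


def first_matching_key(token, pairs, verify):
    """Try each (key, algorithm) pair in order and stop at the first match.

    verify is a hypothetical callable taking (token, key_path, algorithm)
    and returning True when the token validates against that key.
    """
    for key_path, algorithm in pairs:
        if verify(token, key_path, algorithm):
            return key_path
    return None
```

Checking the two lists for equal length at startup surfaces a misconfiguration before any token is presented.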

Remote Endpoint Approach

To configure the external service approach, configure an endpoint for validating tokens remotely by way of the JWT_AUTHENTICATION_SERVER_URL configuration. Optionally, if you want users to be able to acquire tokens (for example, by way of the odb get-token command), you can configure an endpoint that accepts a REST request with the username and password fields in the body. The configuration for the external token-granting endpoint is SSO_URL.

Example: Setting external endpoint environment variables

export JWT_AUTHENTICATION_SERVER_URL=http://10.1.11.153:8900/idp/userinfo.openid
export SSO_URL=http://10.1.11.153:8900/as/token.oauth2

The call from Okera to the REST endpoint is a POST request that passes the JWT to validate as a bearer token and expects JSON as the return value. As a REST call, it looks like this:

curl -X POST -H 'Accept: application/json' -H 'Authorization: Bearer <token>' http://<ip>:<port>/<endpoint>

The return value must contain the key “sub” and the value must be a string that contains the username that corresponds to the passed token. Example:

{
    "sub": "santa@okera.com"
}
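To make the contract concrete, the following Python sketch builds the same POST request and extracts the username from a response body. It uses only the standard library; the function names are hypothetical, and the strict check on sub is an assumption based on the requirement stated above.

```python
import json
import urllib.request


def validation_request(endpoint_url, token):
    """Build the POST request that asks the endpoint to validate a JWT."""
    return urllib.request.Request(
        endpoint_url,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


def subject_from_response(body):
    """Return the username from the response body, or raise ValueError."""
    data = json.loads(body)
    subject = data.get("sub") if isinstance(data, dict) else None
    if not isinstance(subject, str) or not subject:
        raise ValueError("response must contain a non-empty string 'sub'")
    return subject
```

A test harness for a custom validation service could feed its responses through subject_from_response to confirm that they meet the expected shape.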

Group resolution via REST endpoints

The endpoint for validating JWTs is not expected to return group information. If your environment uses a REST endpoint for that, configure it via the OKERA_GROUP_RESOLVER_URL environment variable. Okera calls this endpoint with a GET request, appending the username (the user to get group membership for) to the configured URL, and expects JSON as the return value. As a REST call, it looks like this:

curl -X GET -H 'Accept: application/json' https://<URL>/<username>

The return value must be JSON that contains a list of strings. Here is a representative payload:

{
    "groups": [
        "cat_person",
        "group1",
        "notadmin"
    ]
}
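A group-resolver client can be sketched in Python as follows. The function names are hypothetical, the payload shape is taken from the representative example above, and percent-encoding the username is an assumption of this sketch rather than documented ODAS behavior.

```python
import json
from urllib.parse import quote


def group_resolver_url(base_url, username):
    """Append the username to the configured URL as <url>/<username>."""
    # Percent-encoding the username is an assumption of this sketch.
    return base_url.rstrip("/") + "/" + quote(username, safe="")


def groups_from_response(body):
    """Read the group names from a payload shaped like the example above."""
    groups = json.loads(body)["groups"]
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        raise ValueError("groups must be a list of strings")
    return groups
```

Running a resolver's responses through groups_from_response is a quick way to confirm that the service returns the expected structure.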

Using Both Approaches

If you have the requirement to support both approaches, configure the environment variables for both, and each is instantiated. The external endpoint is used first. If the JWT is not validated by that service, it is passed to the public-key authenticator for validation.
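The ordering described above amounts to a chain of validators. The Python sketch below is illustrative only: the function name is hypothetical, and each validator is assumed to be a callable that returns the username for an accepted token or None for a rejected one.

```python
def validate_with_fallback(token, validators):
    """Return the subject from the first validator that accepts the token.

    validators is an ordered list of callables. To mirror the behavior
    described above, put the external-endpoint validator first and the
    public-key validator second.
    """
    for validate in validators:
        subject = validate(token)
        if subject is not None:
            return subject
    # No validator accepted the token.
    return None
```

Because the loop stops at the first success, the public-key validator runs only when the external endpoint does not validate the token.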

Using a JWT with Curl

A JWT is used in the same way as an Okera token. They can be used interchangeably.

To list the databases via curl, the JWT can be passed in the authorization header.

Example: JWT over curl for listing databases

curl <ODAS_REST_host:port>/api/databases -H 'authorization: Bearer <token>'

Automating Token Management

Token management can be automated. To simplify the end-user experience, set the OKERA_TOKEN_RETRIEVAL_SCRIPT environment variable to the absolute path of an executable that acquires and refreshes tokens. The executable is called with no arguments. When it runs, it must output a single string containing a valid token.

This script is called in any of the following situations:

  1. A query is issued to ODAS when no token is present.
  2. An ODAS query fails due to an expired token.

In either scenario, if the specified script successfully executes, the returned value is placed into a token file in the home directory of the user that ran the query. Subsequent queries use this token. If the token expires, the resulting failure triggers another execution of the configured script. The expired token is replaced with the new, valid token.
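The refresh cycle described above can be sketched in Python. This is a simplified model of the documented behavior, not the client's actual code: TokenExpired, run_query, fetch_token, and the token file location are all hypothetical, and the sketch retries only once after an expiry.

```python
import subprocess
from pathlib import Path


class TokenExpired(Exception):
    """Raised by the hypothetical run_query callable when the token has expired."""


def token_from_script(script_path):
    """Run the executable with no arguments and return the token it prints."""
    result = subprocess.run([script_path], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def query_with_refresh(run_query, fetch_token, token_file):
    """Run a query, fetching or replacing the cached token when needed."""
    token_file = Path(token_file)
    if not token_file.exists():
        # Situation 1: no token is present.
        token_file.write_text(fetch_token())
    try:
        return run_query(token_file.read_text())
    except TokenExpired:
        # Situation 2: the token expired, so replace it and retry once.
        token_file.write_text(fetch_token())
        return run_query(token_file.read_text())
```

In this model, fetch_token would typically be a small wrapper such as lambda: token_from_script(os.environ["OKERA_TOKEN_RETRIEVAL_SCRIPT"]), after importing os.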

EMR

Configuring application environment variables for Amazon’s Elastic MapReduce (EMR) platform can vary between applications, so we offer several scenarios.

In the following examples, assume that the script for acquiring a token is located at /usr/lib/okera/get_token.sh and that the current user has sufficient permission to execute it.

Spark

Export the environment variable in your shell.

export OKERA_TOKEN_RETRIEVAL_SCRIPT=/usr/lib/okera/get_token.sh
spark-shell

Hive

The Hive environment variable should be configured in the file located at /etc/hive/conf.dist/hive-env.sh. Append it to the end of the file.

<lines omitted>
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}${HIVE_AUX_JARS_PATH:+:}/usr/lib/hive-hcatalog/share/hcatalog
export HADOOP_HEAPSIZE=1000
export USE_HADOOP_SLF4J_BINDING=false
export OKERA_TOKEN_RETRIEVAL_SCRIPT=/usr/lib/okera/get_token.sh

Note: You must restart the Hive Metastore (HMS) service for the change to take effect. /usr/lib/okera/restart-hms.sh is installed automatically as part of the EMR bootstrap and can be used to restart HMS.

Presto

Presto requires that the environment variable be exported in the configuration file for the service itself, located at /etc/init/presto-server.conf. Add it to the bottom of the block that looks like this:

...
env CONF_DIR="/etc/presto/conf"
env PIDFILE="/var/run/presto/presto-server.pid"
env WORKING_DIR="/var/lib/presto"
env OKERA_TOKEN_RETRIEVAL_SCRIPT="/usr/lib/okera/get_token.sh"

Note: You must restart the Presto services for the change to take effect.