Authentication and Identity

Every request to an Okera service performs the following steps, regardless of which authorization tool is used:

  1. It authenticates the user.
  2. It looks up the set of groups to which the user belongs.
  3. Using those groups and the permissions database, it authorizes the request.

User and group management occurs outside of Okera, and that information is accessed via integrations with supported identity services such as Active Directory (AD) or LDAP.

A user is granted the permissions of every group to which they belong. Okera supports multiple methods for authenticating users and for resolving the set of groups to which a user belongs. Details about these methods, as well as any limitations on which methods can be used together, are provided below.
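The way group membership turns into permissions can be illustrated with a minimal sketch. It assumes the user is already authenticated and uses hypothetical in-memory dictionaries (USER_GROUPS, GROUP_PERMISSIONS) in place of the identity service and the permissions database; the names and permission strings are illustrative only and are not part of Okera.

```python
from typing import Dict, Set

# Hypothetical stand-ins for the identity service and the permissions database.
USER_GROUPS: Dict[str, Set[str]] = {"alice": {"analysts", "marketing"}}
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "analysts": {"SELECT sales.orders"},
    "marketing": {"SELECT crm.leads"},
}


def effective_permissions(username: str) -> Set[str]:
    """Return the union of the permissions of every group the user is in."""
    permissions: Set[str] = set()
    for group in USER_GROUPS.get(username, set()):
        permissions |= GROUP_PERMISSIONS.get(group, set())
    return permissions


def authorize(username: str, requested: str) -> bool:
    """Allow the request only if some group of the user grants it."""
    return requested in effective_permissions(username)
```

In this sketch, a user with no resolved groups ends up with an empty permission set, so every request from that user is denied.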

User Authentication

Okera can authenticate users using:

  1. Microsoft Active Directory (AD)/LDAP username and password
  2. JSON Web Tokens (JWT)
  3. OAuth
  4. SAML
  5. Kerberos

Multiple methods can be enabled at the same time, and a typical configuration does so. For example, batch applications may prefer JWTs, while end users may prefer AD/LDAP or OAuth.

Two-factor authentication is also supported if you use OAuth or SAML to authenticate users and if your identity provider (IdP) is configured for two-factor authentication.

Group Resolution

Currently, Okera resolves the groups to which a user belongs in one of the following ways:

  1. It reads the user's groups from the supplied JWT. In this case, group names are case insensitive.

  2. It queries an external REST service. In this case, group names are case insensitive.

  3. It queries the configured LDAP server for group membership for the specified user. In this case, group names are case insensitive.

If more than one of these methods is supported at a site, Okera, by default, uses JWT first, followed by LDAP and REST. To customize which methods are used and the order in which they are used, specify the CUSTOM_GROUP_RESOLVERS configuration setting. Its value is a comma-separated list of the fully qualified Java class names of the resolvers, in the order you want them used. The currently supported class names are InMemGroupsMapper (for JWT), LdapExtendedGroupsMapper (for LDAP), and RESTGroupMapper (for an external REST service).

For example, when both JWT and LDAP are supported, Okera uses JWT first by default, which corresponds to this setting:

CUSTOM_GROUP_RESOLVERS=com.cerebro.hadoop.InMemGroupsMapper, com.cerebro.hadoop.LdapExtendedGroupsMapper

To use LDAP first instead, reverse the order of the class names:

CUSTOM_GROUP_RESOLVERS=com.cerebro.hadoop.LdapExtendedGroupsMapper, com.cerebro.hadoop.InMemGroupsMapper

To resolve groups only from LDAP, list only that class:

CUSTOM_GROUP_RESOLVERS=com.cerebro.hadoop.LdapExtendedGroupsMapper
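When reviewing or generating this setting, it can help to see the value as an ordered list. The following is a minimal sketch of splitting a CUSTOM_GROUP_RESOLVERS value into class names while preserving order; the helper names parse_resolver_order and short_names are hypothetical and only illustrate the value format shown above, not how Okera itself reads the setting.

```python
from typing import List


def parse_resolver_order(value: str) -> List[str]:
    """Split a comma-separated setting into class names, keeping their order."""
    return [name.strip() for name in value.split(",") if name.strip()]


def short_names(value: str) -> List[str]:
    """Return only the class name of each entry, without the package prefix."""
    return [name.rsplit(".", 1)[-1] for name in parse_resolver_order(value)]


if __name__ == "__main__":
    setting = (
        "com.cerebro.hadoop.LdapExtendedGroupsMapper, "
        "com.cerebro.hadoop.InMemGroupsMapper"
    )
    # Lists the LDAP resolver first, then the JWT resolver.
    print(short_names(setting))
```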

Case Sensitivity

Case sensitivity determines how the names of the groups to which a user belongs are compared with the group names granted a given role. A case-insensitive comparison treats the names admin, Admin, and ADMIN as equivalent, whereas a case-sensitive comparison treats each of them as a distinct name.
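The difference between the two comparison modes can be shown with a short sketch. The function group_matches is a hypothetical helper written for illustration; it is not an Okera API.

```python
def group_matches(user_group: str, role_group: str, case_sensitive: bool) -> bool:
    """Compare a user's group name with a group name granted a role."""
    if case_sensitive:
        return user_group == role_group
    # casefold() is Python's recommended normalization for caseless matching.
    return user_group.casefold() == role_group.casefold()
```

With a case-insensitive comparison, group_matches("Admin", "ADMIN", case_sensitive=False) is true; with a case-sensitive comparison, the same two names do not match.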

Required Groups

You can require all users who want to access and use Okera to be in a specific group. This setting is enforced for all users of the cluster, except for users with administrator privileges, who are exempt from this check.
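The rule can be sketched as a simple membership check with an exemption for administrators. The function and parameter names below (may_access, required_group, is_admin) are hypothetical, and the sketch assumes an exact string comparison of group names.

```python
from typing import Iterable


def may_access(user_groups: Iterable[str], required_group: str, is_admin: bool) -> bool:
    """Apply a required-group check; administrators are exempt from it."""
    if is_admin:
        return True
    return required_group in set(user_groups)
```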

Group Resolution Using the External REST Service

The call to this endpoint is a GET request that expects JSON as the return value. The username of the user whose group membership is being resolved is supplied in one of two ways:

  • By default, it is appended to the configured URL.
  • If the GROUP_RESOLVER_URL_FORMAT environment variable is set, the first instance of the string {0} in that format is replaced with the username.
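The two ways of forming the request URL can be sketched as follows. The function build_group_url is hypothetical; the "/" separator and the percent-encoding of the username are assumptions made for this illustration, and the example.com URLs are placeholders.

```python
from typing import Optional
from urllib.parse import quote


def build_group_url(base_url: str, username: str, url_format: Optional[str] = None) -> str:
    """Build the URL queried for a user's groups."""
    # Assumption: the username is percent-encoded before it is placed in the URL.
    encoded = quote(username, safe="")
    if url_format:
        # Only the first occurrence of {0} is replaced.
        return url_format.replace("{0}", encoded, 1)
    # Otherwise the username is appended to the configured URL.
    return base_url.rstrip("/") + "/" + encoded
```

For example, build_group_url("https://idp.example.com/groups", "alice") yields https://idp.example.com/groups/alice.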

As a REST call, it looks like this:

curl -X GET -H 'Accept: application/json' https://<URL>/<username>

The return value is a JSON list of strings associated with the key groups. Here is a representative payload:

{
    "groups": [
        "cat_person",
        "group1",
        "notadmin"
    ]
}
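When building a service that returns this payload, a small check of its shape can catch mistakes early. The following sketch validates that a response body is a JSON object whose groups key holds a list of strings; parse_groups is a hypothetical helper, not part of Okera.

```python
import json
from typing import List


def parse_groups(payload: str) -> List[str]:
    """Extract the list of group names from a JSON response body."""
    document = json.loads(payload)  # raises ValueError on malformed JSON
    if not isinstance(document, dict):
        raise ValueError("payload must be a JSON object")
    groups = document.get("groups")
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        raise ValueError("'groups' must be a list of strings")
    return groups
```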

Custom Script-Sourced Group Resolution

Using one or more custom scripts, you can perform group name resolution from bespoke systems, such as custom REST APIs or data stores. Group resolution is the process that Okera runs during authorization to identify the groups for a username. The output of a script must be a JSON document with a groups property whose value is a list of strings that identify the groups to which the user belongs.

Example Script

Here is an example script.

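#!/usr/bin/env python3
"""Example group resolver: prints the user's groups as a JSON document."""
import json
import sys

# The user name is passed as the first command-line argument.
USER = sys.argv[1]

# The output must be a JSON document with a "groups" list of strings.
result = {
    "groups": [USER, "group1", "group2"],
}
print(json.dumps(result))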

This script returns a groups attribute with the list of groups to which the user belongs: a group named after the user, plus group1 and group2. The user name is passed to the script as its first command-line argument.
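Before deploying a resolver script, you may want to run it locally and inspect its output. The sketch below is one way to do that: it runs a script with the current Python interpreter, passes the user name as the first argument, and parses the JSON printed on standard output. The helpers run_group_script and demo are hypothetical, and EXAMPLE_SCRIPT is a condensed copy of the example above; how Okera itself invokes scripts is not described here.

```python
import json
import os
import subprocess
import sys
import tempfile
from typing import List

# Condensed copy of the example resolver script, used only for this local test.
EXAMPLE_SCRIPT = """import json
import sys
print(json.dumps({"groups": [sys.argv[1], "group1", "group2"]}))
"""


def run_group_script(script_path: str, username: str) -> List[str]:
    """Run a resolver script for one user and return the groups it prints."""
    completed = subprocess.run(
        [sys.executable, script_path, username],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits with an error
    )
    return json.loads(completed.stdout)["groups"]


def demo(username: str = "alice") -> List[str]:
    """Write the example script to a temporary directory and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "resolve_groups.py")
        with open(path, "w") as handle:
            handle.write(EXAMPLE_SCRIPT)
        return run_group_script(path, username)
```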

Configuration

To configure a custom script for group resolution, you must specify the following configuration property:

GROUP_RESOLVER_SCRIPTS: <path to script 1>,<path to script 2>,...

Note: If multiple scripts are specified, all scripts are executed and the results are merged, with the last listed script having the highest priority.

If you use the Okera Helm chart to configure the script, <path to script> can be a local file, an Amazon S3 path, or an ADLS path. Okera injects the script contents into the pods as part of their configuration.

If you configure Okera manually (that is, you edit the odas-config ConfigMap yourself), the paths must be paths inside the pod.

Okera only runs scripts located in its allowed script directory (/opt/scripts by default), and it automatically makes the scripts specified in GROUP_RESOLVER_SCRIPTS available in this directory with the right permissions. You can change the allowed script directory by specifying a different value for the OKERA_SCRIPTS_DIR configuration setting.
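The idea of an allowed script directory can be illustrated with a path containment check. This is a minimal sketch of the general technique, not Okera's actual enforcement logic; the function is_in_allowed_dir is hypothetical, and /opt/scripts is used only because it is the documented default.

```python
import os


def is_in_allowed_dir(script_path: str, allowed_dir: str = "/opt/scripts") -> bool:
    """Return True if script_path resolves to a location inside allowed_dir."""
    # realpath() collapses ".." segments and follows symlinks before comparing.
    allowed = os.path.realpath(allowed_dir)
    target = os.path.realpath(script_path)
    return os.path.commonpath([allowed, target]) == allowed
```

Comparing resolved paths, rather than raw string prefixes, means that a path such as /opt/scripts/../other.py or /opt/scripts2/x.py is not treated as being inside /opt/scripts.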

Sample GCP Group Resolution Script

Okera provides a sample script called resolve_groups_gcp_example.py to resolve groups in Okera when using Google Cloud Platform (GCP). The script requires that the following configuration parameters be specified in the Okera configuration file.

  • Parameter GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS must provide the fully qualified path to a credentials JSON file for a GCP service account with appropriate admin privileges. The path can be a container path (the JSON file is mounted to the container by Kubernetes), an Amazon S3 (s3:) path, or an ADLS (adl:) path.

  • Parameter GSUITE_GROUP_ADMIN_EMAIL must provide the email of a GCP user with appropriate administrative privileges.

  • Parameter GROUP_RESOLVER_SCRIPTS must specify the fully qualified path /opt/scripts/resolve_groups_gcp_example.py.

Once all three parameters are specified correctly, Okera resolves groups through GCP.
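A quick pre-flight check of these settings can be sketched as follows. It assumes the configuration has already been loaded into a dictionary of strings; the function check_gcp_group_config is hypothetical and only verifies that the three parameters named above are present, not that the credentials or email address are valid.

```python
from typing import Dict, List

REQUIRED_PARAMETERS = (
    "GROUP_RESOLUTION_GOOGLE_APPLICATION_CREDENTIALS",
    "GSUITE_GROUP_ADMIN_EMAIL",
    "GROUP_RESOLVER_SCRIPTS",
)
GCP_SCRIPT_PATH = "/opt/scripts/resolve_groups_gcp_example.py"


def check_gcp_group_config(config: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    for name in REQUIRED_PARAMETERS:
        if not config.get(name, "").strip():
            problems.append(name + " is not set")
    # GROUP_RESOLVER_SCRIPTS is a comma-separated list of script paths.
    scripts = [p.strip() for p in config.get("GROUP_RESOLVER_SCRIPTS", "").split(",")]
    if any(scripts) and GCP_SCRIPT_PATH not in scripts:
        problems.append("GROUP_RESOLVER_SCRIPTS does not include " + GCP_SCRIPT_PATH)
    return problems
```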