Tutorial: Administration - How to Use the Okera Auth Server

This self-service tutorial guides you through an administrative task: how to use the Okera-provided authserver executable to issue JWTs for client applications and services.

Difficulty: Advanced
Time needed: 4 hours

Introduction

Okera uses tokens, and more specifically JSON Web Tokens (JWTs), to authenticate internal services as well as external clients. These JWTs comply with the following rules:

The JWT contains registered and optional claims that state, for example, the subject for which the token is issued (the sub claim) and the token's expiration time (the exp claim).
The JWT is signed by a private key, which is part of an asymmetric key pair.
The matching public key is configured on the Okera cluster as one of the keys used to verify incoming token-based requests.

If the token can be validated and has not yet expired, Okera uses the subject claim as the implicit username for all further processing of the current request.
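The validation rule above can be illustrated with a short Python sketch. It assumes that the token's signature has already been verified against the configured public key (not shown) and only demonstrates how the exp and sub claims are then applied. The function username_from_claims() is hypothetical and not part of Okera; Okera's actual checks may differ.

```python
import time


def username_from_claims(claims, now=None):
    """Hypothetical helper: map an already signature-verified JWT payload to a username.

    Rejects the token if its exp claim (seconds since the Unix epoch) has passed;
    as an assumption of this sketch, a missing exp or sub is also rejected.
    """
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is None or now >= exp:
        raise ValueError("token has expired or has no exp claim")
    sub = claims.get("sub")
    if not sub:
        raise ValueError("token has no sub claim")
    # The subject becomes the implicit username for the rest of the request.
    return sub


# Claims as issued by the Auth Server later in this tutorial, checked at issue time.
claims = {"sub": "janedoe", "iss": "192.168.195.115", "exp": 1668264393}
print(username_from_claims(claims, now=1668177993))  # prints janedoe
```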

Token generation is flexible, since tokens can be issued close to the originating system. However, take care to protect the sensitive material, that is, the private key and the generated tokens. See the Managing System Tokens tutorial for information on how this is handled for cluster-internal communication. That tutorial also provides a simple script that can be used to generate a JWT.

Some clients use different forms of authentication:

Single sign-on (SSO) mechanisms, such as SAML or OAuth, provided by a dedicated identity provider, such as Okta or Ping Identity.
An enterprise directory, such as Microsoft Active Directory (AD), which stores users and their passwords. The client application uses an LDAP bind operation to verify that the user-provided password matches the stored one.

These other authentication options raise a problem: after the user authenticates with the client, the client has either:

No token issued by the Identity and Access Management (IAM) service, or
A token whose claims do not comply with what Okera expects.

In either case, the client must acquire a compliant JWT by other means. One of those means is the Okera-provided authserver binary.

Downloading the `authserver` Binary

Okera provides its released software in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. These buckets are maintained in three geographical regions: US West, US East, and EU West. The following table shows the base URLs for these regions.

Region	Base URL
US West	`https://okera-release-uswest.s3-us-west-2.amazonaws.com`
US East	`https://okera-release-useast.s3.amazonaws.com`
EU West	`https://okera-release-euwest.s3.eu-west-2.amazonaws.com`

To build the download link for the authserver binary, append the Okera version and /authserver/authserver-linux to the base URL of your region. For example, the download link for Okera 2.18.1 in the US East region combines the region's base URL with the resource path, which includes the version number:

https://okera-release-useast.s3.amazonaws.com/2.18.1/authserver/authserver-linux
|---------------- Base URL -----------------||--------- Resource Path ---------|
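The same combination can be expressed in a few lines of Python. This is only a sketch: the region labels used as dictionary keys are hypothetical, while the base URLs are taken from the table above.

```python
# Base URLs from the region table; the dictionary keys are hypothetical labels.
BASE_URLS = {
    "us-west": "https://okera-release-uswest.s3-us-west-2.amazonaws.com",
    "us-east": "https://okera-release-useast.s3.amazonaws.com",
    "eu-west": "https://okera-release-euwest.s3.eu-west-2.amazonaws.com",
}


def authserver_url(region, version):
    """Combine a region's base URL with the versioned resource path of the binary."""
    return "%s/%s/authserver/authserver-linux" % (BASE_URLS[region], version)


print(authserver_url("us-east", "2.18.1"))
# prints https://okera-release-useast.s3.amazonaws.com/2.18.1/authserver/authserver-linux
```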

Run the following commands to create a directory, download the authserver binary into it, and make the binary executable:

$ mkdir authserver && cd authserver
$ curl -O https://okera-release-useast.s3.amazonaws.com/2.18.1/authserver/authserver-linux
$ chmod +x authserver-linux

Note: Okera provides the authserver binary only as a Linux Executable and Linkable Format (ELF) binary. This tutorial assumes that you run the tool in a suitable environment, such as a Linux host or container.

Starting the binary with the --help parameter produces the following output:

$ ./authserver-linux --help
usage: authserver --private-key=PRIVATE-KEY [<flags>]

ODAS Auth Server.

Flags:
      --help                     Show context-sensitive help (also try --help-long and --help-man).
  -p, --port=5001                Port to bind to
  -d, --directory="/tmp/tokens"  Directory to place tokens in
  -k, --private-key=PRIVATE-KEY  Path to private key
  -a, --algorithm=rsa512         Signing algorithm
      --disable-delete           Disable automatic key garbage collection
  -g, --group="okera"            Group to chown to

Auth Server Sample Setup

Once the downloaded binary runs as shown above, create a key pair for testing. Use ssh-keygen, which is available in most terminals or can easily be installed. Leave the passphrase empty, since the Auth Server has no option for supplying one:

$ ssh-keygen -b 2048 -f ./keypair -t rsa -m PEM
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in ./keypair
Your public key has been saved in ./keypair.pub
The key fingerprint is:
SHA256:lufzX/KJVP1EDGOW8smTTXxink++2D1pVeV0mF0xKKU lars@DESKTOP-I4MRDRU
The key's randomart image is:
+---[RSA 2048]----+
|            ..*O+|
|           .o+*+X|
|           E.* X*|
|         .    O.*|
|        S .    *+|
|       . o    ..=|
|          o  .+ B|
|           o...Xo|
|            .oo.o|
+----[SHA256]-----+

$ ls -l keypair*
-rw------- 1 lars lars 1831 Nov 11 15:30 keypair
-rw-r--r-- 1 lars lars  402 Nov 11 15:30 keypair.pub

Then start the Auth Server with the private key (keypair in the example above):

$ ./authserver-linux --private-key=keypair --group=$(id -gn)
2022/11/11 15:35:04 Listening on 127.0.0.1:5001 - writing to '/tmp/tokens'

In another terminal, use curl to generate a token and print it:

$ TOKEN=$(curl -s http://foo:bar@localhost:5001/janedoe) && cat "$TOKEN"
eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJqYW5lZG9lIiwiaXNzIjoiMTkyLjE2OC4xOTUuMTE1IiwiZXhwIjoxNjY4MjY0MzkzfQ.i-hV4QbjIfs1Bzz2wGU0HXUlqWjPt-TQt2D2QSxu62fqn2dS079T_aCcV-uHfQy9nLdAFWGaGR-cprTEzRkvdQHaoREJB5i4pEl7MnHzYI_gVcfi6SWYFMRbZLHHcD2LFG3rE8iZ7MZtHY8lLeglaUhQI2TLbCQxZkQVlotRCbQwNCuKt7jcMo0iamyymLQa5ckH0kFdNoz0eEVDp7cGlTANJ4c-YcoPYX1fWnn-0lUxkQ6pvrkb4pmNJ5SBVnBesd0vy6GRNfDbjjoUj7gojhsiju0-DYJoROy1OWbuJ8zM1993GQ3yXnlbeDL9o8bxw2kUKlDBUGrf6XYnHZd1eA

Meanwhile, the first terminal running the Auth Server logs the generated token file and, about 10 seconds later, its deletion:

2022/11/11 15:46:33 Generated token: /tmp/tokens/janedoe.1668177993995201518.token
2022/11/11 15:46:44 Deleted token: /tmp/tokens/janedoe.1668177993995201518.token

Some items in this example require explanation:

When starting the Auth Server, the example sets the --group parameter to the name of the current user's primary group. The Auth Server uses this group when changing the ownership of the token files it writes (see "Group to chown to" in the --help output earlier). The default, okera, likely does not exist in your terminal environment.
The Auth Server does not return the generated token itself, but the path of the file it saved the token to, here /tmp/tokens/janedoe.1668177993995201518.token. Use the --directory parameter to change the directory.
After about 10 seconds, the Auth Server deletes the generated token file. Use the --disable-delete parameter to prevent this.
The Auth Server binds to the loopback interface, 127.0.0.1, on the default port 5001. You can change the port with the --port parameter, but not the interface the server binds to.

Decoding the token's payload shows that the sub claim is set to the user (janedoe) passed to the Auth Server as the first path component of the URL:

$ echo eyJhbGciOiJSUzUxMiIsI...8bxw2kUKlDBUGrf6XYnHZd1eA | awk -F. '{print $2}' | base64 -d
{"sub":"janedoe","iss":"192.168.195.115","exp":1668264393}

$ TZ=CET date -d @1668264393
Sat Nov 12 15:46:33 CET 2022

The expiration time is hard-coded to 24 hours and cannot be changed.
Finally, there is no optional groups claim, since the Auth Server does not have access to this information.
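The shell pipeline above hands an unpadded base64url segment to base64 -d, which some implementations decode only with an "invalid input" error. The following Python sketch (standard library only, function names hypothetical) restores the padding, decodes the header and payload, and converts the exp claim to a UTC timestamp. It does not verify the signature, so use it only for inspecting tokens.

```python
import base64
import json
from datetime import datetime, timezone


def b64url_decode(segment):
    """Decode an unpadded base64url segment, as used in compact JWTs."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_parts(token):
    """Return (header, payload) of a compact JWT WITHOUT verifying its signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Header and payload of the token generated above; the signature is replaced
# by a placeholder because decoding does not need it.
token = ("eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9"
         ".eyJzdWIiOiJqYW5lZG9lIiwiaXNzIjoiMTkyLjE2OC4xOTUuMTE1IiwiZXhwIjoxNjY4MjY0MzkzfQ"
         ".signature-omitted")
header, payload = decode_jwt_parts(token)
print(header)   # {'alg': 'RS512', 'typ': 'JWT'}
print(payload)  # {'sub': 'janedoe', 'iss': '192.168.195.115', 'exp': 1668264393}
print(datetime.fromtimestamp(payload["exp"], tz=timezone.utc))  # 2022-11-12 14:46:33+00:00
```

The header also confirms the default rsa512 signing algorithm (RS512), and the UTC time matches the CET output of the date command above.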

Auth Server Use Cases

When is the Auth Server useful? As mentioned earlier, an Okera client application sometimes must generate an Okera-compliant JWT based on generic authentication details from its users. The Auth Server can do that, but its restrictions require some architectural decisions:

The Auth Server must run on the same node as the client that uses it, since the two communicate over the loopback interface.
Generating a token is a two-step process: calling the Auth Server and then retrieving the token content from a shared, local storage location.
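The two steps can be sketched with Python's standard library. This is an illustration under assumptions: the function name fetch_token() is hypothetical, the default address assumes an Auth Server on the same node with its default port, and the returned token file must be readable by the calling process.

```python
import urllib.parse
import urllib.request


def fetch_token(user, base_url="http://127.0.0.1:5001", timeout=10):
    """Hypothetical helper: get a JWT for `user` from a local Auth Server.

    Step 1: the Auth Server answers with the path of the token file it wrote.
    Step 2: the caller reads the token from that path. Do this promptly, since
    the Auth Server deletes token files after about 10 seconds unless it was
    started with --disable-delete.
    """
    url = "%s/%s" % (base_url, urllib.parse.quote(user))
    # urlopen() raises urllib.error.HTTPError for error responses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        token_path = response.read().decode("utf-8").strip()
    with open(token_path) as f:
        return f.read().strip()
```

For example, fetch_token("janedoe") returns a token like the one printed by the curl and cat commands earlier.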

One existing use of the Auth Server is the Okera Amazon EMR integration: the --authserver option of the bootstrap script, together with some configuration parameters, sets up the Auth Server on each Amazon EMR node so that the Okera client libraries can call it before invoking an RPC to the configured Okera cluster.

Another use case is running the Auth Server in containerized services, where it is configured as a sidecar container that the main service container calls. This works because containers in the same Kubernetes pod share a network namespace, and with it the loopback interface, even though they otherwise run isolated. They can also share a volume: the Auth Server writes the token file into it, and the main container reads the token content from there.

Containerizing the Auth Server Binary

Wrapping the Auth Server in a container is straightforward. It requires creating the following directory structure:

authserver/
├── bin/
│   ├── authserver-linux
│   └── entrypoint.sh
└── Dockerfile

The steps are:

Copy the Auth Server binary into the bin/ directory.
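Create two text files: Dockerfile in the root and entrypoint.sh in the bin/ directory.
Make both files in bin/ executable (for example, with chmod +x bin/*), since the container runs them directly.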

Edit the Dockerfile and add the following:

FROM alpine:3.16.2

ENV PORT=5001
ENV TOKEN_DIR=/tmp/tokens
ENV SECRET_DIR=/etc/secrets
ENV PRIVATE_KEY_FILE=${SECRET_DIR}/JWT_PRIVATE_KEY
ENV ALGORITHM=rsa512
ENV GID=okera

RUN set -x \
    && apk update \
    && apk --no-cache add \
        libc6-compat \
        curl \
    && adduser -D -g '' -s /sbin/nologin -u 1000 ${GID}

COPY bin/* /usr/local/bin/
EXPOSE ${PORT}
USER ${GID}
ENTRYPOINT ["entrypoint.sh"]

This configures environment variables, installs required packages, adds a user okera (with group okera, matching the default of the Auth Server), copies in the binaries, and finally runs the entrypoint.sh script as user okera.

The entrypoint.sh script contains this:

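#!/bin/sh

# Pass the container's environment variables to the Auth Server. "exec" replaces
# the shell, so the binary runs as the main process and receives signals directly.
exec authserver-linux -p "$PORT" -d "$TOKEN_DIR" -k "$PRIVATE_KEY_FILE" -a "$ALGORITHM" -g "$GID"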

The script passes on the environment variables to the Auth Server binary and runs it.

Use the standard Docker CLI commands to build the container image and push it to a registry:
```
$ docker build -t localhost:5001/authserver:v0.1 authserver
$ docker push localhost:5001/authserver:v0.1
```

Tip

This setup was tested with kind, which runs a Kubernetes cluster inside local Docker containers. If interested, follow the kind installation guide and use its Local Registry script to start a kind-based cluster along with a local registry container. That registry listens on localhost:5001, which is also the Auth Server's default port, so stop any Auth Server you started locally earlier before creating the registry.

Example Service Code

Next, create an example service that fulfills the following requirements:

Listens on an HTTP port for authenticated requests
Creates a JWT using a local Auth Server endpoint
Using PyOkera, calls a configured Okera cluster with the JWT and retrieves the list of databases available to the authenticated user

First, the directory structure needed looks like this:

myservice/
├── src/
│   └── service.py
├── requirements.txt
└── Dockerfile

For simplicity, the service is written in Python (src/service.py):

import sys
import os
import json
from base64 import b64decode
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from okera import context

# Settings, read from environment variables (see the Kubernetes manifest).
# Kubernetes sets HOSTNAME to the pod name, which resolves to the pod's IP address.
hostName = os.environ.get('HOSTNAME', '0.0.0.0')
serverPort = int(os.environ.get('PORT', 5010))
tokenServiceHost = os.environ.get('TOKEN_SERVICE_HOST', 'localhost')
tokenServicePort = int(os.environ.get('TOKEN_SERVICE_PORT', 5001))
plannerHost = os.environ.get('PLANNER_HOST')
plannerPort = int(os.environ.get('PLANNER_PORT', 12050))


def get_okera_token(user):
    """Request a token for user from the Auth Server and read it from the shared volume."""
    print("Getting token...")
    response = requests.get("http://%s:%s/%s" % (tokenServiceHost, tokenServicePort, user),
                            timeout=10)
    if response.status_code != 200:
        raise RuntimeError("An error (%d) occurred getting token: %s"
                           % (response.status_code, response.text))
    # The Auth Server returns the path of the token file, not the token itself.
    token_path = response.text.strip()
    print("Got token in", token_path)
    with open(token_path) as f:
        token = f.read().strip()
    print("Got token data:", token[:20], "...")
    return token


def get_user_from_auth(auth):
    """Return the username from a Basic auth header, e.g. "Basic dGVzdDp0ZXN0"."""
    scheme, _, payload = auth.partition(' ')
    if scheme.lower() != 'basic':
        raise ValueError("Unsupported authorization scheme: %s" % scheme)
    decoded = b64decode(payload).decode('utf-8')
    # For simplicity, the password is not verified.
    return decoded.split(':')[0]


def list_databases(auth):
    """Get a token for the authenticated user and list the databases visible to them."""
    ctx = context()
    user = get_user_from_auth(auth)
    token = get_okera_token(user)
    ctx.enable_token_auth(token_str=token)
    with ctx.connect(host=plannerHost, port=plannerPort) as conn:
        dbs = conn.list_databases()
    return dbs


class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        # Header lookups are case-insensitive. Do not log the headers:
        # the Authorization header contains the user's password.
        if 'authorization' in self.headers:
            try:
                dbs = list_databases(self.headers['authorization'])
            except Exception as e:
                print("ERROR:", e)
                self.send_response(500)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(bytes("Internal Server Error!", "utf-8"))
                return
            self.send_response(200)
            self.send_header("Content-type", "application/json")
            self.end_headers()
            self.wfile.write(bytes(json.dumps(dbs), "utf-8"))
        else:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="myservice"')
            self.send_header("Content-type", "text/plain")
            self.end_headers()
            self.wfile.write(bytes("Unauthorized!", "utf-8"))


def main(argv):
    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s" % (hostName, serverPort))

    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass

    webServer.server_close()
    print("Server stopped.")


# Main entrypoint
if __name__ == "__main__":
    main(sys.argv[1:])

Roughly, the parts of this Python script are:

The main() function starts the HTTP server with a request handler that reacts to GET requests via the do_GET() function.
The handler checks that the caller provides an HTTP Basic Authentication header, which looks like this:
```
Authorization: Basic <base64_encoded_username:password>
```
If one is provided, the handler invokes the list_databases() function, triggering the subsequent calls.
The get_user_from_auth() function decodes the header and extracts the username (for simplicity, without any further checks such as verifying the password).
The get_okera_token() function is called with the username and requests a token from the configured Auth Server endpoint.
It then reads the token from the file path returned by the Auth Server, which must be a location shared with the Auth Server.
The rest of the list_databases() function makes a call to the configured Okera cluster and returns the list of visible databases to the handler.
The handler returns the list or an error if something is wrong.
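The header format can be checked with a short, standalone Python sketch that uses only the standard library. The helper names are hypothetical: basic_auth_header() builds the value curl sends for a URL such as http://test:test@myservice:5010, and user_from_basic_auth() extracts the username in a similar way to get_user_from_auth().

```python
import base64


def basic_auth_header(username, password):
    """Build the Basic Authorization header value for the given credentials."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def user_from_basic_auth(header_value):
    """Extract the username from a Basic Authorization header value."""
    scheme, _, payload = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(payload).decode("utf-8")
    return decoded.split(":", 1)[0]


header = basic_auth_header("test", "test")
print(header)                        # Basic dGVzdDp0ZXN0
print(user_from_basic_auth(header))  # test
```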

The requirements.txt lists the two packages the script imports:

pyokera
requests

And the Dockerfile assembles the container image:

FROM python:3.10.5 AS builder

COPY requirements.txt /requirements.txt
RUN python3 -m venv /venv && \
    /venv/bin/pip install --disable-pip-version-check -r /requirements.txt

FROM python:3.10.5-alpine3.16 AS runner

ENV PORT=5010

COPY --from=builder /venv /venv
COPY ./src /app

RUN apk add --no-cache \
    libstdc++ \
    gcompat \
    curl

EXPOSE ${PORT}

CMD [ "/venv/bin/python", "-u", "/app/service.py" ]

This follows the approach explained in the PyOkera in a Container tutorial: it first installs PyOkera into a virtual environment using a full Python image, and then copies that environment into a lightweight, Alpine-based image. There, the script is copied in, the required system libraries are installed, and the service is started.

Build and push the image the same way as before:

$ docker build -t localhost:5001/myservice:v0.1 myservice
$ docker push localhost:5001/myservice:v0.1

Running in Kubernetes

Tying all of this together is the following Kubernetes manifest file, called k8s.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: authentication
type: Opaque
data:
  # $ cat keypair | base64 -w0
  JWT_PRIVATE_KEY: LS0tLS1CRUdJTiBSU0EgUFJJVk...S1FTkQgUlNBIFBSSVZBVEUgS0VZLS0tLS0K
---
apiVersion: v1
kind: Pod
metadata:
  name: myservice
  labels:
    run: myservice
spec:
  restartPolicy: Never

  volumes:
  - name: secrets
    secret:
      defaultMode: 420
      secretName: authentication
  - name: shared-data
    emptyDir: {}

  containers:
  - name: authserver
    image: localhost:5001/authserver:v0.1
    imagePullPolicy: Always
    env:
    - name: PORT
      value: "5002"
    - name: TOKEN_DIR
      value: "/usr/share/myservice/tokens"
    volumeMounts:
    - mountPath: /etc/secrets
      name: secrets
      readOnly: true
    - name: shared-data
      mountPath: /usr/share/myservice/tokens

  - name: myservice
    image: localhost:5001/myservice:v0.1
    imagePullPolicy: Always
    env:
    - name: PORT
      value: "5010"
    - name: PLANNER_HOST
      value: "okera.demo.com"
    - name: PLANNER_PORT
      value: "12050"
    - name: TOKEN_SERVICE_PORT
      value: "5002"
    - name: PYTHONUNBUFFERED
      value: "1"
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/myservice/tokens

Note: Replace the value for the JWT_PRIVATE_KEY key, shown shortened above, with the Base64-encoded private key whose matching public key is configured on your Okera cluster. The comment in the YAML file shows how to convert the PEM-encoded key into a Base64-encoded string. Also adjust the PLANNER_HOST setting and any ports that differ in your setup.
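If you prefer not to paste the Base64 string by hand, the Secret can also be generated. The following Python sketch (the function secret_manifest() is hypothetical) builds the same Secret as a dictionary; serialized with json.dumps(), it can be passed to kubectl apply -f, which accepts JSON as well as YAML. Alternatively, kubectl create secret generic authentication --from-file=JWT_PRIVATE_KEY=keypair creates an equivalent Secret directly.

```python
import base64
import json


def secret_manifest(private_key_pem, name="authentication", key="JWT_PRIVATE_KEY"):
    """Build the Secret from k8s.yaml with the PEM key Base64-encoded under data.

    Equivalent to placing the output of `cat keypair | base64 -w0` into the manifest.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {key: base64.b64encode(private_key_pem).decode("ascii")},
    }


# Only the first line of a PEM key is used here to keep the example short;
# its encoding starts with the same characters as the value shown in k8s.yaml.
manifest = secret_manifest(b"-----BEGIN RSA PRIVATE KEY-----\n")
print(json.dumps(manifest, indent=2))
```

With the real key, pass the bytes of the keypair file (read in binary mode) instead of the sample line.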

This manifest file does the following:

It creates a secret that contains the private key used to sign the tokens.
It sets up a volume so that the secret can be mounted into the Auth Server container.
It creates a volume for sharing the token files created by the Auth Server.
It creates a single pod with two containers, one for the Auth Server and the other for the example service.

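The key is that both containers share the pod's loopback interface (localhost) and the shared-data volume. Beyond that, environment variables configure each container; for example, both agree on the Auth Server port 5002, and the Auth Server writes its tokens into the shared volume.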

Apply the manifest with kubectl apply (assuming kubectl is configured for the proper Kubernetes cluster and any namespace details were added, if used):

$ kubectl apply -f k8s.yaml
...
$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
myservice   2/2     Running   0          10s

Testing the Service

Next, test whether the service can call the Okera cluster and retrieve the list of databases. Since the service is not exposed outside the pod, the easiest way is to use kubectl exec to run curl inside the myservice container, calling the example service with test as the username and test as the password:

$ kubectl exec -it myservice -c myservice -- curl http://test:test@myservice:5010
["default", "okera_sample", "okera_system", "okera_udfs"]

The output lists the databases from the Okera metadata that are visible to the test user, showing that the setup works. To test further, grant the test user permissions on additional databases in Okera and run the command again.

Conclusion

The source code of this tutorial is available in the authserver-example repository. It contains a script called run.sh in the root of the project that strings together the minimal commands to get everything running. It requires that the following tools be installed and accessible:

kind (see the notes earlier)
kubectl

kind automatically configures kubectl to talk to the local cluster.

Alternatively, you can run the kubectl commands against any other Kubernetes cluster, though likely not in the default namespace used here for simplicity. Deploying the example containers may then also require access to a shared container image registry.

In summary, the Auth Server is an option for anyone who needs a separate binary to issue Okera-compliant JWTs. How it is integrated into an actual setup may vary; the sidecar approach shown here is one example, which lends itself nicely to the way Kubernetes orchestrates applications.