
Prerequisites

Azure AD Application

Okera requires an Azure AD Application to be registered and granted access to the underlying ADLS Gen2 storage resources.

To create an Azure AD Application, navigate to the Azure AD App registrations blade:

  1. Select "New Registration".
  2. Choose a name for your registration, such as okera-azure-app. Leave the rest of the options at their defaults and press "Register".
  3. Store the "Application (client) ID"; you will use it later as your "ADLS Client ID".
  4. Store the "Directory (tenant) ID"; you will use it later as your "ADLS Tenant ID".
  5. Click the "Endpoints" button and save the URL in "OAuth 2.0 token endpoint (v1)"; you will use it later as your "ADLS Refresh URL".
  6. Navigate to the "Certificates & secrets" tab, and press the "New client secret" button.
  7. Choose a name for your secret (e.g. okera-azure-secret) and an expiration time of "Never". If your portal does not offer "Never", choose the longest period available and rotate the secret before it expires.
  8. Copy the value of the generated secret immediately, because the portal does not show it again after you leave the page; you will use it later as your "ADLS Client Password".
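The values from steps 3 to 5 and 8 fit together as an OAuth 2.0 client credentials request. The v1 token endpoint has the form https://login.microsoftonline.com/<tenant-id>/oauth2/token, so your "ADLS Refresh URL" embeds your "ADLS Tenant ID". The sketch below only builds such a request and sends nothing; the helper names are hypothetical, and the assumption that tokens are requested for the https://storage.azure.com/ resource in exactly this way is not something this guide states.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: tokens are requested for Azure Storage via the v1 endpoint.
STORAGE_RESOURCE = "https://storage.azure.com/"


def token_endpoint_v1(tenant_id: str) -> str:
    """Hypothetical helper: the v1 token endpoint ("ADLS Refresh URL") for a tenant."""
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"


def client_credentials_body(client_id: str, client_secret: str) -> str:
    """Hypothetical helper: URL-encode a client credentials grant (not sent)."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # your "ADLS Client ID"
        "client_secret": client_secret,  # your "ADLS Client Password"
        "resource": STORAGE_RESOURCE,    # v1 endpoints take a resource, not a scope
    })


if __name__ == "__main__":
    # Placeholder values; substitute the IDs you stored above.
    print(token_endpoint_v1("<tenant-id>"))
    print(parse_qs(client_credentials_body("<client-id>", "<secret>"))["grant_type"])
```

Comparing token_endpoint_v1() with the URL you saved in step 5 is a quick way to confirm you copied the tenant ID correctly.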

Note: Refer to Microsoft documentation for creating Azure AD Applications here: https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal

Granting permissions to ADLS Gen2

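After your Azure AD application is created, grant it access to your ADLS Gen2 storage using the Storage Blob Data Contributor role. The general Contributor role lets a principal manage the storage account but does not by itself grant access to the data through Azure AD. See the Microsoft Azure documentation on assigning roles.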

Note: Refer to Microsoft documentation for granting access to ADLS Gen2 here: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model.
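A role assignment is made at a scope, which can be the whole storage account or a single container. The sketch below builds those scope strings in the standard Azure Resource Manager format; the function name and the subscription, resource group, account, and container values are hypothetical placeholders.

```python
from typing import Optional


def storage_scope(subscription_id: str, resource_group: str,
                  account: str, container: Optional[str] = None) -> str:
    """Hypothetical helper: ARM scope for a storage account or one container."""
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )
    if container:
        # Narrow the assignment to a single container (file system).
        scope += f"/blobServices/default/containers/{container}"
    return scope
```

The resulting string can be passed to the Azure CLI, e.g. `az role assignment create --assignee <ADLS Client ID> --role "<role>" --scope <scope>`, where `<role>` is the role named above. Scoping to a container limits the application to that file system rather than the entire account.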

Azure Data Lake Storage (ADLS)

Okera requires an ADLS Gen2 path where it can store audit and operational logs. This is referred to as Okera ADLS Storage. If you do not already have ADLS Gen2 storage created, please refer to Microsoft documentation to create it.

Here are some sample ADLS Gen2 paths (abfss:// is the TLS-encrypted form of abfs://):

abfs://<file-system-or-container>@<accountname>.dfs.core.windows.net 

abfss://<file-system-or-container>@<accountname>.dfs.core.windows.net/logs/azure.logs.okera.com/ops/
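Both sample paths follow the pattern `<scheme>://<container>@<account>.dfs.core.windows.net/<path>`. As a minimal sketch for sanity-checking a path before entering it, the hypothetical helper below splits such a URI into its parts using only the standard library; it checks the overall shape, not Azure's full naming rules.

```python
from urllib.parse import urlparse

DFS_SUFFIX = ".dfs.core.windows.net"


def parse_abfs_path(uri: str) -> dict:
    """Hypothetical helper: split an abfs:// or abfss:// URI into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    host = parsed.hostname or ""  # urlparse lowercases the host name
    if not host.endswith(DFS_SUFFIX) or not parsed.username:
        raise ValueError(f"expected <container>@<account>{DFS_SUFFIX}: {uri}")
    return {
        "tls": parsed.scheme == "abfss",      # abfss uses TLS
        "container": parsed.username,         # the part before '@'
        "account": host[: -len(DFS_SUFFIX)],  # storage account name
        "path": parsed.path or "/",           # directory inside the container
    }
```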

Your Azure AD application should have read/write permissions to this ADLS Gen2 path. For more information, see Access Azure Data Lake Storage Gen2 and Blob Storage.

Resource Group

Okera requires a resource group in which you create Okera resources such as your AKS cluster.

To create a resource group, navigate to the Resource Groups page in Microsoft Azure:

  1. Press the "Add" button.
  2. Choose the subscription to create this resource group in, and enter a name (e.g. okera-azure-rg).
  3. Choose the region you want this resource group to be created in. Note that this should be in the same region as your ADLS storage.
  4. Press the "Next: Tags" button.
  5. Add any tags you need to the resource group, and press the "Next: Review + Create" button.
  6. Review the settings and press the "Create" button.
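Step 2 asks for a resource group name, and the portal rejects names that break Azure's naming rules. As we understand those rules (1 to 90 characters; letters, digits, underscores, hyphens, periods, and parentheses; no trailing period), they can be checked ahead of time with the hypothetical helper below. Treat the rule set as an assumption and confirm it against Azure's resource naming documentation.

```python
import re

# Assumed rule: 1-90 chars of word characters (letters, digits, underscore),
# parentheses, hyphens, or periods, and the last character is not a period.
_RG_NAME = re.compile(r"[\w().-]{0,89}[\w()-]")


def is_valid_resource_group_name(name: str) -> bool:
    """Hypothetical helper: check a name against the assumed rules above."""
    return _RG_NAME.fullmatch(name) is not None
```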

Database (Azure Database for MySQL)

Okera is backed by a relational database, and we strongly recommend using Azure Database for MySQL. Okera supports MySQL 5.6 and 5.7.

To create an Azure Database for MySQL server, see the Microsoft Azure documentation. Be sure to:

  • Select either 5.6 or 5.7 as your MySQL version.
  • Add any tags you need for the database.

After the database is created, use the Azure portal to review and modify its settings:

  1. Store the "Server name", this will be used later as your "Catalog DB URL".
  2. Store the "Server admin login name", this will be used later as your "Catalog DB User".
  3. Ensure that "SSL Enforce Status" is set to "Enabled".
  4. Navigate to the "Server parameters" tab.
  5. Change the time_zone setting from SYSTEM to +00:00.
  6. Change the wait_timeout setting from 120 to 28800.

Be sure to save your changes.
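After saving, you can confirm the two server parameters from any MySQL client, for example with `mysql -B -N -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('time_zone', 'wait_timeout')"`, which prints tab-separated name/value rows. The sketch below parses that output and reports any parameter that still differs from the values above; the function names are hypothetical.

```python
# Values required by the steps above.
REQUIRED_PARAMS = {"time_zone": "+00:00", "wait_timeout": "28800"}


def parse_show_variables(output: str) -> dict:
    """Hypothetical helper: parse tab-separated `mysql -B -N` name/value rows."""
    values = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition("\t")
        values[name] = value
    return values


def param_mismatches(actual: dict) -> dict:
    """Return {name: (actual, required)} for every parameter not yet set."""
    return {
        name: (actual.get(name), required)
        for name, required in REQUIRED_PARAMS.items()
        if actual.get(name) != required
    }
```

An empty result from param_mismatches() means both changes took effect.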

Azure Kubernetes Service (AKS)

Okera runs on top of Kubernetes, and on Azure, we leverage AKS as our managed Kubernetes runtime.

You should follow Microsoft documentation for creating AKS clusters, which you can find here: https://learn.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal. For reference, a shortened version of these instructions follows.

To create an AKS cluster, navigate to the "Kubernetes Services" blade:

  1. Press the "Add" button.
  2. Select the same subscription and Resource Group (e.g. okera-azure-rg) that you had previously created.
  3. Choose a name for your cluster, e.g. okera-azure-aks.
  4. Select the "Region" for this cluster to be in the same region as the rest of your resources (e.g. East US 2).
  5. Select the "Kubernetes Version" to use. The minimum supported version is 1.12.7.
  6. For the node size, select one of Standard_B8ms, Standard_D16_v3, Standard_F32s_v2.
  7. For the node count, select the number of nodes you wish to have in your cluster (for production uses we recommend at least 3).
  8. Press the "Next: Node Pools" button.
  9. Select Disabled for both "Virtual Nodes" and "VM Scale Sets".
  10. Press the "Next: Authentication" button.
  11. Leave the defaults: create a new service principal, and keep "Enable RBAC" set to Enabled.
  12. Press the "Next: Networking" button.
  13. Leave the defaults: No for "HTTP Application Routing" and Basic for "Network Configuration".
  14. Press the "Next: Integrations" button.
  15. Leave the defaults: Enabled for "Container monitoring", using the default workspace.
  16. Press the "Next: Tags" button.
  17. Add any tags you need to the Kubernetes cluster, and press the "Next: Review + Create" button.
  18. Review the settings and press the "Create" button.
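Steps 5 to 7 set three constraints: Kubernetes 1.12.7 or later, one of three node sizes, and at least 3 nodes for production. As a minimal sketch, the hypothetical checker below encodes exactly those constraints so a planned configuration can be reviewed before pressing "Create"; it does not query Azure.

```python
from dataclasses import dataclass

# Constraints taken from the steps above.
SUPPORTED_NODE_SIZES = {"Standard_B8ms", "Standard_D16_v3", "Standard_F32s_v2"}
MIN_K8S_VERSION = (1, 12, 7)
MIN_PRODUCTION_NODES = 3


@dataclass
class AksPlan:
    """Hypothetical description of the cluster you intend to create."""
    kubernetes_version: str
    node_size: str
    node_count: int
    production: bool = True


def parse_version(version: str) -> tuple:
    """Turn '1.13.10' (or 'v1.13.10') into (1, 13, 10) for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def plan_problems(plan: AksPlan) -> list:
    """Return a human-readable list of constraint violations (empty if none)."""
    problems = []
    if parse_version(plan.kubernetes_version) < MIN_K8S_VERSION:
        problems.append(f"Kubernetes {plan.kubernetes_version} is below 1.12.7")
    if plan.node_size not in SUPPORTED_NODE_SIZES:
        problems.append(f"node size {plan.node_size} is not a listed option")
    if plan.production and plan.node_count < MIN_PRODUCTION_NODES:
        problems.append(f"{plan.node_count} nodes; production should use at least 3")
    return problems
```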

Once the AKS cluster is created, you will need to be able to access it using kubectl from a machine, which we call the deployer node. To do this, install and configure the Azure CLI on that machine, and then execute the following command:

$ az aks get-credentials --resource-group <your resource group> --name <your aks cluster>

Once complete, you should be able to use kubectl normally, e.g.:

$ kubectl get nodes -owide
NAME                       STATUS   ROLES   AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-38801922-0   Ready    agent   5d11h   v1.13.10   10.240.0.35   <none>        Ubuntu 16.04.6 LTS   4.15.0-1052-azure   docker://3.0.6
aks-agentpool-38801922-1   Ready    agent   5d11h   v1.13.10   10.240.0.66   <none>        Ubuntu 16.04.6 LTS   4.15.0-1052-azure   docker://3.0.6
aks-agentpool-38801922-2   Ready    agent   5d11h   v1.13.10   10.240.0.4    <none>        Ubuntu 16.04.6 LTS   4.15.0-1052-azure   docker://3.0.6
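The STATUS column above comes from each node's "Ready" condition. For scripted checks, `kubectl get nodes -o json` exposes the same data under `items[].status.conditions`. The sketch below, with a hypothetical function name, takes that JSON text and returns the nodes that are not Ready, so an empty list means the cluster is ready for deployment.

```python
import json


def not_ready_nodes(nodes_json: str) -> list:
    """Hypothetical helper: names of nodes whose Ready condition is not "True"."""
    data = json.loads(nodes_json)
    not_ready = []
    for node in data.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition is treated as not ready.
        if ready is None or ready.get("status") != "True":
            not_ready.append(node["metadata"]["name"])
    return not_ready
```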

You can also see all available Kubernetes contexts:

$ kubectl config get-contexts
CURRENT   NAME                                                  CLUSTER                                               AUTHINFO                                                   NAMESPACE
        Okera-Demo-AKS                                        Okera-Demo-AKS                                        clusterUser_Okera-Demo_Okera-Demo-AKS
*         Okera-Demo-AKS2                                       Okera-Demo-AKS2                                       clusterUser_Okera-Demo2_Okera-Demo-AKS2
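The `*` marks the current context, i.e. the cluster that kubectl commands will target; `kubectl config use-context <name>` switches it. To confirm the target programmatically, `kubectl config view -o json` returns the same information as a `current-context` field plus a `contexts` list. The hypothetical helper below resolves the current context to its cluster and user.

```python
import json


def current_context(config_json: str) -> dict:
    """Hypothetical helper: resolve current-context from `kubectl config view -o json`."""
    config = json.loads(config_json)
    name = config.get("current-context", "")
    for ctx in config.get("contexts", []):
        if ctx.get("name") == name:
            # Merge the context name with its cluster/user entries.
            return {"name": name, **ctx.get("context", {})}
    raise LookupError(f"current context {name!r} not found")
```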

VNet Access and Peering

By default, a dedicated VNet is created with your AKS cluster (you can customize this VNet during creation if desired). This VNet is not reachable from outside, so the Okera resources are only accessible from within it.

If you would like to access the resources from a different VNet (e.g. where you may be running other Azure services such as Azure Databricks), you will need to peer the two VNets together. You can read more about VNet peering on the Microsoft documentation: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview.
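Azure can only peer VNets whose address spaces do not overlap, so it is worth checking the two VNets' address prefixes before attempting to peer them. The sketch below does this with Python's standard `ipaddress` module; the function name and the example prefixes in the test are hypothetical, so substitute the address spaces shown on each VNet's overview page.

```python
import ipaddress
from itertools import product


def overlapping_prefixes(vnet_a: list, vnet_b: list) -> list:
    """Hypothetical helper: pairs of CIDR prefixes that overlap between two VNets."""
    nets_a = [ipaddress.ip_network(p) for p in vnet_a]
    nets_b = [ipaddress.ip_network(p) for p in vnet_b]
    return [
        (str(a), str(b))
        for a, b in product(nets_a, nets_b)
        if a.version == b.version and a.overlaps(b)  # compare IPv4 with IPv4 only
    ]
```

An empty result means the address spaces are disjoint; any returned pair has to be resolved (usually by re-creating one VNet with a different range) before peering will succeed.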

Time Zone Considerations

If you encounter an error about a time zone mismatch between Okera and Azure Database for MySQL (the error might suggest setting your time zone to Pacific time), edit the server parameters in Azure and change the time_zone setting to +00:00 (UTC).

Figure: Azure Database for MySQL time zone settings

Note: If you are using a MySQL Hive metastore (HMS) for Okera, you can also specify the time zone using the MySQL connectionTimeZone URL option in the Okera CATALOG_DB_CONN_PARAMS configuration parameter.