Prerequisites
Azure AD Application
Okera requires an Azure AD Application to be registered and granted access to the underlying ADLS Gen2 storage resources.
To create an Azure AD Application, navigate to the Azure AD Application Registration blade:
- Select "New Registration".
- Choose a name for your registration, such as okera-azure-app. Leave the rest of the options at their defaults and press "Register".
- Store the "Application (client) ID"; this will be used later as your "ADLS Client ID".
- Store the "Directory (tenant) ID"; this will be used later as your "ADLS Tenant ID".
- Click on the "Endpoints" button and save the URL in "OAuth 2.0 token endpoint (v1)"; this will be used later as your "ADLS Refresh URL".
- Navigate to the "Certificates & secrets" tab, and press the "New client secret" button.
- Choose a name for your secret (e.g. okera-azure-secret) and an expiration time of "Never".
- Store the value of the generated secret; this will be used later as your "ADLS Client Password".
Note: Refer to Microsoft documentation for creating Azure AD Applications here: https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal
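As a cross-check on the value saved as the "ADLS Refresh URL", the v1 OAuth 2.0 token endpoint normally follows a fixed pattern built from the tenant ID. The sketch below assumes the Azure public cloud login host (login.microsoftonline.com). The function name adls_refresh_url is hypothetical and is not part of Okera or the Azure CLI. Other Azure clouds use a different host, so treat the URL shown in the portal as authoritative.

```shell
# Hypothetical helper: build the v1 OAuth 2.0 token endpoint for a tenant.
# Assumption: Azure public cloud login host.
adls_refresh_url() {
  tenant_id="$1"
  printf 'https://login.microsoftonline.com/%s/oauth2/token\n' "$tenant_id"
}

# Example with a placeholder tenant ID (replace with your "Directory (tenant) ID").
adls_refresh_url "00000000-0000-0000-0000-000000000000"
```

If the printed URL differs from the one copied from the "Endpoints" pane, the tenant ID or the endpoint was probably copied from a different directory.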
Granting permissions to ADLS Gen2
After your Azure AD application is created, grant it access to your ADLS Gen2 storage using a Contributor role. See the Microsoft Azure documentation on assigning roles.
Note: Refer to Microsoft documentation for granting access to ADLS Gen2 here: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model.
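The role assignment can also be made from the Azure CLI instead of the portal. The following is a minimal sketch, assuming the role is scoped to a single storage account and that the Azure CLI is installed and logged in. The function names storage_account_scope and grant_adls_access are hypothetical, and every value in angle brackets is a placeholder.

```shell
# Hypothetical helper: build the resource ID (scope) of a storage account.
# $1 = subscription ID, $2 = resource group, $3 = storage account name
storage_account_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' "$1" "$2" "$3"
}

# Hypothetical wrapper around `az role assignment create`.
# $1 = Application (client) ID, $2 = role name, $3 = scope
grant_adls_access() {
  az role assignment create --assignee "$1" --role "$2" --scope "$3"
}

# Example (requires a logged-in Azure CLI, so it is left commented out):
# scope="$(storage_account_scope "<subscription-id>" "<resource-group>" "<accountname>")"
# grant_adls_access "<ADLS Client ID>" "Contributor" "$scope"
```

Scoping the assignment to the storage account, rather than to the whole subscription, limits what the application can reach.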
Azure Data Lake Storage (ADLS)
Okera requires an ADLS Gen2 path where it can store audit and operational logs. This is referred to as Okera ADLS Storage.
If you do not already have ADLS Gen2 storage created, please refer to Microsoft documentation to create it.
Here are some sample ADLS Gen2 paths:
abfs://<file-system-or-container>@<accountname>.dfs.core.windows.net
abfss://<file-system-or-container>@<accountname>.dfs.core.windows.net/logs/azure.logs.okera.com/ops/
Your Azure AD application should have read/write permissions to this ADLS Gen2 path. For more information, see Access Azure Data Lake Storage Gen2 and Blob Storage.
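A malformed path is easy to catch before it is entered into the Okera configuration. The sketch below checks only the shape of the URI against the sample paths above. It assumes the public cloud suffix dfs.core.windows.net and does not verify that the account or container exists. The function name is_adls_gen2_path and the account name exampleaccount are hypothetical.

```shell
# Hypothetical helper: succeed if $1 looks like an ADLS Gen2 URI of the form
# abfs[s]://<file-system-or-container>@<accountname>.dfs.core.windows.net[/path]
is_adls_gen2_path() {
  printf '%s\n' "$1" |
    grep -Eq '^abfss?://[^@/]+@[^@/.]+\.dfs\.core\.windows\.net(/.*)?$'
}

# Check one well-formed path and one that uses the wrong scheme and endpoint.
for p in \
  "abfss://okera@exampleaccount.dfs.core.windows.net/logs/" \
  "https://exampleaccount.blob.core.windows.net/okera"
do
  if is_adls_gen2_path "$p"; then
    echo "ok:      $p"
  else
    echo "invalid: $p"
  fi
done
```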
Resource Group
Okera requires a resource group in which you create Okera resources such as your AKS cluster.
To create a resource group, navigate to the Resource Groups page in Microsoft Azure:
- Press the "Add" button.
- Choose the subscription to create this resource group in, and enter a name (e.g. okera-azure-rg).
- Choose the region you want this resource group to be created in. Note that this should be the same region as your ADLS storage.
- Press the "Next: Tags" button.
- Add any tags you need to the resource group, and press the "Next: Review + Create" button.
- Review the settings and press the "Create" button.
Database (Azure SQL)
Okera is backed by a relational database, and we strongly recommend using Azure SQL. Okera supports MySQL 5.6 and 5.7.
To create an Azure SQL database instance, see the Microsoft Azure documentation. Be sure to:
- Select either 5.6 or 5.7 as your MySQL version.
- Add any tags you need for the database.
After the database is created, use Azure to review and modify its settings:
- Store the "Server name"; this will be used later as your "Catalog DB URL".
- Store the "Server admin login name"; this will be used later as your "Catalog DB User".
- Ensure that "SSL Enforce Status" is set to "Enabled".
- Navigate to the "Server parameters" tab.
- Change the time_zone setting from SYSTEM to +00:00.
- Change the wait_timeout setting from 120 to 28800.
Be sure to save your changes.
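To confirm that the two server parameters took effect, query them with the mysql client and compare them against the values listed above. This is a minimal sketch, assuming the mysql command-line client is installed and can reach the server. The function name check_catalog_db_params is hypothetical, and the host and user in the commented usage are placeholders.

```shell
# Hypothetical helper: read "name<TAB>value" lines on stdin (the format that
# `mysql --batch --skip-column-names` emits) and verify the two settings.
# Exits 0 when both match, 1 otherwise.
check_catalog_db_params() {
  awk -F'\t' '
    $1 == "time_zone"    { tz = $2 }
    $1 == "wait_timeout" { wt = $2 }
    END {
      ok = 1
      if (tz != "+00:00") { print "time_zone is \"" tz "\", expected +00:00"; ok = 0 }
      if (wt != "28800")  { print "wait_timeout is \"" wt "\", expected 28800"; ok = 0 }
      exit (ok ? 0 : 1)
    }'
}

# Usage against a live server (prompts for a password, so it is commented out):
# mysql -h "<server name>" -u "<server admin login name>" -p --batch --skip-column-names \
#   -e "SHOW VARIABLES WHERE Variable_name IN ('time_zone', 'wait_timeout')" |
#   check_catalog_db_params && echo "server parameters look correct"
```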
Azure Kubernetes Service (AKS)
Okera runs on top of Kubernetes, and on Azure, we leverage AKS as our managed Kubernetes runtime.
Follow the Microsoft documentation for creating AKS clusters, which you can find here: https://learn.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal. For reference, a shortened version of these instructions is provided below.
To create an AKS cluster, navigate to the "Kubernetes Services" blade:
- Press the "Add" button.
- Select the same subscription and Resource Group (e.g. okera-azure-rg) that you created previously.
- Choose a name for your cluster, e.g. okera-azure-aks.
- Set the "Region" for this cluster to the same region as the rest of your resources (e.g. East US 2).
- Select the "Kubernetes Version" to use. The minimum supported version is 1.12.7.
- For the node size, select one of Standard_B8ms, Standard_D16_v3, or Standard_F32s_v2.
- For the node count, select the number of nodes you wish to have in your cluster (for production use we recommend at least 3).
- Press the "Next: Node Pools" button.
- Select Disabled for both "Virtual Nodes" and "VM Scale Sets".
- Press the "Next: Authentication" button.
- Leave the defaults selected: creating a new service principal, and Enabled for "Enable RBAC".
- Press the "Next: Networking" button.
- Leave the defaults selected: No for "HTTP Application Routing" and Basic for "Network Configuration".
- Press the "Next: Integrations" button.
- Leave the defaults selected: Enabled for "Container monitoring" and the default workspace.
- Press the "Next: Tags" button.
- Add any tags you need to the Kubernetes cluster, and press the "Next: Review + Create" button.
- Review the settings and press the "Create" button.
Once the AKS cluster is created, you will need to be able to access it using kubectl from a machine, which we call the deployer node.
To do this, install and configure the Azure CLI on that machine, and then execute the following command:
$ az aks get-credentials --resource-group <your resource group> --name <your aks cluster>
Once complete, you should be able to use kubectl normally, e.g.:
$ kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-agentpool-38801922-0 Ready agent 5d11h v1.13.10 10.240.0.35 <none> Ubuntu 16.04.6 LTS 4.15.0-1052-azure docker://3.0.6
aks-agentpool-38801922-1 Ready agent 5d11h v1.13.10 10.240.0.66 <none> Ubuntu 16.04.6 LTS 4.15.0-1052-azure docker://3.0.6
aks-agentpool-38801922-2 Ready agent 5d11h v1.13.10 10.240.0.4 <none> Ubuntu 16.04.6 LTS 4.15.0-1052-azure docker://3.0.6
You can also see all available Kubernetes contexts:
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
Okera-Demo-AKS Okera-Demo-AKS clusterUser_Okera-Demo_Okera-Demo-AKS
* Okera-Demo-AKS2 Okera-Demo-AKS2 clusterUser_Okera-Demo2_Okera-Demo-AKS2
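When scripting the deployer node setup, it helps to confirm that enough nodes are Ready before continuing. The sketch below counts Ready nodes from kubectl get nodes output. The function name count_ready_nodes is hypothetical, and the threshold of 3 mirrors the production recommendation above.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes` output (including its header line) on stdin.
# Nodes with a compound status such as "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (requires kubectl access, so it is commented out):
# ready="$(kubectl get nodes | count_ready_nodes)"
# [ "$ready" -ge 3 ] || echo "only $ready node(s) Ready" >&2
```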
VNet Access and Peering
By default, a dedicated VNet is created with your AKS cluster (you can customize this VNet during creation if desired). This VNet is not accessible from the outside world, and the Okera resources are only accessible from within it.
If you would like to access the resources from a different VNet (e.g. where you may be running other Azure services such as Azure Databricks), you will need to peer the two VNets together. You can read more about VNet peering in the Microsoft documentation: https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview.
Time Zone Considerations
If you encounter an error regarding a time zone difference between Okera and Azure SQL (the error might indicate you should set your time zone to Pacific time), edit the Azure SQL server parameters and change the time_zone setting to UTC (+00:00).
Note: If you are using a MySQL Hive metastore (HMS) for Okera, you can also specify the time zone using the MySQL connectionTimeZone URL option in the Okera CATALOG_DB_CONN_PARAMS configuration parameter.