Prerequisites

Azure AD Application

ODAS requires an Azure AD Application to be registered and granted access to the underlying ADLS storage resources (both Gen1 and Gen2).

To create an Azure AD Application, navigate to the Azure AD Application Registration blade:

  1. Select "New Registration".
  2. Choose a name for your registration, such as odas-azure-app. Leave the rest of the options in their defaults and press "Register".
  3. Store the "Application (client) ID"; it will be used later as your "ADLS Client ID".
  4. Store the "Directory (tenant) ID"; it will be used later as your "ADLS Tenant ID".
  5. Click the "Endpoints" button and save the URL in "OAuth 2.0 token endpoint (v1)"; it will be used later as your "ADLS Refresh URL".
  6. Navigate to the "Certificates & secrets" tab, and press the "New client secret" button.
  7. Choose a name for your secret (e.g. odas-azure-secret) and an expiration time of "Never".
  8. Store the value of the generated secret; it will be used later as your "ADLS Client Password".
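
The values collected in steps 4 and 5 are related: the v1 token endpoint is derived from the tenant ID. The following is a minimal sketch, assuming the public Azure cloud (login.microsoftonline.com). The tenant ID is a placeholder, and the variable names are illustrative rather than actual ODAS setting names.

```shell
# Placeholder tenant ID; substitute the "Directory (tenant) ID" from step 4.
ADLS_TENANT_ID="00000000-0000-0000-0000-000000000000"

# On the public Azure cloud, the OAuth 2.0 token endpoint (v1) has this shape.
ADLS_REFRESH_URL="https://login.microsoftonline.com/${ADLS_TENANT_ID}/oauth2/token"

echo "ADLS Refresh URL: ${ADLS_REFRESH_URL}"
```

Comparing the printed URL with the one shown under "Endpoints" is a quick way to confirm that you copied the right tenant ID.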

Note

You can refer to Microsoft documentation for creating Azure AD Applications here: https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal

Granting permissions to ADLS Gen1

Once you have your Azure AD Application, you need to grant it access to your ADLS Gen1 storage.

To do this, navigate to the Data Lake Storage Gen1 blade:

  1. Click the storage account you wish to add access to.
  2. Navigate to the "Access control (IAM)" tab.
  3. Press the "Add" button, selecting "Add role assignment".
  4. Select the "Contributor" role, search for the name of your Azure AD Application (e.g. odas-azure-app), and press "Save".
  5. Navigate to the "Data Explorer" tab, browse to the directory to which you want to grant access, and press the "Access" button.
  6. In the "Access" modal, press the "Add" button.
  7. Search for the name of your Azure AD Application (e.g. odas-azure-app) and press "Select".
  8. Grant Read, Write and Execute permissions, select "This folder and all children" and "An access permission entry", and press "Ok".
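
If you prefer the Azure CLI to the portal, the steps above can be approximated as follows. This is a sketch only: every value is a placeholder, the variable and function names are hypothetical, and it assumes an Azure CLI that provides the `az dls` (Data Lake Storage Gen1) command group. It sets the entry on the one folder and does not reproduce the portal's "This folder and all children" option.

```shell
# Placeholders; replace each with your own values.
ADLS_ACCOUNT="mycompany"                           # Gen1 account name (assumed)
ADLS_PATH="/odas"                                  # directory to grant access to
APP_ID="<application-client-id>"                   # from the app registration
SP_OBJECT_ID="<service-principal-object-id>"       # object ID of the app's service principal
ACCOUNT_RESOURCE_ID="<adls-account-resource-id>"   # full resource ID of the account

# ACL entry granting read, write and execute to the service principal.
ACL_SPEC="user:${SP_OBJECT_ID}:rwx"

grant_gen1_access() {
  # Steps 3-4: assign the Contributor role on the storage account.
  az role assignment create \
    --assignee "$APP_ID" \
    --role "Contributor" \
    --scope "$ACCOUNT_RESOURCE_ID"

  # Steps 5-8: add an access ACL entry on the directory itself.
  az dls fs access set-entry \
    --account "$ADLS_ACCOUNT" \
    --path "$ADLS_PATH" \
    --acl-spec "$ACL_SPEC"
}
```

Run `grant_gen1_access` after `az login`. Existing children of the folder still need their permissions applied separately, for example through the portal.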

Note

You can refer to Microsoft documentation for granting access to ADLS Gen1 here: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data

Azure Data Lake Storage (ADLS)

ODAS requires an ADLS path where it can store audit and operational logs, e.g. adl://<company>.azuredatalakestorage.net/odas. We will refer to this as the ODAS ADLS Storage. If you do not already have ADLS storage created, please refer to Microsoft documentation to create it: https://docs.microsoft.com/en-us/azure/data-lake-store/

The Azure AD Application you created should have read/write permissions to this path.

Note

ODAS currently has full support for ADLS Gen1 and experimental support for Gen2. Gen2 is only supported for data that is to be queried, and cannot be used for log storage.

Resource Group

You will need a resource group where you will create ODAS resources such as your AKS cluster.

To create a Resource Group, navigate to the Resource Groups blade:

  1. Press the "Add" button.
  2. Choose the subscription to create this resource group in, and put in a name (e.g. odas-azure-rg).
  3. Choose the region you want this resource group to be created in. Note that this should be in the same region as your ADLS storage.
  4. Press the "Next: Tags" button.
  5. Add any tags you need to the resource group, and press the "Next: Review + Create" button.
  6. Review the settings and press the "Create" button.
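
The same resource group can be created from the Azure CLI. A minimal sketch, assuming you have already run `az login`; the function name is illustrative, and `eastus2` is only an example region that you should replace with the region of your ADLS storage.

```shell
RESOURCE_GROUP="odas-azure-rg"   # example name used in this guide
LOCATION="eastus2"               # example region; match your ADLS storage region

create_resource_group() {
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
}
```

Call `create_resource_group` once; `az group create` also accepts `--tags key=value` if you need the tags from step 5.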

Database (Azure SQL)

ODAS is backed by a relational database, and we strongly recommend using Azure's managed MySQL service (Azure Database for MySQL). ODAS supports MySQL 5.6 and 5.7.

To create an Azure Database for MySQL server, navigate to the Azure Database for MySQL servers blade:

  1. Press the "Add" button.
  2. Select the same subscription and Resource Group (e.g. odas-azure-rg) that you had previously created.
  3. Choose a name for your database (e.g. odas-azure-sql) and an administrator username and password for it.
  4. Set the "Location" to the same location as the rest of your resources (e.g. East US 2).
  5. Select either 5.6 or 5.7 as your "Version".
  6. For the server resources, choose at least 4 cores and 100 GB of storage.
  7. Press the "Next: Tags" button.
  8. Add any tags you need to the database, and press the "Next: Review + Create" button.
  9. Review the settings and press the "Create" button.
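
The portal steps above roughly correspond to the following CLI sketch. It assumes an Azure CLI version that still provides the `az mysql server` (single server) command group. The administrator name and the `GP_Gen5_4` SKU (General Purpose, 4 vCores) are example choices, not ODAS requirements beyond the minimums listed above.

```shell
RESOURCE_GROUP="odas-azure-rg"
DB_SERVER="odas-azure-sql"
LOCATION="eastus2"                  # example region
DB_ADMIN_USER="odasadmin"           # hypothetical administrator name

STORAGE_GB=100
STORAGE_MB=$((STORAGE_GB * 1024))   # the CLI expects the storage size in megabytes

create_catalog_db() {
  # $1: administrator password (passed as an argument so it is not hard-coded here)
  az mysql server create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$DB_SERVER" \
    --location "$LOCATION" \
    --admin-user "$DB_ADMIN_USER" \
    --admin-password "$1" \
    --sku-name "GP_Gen5_4" \
    --storage-size "$STORAGE_MB" \
    --version "5.7" \
    --ssl-enforcement Enabled
}
```

Invoke it as `create_catalog_db '<password>'` after `az login`.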

After the database is created, navigate to the Azure Database for MySQL servers blade:

  1. Select your database, e.g. odas-azure-sql.
  2. Store the "Server name"; it will be used later as your "Catalog DB URL".
  3. Store the "Server admin login name"; it will be used later as your "Catalog DB User".
  4. Ensure that "SSL Enforce Status" is set to "Enabled".
  5. Navigate to the "Server parameters" tab.
  6. Change the time_zone setting from SYSTEM to +00:00.
  7. Change the wait_timeout setting from 120 to 28800.
  8. Press the "Save" button.
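
As a rough CLI equivalent of the steps above, the sketch below derives the two values you need to store and applies the two server parameters. It assumes a single-server Azure Database for MySQL instance, where the server name has the form `<server>.mysql.database.azure.com` and the admin login has the form `<user>@<server>`. The administrator name and the function name are hypothetical.

```shell
RESOURCE_GROUP="odas-azure-rg"
DB_SERVER="odas-azure-sql"
DB_ADMIN_USER="odasadmin"   # hypothetical administrator name

# Values shown in the portal as "Server name" and "Server admin login name".
CATALOG_DB_URL="${DB_SERVER}.mysql.database.azure.com"
CATALOG_DB_USER="${DB_ADMIN_USER}@${DB_SERVER}"

set_db_parameters() {
  # Step 6: use UTC instead of the SYSTEM time zone.
  az mysql server configuration set \
    --resource-group "$RESOURCE_GROUP" --server-name "$DB_SERVER" \
    --name time_zone --value "+00:00"

  # Step 7: raise the idle connection timeout to 28800 seconds (8 hours).
  az mysql server configuration set \
    --resource-group "$RESOURCE_GROUP" --server-name "$DB_SERVER" \
    --name wait_timeout --value 28800
}

echo "Catalog DB URL:  ${CATALOG_DB_URL}"
echo "Catalog DB User: ${CATALOG_DB_USER}"
```

Unlike the portal, where you press "Save" once, each `configuration set` call applies its change immediately.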

Azure Kubernetes Service (AKS)

ODAS runs on top of Kubernetes, and on Azure, we leverage AKS as our managed Kubernetes runtime.

You should follow the Microsoft documentation for creating AKS clusters (https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal). For reference, a shortened version of these instructions is provided below.

To create an AKS cluster, navigate to the "Kubernetes Services" blade:

  1. Press the "Add" button.
  2. Select the same subscription and Resource Group (e.g. odas-azure-rg) that you had previously created.
  3. Choose a name for your cluster, e.g. odas-azure-aks.
  4. Set the "Region" for this cluster to the same region as the rest of your resources (e.g. East US 2).
  5. Select the "Kubernetes Version" to use. The minimum supported version is 1.12.7.
  6. For the node size, select one of Standard_B8ms, Standard_D16_v3, Standard_F32s_v2.
  7. For the node count, select the number of nodes you wish to have in your cluster (for production use we recommend at least 3).
  8. Press the "Next: Node Pools" button.
  9. Select Disabled for both "Virtual Nodes" and "VM Scale Sets".
  10. Press the "Next: Authentication" button.
  11. Leave the defaults selected: create a new service principal, and Enabled for "Enable RBAC".
  12. Press the "Next: Networking" button.
  13. Leave the defaults selected: No for "HTTP Application Routing" and Basic for "Network Configuration".
  14. Press the "Next: Integrations" button.
  15. Leave the defaults selected: Enabled for "Container monitoring" and the default workspace.
  16. Press the "Next: Tags" button.
  17. Add any tags you need to the Kubernetes cluster, and press the "Next: Review + Create" button.
  18. Review the settings and press the "Create" button.
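
A cluster with the same core settings can also be created from the Azure CLI. The sketch below is an approximation under stated assumptions: the function names are illustrative, the region and node size are examples taken from the steps above, and portal-only choices (such as container monitoring) are left at the CLI's own defaults, which may differ from the portal's. It first checks the chosen Kubernetes version against the 1.12.7 minimum.

```shell
RESOURCE_GROUP="odas-azure-rg"
AKS_CLUSTER="odas-azure-aks"
LOCATION="eastus2"             # example region
MIN_K8S_VERSION="1.12.7"       # minimum version supported by ODAS
K8S_VERSION="1.13.10"          # example; must be a version AKS offers in your region
NODE_SIZE="Standard_D16_v3"    # one of the node sizes listed in step 6
NODE_COUNT=3

# Succeeds when version $1 is at least version $2 (numeric major.minor.patch).
version_at_least() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

create_aks_cluster() {
  version_at_least "$K8S_VERSION" "$MIN_K8S_VERSION" || {
    echo "Kubernetes $K8S_VERSION is below the supported minimum $MIN_K8S_VERSION" >&2
    return 1
  }
  az aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$AKS_CLUSTER" \
    --location "$LOCATION" \
    --kubernetes-version "$K8S_VERSION" \
    --node-vm-size "$NODE_SIZE" \
    --node-count "$NODE_COUNT" \
    --generate-ssh-keys
}
```

`az aks get-versions --location "$LOCATION"` lists the Kubernetes versions currently available in a region, which is useful for picking a valid `K8S_VERSION` before calling `create_aks_cluster`.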

Once the AKS cluster is created, you will need to be able to access it using kubectl from a machine, which we call the deployer node. To do this, install and configure the Azure CLI on that machine, and then execute the following command:

$ az aks get-credentials --resource-group <your resource group> --name <your aks cluster>

Once complete, you should be able to use kubectl normally, e.g.:

$ kubectl get nodes -owide
NAME                       STATUS   ROLES   AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-38801922-0   Ready    agent   5d11h   v1.13.10   10.240.0.35   <none>        Ubuntu 16.04.6 LTS   4.15.0-1052-azure   docker://3.0.6
aks-agentpool-38801922-1   Ready    agent   5d11h   v1.13.10   10.240.0.66   <none>        Ubuntu 16.04.6 LTS   4.15.0-1052-azure   docker://3.0.6
aks-agentpool-38801922-2   Ready    agent   5d11h   v1.13.10   10.240.0.4    <none>        Ubuntu 16.04.6 LTS   4.15.0-1052-azure   docker://3.0.6
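
To check node health from a script rather than by eye, you can count the nodes whose STATUS column is `Ready`. A small sketch; the helper name `count_ready_nodes` is our own and not part of kubectl.

```shell
# Reads `kubectl get nodes --no-headers` output on stdin and prints the number
# of nodes whose STATUS column is exactly "Ready". Cordoned nodes, which report
# "Ready,SchedulingDisabled", are deliberately not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (requires kubectl and a configured context):
#   kubectl get nodes --no-headers | count_ready_nodes
```

For the three-node cluster shown above, the expected count is 3.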

You can also see all available Kubernetes contexts:

$ kubectl config get-contexts
CURRENT   NAME                                                  CLUSTER                                               AUTHINFO                                                   NAMESPACE
          Okera-Demo-AKS                                        Okera-Demo-AKS                                        clusterUser_Okera-Demo_Okera-Demo-AKS
*         Okera-Demo-AKS2                                       Okera-Demo-AKS2                                       clusterUser_Okera-Demo2_Okera-Demo-AKS2

VNet Access and Peering

By default, a dedicated VNet is created with your AKS cluster (you can customize this VNet during creation if desired). This VNet is not accessible from the outside world, so the ODAS resources are only accessible from within it.

If you would like to access the resources from a different VNet (e.g. where you may be running other Azure services such as Azure Databricks), you will need to peer the two VNets together. You can read more about VNet peering in the Microsoft documentation: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview
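
Peering can also be set up from the Azure CLI. The sketch below rests on several assumptions: the cluster uses the default node resource group name (`MC_<resource group>_<cluster>_<location>`), which is where AKS places the VNet it creates; the function name and its arguments are hypothetical. Peering must be created in both directions, so the function is meant to be called once from each side.

```shell
RESOURCE_GROUP="odas-azure-rg"
AKS_CLUSTER="odas-azure-aks"
LOCATION="eastus2"   # example region

# Default name of the resource group holding the AKS-managed VNet. To confirm it:
#   az aks show -g "$RESOURCE_GROUP" -n "$AKS_CLUSTER" --query nodeResourceGroup -o tsv
NODE_RESOURCE_GROUP="MC_${RESOURCE_GROUP}_${AKS_CLUSTER}_${LOCATION}"

peer_vnets() {
  # $1: resource group of the local VNet
  # $2: name of the local VNet
  # $3: resource ID of the remote VNet
  # $4: name to give this peering
  az network vnet peering create \
    --resource-group "$1" \
    --vnet-name "$2" \
    --remote-vnet "$3" \
    --name "$4" \
    --allow-vnet-access
}

echo "AKS VNet resource group: ${NODE_RESOURCE_GROUP}"
```

For example, call `peer_vnets "$NODE_RESOURCE_GROUP" <aks-vnet-name> <other-vnet-resource-id> odas-to-other`, then repeat with the roles of the two VNets swapped.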