Prerequisites
Azure AD Application
ODAS requires an Azure AD Application to be registered and granted access to the underlying ADLS storage resources (both Gen1 and Gen2).
To create an Azure AD Application, navigate to the Azure AD Application Registration blade:
- Select "New Registration".
- Choose a name for your registration, such as odas-azure-app. Leave the rest of the options at their defaults and press "Register".
- Store the "Application (client) ID"; this will be used later as your "ADLS Client ID".
- Store the "Directory (tenant) ID"; this will be used later as your "ADLS Tenant ID".
- Click the "Endpoints" button and save the URL in "OAuth 2.0 token endpoint (v1)"; this will be used later as your "ADLS Refresh URL".
- Navigate to the "Certificates & secrets" tab and press the "New client secret" button.
- Choose a name for your secret (e.g. odas-azure-secret) and an expiration time of "Never".
- Store the value of the generated secret; this will be used later as your "ADLS Client Password". The portal displays the secret value only once, so copy it before leaving the page.
Note
You can refer to Microsoft documentation for creating Azure AD Applications here: https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal
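If you prefer the Azure CLI to the portal, the registration steps above can be sketched roughly as follows. This is a minimal, unofficial sketch that assumes Azure CLI 2.x and a prior az login; the function names are hypothetical. Unlike the portal, az ad app create does not create a service principal, so the sketch creates one explicitly. The CLI also applies its own default secret expiry rather than "Never" (see az ad app credential reset --help for the options in your version).

```shell
# Hypothetical helpers (not part of ODAS); assumes Azure CLI 2.x and "az login".

# Print the v1 OAuth 2.0 token endpoint for a tenant, i.e. the "ADLS Refresh URL".
odas_refresh_url() {
  printf 'https://login.microsoftonline.com/%s/oauth2/token\n' "$1"
}

# Register an application and print the four values ODAS asks for later.
# $1: display name (defaults to odas-azure-app)
odas_register_app() {
  local name client_id tenant_id client_secret
  name="${1:-odas-azure-app}"
  # "Application (client) ID" -> ADLS Client ID
  client_id=$(az ad app create --display-name "$name" --query appId -o tsv) || return 1
  # The portal creates a service principal automatically; the CLI does not.
  az ad sp create --id "$client_id" >/dev/null || return 1
  # "Directory (tenant) ID" -> ADLS Tenant ID
  tenant_id=$(az account show --query tenantId -o tsv) || return 1
  # New client secret -> ADLS Client Password (store it now; it is not shown again)
  client_secret=$(az ad app credential reset --id "$client_id" --query password -o tsv) || return 1
  printf 'ADLS Client ID: %s\n' "$client_id"
  printf 'ADLS Tenant ID: %s\n' "$tenant_id"
  printf 'ADLS Refresh URL: %s\n' "$(odas_refresh_url "$tenant_id")"
  printf 'ADLS Client Password: %s\n' "$client_secret"
}
```

For example, odas_register_app odas-azure-app prints the values to record; treat that output as a secret.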
Granting permissions to ADLS Gen1
Once you have your Azure AD Application, you need to grant it access to your ADLS Gen1 storage.
To do this, navigate to the Data Lake Storage Gen1 blade:
- Click the storage account you wish to grant access to.
- Navigate to the "Access control (IAM)" tab.
- Press the "Add" button and select "Add role assignment".
- Select the "Contributor" role, search for the name of your Azure AD Application (e.g. odas-azure-app), and press "Save".
- Navigate to the "Data Explorer" tab, browse to the directory you want to grant access to, and press the "Access" button.
- In the "Access" modal, press the "Add" button, search for the name of your Azure AD Application (e.g. odas-azure-app), and press "Select".
- Grant Read, Write and Execute permissions, select "This folder and all children" and "An access permission entry", and press "Ok".
Note
You can refer to Microsoft documentation for granting access to ADLS Gen1 here: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
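A rough Azure CLI equivalent of these Gen1 steps is sketched below. It assumes a CLI version that still includes the az dls command group (Microsoft has since retired ADLS Gen1, so newer CLI versions may not) and that you pass in the service principal's object ID, which you can look up with az ad sp show --id <client id> (the JSON field is id in recent CLI versions and objectId in older ones). Unlike the portal's "This folder and all children" option, az dls fs access set-entry only updates the entry on the given path, so existing children are not covered. The function name and example values are hypothetical.

```shell
# Hypothetical helper; assumes the Azure CLI still ships "az dls" (ADLS Gen1).
# $1 resource group, $2 Gen1 account name, $3 application (client) ID,
# $4 service principal object ID, $5 directory (e.g. /odas)
odas_grant_adls_gen1() {
  local rg account client_id sp_object_id dir scope
  rg="$1"; account="$2"; client_id="$3"; sp_object_id="$4"; dir="$5"
  # "Access control (IAM)" -> "Add role assignment" -> Contributor
  scope=$(az dls account show --resource-group "$rg" --account "$account" --query id -o tsv) || return 1
  az role assignment create --assignee "$client_id" --role Contributor --scope "$scope" || return 1
  # "Data Explorer" -> "Access": an access ACL entry with Read, Write and Execute.
  # This applies to $dir only; existing children are not updated.
  az dls fs access set-entry --account "$account" --path "$dir" \
    --acl-spec "user:${sp_object_id}:rwx"
}
```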
Azure Data Lake Storage (ADLS)
ODAS requires an ADLS path where it can store audit and operational logs, e.g. adl://<company>.azuredatalakestore.net/odas.
We will refer to this as the ODAS ADLS Storage.
If you do not already have ADLS storage created, please refer to Microsoft documentation to create it: https://docs.microsoft.com/en-us/azure/data-lake-store/
The Azure AD Application you created should have read/write permissions on this path.
Note
ODAS currently has full support for ADLS Gen1 and experimental support for Gen2. Gen2 is only supported for data that is to be queried, and cannot be used for log storage.
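Because Gen2 cannot hold ODAS logs, it can be worth checking the ODAS ADLS Storage value before using it. The hypothetical helper below (not part of ODAS) is a minimal sketch that accepts only Gen1 URIs of the form adl://<account>.azuredatalakestore.net/<path> and rejects Gen2 abfs:// or abfss:// URIs. It checks the string format only, not whether the path exists or is writable.

```shell
# Hypothetical helper: validate the format of the ODAS ADLS Storage URI.
# Returns 0 for an ADLS Gen1 (adl://) URI with a non-empty path, 1 otherwise.
odas_check_log_path() {
  case "$1" in
    adl://*.azuredatalakestore.net/?*)
      return 0 ;;
    abfs://*|abfss://*)
      echo "error: ADLS Gen2 URIs cannot be used for ODAS log storage: $1" >&2
      return 1 ;;
    *)
      echo "error: expected adl://<account>.azuredatalakestore.net/<path>, got: $1" >&2
      return 1 ;;
  esac
}
```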
Resource Group
You will need a resource group where you will create ODAS resources such as your AKS cluster.
To create a Resource Group, navigate to the Resource Groups blade:
- Press the "Add" button.
- Choose the subscription to create this resource group in, and enter a name (e.g. odas-azure-rg).
- Choose the region you want this resource group to be created in. Note that this should be the same region as your ADLS storage.
- Press the "Next: Tags" button.
- Add any tags you need to the resource group, and press the "Next: Review + Create" button.
- Review the settings and press the "Create" button.
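The same resource group can be created from the Azure CLI. A minimal sketch, assuming you are logged in with az login and have selected the subscription with az account set --subscription <subscription>; the helper name is hypothetical, and the region must be given as a CLI location name (e.g. eastus2) rather than the portal display name.

```shell
# Hypothetical helper; assumes "az login" and "az account set --subscription ...".
# $1 resource group name, $2 region (same as your ADLS storage), $3...: key=value tags
odas_create_rg() {
  local name location
  name="$1"; location="$2"; shift 2
  if [ "$#" -gt 0 ]; then
    az group create --name "$name" --location "$location" --tags "$@"
  else
    az group create --name "$name" --location "$location"
  fi
}
```

For example: odas_create_rg odas-azure-rg eastus2 team=data.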
Database (Azure Database for MySQL)
ODAS is backed by a relational database, and on Azure we strongly recommend using Azure Database for MySQL. ODAS supports MySQL 5.6 and 5.7.
To create a database server, navigate to the Azure Database for MySQL servers blade:
- Press the "Add" button.
- Select the same subscription and Resource Group (e.g. odas-azure-rg) that you created previously.
- Choose a name for your database server, e.g. odas-azure-sql, and an administrator username and password.
- Set the "Location" of this database to the same location as the rest of your resources (e.g. East US 2).
- Select either 5.6 or 5.7 as your "Version".
- For the server resources, choose at least 4 cores and 100GB of storage.
- Press the "Next: Tags" button.
- Add any tags you need to the database, and press the "Next: Review + Create" button.
- Review the settings and press the "Create" button.
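For reference, the database settings above roughly map to the following Azure CLI call. This is a sketch, not a tested recipe: it assumes the Azure Database for MySQL single server commands (az mysql server), which newer CLI versions deprecate in favour of az mysql flexible-server with different flags. The SKU GP_Gen5_4 and the 102400 MB storage size are assumptions chosen to meet the 4-core, 100GB minimum, and the helper name is hypothetical. Passing the admin password as an argument can leave it in your shell history.

```shell
# Hypothetical helper; assumes the single-server "az mysql server" commands.
# $1 resource group, $2 server name, $3 region, $4 admin user, $5 admin password,
# $6 MySQL version (5.6 or 5.7, default 5.7)
# GP_Gen5_4 = General Purpose tier, Gen5 hardware, 4 vCores.
# --storage-size is in MB: 102400 MB = 100 GB.
odas_create_mysql() {
  az mysql server create \
    --resource-group "$1" \
    --name "$2" \
    --location "$3" \
    --admin-user "$4" \
    --admin-password "$5" \
    --version "${6:-5.7}" \
    --sku-name GP_Gen5_4 \
    --storage-size 102400 \
    --ssl-enforcement Enabled
}
```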
After the database is created, navigate to the Azure Database for MySQL servers blade:
- Select your database, e.g. odas-azure-sql.
- Store the "Server name"; this will be used later as your "Catalog DB URL".
- Store the "Server admin login name"; this will be used later as your "Catalog DB User".
- Ensure that "SSL Enforce Status" is set to "Enabled".
- Navigate to the "Server parameters" tab.
- Change the time_zone setting from SYSTEM to +00:00.
- Change the wait_timeout setting from 120 to 28800.
- Press the "Save" button.
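The post-creation steps can likewise be sketched with the Azure CLI, assuming the single-server az mysql server commands; the helper name is hypothetical. The JMESPath query picks out the fields that the portal labels "Server name", "Server admin login name" and "SSL Enforce Status", and the two configuration calls apply the server parameter changes.

```shell
# Hypothetical helper; assumes the single-server "az mysql server" commands.
# $1 resource group, $2 server name
odas_configure_mysql() {
  local rg server
  rg="$1"; server="$2"
  # Server name -> Catalog DB URL; admin login -> Catalog DB User; SSL status
  az mysql server show --resource-group "$rg" --name "$server" \
    --query "{catalogDbUrl: fullyQualifiedDomainName, catalogDbUser: administratorLogin, sslEnforcement: sslEnforcement}" \
    -o table || return 1
  # Server parameters required by ODAS
  az mysql server configuration set --resource-group "$rg" --server-name "$server" \
    --name time_zone --value "+00:00" || return 1
  az mysql server configuration set --resource-group "$rg" --server-name "$server" \
    --name wait_timeout --value 28800
}
```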
Azure Kubernetes Service (AKS)
ODAS runs on top of Kubernetes, and on Azure, we leverage AKS as our managed Kubernetes runtime.
You should follow the Microsoft documentation for creating AKS clusters, which you can find here: https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal
For reference, a shortened version of these instructions follows.
To create an AKS cluster, navigate to the "Kubernetes Services" blade:
- Press the "Add" button.
- Select the same subscription and Resource Group (e.g. odas-azure-rg) that you created previously.
- Choose a name for your cluster, e.g. odas-azure-aks.
- Select the "Region" for this cluster to be the same region as the rest of your resources (e.g. East US 2).
- Select the "Kubernetes Version" to use. The minimum supported version is 1.12.7.
- For the node size, select one of Standard_B8ms, Standard_D16_v3 or Standard_F32s_v2.
- For the node count, select the number of nodes you wish to have in your cluster (for production use we recommend at least 3).
- Press the "Next: Node Pools" button.
- Select Disabled for both "Virtual Nodes" and "VM Scale Sets".
- Press the "Next: Authentication" button.
- Leave the defaults selected: create a new service principal, and Enabled for "Enable RBAC".
- Press the "Next: Networking" button.
- Leave the defaults selected: No for "HTTP Application Routing" and Basic for "Network Configuration".
- Press the "Next: Integrations" button.
- Leave the defaults selected: Enabled for "Container monitoring", with the default workspace.
- Press the "Next: Tags" button.
- Add any tags you need to the Kubernetes cluster, and press the "Next: Review + Create" button.
- Review the settings and press the "Create" button.
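The portal choices above can be approximated with a single az aks create call. This is a sketch under assumptions, not a verified equivalent: --vm-set-type AvailabilitySet stands in for disabling VM Scale Sets, --network-plugin kubenet for "Basic" networking, and --enable-addons monitoring for container monitoring with a default workspace. Flag availability and defaults change between CLI and AKS versions (for example, recent CLI versions create a managed identity instead of a new service principal, and availability-set node pools may no longer be offered), so check az aks create --help first. The helper name is hypothetical.

```shell
# Hypothetical helper approximating the portal choices described above.
# $1 resource group, $2 cluster name, $3 Kubernetes version (>= 1.12.7),
# $4 node count (default 3), $5 node size (default Standard_D16_v3)
odas_create_aks() {
  az aks create \
    --resource-group "$1" \
    --name "$2" \
    --kubernetes-version "$3" \
    --node-count "${4:-3}" \
    --node-vm-size "${5:-Standard_D16_v3}" \
    --vm-set-type AvailabilitySet \
    --network-plugin kubenet \
    --enable-addons monitoring \
    --generate-ssh-keys
}
```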
Once the AKS cluster is created, you will need to be able to access it using kubectl from a machine, which we call the deployer node.
To do this, install and configure the Azure CLI on that machine, and then execute the following command:
$ az aks get-credentials --resource-group <your resource group> --name <your aks cluster>
Once complete, you should be able to use kubectl normally, e.g.:
$ kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-agentpool-38801922-0 Ready agent 5d11h v1.13.10 10.240.0.35 <none> Ubuntu 16.04.6 LTS 4.15.0-1052-azure docker://3.0.6
aks-agentpool-38801922-1 Ready agent 5d11h v1.13.10 10.240.0.66 <none> Ubuntu 16.04.6 LTS 4.15.0-1052-azure docker://3.0.6
aks-agentpool-38801922-2 Ready agent 5d11h v1.13.10 10.240.0.4 <none> Ubuntu 16.04.6 LTS 4.15.0-1052-azure docker://3.0.6
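When scripting the deployer node, a quick check that every node shows a STATUS of exactly Ready (as in the sample output above) can catch a half-provisioned cluster early. A minimal sketch with a hypothetical helper name; it treats cordoned nodes (e.g. Ready,SchedulingDisabled) as not ready.

```shell
# Hypothetical helper: succeed only if at least one node exists and all are Ready.
odas_check_nodes_ready() {
  local nodes not_ready
  nodes=$(kubectl get nodes --no-headers) || return 1
  if [ -z "$nodes" ]; then
    echo "no nodes found" >&2
    return 1
  fi
  # STATUS is the second column of "kubectl get nodes --no-headers".
  not_ready=$(printf '%s\n' "$nodes" | awk '$2 != "Ready" { print $1 }')
  if [ -n "$not_ready" ]; then
    echo "nodes not Ready: $not_ready" >&2
    return 1
  fi
  echo "all nodes Ready"
}
```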
You can also see all available Kubernetes contexts:
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
Okera-Demo-AKS Okera-Demo-AKS clusterUser_Okera-Demo_Okera-Demo-AKS
* Okera-Demo-AKS2 Okera-Demo-AKS2 clusterUser_Okera-Demo2_Okera-Demo-AKS2
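Because the deployer node may hold several contexts (the asterisk marks the current one), a script can make sure kubectl targets the intended cluster before deploying. A minimal sketch; the helper name is hypothetical, and az aks get-credentials typically names the context after the cluster.

```shell
# Hypothetical helper: switch kubectl to context $1 unless it is already current.
odas_use_context() {
  local current
  current=$(kubectl config current-context 2>/dev/null)
  if [ "$current" != "$1" ]; then
    kubectl config use-context "$1" || return 1
  fi
}
```

For example, odas_use_context Okera-Demo-AKS switches away from Okera-Demo-AKS2 in the listing above.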
VNet Access and Peering
By default, a dedicated VNet is created with your AKS cluster (you can customize this VNet during creation if desired). This VNet is not accessible from the outside world, and the ODAS resources will only be accessible within this VNet.
If you would like to access the resources from a different VNet (e.g. one where you run other Azure services such as Azure Databricks), you will need to peer the two VNets together. You can read more about VNet peering in the Microsoft documentation: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview
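Peering can also be set up from the Azure CLI. The sketch below is a minimal illustration under assumptions: the AKS VNet usually lives in the cluster's node resource group (find it with az aks show --resource-group <rg> --name <cluster> --query nodeResourceGroup -o tsv, then az network vnet list --resource-group <that group> -o table), the two VNets' address spaces must not overlap, and older CLI versions call the --remote-vnet flag --remote-vnet-id. Because a peering link is one-directional, the hypothetical helper creates one on each side.

```shell
# Hypothetical helper: peer the AKS VNet with another VNet in both directions.
# $1 AKS node resource group, $2 AKS VNet name, $3 other resource group, $4 other VNet name
odas_peer_vnets() {
  local aks_rg aks_vnet other_rg other_vnet aks_id other_id
  aks_rg="$1"; aks_vnet="$2"; other_rg="$3"; other_vnet="$4"
  aks_id=$(az network vnet show --resource-group "$aks_rg" --name "$aks_vnet" --query id -o tsv) || return 1
  other_id=$(az network vnet show --resource-group "$other_rg" --name "$other_vnet" --query id -o tsv) || return 1
  # Link from the AKS VNet to the other VNet
  az network vnet peering create --name "${aks_vnet}-to-${other_vnet}" \
    --resource-group "$aks_rg" --vnet-name "$aks_vnet" \
    --remote-vnet "$other_id" --allow-vnet-access || return 1
  # Link back from the other VNet to the AKS VNet
  az network vnet peering create --name "${other_vnet}-to-${aks_vnet}" \
    --resource-group "$other_rg" --vnet-name "$other_vnet" \
    --remote-vnet "$aks_id" --allow-vnet-access
}
```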