This document describes how Okera achieves high availability. It describes the high-level design of the system, how failures can be handled at each level and deployment options that can achieve high availability. Okera is designed to be used in critical systems where availability is essential.
Okera, at the highest level, consists of one or more stateless Kubernetes clusters that share the same metadata. This design enables very high availability.
Okera services only persist state in two locations: an RDBMS and a distributed file system. No state is persisted on any of the Okera VMs. The RDBMS, typically MySQL, stores all catalog, policy, and Okera metadata; the distributed file system, typically S3 or ADLS, is used only to persist service and audit logs.
This enables Okera to deploy very quickly (within a few minutes) and to run in multiple environments simultaneously.
Kubernetes and Federation
Each Okera cluster is a Kubernetes cluster, and multiple Okera clusters can be deployed that share the same metadata (i.e. federation). Each Kubernetes cluster provides resiliency within itself and is performance- and failure-isolated from the others. Furthermore, each cluster can be configured differently (for example, a different size, a different physical data center, or a different cloud provider) to provide even greater availability.
The following table describes how failures are handled at each level, from both a system and an end-user operations perspective.
| Failure | How it's handled | Setup requirements | Repair time |
|---|---|---|---|
| Single container | Kubernetes | Default | Process restart, ~1 minute |
| Single VM | Kubernetes/cloud provider (e.g. ASG) | Default | VM provisioning, ~10 minutes; temporary capacity loss |
| AZ failure | Kubernetes/cloud provider | Leverage EKS/AKS for master HA | VM provisioning, ~10 minutes; temporary capacity loss |
| Cluster failure (any reason) | Federated deployment with multiple Okera clusters | Load balancer/CNAME; clusters can run active/active in normal circumstances | None to switch the LB/CNAME; tens of minutes to provision a new cluster |
| Region failure | Federated deployment across regions | Data replication across regions, RDBMS replication, load balancer/CNAME | None to switch the LB/CNAME; tens of minutes to provision a new cluster |
As Okera is a Kubernetes application, all Okera services run in containers. Within a cluster, components run replicated, meaning failures are transparent to end users. Okera leverages Kubernetes to repair these failures automatically: traffic is routed to the remaining replicas, and failed containers are restarted, on a different VM if required.
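As an illustration, this kind of replication and automatic restart is what a standard Kubernetes Deployment provides. The sketch below is not Okera's actual manifest; the service name, image, health path, and port are all hypothetical:

```yaml
# Hypothetical sketch of a replicated service Deployment (not Okera's real manifest).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: okera-planner          # hypothetical service name
spec:
  replicas: 3                  # Kubernetes keeps three replicas running at all times
  selector:
    matchLabels:
      app: okera-planner
  template:
    metadata:
      labels:
        app: okera-planner
    spec:
      containers:
        - name: planner
          image: okera/planner:latest   # placeholder image
          livenessProbe:                # failed containers are restarted automatically
            httpGet:
              path: /health             # assumed health route
              port: 8083                # assumed port
```

With a Deployment like this, a matching Service routes traffic only to healthy replicas, which is what makes single-container failures invisible to end users.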
This is handled automatically by any deployment, requires no operator setup, and results in no observable downtime.
Failure of a VM is handled similarly to a container failure. Kubernetes will automatically route traffic to the remaining replicas and move replicas to the remaining machines as necessary, resulting in no observable downtime. While the cluster is repairing, it runs at reduced capacity; for example, a 20-node cluster with 2 failed nodes operates with the capacity of an 18-node cluster.
Okera does not directly manage the infrastructure at the VM level; instead, it relies on the cloud provider's primitives to do so and is designed to take full advantage of them. For example, on AWS it is advised to run the Okera Kubernetes cluster either on EKS or by managing the VMs in an auto-scaling group (ASG). Similarly, on Azure the only supported deployment is on top of AKS. In any of these deployment options, the cloud provider's repair mechanism will replace the failed VM, and once the VM is available it is automatically integrated back into the Okera cluster, completing the repair. No manual intervention is required.
The only requirement for the operator is to use one of the supported deployment options.
There are two strategies to support HA in the presence of AZ failures.
- Deploy each Kubernetes cluster across multiple AZs
- Deploy multiple Kubernetes clusters in different AZs
For option 1, we recommend leveraging EKS or AKS, which handle this scenario automatically: the Kubernetes masters run across AZs. In this case, the failure is handled identically to VM failures, although the time to repair the cluster to full capacity will likely be longer, as more VMs need to be provisioned.
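For example, with eksctl a multi-AZ EKS cluster can be requested directly in the cluster config. The cluster name, region, zones, and sizing below are illustrative assumptions, not Okera-prescribed values:

```yaml
# Illustrative eksctl ClusterConfig spreading worker nodes across three AZs.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: okera-cluster          # hypothetical cluster name
  region: us-west-2
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
nodeGroups:
  - name: okera-workers
    instanceType: m5.2xlarge   # example instance type
    desiredCapacity: 6         # spread across the AZs above
```

EKS runs the Kubernetes control plane across AZs for you, so losing one AZ leaves the cluster operational while replacement nodes are provisioned.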
On AWS, if you are not using EKS, multi-AZ clusters are currently not supported when managing the VMs directly with auto-scaling groups (ASGs).
This option requires no operator setup besides using the cloud-managed Kubernetes deployment options.
For option 2, simply deploy multiple Okera clusters in different AZs and add a load balancer across them. This is an active-active configuration, which is supported. If active-passive is desired, a DNS CNAME (using, for example, Route53) that is flipped on failure is also possible. This requires the operator to set up the load balancer/CNAME.
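Client-side failover across federated clusters boils down to probing endpoints in preference order and using the first healthy one. A minimal sketch, assuming a pluggable health check; the endpoint URLs and the health-check mechanism are hypothetical, not part of Okera's API:

```python
def pick_healthy_endpoint(endpoints, is_healthy):
    """Return the first healthy cluster endpoint.

    endpoints  -- ordered list of cluster URLs, primary first
    is_healthy -- callable(url) -> bool, e.g. an HTTP ping of a
                  cluster health route (an assumption for this sketch)
    """
    for url in endpoints:
        if is_healthy(url):
            return url
    raise RuntimeError("no healthy cluster available")


# Active-passive example: fall back to the standby when the primary is down.
clusters = ["https://okera-primary.example.com",
            "https://okera-standby.example.com"]
down = {"https://okera-primary.example.com"}
print(pick_healthy_endpoint(clusters, lambda url: url not in down))
# prints "https://okera-standby.example.com"
```

In practice a load balancer or Route53 health-checked record performs this selection server-side, so clients keep a single stable address.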
To handle the case where an Okera cluster completely fails for whatever reason (e.g. it is accidentally deleted), we recommend option 2 from the AZ failure section: a federated Okera deployment.
To handle the case where an entire region fails, the same federated deployment option for cluster failure can be used. From an Okera perspective, there is no difference between region failure and entire cluster failure.
There are, however, additional requirements that the operator is responsible for. Okera supports but does not manage or automate these:
- Ensuring that the backing RDBMS state is available in the other regions. This can be done by enabling multi-region replication (e.g. Amazon Aurora) or by managing database snapshots and recovering in the new region. Okera expects the metadata to be available in the new region.
- Ensuring the data is replicated in the new region. This can be done, for example, with bucket replication on AWS. Okera provides some benefit for end users here: by the nature of what it does, it abstracts away path details for data consumers.
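As one illustration of the data-replication requirement, AWS S3 cross-region replication is configured per bucket (applied with `aws s3api put-bucket-replication`). The role and bucket ARNs below are placeholders:

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::okera-data-us-east-1" }
    }
  ]
}
```

Because Okera abstracts paths away from data consumers, repointing the catalog at the replicated bucket in the new region does not require end users to change how they reference datasets.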