High Availability

This document describes how Okera Data Access Service (ODAS) achieves high availability. It covers the high-level design of the system, how failures are handled at each level, and the deployment options that provide high availability. ODAS is designed for critical systems where availability is essential.

ODAS, at the highest level, consists of one or more stateless Kubernetes clusters that share the same metadata. This design enables very high availability.

Stateless

ODAS services only persist state in two locations: an RDBMS and a distributed file system. No state is persisted on any of the ODAS VMs. The RDBMS, typically MySQL, stores all catalog, policy, and ODAS metadata; the distributed file system, typically S3 or ADLS, is used only to persist service and audit logs.

This enables ODAS to deploy very quickly (within a few minutes) and to run in multiple environments simultaneously.
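Because all state lives outside the cluster, the only stateful dependencies an operator needs to verify are the external metadata database and the log storage location. The following is a minimal sketch of such a check in Python; the hostname, credentials, and bucket name are hypothetical placeholders, not ODAS defaults.

    # Sketch: verify the two external state locations ODAS depends on.
    # Hostname, credentials, and bucket name are hypothetical examples.
    import boto3
    import pymysql

    # 1. The RDBMS (e.g. MySQL) that holds catalog, policy, and ODAS metadata.
    conn = pymysql.connect(
        host="odas-metadata.example.internal",  # hypothetical endpoint
        user="odas",
        password="example-password",
        database="odas_catalog",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print("metadata RDBMS reachable:", cur.fetchone() == (1,))
    conn.close()

    # 2. The distributed file system (e.g. S3) that holds service and audit logs.
    s3 = boto3.client("s3")
    s3.head_bucket(Bucket="example-odas-audit-logs")  # hypothetical bucket
    print("audit log storage reachable")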

Kubernetes and Federation

Each ODAS cluster is a Kubernetes cluster, and multiple ODAS clusters can be deployed that share the same metadata (i.e. federation). Each Kubernetes cluster provides resiliency within itself and is isolated from the others for both performance and failures. Furthermore, each cluster can be configured differently, for example with a different size, in a different physical data center, or on a different cloud provider, to provide even greater availability.
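Because federated clusters share the same metadata, callers can treat them interchangeably. As a purely illustrative sketch (the endpoint URLs and health-check path below are assumptions, not an ODAS API), a client can probe each federated cluster and use the first healthy one:

    # Sketch: pick the first reachable cluster in a federated deployment.
    # Endpoint URLs and the health-check path are hypothetical examples.
    import requests

    FEDERATED_ENDPOINTS = [
        "https://odas-us-east-1.example.com",  # hypothetical cluster A
        "https://odas-us-west-2.example.com",  # hypothetical cluster B
    ]

    def pick_cluster(endpoints, timeout=2):
        """Return the first endpoint that responds to a health probe."""
        for url in endpoints:
            try:
                if requests.get(f"{url}/health", timeout=timeout).ok:
                    return url
            except requests.RequestException:
                continue
        raise RuntimeError("no federated ODAS cluster is reachable")

    print("using cluster:", pick_cluster(FEDERATED_ENDPOINTS))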

Summary

This table describes how failures are handled at each level, both from the system perspective and from the end-user operations perspective.

Failure | How it's handled | Setup Requirements | Repair Time
Single Container | Kubernetes | Default | Process restart, ~1 minute
Single VM | Kubernetes/cloud provider (e.g. ASG) | Default | VM provisioning, ~10 minutes, temporary capacity loss
AZ Failure | Kubernetes/cloud provider | Leverage EKS/AKS for master HA | VM provisioning, ~10 minutes, temporary capacity loss
Cluster Failure (any reason) | Federated deployment with multiple ODAS clusters | Load balancer/CNAME; clusters can run active/active in normal circumstances | None to switch the LB/CNAME; tens of minutes to provision a new cluster
Region Failure | Federated deployment across regions | Data replication across regions, RDBMS replication, load balancer/CNAME | None to switch the LB/CNAME; tens of minutes to provision a new cluster

In Detail

Container Failure

As ODAS is a Kubernetes application, all ODAS services run in containers. Within a cluster, components run replicated, so failures are transparent to end users. ODAS leverages Kubernetes to repair these failures automatically: traffic is routed to the remaining replicas and failed containers are restarted, potentially on a different VM if required.

This is handled automatically by any deployment, requires no operator setup, and results in no observable downtime.
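One simple, non-authoritative way to observe this behavior is to list the ODAS pods and their restart counts with the official Kubernetes Python client; the namespace below is a hypothetical example and should match your deployment.

    # Sketch: observe Kubernetes-managed restarts of ODAS containers.
    # The namespace "okera" is a hypothetical example.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    v1 = client.CoreV1Api()

    for pod in v1.list_namespaced_pod(namespace="okera").items:
        statuses = pod.status.container_statuses or []
        restarts = sum(cs.restart_count for cs in statuses)
        print(f"{pod.metadata.name}: phase={pod.status.phase}, restarts={restarts}")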

VM Failure

Failure of a VM is handled similarly to a container failure. Kubernetes will automatically route traffic to the remaining replicas and move replicas to the remaining machines as necessary. This results in no observable downtime. While the cluster is repairing, it runs at reduced capacity. For example, a 20-node cluster with 2 failed nodes operates with the capacity of an 18-node cluster.

ODAS does not directly manage the infrastructure at the VM level and instead relies on the cloud provider's primitives to do so; ODAS is designed to take advantage of them. For example, on AWS it is advised to run the ODAS Kubernetes cluster either in EKS or with the VMs managed by an auto-scaling group (ASG). Similarly, on Azure the only supported deployment is on top of AKS. In any of these deployment options, the cloud provider repairs the failed VM and, once the VM is available, it is automatically integrated back into the ODAS cluster, which completes the repair. No manual intervention is required.

The only requirement for the operator is to use one of the supported deployment options.
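For the ASG-based option on AWS, one way to confirm that failed VMs will be replaced automatically is to check the group's minimum and desired capacity and its instance health. The sketch below uses boto3; the group name is a hypothetical example.

    # Sketch: confirm the auto-scaling group backing the ODAS nodes
    # will replace failed VMs. The group name is a hypothetical example.
    import boto3

    autoscaling = boto3.client("autoscaling")
    resp = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=["odas-worker-asg"]  # hypothetical ASG name
    )
    group = resp["AutoScalingGroups"][0]
    healthy = [i for i in group["Instances"] if i["HealthStatus"] == "Healthy"]
    print("desired capacity:", group["DesiredCapacity"])
    print("min size:", group["MinSize"])
    print("healthy instances:", len(healthy))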

AZ Failure

There are two strategies to support HA in the presence of AZ failures.

  1. Deploy each Kubernetes cluster across multiple AZs
  2. Deploy multiple Kubernetes clusters in different AZs

For option 1, we recommend leveraging EKS or AKS, which handle this scenario automatically: the Kubernetes master runs across AZs. In this case, the failure is handled identically to VM failures, although the time to repair the cluster to full operation will likely be longer as more VMs need to be provisioned.

Note

In AWS, if you are not using EKS, multi-AZ clusters are currently not supported when using AWS auto-scaling groups (ASGs) directly.

This option requires no operator setup besides using the cloud-managed Kubernetes deployment options.
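A simple way to confirm that the nodes of an EKS/AKS-backed cluster actually span multiple AZs is to inspect the standard zone label on each node, as in this sketch:

    # Sketch: verify the cluster's nodes are spread across availability zones.
    from collections import Counter
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    zones = Counter(
        node.metadata.labels.get("topology.kubernetes.io/zone", "unknown")
        for node in v1.list_node().items
    )
    print("nodes per zone:", dict(zones))  # more than one zone is expected for AZ resiliency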

For option 2, simply deploy multiple ODAS clusters in different AZs and add a load balancer across them. This is an active-active configuration, which is supported. If active-passive is desired, a DNS CNAME (using, for example, Route 53) that is flipped on failure is also possible. This requires the operator to set up the load balancer/CNAME.
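For the active-passive variant on AWS, the CNAME flip can be scripted against the Route 53 API. The sketch below shows the general shape; the hosted zone ID, record name, and cluster endpoint are hypothetical placeholders.

    # Sketch: flip a DNS CNAME from a failed ODAS cluster to a healthy one.
    # Hosted zone ID, record name, and endpoint are hypothetical examples.
    import boto3

    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000EXAMPLE",
        ChangeBatch={
            "Comment": "Fail over the ODAS endpoint to the standby cluster",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "odas.example.com",
                    "Type": "CNAME",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "odas-cluster-b.example.com"}],
                },
            }],
        },
    )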

Cluster Failure

To handle the case where an ODAS cluster fails completely for any reason (e.g. it is accidentally deleted), we recommend option 2 from the AZ failure section, i.e. a federated ODAS deployment.

Region Failure

To handle the case where an entire region fails, the same federated deployment option for cluster failure can be used. From an ODAS perspective, there is no difference between region failure and entire cluster failure.

There are, however, additional requirements that the operator is responsible for. ODAS supports, but does not manage or automate, these:

  1. Ensuring that the backing RDBMS state is available in the other regions. This can be done by enabling multi-region replication (e.g. Amazon Aurora) or by managing database snapshots and recovering in the new region. ODAS expects the metadata to be available in the new region.
  2. Ensuring the data is replicated in the new region. This can be done with bucket replication on AWS, for example (see the sketch below). ODAS provides some benefit for end users here: by the nature of what it does, it abstracts away path details from data consumers.
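On AWS, for example, the operator can confirm that cross-region bucket replication is in place with a check like the sketch below; the bucket name is a hypothetical example, and an equivalent verification applies to the RDBMS replication.

    # Sketch: confirm cross-region replication is configured on a data bucket.
    # The bucket name is a hypothetical example.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        replication = s3.get_bucket_replication(Bucket="example-odas-data")
        rules = replication["ReplicationConfiguration"]["Rules"]
        enabled = [r for r in rules if r["Status"] == "Enabled"]
        print(f"{len(enabled)} replication rule(s) enabled")
    except ClientError as e:
        if e.response["Error"]["Code"] == "ReplicationConfigurationNotFoundError":
            print("no replication configured on this bucket")
        else:
            raise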