Platform Memory Usage¶
In general, there are many reasons for a process to manage its own memory, instead of relying on the operating system. Requesting and freeing memory from the operating system carries a substantial performance trade-off. Therefore, Okera minimizes the number of calls to the operating system to deal with allocating or freeing memory. We use the TCMalloc library, in addition to internal memory pooling for efficiency.
Internal Memory Pooling Behavior¶
In its default configuration, Okera assumes it is not running on the same VMs as other resource intensive processes so returning back to the OS doesn't provide much benefit. Even with other resource-intensive processes, there are other ways to tune the process; constantly trading memory back and forth can lead to unpredictability, which is generally not good (we prefer VM level isolation).
For monitoring purposes, memory usage is not proxy for service load. CPU usage and network IO are much better metrics to be tracking.
Only the memory usage of the Okera Enforcement Fleet workers is expected to be high. All the other services maintain very little state.
The general calling pattern for a scan is:
A user makes a request to the Okera Policy Engine (planner), which returns a list of tasks. Since this only involves the Okera Policy Engine, this incurs minimal memory usage. Expect to see high IO and CPU load, as generating the splits can be compute- and IO-intensive for large datasets.
For large datasets, the list of tasks will be far larger than the number of works or processor cores in the compute cluster. For example, we may return 1,000 tasks, but the Amazon EMR cluster only runs 64 (based on Amazon EMR cluster size, etc) at a time. Continuing with this example, the 64 tasks (run from Amazon EMR) get run on all the Okera Enforcement Fleet workers, distributed randomly. In a 10-node Okera cluster, each node would be expected to see 6-7 concurrent tasks, but you will see some variance in practice due to randomness. It would be unsurprising if the range across the Okera Enforcement Fleet workers was 4-10 concurrent tasks.
When each task begins running, it starts using memory. When it is done, it frees that memory (to the Okera Enforcement Fleet worker, not the operating system as above), and Amazon EMR will schedule the next task, which does the same thing. In this case, the peak memory required would be equal the memory requirement per task multiplied by the peak number of concurrent tasks (in this example: 10).
The memory requirement per task is typically low (maintaining a few buffers that read data from Amazon S3), but it is workload dependent. This is typically in the range of 10-100MB per task. In this case, perhaps a peak usage of 1GB. In scenarios with large numbers of concurrent users, we may have 200 concurrent tasks per Okera Enforcement Fleet worker (so 2000 in a 10-node Okera cluster), so more in the 20GB peak range, which we consider acceptable for typical VMs.