Okera Client Configurations

This document describes configuration settings available to the client for Hadoop ecosystem tools.

Note: When configuration settings are specified in Spark, each setting name is prefixed with spark followed by a period (spark.). For example, the setting recordservice.kerberos.principal becomes spark.recordservice.kerberos.principal when configured for Spark. This applies to all configuration options.

Specify Configuration Settings

How you specify configuration settings depends on the tool you are using. Each tool uses its own standard configuration mechanisms.

Spark

Configuration settings can be specified:

Via the command line to spark-submit/spark-shell with --conf
In spark-defaults, typically in /etc/spark-defaults.conf
In the application, via the SparkContext (or related) objects
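As an illustration of the command-line option, the following minimal sketch builds the --conf arguments for spark-submit and applies the spark. prefix to each setting name. The helper name spark_conf_args, the host planner.example.com, the port 12050, and the script name my_job.py are hypothetical placeholders, not values taken from Okera.

```python
import shlex


def spark_conf_args(settings):
    """Turn client settings into spark-submit --conf arguments.

    Each setting name gets the "spark." prefix that Spark requires.
    """
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"spark.{key}={value}"]
    return args


# Hypothetical values, used only for illustration.
settings = {
    "recordservice.planner.hostports": "planner.example.com:12050",
    "recordservice.kerberos.principal": "okera/planner.okera.com@OKERA.COM",
}

command = ["spark-submit", *spark_conf_args(settings), "my_job.py"]
print(shlex.join(command))
```

The same key=value pairs, with the spark. prefix, could instead be placed one per line in spark-defaults.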

Hive

Configuration settings can be specified:

Via the command line to Beeline with --hiveconf
On the class path in either hive-site.xml or core-site.xml
In the Beeline session using the SET command
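The command-line and in-session options can be sketched as follows. The snippet only formats strings: --hiveconf arguments for launching Beeline, and equivalent SET statements for use inside a session. The helper names and the planner host and port are hypothetical. Unlike Spark, no prefix is added to the setting names.

```python
def beeline_hiveconf_args(settings):
    """Format settings as --hiveconf arguments for the Beeline command line."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--hiveconf", f"{key}={value}"]
    return args


def beeline_set_statements(settings):
    """Format settings as SET statements to run inside a Beeline session."""
    return [f"SET {key}={value};" for key, value in sorted(settings.items())]


# Hypothetical planner address, used only for illustration.
settings = {"recordservice.planner.hostports": "planner.example.com:12050"}

print(beeline_hiveconf_args(settings))
print("\n".join(beeline_set_statements(settings)))
```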

Required Configuration Settings

recordservice.planner.hostports

This setting is always required. It is a comma-separated list of host:port pairs identifying where the Okera Planners are running.
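To make the expected value format concrete, here is a minimal sketch that splits a comma-separated host:port list into pairs and rejects malformed entries. It is an illustration of the format only, not the client's actual parser. The host names and port 12050 are hypothetical.

```python
def parse_hostports(value):
    """Split a comma-separated host:port list into (host, port) tuples."""
    endpoints = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas and surrounding whitespace
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        endpoints.append((host, int(port)))
    if not endpoints:
        raise ValueError("at least one host:port pair is required")
    return endpoints


# Hypothetical planner addresses, used only for illustration.
print(parse_hostports("planner-1.example.com:12050,planner-2.example.com:12050"))
```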

Authentication Configuration Settings

recordservice.kerberos.principal

This setting identifies the principal of the Okera Planner service to which the Hive and Spark clients should connect. The three-part principal is specified in the format SERVICE_NAME/SERVICE_HOST@REALM, for example okera/planner.okera.com@OKERA.COM. This setting is required if the client authenticates with Okera using Kerberos.
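The three-part format can be illustrated with a short sketch that splits a principal into its SERVICE_NAME, SERVICE_HOST, and REALM parts. The Principal type and parse_principal helper are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    service_name: str
    service_host: str
    realm: str


def parse_principal(principal):
    """Split SERVICE_NAME/SERVICE_HOST@REALM into its three parts."""
    name_and_host, at, realm = principal.rpartition("@")
    service_name, slash, service_host = name_and_host.partition("/")
    if not (at and slash and service_name and service_host and realm):
        raise ValueError(
            f"expected SERVICE_NAME/SERVICE_HOST@REALM, got {principal!r}"
        )
    return Principal(service_name, service_host, realm)


parsed = parse_principal("okera/planner.okera.com@OKERA.COM")
print(parsed.service_name, parsed.service_host, parsed.realm)
```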

recordservice.delegation-token.token

This setting specifies the token string for this user. Okera can be configured to accept multiple kinds of tokens, but clients use this same setting for all of them.

recordservice.delegation-token.service-name

This setting applies only to versions older than 0.4.5 and should not be set on newer versions. On those older versions, it must be set if token-based authentication is used, and it should match the SERVICE_NAME portion of the Planner principal. For the example principal above (okera/planner.okera.com@OKERA.COM), this value would be okera.

Note: If both the token and principal are specified, the client only authenticates using the token.
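The precedence described in this note can be sketched as a small decision function. This mirrors the documented rule only and is not the client's implementation. The return value "none" for the case where neither setting is present is an assumption made for this example, as is the placeholder token string.

```python
TOKEN_KEY = "recordservice.delegation-token.token"
PRINCIPAL_KEY = "recordservice.kerberos.principal"


def choose_auth(settings):
    """Return which credential would be used, given the documented precedence."""
    if settings.get(TOKEN_KEY):
        return "token"  # a token wins even when a principal is also set
    if settings.get(PRINCIPAL_KEY):
        return "kerberos"
    return "none"  # assumption: neither credential configured


both = {
    TOKEN_KEY: "example-token",  # hypothetical placeholder, not a real token
    PRINCIPAL_KEY: "okera/planner.okera.com@OKERA.COM",
}
print(choose_auth(both))
```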

The following retry and timeout settings are often not required, and the defaults should suffice. Adjust them if you observe timeout behavior.

recordservice.planner.retry.attempts

This optional setting specifies the maximum number of attempts to retry RPCs with the Okera Planner. The default value is 5.

recordservice.worker.retry.attempts

This optional setting specifies the maximum number of attempts to retry RPCs with the Okera Worker. The default value is 5.

recordservice.planner.retry.sleepMs

This optional setting specifies the number of milliseconds to sleep between retry attempts with the Okera Planner. The default initial value is 300. The sleep time grows exponentially with each subsequent retry.

recordservice.worker.retry.sleepMs

This optional setting specifies the number of milliseconds to sleep between retry attempts with the Okera Worker. The default initial value is 300. The sleep time grows exponentially with each subsequent retry.
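To give a feel for how the retry and sleep settings interact, the sketch below computes a sleep schedule from the default values. The growth factor is not stated here, so the example assumes the sleep doubles on each retry and that one sleep precedes each retry attempt. Both are assumptions, and the actual client behavior may differ.

```python
def retry_sleeps_ms(attempts=5, initial_sleep_ms=300, factor=2):
    """Sleep durations before each retry, growing exponentially.

    Assumes a doubling factor and one sleep per retry attempt.
    """
    return [initial_sleep_ms * factor ** i for i in range(attempts)]


schedule = retry_sleeps_ms()
print(schedule)       # [300, 600, 1200, 2400, 4800]
print(sum(schedule))  # total sleep across all retries: 9300
```

Under these assumptions, the default settings add at most about 9.3 seconds of sleep time across all retries, in addition to the time spent in the RPCs themselves.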

recordservice.planner.connection.timeoutMs

This optional setting specifies the timeout period (in milliseconds) for establishing the initial connection to the Okera Planner. The default value is 10000.

recordservice.worker.connection.timeoutMs

This optional setting specifies the timeout period (in milliseconds) for establishing the initial connection to the Okera Worker. The default value is 10000.

recordservice.planner.rpc.timeoutMs

This optional setting specifies the timeout period (in milliseconds) for Okera Planner RPCs (after a connection is established). The default value is 120000.

recordservice.worker.rpc.timeoutMs

This optional setting specifies the timeout period (in milliseconds) for Okera Worker RPCs (after a connection is established). The default value is 120000.
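The four timeout settings and their documented defaults can be gathered in one place, as in this minimal sketch. The resolve_timeouts helper is hypothetical. It overlays user-supplied overrides on the defaults and rejects unknown names or non-positive values, which is a validation choice made for this example rather than documented client behavior.

```python
# Documented default values, in milliseconds.
TIMEOUT_DEFAULTS_MS = {
    "recordservice.planner.connection.timeoutMs": 10000,
    "recordservice.worker.connection.timeoutMs": 10000,
    "recordservice.planner.rpc.timeoutMs": 120000,
    "recordservice.worker.rpc.timeoutMs": 120000,
}


def resolve_timeouts(overrides=None):
    """Overlay overrides on the defaults, validating names and values."""
    resolved = dict(TIMEOUT_DEFAULTS_MS)
    for key, value in (overrides or {}).items():
        if key not in TIMEOUT_DEFAULTS_MS:
            raise KeyError(f"unknown timeout setting: {key}")
        value = int(value)
        if value <= 0:
            raise ValueError(f"{key} must be a positive number of milliseconds")
        resolved[key] = value
    return resolved


# Example: allow slower Worker RPCs while keeping every other default.
print(resolve_timeouts({"recordservice.worker.rpc.timeoutMs": 300000}))
```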

The following settings can fine-tune Okera's system performance. In general, you do not need to specify them because the server computes a good value automatically.

recordservice.task.fetch.size

This optional performance-tuning setting specifies the maximum number of records returned when fetching results from the Okera Workers. The default value is 50000.

recordservice.task.plan.maxTasks

This optional performance-tuning setting specifies a hint for the maximum number of tasks to generate per Planner request. The default value is determined by the server.

recordservice.task.plan.defer-signing-urls

This optional performance-tuning setting indicates whether presigning URIs in all nScale tasks should be deferred for requests initiated from Spark and Hive. Valid values are true (defer presigning URIs in plan requests) and false (continue presigning all URIs in plan requests). The default is false.
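Because the performance-tuning settings are usually best left unset, one reasonable pattern is to emit only the settings that were explicitly chosen. The sketch below does that and renders the boolean as the lowercase true or false values listed above. The performance_settings helper and the sample fetch size are hypothetical.

```python
def performance_settings(fetch_size=None, max_tasks=None, defer_signing_urls=None):
    """Build only the performance settings that were explicitly provided."""
    settings = {}
    if fetch_size is not None:
        settings["recordservice.task.fetch.size"] = str(int(fetch_size))
    if max_tasks is not None:
        settings["recordservice.task.plan.maxTasks"] = str(int(max_tasks))
    if defer_signing_urls is not None:
        settings["recordservice.task.plan.defer-signing-urls"] = (
            "true" if defer_signing_urls else "false"
        )
    return settings


print(performance_settings())  # {} : leave everything to the defaults
print(performance_settings(fetch_size=10000, defer_signing_urls=True))
```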