Okera Client Configurations¶
This document describes configuration settings available to the client for Hadoop ecosystem tools.
Note: When the configuration settings are specified in Spark, they are prefixed with spark, followed by a period (.). For example, the setting recordservice.kerberos.principal, when configured for Spark, should be spark.recordservice.kerberos.principal. This is true for all configuration options.
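The prefix rule in the note above can be sketched in a few lines of Python. The helper name to_spark_key is hypothetical and is not part of any Okera or Spark API; it only illustrates the naming convention.

```python
def to_spark_key(setting: str) -> str:
    """Return the Spark form of an Okera client setting name.

    Hypothetical helper for illustration only.
    """
    prefix = "spark."
    # Names that already carry the prefix are returned unchanged.
    return setting if setting.startswith(prefix) else prefix + setting


print(to_spark_key("recordservice.kerberos.principal"))
# prints: spark.recordservice.kerberos.principal
```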
Specify Configuration Settings¶
How you specify configuration settings depends on the tool you are using. Each tool's standard configuration mechanism applies.
Spark¶
Configuration settings can be specified:
- Via the command line to spark-submit or spark-shell with --conf
- In spark-defaults, typically in /etc/spark-defaults.conf
- In the application, via the SparkContext (or related) objects.
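As a rough sketch of the first two options, the following Python renders a set of Okera settings as --conf arguments and as spark-defaults lines. The function names are hypothetical, and the host and port (planner.example.com:12050) are placeholder values, not values taken from this document.

```python
def spark_conf_args(settings: dict) -> list:
    """Render settings as --conf arguments for spark-submit or spark-shell."""
    args = []
    for key, value in settings.items():
        # Spark requires the "spark." prefix on every Okera setting.
        args += ["--conf", f"spark.{key}={value}"]
    return args


def spark_defaults_lines(settings: dict) -> list:
    """Render settings as lines for a spark-defaults file (key, space, value)."""
    return [f"spark.{key} {value}" for key, value in settings.items()]


# Placeholder planner address; substitute your own deployment's value.
settings = {"recordservice.planner.hostports": "planner.example.com:12050"}

print(" ".join(["spark-shell"] + spark_conf_args(settings)))
print("\n".join(spark_defaults_lines(settings)))
```

For the third option, the same prefixed keys would be set on the Spark configuration object in the application.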
Hive¶
Configuration settings can be specified:
- Via the command line to Beeline with --hiveconf
- On the class path in either hive-site.xml or core-site.xml
- In the Beeline session using the SET command
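The three Hive options above can be sketched as follows. The helper names are hypothetical, the planner address is a placeholder, and the XML layout assumes the usual Hadoop configuration/property/name/value structure.

```python
import xml.etree.ElementTree as ET


def beeline_args(settings: dict) -> list:
    """Render settings as --hiveconf arguments for the Beeline command line."""
    args = []
    for name, value in settings.items():
        args += ["--hiveconf", f"{name}={value}"]
    return args


def hive_site_xml(settings: dict) -> str:
    """Render settings as a Hadoop-style XML configuration document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def beeline_set_commands(settings: dict) -> list:
    """Render settings as SET commands for an open Beeline session."""
    return [f"SET {name}={value};" for name, value in settings.items()]


# Placeholder planner address; substitute your own deployment's value.
settings = {"recordservice.planner.hostports": "planner.example.com:12050"}
print(beeline_args(settings))
print(hive_site_xml(settings))
print(beeline_set_commands(settings))
```

Note that Hive settings use the plain recordservice names, without the spark prefix.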
Required Configuration Settings¶
recordservice.planner.hostports
This setting is always required and is a comma-separated list of host:port pairs where the Okera Planners are running.
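To make the expected value format concrete, here is a minimal sketch that splits such a list into host and port pairs. The function name and the validation rules (for example, tolerating whitespace around entries) are assumptions for illustration, not documented client behavior, and the addresses are placeholders.

```python
def parse_hostports(value: str) -> list:
    """Split a comma-separated host:port list into (host, port) tuples."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


print(parse_hostports("planner1.example.com:12050,planner2.example.com:12050"))
# prints: [('planner1.example.com', 12050), ('planner2.example.com', 12050)]
```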
Authentication Configuration Settings¶
recordservice.kerberos.principal
This setting identifies the principal of the Okera Planner service to which the Hive and Spark client should connect. The three-part principal is specified in the format: SERVICE_NAME/SERVICE_HOST@REALM. For example, okera/planner.okera.com@OKERA.COM. This is required if the client is authenticating with Okera using Kerberos.
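The three-part principal format can be illustrated by splitting it into its components. This is a sketch only: the Principal type and parse_principal function are hypothetical names, not part of the Okera client.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    service_name: str
    service_host: str
    realm: str


def parse_principal(principal: str) -> Principal:
    """Split SERVICE_NAME/SERVICE_HOST@REALM into its three parts."""
    service_part, at, realm = principal.rpartition("@")
    service_name, slash, service_host = service_part.partition("/")
    if not (at and slash and service_name and service_host and realm):
        raise ValueError(
            f"expected SERVICE_NAME/SERVICE_HOST@REALM, got {principal!r}"
        )
    return Principal(service_name, service_host, realm)


p = parse_principal("okera/planner.okera.com@OKERA.COM")
print(p.service_name, p.service_host, p.realm)
# prints: okera planner.okera.com OKERA.COM
```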
recordservice.delegation-token.token
This setting specifies the token string for this user. Okera can be configured to accept multiple kinds of tokens, but clients use this same configuration setting for all of them.
recordservice.delegation-token.service-name
This setting is only required for versions older than 0.4.5 and should not be set on newer versions.
On those older versions, it must be set if token-based authentication is used, and it should match the SERVICE_NAME portion of the planner principal. In the example above, this value would be okera.
Note: If both the token and principal are specified, the client only authenticates using the token.
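The precedence described in the note can be sketched as a small decision function. This mirrors the documented behavior but is not the client's actual code. The function name and the returned labels, including "none" for the case where neither setting is present, are assumptions for illustration.

```python
TOKEN_KEY = "recordservice.delegation-token.token"
PRINCIPAL_KEY = "recordservice.kerberos.principal"


def auth_method(settings: dict) -> str:
    """Return which credential would be used for the given settings."""
    # The token takes precedence whenever it is present.
    if settings.get(TOKEN_KEY):
        return "token"
    if settings.get(PRINCIPAL_KEY):
        return "kerberos"
    return "none"


both = {
    TOKEN_KEY: "example-token-string",  # placeholder value
    PRINCIPAL_KEY: "okera/planner.okera.com@OKERA.COM",
}
print(auth_method(both))
# prints: token
```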
Network-Related Configuration Settings¶
These configuration settings are usually not required because the defaults should suffice. Adjust them if you observe timeout behavior.
recordservice.planner.retry.attempts
This optional setting specifies the maximum number of attempts to retry RPCs with the Okera Planner.
The default value is 5.
recordservice.worker.retry.attempts
This optional setting specifies the maximum number of attempts to retry RPCs with the Okera Worker.
The default value is 5.
recordservice.planner.retry.sleepMs
This optional setting specifies the number of milliseconds to sleep between retry attempts with the Okera Planner.
The initial default value is 300. The sleep time grows exponentially from this value.
recordservice.worker.retry.sleepMs
This optional setting specifies the number of milliseconds to sleep between retry attempts with the Okera Worker.
The initial default value is 300. The sleep time grows exponentially from this value.
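To give a feel for how the retry settings interact, the sketch below computes a sleep schedule from the documented defaults (300 ms initial sleep, 5 retry attempts). The growth factor is not stated in this document. Doubling on each retry is an assumption made purely for illustration, as is the function name.

```python
def retry_sleep_schedule(initial_ms: int = 300, retries: int = 5,
                         factor: int = 2) -> list:
    """Return the sleep time in milliseconds before each retry.

    factor=2 (doubling) is an assumed growth rate, not a documented one.
    """
    return [initial_ms * factor ** i for i in range(retries)]


schedule = retry_sleep_schedule()
print(schedule)       # prints: [300, 600, 1200, 2400, 4800]
print(sum(schedule))  # prints: 9300
```

Under these assumptions, exhausting all retries would add roughly 9.3 seconds of sleep time on top of the time spent in the RPCs themselves.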
recordservice.planner.connection.timeoutMs
This optional setting specifies the timeout (in milliseconds) for establishing the initial connection to the Okera Planner.
The default value is 10000.
recordservice.worker.connection.timeoutMs
This optional setting specifies the timeout (in milliseconds) for establishing the initial connection to the Okera Worker.
The default value is 10000.
recordservice.planner.rpc.timeoutMs
This optional setting specifies the timeout period (in milliseconds) for Okera Planner RPCs (after a connection is established).
The default value is 120000.
recordservice.worker.rpc.timeoutMs
This optional setting specifies the timeout period (in milliseconds) for Okera Worker RPCs (after a connection is established).
The default value is 120000.
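The network-related defaults listed in this section can be collected in one place and selectively overridden, for example when raising a timeout. This is a sketch only: the names NETWORK_DEFAULTS and with_overrides are hypothetical, and the validation rules (known keys, non-negative integers) are assumptions rather than documented client behavior.

```python
# Default values as documented in this section.
NETWORK_DEFAULTS = {
    "recordservice.planner.retry.attempts": 5,
    "recordservice.worker.retry.attempts": 5,
    "recordservice.planner.retry.sleepMs": 300,
    "recordservice.worker.retry.sleepMs": 300,
    "recordservice.planner.connection.timeoutMs": 10000,
    "recordservice.worker.connection.timeoutMs": 10000,
    "recordservice.planner.rpc.timeoutMs": 120000,
    "recordservice.worker.rpc.timeoutMs": 120000,
}


def with_overrides(overrides: dict) -> dict:
    """Return the defaults with the given overrides applied."""
    unknown = sorted(set(overrides) - set(NETWORK_DEFAULTS))
    if unknown:
        raise KeyError(f"unknown network setting(s): {unknown}")
    for name, value in overrides.items():
        # Assumed rule: values are non-negative integers.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    merged = dict(NETWORK_DEFAULTS)
    merged.update(overrides)
    return merged


# Example: allow worker RPCs five minutes instead of the default two.
conf = with_overrides({"recordservice.worker.rpc.timeoutMs": 300000})
print(conf["recordservice.worker.rpc.timeoutMs"])
# prints: 300000
```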
Performance-Related Settings¶
These settings can fine-tune Okera's system performance. In general, you do not need to specify these settings because the server computes a good value automatically.
recordservice.task.fetch.size
This optional performance-tuning setting specifies the maximum number of records returned when fetching results from the Okera Workers. The default value is 50000.
recordservice.task.plan.maxTasks
This optional performance-tuning setting specifies the hinted maximum number of tasks to generate per Planner request. The default value is determined by the server.
recordservice.task.plan.defer-signing-urls
This optional performance-tuning setting indicates whether presigning URIs in all nScale tasks should be deferred for requests initiated from Spark and Hive. Valid values are true (defer presigning URIs in plan requests) and false (continue presigning all URIs in plan requests). The default is false.