Okera Client Configurations
This document describes configurations available to the client for Hadoop ecosystem tools.
Note: when these configs are set in Spark, they must be prefixed with "spark.". For example, the config recordservice.kerberos.principal should be set as spark.recordservice.kerberos.principal when configured for Spark. This is true for all configs.
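The prefixing rule is mechanical, so it fits in a tiny helper. The following Python sketch is illustrative only. to_spark_key is a hypothetical name, not part of any Okera or Spark API, and leaving already-prefixed keys untouched is a choice made by this sketch.

```python
SPARK_PREFIX = "spark."


def to_spark_key(key: str) -> str:
    """Return the Spark form of a client config key (hypothetical helper).

    Keys that already start with "spark." are returned unchanged, so the
    helper can safely be applied more than once.
    """
    if key.startswith(SPARK_PREFIX):
        return key
    return SPARK_PREFIX + key


print(to_spark_key("recordservice.kerberos.principal"))
# prints spark.recordservice.kerberos.principal
```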
Specifying configs
How configs are specified depends on the tool being used; each tool's standard configuration mechanisms apply.
Spark
Configs can be specified:
- on the command line to spark-submit or spark-shell, with --conf
- in spark-defaults, typically /etc/spark-defaults.conf
- in the application, via the SparkContext (or related) objects
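To illustrate the command-line route, the Python sketch below turns a dictionary of client configs into the --conf arguments that spark-submit and spark-shell accept, adding the "spark." prefix as it goes. The helper name and the planner addresses are hypothetical placeholders.

```python
import shlex
from typing import Dict, List


def spark_conf_args(configs: Dict[str, str]) -> List[str]:
    """Build spark-submit/spark-shell --conf arguments (hypothetical helper).

    Every key receives the "spark." prefix that Spark requires for these configs.
    """
    args: List[str] = []
    for key, value in configs.items():
        spark_key = key if key.startswith("spark.") else "spark." + key
        args += ["--conf", f"{spark_key}={value}"]
    return args


# Hypothetical planner hosts and port, for illustration only.
args = spark_conf_args(
    {"recordservice.planner.hostports": "planner1.example.com:12050,planner2.example.com:12050"}
)
print("spark-shell " + shlex.join(args))
```

The same prefixed keys and values could instead go in spark-defaults (one whitespace-separated key and value per line) or be set in the application, for example with SparkConf.set in PySpark.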
Hive
Configs can be specified:
- on the command line to beeline, with --hiveconf
- on the class path, in either hive-site.xml or core-site.xml
- in the beeline session, using the SET command
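For Hive, the keys are used without the "spark." prefix. The Python sketch below uses hypothetical helper names and a placeholder planner address. It renders the same configs both as beeline SET statements and as a Hadoop-style configuration document, the format used by hive-site.xml and core-site.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def hive_set_statements(configs: Dict[str, str]) -> List[str]:
    """One SET statement per config, for use inside a beeline session."""
    return [f"SET {key}={value};" for key, value in configs.items()]


def hadoop_site_xml(configs: Dict[str, str]) -> str:
    """Render configs as <property> entries of a Hadoop <configuration> file."""
    root = ET.Element("configuration")
    for key, value in configs.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder planner address, for illustration only.
configs = {"recordservice.planner.hostports": "planner.example.com:12050"}
for statement in hive_set_statements(configs):
    print(statement)
print(hadoop_site_xml(configs))
```

On the command line, the equivalent would be passing --hiveconf recordservice.planner.hostports=planner.example.com:12050 to beeline.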
Required configs
recordservice.planner.hostports
Always required. A comma-separated list of host:port pairs where the ODAS planners are running.
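A client-side sanity check of this value might look like the Python sketch below. The function is hypothetical. Tolerating whitespace around entries and rejecting ports outside 1 to 65535 are assumptions of the sketch, not documented ODAS behavior.

```python
from typing import List, Tuple


def parse_hostports(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated host:port list into (host, port) pairs.

    Raises ValueError for any entry that is not a valid host:port.
    """
    pairs: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        # rpartition splits on the last ':' so the port is always the tail.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


print(parse_hostports("planner1.example.com:12050, planner2.example.com:12050"))
```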
Authentication configs
recordservice.kerberos.principal
The principal of the planner service to connect to. This is a three-part principal of the form SERVICE_NAME/SERVICE_HOST@REALM, for example, okera/planner.okera.com@OKERA.COM. It is required if the client authenticates with ODAS using Kerberos.
recordservice.delegation-token.token
The token string for this user. ODAS can be configured to accept multiple kinds of tokens, but clients use this same config for all of them.
recordservice.delegation-token.service-name
Only required for versions older than 0.4.5; do not set it on newer versions. On those older versions, it must be set if token-based authentication is used, and it should match the SERVICE_NAME portion of the planner principal. In the example above, this value would be okera.
Note: If both the token and the principal are specified, the client authenticates using only the token.
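The precedence rule and the legacy service-name setting can be summarized in a short Python sketch. Both helpers are hypothetical. The version argument is assumed to be a (major, minor, patch) tuple for the release the 0.4.5 cutoff refers to. The sketch only assembles config values and does not authenticate.

```python
from typing import Dict, Optional, Tuple

TOKEN_KEY = "recordservice.delegation-token.token"
PRINCIPAL_KEY = "recordservice.kerberos.principal"
SERVICE_NAME_KEY = "recordservice.delegation-token.service-name"


def auth_method(configs: Dict[str, str]) -> Optional[str]:
    """Return which credential is used; a token wins when both are set."""
    if configs.get(TOKEN_KEY):
        return "token"
    if configs.get(PRINCIPAL_KEY):
        return "kerberos"
    return None


def token_auth_configs(token: str, principal: str, version: Tuple[int, int, int]) -> Dict[str, str]:
    """Token-auth configs; service-name is added only for versions < 0.4.5."""
    configs = {TOKEN_KEY: token}
    if version < (0, 4, 5):
        # SERVICE_NAME is the part of the planner principal before the '/'.
        configs[SERVICE_NAME_KEY] = principal.split("/", 1)[0]
    return configs


both = {TOKEN_KEY: "example-token", PRINCIPAL_KEY: "okera/planner.okera.com@OKERA.COM"}
print(auth_method(both))
# prints token
```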
Network-related configs
These configs are usually not required, and the defaults should suffice. They can be adjusted if the client observes timeouts.
recordservice.planner.retry.attempts
Optional configuration for the maximum number of attempts to retry RPCs with the planner. Default value is 5.
recordservice.worker.retry.attempts
Optional configuration for the maximum number of attempts to retry RPCs with the worker. Default value is 5.
recordservice.planner.retry.sleepMs
Optional configuration for the sleep, in milliseconds, between retry attempts with the planner. The default initial value is 300 ms, and the sleep grows exponentially with each retry.
recordservice.worker.retry.sleepMs
Optional configuration for the sleep, in milliseconds, between retry attempts with the worker. The default initial value is 300 ms, and the sleep grows exponentially with each retry.
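The text above says only that the sleep starts at 300 ms and grows exponentially. The Python sketch below assumes that the sleep doubles before each retry and that retry.attempts counts retries. Both the factor of 2 and that reading are assumptions, not documented client behavior.

```python
from typing import List


def retry_sleeps_ms(initial_ms: int = 300, retries: int = 5, factor: int = 2) -> List[int]:
    """Sleep before each retry, in milliseconds.

    ASSUMPTION: the sleep is multiplied by `factor` after every retry; the
    actual growth factor used by the client is not documented.
    """
    return [initial_ms * factor ** i for i in range(retries)]


sleeps = retry_sleeps_ms()
print(sleeps, sum(sleeps))
# prints [300, 600, 1200, 2400, 4800] 9300
```

Under these assumptions, the defaults add up to 9.3 seconds of sleeping across all retries.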
recordservice.planner.connection.timeoutMs
Optional configuration for the timeout, in milliseconds, when initially connecting to the planner service. Default value is 10000 (10 seconds).
recordservice.worker.connection.timeoutMs
Optional configuration for the timeout, in milliseconds, when initially connecting to the worker service. Default value is 10000 (10 seconds).
recordservice.planner.rpc.timeoutMs
Optional configuration for the timeout, in milliseconds, for planner RPCs after the connection is established. Default value is 120000 (2 minutes).
recordservice.worker.rpc.timeoutMs
Optional configuration for the timeout, in milliseconds, for worker RPCs after the connection is established. Default value is 120000 (2 minutes).
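When tuning these values, it can help to estimate how long a failing call may take before an error surfaces. The Python sketch below computes a rough upper bound from the documented planner defaults. It assumes, without documentation to confirm it, that a call is tried once plus retry.attempts retries, that every try runs into the full timeout, and that the retry sleep doubles each time.

```python
from typing import Dict

# Documented planner defaults; *Ms values are in milliseconds.
PLANNER_DEFAULTS: Dict[str, int] = {
    "recordservice.planner.retry.attempts": 5,
    "recordservice.planner.retry.sleepMs": 300,
    "recordservice.planner.connection.timeoutMs": 10000,
    "recordservice.planner.rpc.timeoutMs": 120000,
}


def worst_case_ms(cfg: Dict[str, int], timeout_key: str, factor: int = 2) -> int:
    """Rough upper bound on the time a failing planner call can take.

    ASSUMPTIONS: one initial try plus retry.attempts retries, each hitting
    the full timeout, with the retry sleep multiplied by `factor` each time.
    """
    retries = cfg["recordservice.planner.retry.attempts"]
    first_sleep = cfg["recordservice.planner.retry.sleepMs"]
    sleeps = sum(first_sleep * factor ** i for i in range(retries))
    return (retries + 1) * cfg[timeout_key] + sleeps


print(worst_case_ms(PLANNER_DEFAULTS, "recordservice.planner.connection.timeoutMs"))  # 69300
print(worst_case_ms(PLANNER_DEFAULTS, "recordservice.planner.rpc.timeoutMs"))  # 729300
```

Under these assumptions, an unreachable planner could take roughly 69 seconds to be reported with the defaults, so lowering the attempts or the timeouts shortens the time before a failure surfaces.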
Performance-related configs
These settings fine-tune performance. Setting them is generally unnecessary, as the server computes good values automatically.
recordservice.task.fetch.size
Optional performance-tuning configuration for the maximum number of records returned when fetching results from the workers. Default value is 50000.
recordservice.task.plan.maxTasks
Optional configuration for the hinted maximum number of tasks to generate per PlanRequest. The default value is determined by the server.
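recordservice.task.fetch.size trades round trips for batch size: a larger value means fewer fetches per task but more records in each response. The Python sketch below counts fetches under the simplifying assumption that every fetch returns a full batch of fetch.size records until the task's results run out. The function name and record counts are hypothetical.

```python
def fetch_count(total_records: int, fetch_size: int = 50000) -> int:
    """Number of fetches needed to read `total_records` records.

    ASSUMPTION: each fetch returns up to `fetch_size` records (the documented
    default is 50000) and only the last fetch may be partial.
    """
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_records // fetch_size)


print(fetch_count(1_000_000))          # 20 fetches with the default
print(fetch_count(1_000_000, 10_000))  # 100 fetches with a smaller batch
```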