Okera Client Configurations
This document describes configurations available to the client for Hadoop ecosystem tools.
Note: when the configs are set in Spark, each name is prefixed with
"spark." (including the trailing dot). For example,
recordservice.kerberos.principal, when configured for Spark, should be
spark.recordservice.kerberos.principal. This is true for all configs.
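The prefixing rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_spark_conf` is hypothetical and not part of any Okera or Spark API, and skipping keys that already carry the prefix is an assumption made for convenience. The only config name taken from this document is `recordservice.kerberos.principal`.

```python
SPARK_PREFIX = "spark."


def to_spark_conf(configs):
    """Return a copy of configs with every key prefixed by 'spark.'.

    Assumption: keys that already start with the prefix are left unchanged.
    """
    return {
        (key if key.startswith(SPARK_PREFIX) else SPARK_PREFIX + key): value
        for key, value in configs.items()
    }


# Client config as it would be named outside of Spark.
client_configs = {
    "recordservice.kerberos.principal": "okera/planner.okera.com@OKERA.COM",
}
spark_configs = to_spark_conf(client_configs)
print(spark_configs)
```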
How configs are specified depends on the tool being used; each tool's standard configuration mechanisms apply.
For Spark, configs can be specified:
- via the commandline to spark-submit/spark-shell with --conf
- set in spark-defaults, typically in /etc/spark-defaults.conf
- set in the application, via the SparkContext (or related) objects
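As a rough sketch of the commandline option, the Python snippet below assembles a spark-submit invocation with one --conf flag per setting. The helper name `spark_submit_args` and the application file `my_app.py` are hypothetical and only serve the example; the snippet builds the command but does not run it.

```python
import shlex


def spark_submit_args(configs, app):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(configs.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


args = spark_submit_args(
    {"spark.recordservice.kerberos.principal": "okera/planner.okera.com@OKERA.COM"},
    "my_app.py",  # hypothetical application file
)
# Join into a single shell-safe command line for display.
command = shlex.join(args)
print(command)
```

The same setting in spark-defaults would be a single line with the name and the value separated by whitespace.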
For Hive (via beeline), configs can be specified:
- via the commandline to beeline with --hiveconf
- set on the class path in either hive-site.xml or core-site.xml
- set in the beeline session using the
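For the class path option, hive-site.xml and core-site.xml use the standard Hadoop configuration layout of `property` elements with a `name` and a `value`. The Python sketch below generates such a fragment; the helper name `to_hadoop_xml` is hypothetical, and the config name is not prefixed because the "spark." prefix only applies to Spark.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(configs):
    """Render configs as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for key, value in sorted(configs.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_hadoop_xml(
    {"recordservice.kerberos.principal": "okera/planner.okera.com@OKERA.COM"}
)
print(xml_text)
```

On the commandline, the equivalent form would be a `--hiveconf name=value` argument to beeline.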
This is always required and is a comma-separated list of host:port pairs where the ODAS planners are running.
This is the principal of the planner service to connect to. This is a three-part
principal of the form SERVICE_NAME/SERVICE_HOST@REALM, for example,
okera/planner.okera.com@OKERA.COM. This is required if the client is
authenticating with ODAS using Kerberos.
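To make the three parts concrete, here is a small Python sketch that splits a principal into SERVICE_NAME, SERVICE_HOST and REALM. The `Principal` type and `parse_principal` function are hypothetical helpers for illustration, not part of the client.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    service_name: str
    service_host: str
    realm: str


def parse_principal(principal):
    """Split SERVICE_NAME/SERVICE_HOST@REALM into its three parts."""
    name_and_host, at, realm = principal.partition("@")
    service_name, slash, service_host = name_and_host.partition("/")
    if not (at and slash and service_name and service_host and realm):
        raise ValueError(f"not a three-part principal: {principal!r}")
    return Principal(service_name, service_host, realm)


parsed = parse_principal("okera/planner.okera.com@OKERA.COM")
print(parsed.service_name, parsed.service_host, parsed.realm)
```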
This is the token string for this user. ODAS can be configured to accept multiple kinds of tokens, but clients use the same config for all of them.
This is only required for versions < 0.4.5 and should not be set on newer versions.
This must be set if token-based auth is being used and should match the SERVICE_NAME
portion of the planner principal. In the above example, this value would be okera.
Note: If both the token and the principal are specified, the client will only authenticate using the token.
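The precedence described in the note can be sketched as follows. This is an illustration of the rule only, not the client's actual code: the function `choose_auth` and its return values are hypothetical, and treating the absence of both settings as "none" is an assumption.

```python
def choose_auth(token=None, principal=None):
    """Pick the authentication mechanism; a token wins over a principal."""
    if token:
        return "token"
    if principal:
        return "kerberos"
    # Assumption: with neither setting, no authentication is attempted.
    return "none"


# Both are set, so only the token is used.
print(choose_auth(token="<token string>", principal="okera/planner.okera.com@OKERA.COM"))
```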
Network related configs
These configs are often not required and the defaults should suffice. They can be adjusted if the client observes timeout behavior.
Optional configuration for the maximum number of attempts to retry RPCs with the planner. Default value is 5.
Optional configuration for the maximum number of attempts to retry RPCs with the worker. Default value is 5.
Optional configuration for the sleep between retry attempts with the planner. The default initial value is 300, and the sleep grows exponentially with each retry.
Optional configuration for the sleep between retry attempts with the worker. The default initial value is 300, and the sleep grows exponentially with each retry.
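To give a feel for how an exponentially growing sleep interacts with the retry count, the Python sketch below lists the sleep before each retry. The growth factor of 2 is an assumption, since this document only says the sleep grows exponentially, and the values carry whatever unit the config uses. The function `backoff_sleeps` is a hypothetical helper, not the client's implementation.

```python
def backoff_sleeps(retries=5, initial_sleep=300, factor=2):
    """Return the sleep before each retry, growing by `factor` every time.

    Assumption: factor=2 (doubling); the real growth rate is not documented here.
    """
    return [initial_sleep * factor ** attempt for attempt in range(retries)]


sleeps = backoff_sleeps()
print(sleeps)       # one entry per retry
print(sum(sleeps))  # total time spent sleeping if every retry is used
```

Under these assumptions, raising the retry count by one roughly doubles the worst-case time spent sleeping, which is worth keeping in mind when adjusting either setting.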
Optional configuration for timeout when initially connecting to the planner service. Default value is 10000.
Optional configuration for timeout when initially connecting to the worker service. Default value is 10000.
Optional configuration for timeout for planner RPCs (after connection is established). Default value is 120000.
Optional configuration for timeout for worker RPCs (after connection is established). Default value is 120000.
Performance related configs
These settings can fine-tune performance. They generally do not need to be set, as the server will compute a good value automatically.
Optional configuration for the maximum number of records returned when fetching results from the workers. Default value is 50000.
Optional configuration for the hinted maximum number of tasks to generate per PlanRequest. Default value is determined by the server.