REST API¶
This document describes the REST API of the Okera Catalog. It is intended for clients that want access to the full range of Okera functionality. Clients can also connect through existing APIs, such as the Hive Metastore API.
The API provides programmatic access to the catalog.
Authentication¶
Unless otherwise specified, the APIs require users to be authenticated. Authentication
can be done via Kerberos or with tokens. For token-based authentication,
specify the token in the authorization header: authorization: Bearer <TOKEN>.
See the authentication document for details on how to get tokens and check if token authentication is working.
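A minimal sketch of token-based authentication in Python, using only the standard library. It builds a request with the bearer token attached but does not send it. The host name and the build_request helper are assumptions for illustration and are not part of the Okera API.

```python
import urllib.request


def build_request(base_url, path, token=None):
    """Build (but do not send) a GET request, attaching a bearer token if given."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    if token is not None:
        # Same header as in the text above: "authorization: Bearer <TOKEN>".
        req.add_header("Authorization", "Bearer " + token)
    return req


# Hypothetical host; replace "<TOKEN>" with a real token before sending.
req = build_request("https://okera.example.com:8083", "/api/databases", token="<TOKEN>")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the result to urllib.request.urlopen would actually issue the call; the sketch stops short of that so it can be run without a cluster.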
SSL¶
If SSL is enabled, make all calls over HTTPS instead of HTTP. The calls are otherwise unchanged.
Executing Hive DDL¶
Endpoint: /api/hive-ddl
[POST]
This API executes HiveQL commands. It can be used to create datasets, create roles, issue grants, and so on. The API is intended to be compatible with beeline.
The POST request takes as a parameter:
{
"query" [String]: Required, HiveQL SQL command.
}
Example:
curl -H "Content-Type: application/json" -X POST -d '{"query":"show databases"}' <okerahost>:8083/api/hive-ddl
As in most SQL dialects, user names containing a dash must be escaped by wrapping the name in backticks.
Example:
curl -H "Content-Type: application/json" -X POST -d '{"query":"create role `user-one`"}' <okerahost>:8083/api/hive-ddl
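The same request can be sketched in Python. The example below builds the POST body for /api/hive-ddl and wraps a name in backticks before placing it in the query. The helper names (quote_identifier, hive_ddl_request) and the host are hypothetical, and rejecting names that contain a backtick is a conservative choice made for this sketch rather than documented Okera behavior.

```python
import json
import urllib.request


def quote_identifier(name):
    """Wrap a name in backticks so that dashes are accepted in the query."""
    if "`" in name:
        # Assumption: this sketch refuses such names instead of escaping them.
        raise ValueError("identifier must not contain a backtick: %r" % name)
    return "`" + name + "`"


def hive_ddl_request(base_url, query):
    """Build (but do not send) a POST request for the /api/hive-ddl endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/hive-ddl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; mirrors the curl example above.
req = hive_ddl_request("http://okera.example.com:8083",
                       "create role " + quote_identifier("user-one"))
print(req.data.decode("utf-8"))
```

Building the body with json.dumps avoids the shell quoting issues that arise when backticks and double quotes are mixed on a command line.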
Listing Databases¶
Endpoint: /api/databases
[GET]
Endpoint: /api/databases
[POST]
The POST request takes as a parameter:
{
"filter" [String]: Optional, filter on the name of databases to return. For example,
'log*' returns all databases that start with 'log'.
}
Listing Datasets¶
Endpoint: /api/datasets
[GET]
Endpoint: /api/datasets
[POST]
The POST request takes as a parameter:
{
"db" [String]: Optional, database to retrieve datasets from. Default is 'default'.
"filter" [String]: Optional, filter on the name of datasets to return. For example,
'log*' returns all datasets that start with 'log'.
}
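As a sketch of how the optional parameters of the two listing endpoints above might be assembled, the following Python helper builds the JSON body and leaves out any field that was not supplied. The function name listing_body is hypothetical, and whether the server accepts an empty object when no parameter is given is an assumption.

```python
import json


def listing_body(db=None, name_filter=None):
    """Return the JSON body for a listing POST, omitting unset optional fields.

    For /api/databases only name_filter applies; /api/datasets accepts both.
    """
    body = {}
    if db is not None:
        body["db"] = db
    if name_filter is not None:
        body["filter"] = name_filter
    return json.dumps(body)


# Body for POST /api/databases: databases whose names start with 'log'.
print(listing_body(name_filter="log*"))

# Body for POST /api/datasets: datasets in 'okera_sample' starting with 'log'.
print(listing_body(db="okera_sample", name_filter="log*"))
```

Omitting "db" relies on the documented default of 'default' rather than sending it explicitly.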
Details of a Dataset¶
Endpoint: /api/datasets/{name}
[POST]
Returns: Dataset information as JSON, including the schema and other information.
{
'db' [String]: Database containing this dataset.
'name' [String]: Name of the dataset.
'schema' [List]: List of columns.
}
Example:
curl -X POST <okerahost>:8083/api/datasets/okera_sample.sample
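A small Python sketch of working with this endpoint: one helper builds the URL from a database and dataset name, and another summarizes a response using only the three documented fields. Both helper names are hypothetical, and the sample response is made up for illustration because the structure of the entries in 'schema' is not described here.

```python
from urllib.parse import quote


def dataset_url(base_url, db, name):
    """Build the /api/datasets/{name} URL for a fully qualified dataset name."""
    qualified = db + "." + name
    return base_url.rstrip("/") + "/api/datasets/" + quote(qualified, safe="")


def summarize(info):
    """Summarize a response using only the 'db', 'name' and 'schema' fields."""
    return "%s.%s: %d columns" % (info["db"], info["name"], len(info["schema"]))


print(dataset_url("http://okera.example.com:8083", "okera_sample", "sample"))

# Made-up response: the column entries are left empty because their
# structure is not specified in this document.
sample = {"db": "okera_sample", "name": "sample", "schema": [{}, {}]}
print(summarize(sample))
```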
Scanning a Dataset¶
Endpoint: /api/scan/{name}
[GET]
Endpoint: /api/scanpage/{name}
[GET]
Endpoint: /api/scan
[POST]
Endpoint: /api/scanpage
[POST]
Returns the dataset as JSON. The scan API returns only the initial rows. The scanpage API returns a handle (the session_id) that can be used to retrieve all the records.
Example:
curl <okerahost>:8083/api/scan/okera_sample.sample
The POST request takes as a parameter:
{
"query" [String]: "SQL Query to execute"
}
Example:
curl -X POST -H 'Content-Type: application/json' \
-d '{"query" : "select uid, ccn from okera_sample.users"}' <okerahost>:8083/api/scan
The scanpage API accepts two optional arguments: records, which is the total number of records to return for the query, and session_id. The API returns records in batches of up to 10,000.
The session_id value is used on subsequent queries to return successive batches of records. It must be omitted on the first query.
Note: session_id values are only valid for 30 seconds, starting from the time that the Okera cluster starts returning data. That timer is reset each time a query is received for a given session_id.
$ curl "<okera_rest_server_endpoint>/api/scanpage/products?user=presto"
$ curl "<okera_rest_server_endpoint>/api/scanpage/products?user=presto&records=25000"
$ curl "<okera_rest_server_endpoint>/api/scanpage/products?user=presto&session_id=77480ad07d743bb1:b4f7822f036c6c91"
Each returned object contains:
{
'records' [List]: Each entry is an object containing the field names, types and values of each record.
'session_id' [String]: Key used to return subsequent 'pages'. Each page contains up to 10,000 entries. When the final page is returned, 'session_id' is "-1".
}
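The paging protocol described above can be sketched as a Python generator: the first call omits session_id, each later call passes the session_id from the previous page, and the loop stops when the returned session_id is "-1". The function names, the injectable fetch parameter, and the host are assumptions for illustration. The sketch sends records only on the first call, mirroring the curl examples, and it does not model authentication or the user parameter shown in those examples.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def scanpage_url(base_url, name, session_id=None, records=None):
    """Build a GET URL for /api/scanpage/{name} with the optional arguments."""
    params = {}
    if records is not None:
        params["records"] = records
    if session_id is not None:
        params["session_id"] = session_id
    url = base_url.rstrip("/") + "/api/scanpage/" + quote(name, safe="")
    return url + ("?" + urlencode(params) if params else "")


def fetch_json(url):
    """Default fetcher: perform the GET and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_records(base_url, name, records=None, fetch=fetch_json):
    """Yield every record of a dataset, following session_id page by page."""
    session_id = None  # must be omitted on the first query
    while True:
        first_call = session_id is None
        page = fetch(scanpage_url(base_url, name, session_id,
                                  records if first_call else None))
        yield from page["records"]
        session_id = page["session_id"]
        if session_id == "-1":  # the final page has been returned
            break


if __name__ == "__main__":
    # Hypothetical host; requires a reachable Okera REST server.
    for record in iter_records("http://okera.example.com:8083", "products"):
        print(record)
```

Because a session_id expires 30 seconds after the cluster starts returning data unless another query arrives for it, a consumer of this generator should avoid long pauses between pages.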