Auditing

Introduction

A vital part of data governance is the ability to respond quickly and coherently to security threats. This is also critical for regulatory purposes and compliance requests. Enterprises are often required to use company-wide security information and event management (SIEM) software that correlates the audit log events generated by their IT systems. For any data provided by the Okera Platform, there is a single audit log location with text files that contain JSON-formatted records for each monitored action. This includes all access to data and metadata, such as requests to the Schema Registry to retrieve dataset information.

Every access to the Okera Platform services is authenticated by the respective endpoint, and the access operation is written to the audit logs in great detail, recording not only the user making the request, but also the API that was used, whether the request was successful, and so on. All requests eventually flow through the ODAS Planner(s) and Workers, irrespective of the API used. The Planner is used, among other things, to retrieve dataset definitions and to plan the distribution of a query across all available cluster nodes. These requests are authenticated and authorized by the Planner, which then appends an audit event record to its local log. The Workers are responsible for serving the dataset rows as needed by the clients, applying any filtering or masking on the fly. Before any data is returned, the client request is authenticated again, creating an audit event entry that is appended to the Worker’s local log. After serving data and upon concluding the client request, the Worker(s) also log the amount of data that was transmitted to and from the client, with which you can determine the usage of individuals and groups and develop, for instance, charge-back models.

Audit Logs

The Planners, Workers, gateway servers, and so on are configured to upload their local logs to a central storage location, usually within the configured installation directory. For AWS, this is a directory in an S3 bucket the servers have write access to. The location is configured with the OKERA_STAGING_DIR environment variable, which is set in env.sh before the Deployment Manager process is started. To change this environment variable, update the env.sh file and restart the Deployment Manager.

Note: Changing this setting only applies to clusters that are started thereafter. For an existing cluster, a restart is necessary to pick up the new log location.
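
For example, here is a minimal sketch of this setting in env.sh; the bucket and path are illustrative and simply match the listings below:

# Set in env.sh before starting the Deployment Manager.
# The bucket and path here are examples only.
export OKERA_STAGING_DIR=s3://acme-ops/prod-7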

For example, here is what the content of a staging directory may look like:

$ aws s3 ls s3://acme-ops/prod-7/
        PRE audit/
        PRE canary/
        PRE cdas_rest_server/
        PRE cerebro_catalog/
        PRE cerebro_planner/
        PRE cerebro_web/
        PRE cerebro_worker/
        PRE clusters/
        PRE zookeeper/
...

The first directory shown, named audit, contains the uploaded audit logs from all cluster nodes. For performance reasons, the data inside the audit directory is partitioned by day:

$ aws s3 ls s3://acme-ops/prod-7/audit/
        PRE ymd=2018-09-05/
        PRE ymd=2018-09-06/
        PRE ymd=2018-09-07/
        PRE ymd=2018-09-08/
        PRE ymd=2018-09-09/
        PRE ymd=2018-09-10/

Looking into a particular partition directory reveals the already uploaded logs; here, one from a Planner and another from a Worker instance:

$ aws s3 ls s3://acme-ops/prod-7/audit/ymd=2018-09-05/
2018-09-06 08:18:06     814762 \
  planner-10.1.10.181-0-731c5cf0-4917-4947-b199-7fb257efcfba_audit.log
2018-09-10 15:25:00     143733 \
  worker-10.1.10.181-0-3e80936e-30d0-4080-a519-f1676d6544bf_audit.log

The file name comprises the service name, the internal IP address of the node hosting it, a sequence number, and a UUID, making the log unique for a specific instance of a service.

Note: There is usually no need to know about these files and where exactly they came from. The provided audit table and views (see below) handle the files implicitly when you execute queries on top of them.

You can use the Reports Page in the Okera Portal to easily explore and analyze these audit logs. Here is an example of the raw JSON record that is appended to the audit log for every client access:

{ "request_time":"2018-09-05 16:19:01.628368000",
  "request_id":"97451917614274a8:bd6c2b3347c651a6",
  "start_unix_time":1536164341628,
  "end_unix_time":1536164342121,
  "auth_failure":false,
  "status":"ok",
  "user":"root",
  "connected_user":"root",
  "client_network_address":"::1:52774",
  "client_application":"okera-execute-ddl",
  "num_results_returned":0,
  "statement_type":"DDL",
  "default_db":"default",
  "statement":"CREATE ROLE IF NOT EXISTS okera_public_role",
  "ae_database":"","ae_function":"",
  "ae_role":"okera_public_role",
  "ae_table":"","ae_view":""}

Other Logs

The Planner(s) and Workers produce additional log files that contain all of the log events these processes emit, including the audit log events. They are located in further top-level directories in the staging location, one per service type (see Cluster Types for a list of services depending on the selected cluster type), and can be used by administrators for debugging purposes. For example:

$ aws s3 ls s3://acme-ops/prod-7/cerebro_planner/
2018-09-05 18:19:05     108632 2018-09-05-16-19-04-planner- \
  10.1.10.181-0-07a6e2e5-2014-4f1d-8abc-14a04618b341.log
2018-09-05 18:19:21      48361 2018-09-05-16-19-20-planner- \
  10.1.10.181-1-7df31dc6-43bd-4265-aa54-f11f5d7b22a8.log
2018-09-05 18:19:37        591 2018-09-05-16-19-36-planner- \
  10.1.10.181-2-62e16fed-b2ba-4ce6-936a-c0911c37636a.log

$ aws s3 ls s3://acme-ops/prod-7/cerebro_web/
2018-09-05 18:20:21        405 2018-09-05-16-20-19-web- \
  10.1.10.181-0-57950e12-360e-4629-bc89-763b567ad123.log
2018-09-05 18:27:33         76 2018-09-05-16-27-32-web- \
  10.1.10.181-1-acd846c6-6b32-4af7-99d0-880b9b0e3a46.log
2018-09-05 18:52:39         76 2018-09-05-16-52-38-web- \
  10.1.10.181-2-d41ad909-e46f-4b4d-b4a3-95a3ede57aa5.log

$ aws s3 ls s3://acme-ops/prod-7/cdas_rest_server/
2018-09-05 18:20:05      13763 2018-09-05-16-20-04-cdas-rest-server- \
  10.1.10.181-0-ec031b3f-e971-4c38-adb4-01a6d9a28c42.log
2018-09-05 18:20:21        532 2018-09-05-16-20-20-cdas-rest-server- \
  10.1.10.181-1-1102aab4-8571-4d53-bffe-f51fe3084765.log
2018-09-05 18:20:37        364 2018-09-05-16-20-36-cdas-rest-server- \
  10.1.10.181-2-f512e3e5-80b9-42b5-97d0-26e9d8c8259f.log

The audit log records are embedded in these mixed service logs on a single line each, prefixed with Audit.log. For example, a service log with both technical and audit log entries may look like:

stderr: I0905 16:19:02.174656   228 impala-server.cc:1162] \
  RegisterQuery(): query_id=bb4986e8cab0d521:823bc5dbc662b3b4
stderr: I0905 16:19:02.266245   228 impala-server.cc:1212] \
  UnregisterQuery(): query_id=bb4986e8cab0d521:823bc5dbc662b3b4
stderr: I0905 16:19:02.267323   228 impala-server.cc:679] Audit.log: \
  {"request_time":"2018-09-05 16:19:02.173889000","request_id": \
  "ec4264da5511c330:a38cf5108c8b9b85","start_unix_time":1536164342173, \
  "end_unix_time":1536164342267,"auth_failure":false,"status":"ok", \
  "user":"root","connected_user":"root","client_network_address": \
  "::1:52776","client_application":"okera-execute-ddl", \
  "num_results_returned":0,"statement_type":"DDL","default_db":"default", \
  "statement":"GRANT ROLE okera_public_role TO GROUP __okera_public_group", \
  "ae_database":"","ae_function":"","ae_role":"","ae_table":"","ae_view":""}
stdout: Connecting to planner localhost:12050
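
Since each embedded audit record occupies a single line, the JSON can be extracted from a mixed service log with a simple filter. A sketch, where the log file name is illustrative:

# Print only the JSON audit records contained in a service log.
$ grep -o 'Audit.log: {.*' planner.log | sed 's/^Audit.log: //'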

Audit Log Format

Each audit log entry is a JSON record with the schema shown below. By default, the full audit log data is made available within the special okera_system database, as a table named audit_logs.

Note: The fields in the audit log schema are considered stable and will not be removed in future releases. It is expected that more fields are added in the future.

Field Type Description
ymd STRING The table partitioning string, derived from the request_time in the form of YYYY-MM-DD.
request_time STRING Human-formatted GMT time when the request started.
request_id STRING Unique ID for this request.
start_unix_time BIGINT Unix epoch time (in milliseconds) at which the request started.
end_unix_time BIGINT Unix epoch time (in milliseconds) at which the request completed (or failed).
auth_failure BOOLEAN Set to true if the request failed due to authorization or authentication failure.
status STRING Status message of the request. “ok” if the request succeeded; error details otherwise.
user STRING The effective user running the request. This is the user whose access permissions are used to validate the request.
connected_user STRING The user that is authenticated at the network connection level. This can be different from ‘user’ if the ‘connected_user’ is issuing requests on behalf of another user.
client_application STRING The client application, including client version, that originated the request. Examples: execute-ddl, presto (1.0.0)
client_network_address STRING The client host and port.
statement_type STRING The type of the request.
statement STRING The logged request. The format varies with the statement type, but it captures the request being made.
num_results_returned BIGINT The number of records returned by the query.
ae_database STRING A list of the databases accessed during the processing of the query.
ae_table STRING A list of the tables accessed during the processing of the query.
ae_view STRING A list of the views accessed during the processing of the query.
ae_function STRING A list of the functions executed during the processing of the query.
ae_role STRING A list of the roles modified during the processing of the query.

Note: Only stable fields are documented here. Others found in the logs are not considered stable and are subject to deprecation or unannounced removal.
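
For example, here is a sketch of a query against this table that counts failed requests per client application for one day; it uses only the stable fields and the ymd partition column documented above, with the date literal taken from the earlier listings:

-- Count failed requests per client application for a single day.
-- Restricting on ymd prunes the daily partitions.
SELECT client_application, COUNT(*) AS failed_requests
FROM okera_system.audit_logs
WHERE ymd = '2018-09-05' AND auth_failure = true
GROUP BY client_application;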

Audit Log Views

For convenience, the okera_system database also provides special views on top of the raw audit log table. These views apply implicit filtering based on the partitioning information.

Daily Audit Log View

The first view, named daily_audit_logs, restricts the data to the last partition. Since the audit table is partitioned by day, this means the view returns the data of the current day.

Note: The audit log views use the ODAS-specific LAST PARTITION feature.

For instance, the daily audit log view is defined like so:

CREATE VIEW IF NOT EXISTS okera_system.daily_audit_logs
COMMENT 'Audited logs from the current day.'
AS SELECT * FROM okera_system.steward_audit_logs(LAST PARTITION);

Weekly Audit Log View

The weekly_audit_logs view is similar to the daily log view, but sets the number of partitions to return to seven, that is, one week. Using this view gives access to the last seven days, including the current day (which most likely is still accumulating data).
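
For instance, a minimal sketch that summarizes request volume per day over the last week, using the documented ymd partition column:

-- Daily request counts across the last seven partitions.
SELECT ymd, COUNT(*) AS requests
FROM okera_system.weekly_audit_logs
GROUP BY ymd
ORDER BY ymd;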

Monthly Audit Log View

The monthly_audit_logs view is similar to the daily log view, but sets the number of partitions to return to thirty, that is, one month. Using this view gives access to the last thirty days, including the current day (which most likely is still accumulating data).

Steward Audit Log View

A more tailored audit log view is available as steward_audit_logs. It returns only those audit log rows that refer to objects (that is, database, table, or view names) to which the current user has administrative access. For example, assuming there is a dataset named salesdb.transactions and a user bob who is part of the sales_steward role, a catalog or database administrator would need to grant the ALL privilege on the salesdb.transactions object to the sales_steward role so that user bob is able to see the audit log rows pertaining to that dataset.
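
Such a grant could look like the following sketch, using the standard GRANT DDL with the object and role names from the example above:

-- Allow members of the sales_steward role to see the audit rows
-- for salesdb.transactions by granting ALL on that object.
GRANT ALL ON TABLE salesdb.transactions TO ROLE sales_steward;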

Reporting Audit Log View

The Okera Portal and its Reports Page use the reporting_audit_logs view, which builds on the steward audit log view to provide access to the most important columns of the audit logs while applying the same filtering. In other words, a user running reports only sees audit log records concerning objects they have full access to. The reporting view also filters out unnecessary audit records that would otherwise cause the reports to show inaccurate aggregates. For example, the view filters out ODAS-internal service communication, as well as one of the two records for events that have both a start and an end entry in the logs.
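
For illustration, a hedged sketch of the kind of aggregate a report might compute; it assumes the reporting view retains the statement_type column from the underlying audit logs:

-- Request volume by statement type, as visible to the current user.
SELECT statement_type, COUNT(*) AS requests
FROM okera_system.reporting_audit_logs
GROUP BY statement_type;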