
Tutorial: Analyzing the ODAS audit logs

This self-service tutorial shows you how you can access and query the ODAS audit log datasets. You will learn how to write SQL queries for important information stored in the logs, which is useful when managing a cluster or when performing an audit.

Difficulty: Intermediate
Time needed: 4 hours

Introduction

Clients, that is, users and applications, can use any of the available ODAS endpoints to issue metadata requests and query managed datasets. Every access by these clients is recorded by the ODAS Audit Engine as JSON records stored in text files. These files are regularly uploaded to a configurable storage service and made available as system tables and related views, which administrators can query.

The following shows the audit-log-related datasets in the ODAS WebUI. Selected is the main table, okera_system.audit_logs, which contains the audit records, here stored in a location on AWS S3:

WebUI with audit logs related datasets

Data managers, such as data owners, stewards, or line-of-business (LOB) managers, often require insight into what users do with the data they manage. The audit-log-related system datasets are a great source of information for creating dashboards and reports using standard SQL queries.

Audit Log Records

As explained in our Auditing documentation, the JSON records stored in the log file comprise a long list of columns that contain details about the request that was made.

The following shows an example where the user creates and then drops a database.

Example: Audit records showing a database being created and subsequently dropped

{
  "request_time":"2020-10-30 10:05:01.437238000",
  "request_id":"504717762e22e373:79a9d59cfeb9b2a5",
  "session_id":"ef4b2d937103c170:8a64f031fecfeab9",
  "start_unix_time":1604052301437,
  "end_unix_time":1604052301455,
  "auth_failure":false,
  "status":"ok",
  "user":"admin",
  "connected_user":"root",
  "client_network_address":"172.23.77.80:40374",
  "client_application":"okera-rest-server",
  "num_results_returned":0,
  "num_results_read":0,
  "bytes_scanned":0,
  "server_total_time_ms":0,
  "queue_time_ms":0,
  "planning_time_ms":0,
  "execution_time_ms":0,
  "statement_type":"CREATE_DATABASE",
  "for_reporting":true,
  "default_db":"default",
  "statement":"create database foobar",
  "ae_attribute":"",
  "ae_database":"",
  "ae_function":"",
  "ae_path":"",
  "ae_role":"",
  "ae_table":"",
  "ae_view":""
}
...
{
  "request_time":"2020-10-30 10:05:07.519814000",
  "request_id":"d44cf1ac90cb9d52:7811f9cb5065a29a",
  "session_id":"ca4532336b844b72:a4bbece430b38587",
  ...
  "statement_type":"DROP_DATABASE",
  ...
  "statement":"drop database foobar",
  "ae_attribute":"",
  "ae_database":"foobar",
  "ae_function":"",
  ...
}

The second record is abbreviated for brevity. Notice, though, that each record contains the actual SQL statement, as well as the statement type, which is determined when the ODAS Planner parses the query.

The audit records shown were taken from the Planner audit log files. ODAS Workers emit their own audit records to separate log files, but to the same shared storage location. This means that both Planner and Worker audit records are accessible through the same system tables, eliminating the need to merge them manually.
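Since the audit records are plain JSON, a downloaded log file can also be inspected directly with a few lines of Python. The following is a minimal sketch, assuming one JSON record per line; the sample record is taken from the example above:

Example: Reading audit records from a log file

```python
# Read newline-delimited JSON audit records and pull out a few key fields.
import json

def read_audit_log(lines):
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["statement_type"], record["user"], record["statement"]

# Sample data mirroring the first audit record shown above.
sample = [
    '{"statement_type":"CREATE_DATABASE","user":"admin",'
    '"statement":"create database foobar"}',
]
for entry in read_audit_log(sample):
    print(entry)  # ('CREATE_DATABASE', 'admin', 'create database foobar')
```

In practice you would pass an open file object instead of the sample list, for example the result of downloading one of the uploaded log files from the storage location.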

Here are the most important statement types the records may contain, along with the service emitting them:

| Type | Kind | Source | Notes |
| --- | --- | --- | --- |
| PLAN | DML | Planner | Emitted when the Planner service plans a query |
| SCAN_START | DML | Worker | Emitted by the Worker(s) when executing a query task |
| SCAN_END | DML | Worker | Emitted by the Worker(s) when completing a query task |
| INTERNAL_API | DDL | Planner | Emitted when the Planner returns metadata to another ODAS service |
| DESCRIBE/DESCRIBE_DB | DDL | Planner | Emitted when a client sends a DESCRIBE statement |
| SHOW_{CREATE_TABLE,DBS,GRANT,ROLES,...} | DDL | Planner | Emitted when a client sends a SHOW statement |
| ALTER_{DATABASE,TABLE} | DDL | Planner | Emitted when a client sends an ALTER statement |
| CREATE_{DATABASE,TABLE,VIEW,...} | DDL | Planner | Emitted when a client sends a CREATE statement |
| DROP_{DATABASE,TABLE,VIEW,...} | DDL | Planner | Emitted when a client sends a DROP statement |
| GRANT_{PRIVILEGE,ROLE} | DCL | Planner | Emitted when a client sends a GRANT statement |
| REVOKE_{PRIVILEGE,ROLE} | DCL | Planner | Emitted when a client sends a REVOKE statement |
| {CREATE,ASSIGN,DELETE,...}_ATTRIBUTES | DCL | Planner | Emitted when a client sends an ABAC-related statement |
| UNKNOWN | - | Planner | Emitted when the statement is not categorized or failed to parse |

If you want to see all the statement types in your cluster, you can execute the following query.

Example: Query all statement types from the audit logs table

SELECT statement_type, count(statement_type) AS cnt
FROM okera_system.audit_logs
GROUP BY statement_type
ORDER BY cnt DESC

Note

We assume in this tutorial that you are using the Presto SQL dialect, by either selecting the "Presto" option in the WebUI's Workspace, or by using an external SQL client, such as DBeaver or SQLWorkbench/J.

Use-Cases

The following sections introduce the audit-log-related datasets and their content in more detail. Practical examples guide you through common use-cases for data managers.

Correlate Audit Records for Queries

Each query in ODAS goes through two phases: the planning phase and the execution phase. The former parses the query in the context of the user and plans the physical execution, applying all necessary data access policies. Once the query plan is compiled, it is returned to the caller, which then executes the plan as a series of query tasks. How this happens is client dependent, but assuming a modern compute framework such as Spark or Presto, the query tasks are executed with some level of parallelization, where executors or worker processes run the tasks in parallel using CPU multithreading. This means that, from an ODAS perspective, a query triggers more than one audit event: one for the planning phase, and one or more for each task in the execution phase.

Example: Audit events for a simple SQL query

{
  "request_time":"2020-10-29 13:56:54.541251000",
  "request_id":"704d151f52e9f5f8:24c1fa9321700000",
  "session_id":"e49b9dc6af964db:60a778ce5801a186",
  ...
  "num_results_returned":1,...
  "planning_time_ms":0,
  "execution_time_ms":0,
  "statement_type":"PLAN",
  ...
  "statement":"select count(*) from `marketing`.`user_account_data`",
  "ae_attribute":"marketing.approved",
  "ae_database":"marketing",
  "ae_function":"",
  "ae_path":"",
  "ae_role":"",
  "ae_table":"marketing.user_account_data",
  "ae_view":""
}

{
  "request_time":"2020-10-29 13:56:54.733802000",
  "request_id":"704d151f52e9f5f8:24c1fa9321700001",
  "session_id":"e49b9dc6af964db:60a778ce5801a186",
    ...
  "statement_type":"SCAN_START",
  ...
  "statement":"select count(*) from `marketing`.`user_account_data`",
  ...
}

{
  "request_time":"2020-10-29 13:56:54.734082000",
  "request_id":"704d151f52e9f5f8:24c1fa9321700001",
  "session_id":"e49b9dc6af964db:60a778ce5801a186",
  ...
  "statement_type":"SCAN_END",
  ...
  "statement":"select count(*) from `marketing`.`user_account_data`",
  ...
}

As can be seen in the example, the tasks, here just one, emit two distinct events: one for the start of the read operation and one for its end. Important to note is the request_id value of the events, which has the following qualities:

  • Its format is two hexadecimal numbers divided by a colon: <prefix>:<postfix>.
  • For a specific client query, the request ID prefix (e.g. 704d151f52e9f5f8) is the same for all events of that query.
  • The postfix starts at zero in the right-most digits for the first query event and is incremented for each task (e.g. 24c1fa9321700000 for the PLAN event, and 24c1fa9321700001 for both SCAN events of the same task).

In other words, the scan start and end events share the same request ID. Since the end event is more interesting than the start, you could filter out the start event completely. This allows you to correlate the events and query them to see the details for each query phase.
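The prefix/postfix structure of the request ID can be sketched in Python, here using sample events copied from the audit records above:

Example: Grouping audit events by request ID prefix

```python
# Group audit events by the request ID prefix (the part before the colon),
# which is shared by all events belonging to the same client query.
from collections import defaultdict

events = [
    {"request_id": "704d151f52e9f5f8:24c1fa9321700000", "statement_type": "PLAN"},
    {"request_id": "704d151f52e9f5f8:24c1fa9321700001", "statement_type": "SCAN_START"},
    {"request_id": "704d151f52e9f5f8:24c1fa9321700001", "statement_type": "SCAN_END"},
]

by_query = defaultdict(list)
for event in events:
    # partition(":") splits the ID into prefix and postfix.
    prefix, _, postfix = event["request_id"].partition(":")
    by_query[prefix].append((postfix, event["statement_type"]))

for prefix, parts in by_query.items():
    print(prefix, parts)
```

All three events end up under the same prefix, and the two SCAN events additionally share the same postfix, confirming that they belong to the same task.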

Example: Correlate plan and query task events

SELECT request_id, statement_type, statement 
FROM okera_system.steward_audit_logs
WHERE statement_type IN ('PLAN', 'SCAN_END') 
ORDER BY 1

The simple sorting aligns all of the request ID prefixes next to each other and effectively groups together the events belonging to each query.

Of note is that the query uses the okera_system.steward_audit_logs view, which implicitly filters the audit records to the rows the current user has access to. This allows the view to be accessible by many users, possibly even everyone, as each user can only see what they are allowed to see. Other views, for example the okera_system.reporting_audit_logs view (shown in the next use-case), would not work, as it also filters on the source of the events: it only shows Planner events, but not Worker-related ones.

Find Events in Time

Another common task for auditors, managers, and administrators alike is finding out who accessed which datasets through one of the Okera endpoints. This requires the okera_system.audit_logs table and rather simple SQL queries to determine what happened during a specific period of time. First, we need to look at the available columns in the audit logs table:

The columns of the audit logs table

There are a few things to note:

  • The columns that record the start (start_unix_time) and end time (end_unix_time) of an operation are defined as BIGINT. They are Unix epoch based, recording the time elapsed since January 1st, 1970. For ODAS, this elapsed time is recorded with millisecond granularity, while some clients or SQL functions may expect full seconds instead.

  • The audit logs table is partitioned by date (ymd) so that queries can take advantage of eliminating unwanted partitions altogether. Queries can either try to rely on the Planner to detect predicates well enough to be able to prune partitions automatically, or they can simply specify the date range explicitly and force the pruning.

  • The user responsible for the audit record is stated in the user column and can be used to filter down the audit records further.
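The millisecond-versus-second distinction is a common source of off-by-a-factor-of-1000 mistakes. The conversion can be sketched in Python, using the start_unix_time value from the first audit record shown earlier:

Example: Converting millisecond epoch values to UTC timestamps and back

```python
# Convert the millisecond-granularity epoch values from the audit logs
# into human-readable UTC timestamps, and back for use in filters.
from datetime import datetime, timedelta, timezone

start_unix_time = 1604052301437  # milliseconds, as stored by ODAS

# Milliseconds -> datetime: split into whole seconds and leftover milliseconds.
sec, ms = divmod(start_unix_time, 1000)
start = datetime.fromtimestamp(sec, tz=timezone.utc) + timedelta(milliseconds=ms)
print(start.isoformat())  # 2020-10-30T10:05:01.437000+00:00

# datetime -> milliseconds, e.g. to build a BETWEEN filter on start_unix_time.
lower = int(datetime(2020, 9, 17, tzinfo=timezone.utc).timestamp() * 1000)
print(lower)  # 1600300800000
```

The same divide-by-1000 step appears in the SQL examples below, where from_unixtime expects seconds while the table stores milliseconds.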

For example, if you want to find all audit records for a specific time period, you could write a simple SQL query combining the above knowledge.

Example: Query all audit events for a specific time period

SELECT *
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
  AND start_unix_time BETWEEN 1600300801197 AND 1600387206546
  -- AND ae_table in ('marketing.transactions', 'creditcard.transactions_cc')
  AND user NOT IN ('okera', 'admin')

Note

The query also shows a commented-out line that can further filter the audit records by the objects affected in the event. Feel free to experiment with the available columns to pinpoint what you need.

The results shown in the ODAS WebUI's Workspace tab may look like this:

Results of a simple audit logs table query

Converting epoch-based values into human-readable dates and back again is not very pleasant. The next step is to use the built-in date conversion functions and some simple math to convert the milliseconds to seconds (and back) where necessary.

Example: Using the built-in conversion functions to handle readable dates

SELECT
  from_unixtime(cast(start_unix_time/1000 as int)) as start_time,
  from_unixtime(cast(end_unix_time/1000 as int)) as end_time,
  auth_failure,
  status,
  user,
  client_application,
  statement_type
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
  AND start_unix_time BETWEEN to_unixtime(
    from_iso8601_timestamp('2020-09-17T00:00:00Z'))*1000
    AND to_unixtime(
    from_iso8601_timestamp('2020-09-18T00:00:00Z'))*1000
  AND user NOT IN ('okera', 'admin')

The outcome looks like this:

Converting dates to human-readable form

Finally, we can further simplify the query, as the okera_system.reporting_audit_logs view already has the conversion of the column values built in. For instance, start_unix_time is available as start_time_utc, which allows you to drop the conversion functions entirely.

Example: Use the provided columns of the view to display and filter records

SELECT
  start_time_utc,
  end_time_utc,
  auth_failure,
  status,
  user,
  client_application,
  statement_type
FROM okera_system.reporting_audit_logs
WHERE
   ymd BETWEEN '2020-09-17' AND '2020-09-18'
   AND start_time_utc BETWEEN '2020-09-17' AND '2020-09-18'
   AND user not in ('okera', 'admin', 'root')

Choose whichever option suits you best to access the audit logs table; either way, you should be able to select the exact set of records based on time, user, and objects affected.

Permissions Granted to Users

Building on the previous use-case, it is also common to ask who was assigned specific permissions on objects in the Schema Registry during some period of time. This can be traced by looking further into the audit record values, which report the type of the statement executed. We use this fact to filter for the data control language (DCL) events that grant and revoke permissions, which for ODAS means granting a role to a group of users or to a specific user.

Example: Filtering audit records pertaining to permission changes

SELECT 
  start_time_utc,
  user,
  statement_type,
  statement
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-10-05' AND '2020-10-06'
  AND (statement_type = 'GRANT_ROLE' OR statement_type = 'REVOKE_ROLE')

The outcome may look like this:

Events pertaining to permission changes

Applying some string manipulation makes the result more readable, as it extracts the role and group from the statement into dedicated columns.

Example: Extracting role and group from the filtered events

SELECT 
  start_time_utc,
  statement_type AS action,
  regexp_extract(statement, 
    '(?i)(?:GRANT|REVOKE) ROLE `?([^;`]+)`? (?:FROM|TO) GROUP `?([^`;]+)`?[;]?', 1) 
    AS role,
  regexp_extract(statement, 
    '(?i)(?:GRANT|REVOKE) ROLE `?([^;`]+)`? (?:FROM|TO) GROUP `?([^`;]+)`?[;]?', 2) 
    AS `group`,
  `user` AS executing_user
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-10-05' AND '2020-10-06'
  AND (statement_type = 'GRANT_ROLE' OR statement_type = 'REVOKE_ROLE')

The final result looks like this:

Extracting role and group names from the statement
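The same extraction can be prototyped outside of SQL. The following Python sketch uses the pattern from the regexp_extract calls above, applied to two hypothetical statements:

Example: Extracting role and group names in Python

```python
# Extract role and group names from GRANT/REVOKE ROLE statements,
# using the same regular expression as the SQL query above.
import re

PATTERN = re.compile(
    r"(?i)(?:GRANT|REVOKE) ROLE `?([^;`]+)`? (?:FROM|TO) GROUP `?([^`;]+)`?;?"
)

statements = [
    "GRANT ROLE developer_role TO GROUP developer_group;",
    "revoke role `analyst_role` from group `analysts`",
]
for stmt in statements:
    m = PATTERN.search(stmt)
    if m:
        role, group = m.group(1), m.group(2)
        print(role, group)
```

This makes it easy to verify that the pattern handles lowercase keywords and optional backticks before relying on it in a report query.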

Using the stored DCL-related records, you can also reconstruct the state of a role and group at any point in time. This is left as an exercise to the reader.

Access Grouped by User and Date

Another common report data administrators need is answering the following question: How many times did users access a specific registered dataset?

For that, we can use the same approach as before, that is, use the audit logs table and/or views to filter by date range, while also grouping by user and building an aggregate.

Example: Compute number of accesses to datasets for specific users

SELECT
  ae_table AS query_table, 
  user AS query_user, 
  client_application AS app, 
  ymd, 
  count(*) AS count
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
GROUP BY ae_table, user, client_application, ymd
ORDER BY query_table, user, client_application, ymd ASC;

The result may look like this:

Number of accesses by time and user

The following examples widen or narrow the scope of the above query. Results of these queries are omitted for the sake of brevity.

Example: Narrow down result by adding specific user

SELECT
  ae_table AS query_table,
  client_application AS app,
  ymd,
  count(*) AS count
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
  AND user IN ('janedoe', 'johndoe')
GROUP BY ae_table, user, client_application, ymd
ORDER BY query_table, user, client_application, ymd ASC;

Example: Broaden the scope by removing the application info

SELECT
  ae_table AS query_table, 
  user AS query_user,
  count(*) AS count
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
GROUP BY ae_table, user
ORDER BY query_table, user ASC;

Example: Adding the statement the user ran to the report

SELECT
  user AS query_user,
  ae_table AS query_table,
  statement,
  count(*) AS count
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
GROUP BY user, ae_table, statement
ORDER BY user, query_table ASC;

You may notice that some rows do not have any values in the query_table column. This is due to the nature of the commands the user ran: some statements do not refer to a specific table, for instance, GRANT ROLE ... TO GROUP ... statements. This can be fixed by further filtering on the statement type, and, if desired, all the way down to just read access.

Example: Filtering only read access events

SELECT
  user AS query_user,
  ae_table AS query_table,
  statement,
  count(*) AS count
FROM okera_system.reporting_audit_logs
WHERE
  ymd BETWEEN '2020-09-17' AND '2020-09-18'
  AND statement_type in ('PLAN')
GROUP BY user, ae_table, statement
ORDER BY user, query_table ASC;

With the filter for PLAN-type events added, the result now comprises only the audit records that were created by SELECT statements, that is, read access to the table:

Filtering read events only

Roles, Groups and Users

The system database okera_system includes other interesting datasets beyond the main audit logs table that are useful for data administrators. A whole set of tables relates to the users issuing queries (the user column in the audit records), their known groups, and the roles associated with them. The following diagram explains the relationships of the first set of datasets.

Note

The diagrams use standard entity relationship diagram (ERD) notation. They serve more as an aid for thinking about how to use the tables in SQL queries than as a statement of accuracy: in data lakes there usually is no notion of referential integrity, since there are no primary or foreign keys.

ERD of roles and groups tables with alternative view

The diagram shows okera_system.role_names and okera_system.group_names, which are simple lists where a distinct name of a role or group is stored with a unique ID. The okera_system.role_group_ids table links them together in a many-to-many relationship, connecting distinct role names with all the groups they have been granted to. The DCL statements that create a role and connect it to a group look like this (though this can also be done via the WebUI's Roles tab):

Example: DCL statements creating a role and granting it to a group

> CREATE ROLE developer_role;
> GRANT ROLE developer_role TO GROUP developer_group;

There is also a view, okera_system.role_groups, that converts the role and group IDs to human-readable names and can be used to present a mapping list to interactive users.

The next set of datasets relates to the users that have (successfully) authenticated with one of the many ODAS endpoints, which includes a step called group resolution that determines the current groups of the user. These groups are later matched with the groups shown in the previous set of tables, namely okera_system.group_names. A user commonly presents more than one group, and not all of them may have a role assigned. The following diagram shows the authentication and group lookup related datasets.

ERD of users and groups tables with alternative views

The design is similar to the previous set of datasets: there are two simple list tables with an ID and a name, and a table that joins the two in a many-to-many relationship, though here with an extra last_observed field. It records the time when a user authentication last presented the given group.

There is also a similar view, okera_system.groups_by_user, which replaces the group and user IDs with their text representation. Note, though, that the view only shows the groups a user was authenticated with the last time they interacted with the cluster.

The additional okera_system.user_group_info view provides a list of all groups a user was ever authenticated with, even at different times. It omits the last_observed column, simply listing all registered user/group combinations, again using the text representation of both.

You can make use of the base tables that back these views and write a query that shows you all groups of all users, together with the time they were recorded.

Example: List all groups of all users at any time

SELECT u.user_name, g.group_name, ug.last_observed 
FROM okera_system.users u 
JOIN okera_system.user_group_mappings ug ON ug.user_name_id = u.id 
JOIN okera_system.groups g ON ug.group_name_id = g.group_id 
ORDER BY u.user_name, ug.last_observed

The result may look like this:

List all groups of all users at any time

Finally, we can combine the audit logs table with the above to compile a comprehensive list of all access to data, while also listing the groups that were used to execute the query.

Example: Combine audit logs with user groups

SELECT 
  al.start_time_utc, 
  u.user_name, 
  g.group_name, 
  al.statement
FROM okera_system.reporting_audit_logs al
JOIN okera_system.users u ON u.user_name = al.user
JOIN okera_system.user_group_mappings ug ON ug.user_name_id = u.id
JOIN okera_system.groups g ON ug.group_name_id = g.group_id 
WHERE 
  ug.last_observed = (
    SELECT max(ugm.last_observed) 
    FROM okera_system.user_group_mappings ugm
    WHERE ugm.user_name_id = ug.user_name_id
      AND ugm.last_observed <= al.start_time_utc 
  )
  AND al.statement_type IN ('PLAN')
-- AND al.user IN ('janedoe', 'johndoe')
  AND al.user NOT IN ('admin', 'okera')
  AND al.ymd BETWEEN '2020-11-01' AND '2020-11-03'
  AND g.group_name NOT IN ('__okera_public_group')
  AND al.status = 'ok'
ORDER BY al.start_time_utc, u.user_name

The query combines the audit records with the groups the user had reported just before the event took place. In other words, it reports the groups a user presented for each read access to data.

The result may look like this:

Combine audit logs with user groups
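The point-in-time lookup that the correlated subquery performs can be illustrated with a small Python sketch (the mappings shown are hypothetical): for each event, keep only the group mappings with the latest last_observed value that does not lie after the event's start time.

Example: Point-in-time group lookup in Python

```python
# For one user, pick the groups from the mapping with the latest
# last_observed timestamp at or before the event's start time.
mappings = [  # (group_name, last_observed), as in user_group_mappings
    ("analysts", "2020-11-01 08:00:00"),
    ("developers", "2020-11-02 09:00:00"),
]

def groups_at(event_time, mappings):
    # Timestamps in this format compare correctly as strings.
    eligible = [ts for _, ts in mappings if ts <= event_time]
    if not eligible:
        return []  # no group observation before the event
    latest = max(eligible)
    return [g for g, ts in mappings if ts == latest]

print(groups_at("2020-11-01 12:00:00", mappings))  # ['analysts']
print(groups_at("2020-11-03 12:00:00", mappings))  # ['developers']
```

An event between the two observations resolves to the earlier group snapshot, while a later event resolves to the most recent one, mirroring the subquery's max(last_observed) logic.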

There are many more ways to slice and dice the audit and user related information. This tutorial only scratched the surface of what is possible, but it should have started you on that journey.