Authorization Builtins

ODAS supports authorization builtins to facilitate implementing fine grained access control. While these primitives are not strictly needed, they allow creating more expressive views. This can greatly improve usability for data consumers and stewards.

These builtins are intended to be used to implement the access control views from the schema design best practices. They are a family of builtins, that return true or false, depending on the authenticated user. These builtins otherwise behave like all ODAS builtins, meaning they can be used anywhere an expression can be used in the SQL statement. This includes for example, in the SELECT list, as a WHERE clause, or as a JOIN condition. These builtins can also be used with other expressions.

We’ll first describe the builtins in detail, and then a few examples of how we expect them to be used.

Builtins

has_access(STRING)

has_access takes a list of a comma-separated catalog paths and returns true if the user has SELECT access to all of them. The paths can be db or db.table. As usual, catalog paths are not case sensitive.

For example

SELECT has_access('prod_db') -> 'True' -- If the user has access to all of prod db
SELECT has_access('prod_db.sales_data') -> 'True' -- If the user has access to this table (or view)

-- To query multiple catalog objects:
SELECT has_access('prod_db1,prod_db2') -> 'True' -- If the user has access to both databases.

has_roles(STRING)

has_roles takes a list of a comma-separated roles and returns true if the user has all of the roles. As usual, role names are not case sensitive.

For example

SELECT has_roles('admin_role') -> 'True' -- If the user has the admin role.
SELECT has_roles('steward_role,analyst_role') -> 'True' -- If the user has both roles.

Expected usage

The primary expected use for these builtins is to express dynamic views where the view definition varies based on the user. This gives the benefit of allowing different users to query the same view by name as well consolidating as what would have been multiple views, which simplifies policy management.

A few common examples:

Tokenize a column if the user does not have access to the base table

In this case, we want to enforce the policy where users with access to the base table see the full value, but users without see the column tokenized. We could create this view by:

CREATE VIEW okera_sample.users_secure AS SELECT
  uid,
  dob,
  gender,
  if (has_access('okera_sample.users'), ccn, mask_ccn(uid)) as 'ccn'
FROM okera_sample.users

This creates a view, okera_sample.users_secure. Users that have access to the base table, okera_sample.users, are allowed to see the credit card column (ccn) but users without only see it after masking. Per the schema design best practices, all users and applications should read from this view.

Implement row level filter based on user roles

In this example, we will construct a view which based on the user’s roles, will only show a subset of the rows. This can be used to implement policies such as having a particular user be able to see sales data from Germany, another user from Great Britain and a third user being able to see both.

CREATE VIEW okera_sample.sales_data_secure AS
SELECT * FROM okera_sample.sales_data WHERE
  has_roles('de_role') AND country = 'de' OR
  has_roles('gbr_role') AND country = 'gbr'

Users that have the de_role will be able to see all data where the country is de. Users that have the grb_role will be able to see all data where the country is gbr. Users that have both, will be able to see both.

Both builtins can also be used together:

CREATE VIEW okera_sample.sales_data_secure AS
SELECT * FROM okera_sample.sales_data WHERE
  has_roles('de_role') AND country = 'de' OR
  has_roles('gbr_role') AND country = 'gbr' OR
  has_access('okera_sample.samples_data')

This is similar to the above, except users that have access to the base table have full access.

Performance impact

In the cases above, where the input to the builtins is constant (they are catalog objects names or role names), this has no runtime cost. The ODAS engine does constant folding at planning time and the access check is fully resolved during planning.

In the tokenization example, as part of constant folding, the query gets transformed:

SELECT
  uid,
  dob,
  gender,
  if (has_access('okera_sample.users'), ccn, mask_ccn(uid))
FROM okera_sample.users;

-- Becomes
SELECT
  uid,
  dob,
  gender,
  if (true, ccn, mask_ccn(uid))
FROM okera_sample.users;

-- Becomes
SELECT
  uid,
  dob,
  gender,
  ccn
FROM okera_sample.users;

The result is exactly the query we want to execute when reading data.

Similar in the row filtering example, for a user that just has the gbr_role:

SELECT * FROM okera_sample.sales_data
WHERE
  has_roles('de_role') AND country = 'de' OR
  has_roles('gbr_role') AND country = 'gbr'

-- Becomes
SELECT * FROM okera_sample.sales_data
WHERE
  false AND country = 'de' OR
  true AND country = 'gbr'

-- Becomes
SELECT * FROM okera_sample.sales_data
WHERE
  false OR
  country = 'gbr'

-- Becomes
SELECT * FROM okera_sample.sales_data
WHERE
  country = 'gbr'

Note again that all of these transformations happen entirely at planning time and have zero per-record overhead.