Role-based Access Control

This document is for DBAs, developers, and data architects, who are looking for information about how role-based access control works, as related to the Okera Policy Engine.

Introduction

As introduced in the Catalog Overview documentation, the Okera Catalog services include the Okera Policy Engine, which is used throughout the platform to power the self-service authorization process. The Okera Platform is in many ways similar to an RDBMS, as it exposes data primarily through a table structure, no matter what the underlying storage technology provides. Okera also exposes RDBMS-like functionality when it comes to authorizing access to data.

By default, no data is accessible when standing up an Okera Active Data Access Platform and pointing users to it. Instead, the initial administrators of the platform (which must be configured at the cluster level) can grant access to objects in a variety of ways. The administrator can also delegate administrative rights of an object and its child objects to regular users or groups of users. For example, an initial admin could grant full access and administrative rights for the database "sales" to the "salesadmin" user group. The Policy Engine supports this by providing a very flexible approach, called role-based access control (RBAC).

Access Control

Instead of linking users or groups of users directly to resources, such as database or datasets, the Policy Engine decouples the former from the latter using roles. These are arbitrarily named collections of permissions that pertained objects in the Schema Registry. For instance, you could create a role "admins" that has full access to the entire catalog, which subsumes all objects contained. Or you could create a role for just a particular purpose inside the sales department, such as being able to read and write all tables necessary for customer relationship management (CRM). The following diagram shows how users are commonly grouped together and then assigned roles from the Policy Engine.

The advantage of decoupling roles from the user groups is that the roles can be combined to assign all the required permissions to the users of a particular group. You can define a role that can read a subset of a particular table, and another that allows to write into another table subset. Assigning both these roles to a group of users will enable that group to do both operations. Then you assign a user to such a group and give immediate access to the combined functionality.

Note: It is common for Linux/Unix based operating systems (OS) to assign a user a dedicated group that has the same name as the user. You can use that specific group to assign particular roles at a user level. In practice though it is more common to create broader groups with more than one user to assign permissions in bulk.

Managing Roles

The syntax for creating (or removing) a role is straight forward:

CREATE ROLE <role_name>;
DROP ROLE <role_name>;

Using this command is equally simple:

Example: Creating a role

CREATE ROLE sales_analysts;

Note: When you drop a role, all of its assigned permissions are removed as well (as expected).

Privileges and Scope

For role-based access control, each role can be assigned a permission, called a privilege, to a specific object, at the scope of the object type. See Privileges for an explanation of all the privileges and scopes supported by The Okera Policy Engine.

Managing Data Access

The combination of role, privilege, and object is referred to as granting in SQL. Okera supports the following SQL statements to give or remove access to roles:

GRANT
    <privilege_level> [(<column_list>)]
    ON <object_type_name>
    TO [ROLE] <role_name>
    [WITH GRANT OPTION]

REVOKE
    [GRANT OPTION FOR]
    <privilege_level>
    ON <object_type_name>
    FROM [ROLE] <role_name>

SHOW
   ROLES [LIKE <pattern>]

<privilege_level>: {
    ALL
  | SELECT
  | CREATE
  | ALTER
  | INSERT
  | SHOW
}

<object_type_name>: {
    CATALOG
  | DATABASE <database_name>
  | TABLE <table_name>
  | URI <uri>
}

Further, the following commands can be used to add to and remove roles from user groups:

GRANT
    ROLE <role_name>
    TO GROUP <group_name>

REVOKE
    ROLE <role_name>
    FROM GROUP <group_name>

The following short examples show how these statements are used:

Example: Granting full read access for a table to a role

GRANT SELECT ON TABLE sales.transactions TO ROLE sales_analysts;

Example: Granting read access for specific columns of a table to a role

GRANT SELECT (txnid, sku, price) ON TABLE sales.transactions TO ROLE sales_analysts;

Example: Granting show access on a database to a role

GRANT SHOW ON DATABASE sales TO ROLE sales_analysts;

Combining a role with a group of users is shown in the next set of examples:

Example: Granting a role to a user group

GRANT ROLE sales_analysts TO GROUP test

Example: Granting a role to a specific user

GRANT ROLE admin_role TO GROUP janedoe

Example: show roles matching pattern

pattern is case insensitive. This includes star(*) as wild card to match any number of characters bar(|) for alternates. Either pattern matches

SHOW ROLES LIKE '*sales*'
SHOW ROLES LIKE 'sales*|marketing*'

Best Practices

Using roles, privileges, and permissions properly takes practice. The following sections discuss the best-practices as recommended by Okera, based on working with enterprise customers.

Role Mapping

Roles can be used complementary, that is, they can be combined to form an effective user access profile. Dependent on the business role a user is assigned to, it makes sense to define specific roles that are then assigned to the appropriate user groups.

Global Administrators

There are two ways of defining global administrators, which are those user that have unrestricted access to the entire ODAS cluster and its associated Catalog.

  1. By Configuration

    You can define a list of user groups that are given global administrative rights as part of the Deployment Manager configuration:

    Example: Defining specific user groups as global catalog administrators

    export OKERA_CATALOG_ADMINS="sysadmins,devadmins,dataadmins"
    
  2. By Role

    The other option is to define a role that grants all permissions to at the catalog level:

    Example: Granting global administrative rights to a role

    GRANT ALL ON CATALOG TO ROLE global_admin_role;
    GRANT ROLE global_admin_role TO GROUP admins;
    

Either way, through configuration or role, any user in such a group is considered a global administrator. The difference is that the latter can be modified without having to restart the ODAS cluster providing the Catalog services.

Data Stewards

With the advent of big data there has been an increase in the number of datasets an organization has to manage. The business role of data steward has been gaining importance in this context, as its tasks include the classification of datasets for use by regular data consumers, such as analysts or data scientists. In a way the data steward is an administrative role that needs to have access to all datasets, and being able to grant access to those datasets with varying levels of permissions.

Since data stewards are not system administrators, their scope can be limited as needed. For example, there could be multiple data steward roles for different parts of the business. The general approach is shown in the following example:

Example: Defining a global data steward role

CREATE ROLE data_stewards;
GRANT SELECT ON CATALOG TO ROLE data_stewards WITH GRANT OPTION;

Example: grant all on a database to steward role

GRANT ALL ON DATABASE TO ROLE data_stewards WITH GRANT OPTION;

Compartmentalized Data Access

It is common in practice to group databases and datasets together based on a specific business requirement. An example here is to group catalog objects by departments, such as sales, marketing, or administration. Within each group of objects it is recommended to split up responsibilities in group administrators and regular users. An analyst may also create datasets, often views over existing tables. So, (s)he may be given the create privilege. A producer role may be used for creating or altering table. An example would be an ETL user. This is achieved by defining two roles for each object grouping, while giving each the appropriate privileges:

Example: Defining admin and user roles for an object group

CREATE ROLE sales_admins;
CREATE ROLE sales_analysts;
CREATE ROLE sales_producer;

GRANT ALL ON DATABASE sales TO ROLE sales_admins;
GRANT ALL ON URI 's3://acme-data/department/sales' TO ROLE sales_admins;

GRANT SELECT ON DATABASE sales TO ROLE sales_analysts;
GRANT CREATE ON DATABASE sales TO ROLE sales_analysts;

GRANT CREATE ON DATABASE sales TO ROLE sales_producer;
GRANT INSERT ON DATABASE sales TO ROLE sales_producer;
GRANT ALTER ON DATABASE sales TO ROLE sales_producer;

The example creates a role for all admins of the sales department, and another role for all analysts that need to access the data. Note that since the sales administrator role has no global rights, it is necessary to grant it access to the URI pointing to where its data is located.

Public Data

Another use-case for roles is to define a shared role that is opening access to all public datasets.

Example: Defining a global role for access to all public datasets

CREATE ROLE public;

GRANT SELECT ON DATABASE publicdata TO ROLE public;
GRANT SELECT ON TABLE sales.transactions TO ROLE public;
...

The example creates the role and grants access to as many public datasets as required. Granting this role to any user group will immediately open access to all public datasets for reading. And since all public datasets are covered by one role, it is easy to add or remove a dataset and instantly have this reflected for all users.

Developing Functions

As explained in URI Objects, only administrators are allowed to register user-defined functions (UDFs). This is deliberately the case to protect the ODAS services from faulty, erroneous, or even malicious code being executed in the server processes. It is normal in practice for ODAS service to run for many days, weeks, or months before a scheduled maintenance is taking place. How can a developer design, implement, test, and deploy a new function or SerDe (see Extending ODAS) without disruption of production services?

Okera recommends a multi-step process that can solve this problem:

  1. Develop UDF using Hive

    Using Hive with the ADD JAR statement allows adding a resource under development in Hive only. The JAR will not be loaded into (or even be made aware to) the ODAS processes, and creating a view making use of it would execute in the context of Hive only. In this case ODAS will provide the lower-level tables only.

    Note: For users who use Hive provided by EMR that is integrated with ODAS, Okera recommends the use of cluster local databases to circumvent any interference with ODAS. An alternative is the use of external views, which are evaluated only in the context of Hive, which includes the UDF that is in development.

  2. Add to ODAS for long-term use

    Once the UDF is considered done, it can be moved into ODAS by uploading the resources (that is, the JAR file containing the code) to a release directory that is accessible by the ODAS processes (commonly a shared file system such as S3 in case of AWS setups) and declaring the function as usual.

Local Databases

Extending on the example of UDFs above, Okera recommends the use of cluster local databases for all temporary schema tasks. Okera recommends to define a proper, company-wide naming scheme for local databases, as they share the same namespace as other databases. Should a user have a local and managed database with the same name, the local one takes preference. This is intentionally allowing a developer to experiment with existing databases by temporarily overlaying them with something different. Otherwise it is advised to use specific names for local database, for example, by prefixing the name with "localdb_".

Public Role

Starting in ODAS 1.2.0, users automatically have the okera_public_role role regardless of the user’s groups. The only special property of this role is that all users have it, without needing to be granted the role. Otherwise, the role behaves like any other and catalog admins can grant and revoke privileges to and from this role. By default, this role has SELECT access to the database okera_sample which includes some sample data and diganostic views.

We expect this role to be used to grant access to the public portion of the catalog.