Role-Based Access Control (RBAC)¶

This document provides information about how role-based access control works in the Okera Policy Engine. For information about attribute-based access control (ABAC), see Attribute-Based Access Control (ABAC).*

As introduced in the Catalog Overview documentation, the Okera Catalog services include the Okera Policy Engine, which is used throughout the platform to power the self-service authorization process. The Okera platform is in many ways similar to an RDBMS, as it exposes data primarily through a table structure, no matter what the underlying storage technology provides. Okera also exposes RDBMS-like functionality when it comes to authorizing access to data.

By default, no data is accessible when the Okera platform is initially installed. Instead, administrators of the platform (who must be configured at the cluster-level) can grant access to objects in a variety of ways. The administrator can also delegate administrative rights of an object and its child objects to regular users or groups of users. For example, an administrator can grant full access and administrative rights for the database "sales" to the "salesadmin" user group. The Policy Engine supports this by providing a very flexible approach, called role-based access control (RBAC).

Access Control¶

Instead of linking users or groups of users directly to resources, such as database or datasets, the Policy Engine decouples the former from the latter using roles. These are arbitrarily named collections of permissions that pertained objects in the Schema Registry. For instance, you could create a role "admins" that has full access to the entire catalog, which subsumes all objects contained. Or you could create a role for just a particular purpose inside the sales department, such as being able to read and write all tables necessary for customer relationship management (CRM). The following diagram shows how users are commonly grouped together and then assigned roles from the Policy Engine.

Role-based Access Control

The advantage of decoupling roles from the user groups is that the roles can be combined to assign all the required permissions to the users of a particular group. You can define a role that can read a subset of a particular table, and another that allows to write into another table subset. Assigning both these roles to a group of users will enable that group to do both operations. Then you assign a user to such a group and give immediate access to the combined functionality.

Note: It is common for Linux/Unix based operating systems (OS) to assign a user a dedicated group that has the same name as the user. You can use that specific group to assign particular roles at a user level. In practice though it is more common to create broader groups with more than one user to assign permissions in bulk.

Managing Roles¶

The syntax for creating (or removing) a role is straight forward:

CREATE ROLE <role_name>;
DROP ROLE <role_name>;

Using this command is equally simple:

Example: Creating a role

CREATE ROLE sales_analysts;

Note: When you drop a role, all of its assigned permissions are removed as well (as expected).

Privileges and Scope¶

For role-based access control, each role can be assigned a permission, called a privilege, to a specific object, at the scope of the object type. See Privileges for an explanation of all the privileges and scopes supported by The Okera Policy Engine.

Managing Data Access¶

The combination of role, privilege, and object is referred to as granting in SQL. Okera supports the following SQL statements to give or remove access to roles:

GRANT
    <privilege_level> [(<column_list>)]
    ON <object_type_name>
    TO [ROLE] <role_name>
    [WITH GRANT OPTION]

REVOKE
    [GRANT OPTION FOR]
    <privilege_level>
    ON <object_type_name>
    FROM [ROLE] <role_name>

SHOW
   ROLES [LIKE <pattern>]

SHOW GRANT <principal_type> <principal>

<privilege_level>: {
    ALL
  | SELECT
  | CREATE
  | ALTER
  | INSERT
  | SHOW
}

<object_type_name>: {
    CATALOG
  | DATABASE <database_name>
  | TABLE <dataset_name>
  | URI <uri>
}

<principal_type>: {
    ROLE
  | GROUP
  | USER
}

Note: Revoking permissions is always done in a cascading manner. This means when, for example, revoking permission at the database level, all dataset level permissions are also removed.

In addition, the following commands can be used to add to and remove roles from user groups:

GRANT
    ROLE <role_name>
    TO GROUP <group_name>

REVOKE
    ROLE <role_name>
    FROM GROUP <group_name>

The following short examples show how these statements are used:

Example: Granting full read access for a table to a role

GRANT SELECT ON TABLE sales.transactions TO ROLE sales_analysts;

Example: Granting read access for specific columns of a table or a view to a role

GRANT SELECT (txnid, sku, price) ON TABLE sales.transactions TO ROLE sales_analysts;

Example: Granting show access on a database to a role

GRANT SHOW ON DATABASE sales TO ROLE sales_analysts;

Combining a role with a group of users is shown in the next set of examples:

Example: Granting a role to a user group

GRANT ROLE sales_analysts TO GROUP test

Example: Granting a role to a specific user

GRANT ROLE admin_role TO GROUP janedoe

Example: Showing roles matching pattern

Note: The pattern is case insensitive. It supports star ("*") as a wild card to match any number of characters and bar("|") for specifying more than one pattern (where any of them may match and therefore apply).

SHOW ROLES LIKE '*sales*'
SHOW ROLES LIKE 'sales*|marketing*'

Example: Showing grants for a particular role

SHOW GRANT ROLE admin_role

Example: Showing grants for a particular group

SHOW GRANT GROUP admin

Example: Showing grants for a particular user

SHOW GRANT USER okera

Best Practices¶

Using roles, privileges, and permissions properly takes practice. The following sections discuss the best-practices as recommended by Okera, based on working with enterprise customers.

Role Mapping¶

Roles can be used complementary, that is, they can be combined to form an effective user access profile. Dependent on the business role a user is assigned to, it makes sense to define specific roles that are then assigned to the appropriate user groups.

Global Administrators¶

See Administration Basics for information on how to configure one or more global administrator. They have the authority to create the roles discussed here, while being able to delegate administration further or manage other parts of the Okera system. Typically there is only a small, selected group of people that have global administrative permissions while day-to-day tasks are handled by other roles, such as the next one.

Data Stewards¶

With the advent of big data there has been an increase in the number of datasets an organization has to manage. The business role of data steward has been gaining importance in this context, as its tasks include the classification of datasets for use by regular data consumers, such as analysts or data scientists. In a way the data steward is an administrative role that needs to have access to all datasets, and being able to grant access to those datasets with varying levels of permissions.

Since data stewards are not system administrators, their scope can be limited as needed. For example, there could be multiple data steward roles for different parts of the business. The general approach is shown in the following example:

Example: Defining a global data steward role

CREATE ROLE data_stewards;
GRANT SELECT ON CATALOG TO ROLE data_stewards WITH GRANT OPTION;

Example: grant all on a database to steward role

GRANT ALL ON DATABASE marketing TO ROLE data_stewards WITH GRANT OPTION;

Compartmentalized Data Access¶

It is common in practice to group databases and datasets together based on a specific business requirement. An example here is to group catalog objects by departments, such as sales, marketing, or administration. Within each group of objects it is recommended to split up responsibilities in group administrators and regular users. An analyst may also create datasets, often views over existing tables. So, (s)he may be given the create privilege. A producer role may be used for creating or altering table. An example would be an ETL user. This is achieved by defining two roles for each object grouping, while giving each the appropriate privileges:

Example: Defining admin and user roles for an object group

CREATE ROLE sales_admins;
CREATE ROLE sales_analysts;
CREATE ROLE sales_producer;

GRANT ALL ON DATABASE sales TO ROLE sales_admins;
GRANT ALL ON URI 's3://acme-data/department/sales' TO ROLE sales_admins;

GRANT SELECT ON DATABASE sales TO ROLE sales_analysts;
GRANT CREATE ON DATABASE sales TO ROLE sales_analysts;

GRANT CREATE ON DATABASE sales TO ROLE sales_producer;
GRANT INSERT ON DATABASE sales TO ROLE sales_producer;
GRANT ALTER ON DATABASE sales TO ROLE sales_producer;

The example creates a role for all admins of the sales department, and another role for all analysts that need to access the data. Note that since the sales administrator role has no global rights, it is necessary to grant it access to the URI pointing to where its data is located.

Public Data¶

Another use-case for roles is to define a shared role that is opening access to all public datasets.

Example: Defining a global role for access to all public datasets

CREATE ROLE public;

GRANT SELECT ON DATABASE publicdata TO ROLE public;
GRANT SELECT ON TABLE sales.transactions TO ROLE public;
...

The example creates the role and grants access to as many public datasets as required. Granting this role to any user group will immediately open access to all public datasets for reading. And since all public datasets are covered by one role, it is easy to add or remove a dataset and instantly have this reflected for all users.

Developing Functions¶

As explained in URI Objects, only administrators are allowed to register user-defined functions (UDFs). This is deliberately the case to protect the Okera services from faulty, erroneous, or even malicious code being executed in the server processes. It is normal in practice for Okera service to run for many days, weeks, or months before a scheduled maintenance is taking place. How can a developer design, implement, test, and deploy a new function or SerDe (see Extending Okera) without disruption of production services?

Okera recommends a multi-step process that can solve this problem:

Develop UDF using Hive

Using Hive with the ADD JAR statement allows adding a resource under development in Hive only. The JAR will not be loaded into (or even be made aware to) the Okera processes, and creating a view making use of it would execute in the context of Hive only. In this case Okera will provide the lower-level tables only.

Note: For users who use Hive provided by Amazon EMR that is integrated with Okera, Okera recommends the use of cluster local databases to circumvent any interference with Okera. An alternative is the use of external views, which are evaluated only in the context of Hive, which includes the UDF that is in development.
Add to Okera for long-term use

Once the UDF is considered done, it can be moved into Okera by uploading the resources (that is, the JAR file containing the code) to a release directory that is accessible by the Okera processes (commonly a shared file system such as Amazon S3 in case of AWS setups) and declaring the function as usual.

Local Databases¶

Extending on the example of UDFs above, Okera recommends the use of cluster local databases for all temporary schema tasks. Okera recommends to define a proper, company-wide naming scheme for local databases, as they share the same namespace as other databases. Should a user have a local and managed database with the same name, the local one takes preference. This is intentionally allowing a developer to experiment with existing databases by temporarily overlaying them with something different. Otherwise it is advised to use specific names for local database, for example, by prefixing the name with "localdb_".

Public Role¶

Starting in Okera 1.2.0, users automatically have the okera_public_role role regardless of the user's groups. The only special property of this role is that all users have it, without needing to be granted the role. Otherwise, the role behaves like any other and catalog admins can grant and revoke privileges to and from this role. By default, this role has SELECT access to the database okera_sample which includes some sample data and diagnostic views.

We expect this role to be used to grant access to the public portion of the catalog.