Apache Ranger Migration (Preview Feature)

Okera provides a script to extract Hive, Hadoop Distributed File System (HDFS), and Starburst Enterprise (Trino) policies from an Apache Ranger policy server. The script connects to Ranger, queries for the policies, and generates equivalent Okera DDL. The script can also automatically run the resulting Okera DDL against a running Okera cluster.

You can migrate all of your Ranger policies or just portions of them at a time. After the initial migration has occurred, you can use Okera to maintain your policies instead. If you stop using Apache Ranger, this can be a one-time migration.

There are two reasons for performing such a migration:

  1. You might need to migrate your policies from Apache Ranger because you are migrating from one cloud platform to another (for example, moving data from databases to Amazon S3 or to another warehouse). Okera supports multiple cloud platforms.

  2. You might want to reduce the policy drift that occurs in Apache Ranger. With Ranger, policies are defined by data source type, so if you need the same policy applied to multiple different data sources, you have to define and maintain the policies multiple times, leading to policy drift. With Okera, policies are more flexible and can be applied to the entire catalog (with multiple data sources) as well as to individual databases, datasets (tables), URIs, roles, and data store connections. In addition, Okera's flexible attribute-based access control support uses tags that can be applied automatically to data as each table is registered to Okera.

Prerequisites

Before you can run the script, the following prerequisites must be met.

  • Okera 2.11 or later must be installed.
  • Ranger must be installed and integrated with HDFS, Hive, or Starburst.

Limitations

At this time, the migration script has the following limitations:

  • Only Hive, HDFS, and Starburst Enterprise policies can be migrated.
  • Apache Ranger data masking policies cannot be migrated.

Run the Script

The script is a Python script called ranger-sync.py. Okera provides it in a Docker container.

To run the script, run the container image quay.io/okera/ranger-extract-docker with the latest 2.11 tag (currently 2.11.0.12). Pass --rm so that Docker removes the container when the script exits. For example:

sudo docker run -it \
--rm quay.io/okera/ranger-extract-docker:2.11.0.12 \
--server-api-url <address of your Ranger API> \
--ranger-user <username of Ranger user with READ access to Ranger policies> \
--ranger-password <password for Ranger user> \
--okera-host <host name of your Okera cluster> \
--okera-port <Okera planner port> \
--skip-security-zones <true / false> \
--dry-run <true / false>

The sample command above connects to a Ranger server, extracts the Ranger policies, and generates the appropriate Okera SQL commands. When --skip-security-zones is set to true, Ranger security zones are skipped in the migration. When --dry-run is set to true, the generated SQL commands are printed on the console instead of being run against Okera; you can then copy them from the console and run them manually in the Okera Workspace.

As input to the script, you can specify either of the following:

  • The Ranger server API URL (with an appropriate Ranger username and password). To do this, use the --server-api-url, --ranger-user, and --ranger-password arguments in the script run.
  • The fully qualified path to a JSON file containing your Ranger policies (the JSON file can be generated using Ranger). To do this, use the --policy-file argument in the script run.

The script always reads the policies and generates appropriate Okera SQL commands. You can request one of two things to occur after the Okera SQL commands are generated.

  • The SQL commands can be printed on the console. To do this, specify true as the value for the --dry-run argument for the run.
  • The SQL commands can be automatically run in Okera. To do this, specify false as the value for the --dry-run argument for the run.

Descriptions of the arguments you can use with this script are provided in the Script Argument Reference.
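
The input and output choices described above can be combined in one invocation. The following is a minimal Python sketch that assembles the docker argument list for either input mode. The helper name build_command and its parameters are hypothetical and not part of the Okera tooling; only the flags and the image name come from this page. It assumes that exactly one input source is given, and it only builds and prints the command without running it.

```python
import shlex
from typing import List, Optional

# Image and tag taken from the example earlier in this section.
IMAGE = "quay.io/okera/ranger-extract-docker:2.11.0.12"


def build_command(
    dry_run: bool,
    server_api_url: Optional[str] = None,
    ranger_user: Optional[str] = None,
    ranger_password: Optional[str] = None,
    policy_file: Optional[str] = None,
    okera_host: Optional[str] = None,
    okera_port: Optional[int] = None,
    okera_token: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return the docker argument list for one script run."""
    # Assumption: the two input sources are alternatives, so exactly one is required.
    if (server_api_url is None) == (policy_file is None):
        raise ValueError("specify exactly one of server_api_url or policy_file")

    cmd = ["sudo", "docker", "run", "-it", "--rm", IMAGE]

    if server_api_url is not None:
        # Input mode 1: read the policies from the Ranger API.
        cmd += ["--server-api-url", server_api_url]
        if ranger_user is not None:
            cmd += ["--ranger-user", ranger_user]
        if ranger_password is not None:
            cmd += ["--ranger-password", ranger_password]
    else:
        # Input mode 2: read the policies from a Ranger-extracted JSON file.
        cmd += ["--policy-file", policy_file]

    # Okera connection details matter only when the SQL is applied (dry_run=False).
    if okera_host is not None:
        cmd += ["--okera-host", okera_host]
    if okera_port is not None:
        cmd += ["--okera-port", str(okera_port)]
    if okera_token is not None:
        cmd += ["--okera-token", okera_token]

    cmd += ["--dry-run", "true" if dry_run else "false"]
    return cmd


# Print the command for a dry run against a policy file (the path is a placeholder).
print(shlex.join(build_command(dry_run=True, policy_file="/data/ranger-policies.json")))
```

When the policy file lives on the Docker host, it presumably also has to be made visible inside the container (for example with a volume mount); that step is outside the scope of this sketch.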

HDFS Policies

When the script migrates Ranger HDFS policies (policies on URIs), it maps them directly to Okera URI policies.

Hive Policies

When the script migrates Ranger Hive policies, it matches the Ranger Hive metastore (HMS) catalog to an Okera HMS catalog. Typically, this is accomplished by having Ranger and Okera share the same underlying HMS DB. This means that the database, table, and column names in these policies (that reference the HMS-backed Ranger server) are mapped to the same catalog names in Okera.

Starburst Enterprise (Presto/Trino) Policies

Starburst systems have a three-part catalog naming convention: catalog.schema.table. Okera, on the other hand, has a two-part catalog: db.table. Consequently, you can run the migration script in one of two modes to account for this difference:

  1. You can ignore the catalog part of the Starburst name. In this mode, catalog.schema.table is mapped to schema.table in Okera. To enable this mode, use the --trino-catalog-behavior ignore option of the script.

  2. You can combine the catalog and schema part of the name. In this mode, catalog.schema.table is mapped to catalog__schema.table. This is useful, for example, when the existing system manages both Hive and Redshift catalogs. This is the default behavior. Configuration options and defaults for this mode are:

      --trino-catalog-behavior prefix
      --trino-catalog-separator "__"
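
The two naming modes above can be illustrated with a short Python sketch. The function map_trino_name is hypothetical and is not the migration script's own code; it only mirrors the mapping described here, and it assumes that the input is always a three-part name with no dots inside the individual parts.

```python
def map_trino_name(name: str, behavior: str = "prefix", separator: str = "__") -> str:
    """Hypothetical illustration: map catalog.schema.table to an Okera db.table name."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    catalog, schema, table = parts

    if behavior == "ignore":
        # Drop the catalog: catalog.schema.table -> schema.table
        return f"{schema}.{table}"
    if behavior == "prefix":
        # Combine catalog and schema: catalog.schema.table -> catalog__schema.table
        return f"{catalog}{separator}{schema}.{table}"
    raise ValueError(f"unknown behavior: {behavior!r}")


print(map_trino_name("hive.sales.orders"))                     # hive__sales.orders
print(map_trino_name("hive.sales.orders", behavior="ignore"))  # sales.orders
```

The prefix mode keeps same-named schemas from different catalogs apart (for example, hive__sales and redshift__sales), whereas the ignore mode would map both to sales.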
    

Script Help

You can get help on running the script by running:

sudo docker run -it \
--rm quay.io/okera/ranger-extract-docker:2.11.0.12 \
--help

Script Argument Reference

The following table describes all of the script arguments you can specify.

Argument | Valid Values | Default | Required? | Description
--- | --- | --- | --- | ---
--abort-on-error | true or false | false | No | Indicates whether the script should abort (true) or continue (false) when it encounters an error or an unsupported policy.
--allow-data-masking | true or false | false | No | Indicates whether data masking policies should be extracted (true) or not (false). Do not change this argument at this time; migration of data masking policies is not supported.
--allow-deny | true or false | false | No | Indicates whether Ranger Deny policies should be extracted (true) or not (false).
--allow-excludes-resources | true or false | true | No | Indicates whether excludes on resources should be extracted by the script (true) or not (false). Do not change this argument at this time.
--allow-non-recursive-paths | true or false | false | No | Indicates whether non-recursive prefix paths should be processed by the script (true) or not (false).
--allow-recursive-catalog | true or false | false | No | Indicates whether recursive catalog objects should be processed by the script (true) or not (false).
--allow-row-filters | true or false | true | No | Indicates whether the script should extract row filter policies (true) or not (false).
--allow-wildcards | true or false | true | No | Indicates whether wildcards are supported (true) or not (false). If wildcards are enabled in Okera, you can enable them here.
--canonicalize-user | true or false | true | No | Indicates whether user names should be normalized (canonicalized) by the script (true) or not (false).
--collapse-all | true or false | true | No | Indicates whether the Apache Ranger grants should be collapsed into a single grant (true) or per-level grants should be created (false).
--docs | true or false | false | No | When set to true, displays the Readme file for the migration script on the console.
--drop-existing-okera-role | true or false | true | No | Indicates whether existing Okera roles should be dropped before the script recreates them (true) or not (false).
--dry-run | true or false | true | No | Indicates whether the generated SQL commands should be printed on the console (true) or automatically applied to a running Okera cluster (false). Set this argument to false if you want to apply the DDL to an Okera cluster.
--groups-blacklist | | | No | Specifies a comma-separated list of groups that should not be extracted. If no list is provided, all groups are extracted.
--groups-whitelist | | | No | Specifies a comma-separated list of groups that should be extracted. If no list is provided, all groups are extracted.
--handle-all-dbs-as-catalog | true or false | true | No | Indicates whether the script should treat all database wildcards as a catalog grant (true) or not (false).
--ignore-disabled | true or false | true | No | Indicates whether disabled policies should be ignored by the script (true) or extracted (false).
--ignore-owner | true or false | true | No | Indicates whether the script should ignore the OWNER that Ranger grants (true) or not (false).
--ignore-user | true or false | true | No | Indicates whether the script should ignore the USER that Ranger grants (true) or not (false).
--include-disabled-roles | true or false | false | No | Indicates whether disabled roles should be extracted by the script (true) or not (false).
--include-nested-roles | true or false | true | No | Indicates whether nested roles should be extracted by the script (true) or not (false).
--loglevel | critical, debug, error, exception, fatal, info, warn | warn | No | Specifies the logging level for the migration script.
--map-create-to-create-as-owner | true or false | true | No | Indicates whether the script should convert the Ranger CREATE privilege to the CREATE AS OWNER privilege (true) or not (false).
--okera-group-role-suffix | | __group_role | No | Specifies a suffix that the extract script appends to implicit roles that are generated based on Ranger groups.
--okera-host | | localhost | No | Specifies the hostname of the Okera server. Required only if you want the script to apply the SQL to an Okera environment. If you just want the SQL statements and will run them yourself, this is not required.
--okera-port | | 12050 | No | Specifies the port number of the Okera server.
--okera-role-prefix | | okera_policy_extract_ | No | Specifies a prefix used for roles created in Okera. This can be empty.
--okera-token | | | Yes | Specifies the access token required to authenticate with the Okera server.
--okera-user-role-suffix | | __user_role | No | Specifies a suffix that the extract script appends to implicit roles that are generated based on Ranger users.
--policy-file | | | No | Specifies the fully qualified path to a Ranger-extracted JSON policy file. The migration script uses this file instead of connecting to the Ranger server.
--ranger-password | | rangeradmin1 | No | Specifies the password for the Apache Ranger user name specified by the --ranger-user argument. Required only if the script needs to connect to the Ranger server to extract the policies.
--ranger-user | | admin | No | Specifies the Apache Ranger user name used to extract policies from the Ranger server. Required only if the script needs to connect to the Ranger server to extract the policies. The user must be able to read the Ranger policies you want to migrate.
--roles-blacklist | | | No | Specifies a comma-separated list of roles that should not be extracted. If no list is provided, all roles are extracted.
--roles-whitelist | | | No | Specifies a comma-separated list of roles that should be extracted. If no list is provided, all roles are extracted.
--security-zone-names | | | No | Specifies a comma-separated list of Ranger zone names to extract. If this argument is not set, all zones are extracted.
--security-zone-use-as | | | No | Specifies the Ranger security zone to use when specifying the --policy-file and --security-zones-file arguments.
--security-zones-file | | | No | Specifies the fully qualified path to a Ranger-extracted JSON security zones file. The migration script uses this file instead of connecting to the Ranger server.
--server-api-url | | http://0.0.0.0:6080/service/public/v2/api | No | Specifies the URL of the Ranger API server. Typically ends in /v2/api. Required only if the script needs to connect to the Ranger server to extract the policies. It is not necessary if the policies are read from a JSON file (generated in Ranger and containing your Ranger policies).
--skip-policies | true or false | false | No | Indicates whether the migration script should skip pulling policies from the Ranger policies API (true) or not (false).
--skip-roles | true or false | false | No | Indicates whether the migration script should skip pulling roles from the Ranger roles API (true) or not (false).
--skip-security-zones | true or false | false | No | Indicates whether the migration script should skip pulling security zones from the Ranger security zones API (true) or not (false).
--trino-catalog | | | No | For runs against Starburst policies only. Specifies the list of fully qualified Starburst catalogs to expand to. Intended for testing; a value does not need to be specified.
--trino-catalog-behavior | prefix or ignore | prefix | No | For runs against Starburst policies only. Specifies whether the catalog and schema names should be combined (prefix) or the catalog name should be ignored (ignore). When they are combined, the --trino-catalog-separator argument specifies the connecting character between the catalog and schema names.
--trino-catalog-file | | | No | For runs against Starburst policies only. Specifies the file containing JSON Starburst catalog mappings. A value does not need to be specified.
--trino-catalog-separator | | . | No | For runs against Starburst policies only. Specifies the separator that the migration script uses to combine the catalog and schema names.
--trino-ignore-information-schema | true or false | true | No | For runs against Starburst policies only. Indicates whether policies for information_schemas should be ignored (true) or not (false).
--use-policy-name | true or false | true | No | Indicates whether the Ranger policy name should be used to autogenerate Okera role names (true) or not (false).
--users-blacklist | | | No | Specifies a comma-separated list of users that should not be extracted. If no list is provided, all users are extracted.
--users-whitelist | | | No | Specifies a comma-separated list of users that should be extracted. If no list is provided, all users are extracted.
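
The whitelist and blacklist arguments in the table above take comma-separated lists, and an omitted list means that everything is extracted. The Python sketch below illustrates one way such filtering could behave. The functions parse_list and select_names are hypothetical and are not taken from the migration script. In particular, exact name matching and letting the blacklist win when a name appears in both lists are assumptions; this page does not say how the script handles either case.

```python
from typing import Iterable, List, Optional, Set


def parse_list(value: Optional[str]) -> Set[str]:
    """Split a comma-separated argument value into a set of trimmed names."""
    if not value:
        return set()
    return {item.strip() for item in value.split(",") if item.strip()}


def select_names(
    names: Iterable[str],
    whitelist: Optional[str] = None,
    blacklist: Optional[str] = None,
) -> List[str]:
    """Hypothetical filter: keep whitelisted names, then drop blacklisted ones."""
    allowed = parse_list(whitelist)
    denied = parse_list(blacklist)
    # An empty whitelist means "no restriction", matching "all ... are extracted".
    # Assumption: a name on both lists is dropped (the blacklist wins).
    return [n for n in names if (not allowed or n in allowed) and n not in denied]


groups = ["analysts", "engineers", "finance"]
print(select_names(groups))                                 # ['analysts', 'engineers', 'finance']
print(select_names(groups, whitelist="analysts, finance"))  # ['analysts', 'finance']
print(select_names(groups, blacklist="finance"))            # ['analysts', 'engineers']
```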