Apache Ranger Migration (Preview Feature)¶

Okera provides a script to extract Hive, Hadoop Distributed File System (HDFS), and Starburst Enterprise (Trino) policies from an Apache Ranger policy server. The script connects to Ranger, queries for the policies, and generates equivalent Okera DDL. The script can also automatically run the resulting Okera DDL against a running Okera cluster.

You can migrate all of your Ranger policies or just portions of them at a time. After the initial migration has occurred, you can use Okera to maintain your policies instead. If you stop using Apache Ranger, this can be a one-time migration.

There are two common reasons for performing such a migration:

You might need to migrate your policies from Apache Ranger because you are migrating from one cloud platform to another (for example, moving data from databases to Amazon S3 or to another warehouse). Okera supports multiple cloud platforms.
You might want to reduce the policy drift that occurs in Apache Ranger. With Ranger, policies are defined by data source type, so if you need the same policy applied to multiple different data sources, you have to define and maintain the policies multiple times, leading to policy drift. With Okera, policies are more flexible and can be applied to the entire catalog (with multiple data sources) as well as to individual databases, datasets (tables), URIs, roles, and data store connections. In addition, Okera's flexible attribute-based access control support uses tags that can be applied automatically to data as each table is registered to Okera.

Prerequisites¶

Before you can run the script, the following prerequisites must be met.

Okera 2.11 or later must be installed.
Ranger must be installed and integrated with HDFS, Hive, or Starburst.
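Before you run the migration, it can help to confirm that the Ranger API is reachable and that the Ranger user can read policies. The following is a minimal sketch. It assumes that the API base URL ends in /service/public/v2/api (as in the --server-api-url default) and that Ranger's GET policy endpoint under that base URL lists policies. The function name check_ranger_access is hypothetical and is not part of the Okera tooling.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of ranger-sync.py): request the
# Ranger policy list and report whether the call returned HTTP 200.
# Assumes <api-url> is the same value you pass to --server-api-url.
#
# Usage: check_ranger_access <api-url> <ranger-user> <ranger-password>
check_ranger_access() {
  local api_url=$1 user=$2 password=$3 status
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  status=$(curl -sS -o /dev/null -w '%{http_code}' \
    -u "$user:$password" "$api_url/policy") || return 1
  if [ "$status" = "200" ]; then
    echo "ok: $user can read Ranger policies"
  else
    echo "Ranger API returned HTTP $status for $api_url/policy" >&2
    return 1
  fi
}
```

A 401 or 403 status usually indicates wrong credentials or a user without permission to read policies.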

Limitations¶

At this time, the migration script has the following limitations:

Only Hive, HDFS, and Starburst Enterprise policies can be migrated.
Apache Ranger data masking policies cannot be migrated.

Run the Script¶

The script is a Python script called ranger-sync.py. Okera distributes it in a Docker container image.

To run the script, run the quay.io/okera/ranger-extract-docker image with Docker, using the latest 2.11 image tag (currently 2.11.0.12). Include the Docker --rm option so that the container is removed after the script finishes. For example:

```
sudo docker run -it --rm \
  quay.io/okera/ranger-extract-docker:2.11.0.12 \
  --server-api-url <address of your Ranger API> \
  --ranger-user <username of Ranger user with READ access to Ranger policies> \
  --ranger-password <password for Ranger user> \
  --okera-host <host name of your Okera cluster> \
  --okera-port <Okera planner port> \
  --skip-security-zones <true / false> \
  --dry-run <true / false>
```

The sample command above connects to a Ranger server, extracts the Ranger policies, and generates the corresponding Okera SQL commands. When --dry-run is set to true, the SQL commands are printed on the console, where you can copy them and run them manually in the Okera Workspace. When --skip-security-zones is set to true, Ranger security zones are skipped during the migration.
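Instead of copying the generated SQL from the console, you can redirect it to a file and review it before running it in the Okera Workspace. This is a minimal sketch, assuming the SQL is written to standard output (any log lines the script also writes to standard output would end up in the file as well). The -it options are dropped because the output is redirected rather than viewed interactively. The function name save_dry_run_sql is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: dry run against a live Ranger server, saving the
# console output to a file for review.
# Assumption: generated SQL goes to standard output.
#
# Usage: save_dry_run_sql <api-url> <ranger-user> <ranger-password> <out-file>
save_dry_run_sql() {
  local api_url=$1 user=$2 password=$3 out_file=$4
  sudo docker run --rm \
    quay.io/okera/ranger-extract-docker:2.11.0.12 \
    --server-api-url "$api_url" \
    --ranger-user "$user" \
    --ranger-password "$password" \
    --dry-run true > "$out_file"
}
```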

As input to the script, you can specify either of the following:

The Ranger server API URL, with an appropriate Ranger username and password. To do this, use the --server-api-url, --ranger-user, and --ranger-password arguments.
The fully qualified path to a JSON file containing your Ranger policies (the JSON file can be generated using Ranger). To do this, use the --policy-file argument.
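When you use a JSON policy file, keep in mind that the script runs inside a container. The file must therefore be made visible to the container before its path can be passed to the --policy-file argument described in the Script Argument Reference. A read-only bind mount is one way to do this, as sketched below. The mount point /policies and the function name extract_from_policy_file are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the extractor against an exported Ranger policy
# JSON file instead of a live Ranger server. The file's directory is
# bind-mounted read-only at /policies so the container can read it.
#
# Usage: extract_from_policy_file /path/to/ranger-policies.json
extract_from_policy_file() {
  local policy_path=$1
  local policy_dir policy_name
  if [ ! -f "$policy_path" ]; then
    echo "policy file not found: $policy_path" >&2
    return 1
  fi
  policy_dir=$(cd "$(dirname "$policy_path")" && pwd)  # absolute path for -v
  policy_name=$(basename "$policy_path")
  # --dry-run true: print the generated Okera SQL instead of applying it.
  sudo docker run --rm \
    -v "$policy_dir":/policies:ro \
    quay.io/okera/ranger-extract-docker:2.11.0.12 \
    --policy-file "/policies/$policy_name" \
    --dry-run true
}
```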

The script always reads the policies and generates the corresponding Okera SQL commands. After generating them, it does one of two things, depending on the value of the --dry-run argument:

If --dry-run is set to true, the SQL commands are printed on the console.
If --dry-run is set to false, the SQL commands are automatically run in Okera.
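The following sketch shows the second option: applying the generated DDL directly to an Okera cluster. It assumes that connection details come from environment variables whose names (RANGER_API_URL, RANGER_USER, RANGER_PASSWORD, OKERA_HOST, OKERA_PORT, OKERA_TOKEN) were chosen for this example. It passes the access token with the --okera-token argument listed in the Script Argument Reference, and the port falls back to 12050, the documented --okera-port default.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: extract policies from a live Ranger server and
# apply the generated DDL to an Okera cluster (--dry-run false).
# The environment variable names are assumptions made for this example.
apply_ranger_policies() {
  if [ -z "${RANGER_API_URL:-}" ] || [ -z "${RANGER_USER:-}" ] ||
     [ -z "${RANGER_PASSWORD:-}" ] || [ -z "${OKERA_HOST:-}" ] ||
     [ -z "${OKERA_TOKEN:-}" ]; then
    echo "set RANGER_API_URL, RANGER_USER, RANGER_PASSWORD, OKERA_HOST and OKERA_TOKEN" >&2
    return 1
  fi
  sudo docker run --rm \
    quay.io/okera/ranger-extract-docker:2.11.0.12 \
    --server-api-url "$RANGER_API_URL" \
    --ranger-user "$RANGER_USER" \
    --ranger-password "$RANGER_PASSWORD" \
    --okera-host "$OKERA_HOST" \
    --okera-port "${OKERA_PORT:-12050}" \
    --okera-token "$OKERA_TOKEN" \
    --dry-run false
}
```

A cautious approach is to run the same arguments with --dry-run true first and review the output before switching to false.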

Descriptions of the arguments you can use with this script are provided in the Script Argument Reference.

HDFS Policies¶

When the script migrates Ranger HDFS policies (policies on URIs), it maps them directly to Okera URI policies.

Hive Policies¶

When the script migrates Ranger Hive policies, it expects the Hive metastore (HMS) catalog referenced by Ranger to match an Okera HMS catalog. Typically, this is accomplished by having Ranger and Okera share the same underlying HMS database. As a result, the database, table, and column names in the Ranger policies map to the same names in the Okera catalog.

Starburst Enterprise (Presto/Trino) Policies¶

Starburst systems have a three-part catalog naming convention: catalog.schema.table. Okera, on the other hand, uses a two-part naming convention: db.table. Consequently, you can run the migration script in one of two modes to account for this difference:

You can ignore the catalog part of the Starburst name. In this mode, catalog.schema.table is mapped to schema.table in Okera. To enable this mode, use the --trino-catalog-behavior ignore option of the script.
You can combine the catalog and schema parts of the name. In this mode, catalog.schema.table is mapped to catalog__schema.table. Because two Starburst catalogs can contain schemas with the same name, keeping the catalog in the Okera database name avoids collisions. This is useful, for example, when the existing system manages both Hive and Redshift catalogs. This is the default behavior. Configuration options and defaults for this mode are:
```
  --trino-catalog-behavior prefix
  --trino-catalog-separator "__"
```
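The two modes can be illustrated with a small name-mapping function. This is a hypothetical helper written for this page, not code from ranger-sync.py. It assumes that catalog, schema, and table names do not themselves contain dots, and it defaults to the "__" separator shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ranger-sync.py): rewrite a Starburst
# name catalog.schema.table the way each --trino-catalog-behavior mode
# is described above. Assumes no dots inside individual name parts.
#
# Usage: okera_name_for_trino <catalog.schema.table> [prefix|ignore] [separator]
okera_name_for_trino() {
  local fqn=$1 behavior=${2:-prefix} sep=${3:-__}
  local catalog rest schema table
  catalog=${fqn%%.*}   # text before the first dot
  rest=${fqn#*.}       # schema.table
  schema=${rest%%.*}
  table=${rest#*.}
  case $behavior in
    ignore) printf '%s.%s\n' "$schema" "$table" ;;
    prefix) printf '%s%s%s.%s\n' "$catalog" "$sep" "$schema" "$table" ;;
    *) echo "unknown behavior: $behavior" >&2; return 1 ;;
  esac
}
```

For example, okera_name_for_trino hive.sales.orders prints hive__sales.orders, and okera_name_for_trino hive.sales.orders ignore prints sales.orders.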

Script Help¶

You can get help on running the script by running:

```
sudo docker run -it --rm \
  quay.io/okera/ranger-extract-docker:2.11.0.12 \
  --help
```

Script Argument Reference¶

The following table describes all of the script arguments you can specify.

Argument	Valid Values	Default	Required?	Description
`--abort-on-error`	`true` or `false`	`false`	No	Indicates whether the script should abort if an error or unsupported policy is encountered (`true`) or should not abort if an error or unsupported policy is encountered (`false`).
`--allow-data-masking`	`true` or `false`	`false`	No	Indicates whether data masking policies should be extracted (`true`) or should not be extracted (`false`). Migration of data masking policies is not supported at this time, so do not change this argument.
`--allow-deny`	`true` or `false`	`false`	No	Indicates whether Ranger Deny policies should be extracted (`true`) or should not be extracted (`false`).
`--allow-excludes-resources`	`true` or `false`	`true`	No	Indicates whether excludes on resources should be extracted by the script (`true`) or should not be extracted by the script (`false`). This argument should not be changed at this time.
`--allow-non-recursive-paths`	`true` or `false`	`false`	No	Indicates whether non-recursive prefix paths should be processed by the script (`true`) or should not be processed by the script (`false`).
`--allow-recursive-catalog`	`true` or `false`	`false`	No	Indicates whether recursive catalog objects should be processed by the script (`true`) or should not be processed by the script (`false`).
`--allow-row-filters`	`true` or `false`	`true`	No	Indicates whether the script should extract row filter policies (`true`) or should not extract row filter policies (`false`).
`--allow-wildcards`	`true` or `false`	`true`	No	Indicates whether wildcards are supported (`true`) or are not supported (`false`). If wildcards are enabled in Okera, you can enable them here.
`--canonicalize-user`	`true` or `false`	`true`	No	Indicates whether user names should be normalized (canonicalized) by the script (`true`) or should not be normalized by the script (`false`).
`--collapse-all`	`true` or `false`	`true`	No	Indicates whether the Apache Ranger grants should be collapsed into a single grant (`true`) or if per-level grants should be created (`false`).
`--docs`	`true` or `false`	`false`	No	Displays the Readme file for the migration script on the console when set to `true`.
`--drop-existing-okera-role`	`true` or `false`	`true`	No	Indicates whether existing Okera roles should be dropped before they are recreated by the script (`true`) or should not be dropped before they are recreated by the script (`false`).
`--dry-run`	`true` or `false`	`true`	No	Indicates whether the generated SQL commands should be printed on the console (`true`) or automatically applied to a running Okera cluster (`false`). Set this argument to `false` if you want to apply the DDL to an Okera cluster.
`--groups-blacklist`	–	–	No	Specifies a comma-separated list of groups that should not be extracted. If no list is provided, all groups are extracted.
`--groups-whitelist`	–	–	No	Specifies a comma-separated list of groups that should be extracted. If no list is provided, all groups are extracted.
`--handle-all-dbs-as-catalog`	`true` or `false`	`true`	No	Indicates whether the script should treat all database wildcards as a catalog grant (`true`) or should not treat all database wildcards as a catalog grant (`false`).
`--ignore-disabled`	`true` or `false`	`true`	No	Indicates whether disabled policies should be ignored by the script (`true`) or extracted by the script (`false`).
`--ignore-owner`	`true` or `false`	`true`	No	Indicates whether the script should ignore Ranger grants made to OWNER (`true`) or process them (`false`).
`--ignore-user`	`true` or `false`	`true`	No	Indicates whether the script should ignore Ranger grants made to USER (`true`) or process them (`false`).
`--include-disabled-roles`	`true` or `false`	`false`	No	Indicates whether disabled roles should be extracted by the script (`true`) or not (`false`).
`--include-nested-roles`	`true` or `false`	`true`	No	Indicates whether nested roles are extracted by the script (`true`) or are not extracted by the script (`false`).
`--loglevel`	`critical`, `debug`, `error`, `exception`, `fatal`, `info`, `warn`	`warn`	No	Specifies the level of logging that occurs for the migration script.
`--map-create-to-create-as-owner`	`true` or `false`	`true`	No	Indicates whether the script should convert the Ranger CREATE privilege to the CREATE AS OWNER privilege (`true`) or should not convert the Ranger CREATE privilege to the CREATE AS OWNER privilege (`false`).
`--okera-group-role-suffix`	—	`__group_role`	No	Specifies a suffix that the extract script should append to implicit roles that are generated based on Ranger groups.
`--okera-host`	–	`localhost`	No	Specifies the hostname of the Okera server. Required only if you want the script to apply the SQL to an Okera environment. If you only want the SQL statements so you can run them yourself, this argument is not needed.
`--okera-port`	—	`12050`	No	Specifies the port number of the Okera server.
`--okera-role-prefix`	—	`okera_policy_extract_`	No	Specifies a prefix used for roles created in Okera. This can be empty.
`--okera-token`	—	—	Yes	Specifies the access token required to authenticate with the Okera server.
`--okera-user-role-suffix`	–	`__user_role`	No	Specifies a suffix that the extract script should append to implicit roles that are generated based on Ranger users.
`--policy-file`	–	–	No	Specifies the fully qualified path to a Ranger-extracted JSON policy file. The Okera migration script will use this JSON policy file for the migration instead of connecting to the Ranger server to perform the migration.
`--ranger-password`	–	`rangeradmin1`	No	Specifies the password associated with the Apache Ranger user name specified by the `--ranger-user` argument. This argument is required only if Okera needs to connect to the Ranger server to extract the policies.
`--ranger-user`	—	`admin`	No	Specifies the Apache Ranger user name used to extract policies from the Ranger server. This argument is required only if Okera needs to connect to the Ranger server to extract the policies. When specified, the user must have the ability to read the Ranger policies you want to migrate.
`--roles-blacklist`	–	–	No	Specifies a comma-separated list of roles that should not be extracted. If no list is provided, all roles are extracted.
`--roles-whitelist`	—	—	No	Specifies a comma-separated list of roles that should be extracted. If no list is provided, all roles are extracted.
`--security-zone-names`	—	—	No	Specifies a comma-separated list of Ranger zone names to extract. If this argument is not set, all zones are extracted.
`--security-zone-use-as`	–	–	No	Specifies the Ranger security zone to use when specifying the `--policy-file` and `--security-zones-file` arguments.
`--security-zones-file`	–	–	No	Specifies the fully qualified path to a Ranger-extracted JSON security zones file. The Okera migration script will use this JSON security zones file for the migration instead of connecting to the Ranger server to perform the migration.
`--server-api-url`	—	`http://0.0.0.0:6080/service/public/v2/api`	No	Specifies the URL of the Ranger API server. Typically ends in `/v2/api`. The argument is required only if Okera needs to connect to the Ranger server to extract the policies. If you want the policies extracted from a JSON file (generated in Ranger and containing your Ranger policies), this argument is not necessary.
`--skip-policies`	`true` or `false`	`false`	No	Indicates whether the migration script should skip pulling policies from the Ranger policies API (`true`) or should not skip pulling them (`false`).
`--skip-roles`	`true` or `false`	`false`	No	Indicates whether the migration script should skip pulling roles from the Ranger roles API (`true`) or should not skip pulling roles from the Ranger roles API (`false`).
`--skip-security-zones`	`true` or `false`	`false`	No	Indicates whether the migration script should skip pulling security zones from the Ranger security zones API (`true`) or should not skip pulling them (`false`).
`--trino-catalog`	–	–	No	For migration script runs against Starburst policies only, this argument specifies the list of fully qualified Starburst catalogs to expand to. Intended for testing; a value does not need to be specified.
`--trino-catalog-behavior`	`prefix` or `ignore`	`prefix`	No	For migration script runs against Starburst policies only, this argument specifies whether the catalog and schema names should be combined (`prefix`) or the catalog name should be ignored (`ignore`). When combined, the `--trino-catalog-separator` argument specifies the separator placed between the catalog and schema names.
`--trino-catalog-file`	–	–	No	For migration script runs against Starburst policies only, this argument specifies the file containing JSON Starburst catalog mappings. A value does not need to be specified.
`--trino-catalog-separator`	–	`__`	No	For migration script runs against Starburst policies only, this argument specifies the separator that should be used by the Okera migration script to combine the catalog and schema names.
`--trino-ignore-information-schema`	`true` or `false`	`true`	No	For migration script runs against Starburst policies only, this argument indicates whether policies for `information_schema` schemas should be ignored (`true`) or should not be ignored (`false`).
`--use-policy-name`	`true` or `false`	`true`	No	Indicates whether the Ranger policy name should be used to autogenerate Okera role names (`true`) or not (`false`).
`--users-blacklist`	–	–	No	Specifies a comma-separated list of users that should not be extracted. If no list is provided, all users are extracted.
`--users-whitelist`	–	–	No	Specifies a comma-separated list of users that should be extracted. If no list is provided, all users are extracted.
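Several of these arguments can be combined in a single run. The sketch below performs a dry run that extracts only selected Ranger groups, skips security zones, and raises the log level to info. The whitelist arguments expect a comma-separated list, so a small hypothetical helper, join_csv, builds one from its arguments. The environment variable names RANGER_API_URL, RANGER_USER, and RANGER_PASSWORD are also assumptions made for this example.

```shell
#!/usr/bin/env bash
# join_csv (hypothetical helper): join its arguments with commas, the
# format expected by the *-whitelist and *-blacklist arguments.
# The ( ... ) body runs in a subshell, so changing IFS has no side effects.
join_csv() (
  IFS=,
  printf '%s\n' "$*"   # "$*" joins arguments with the first character of IFS
)

# Hypothetical wrapper: dry run limited to the Ranger groups passed as
# arguments. RANGER_API_URL, RANGER_USER, RANGER_PASSWORD are assumed.
#
# Usage: extract_groups analysts finance
extract_groups() {
  sudo docker run --rm \
    quay.io/okera/ranger-extract-docker:2.11.0.12 \
    --server-api-url "$RANGER_API_URL" \
    --ranger-user "$RANGER_USER" \
    --ranger-password "$RANGER_PASSWORD" \
    --groups-whitelist "$(join_csv "$@")" \
    --skip-security-zones true \
    --loglevel info \
    --dry-run true
}
```

The same join_csv helper works for --users-whitelist, --roles-whitelist, and the corresponding blacklist arguments.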
