The Data page lets you browse, filter, and take actions on Okera databases and datasets. Only data objects for which you have access appear on this page.
Admins and users with certain permissions can also create databases, edit metadata, and manage access to data objects from this page. You can learn more about these actions in Taking Actions on Data and Managing Permissions on Data. You can also review user inactivity for databases and datasets using the User Inactivity Analysis tab.
Browse and View Data¶
When the Data page first displays, it lists all the databases to which you have access. Each database card displays the database name, an optional description, the number of datasets within it, and any tags assigned to the database.
Select a database name to obtain more information about it. The database's information page appears. To return to the main Data page, select Databases in the header or Data in the menu.
The database description and any tags assigned it are shown at the top of the database information page. Admins or users with certain permissions can edit this information. To learn more about this, see Editing Metadata.
Select the Datasets tab to see a list of all the datasets contained in the database. Only datasets to which you have access are shown.
Select the Permissions tab to see a list of all the roles that have been granted access to this database. You can learn more about these permissions in Managing Permissions on Data. Only admins or users with the ability to grant access to this database can see the Permissions tab.
Select the Details tab to see additional information about the database, such as its location and creator.
Select the User Inactivity Analysis tab to review user inactivity information for the database. See User Inactivity Analysis.
Datasets are listed on the Datasets tab and in dataset search results. To learn more about dataset search, see Searching Across Datasets.
Each dataset card shows the dataset name, description, type (whether it is a table, internal view, or external view), source, any tags assigned it, and the time it was last updated. For more information about a dataset, select its name. The dataset detail information page appears.
The dataset description and any tags assigned it are shown at the top of the dataset detail information page. Admins or users with certain permissions can edit this information. To learn more, see Editing Metadata.
Select the Schema tab to see a list of all the columns in the dataset. For more details, see Understanding Schemas.
Select the Permissions tab to see a list of all the roles that have been granted some level of access to the dataset. You can learn more about these permissions in Managing Permissions on Data. Only admins or users with the ability to grant access to this dataset can see the Permissions tab.
Select the Details tab to see additional information about the dataset, such as its location, creator, and any relevant view lineage. For more details about view lineage, see Understanding View Lineage.
Select the User Inactivity Analysis tab to review user inactivity information for the dataset. See User Inactivity Analysis.
The column names in the data are listed on the Schema tab along with each column’s type, any tags assigned the column, and an optional comment. Admins or users with certain permissions can tag columns and edit comments. To learn more about this, see Editing Metadata.
The Access column in the schema indicates whether or not you have access to a column. A green checkmark in this column indicates that you have access to this column while a lock icon indicates that you do not.
Columns for which you do not have access will not appear when you preview the dataset. For more details about previewing, see Previewing Datasets.
A column name with a grey background indicates that this column is a partitioning column.
Complex types appear as STRUCT in the Type column of the Schema tab.
What Does an admin Tag Mean?¶
Depending on your permissions, you may see a grey admin tag on your databases and datasets. This tag indicates that you have ALL privileges on this data object. In other words, you have full and unrestricted access to this data object and any of its descendants.
Admins can see all data marked with this tag. For more about administrative usage, see Portal for Admins.
Search Database Page Lists¶
You can search and filter the Database page lists for databases, for datasets within databases, for columns within datasets, and for datasets across the entire catalog.
Search for Databases¶
To locate a specific database or group of databases, use the filters at the top of the main Databases page.
Use the box on the top left to search by a specified database name (or partial name). Use the box on the top right to select and search by database-level tag. The list is filtered to show only databases that include the specified database name characters or the selected tags.
To see only databases for which you have ALL privileges, select the Admin privileges checkbox.
Search Within Databases for Datasets¶
After selecting a database, you can search it for a specific dataset or group of datasets.
You can search datasets by name or by dataset-level tags. Use the box on the top left to search by a specified dataset name (or partial name). Use the box on the top right to select and search by dataset-level tag. The list is filtered to show only datasets that include the specified dataset name characters or the selected tags.
To show only starred datasets, select the Starred checkbox. To show only datasets for which you have ALL privileges, select the Admin privileges checkbox.
Search for Specific Dataset Columns¶
After selecting a dataset, you can search for specific column names within the dataset’s schema. Use the search box at the top of the Schema tab to specify a column name (or partial name).
The list is filtered to show only columns that include the specified column name characters.
To show only tagged columns, select Only show tagged columns.
Searching For Datasets Across the Catalog¶
To search for datasets across the entire catalog (without first choosing a database), select Search all datasets in the top right corner of the Databases page.
Use this to search multiple databases for datasets by name or tag. Use the box on the top left to optionally select one or more databases for the search. Use the box in the middle to search by a specified dataset name (or partial name). Use the box on the top right to select and search by a tag.
Note: The tag search filters on all tags, regardless of where they are applied. See Filtering by Tag.
To show only starred datasets, select the Starred checkbox. To show only databases and datasets for which you have ALL privileges, select the Admin privileges checkbox.
Filter by Tag¶
When you search all datasets by tag, you may see results for any of the following:
- Datasets that are tagged with the relevant tag
- Datasets that contain columns tagged with the relevant tag
- Datasets that are within databases tagged with the relevant tag
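The three matching rules above can be sketched as a small predicate. This is only an illustration of the described behavior, not Okera's implementation; the `Column` and `Dataset` classes, the `matches_tag` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    tags: set = field(default_factory=set)


@dataclass
class Dataset:
    name: str
    database: str
    tags: set = field(default_factory=set)
    columns: list = field(default_factory=list)


def matches_tag(dataset, tag, database_tags):
    """Return True if the dataset would match a catalog-wide search for `tag`."""
    # Rule 1: the dataset itself carries the tag.
    if tag in dataset.tags:
        return True
    # Rule 2: at least one of the dataset's columns carries the tag.
    if any(tag in column.tags for column in dataset.columns):
        return True
    # Rule 3: the database that contains the dataset carries the tag.
    return tag in database_tags.get(dataset.database, set())


# Hypothetical catalog contents, for illustration only.
datasets = [
    Dataset("orders", "sales", tags={"pii"}),
    Dataset("customers", "sales", columns=[Column("email", {"pii"})]),
    Dataset("events", "marketing"),
    Dataset("inventory", "warehouse"),
]
database_tags = {"marketing": {"pii"}}

hits = [d.name for d in datasets if matches_tag(d, "pii", database_tags)]
print(hits)
```

In this sample, `events` matches only because its database is tagged, which is why a tag search can return datasets that show no matching tag on their own cards.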
Create a New Database¶
Admins and users with CREATE_AS_OWNER privileges on the catalog can create databases. To create a database, select the button for creating a database on the main Data page.
A dialog appears prompting you to supply a name and optional description for your database. After supplying them, confirm the dialog. After the database is created, another dialog prompts you to either go to the new database or create another one.
To add datasets to your newly created database, go to the Registration page. See Data Registration.
Admins, users with ALL privileges on a data object, or users with ALTER privileges on a data object can edit the object’s description or comment at the top of the Data page.
Additionally, admins, users with ALL privileges on a data object, or users with REMOVE_ATTRIBUTE privileges on a data object can edit the object's tags. Tags can be added and edited at the database, dataset, or column level.
Datasets can be previewed using the preview button at the end of a dataset card or on a dataset’s information page.
The preview displays up to 100 rows and only shows columns to which you have access. Data in preview mode may also appear masked or obscured if a transform function has been applied to it.
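As a rough illustration of these two limits, the following sketch trims a result set to 100 rows and drops columns the user cannot access. It is a simplified model, not Okera's code; the `preview` function, the `accessible_columns` parameter, and the sample rows are hypothetical, and masking by transform functions is not modeled.

```python
PREVIEW_ROW_LIMIT = 100  # the preview displays up to 100 rows


def preview(rows, accessible_columns, limit=PREVIEW_ROW_LIMIT):
    """Keep at most `limit` rows and only the columns the user can access."""
    return [
        {col: value for col, value in row.items() if col in accessible_columns}
        for row in rows[:limit]
    ]


# Hypothetical dataset with 250 rows; the user has no access to "ssn".
rows = [
    {"id": i, "email": f"user{i}@example.com", "ssn": "000-00-0000"}
    for i in range(250)
]
result = preview(rows, accessible_columns={"id", "email"})
print(len(result), sorted(result[0]))
```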
Filter by Column¶
Use the Filter by column dropdown at the top left of the preview window to filter by certain columns.
Preview Complex Types¶
When previewing a table with complex types, the data displays as an expandable JSON structure. Select the '+' icon to expand the cell and reveal the data values.
Preview Partitioned Data¶
To speed up preview for large tables with many partitions, the preview only shows results from the last partition. If the preview is empty, verify that there is data in the last partition.
Create a View of Data¶
Admins can create internal views of datasets. Users with ALL privileges for a database can also create internal views of any datasets they have ALL access to. These views can only be created in databases for which the user has ALL privileges.
To create a new view from a dataset, select Create internal view on the dataset’s information page.
Provide a name for your view in the View name box and select a database in which to create it. Okera defaults to creating the view in the database of the base table. Then select the columns from the base table that you want to include in the new view. Finally, confirm to create the view.
Understand View Lineage¶
Admins or users with
ALL privileges for a dataset can see view lineage information on the dataset’s Details tab.
View lineage currently only shows direct parents and children, as well as root base table information. It does not list grandparent or grandchild information in cases where the descendant chain is long.
Note: Only lineage for views created through Okera is visible. If Okera is connected to an external metastore, and views were created inside that metastore bypassing Okera, their view lineage is not displayed. In addition, view lineage is only present for views created through Okera after the 2.1 release. Finally, dropping a dataset (table or view) and recreating it removes its view lineage.
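To make the scope of the lineage display concrete, here is a minimal sketch that reports direct parents, direct children, and root base tables for a view, and nothing in between. The `PARENTS` mapping, the dataset names, and the helper functions are hypothetical and do not reflect how Okera stores lineage.

```python
# Hypothetical lineage data: each view maps to the datasets it was created from.
PARENTS = {
    "sales.orders_eu": ["sales.orders"],
    "sales.orders_eu_2023": ["sales.orders_eu"],
    "sales.orders_eu_2023_top": ["sales.orders_eu_2023"],
}


def direct_parents(name):
    return PARENTS.get(name, [])


def direct_children(name):
    return sorted(child for child, parents in PARENTS.items() if name in parents)


def root_base_tables(name):
    """Walk up the chain until datasets with no recorded parents are reached."""
    parents = direct_parents(name)
    if not parents:
        return {name}
    roots = set()
    for parent in parents:
        roots |= root_base_tables(parent)
    return roots


def lineage_summary(name):
    """Collect only what the Details tab is described as showing."""
    return {
        "parents": direct_parents(name),
        "children": direct_children(name),
        "roots": sorted(root_base_tables(name)),
    }


summary = lineage_summary("sales.orders_eu_2023_top")
print(summary)
```

For `sales.orders_eu_2023_top`, the summary lists its parent and the root base table `sales.orders`, but the grandparent `sales.orders_eu` does not appear anywhere.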
Identify Favorite Datasets¶
Select the star next to a dataset’s name to add that dataset to your starred datasets list (your favorites) for easy access. A list of your recently starred datasets appears on the homepage. You can also filter datasets by Starred.
View Dataset Usage¶
To start using a dataset, select Use on the dataset details page.
Sample code is displayed on separate tabs for Spark, SQL, Python, R, and cURL. Select the relevant tab, then select Copy to copy the integration code suitable for your application.
Manage Data Permissions¶
Admins can view and manage all permissions by default. For a non-admin user to view and manage permissions, two conditions must be met.
- The user must have GRANT authorization for at least one data object. You can give a user GRANT authorization by selecting the Include ability to grant checkbox when creating a permission.
- The user's group must be assigned to the role "okera_policy_management_role".
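The rule above can be expressed as a short check. This is a sketch of the stated conditions only; the `can_manage_permissions` function and its parameters are hypothetical, and only the role name comes from this page.

```python
POLICY_MANAGEMENT_ROLE = "okera_policy_management_role"


def can_manage_permissions(is_admin, has_grant_on_any_object, group_roles):
    """Return True if the user can view and manage permissions.

    `group_roles` maps each of the user's groups to the set of roles
    assigned to that group (hypothetical structure for this sketch).
    """
    if is_admin:
        # Admins can view and manage all permissions by default.
        return True
    in_policy_role = any(
        POLICY_MANAGEMENT_ROLE in roles for roles in group_roles.values()
    )
    # Non-admins must satisfy both conditions.
    return has_grant_on_any_object and in_policy_role


groups = {"analysts": {"analyst_role"}, "stewards": {POLICY_MANAGEMENT_ROLE}}
print(can_manage_permissions(False, True, groups))
print(can_manage_permissions(False, False, groups))
```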
Users with the ability to view permissions see a Permissions tab for the database or dataset list on the data object pages.
The Permissions tab lists all the roles with access to the relevant database or dataset. The table also lists the groups assigned to each role, as well as the specific permissions that this role has on the data object. It also indicates whether the permission is enabled or disabled for a role.
Select a role name to view it in the Roles page and see its full permissions.
Note: Some permissions may be direct (a role has been granted access directly to this data object), while others may be inherited (a role has access to the entire data catalog, and thus has access to this data object by default).
Admins and users with authorization to view permissions can add permissions on the Data page. Select the add button at the top of the Permissions tab to add a new permission.
Select an access level and the role to which permission should be granted. For more information about access levels, see Privileges. For more information about granting access in general, see Creating Permissions in the UI.
Admins and users with authorization to view permissions can edit permissions using the Data page. To edit a permission, select the edit icon in the same row as the permission.
The Edit permission dialog appears. Make your changes and save them.
Admins and users with authorization to view and grant permissions to a database or dataset can also delete permissions for them. To delete a permission, select the delete icon in the same row as the permission.
A dialog appears prompting you to confirm the deletion.
Confirm to delete the permission.
User Inactivity Analysis¶
The User Inactivity Analysis tab shows the User Inactivity report, which lists users who have access to selected data objects, but have not queried the data objects within a given timeframe. This lets you review users who are not using their access to data and who should possibly have this access revoked.
Set the Report Time Range¶
Select a range for the report in the range box, or specify a custom range by selecting the Use custom range checkbox and supplying a custom range in the range box. If you specify a custom range, select either range in seconds or range in days to indicate the time units. The specified time range is used to filter the report.
For this report, "having access" to a data object means that a user:
- has read access on the relevant database/dataset,
- or has read access on an object within the relevant database/dataset (e.g. a column),
- or has inherited read access on the relevant database/dataset due to having read access on a higher-level object (for example, a user who has ALL access on the entire catalog)
Note: Users who only have authorization to view or edit data object metadata but do not have authorization to view the data itself do not appear in this list.
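One way to picture this definition is to treat every data object as a path in the catalog hierarchy. In the sketch below, a user "has access" to a target when one of their read grants sits on the target, on an object inside it, or on an object above it. The path representation and the `has_access` function are assumptions made for illustration, and the grants passed in are assumed to be read-granting permissions only.

```python
def has_access(target, read_grants):
    """Check the report's notion of access for one user.

    Objects are modeled as tuples: () is the whole catalog, ("sales",) a
    database, ("sales", "orders") a dataset, and
    ("sales", "orders", "email") a column.
    """
    for grant in read_grants:
        shared = min(len(grant), len(target))
        # True when the grant is the target itself, an object above it
        # (inherited access), or an object within it (for example, a column).
        if grant[:shared] == target[:shared]:
            return True
    return False


dataset = ("sales", "orders")
print(has_access(dataset, [("sales", "orders")]))           # direct grant
print(has_access(dataset, [("sales", "orders", "email")]))  # grant on a column
print(has_access(dataset, [()]))                            # catalog-wide grant
print(has_access(dataset, [("hr",)]))                       # unrelated database
```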
The report has several columns:
- User: The username of a user who has not accessed the relevant database or dataset within the specified timeframe.
- Last accessed: The last recorded time that this user successfully accessed the relevant database or dataset. This column may display Never queried, which means that Okera has no record of this user ever running a successful query for the relevant database/dataset.
- Role granting access: Any roles this user belongs to which grant read access for the relevant database or dataset. This column indicates how a user acquired access to these data objects. You can click on any role to view it on the Roles page and see a full list of the permissions it grants. To learn more about roles, see Managing Roles in the UI.
- Groups containing user: Users cannot be assigned directly to roles and must instead be assigned via a group. This column shows all groups that include the user and are assigned to roles listed in Role granting access. This column shows how a user was assigned to these roles. You can revoke a user's access to the database or dataset by removing the user from these groups.
- Access levels: The types of access this user has for the relevant database or dataset. Some users may have direct access to the relevant database or dataset, while others may inherit their access by having access to a higher-level data object, such as the catalog. Users may also have access to specific objects within the relevant database or dataset, such as a column.
Overall, this report displays a list of users who have not accessed the database or dataset within the specified timeframe and should possibly have their access revoked. It also shows the groups and roles that these users should be removed from to have their access revoked.
This report can be downloaded as a CSV file by selecting the Download as CSV button.
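The core filtering idea behind the report can be sketched in a few lines: given the last successful access time per user and a time range, keep the users who were not active inside that range. This is a simplified model rather than Okera's implementation; the `inactive_users` function, the sample usernames, and the dates are hypothetical, and every user in the input is assumed to already have read access.

```python
from datetime import datetime, timedelta


def inactive_users(last_access, now, time_range):
    """List users with no successful query inside the time range.

    `last_access` maps a username to the datetime of the user's last
    successful query, or to None if no query was ever recorded.
    """
    cutoff = now - time_range
    report = []
    for user, accessed in sorted(last_access.items()):
        if accessed is None:
            report.append((user, "Never queried"))
        elif accessed < cutoff:
            report.append((user, accessed.isoformat()))
    return report


now = datetime(2024, 6, 30)
last_access = {
    "alice": datetime(2024, 6, 25),  # active within the last 30 days
    "bob": datetime(2024, 4, 1),     # last query is older than the range
    "carol": None,                   # no recorded query
}
report = inactive_users(last_access, now, timedelta(days=30))
print(report)
```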
You might see a warning message displayed in red on your report.
This warning indicates that your Okera instance has precise information for only a portion of the selected time range. Okera has not yet had enough time to collect user data, so your report may be inaccurate or incomplete. Okera recommends that you wait the number of days (or seconds) specified for the time range before running this report. For example, if you'd like to run this report for a time range of 30 days or more, you should wait for 30 days after the Okera deployment date before running this report. The message updates every day to inform you how many days of data have been collected.
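The reasoning behind the warning is simple arithmetic: if data has been collected for fewer days than the requested range, part of the range is not covered. The sketch below assumes the collection period starts on the deployment date; the `coverage_gap_days` function and the sample dates are hypothetical.

```python
from datetime import date


def coverage_gap_days(deployed_on, today, range_days):
    """Return how many days of the requested range are not yet covered."""
    collected_days = (today - deployed_on).days
    return max(0, range_days - collected_days)


# Hypothetical deployment 10 days ago and a 30-day report range.
gap = coverage_gap_days(date(2024, 6, 1), date(2024, 6, 11), range_days=30)
print(gap)
```

A gap greater than zero corresponds to the situation in which the warning would be expected; once the deployment is at least as old as the range, the gap is zero.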