Click the Datasets navigation tab at the top to access the datasets page.
Use the Datasets page to browse, search, filter, preview, and star the Okera datasets you can access.
Every dataset on which you have any level of access appears in this list. Datasets to which you have no access do not appear at all.
Searching and Filtering Datasets
At the top of the page, you can choose from the following dataset search and filter options:
- Search by dataset name: Any dataset whose name contains your input as a substring is displayed. You can quickly clear this box with the ESC key.
- Filter by database (multi-select box): Filter the list to include only datasets in a particular database or set of databases.
- Filter by User Tags (multi-select box): Filter the list to include only datasets that carry the selected tags.
- Filter by Starred datasets (checkbox): Filter the list to show only datasets that you've starred.
- Filter by Admin (checkbox): Filter the list to show only datasets on which you are an admin (that is, you have ALL access to them).
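The name search described above can be pictured as a plain substring match. The following is a minimal sketch and not Okera's implementation: the `search_datasets` function and the sample dataset names are hypothetical, and case-insensitive matching is an assumption that the text does not confirm.

```python
def search_datasets(names, query):
    """Return the dataset names that contain `query` as a substring.

    Assumption: matching is case-insensitive. The documentation only
    states that names containing the input as a substring are shown.
    """
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


# Hypothetical dataset names, for illustration only.
datasets = ["sales.orders", "sales.order_items", "hr.employees"]
matches = search_datasets(datasets, "order")
print(matches)
```

Under these assumptions, an empty query matches every name, which corresponds to showing the unfiltered list.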
What does Admin mean?
If you have ALL access on a particular object (database or dataset), you are considered an admin for that object within Okera. You will also see an “Admin” tag on the dataset details panel.
For more about administrative usage, see Portal for Admins.
To display a details panel for a given dataset, click its dataset card in the list. The details panel will appear on the right side of the screen.
Depending on whether you are an admin on this dataset, the details panel contains:
- The dataset's metadata, including:
  - Database name
  - Dataset name
  - Created (the date and time the dataset was created)
  - Metadata Changed (the date and time the metadata was last changed)
  - View Definition (if the dataset is an external view)
  - Partitioning Columns (if the dataset is partitioned)
  - Tags on columns
  - Dataset type
  - Storage desc params
  - SerDe information
  - Other table information (any other properties added to a table as key/value pairs appear here)
  - Lineage information
    - Base tables
    - Base table location
- The dataset schema described in full:
  - Total number of columns in the schema (next to the 'Schema' tab)
  - Partitioning Column (if the dataset is partitioned)
  - Each column in the dataset is shown, including:
    - Access (whether you have read access to view the cells in this column)
If a column's name has a gray background, it is a partitioning column.
If you are an admin on a dataset, you can see the lineage of Okera's datasets in the UI.
For each dataset you can see:
- parent datasets (tables or views)
- child views
- root base table(s)
- location of these base tables
A few important notes about view lineage:
- View lineage currently shows only direct parents, direct children, and root base table information. It does not list grandparents or grandchildren when the chain of views is longer.
- You can only see view lineage for views created through Okera. If Okera is connected to an external metastore, and views were created inside that metastore bypassing Okera, they will not show view lineage.
- View lineage will only be present for views created through Okera after 2.1.
Dropping a dataset (table or view) and then recreating it removes its view lineage.
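To make the scope of view lineage concrete, the sketch below models which relationships are reported for a dataset: its direct parents, its direct children, and its root base tables. The data model (a `parents_of` dictionary) and all function and dataset names are hypothetical and chosen for illustration; this is not how Okera stores or computes lineage.

```python
def direct_parents(parents_of, name):
    """Datasets that `name` is defined on directly."""
    return sorted(parents_of.get(name, []))


def direct_children(parents_of, name):
    """Views defined directly on `name`."""
    return sorted(child for child, parents in parents_of.items() if name in parents)


def root_base_tables(parents_of, name):
    """Follow parent links upward until reaching datasets with no parents.

    In this sketch, a base table reports no roots for itself.
    """
    roots, seen, stack = set(), set(), [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = parents_of.get(current, [])
        if not parents and current != name:
            roots.add(current)
        stack.extend(parents)
    return sorted(roots)


# Hypothetical chain: base table <- v1 <- v2 <- v3
parents_of = {
    "db.base": [],
    "db.v1": ["db.base"],
    "db.v2": ["db.v1"],
    "db.v3": ["db.v2"],
}

lineage_v3 = {
    "parents": direct_parents(parents_of, "db.v3"),
    "children": direct_children(parents_of, "db.v3"),
    "roots": root_base_tables(parents_of, "db.v3"),
}
print(lineage_v3)
```

For `db.v3` in this example, the grandparent `db.v1` appears in neither the parents nor the roots, which mirrors the limitation noted above for long chains.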
Editing Dataset Detail Information
A dataset's description and table-level tags can be edited in the dataset details panel. An edit button appears beside Description and Tags if you have editing permissions on the selected dataset.
Complex Types in the Schema
Complex types are marked with the “STRUCT” data type in the schema. All nested child elements appear underneath, with the parent elements grayed out. See also Previewing Complex Types.
If your schema looks similar to the table below, you do not have access to every column in the dataset.
To display the groups that have access to a column, click See Groups for that column's name. To gain access to that column's data, you must be added to one of these groups.
Clicking on the star next to the dataset name will add that dataset to your “Starred datasets” list. You can see a list of your recently starred datasets on the homepage.
Dataset Action Menu
At the top right of the dataset details, you will see an Action menu with these options:
- Create View (Admins only)
- Preview Dataset
- View Dataset usage snippets
- Open dataset in Permissions
- Open dataset in Workspace (Admins only)
To create a view, click the icon in the upper right of the details panel.
- Admin users can select a database to place the view in.
- Admin users can input a view name. The view name cannot contain special characters, white space, or capital letters.
- Admin users can either inherit all of the columns from the parent dataset or select specific columns to include in the new view. Inheriting includes column names, types, and ABAC attributes. Comments are not inherited by the new view.
- The schema summary panel shows which columns have been selected so far. Admin users can remove a selected column either by clearing its checkbox or by clicking its 'x' icon in the schema summary.
- A database, a view name, and columns must be selected in order to proceed with creating the view.
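The view name rule stated above (no special characters, white space, or capital letters) can be approximated with a regular expression. This is a minimal sketch under one explicit assumption: that lowercase letters, digits, and underscores are the permitted characters. The text does not list the allowed characters, and `is_valid_view_name` is a hypothetical helper, not part of Okera.

```python
import re

# Assumption: only lowercase letters, digits, and underscores are allowed.
# The documentation states what is forbidden, not the exact allowed set.
_VIEW_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_view_name(name):
    """Return True if `name` has no capitals, white space, or special characters."""
    return _VIEW_NAME.fullmatch(name) is not None


for candidate in ["sales_summary", "Sales", "sales summary", "sales-summary"]:
    print(candidate, is_valid_view_name(candidate))
```

Using `fullmatch` rather than `match` ensures the whole name is checked, so a valid prefix followed by a forbidden character is still rejected.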
To preview a dataset, click the icon in the upper right of the details panel.
- Users can search by column header.
- No more than 200 rows are shown (to view more, see "Dataset usage" below).
- Columns to which you have no access are not shown.
- The modal scrolls vertically and horizontally.
- If there are more than 60 columns, horizontal pagination is available.
Note: To speed up previews of large tables with many partitions, the preview shows results only from the last partition. If you see no results, verify that the last partition contains data.
Previewing Complex Types
When previewing a table with complex types, the data will be displayed as an expandable JSON structure. Simply click the + to expand the cell and reveal the data values.
Pagination for columns in Preview
If a table has many columns, it may be paginated. Use the horizontal pagination control in the top right of the table to traverse through all the columns.
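As an illustration of horizontal pagination, the sketch below splits a list of column names into consecutive pages. The page size of 60 is an assumption inferred from the statement that pagination becomes available beyond 60 columns; the text does not state the actual page size, and `paginate_columns` is a hypothetical helper.

```python
def paginate_columns(columns, page_size=60):
    """Split column names into consecutive pages of at most `page_size` columns.

    Assumption: each page holds up to 60 columns. The documentation only
    says pagination is available when there are more than 60 columns.
    """
    return [columns[i:i + page_size] for i in range(0, len(columns), page_size)]


# Hypothetical table with 130 columns.
columns = [f"col_{i}" for i in range(130)]
pages = paginate_columns(columns)
page_sizes = [len(page) for page in pages]
print(page_sizes)
```

A table with 60 or fewer columns yields a single page in this sketch, which corresponds to the case where no pagination control is needed.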
Column search in Preview
Tables with more than one column can be filtered by column name. Search for column names or select them directly from the filter to view specific columns.
To start using a dataset, click the View usage icon in the upper right of its details panel.
Sample code is displayed for several integrations, such as Spark, Hive, Python, R, and cURL. Click the tab of your choice to retrieve integration code suitable for your application.
Only users who can grant permissions on the selected dataset or its database see the Permissions tab.
For these users, the Permissions tab is available on the dataset details panel.
This permissions table closely mirrors that on the Roles page. Here you can see any permissions that grant access to that dataset, grouped by role.
Clicking on a role will take the user to the corresponding role in the Roles page. Clicking on the groups icon will open a dialog listing all groups for that role.
For users with grant access on the selected dataset, clicking on the Add new permission button above the table will launch the create policy dialog.
Unlike on the Roles page where the role is already selected, the dialog for creating new permissions allows the user to choose which role the new permission should be granted to.
You can create permissions only at the current dataset scope, so the database and dataset drop-downs are not available here.
CREATE and CREATE_AS_OWNER are not applicable at the dataset scope. Therefore, these options are unavailable in the access level drop-down of the policy dialog. Policies with CREATE or CREATE_AS_OWNER can be created from the Roles page.
For more information about policy management, see the Permissions section on the Roles page.
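The restriction on access levels at the dataset scope can be sketched as a simple filter. This is an illustrative model only: `dataset_scope_levels` is a hypothetical helper, and the input list is a small sample rather than a complete list of Okera access levels. Only ALL, CREATE, and CREATE_AS_OWNER are named in the text; SELECT is included as an assumed example of another level.

```python
# Levels the text describes as not applicable at the dataset scope.
NOT_DATASET_SCOPE = {"CREATE", "CREATE_AS_OWNER"}


def dataset_scope_levels(all_levels):
    """Keep the levels that can be offered at the dataset scope, preserving order."""
    return [level for level in all_levels if level not in NOT_DATASET_SCOPE]


# Illustrative sample only; not a complete list of access levels.
levels = ["ALL", "SELECT", "CREATE", "CREATE_AS_OWNER"]
available = dataset_scope_levels(levels)
print(available)
```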