The Data page enables you to browse, filter, and take actions on Okera databases and datasets. Only data objects that you have access to will appear on this page.
Catalog admins and users with certain permissions can also create databases, edit metadata, and manage access to data objects from this page. You can learn more about these actions in Taking Actions on Data and Managing Permissions on Data.
Browsing and Viewing Data¶
When you first arrive on the Data page, you will see a list of all the databases you have access to. Each database card displays the database name, an optional description, the number of datasets within it, and any tags on this database.
To see more information about a given database, click on the database name and you will be taken to this database’s information page. To return to the main Data page, click on the ‘Databases’ header.
At the top of the database information page, you will see the database description and any tags on this database. Catalog admins or users with certain permissions can edit this information. To learn more about this, see Editing metadata.
On the Datasets tab, you will see a list of all the datasets contained within this database. Only datasets that you have access to will be shown.
On the Permissions tab, you will see a list of all the roles that have been granted access to this database. You can learn more about these permissions in Managing Permissions on Data. Only admins or users with the ability to grant access to this database will see the Permissions tab.
Lastly, on the Details tab, you will see additional information about the database, such as its location and creator.
Datasets are listed in the Datasets tab of a database, and in Dataset Search results. To learn more about Dataset Search, see Searching across datasets.
Each dataset card displays a dataset’s name, description, type (whether it is a table, internal view, or external view), source, any tags on it, and the time it was last updated.
To see more information about a given dataset, click on the dataset name and you will be taken to this dataset’s information page.
At the top of the dataset information page, you will see the dataset description and any tags on this dataset. Catalog admins or users with certain permissions can edit this information. To learn more about this, see Editing metadata.
On the Schema tab, you will see a list of all the columns in this dataset. For more details about this tab, see Understanding schemas.
On the Permissions tab, you will see a list of all the roles that have been granted some level of access to this dataset. You can learn more about these permissions in Managing Permissions on Data. Only admins or users with the ability to grant access to this dataset will see the Permissions tab.
Lastly, on the Details tab, you will see additional information about the dataset, such as its location, creator, and any relevant view lineage. For more details about view lineage, see Understanding view lineage.
Understanding schemas¶
Column names are listed on the Schema tab along with each column’s type, any tags on the column, and an optional comment. Catalog admins or users with certain permissions can tag columns and edit comments. To learn more about this, see Editing metadata.
The Access column of the schema indicates whether or not you have access to a column. A green checkmark in this column indicates that you have access to this column while a lock icon indicates that you do not.
Columns that you do not have access to will not appear when you preview this dataset. For more details about previewing, see Previewing datasets.
A column name with a grey background indicates that this column is a partitioning column.
Complex types will appear as ‘STRUCT’ in the type column of the Schema tab.
What does ‘Admin’ mean?¶
Depending on your permissions, you may see a grey ‘Admin’ tag on your databases and datasets. This tag indicates that you have ALL privileges on this data object, i.e. you have full and unrestricted access to this data object and any of its descendants.
Catalog admins will see all data marked with this tag. For more about administrative usage, see Portal for Admins.
Searching and Filtering Data¶
Searching for databases¶
If you want to find a specific database or group of databases, you can use the filters at the top of the main Data page.
You can filter databases by name or by database-level tags. You can also use the ‘Admin privileges’ checkbox to show only databases that you have ALL privileges on.
Searching within databases and datasets¶
You can also filter within a given database to find a specific dataset or group of datasets.
You can filter datasets by name or by dataset-level tags. You can also use the ‘Starred’ checkbox to show only starred datasets, or the ‘Admin privileges’ checkbox to show only datasets that you have ALL privileges on.
You can also search for specific columns within a dataset’s schema using the search bar on the Schema tab.
Searching across datasets¶
If you want to search for datasets across the entire catalog (i.e. without having to choose a database first), click the Search all datasets button at the top right of the page to access the Dataset Search page.
On this page, you can filter datasets by multiple databases, and search for datasets by name or tag.
Filtering by tag¶
Please note that the tag filter on the Dataset Search page will filter on all tags, regardless of where they are applied. When you filter by tag, you may see results for any of the following:
- Datasets that are tagged with the relevant tag(s)
- Datasets that contain columns tagged with the relevant tag(s)
- Datasets that are within databases tagged with the relevant tag(s)
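The matching rule above can be sketched in Python. This is an illustrative model only, not Okera code: the `Dataset` class and `matches_tag_filter` function are hypothetical names, and the sketch assumes a dataset matches when any one of the selected tags is found at any level.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a catalog dataset; not an Okera API type."""
    name: str
    tags: set = field(default_factory=set)           # dataset-level tags
    column_tags: dict = field(default_factory=dict)  # column name -> set of tags
    database_tags: set = field(default_factory=set)  # tags on the parent database


def matches_tag_filter(dataset, wanted):
    """Return True if any wanted tag appears at dataset, column, or database level."""
    wanted = set(wanted)
    if dataset.tags & wanted:
        return True
    if any(tags & wanted for tags in dataset.column_tags.values()):
        return True
    return bool(dataset.database_tags & wanted)


# Example: "users" matches "pii" only through a tagged column.
users = Dataset("users", column_tags={"email": {"pii"}})
orders = Dataset("orders", tags={"sales"})
matching = [d.name for d in (users, orders) if matches_tag_filter(d, {"pii"})]
```

In this model, a dataset with no tags of its own can still appear in the results because one of its columns or its parent database carries the tag.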
Taking Actions on Data¶
Creating a new database¶
Catalog admins and users with CREATE or CREATE_AS_OWNER privileges on the catalog can create databases. To create a database, click the ‘Create new database’ button on the main Data page.
You will be prompted to name your database and give an optional description. If you want to start adding data to your newly created database, go to the Registration page. For more information about registering data, see Data Registration.
Editing metadata¶
Catalog admins, users with ALL privileges on a data object, and users with ALTER privileges on a data object can edit the object’s description or comment.
Additionally, catalog admins, users with ALL privileges on a data object, and users with ADD_ATTRIBUTE and REMOVE_ATTRIBUTE privileges on a data object can edit the object’s tags. Tags can be added and edited at the database, dataset, or column level.
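The two rules above can be summarized as a pair of checks. This is a minimal sketch rather than Okera's implementation: the function names are hypothetical, privileges are modelled as plain strings, and the sketch assumes that both ADD_ATTRIBUTE and REMOVE_ATTRIBUTE are needed to edit tags.

```python
def can_edit_description(is_catalog_admin, privileges):
    """Description/comment edits: catalog admin, or ALL or ALTER on the object."""
    return is_catalog_admin or bool({"ALL", "ALTER"} & set(privileges))


def can_edit_tags(is_catalog_admin, privileges):
    """Tag edits: catalog admin, ALL, or both attribute privileges (assumed)."""
    privs = set(privileges)
    return (
        is_catalog_admin
        or "ALL" in privs
        or {"ADD_ATTRIBUTE", "REMOVE_ATTRIBUTE"} <= privs
    )
```

For example, a user who holds only ALTER on a dataset can edit its description in this model, but cannot edit its tags.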
Previewing datasets¶
Datasets can be previewed using the preview button at the end of a dataset card (indicated with an eye icon), or using the preview button on a dataset’s information page.
The preview will display up to 100 rows and will only show columns that you have access to. Data in preview may also appear masked or obscured if a transform function has been applied to it.
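As a rough illustration of the row limit and column filtering described above, the following Python sketch models a preview over rows held as dictionaries. The `preview` function and its parameters are hypothetical, and masking by transform functions is not modelled here.

```python
PREVIEW_ROW_LIMIT = 100  # the preview displays up to 100 rows


def preview(rows, accessible_columns, limit=PREVIEW_ROW_LIMIT):
    """Return at most `limit` rows, keeping only columns the user can access."""
    allowed = set(accessible_columns)
    return [
        {col: value for col, value in row.items() if col in allowed}
        for row in rows[:limit]
    ]


# Example: 250 rows, but the user only has access to the "id" column.
rows = [{"id": i, "ssn": "123-45-6789"} for i in range(250)]
sample = preview(rows, accessible_columns=["id"])
```

Here `sample` holds 100 rows, and the `ssn` column is absent from each of them.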
Filtering by column¶
Use the ‘Filter by column’ dropdown at the top left of the preview window to filter by certain columns.
Previewing complex types¶
When previewing a table with complex types, the data will be displayed as an expandable JSON structure. Click the '+' icon to expand the cell and reveal the data values.
Previewing partitioned data¶
To speed up the preview for large tables with many partitions, the preview only shows results from the last partition. If the preview is empty, verify that there is data in the last partition.
Creating a view on data¶
Catalog admins can create internal views on datasets. Users with CREATE or ALL privileges on a database can also create internal views on any datasets they have ALL access to. These views can only be created in databases that this user has CREATE or ALL privileges on.
To create a new view from a dataset, click the ‘Create new internal view’ button on the dataset’s information page.
You will be prompted to name your view and to specify a database to create it in. Okera will default to creating the view in the database of the base table. You can then select which columns from the base table you want to include in this new view.
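The privilege rules for creating internal views can be expressed as a single check. This is a hedged sketch, not Okera's actual authorization logic: `can_create_internal_view` is a hypothetical helper and privileges are modelled as plain strings.

```python
def can_create_internal_view(is_catalog_admin, dataset_privileges, target_db_privileges):
    """Model of who may create an internal view on a dataset in a target database."""
    if is_catalog_admin:
        return True
    # Non-admins need ALL on the base dataset...
    has_all_on_dataset = "ALL" in set(dataset_privileges)
    # ...and CREATE or ALL on the database that will hold the view.
    can_create_in_target = bool({"CREATE", "ALL"} & set(target_db_privileges))
    return has_all_on_dataset and can_create_in_target
```

In this model, a user with ALL on a dataset but only SELECT on the target database cannot create the view there, and would need to choose a database on which they hold CREATE or ALL.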
Understanding view lineage¶
Catalog admins or users with ALL privileges on a dataset can see view lineage information on the dataset’s Details tab.
View lineage currently shows only direct parents and children, as well as root base table information. When a chain of views is more than one level deep, intermediate ancestors and descendants such as grandparents and grandchildren are not listed.
Please note that only lineage for views created through Okera will be visible. If Okera is connected to an external metastore, and views were created inside that metastore bypassing Okera, their view lineage will not be displayed. View lineage will also only be present for views created through Okera after the 2.1 release.
Note also that dropping and recreating a dataset (table or view) will remove its view lineage.
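To make the "direct parents, direct children, and root base table" behavior concrete, here is a small Python model. It assumes lineage is stored as a mapping from each view to its direct parents. That structure and the function names are illustrative assumptions, not how Okera stores lineage.

```python
def direct_children(parents, name):
    """Datasets that list `name` as a direct parent."""
    return sorted(child for child, ps in parents.items() if name in ps)


def root_tables(parents, name):
    """Follow parent links upward to the base tables (datasets with no parents)."""
    roots, stack, seen = set(), [name], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        ps = parents.get(current, [])
        if not ps:
            roots.add(current)
        stack.extend(ps)
    return roots


def lineage_summary(parents, name):
    """What a Details tab would show in this model: one level up and down, plus roots."""
    return {
        "parents": sorted(parents.get(name, [])),
        "children": direct_children(parents, name),
        "roots": sorted(root_tables(parents, name)),
    }


# Chain: base_table -> v1 -> v2 -> v3
parents = {"v1": ["base_table"], "v2": ["v1"], "v3": ["v2"]}
summary = lineage_summary(parents, "v3")
```

For `v3`, the summary lists `v2` as the parent and `base_table` as the root, while the grandparent `v1` does not appear.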
Click on the star next to a dataset’s name to add that dataset to your ‘Starred datasets’ list for easy access. You can see a list of your recently starred datasets on the homepage. You can also filter datasets by ‘Starred’.
Viewing dataset usage¶
To start using a dataset, click the ‘Use’ icon.
Sample code for several integrations will be displayed, e.g. Spark, Hive, Python, R, and cURL. Click the relevant tab to retrieve integration code suitable for your application.
Managing Permissions on Data¶
Catalog admins can view all permissions by default. A non-admin user can view permissions only if both of the following are true. First, the user has grant ability on at least one data object; you can give a user grant ability by checking the "Include ability to grant" box when creating a permission. Second, the user's group is assigned to the role "okera_policy_management_role".
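The requirements for viewing permissions can be sketched as a boolean check. The role name comes from the text above. The function and its parameters are hypothetical and only model the rule; they do not reflect Okera's API.

```python
POLICY_ROLE = "okera_policy_management_role"


def can_view_permissions(is_catalog_admin, has_grant_on_any_object, group_roles):
    """Catalog admins always qualify; others need grant ability and the policy role."""
    if is_catalog_admin:
        return True
    return bool(has_grant_on_any_object) and POLICY_ROLE in set(group_roles)
```

A user with grant ability whose group lacks the policy management role would not see the Permissions tab in this model.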
Users with the ability to view permissions will see a Permissions tab on the relevant data object information pages.
The Permissions tab shows a table of all the roles with access to the relevant data object. The table also shows a list of the groups assigned to each role, as well as the specific permissions that this role has on the data object.
Click on a role name to view it in the Roles page and see its full permissions.
Note that some permissions may be direct (e.g. a role that has been granted access directly to this data object), while others may be inherited (e.g. a role that has access to the entire data catalog, and thus has access to this data object by default).
Adding and editing permissions¶
Users with the ability to view permissions can also add and edit permissions via the Data page. Click the ‘Add new permission’ button at the top of an object’s Permissions tab to add a new permission.
You will be prompted to select a role to grant access to, as well as an access level to grant them. To learn more about access levels, see Privileges.
To learn more about granting access in general, see Creating Policies in the UI.
Catalog admins and users with the ability to grant access to a given data object can also edit or delete permissions on that object. Use the edit and delete icons at the end of each permission row to make these changes.