
Crawler Overview

Crawlers search a data store to discover its datasets and infer their schemas. After the datasets have been discovered, they can be registered to an Okera database. This topic describes how to create and run a crawler. For information about registering datasets, see Register Datasets.
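To make "infer their schemas" concrete, the following is a minimal, hypothetical sketch of one way type inference over sampled rows can work. It is not Okera's crawler algorithm. The function names `infer_type` and `infer_schema`, the sampling approach, and the Hive-style type names (`BIGINT`, `DOUBLE`, `BOOLEAN`, `STRING`) are all illustrative assumptions. Each column is assigned the narrowest type that every non-empty sample value parses as.

```python
def _all_parse(values, parse):
    """Return True if `parse` accepts every value without raising ValueError."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def _parse_bool(v):
    """Accept only 'true'/'false' (case-insensitive); raise ValueError otherwise."""
    normalized = v.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(v)
    return normalized == "true"


def infer_type(values):
    """Hypothetical rule: pick the narrowest type that fits all non-empty samples."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"  # no evidence either way; fall back to the widest type
    if _all_parse(non_empty, int):
        return "BIGINT"
    if _all_parse(non_empty, float):
        return "DOUBLE"
    if _all_parse(non_empty, _parse_bool):
        return "BOOLEAN"
    return "STRING"


def infer_schema(header, rows):
    """Infer (column_name, type) pairs from a header and rectangular sample rows."""
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return [(name, infer_type(list(col))) for name, col in zip(header, columns)]


header = ["id", "price", "active", "name"]
rows = [["1", "9.99", "true", "a"], ["2", "10", "false", "b"]]
print(infer_schema(header, rows))
```

Checking narrow types first means a column of whole numbers is reported as `BIGINT` rather than `DOUBLE`, and any column with an unparseable value falls back to `STRING`.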

For the programmatic DDL commands for crawlers, see Programmatic Registration.

Before you can crawl a data store, you must create an Okera connection for it. See Connect to Data Sources.

Automated tagging, or autotagging, reduces the manual work of tagging by detecting when a column is likely to contain a particular kind of formatted data, such as phone numbers, and then applying the relevant tag to that column. If autotagging is enabled on the Okera cluster, autotags are applied to newly discovered datasets whose data matches the configured autotag rules. After a dataset has been registered in the Okera catalog, you can see the tags applied to its columns. For more information on configuring autotagging, see Configure Autotagging.
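The idea behind autotagging can be illustrated with a small, hypothetical sketch: sample a column's values, check how many match a pattern, and suggest a tag when enough of them do. This is not Okera's implementation or rule syntax (those are covered in Configure Autotagging). The phone-number regular expression, the tag name `hypothetical_ns:phone_number`, the function `suggest_tag`, and the 80% match threshold are all illustrative assumptions.

```python
import re

# Hypothetical rule: a North American-style phone number such as
# 555-123-4567, (555) 123-4567, or 555.123.4567.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")
PHONE_TAG = "hypothetical_ns:phone_number"  # illustrative tag name only


def suggest_tag(values, pattern, tag, threshold=0.8):
    """Return `tag` if at least `threshold` of the non-empty values fully match `pattern`."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None  # nothing to judge; do not tag empty columns
    matches = sum(1 for v in non_empty if pattern.fullmatch(v.strip()))
    return tag if matches / len(non_empty) >= threshold else None


phone_col = ["555-123-4567", "(555) 123-4567", "555.123.4567", ""]
name_col = ["Alice", "Bob"]
print(suggest_tag(phone_col, PHONE_PATTERN, PHONE_TAG))
print(suggest_tag(name_col, PHONE_PATTERN, PHONE_TAG))
```

Using a match ratio rather than requiring every value to match lets a rule tolerate a few malformed entries, at the cost of occasionally tagging a column that only partly contains the target data.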

See the following sections for more information on crawlers: