
Configure Object Storage Crawlers

The following cloud object storage URI structures are supported for object storage crawlers:

  • AWS - s3://mybucket/

  • Azure ADLS Gen2 - abfss://<file_system>@<account_name>.dfs.core.windows.net/mypath/

    Note: Okera supports Azure Blob File System (abfs) URIs that use the dfs endpoint (*.dfs.core.windows.net), but does not support URIs that use the blob endpoint (*.blob.core.windows.net).

  • Google - gs://mybucket/
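The URI structures above can be checked before a crawler is created. The following is a minimal sketch, not part of Okera: the function name `is_supported_crawler_uri` is hypothetical, and accepting the `abfs` scheme alongside `abfss` is an assumption based on the note above.

```python
from urllib.parse import urlparse

# Suffix required for Azure ADLS Gen2 URIs (dfs endpoint, not blob endpoint).
DFS_SUFFIX = ".dfs.core.windows.net"


def is_supported_crawler_uri(uri: str) -> bool:
    """Return True if the URI matches one of the structures listed above.

    Hypothetical helper for illustration only; it checks URI shape and
    does not contact any storage service.
    """
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()

    # AWS (s3://bucket/) and Google (gs://bucket/) need a bucket name.
    if scheme in ("s3", "gs"):
        return bool(parsed.netloc)

    # Azure: <file_system>@<account_name>.dfs.core.windows.net
    # Assumption: both abfs and abfss schemes are treated as acceptable.
    if scheme in ("abfs", "abfss"):
        host = parsed.hostname or ""
        has_account = host.endswith(DFS_SUFFIX) and len(host) > len(DFS_SUFFIX)
        return bool(parsed.username) and has_account

    return False
```

For example, `is_supported_crawler_uri("abfss://fs@acct.blob.core.windows.net/mypath/")` is rejected because the host uses the blob endpoint rather than the dfs endpoint.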

In addition, object storage crawlers can be set up to assume either that each dataset has its own directory containing many files (following the Hive convention) or that each file in a directory is its own dataset.

  • If you want the crawler to assume that each dataset has its own directory containing many files, select the Dataset files are in the same directory option at the bottom of the crawler creation dialog.

  • If you want the crawler to assume that each file in a directory is its own dataset, select the Each dataset file is in a separate directory option at the bottom of the crawler creation dialog.
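The difference between the two assumptions can be illustrated with a small sketch. This is not Okera's crawler logic: the function names and the example paths are hypothetical, and the code only shows how the same file listing yields different datasets under each assumption.

```python
from collections import defaultdict
import posixpath


def group_by_directory(paths):
    """Hive-style assumption: every file in a directory belongs to one dataset."""
    datasets = defaultdict(list)
    for path in paths:
        # The parent directory identifies the dataset.
        datasets[posixpath.dirname(path)].append(path)
    return dict(datasets)


def group_by_file(paths):
    """Alternative assumption: each file is its own dataset."""
    return {path: [path] for path in paths}


# Hypothetical listing of objects under a crawled bucket.
example_paths = [
    "s3://mybucket/sales/part-0.csv",
    "s3://mybucket/sales/part-1.csv",
    "s3://mybucket/users/part-0.csv",
]
```

With this listing, `group_by_directory(example_paths)` produces two datasets (one per directory), while `group_by_file(example_paths)` produces three (one per file).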

[Image: configure object storage crawler]