Skip to content

BigQuery Data Source Connections

This page describes how to connect to BigQuery data. Some configuration is required in BigQuery prior to setting up the connection in Okera.

Required BigQuery Configuration

Before you create a BigQuery connection in Okera, you must set up a BigQuery Service account and download the associated service account key.

  1. Create a BigQuery Service account. Looker's documentation has some easy-to-follow instructions on how to do this.
  2. Grant the desired BigQuery permissions for the Okera connection to this service account.
  3. Download the BigQuery service account private key certificate.
  4. Upload and set the location of your BigQuery service account key on your Okera cluster as BIGQUERY_PRIVATE_KEY in your cluster configuration file.

Create a BigQuery Connection

  1. Create a new BigQuery connection from the Connections Page.

    BigQuery connection example

  2. Provide values for these fields:

    • Project Name: The Google project ID. Note please use the project ID here and not the project name.
    • Service Account Name: The username for your BigQuery service account.
  3. Test your connection to see if it works. If you receive an error please check that:

    • BIGQUERY_PRIVATE_KEY has been correctly configured on your cluster
    • Your BigQuery service account has enough permissions to read data from your BigQuery project
    • That you input the project ID and not the project name

Enable Query Pushdown

To enable transparent query pushdown, add bigquery to the list of enabled engines in the OKERA_CTE_REWRITE_ENABLED_ENGINES configuration setting. Multiple engines can be enabled (in addition to the ones enabled by default) by providing them as comma-separated values in this configuration setting.

For example:

OKERA_CTE_REWRITE_ENABLED_ENGINES: dremio:direct,redshift,bigquery

Inject the Okera Connection Query ID Into BigQuery History

You can inject the Okera connection query ID into the BigQuery history and Okera audit logs. This can be used to correlate the BigQuery project history with the logging in Okera audit logs.

To enable and disable this functionality, use the connection configuration parameter inject.okera-id in the Okera connection definition. Set inject.okera-id to true to enable this functionality; set it to false to disable this functionality.

For most connections, the default is false, but for BigQuery connections, the default is true.

When set to true, the connection query ID is injected as a comment in the SQL for a connection and included in any logging that occurs for that connection.

BigQuery Query Rewrites

A dedicated API endpoint, /api/v2/query/rewrite can be used for Okera's rewrites of BigQuery queries. POST is the only method provided with this endpoint. There are three parameters (one required):

  • The query parameter is a required string parameter and specifies the SQL query to be authorized and rewritten.

  • The cteRewrite parameter is an optional boolean parameter. It indicates whether common table expressions should be used in the rewrite. Valid values are true and false. The default is false.

  • The dialect parameter is an optional string parameter that specifies the dialect for the SQL query (in double quotes). Valid values are "BIG_QUERY", "HIVE", "IMPALA", "PRESTO", and "SNOWFLAKE". The default is "PRESTO".

For information about any Okera API endpoint, see the Okera API documentation, available after you log into the Web UI by appending /api/v2-docs/api/ after the web UI port number (8083). For example: https://my.okera.installation:8083/api/v2-docs/api/.

Programmatic Connection Example

Here is a programmatic example of a BigQuery connection.

CREATE DATACONNECTION bigquery_connection_03 CXNPROPERTIES
(
  'connection_type'='JDBC',
  'jdbc_driver'='bigquery',
  'connection_properties'='{
    "url":"jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<Project_ID>;OAuthType=0;OAuthServiceAcctEmail=<Service_account_name>;OAuthPvtKeyPath={BIGQUERY_PRIVATE_KEY}"
    }'
);