BigQuery Data Source Connections¶
This page describes how to connect to BigQuery data. Some configuration is required in BigQuery prior to setting up the connection in Okera.
Required BigQuery Configuration¶
Before you create a BigQuery connection in Okera, you must set up a BigQuery Service account and download the associated service account key.
- Create a BigQuery Service account. Looker's documentation has some easy-to-follow instructions on how to do this.
- Grant the desired BigQuery permissions for the Okera connection to this service account.
- Download the BigQuery service account private key certificate.
- Upload and set the location of your BigQuery service account key on your Okera cluster as
BIGQUERY_PRIVATE_KEY
in your cluster configuration file.
Create a BigQuery Connection¶
-
Create a new BigQuery connection from the Connections Page.
-
Provide values for these fields:
- Project Name: The Google project ID. Note please use the project ID here and not the project name.
- Service Account Name: The username for your BigQuery service account.
-
Test your connection to see if it works. If you receive an error please check that:
BIGQUERY_PRIVATE_KEY
has been correctly configured on your cluster- Your BigQuery service account has enough permissions to read data from your BigQuery project
- That you input the project ID and not the project name
Enable Query Pushdown¶
To enable transparent query pushdown, add bigquery
to the list of enabled engines in the OKERA_CTE_REWRITE_ENABLED_ENGINES
configuration setting. Multiple engines can be enabled (in addition to the ones enabled by default) by providing them as comma-separated values in this configuration setting.
For example:
OKERA_CTE_REWRITE_ENABLED_ENGINES: dremio:direct,redshift,bigquery
Inject the Okera Connection Query ID Into BigQuery History¶
You can inject the Okera connection query ID into the BigQuery history and Okera audit logs. This can be used to correlate the BigQuery project history with the logging in Okera audit logs.
To enable and disable this functionality, use the connection configuration parameter inject.okera-id
in the Okera connection definition. Set inject.okera-id
to true
to enable this functionality; set it to false
to disable this functionality.
For most connections, the default is false
, but for BigQuery connections, the default is true
.
When set to true
, the connection query ID is injected as a comment in the SQL for a connection and included in any logging that occurs for that connection.
BigQuery Query Rewrites¶
A dedicated API endpoint, /api/v2/query/rewrite
can be used for Okera's rewrites of BigQuery queries. POST is the only method provided with this endpoint. There are three parameters (one required):
-
The
query
parameter is a required string parameter and specifies the SQL query to be authorized and rewritten. -
The
cteRewrite
parameter is an optional boolean parameter. It indicates whether common table expressions should be used in the rewrite. Valid values are true and false. The default is false. -
The
dialect
parameter is an optional string parameter that specifies the dialect for the SQL query (in double quotes). Valid values are"BIG_QUERY"
,"HIVE"
,"IMPALA"
,"PRESTO"
, and"SNOWFLAKE"
. The default is"PRESTO"
.
For information about any Okera API endpoint, see the Okera API documentation, available after you log into the Web UI by appending /api/v2-docs/api/
after the web UI port number (8083). For example: https://my.okera.installation:8083/api/v2-docs/api/
.
Programmatic Connection Example¶
Here is a programmatic example of a BigQuery connection.
CREATE DATACONNECTION bigquery_connection_03 CXNPROPERTIES
(
'connection_type'='JDBC',
'jdbc_driver'='bigquery',
'connection_properties'='{
"url":"jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<Project_ID>;OAuthType=0;OAuthServiceAcctEmail=<Service_account_name>;OAuthPvtKeyPath={BIGQUERY_PRIVATE_KEY}"
}'
);