Skip to content

Advanced Querying of JDBC Data Sources

For information on how Okera uses JDBC drivers, see JDBC Driver Use in Okera.

Querying Database Tables

Besides regular scans, you can also include the JDBC-based datasets into more complex SQL queries.

Example: Query that joins data between two JDBC tables

SELECT t.*, u.*
FROM marketingdb.transactions_schemaed t
LEFT JOIN marketingdb.users u ON (u.userid = t.userid)
WHERE t.txnid = 304;
304|10/26/2016 08:59 PM|sku20|34225|100.000000|9821-5981-3318-9164|819.427.617.534|AU|34225|Melissa|Kenneth|Melissa Kenneth|Melissa.Kenneth@sample.com

Example: Combine two tables from the same JDBC source

SELECT *
FROM marketingdb.transactions_schemaed
WHERE txnid = 304
UNION ALL
  SELECT *
  FROM marketingdb.transactions_schemaed_demo
  WHERE txnid = 376;
304|10/26/2016 08:59 PM|sku20|34225|100.000000|9821-5981-3318-9164|819.427.617.534|AU
376|12/18/2016 08:39 AM|sku19|51145|150.000000|3360-2156-3415-5155|224.872.343.772|FR

Okera also supports UNION ALL operations between JDBC and Amazon S3 data sources. For example, if the data is spread across JDBC and Amazon S3, we can run a UNION ALL query to retrieve them from these two sources in the same combined resultset.

Example: Combine two datasets from different storage systems

SELECT j.txnid, j.dt_time, j.sku, j.userid, j.price, j.creditcard, j.ip
FROM marketingdb.transactions_schemaed_demo j
WHERE j.txnid = 304
UNION ALL
  SELECT *
  FROM demo_test.transactions s3t
  WHERE s3t.txnid = 98314894;
304|10/26/2016 08:59 PM|sku20|34225|100.000000|9821-5981-3318-9164|819.427.617.534
98314894|03/13/2016 10:41 PM|sku47|22505|960.000000|9018-5233-8275-5887|105.424.939.543

Inline SQL Views

Many times, users would want to run complex queries on JDBC data sources. One way to run the repeated queries is to persist the SQL as a view in the database and refer to it as a view on Okera. However, this could create many bloated views in the JDBC data source. Also users who only have SELECT permissions on the database would not be able to create view on the databases for this purpose. Okera supports persisting these INLINE VIEWs as part of the catalog table definitions. Users can create tables with a backing inline view in the form of a SQL string, as part of the CREATE TABLE command. During scan operations, Okera would automatically replace the tablename with the actual SQL string specified during table creation. The SQL string can be written in a form that is specific to the database. For example, if the underlying database is MySQL, the SQL string can be the supported MySQL version compliant SQL statement. The table name provided as part of this command is used as the alias of the resulting inline view of the converted SQL. Additionally, this removes the need to load all table definitions into the catalog.

Note: This feature is ONLY available for JDBC data source tables. Example: Create a catalog table backed by a SQL string (inline view)

CREATE EXTERNAL TABLE marketingdb.table_using_jdbc_complex_query
STORED AS JDBC
TBLPROPERTIES (
 'credentials.file' = 's3://<bucket>/<object>',
 'driver' = 'mysql')
USING VIEW AS
'SELECT t.*, u.fname, lname, fullname, emailid FROM marketingdb.transactions_schemaed t
LEFT JOIN marketingdb.users u ON (u.userid = t.userid)'

Note: The describe operation on the table would display the full definition of the resulting schema.

DESCRIBE marketingdb.table_using_jdbc_complex_query;
  txnid  bigint
  dt_time  varchar(30)
  sku  varchar(40)
  userid  int
  price  double
  creditcard  varchar(40)
  ip  varchar(40)
  region  varchar(2)
  fname  varchar(50)
  lname  varchar(50)
  fullname  varchar(101)
  emailid  varchar(150)

The extended DESCRIBE FORMATTED command would also show the actual query that would be used during query execution as part of the TBLPROPERTIES key/value list.

Example: Retrieve the extended definition for dataset

DESCRIBE FORMATTED marketingdb.table_using_jdbc_complex_query;
<Table Column info>
...
Table Parameters:   NULL    NULL
    EXTERNAL                TRUE
    credentials.file        s3://<bucket>/<object>
    driver                  mysql
    jdbc.query              SELECT t.*, u.fname, lname, fullname, emailid FROM jdbc_demo.transactions_schemaed t
                                LEFT JOIN jdbc_demo.users u ON (u.userid = t.userid)
    mapred.input.table.name marketingdb.table_using_jdbc_complex_query

Scan Datasets With Inline Views

Scan operations are similar to those on regular tables. The WHERE clauses can be applied on the table as usual.

Example: Scanning a dataset with inline view

SELECT *
FROM marketingdb.test_using_jdbc_complex_query
WHERE txnid = 304;

Internally the actual SQL is converted into the following:

SELECT *
FROM (
  SELECT t.*, u.fname, lname, fullname, emailid
  FROM marketingdb.transactions_schemaed t
  LEFT JOIN marketingdb.users u ON (u.userid = t.userid)
  ) AS test_using_jdbc_complex_query
WHERE txnid = 304;

The table can be used for complex joins with other tables as well.

Note: This is for representational purpose and we join to same users table to demo the capability Example: Perform a complex join on datasets with inline views

SELECT n.*
FROM marketingdb.table_using_jdbc_complex_query n
JOIN marketingdb.users u ON (u.userid = n.userid)
WHERE n.txnid = 376;
376|12/18/2016 08:39 AM|sku19|51145|150.000000|3360-2156-3415-5155|224.872.343.772|FR|Stephanie|Maria|Stephanie Maria|Stephanie.Maria@sample.com