Advanced Querying of JDBC Data Sources¶
For information on how Okera uses JDBC drivers, see JDBC Driver Use in Okera.
Querying Database Tables¶
Besides regular scans, you can also include the JDBC-based datasets into more complex SQL queries.
Example: Query that joins data between two JDBC tables
SELECT t.*, u.*
FROM marketingdb.transactions_schemaed t
LEFT JOIN marketingdb.users u ON (u.userid = t.userid)
WHERE t.txnid = 304;
304|10/26/2016 08:59 PM|sku20|34225|100.000000|9821-5981-3318-9164|819.427.617.534|AU|34225|Melissa|Kenneth|Melissa Kenneth|Melissa.Kenneth@sample.com
Example: Combine two tables from the same JDBC source
SELECT *
FROM marketingdb.transactions_schemaed
WHERE txnid = 304
UNION ALL
SELECT *
FROM marketingdb.transactions_schemaed_demo
WHERE txnid = 376;
304|10/26/2016 08:59 PM|sku20|34225|100.000000|9821-5981-3318-9164|819.427.617.534|AU
376|12/18/2016 08:39 AM|sku19|51145|150.000000|3360-2156-3415-5155|224.872.343.772|FR
Okera also supports UNION ALL
operations between JDBC and Amazon S3 data sources.
For example, if the data is spread across JDBC and Amazon S3, we can run a UNION ALL
query to retrieve them from these two sources in the same combined resultset.
Example: Combine two datasets from different storage systems
SELECT j.txnid, j.dt_time, j.sku, j.userid, j.price, j.creditcard, j.ip
FROM marketingdb.transactions_schemaed_demo j
WHERE j.txnid = 304
UNION ALL
SELECT *
FROM demo_test.transactions s3t
WHERE s3t.txnid = 98314894;
304|10/26/2016 08:59 PM|sku20|34225|100.000000|9821-5981-3318-9164|819.427.617.534
98314894|03/13/2016 10:41 PM|sku47|22505|960.000000|9018-5233-8275-5887|105.424.939.543
Inline SQL Views¶
Many times, users would want to run complex queries on JDBC data sources.
One way to run the repeated queries is to persist the SQL as a view in the database and refer to it as a view on Okera.
However, this could create many bloated views in the JDBC data source.
Also users who only have SELECT permissions on the database would not be able to create view on the databases for this purpose.
Okera supports persisting these INLINE VIEW
s as part of the catalog table definitions.
Users can create tables with a backing inline view in the form of a SQL string, as part of the CREATE TABLE
command.
During scan operations, Okera would automatically replace the tablename with the actual SQL string specified during table creation.
The SQL string can be written in a form that is specific to the database.
For example, if the underlying database is MySQL, the SQL string can be the supported MySQL version compliant SQL statement.
The table name provided as part of this command is used as the alias of the resulting inline view of the converted SQL.
Additionally, this removes the need to load all table definitions into the catalog.
Note: This feature is ONLY available for JDBC data source tables. Example: Create a catalog table backed by a SQL string (inline view)
CREATE EXTERNAL TABLE marketingdb.table_using_jdbc_complex_query STORED AS JDBC TBLPROPERTIES ( 'credentials.file' = 's3://<bucket>/<object>', 'driver' = 'mysql') USING VIEW AS 'SELECT t.*, u.fname, lname, fullname, emailid FROM marketingdb.transactions_schemaed t LEFT JOIN marketingdb.users u ON (u.userid = t.userid)'
Note: The describe operation on the table would display the full definition of the resulting schema.
DESCRIBE marketingdb.table_using_jdbc_complex_query; txnid bigint dt_time varchar(30) sku varchar(40) userid int price double creditcard varchar(40) ip varchar(40) region varchar(2) fname varchar(50) lname varchar(50) fullname varchar(101) emailid varchar(150)
The extended DESCRIBE FORMATTED
command would also show the actual query that would be used during query execution as part of the TBLPROPERTIES
key/value list.
Example: Retrieve the extended definition for dataset
DESCRIBE FORMATTED marketingdb.table_using_jdbc_complex_query;
<Table Column info>
...
Table Parameters: NULL NULL
EXTERNAL TRUE
credentials.file s3://<bucket>/<object>
driver mysql
jdbc.query SELECT t.*, u.fname, lname, fullname, emailid FROM jdbc_demo.transactions_schemaed t
LEFT JOIN jdbc_demo.users u ON (u.userid = t.userid)
mapred.input.table.name marketingdb.table_using_jdbc_complex_query
Scan Datasets With Inline Views¶
Scan operations are similar to those on regular tables.
The WHERE
clauses can be applied on the table as usual.
Example: Scanning a dataset with inline view
SELECT *
FROM marketingdb.test_using_jdbc_complex_query
WHERE txnid = 304;
Internally the actual SQL is converted into the following:
SELECT *
FROM (
SELECT t.*, u.fname, lname, fullname, emailid
FROM marketingdb.transactions_schemaed t
LEFT JOIN marketingdb.users u ON (u.userid = t.userid)
) AS test_using_jdbc_complex_query
WHERE txnid = 304;
The table can be used for complex joins with other tables as well.
Note: This is for representational purpose and we join to same users table to demo the capability Example: Perform a complex join on datasets with inline views
SELECT n.* FROM marketingdb.table_using_jdbc_complex_query n JOIN marketingdb.users u ON (u.userid = n.userid) WHERE n.txnid = 376; 376|12/18/2016 08:39 AM|sku19|51145|150.000000|3360-2156-3415-5155|224.872.343.772|FR|Stephanie|Maria|Stephanie Maria|Stephanie.Maria@sample.com