This document describes how Okera handles data types and values. The distinction is that data types are used when specifying schemas (for example, during a ‘create table’ call), whereas values are the data that exists in a given row within a table.
Currently Supported Data Types
See the NOTES section at the bottom of this page for more information on types.
Okera must convert both values and data types in some situations, depending on the storage format and the compute engine being used. Some platforms do not support the full range of types that ODAS does.
Parquet and Spark DataFrames
These are the conversions that occur when working with Parquet data or Spark DataFrame values.
|Datatype||Parquet type||Spark Data frame type||Avro type|
- The string and binary data types are stored as binary blobs and are not interpreted in any way.
- The DATE type is now supported. The display format is “YYYY-MM-DD”. It is stored internally as an integer number of days since the Unix epoch. Note that a timestamp literal is accepted in filters; the time portion is ignored.
- The REAL type is now supported in ODAS. Since Hive does not support the REAL data type, odb may be used to create a field with the REAL data type. The DOUBLE type can be used as an alias for REAL.
- For complex data types, refer to Complex Types.
- The decimal type is returned as a string in the JSON result set when the client connects to the ODAS REST server. The REST server client may choose to convert it back to a decimal type as needed. Note that most compute engines and applications connect to the ODAS planner directly and support and retrieve the decimal type directly.
- ODAS supports the above-mentioned data types for JSON file formats. Since everything is stored as a string, there is no tight mapping; users need to select the appropriate data type at table creation or use auto-inference by providing a sample JSON file.
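The notes on DATE storage and on decimals returned as strings can be illustrated with a small client-side sketch. It uses only the Python standard library and is not an Okera API: the helper names (`days_to_date`, `date_to_days`, `parse_row`) and the shape of the JSON row are assumptions made for illustration, and a real REST response may be structured differently.

```python
import json
from datetime import date, timedelta
from decimal import Decimal

UNIX_EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Convert an integer count of days since the Unix epoch to a date."""
    return UNIX_EPOCH + timedelta(days=days)


def date_to_days(d: date) -> int:
    """Convert a date to an integer count of days since the Unix epoch."""
    return (d - UNIX_EPOCH).days


def parse_row(raw_json: str, decimal_fields: set) -> dict:
    """Parse one JSON row and convert the named string fields to Decimal.

    The row layout is hypothetical; the caller must know which columns
    were declared as decimal in the table schema.
    """
    row = json.loads(raw_json)
    for name in decimal_fields:
        if row.get(name) is not None:
            # Decimal(str) preserves the exact digits, unlike float().
            row[name] = Decimal(row[name])
    return row


if __name__ == "__main__":
    # isoformat() yields the YYYY-MM-DD display format described above.
    print(days_to_date(18262).isoformat())
    print(parse_row('{"id": 7, "price": "12.50"}', {"price"}))
```

Converting the string with `Decimal` rather than `float` keeps the exact scale and precision, which is usually the reason a decimal column was chosen in the first place.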