Skip to content

Data Types

This document describes how Okera handles data types and values. We differentiate between the two in that data types are used when specifying schemas (for example, during a 'create table' call) and values are the data that exists in a given row within a table.

Currently Supported Data Types

  • BOOL
  • INT
  • CHAR
  • DATE
  • REAL

See the NOTES section at the bottom of this page for more information on types.


Okera must convert both values as well as data types in some situations, based on the storage format and the compute engine being used. Some platforms do not not support the full range of types that ODAS does.

Parquet and Spark DataFrames

These are the conversions that occur when working with Parquet data or Spark DataFrames values.

Datatype Parquet type Spark Data frame type Avro type
boolean boolean BooleanType boolean
tinyint int32 IntegerType int
smallint int32 IntegerType int
int int32 IntegerType int
bigint int64 LongType long
float float FloatType float
double double DoubleType double
date int32 IntegerType int
timestamp int96 TimestampType string
string byte_array StringType string
binary byte_array NA bytes
decimal fixed_len_byte_array createDecimalType() bytes
binary byte_array BinaryType bytes
real double DoubleType double

Spark notes

Filter pushdown on date partition, and hence partition pruning, is supported in spark. For this to work, spark requires DATE function to be used, to cast to date type.

Example: This will push the date partition filter to ODAS.

select * from datedb.part_test where d = DATE('2019-01-01')

This will NOT push the date partition filter to ODAS.

select * from datedb.part_test where d = '2019-01-01'


  • The string and binary data types are stored as a binary blob and not interpreted in any way.
  • DATE type is now supported. The display format is "YYYY-MM-DD". It is stored internally as an int number of days since unix epoch. Note that timestamp literal is accepted in the filters. The time portion is ignored.
  • REAL type is now supported in ODAS. Since, Hive does not support REAL data type, odb may be used to create a field with REAL datatype. DOUBLE type can be used as an alias for REAL.
  • For complex datatypes, refer to complex types
  • Decimal type is returned as a string in the json resultset when the client connects to odas rest server The rest server client may choose to convert it back to decimal type as needed. Note that most compute engines/applications connect to ODAS planner directly and support and retrieve decimal type directly.
  • ODAS supports the above mentioned data types for JSON file formats. Since everything is stored as a string, there is no tight mapping and the users need to select the appropriate data type in the table creation or use the auto-inference by providing sample JSON file.