drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Khurram Faraaz <kfar...@maprtech.com>
Subject Behavior of TIME type column in Drill
Date Mon, 08 Jun 2015 21:47:21 GMT
Hi All,

Can someone please review this ?

SQL standard (ISO/IEC 9075-1:2011(E)) specifies that a value of datatype
TIME comprises values of DATETIME fields HOUR, MINUTE and SECONDS.
(HH:MM:SS)

The question here is for the same input data Drill and Postgres return
different output/results, when we cast the column to TIME in Drill.

Drill 1.0 results below are with cast to TIME. Note that there are three
digits after the seconds. (HH:MM:SS.sss)

{code}

0: jdbc:drill:schema=dfs.tmp> SELECT cast( columns[8] as TIME ) FROM
`allData.csv` limit 10;

*+---------------+*

*| **   EXPR$0    ** |*

*+---------------+*

*| *08:16:08.580 * |*

*| *09:11:49.170 * |*

*| *18:44:25.430 * |*

*| *20:52:08.560 * |*

*| *19:46:10.420 * |*

*| *15:21:34.390 * |*

*| *10:31:55.240 * |*

*| *01:37:47.310 * |*

*| *01:50:05.110 * |*

*| *11:28:25.100 * |*

*+---------------+*

10 rows selected (0.173 seconds)


explain plan for above query


*| *00-00    Screen

00-01      SelectionVectorRemover

00-02        Limit(fetch=[10])

00-03          Project(EXPR$0=[CAST(ITEM($0, 8)):TIME(0)])

00-04            Scan(groupscan=[EasyGroupScan
[selectionRoot=/tmp/allData.csv, numFiles=1, columns=[`columns`[8]],
files=[maprfs:///tmp/allData.csv]]])


{code}


Drill results below are without cast to TIME, these are same as Postgres
results. However, we should note that there are only two digits after SS in
the results, and this is not as per the SQL specification for TIME
datatype.


{code}

0: jdbc:drill:schema=dfs.tmp> SELECT columns[8] FROM `allData.csv` limit 10;

*+--------------+*

*| **   EXPR$0   ** |*

*+--------------+*

*| *8:16:8.58   * |*

*| *9:11:49.17  * |*

*| *18:44:25.43 * |*

*| *20:52:8.56  * |*

*| *19:46:10.42 * |*

*| *15:21:34.39 * |*

*| *10:31:55.24 * |*

*| *1:37:47.31  * |*

*| *1:50:5.11   * |*

*| *11:28:25.1  * |*

*+--------------+*

10 rows selected (0.264 seconds)


explain plan for query that does not cast to TIME


*| *00-00    Screen

00-01      SelectionVectorRemover

00-02        Limit(fetch=[10])

00-03          Project(EXPR$0=[ITEM($0, 8)])

00-04            Scan(groupscan=[EasyGroupScan
[selectionRoot=/tmp/allData.csv, numFiles=1, columns=[`columns`[8]],
files=[maprfs:///tmp/allData.csv]]])

{code}


Postgres 9.3 results are


{code}

postgres=# SELECT col_tm FROM all_typs_tbl limit 10;

   col_tm

-------------

 08:16:08.58

 09:11:49.17

 18:44:25.43

 20:52:08.56

 19:46:10.42

 15:21:34.39

 10:31:55.24

 01:37:47.31

 01:50:05.11

 11:28:25.1

(10 rows)
{code}

Thanks,
Khurram

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message