drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Behavior of TIME type column in Drill
Date Mon, 08 Jun 2015 22:00:33 GMT
for the 2nd query (without cast to time) I think Drill just displays the
column as VARCHAR. You are only seeing 2 digits after SS. because that's
how the data is stored in the file (I suppose).

I don't know what's the standard is, but looking at SQL Server
documentation, you can have up to 3 digits after the period:

Milliseconds can be preceded by either a colon (:) or a period (.). If a
> colon is used, the number means thousandths-of-a-second. If a period is
> used, a single digit means tenths-of-a-second, two digits mean
> hundredths-of-a-second, and three digits mean thousandths-of-a-second.



On Mon, Jun 8, 2015 at 2:47 PM, Khurram Faraaz <kfaraaz@maprtech.com> wrote:

> Hi All,
>
> Can someone please review this ?
>
> SQL standard (ISO/IEC 9075-1:2011(E)) specifies that a value of datatype
> TIME comprises values of DATETIME fields HOUR, MINUTE and SECONDS.
> (HH:MM:SS)
>
> The question here is for the same input data Drill and Postgres return
> different output/results, when we cast the column to TIME in Drill.
>
> Drill 1.0 results below are with cast to TIME. Note that there are three
> digits after the seconds. (HH:MM:SS.sss)
>
> {code}
>
> 0: jdbc:drill:schema=dfs.tmp> SELECT cast( columns[8] as TIME ) FROM
> `allData.csv` limit 10;
>
> *+---------------+*
>
> *| **   EXPR$0    ** |*
>
> *+---------------+*
>
> *| *08:16:08.580 * |*
>
> *| *09:11:49.170 * |*
>
> *| *18:44:25.430 * |*
>
> *| *20:52:08.560 * |*
>
> *| *19:46:10.420 * |*
>
> *| *15:21:34.390 * |*
>
> *| *10:31:55.240 * |*
>
> *| *01:37:47.310 * |*
>
> *| *01:50:05.110 * |*
>
> *| *11:28:25.100 * |*
>
> *+---------------+*
>
> 10 rows selected (0.173 seconds)
>
>
> explain plan for above query
>
>
> *| *00-00    Screen
>
> 00-01      SelectionVectorRemover
>
> 00-02        Limit(fetch=[10])
>
> 00-03          Project(EXPR$0=[CAST(ITEM($0, 8)):TIME(0)])
>
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=/tmp/allData.csv, numFiles=1, columns=[`columns`[8]],
> files=[maprfs:///tmp/allData.csv]]])
>
>
> {code}
>
>
> Drill results below are without cast to TIME, these are same as Postgres
> results. However, we should note that there are only two digits after SS in
> the results, and this is not as per the SQL specification for TIME
> datatype.
>
>
> {code}
>
> 0: jdbc:drill:schema=dfs.tmp> SELECT columns[8] FROM `allData.csv` limit
> 10;
>
> *+--------------+*
>
> *| **   EXPR$0   ** |*
>
> *+--------------+*
>
> *| *8:16:8.58   * |*
>
> *| *9:11:49.17  * |*
>
> *| *18:44:25.43 * |*
>
> *| *20:52:8.56  * |*
>
> *| *19:46:10.42 * |*
>
> *| *15:21:34.39 * |*
>
> *| *10:31:55.24 * |*
>
> *| *1:37:47.31  * |*
>
> *| *1:50:5.11   * |*
>
> *| *11:28:25.1  * |*
>
> *+--------------+*
>
> 10 rows selected (0.264 seconds)
>
>
> explain plan for query that does not cast to TIME
>
>
> *| *00-00    Screen
>
> 00-01      SelectionVectorRemover
>
> 00-02        Limit(fetch=[10])
>
> 00-03          Project(EXPR$0=[ITEM($0, 8)])
>
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=/tmp/allData.csv, numFiles=1, columns=[`columns`[8]],
> files=[maprfs:///tmp/allData.csv]]])
>
> {code}
>
>
> Postgres 9.3 results are
>
>
> {code}
>
> postgres=# SELECT col_tm FROM all_typs_tbl limit 10;
>
>    col_tm
>
> -------------
>
>  08:16:08.58
>
>  09:11:49.17
>
>  18:44:25.43
>
>  20:52:08.56
>
>  19:46:10.42
>
>  15:21:34.39
>
>  10:31:55.24
>
>  01:37:47.31
>
>  01:50:05.11
>
>  11:28:25.1
>
> (10 rows)
> {code}
>
> Thanks,
> Khurram
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message