drill-issues mailing list archives

From "Krystal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5381) convert_from(col, 'TIMESTAMP_IMPALA') returns incorrect timestamp if there are multiple nulls
Date Fri, 24 Mar 2017 18:50:41 GMT
Krystal created DRILL-5381:
------------------------------

             Summary: convert_from(col, 'TIMESTAMP_IMPALA') returns incorrect timestamp if there are multiple nulls
                 Key: DRILL-5381
                 URL: https://issues.apache.org/jira/browse/DRILL-5381
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.9.0, 1.8.0, 1.10.0
            Reporter: Krystal


In drill-1.10, setting `store.parquet.reader.int96_as_timestamp`=true returns expected data:

select voter_id,create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
+-----------+------------------------+
| voter_id  |    create_timestamp    |
+-----------+------------------------+
| 1         | 2016-10-23 20:03:58.0  |
| 2         | null                   |
| 3         | 2016-09-09 12:01:18.0  |
| 4         | 2017-03-06 20:35:55.0  |
| 5         | 2017-01-20 22:32:43.0  |
| 6         | 2016-10-22 05:46:12.0  |
| 7         | 2016-09-19 10:21:29.0  |
| 8         | null                   |
| 9         | 2016-07-23 13:39:02.0  |
| 10        | 2017-01-28 17:27:19.0  |
| 11        | 2016-10-23 10:55:44.0  |
| 12        | 2016-06-07 22:44:03.0  |
| 13        | 2016-05-04 13:59:20.0  |
| 14        | 2016-11-08 17:20:14.0  |
| 15        | 2016-05-14 11:23:53.0  |
+-----------+------------------------+

However, setting `store.parquet.reader.int96_as_timestamp`=false returns incorrect timestamps
once the second "null" value is encountered:

select voter_id,convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
+-----------+------------------------+
| voter_id  |         EXPR$1         |
+-----------+------------------------+
| 1         | 2016-10-23 20:03:58.0  |
| 2         | null                   |
| 3         | 2016-09-09 12:01:18.0  |
| 4         | 2017-03-06 20:35:55.0  |
| 5         | 2017-01-20 22:32:43.0  |
| 6         | 2016-10-22 05:46:12.0  |
| 7         | 2016-09-19 10:21:29.0  |
| 8         | 2016-07-23 13:39:02.0  |
| 9         | 2016-10-23 10:55:44.0  |
| 10        | 2016-06-07 22:44:03.0  |
| 11        | 2016-05-04 13:59:20.0  |
| 12        | 2016-11-08 17:20:14.0  |
| 13        | 2016-05-14 11:23:53.0  |
| 14        | 2016-06-20 16:18:51.0  |
| 15        | 2016-09-09 10:02:28.0  |
+-----------+------------------------+

Notice that the timestamp for voter_id=9 shifts to voter_id=8, which is supposed to have a value
of "null".  All of the timestamps after voter_id=7 are incorrect.  This issue is also
reproducible on drill-1.8.0 and drill-1.9.0.
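
To reproduce both cases back to back, the option can be toggled before each query. The following is only a sketch: the report does not say whether the option was changed at session or system scope, so the use of ALTER SESSION here is an assumption. The queries themselves are the ones shown above.

```sql
-- Assumption: session scope; the report does not state which scope was used.

-- Case 1: INT96 read directly as TIMESTAMP (returns the expected data on drill-1.10)
ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true;
select voter_id,create_timestamp
from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;

-- Case 2: INT96 read as binary and converted explicitly (shows the shifted timestamps)
ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = false;
select voter_id,convert_from(create_timestamp, 'TIMESTAMP_IMPALA')
from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
```

Comparing the two result sets row by row on voter_id shows the first mismatch at voter_id=8.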



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)