drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Krystal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC
Date Thu, 10 Nov 2016 20:45:58 GMT
Krystal created DRILL-5034:
------------------------------

             Summary: Select timestamp from hive generated parquet always return in UTC
                 Key: DRILL-5034
                 URL: https://issues.apache.org/jira/browse/DRILL-5034
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.9.0
            Reporter: Krystal


commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904

Reading timestamp data against a hive parquet table from drill automatically converts the
timestamp data to UTC. 

SELECT TIMEOFDAY() FROM (VALUES(1));
+----------------------------------------------+
|                    EXPR$0                    |
+----------------------------------------------+
| 2016-11-10 12:33:26.547 America/Los_Angeles  |
+----------------------------------------------+

data schema:
message hive_schema {
  optional int32 voter_id;
  optional binary name (UTF8);
  optional int32 age;
  optional binary registration (UTF8);
  optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
  optional int32 voterzone;
  optional int96 create_timestamp;
  optional int32 create_date (DATE);
}

Using drill-1.8, the returned timestamps match the table data:

select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from `/user/hive/warehouse/voter_hive_parquet`
limit 5;
+------------------------+
|         EXPR$0         |
+------------------------+
| 2016-10-23 20:03:58.0  |
| null                   |
| 2016-09-09 12:01:18.0  |
| 2017-03-06 20:35:55.0  |
| 2017-01-20 22:32:43.0  |
+------------------------+
5 rows selected (1.032 seconds)

If the user timzone is changed to UTC, then the timestamp data is returned in UTC time.

Using drill-1.9, the returned timestamps got converted to UTC eventhough the user timezone
is in PST.

select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet`
limit 5;
+------------------------+
|         EXPR$0         |
+------------------------+
| 2016-10-24 03:03:58.0  |
| null                   |
| 2016-09-09 19:01:18.0  |
| 2017-03-07 04:35:55.0  |
| 2017-01-21 06:32:43.0  |
+------------------------+

alter session set `store.parquet.reader.int96_as_timestamp`=true;
+-------+---------------------------------------------------+
|  ok   |                      summary                      |
+-------+---------------------------------------------------+
| true  | store.parquet.reader.int96_as_timestamp updated.  |
+-------+---------------------------------------------------+

select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
+------------------------+
|    create_timestamp    |
+------------------------+
| 2016-10-24 03:03:58.0  |
| null                   |
| 2016-09-09 19:01:18.0  |
| 2017-03-07 04:35:55.0  |
| 2017-01-21 06:32:43.0  |
+------------------------+


 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message