drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zelaine Fong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC
Date Sat, 12 Nov 2016 23:13:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660451#comment-15660451
] 

Zelaine Fong commented on DRILL-5034:
-------------------------------------

Here's my understanding of this issue:

OLD BEHAVIOR (Before DRILL-4373):
(O1) CONVERT_FROM(int96_col, 'TIMESTAMP_IMPALA') - returns the int96 data as a timestamp value
in the local timezone

NEW BEHAVIOR (After DRILL-4373):
(N1) If store.parquet.reader.int96_as _timestamp is set to true, then selecting int96 data
returns a timestamp value in the UTC timezone
(N2) CONVERT_FROM(int96_col, 'TIMESTAMP_IMPALA') - returns the int96 data as a timestamp value
in the UTC timezone
(N3) If the hive property hive.parquet.timestamp.skip.conversion is set to true, then N2 returns
the int96 data in the local timezone
(N4) CONVERT_FROM(int96_col, 'TIMESTAMP_IMPALA_LOCALTIMEZONE') - returns the int96 data as
a timestamp value in the local timezone

So, the gist of this bug is that (N2) is different behavior from the original (O1) behavior.

To get the original (O1) behavior, users either need to do (N3) or (N4). 

[~vitalii]'s reasoning for changing the behavior for (N2) is because Impala is not storing
timezone with their timestamp values, use of the name 'TIMESTAMP_IMPALA' and having it return
timestamp data in the local timezone is misleading.

So, I guess this boils down to -- do we keep the old behavior with the misleading name? 

> Select timestamp from hive generated parquet always return in UTC
> -----------------------------------------------------------------
>
>                 Key: DRILL-5034
>                 URL: https://issues.apache.org/jira/browse/DRILL-5034
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Krystal
>            Assignee: Kunal Khatua
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically converts
the timestamp data to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> |                    EXPR$0                    |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +----------------------------------------------+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from `/user/hive/warehouse/voter_hive_parquet`
limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-23 20:03:58.0  |
> | null                   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> +------------------------+
> 5 rows selected (1.032 seconds)
> {code}
> If the user timzone is changed to UTC, then the timestamp data is returned in UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC eventhough the user timezone
is in PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet`
limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |    create_timestamp    |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message