drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC
Date Thu, 23 Feb 2017 14:55:44 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880589#comment-15880589 ]

ASF GitHub Bot commented on DRILL-5034:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/656
  
    @bitblender Parquet files with INT96 TIMESTAMP values are usually generated by Hive.
Those TIMESTAMP values reflect the local timezone of the host where the data was written.
To read those values, Hive takes the local timezone into account (in effect, the offset
between the local and UTC timezones is added to the timestamp values).
    The aim of this patch is to match Hive's behaviour when reading Parquet INT96
TIMESTAMP values.
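    As a rough illustration of that shift, here is a minimal Python sketch (not Drill's or
Hive's actual code). It assumes the commonly used INT96 layout of 8 bytes of nanoseconds-of-day
followed by a 4-byte Julian day, both little-endian. The helper names `encode_int96` and
`decode_int96`, and the fixed UTC-7 offset standing in for the host timezone, are hypothetical
choices made for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, used to convert Julian days to Unix days.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_int96(utc_dt):
    """Pack a UTC datetime as 12 bytes: nanos-of-day (int64) + Julian day (int32)."""
    delta = utc_dt - EPOCH
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1000
    return struct.pack("<qi", nanos_of_day, delta.days + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw):
    """Unpack 12 INT96 bytes into a timezone-aware UTC datetime."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


# Assumption: a fixed UTC-7 offset stands in for the reader's local timezone.
assumed_local_zone = timezone(timedelta(hours=-7))

raw = encode_int96(datetime(2016, 10, 24, 3, 3, 58, tzinfo=timezone.utc))
utc_value = decode_int96(raw)
local_value = utc_value.astimezone(assumed_local_zone)

print(utc_value.replace(tzinfo=None))    # 2016-10-24 03:03:58
print(local_value.replace(tzinfo=None))  # 2016-10-23 20:03:58
```

    The two printed values correspond to the UTC and local readings of the same stored
instant; a real reader would use the host's timezone rules rather than a fixed offset.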
    As a result, the query returns different data in different timezones. In other words,
this test depends on the local timezone, and the `baselineValues` will differ between
timezones.
    
    However, I applied the logic of converting timestamps to the local timezone to the test's
`baselineValues`.
    So the test now works properly in every timezone.
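    The idea of timezone-independent baselines can be sketched in Python as follows. This is
only an illustration under stated assumptions, not the test code from the pull request:
`to_local_baseline` is a hypothetical helper, and the sample values are the UTC readings
quoted in the issue.

```python
from datetime import datetime, timezone


def to_local_baseline(utc_naive):
    """Interpret a naive datetime as UTC and return the host's local wall-clock time."""
    aware_utc = utc_naive.replace(tzinfo=timezone.utc)
    # astimezone() with no argument converts to the system's local timezone.
    return aware_utc.astimezone().replace(tzinfo=None)


# Expected values written once in UTC...
utc_baselines = [
    datetime(2016, 10, 24, 3, 3, 58),
    datetime(2016, 9, 9, 19, 1, 18),
]

# ...and shifted at test time, so the expectation follows the machine's timezone.
local_baselines = [to_local_baseline(value) for value in utc_baselines]

for utc_value, local_value in zip(utc_baselines, local_baselines):
    print(utc_value, "->", local_value)
```

    Because the shift is computed on the machine running the test, the expected values move
together with the values the reader returns.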
    
    The latest commit has been updated.


> Select timestamp from hive generated parquet always return in UTC
> -----------------------------------------------------------------
>
>                 Key: DRILL-5034
>                 URL: https://issues.apache.org/jira/browse/DRILL-5034
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Krystal
>            Assignee: Vitalii Diravka
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically converts the timestamp data to UTC.
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> |                    EXPR$0                    |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +----------------------------------------------+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from `/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-23 20:03:58.0  |
> | null                   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> +------------------------+
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned in UTC time.
> Using drill-1.9, the returned timestamps are converted to UTC even though the user timezone is PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |    create_timestamp    |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
