drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC
Date Fri, 11 Nov 2016 18:18:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657706#comment-15657706
] 

Vitalii Diravka commented on DRILL-5034:
----------------------------------------

As for me it is not a bug. Accordingly to the [spec |https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#timestamp_millis]
the parquet files don't involve the time zone for the timestamp datatype. 
To get a parquet file from hive without zone retain use the [hive.parquet.timestamp.skip.conversion|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.parquet.timestamp.skip.conversion]
hive property. Or for already generated files with zone retain you can use new drill function:
{code}convert_from(create_timestamp, 'TIMESTAMP_IMPALA_LOCALTIMEZONE'){code}
To save existing behaviour it is possible programmatically to return TIMESTAMP_IMPALA function
to the retaining timezone and to add a new function `TIMESTAMP_IMPALA_UTC`. 
But it will be some irregularity to the impala parquet files. Impala does not store timezone
into parquet timestamp by default [https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_timestamp.html].
Since function name is not TIMESTAMP_HIVE but TIMESTAMP_IMPALA I suppose current behaviour
is more reasonable. 


> Select timestamp from hive generated parquet always return in UTC
> -----------------------------------------------------------------
>
>                 Key: DRILL-5034
>                 URL: https://issues.apache.org/jira/browse/DRILL-5034
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Krystal
>            Assignee: Vitalii Diravka
>             Fix For: 1.9.0
>
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data against a hive parquet table from drill automatically converts
the timestamp data to UTC. 
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +----------------------------------------------+
> |                    EXPR$0                    |
> +----------------------------------------------+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +----------------------------------------------+
> data schema:
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> Using drill-1.8, the returned timestamps match the table data:
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from `/user/hive/warehouse/voter_hive_parquet`
limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-23 20:03:58.0  |
> | null                   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> +------------------------+
> 5 rows selected (1.032 seconds)
> If the user timzone is changed to UTC, then the timestamp data is returned in UTC time.
> Using drill-1.9, the returned timestamps got converted to UTC eventhough the user timezone
is in PST.
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet`
limit 5;
> +------------------------+
> |         EXPR$0         |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> +------------------------+
> |    create_timestamp    |
> +------------------------+
> | 2016-10-24 03:03:58.0  |
> | null                   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> +------------------------+
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message