drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet
Date Tue, 01 Nov 2016 18:57:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626338#comment-15626338
] 

ASF GitHub Bot commented on DRILL-4373:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/600
  
    @parthchandra The known issue with Hive is that it stores timestamp values in parquet files
with a local time zone adjustment applied. That's why, when we retrieve data from such a table, we
should take the local time zone into account.
    On the other hand, parquet files themselves are not tied to any particular time zone, so when we
just read the file we shouldn't consider the local time zone. This is also standard Drill behaviour
with normal int64 timestamps.
    So I decided that we need two `IMPALA_TIMESTAMP` functions: one for Hive and one for regular parquet
files.
    I left the `IMPALA_TIMESTAMP` function without the local time zone adjustment and added an `IMPALA_TIMESTAMP_LOCALTIMEZONE`
function (used implicitly for Hive timestamps when the Drill native parquet reader is enabled).
    
    Please let me know if this approach is good.
    Changes in a new commit for easy review.
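
    The difference between the two functions can be sketched roughly as follows. This is not Drill's code: the function names `int96_to_utc` and `int96_to_local_wall_clock` are hypothetical, and the sketch assumes the usual Impala/Hive INT96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian). The direction of the local adjustment is also an assumption made for illustration.

```python
import struct
from datetime import datetime, timedelta, timezone, tzinfo

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_utc(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp with no time zone adjustment.

    Hypothetical stand-in for the plain IMPALA_TIMESTAMP behaviour.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 timestamp must be exactly 12 bytes")
    # "<qi": little-endian, 8-byte nanos-of-day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime has microsecond resolution, so sub-microsecond nanos are dropped.
    return epoch + timedelta(days=days_since_epoch,
                             microseconds=nanos_of_day // 1000)


def int96_to_local_wall_clock(raw: bytes, local_zone: tzinfo) -> datetime:
    """Decode the value, then shift it into local_zone and drop the zone.

    Hypothetical stand-in for IMPALA_TIMESTAMP_LOCALTIMEZONE (assumed direction).
    """
    return int96_to_utc(raw).astimezone(local_zone).replace(tzinfo=None)


# Midnight on 1970-01-01, encoded the way the sketch expects.
raw = struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH)
plain = int96_to_utc(raw)
shifted = int96_to_local_wall_clock(raw, timezone(timedelta(hours=-8)))
```

    With a fixed UTC-8 zone, `plain` is 1970-01-01 00:00 UTC while `shifted` is the wall-clock value 1969-12-31 16:00, which is the kind of discrepancy the two functions are meant to separate.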


> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive, Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Rahul Challapalli
>            Assignee: Parth Chandra
>              Labels: doc-impacting
>             Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a hive table
on top of the parquet file and use "timestamp" as the column type, drill fails to read the
hive table through the hive storage plugin
> Implementation: 
> Added an int96-to-timestamp converter for both parquet readers, controlled by the system
/ session option "store.parquet.int96_as_timestamp".
> The option is false by default so that old query scripts that use the "convert_from TIMESTAMP_IMPALA"
function keep working.
> When the option is true, using that function is unnecessary and can cause the query to
fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
