drill-issues mailing list archives

From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet
Date Wed, 31 Aug 2016 10:47:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451891#comment-15451891 ]

Vitalii Diravka commented on DRILL-4373:

[~rkins] As far as I can see, you get the error because Drill and Hive use different physical types for
the timestamp logical type. Hive uses INT96 (to keep nanosecond precision), while Drill uses INT64, the type
that the [parquet documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] specifies,
with the appropriate logical-type annotation, for timestamps with millisecond or microsecond precision. Drill
therefore stores timestamps correctly, and Hive should be able to read such Parquet files: https://issues.apache.org/jira/browse/HIVE-13435.
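To make the INT96/INT64 difference concrete, here is a minimal Python sketch of the two encodings. It assumes the de facto layout that Hive and Impala use for INT96 timestamps (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number) and the INT64 TIMESTAMP_MILLIS convention (milliseconds since the Unix epoch, UTC). The function names are hypothetical and are not part of Drill, Hive, or any Parquet library.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86_400 * 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_nanos_to_int96(epoch_nanos: int) -> bytes:
    """Encode nanoseconds since the Unix epoch (UTC) as a 12-byte INT96 value.

    Assumed layout: little-endian int64 nanoseconds within the day,
    followed by a little-endian int32 Julian day number.
    """
    # divmod keeps nanos_of_day non-negative even for pre-1970 instants.
    days, nanos_of_day = divmod(epoch_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode a 12-byte INT96 value back to nanoseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def epoch_nanos_to_int64_millis(epoch_nanos: int) -> int:
    """INT64 annotated as TIMESTAMP_MILLIS: milliseconds since the Unix epoch.

    Floor division discards the sub-millisecond digits.
    """
    return epoch_nanos // 1_000_000


if __name__ == "__main__":
    ts = datetime(2016, 8, 31, 10, 47, 20, 123456, tzinfo=timezone.utc)
    # Exact integer microseconds since the epoch, plus 789 extra nanoseconds.
    nanos = (ts - EPOCH) // timedelta(microseconds=1) * 1_000 + 789
    int96 = epoch_nanos_to_int96(nanos)
    print("INT96 bytes:", int96.hex())
    print("round trip exact:", int96_to_epoch_nanos(int96) == nanos)
    print("INT64 millis:", epoch_nanos_to_int64_millis(nanos))
```

The INT96 round trip is exact, while the INT64 millisecond value loses the last six nanosecond digits; that is the precision trade-off behind Hive's choice of INT96.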

Another issue is that Drill can read Hive timestamps from Parquet files only by using the CONVERT_FROM
function, because by default Drill reads INT96 values as VARBINARY.
As part of this JIRA, I'm going to make Drill interpret Hive timestamps in Parquet files as TIMESTAMP
implicitly by default, controlled by a session/system option (for the case where some other data type
is stored as INT96 in a Parquet file).
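The proposed option-controlled behaviour can be sketched as a reader hook that either passes the 12 raw bytes through unchanged (the current VARBINARY behaviour) or decodes them as a timestamp. This is a hypothetical Python illustration, not Drill's implementation: `read_int96` and the `int96_as_timestamp` flag are invented names, and the INT96 layout is the same assumed Hive/Impala layout (nanoseconds within the day, then Julian day, both little-endian).

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Union

# Julian day number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_int96(raw: bytes, int96_as_timestamp: bool) -> Union[bytes, datetime]:
    """Hypothetical reader hook for one INT96 cell.

    With the flag off, the value is returned as opaque bytes (the
    VARBINARY behaviour); with it on, it is decoded as a Hive-style
    timestamp in UTC.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    if not int96_as_timestamp:
        return raw
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    # Python's datetime has microsecond resolution, so the last three
    # nanosecond digits are truncated here.
    return EPOCH + timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1_000,
    )


if __name__ == "__main__":
    cell = struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH)
    print(read_int96(cell, int96_as_timestamp=False))
    print(read_int96(cell, int96_as_timestamp=True))
```

Turning the flag off covers the case the comment mentions: a Parquet file whose INT96 column does not hold a Hive-style timestamp, where decoding it as one would yield meaningless values.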

> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive, Storage - Parquet
>            Reporter: Rahul Challapalli
> git.commit.id.abbrev=83d460c
> I created a Parquet file with a timestamp type using Drill. Now if I define a Hive table
> on top of the Parquet file and use "timestamp" as the column type, Drill fails to read the
> Hive table through the Hive storage plugin.

This message was sent by Atlassian JIRA
