Date: Wed, 31 Aug 2016 10:47:20 +0000 (UTC)
From: "Vitalii Diravka (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

[ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451891#comment-15451891 ]

Vitalii Diravka commented on DRILL-4373:
----------------------------------------

[~rkins] As I see it, the error occurs because Drill and Hive
use different data types for the timestamp logical type: Hive uses INT96 (for nanosecond precision), while Drill uses INT64 (the data type designated for timestamps, with the appropriate annotation, according to the [parquet documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]; it is used for microsecond or millisecond precision). Therefore Drill stores timestamps correctly, and Hive must be able to read such parquet files: https://issues.apache.org/jira/browse/HIVE-13435.

Another issue is that Drill can read Hive timestamps from parquet files, but only by using the CONVERT_FROM function. By default Drill converts INT96 to VARBINARY.

In the context of this JIRA I am going to implement the ability for Drill to interpret Hive timestamps in parquet files as timestamps implicitly by default, controlled by a session/system option (for the case where a new data type is stored as INT96 in the parquet file).

> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive, Storage - Parquet
>            Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a hive table on top of the parquet file and use "timestamp" as the column type, drill fails to read the hive table through the hive storage plugin

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
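
For reference, the INT96 representation discussed in the comment can be decoded by hand. The following is a minimal Python sketch, not Drill's or Hive's implementation. It assumes the layout commonly written by Hive and Impala: 12 bytes, with the nanoseconds within the day as a little-endian 8-byte integer followed by the Julian day number as a little-endian 4-byte integer. The function name `int96_to_datetime` is hypothetical, and treating the result as UTC is also an assumption.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number that corresponds to 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (assumed Hive/Impala layout)."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # "<qi": little-endian 8-byte nanoseconds-of-day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Python's datetime resolves microseconds only, so the last three
    # nanosecond digits are dropped here.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1000
    )


# Usage: build a raw value for 10:47:20 on Julian day 2457632 and decode it.
raw_value = struct.pack("<qi", (10 * 3600 + 47 * 60 + 20) * 10**9, 2457632)
print(int96_to_datetime(raw_value).isoformat())
```

An INT64 timestamp, by contrast, is a single count of milliseconds or microseconds since the Unix epoch, which is why a reader that expects one representation cannot interpret the other without an explicit conversion.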