Date: Wed, 19 Oct 2016 08:48:58 +0000 (UTC)
From: "Arina Ielchiieva (JIRA)"
To: dev@drill.apache.org
Subject: [jira] [Resolved] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

     [ https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-4763.
-------------------------------------
    Resolution: Duplicate

Merged into master in three commits under Jira DRILL-4203:
2f4b5ef717ed78a1ebb65687af9688a902e02041
ae34d5c30582de777db19360abf013bc50c8640b
8461d10b4fd6ce56361d1d826bb3a38b6dc8473c

> Parquet file with DATE logical type produces wrong results for simple SELECT
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-4763
>                 URL: https://issues.apache.org/jira/browse/DRILL-4763
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>            Reporter: Paul Rogers
>            Assignee: Vitalii Diravka
>             Fix For: 1.9.0
>
>         Attachments: date.parquet, int_16.parquet
>
>
> Created a simple Parquet file with the following schema:
> message test {
>   required int32 index;
>   required int32 value (DATE);
>   required int32 raw;
> }
> That is, a file with an int32 storage type and a DATE logical type. Then, created a number of test values:
> 0 (which should be interpreted as 1970-01-01) and
> (int) (System.currentTimeMillis() / (24*60*60*1000) )
> which should be interpreted as the number of days between 1970-01-01 and today.
> According to the Parquet spec (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), Parquet dates are expressed as "the number of days from the Unix epoch, 1 January 1970."
> Java timestamps are expressed as "measured in milliseconds, between the current time and midnight, January 1, 1970 UTC."
> There is ambiguity here: Parquet dates are presumably local times, not absolute times, so the math above will actually tell us the date in London right now, but that's close enough.
> Generated the file locally as date.parquet and queried it with:
> SELECT * from `local`.`root`.`date.parquet`;
> The results are incorrect:
> index value raw
> 1 -11395-10-18T00:00:00.000-07:52:58 0
> Here, we have a value of 0. The displayed date is decidedly not 1970-01-01T00:00:00. We actually have many problems:
> 1. The date is far off.
> 2. The output shows a time. But the Parquet DATE format explicitly does NOT include a time, so it makes no sense to display one.
> 3. The output attempts to show a time zone, but a time zone of -07:52:58, while close to PST, is not right (there is no time zone that is offset by 7:52:58 from UTC).
> 4. The data has no time zone. Parquet DATE is explicitly a local time, so it is impossible to know the relationship between that date and UTC.
> The correct output (in ISO format) would be: 1970-01-01
> The last line should show today's date, but instead shows:
> 6 -11348-04-20T00:00:00.000-07:52:58 16986
> Expected:
> 2016-07-04
> Note that all the information needed to produce the right result is available to Drill:
> 1. The DATE annotation specifies the meaning of the signed 32-bit integer.
> 2. Given the starting point and duration in days, the conversion to Drill's own internal date format is unambiguous.
> 3. The DATE annotation says that the date is local, so Drill should not attempt to convert to UTC. (That is, a Java Date object can't be used; instead a Joda/Java 8 LocalDate is necessary.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
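
For reference, the conversion the report expects can be sketched with the Java 8 java.time API. This is only an illustration of the arithmetic described in the issue, not the change merged under DRILL-4203; the class and method names (ParquetDateSketch, fromParquetDate) are hypothetical and do not exist in Drill.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Drill: shows how a Parquet DATE value
// (a signed 32-bit count of days since 1970-01-01) maps to a local date.
class ParquetDateSketch {

    // Interprets the raw int32 as days since the Unix epoch. LocalDate
    // carries no time of day and no time zone, matching the DATE annotation.
    public static LocalDate fromParquetDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    public static void main(String[] args) {
        // The two raw values quoted in the report.
        System.out.println(fromParquetDate(0));      // 1970-01-01
        System.out.println(fromParquetDate(16986));  // 2016-07-04
    }
}
```

Because LocalDate has no zone, no UTC conversion takes place, so neither a time component nor an offset such as -07:52:58 can appear in the output.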