Subject: Re: isDateCorrect field in ParquetTableMetadata
From: Paul Rogers
To: "dev@drill.apache.org"
Date: Tue, 25 Oct 2016 11:33:54 -0700

Would it make sense to write the Drill version into the Parquet metadata, and decide on date format based on the Drill version? This works if Drill versions after, say, 1.9 have the “correct” format and anything with an earlier version has “incorrect” dates. This is the typical way that folks handle format changes across versions.

- Paul

> On Oct 25, 2016, at 11:24 AM, Vitalii Diravka wrote:
>
> Hi Jinfeng,
>
> 1. If the Parquet files are generated with Drill after DRILL-4203, these
> files have the "isDateCorrect = true" property.
> Drill serializes this property from metadata now. When we set this property
> in the first constructor we will hide the value from metadata.
> isDateCorrect will be false only if this value equals false (no case
> for it now) or is absent from the Parquet metadata footer.
>
> 2. I'm not sure of the reason to change the isDateCorrect metadata property when
> the user disables date correction.
> If you have a use case, it would be great if you could provide it.
>
> 3. Maybe you are right regarding when Parquet metadata is cloned.
> Here I added the property in the same manner as Jason's new property
> "drillVersion". So does it need a separate unit test?
>
> Kind regards
> Vitalii
>
> 2016-10-25 16:23 GMT+00:00 Jinfeng Ni:
>
>> Forgot to copy the link to the code.
>>
>> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L950-L955
>>
>> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni wrote:
>>> @Jason, @Vitalii,
>>>
>>> Any thoughts on this question, since both of you worked on the fix for
>>> DRILL-4203?
>>>
>>> Looking through the code, there is a third case [1], where this flag
>>> is set to false when Parquet metadata is cloned (after partition
>>> pruning, etc.). That means, for the 2nd case where the flag is set to
>>> true, if there is pruning happening, the new Parquet metadata will see
>>> the flag flipped to false. This does not make sense to me.
>>>
>>> On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni wrote:
>>>> Hello All,
>>>>
>>>> DRILL-4203 addressed the date field issue. In the fix, it introduced
>>>> a new field in ParquetTableMetadata_v2: isDateCorrect. I have some
>>>> difficulty in understanding the meaning of this field.
>>>>
>>>> According to [1], this field is set to false when Drill gets Parquet
>>>> metadata from the Parquet footer. This field is set to true in the code flow
>>>> of [2] and [3], when Drill gets Parquet metadata from the metadata cache.
>>>>
>>>> Questions I have:
>>>> 1. If the Parquet files are generated with Drill after DRILL-4203,
>>>> does Drill still think the date field is NOT correct (isDateCorrect = false)?
>>>> 2. Why does this field have nothing to do with the "autoCorrection" flag
>>>> [4]? If someone turns off autoCorrection, will it have an impact on this
>>>> "isDateCorrect" flag?
>>>>
>>>> Thanks in advance for any input,
>>>>
>>>> Jinfeng
>>>>
>>>> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L932
>>>> [2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L936
>>>> [3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L187
>>>> [4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L354-L355
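
The suggestion at the top of this thread, deciding date correctness from the Drill version recorded in the Parquet metadata, could look roughly like the Java sketch below. This is only an illustration under stated assumptions: the class DateFormatByVersion and its method are hypothetical names, not Drill code, and the 1.9.0 cutoff is just the "say, 1.9" example from the message, not a confirmed release boundary. A missing or unparseable version is treated as "incorrect", mirroring the case where the property is absent from the footer.

```java
/** Hypothetical helper for illustration only; not part of Drill's API. */
public final class DateFormatByVersion {

    // Assumed cutoff (major, minor, patch), taken from the "say, 1.9" example.
    private static final int[] FIRST_CORRECT = {1, 9, 0};

    private DateFormatByVersion() {
    }

    /**
     * Returns true when the writer version is at or after the assumed cutoff.
     * A null, empty, or malformed version counts as "dates not correct".
     */
    public static boolean isDateCorrect(String writerVersion) {
        if (writerVersion == null || writerVersion.isEmpty()) {
            return false;
        }
        int[] v;
        try {
            v = parse(writerVersion);
        } catch (NumberFormatException e) {
            return false;
        }
        // Compare major, then minor, then patch.
        for (int i = 0; i < FIRST_CORRECT.length; i++) {
            if (v[i] != FIRST_CORRECT[i]) {
                return v[i] > FIRST_CORRECT[i];
            }
        }
        return true; // exactly the cutoff version
    }

    // Parses "1.9.0" or "1.9.0-SNAPSHOT" into {1, 9, 0}; missing parts are 0.
    private static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }
}
```

With a check like this, the answer depends only on the version string stored with the file, so it would not change when the metadata object is cloned after pruning, provided the version string itself is copied.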