drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vitalii Diravka <vitalii.dira...@gmail.com>
Subject Re: isDateCorrect field in ParquetTableMetadata
Date Wed, 26 Oct 2016 15:28:21 GMT
@Paul Rogers
It may be the undefined case when the file is generated with drill.version
= 1.9-SNAPSHOT.
It is more easy to determine corrupted date with this flag and there is no
need to wait the end of release to merge these changes.

@Jinfeng NI
It looks like you are right.
With consistent mode (isDateCorrect = true) all tests are passed. So I am
going to open a jira ticket for it with next changes
https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e96187303410cac5d3
Thanks.

Kind regards
Vitalii

2016-10-25 18:36 GMT+00:00 Jinfeng Ni <jni@apache.org>:

> I'm not sure if I fully understand your answers. The bottom line is
> quite simple: given a set of parquet files, the ParquetTableMeta
> instance constructed in Drill should have identical value for
> "isDateCorrect", whether it comes from parquet footer, or parquet
> metadata cache, or whether there is partition pruning or not. However,
> the code shows that this flag is not in consistent mode across
> different cases.
>
>
>
> On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka
> <vitalii.diravka@gmail.com> wrote:
> > Hi Jinfeng,
> >
> > 1.If the parquet files are generated with Drill after Drill-4203 these
> > files have "isDateCorrect = true" property.
> > Drill serializes this property from metadata now. When we set this
> property
> > in the first constructor we will hide the value from metadata.
> > IsDateCorrect will be false only if this value equals to the false (no
> case
> > for it now) or absent in parquet metadata footer.
> >
> >
> > 2. I'm not sure the reason to change isDateCorrect metadata property when
> > the user disable dates correction.
> > If you have some use case it would be great if you provide it.
> >
> > 3. Maybe you are right regarding to when Parquet metadata is cloned.
> > Here I added the property in the same manner as Jason's new property
> > "drillVersion. So need it a separate unit test?
> >
> >
> > Kind regards
> > Vitalii
> >
> > 2016-10-25 16:23 GMT+00:00 Jinfeng Ni <jni@apache.org>:
> >
> >> Forgot to copy the link to the code.
> >>
> >> [1] https://github.com/apache/drill/blob/master/exec/java-
> >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> >> Metadata.java#L950-L955
> >>
> >> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni <jni@apache.org> wrote:
> >> > @Jason, @Vitalli,
> >> >
> >> > Any thoughts on this question, since both you worked on fix of
> >> DRILL-4203?
> >> >
> >> > Looking through the code, there is a third case [1], where this flag
> >> > is set to false when Parquet metadata is cloned (after partition
> >> > pruning, etc).  That means, for the 2nd case where the flag is set to
> >> > true, if there is pruning happening, the new parquet metadata will see
> >> > the flag is flipped to false. This does not make sense to me.
> >> >
> >> >
> >> >
> >> > On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni <jni@apache.org> wrote:
> >> >> Hello All,
> >> >>
> >> >> DRILL-4203 addressed the date field issue.  In the fix, it introduced
> >> >> a new field in ParquetTableMetadata_v2 : isDateCorrect.  I have some
> >> >> difficulty in understanding the meaning of this field.
> >> >>
> >> >> According to [1], this field is set to false, when Drill gets parquet
> >> >> metadata from parquet footer.  This field is  set to true in code
> flow
> >> >> of [2] and [3], when Drill gets parquet metadata from meta data
> cache.
> >> >>
> >> >> Questions I have:
> >> >> 1.  If the parquet files are generated with Drill after DRILL-4203,
> >> >> Drill still thinks date field is NOT correct (isDateCorrect = false)?
> >> >> 2.  Why does this filed have nothing to do with "autoCorrection" flag
> >> >> [4]?  If someone turns off autoCorrection, will it have impact on
> this
> >> >> "isDateCorrect" flag ?
> >> >>
> >> >> Thanks in advance for any input,
> >> >>
> >> >> Jinfeng
> >> >>
> >> >>
> >> >> [1] https://github.com/apache/drill/blob/master/exec/java-
> >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> Metadata.java#L932
> >> >> [2] https://github.com/apache/drill/blob/master/exec/java-
> >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> Metadata.java#L936
> >> >> [3] https://github.com/apache/drill/blob/master/exec/java-
> >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> Metadata.java#L187
> >> >> [4] https://github.com/apache/drill/blob/master/exec/java-
> >> exec/src/main/java/org/apache/drill/exec/store/parquet/
> >> Metadata.java#L354-L355
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message