drill-dev mailing list archives

From Vitalii Diravka <vitalii.dira...@gmail.com>
Subject Re: isDateCorrect field in ParquetTableMetadata
Date Tue, 25 Oct 2016 18:24:07 GMT
Hi Jinfeng,

1. If the Parquet files are generated by Drill after DRILL-4203, these
files have the "isDateCorrect = true" property.
Drill now takes this property from the metadata. If we set this property
in the first constructor, we would hide the value coming from the metadata.
isDateCorrect will be false only if this value is false (there is no such
case now) or is absent from the Parquet metadata footer.
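A minimal sketch of the rule in point 1. It assumes the footer key-value metadata is available as a Map<String, String>, and it uses a hypothetical key name, "is.date.correct". The property name Drill's writer actually uses may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class DateCorrectnessCheck {

    // Hypothetical footer key; the property name Drill's writer actually uses may differ.
    static final String IS_DATE_CORRECT_KEY = "is.date.correct";

    // Returns true only when the footer explicitly records "true" (case-insensitive).
    // An absent key, "false", or any other value yields false.
    static boolean isDateCorrect(Map<String, String> footerKeyValueMetadata) {
        // Boolean.parseBoolean(null) returns false, so a missing key means "not known to be correct".
        return Boolean.parseBoolean(footerKeyValueMetadata.get(IS_DATE_CORRECT_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> writtenAfterFix = new HashMap<>();
        writtenAfterFix.put(IS_DATE_CORRECT_KEY, "true");

        Map<String, String> noProperty = new HashMap<>();

        System.out.println(isDateCorrect(writtenAfterFix)); // true
        System.out.println(isDateCorrect(noProperty));      // false
    }
}
```

Because the value is derived from what each file reports, assigning a constant in a constructor would override it. That is the concern behind not setting the flag in the first constructor.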


2. I'm not sure why the isDateCorrect metadata property should change when
the user disables date correction.
If you have a use case for it, it would be great if you could share it.
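One reading of point 2 is that isDateCorrect describes the files, while the autoCorrection option only controls whether Drill acts on that description. The hypothetical sketch below keeps the two independent. The class, method, and parameter names are illustrative and are not Drill's API. It also simplifies Drill's real behavior, which inspects the date values themselves before correcting them.

```java
public class DateCorrectionPolicy {

    // Hypothetical helper: decides whether dates read from a file are candidates for correction.
    // isDateCorrect:         property recorded for the files (from the footer or the metadata cache)
    // autoCorrectionEnabled: user-facing option; turning it off suppresses correction
    //                        but does not change what the files report about themselves
    static boolean shouldCorrectDates(boolean isDateCorrect, boolean autoCorrectionEnabled) {
        // Only files not known to be correct are candidates, and only if the user allows correction.
        return autoCorrectionEnabled && !isDateCorrect;
    }

    public static void main(String[] args) {
        System.out.println(shouldCorrectDates(false, true));  // true: older files, correction on
        System.out.println(shouldCorrectDates(false, false)); // false: user disabled correction
        System.out.println(shouldCorrectDates(true, true));   // false: files already correct
    }
}
```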

3. You may be right about the case where the Parquet metadata is cloned.
I added this property in the same manner as Jason's new property
"drillVersion". Does it need a separate unit test?

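Point 3 concerns the clone made after partition pruning, which should carry the flag over instead of resetting it. The sketch below uses a hypothetical, simplified metadata class, not Drill's ParquetTableMetadata_v2. Its copy preserves both isDateCorrect and drillVersion, and it shows the kind of check a unit test could make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataCloneSketch {

    // Hypothetical, simplified stand-in for the table metadata; not Drill's ParquetTableMetadata_v2.
    static final class TableMetadata {
        final List<String> files;
        final String drillVersion;
        final boolean isDateCorrect;

        TableMetadata(List<String> files, String drillVersion, boolean isDateCorrect) {
            this.files = new ArrayList<>(files);
            this.drillVersion = drillVersion;
            this.isDateCorrect = isDateCorrect;
        }

        // Copy restricted to the files that survive pruning; properties are carried over, not reset.
        TableMetadata cloneWithFiles(List<String> remainingFiles) {
            return new TableMetadata(remainingFiles, drillVersion, isDateCorrect);
        }
    }

    public static void main(String[] args) {
        // "1.9.0" is an illustrative version string.
        TableMetadata original = new TableMetadata(Arrays.asList("a.parquet", "b.parquet"), "1.9.0", true);
        TableMetadata pruned = original.cloneWithFiles(Arrays.asList("a.parquet"));

        // The kind of check a unit test for the clone path could make.
        if (pruned.isDateCorrect != original.isDateCorrect) {
            throw new AssertionError("isDateCorrect changed after cloning");
        }
        if (!pruned.drillVersion.equals(original.drillVersion)) {
            throw new AssertionError("drillVersion changed after cloning");
        }
        System.out.println("files=" + pruned.files + ", isDateCorrect=" + pruned.isDateCorrect);
    }
}
```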

Kind regards
Vitalii

2016-10-25 16:23 GMT+00:00 Jinfeng Ni <jni@apache.org>:

> Forgot to copy the link to the code.
>
> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L950-L955
>
> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni <jni@apache.org> wrote:
> > @Jason, @Vitalii,
> >
> > Any thoughts on this question, since you both worked on the fix for
> > DRILL-4203?
> >
> > Looking through the code, there is a third case [1], where this flag
> > is set to false when the Parquet metadata is cloned (after partition
> > pruning, etc.). That means that, in the second case, where the flag is
> > set to true, if pruning happens, the new Parquet metadata will see the
> > flag flipped to false. This does not make sense to me.
> >
> >
> >
> > On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni <jni@apache.org> wrote:
> >> Hello All,
> >>
> >> DRILL-4203 addressed the date field issue. The fix introduced a new
> >> field in ParquetTableMetadata_v2: isDateCorrect. I have some
> >> difficulty understanding the meaning of this field.
> >>
> >> According to [1], this field is set to false when Drill gets the Parquet
> >> metadata from the Parquet footer. It is set to true in the code flow
> >> of [2] and [3], when Drill gets the Parquet metadata from the metadata cache.
> >>
> >> Questions I have:
> >> 1. If the Parquet files are generated by Drill after DRILL-4203, does
> >> Drill still think the date field is NOT correct (isDateCorrect = false)?
> >> 2. Why is this field unrelated to the "autoCorrection" flag [4]? If
> >> someone turns off autoCorrection, will it have any impact on the
> >> "isDateCorrect" flag?
> >>
> >> Thanks in advance for any input,
> >>
> >> Jinfeng
> >>
> >>
> >> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L932
> >> [2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L936
> >> [3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L187
> >> [4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L354-L355
>
