drill-dev mailing list archives

From Paul Rogers <prog...@maprtech.com>
Subject Re: isDateCorrect field in ParquetTableMetadata
Date Tue, 25 Oct 2016 18:33:54 GMT
Would it make sense to write the Drill version into the Parquet metadata, and decide on the date
format based on the Drill version? This works if Drill versions after, say, 1.9 have the “correct”
format and anything written by an earlier version has “incorrect” dates. This is the typical
way that folks handle format changes across versions.
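
To make the idea concrete, here is a rough sketch of a version-gated check. It is not taken from the Drill code base: the footer key "drill.version", the class name, and the helper methods are all made up for illustration, and the cutoff is passed in by the caller because the "1.9" above is only an example.

```java
import java.util.Map;

// Illustrative only: none of these names exist in Drill.
class DateFormatByVersion {

    // Hypothetical key under which the writer version would be stored in the footer.
    static final String VERSION_KEY = "drill.version";

    // Parses "major.minor.patch" into three ints. A suffix such as "-SNAPSHOT"
    // is ignored, and missing components default to 0.
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Numeric comparison, so that 1.10.0 sorts after 1.9.0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // A file with no version stamp is assumed to predate the fix, so its
    // dates are treated as possibly incorrect.
    static boolean hasCorrectDates(Map<String, String> footer, String firstCorrectVersion) {
        String written = footer.get(VERSION_KEY);
        if (written == null) {
            return false;
        }
        return compare(written, firstCorrectVersion) >= 0;
    }
}
```

With this shape, the reader never needs a separate boolean: a call such as hasCorrectDates(footer, "1.9.0") derives the answer from the version stamp alone, and files written by other tools (no stamp) fall into the "needs checking" bucket by default.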

- Paul

> On Oct 25, 2016, at 11:24 AM, Vitalii Diravka <vitalii.diravka@gmail.com> wrote:
> 
> Hi Jinfeng,
> 
> 1. If the parquet files are generated with Drill after DRILL-4203, these
> files have the "isDateCorrect = true" property.
> Drill now serializes this property from the metadata. If we set this property
> in the first constructor, we will hide the value from the metadata.
> isDateCorrect will be false only if this value is false (there is no case
> for that now) or is absent from the parquet metadata footer.
> 
> 
> 2. I'm not sure why the isDateCorrect metadata property should change when
> the user disables date correction.
> If you have a use case, it would be great if you could provide it.
> 
> 3. Maybe you are right regarding when Parquet metadata is cloned.
> Here I added the property in the same manner as Jason's new property
> "drillVersion". So does it need a separate unit test?
> 
> 
> Kind regards
> Vitalii
> 
> 2016-10-25 16:23 GMT+00:00 Jinfeng Ni <jni@apache.org>:
> 
>> Forgot to copy the link to the code.
>> 
>> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L950-L955
>> 
>> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni <jni@apache.org> wrote:
>>> @Jason, @Vitalii,
>>> 
>>> Any thoughts on this question, since both of you worked on the fix for
>>> DRILL-4203?
>>> 
>>> Looking through the code, there is a third case [1], where this flag
>>> is set to false when Parquet metadata is cloned (after partition
>>> pruning, etc).  That means, for the 2nd case where the flag is set to
>>> true, if there is pruning happening, the new parquet metadata will see
>>> the flag is flipped to false. This does not make sense to me.
>>> 
>>> 
>>> 
>>> On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni <jni@apache.org> wrote:
>>>> Hello All,
>>>> 
>>>> DRILL-4203 addressed the date field issue.  In the fix, it introduced
>>>> a new field in ParquetTableMetadata_v2 : isDateCorrect.  I have some
>>>> difficulty in understanding the meaning of this field.
>>>> 
>>>> According to [1], this field is set to false when Drill gets parquet
>>>> metadata from the parquet footer.  This field is set to true in the code flow
>>>> of [2] and [3], when Drill gets parquet metadata from the metadata cache.
>>>> 
>>>> Questions I have:
>>>> 1.  If the parquet files are generated with Drill after DRILL-4203,
>>>> Drill still thinks date field is NOT correct (isDateCorrect = false)?
>>>> 2.  Why does this field have nothing to do with the "autoCorrection" flag
>>>> [4]?  If someone turns off autoCorrection, will it have an impact on this
>>>> "isDateCorrect" flag?
>>>> 
>>>> Thanks in advance for any input,
>>>> 
>>>> Jinfeng
>>>> 
>>>> 
>>>> [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L932
>>>> [2] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L936
>>>> [3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L187
>>>> [4] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L354-L355
>> 

