drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection
Date Wed, 09 Nov 2016 16:58:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651438#comment-15651438 ]

ASF GitHub Bot commented on DRILL-4980:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/644#discussion_r87232227
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
    @@ -59,19 +59,24 @@
        */
       public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
       /**
    -   * All old parquet files (which haven't "is.date.correct=true" property in metadata) have
    -   * a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
    +   * All old parquet files (which haven't "is.date.correct=true" or "parquet-writer.version" properties
    +   * in metadata) have a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
        */
       public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
    -  // The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
    -  // This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
    -  // be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
    -  // On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
    -  // something like 10,000 years in the past.
       private static final Chronology UTC = org.joda.time.chrono.ISOChronology.getInstanceUTC();
    +  /**
    +   * The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
    +   * This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
    +   * be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
    +   * On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
    +   * something like 10,000 years in the past.
    +   */
       public static final int DATE_CORRUPTION_THRESHOLD =
           (int) (UTC.getDateTimeMillis(5000, 1, 1, 0) / DateTimeConstants.MILLIS_PER_DAY);
    -
    +  /**
    +   * The version of drill parquet writer with date values corruption fix
    +   */
    +  public static final int DRILL_WRITER_VERSION_WITHOUT_CORRUPTION = 2;
    --- End diff --
    
    Maybe call this DRILL_WRITER_VERSION_STD_DATE_FORMAT.
    
    The old format was not "corrupted"; it just used a non-standard date format.
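
For readers following the diff, here is a minimal sketch of how the constants it touches could fit together. It is not the code from the pull request. The class name `DateShiftSketch` and both helper methods are hypothetical, `java.time` stands in for the Joda-Time call in the diff, and treating any "parquet-writer.version" value of 2 or higher as "standard dates" is an assumption. Subtracting the shift from values above the threshold is likewise an assumed reading of the Javadoc. The writer-version constant uses the name suggested in this comment.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical illustration only; not the actual ParquetReaderUtility code.
class DateShiftSketch {
    // Values copied from the diff.
    static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
    static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;

    // Days from the Unix epoch to 5000-01-01 (java.time replaces Joda-Time here).
    static final int DATE_CORRUPTION_THRESHOLD = (int) LocalDate.of(5000, 1, 1).toEpochDay();

    // Name proposed in the review comment; the value 2 comes from the diff.
    static final int DRILL_WRITER_VERSION_STD_DATE_FORMAT = 2;

    // Hypothetical helper: does the file metadata mark the dates as standard?
    // The key names come from the Javadoc in the diff; the ">=" comparison is an assumption.
    static boolean metadataSaysStandardDates(Map<String, String> keyValueMetadata) {
        if ("true".equals(keyValueMetadata.get("is.date.correct"))) {
            return true;
        }
        String version = keyValueMetadata.get("parquet-writer.version");
        if (version == null) {
            return false;
        }
        try {
            return Integer.parseInt(version) >= DRILL_WRITER_VERSION_STD_DATE_FORMAT;
        } catch (NumberFormatException e) {
            return false; // an unreadable version is treated as "unknown"
        }
    }

    // Hypothetical helper: convert a stored day count to days since the Unix epoch.
    // Without a metadata marker, only values past the year-5000 threshold are shifted back.
    static int toStandardDays(int storedDays, Map<String, String> keyValueMetadata) {
        if (metadataSaysStandardDates(keyValueMetadata)) {
            return storedDays;
        }
        if (storedDays > DATE_CORRUPTION_THRESHOLD) {
            return (int) (storedDays - CORRECT_CORRUPT_DATE_SHIFT);
        }
        return storedDays;
    }
}
```

With these assumptions, a stored value of 17000 + 4881176 in a file without either metadata property would map back to day 17000, while the same stored value in a file marked "is.date.correct=true" would be left alone.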


> Upgrading of the approach of parquet date correctness status detection
> ----------------------------------------------------------------------
>
>                 Key: DRILL-4980
>                 URL: https://issues.apache.org/jira/browse/DRILL-4980
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.9.0
>
>
> This JIRA is an addition to [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated parquet files should be upgraded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
