drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #644: DRILL-4980: Upgrading of the approach of parquet da...
Date Wed, 09 Nov 2016 16:59:20 GMT
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/644#discussion_r87232227
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
    @@ -59,19 +59,24 @@
        */
       public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
       /**
    -   * All old parquet files (which haven't "is.date.correct=true" property in metadata) have
    -   * a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
    +   * All old parquet files (which haven't "is.date.correct=true" or "parquet-writer.version" properties
    +   * in metadata) have a corrupt date shift: {@value} days or 2 * {@value #JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH}
        */
       public static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
    -  // The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
    -  // This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
    -  // be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
    -  // On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
    -  // something like 10,000 years in the past.
       private static final Chronology UTC = org.joda.time.chrono.ISOChronology.getInstanceUTC();
    +  /**
    +   * The year 5000 (or 1106685 day from Unix epoch) is chosen as the threshold for auto-detecting date corruption.
    +   * This balances two possible cases of bad auto-correction. External tools writing dates in the future will not
    +   * be shifted unless they are past this threshold (and we cannot identify them as external files based on the metadata).
    +   * On the other hand, historical dates written with Drill wouldn't risk being incorrectly shifted unless they were
    +   * something like 10,000 years in the past.
    +   */
       public static final int DATE_CORRUPTION_THRESHOLD =
           (int) (UTC.getDateTimeMillis(5000, 1, 1, 0) / DateTimeConstants.MILLIS_PER_DAY);
    -
    +  /**
    +   * The version of drill parquet writer with date values corruption fix
    +   */
    +  public static final int DRILL_WRITER_VERSION_WITHOUT_CORRUPTION = 2;
    --- End diff --
    
    Maybe call this DRILL_WRITER_VERSION_STD_DATE_FORMAT.
    
    The old format was not "corrupted"; it just used a non-standard date format.
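
    For readers following the arithmetic in the Javadoc quoted above, here is a minimal sketch of how the threshold and the shift could fit together. It is not the code in this pull request. It assumes a shifted value equals the true day count plus CORRECT_CORRUPT_DATE_SHIFT, uses java.time in place of the Joda-Time call in the diff, and the class name DateShiftSketch and the helper correctIfShifted are hypothetical.

```java
import java.time.LocalDate;

public class DateShiftSketch {
    // Constants copied from the diff under review.
    static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
    static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;

    // Days from 1970-01-01 to 5000-01-01. java.time is used here instead of
    // the Joda-Time expression in the diff; the resulting day count is the same.
    static final int DATE_CORRUPTION_THRESHOLD =
        (int) LocalDate.of(5000, 1, 1).toEpochDay();

    // Hypothetical helper. Assumption: a value written in the old format is
    // the true days-since-epoch value plus CORRECT_CORRUPT_DATE_SHIFT, so
    // anything past the threshold is shifted back and everything else is kept.
    static int correctIfShifted(int daysSinceEpoch) {
        if (daysSinceEpoch > DATE_CORRUPTION_THRESHOLD) {
            return (int) (daysSinceEpoch - CORRECT_CORRUPT_DATE_SHIFT);
        }
        return daysSinceEpoch;
    }

    public static void main(String[] args) {
        int trueDay = (int) LocalDate.of(2016, 11, 9).toEpochDay();
        int shifted = (int) (trueDay + CORRECT_CORRUPT_DATE_SHIFT);

        // A shifted value is moved back to the original date.
        System.out.println(LocalDate.ofEpochDay(correctIfShifted(shifted)));
        // A value below the threshold is left untouched.
        System.out.println(LocalDate.ofEpochDay(correctIfShifted(trueDay)));
    }
}
```

    Under those assumptions, both lines print 2016-11-09: the shifted value lands far beyond the year-5000 threshold and is moved back, while the unshifted value passes through unchanged.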


