drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet
Date Fri, 07 Oct 2016 01:08:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553744#comment-15553744 ]

ASF GitHub Bot commented on DRILL-4373:
---------------------------------------

Github user bitblender commented on a diff in the pull request:

    https://github.com/apache/drill/pull/600#discussion_r82314071
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
    @@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) {
         }
         return out;
       }
    +
    +  /**
    +   * Utilities for converting from parquet INT96 binary (impala, hive timestamp)
    +   * to date time value. This utilizes the Joda library.
    +   */
    +  public static class NanoTimeUtils {
    +
    +    public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
    +    public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
    +    public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
    +    public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
    +    public static final long NANOS_PER_MILLISECOND =  TimeUnit.MILLISECONDS.toNanos(1);
    +
    +  /**
    +   * @param binaryTimeStampValue
    +   *          hive, impala timestamp values with nanoseconds precision
    +   *          are stored in parquet Binary as INT96
    +   *
    +   * @return  the number of milliseconds since January 1, 1970, 00:00:00 GMT
    +   *          represented by @param binaryTimeStampValue .
    +   */
    +    public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
    +      NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
    +      int julianDay = nt.getJulianDay();
    +      long nanosOfDay = nt.getTimeOfDayNanos();
    +      return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND;
    --- End diff --
    
    1. I would recommend not using Joda. Do the calculation directly, as in ConvertFromImpalaTimestamp. Joda uses non-standard, and hence confusing, terminology: what Joda calls and uses as JulianDay is actually the Julian Date. It seems you have identified this discrepancy and adjusted for it by subtracting 0.5 from _julianDay_.
    
        Note (I guess you have already figured this out): in ConvertFromImpalaTimestamp, the actual code and the Joda code quoted in its comment are inconsistent. It took me a day to figure out the reason behind this! A bug should be opened to delete the comment.
    
    2. Can you please also leave a comment stating that 2440588 is the JDN (Julian Day Number) for the Unix Epoch?
    
    3. Please leave a comment stating that the order of the calls to get _julianDay_ and _nanosOfDay_ matters. You can do this by just stating how timestamps are stored in INT96, i.e. a 32-bit JDN followed by a 64-bit nanosOfDay.
    
    4. Consistent spacing (single or none) around the binary operators (+, -, /) used here would be nice. Single spacing would be preferable.
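    
    To illustrate points 1 to 3, here is a rough sketch of the direct calculation without Joda. The class name, method names and the encode helper are made up for illustration and are not part of Drill. It also assumes the 12 bytes are laid out the way I understand parquet's NanoTime.fromBinary to read them: little-endian, with the 8-byte nanos-of-day first and the 4-byte Julian day after it (so the JDN sits in the high-order 32 bits of the INT96). Please verify that layout against the actual reader before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only; not Drill's actual implementation.
final class Int96TimestampSketch {

    /** Julian Day Number (JDN) of the Unix epoch, 1970-01-01. */
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLISECOND = 1_000_000L;

    /**
     * Converts a 12-byte INT96 timestamp to milliseconds since the Unix epoch.
     * ASSUMED layout: little-endian, 8 bytes nanos-of-day, then 4 bytes Julian day.
     */
    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 timestamp must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        // The read order matters: it must follow the byte layout assumed above.
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        // Whole days since the epoch, plus the time of day truncated to milliseconds.
        return (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * MILLIS_PER_DAY
            + nanosOfDay / NANOS_PER_MILLISECOND;
    }

    /** Builds a 12-byte value in the same assumed layout (used only to exercise the sketch). */
    static byte[] encode(int julianDay, long nanosOfDay) {
        return ByteBuffer.allocate(12)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt(julianDay)
            .array();
    }
}
```

    Working in integer arithmetic avoids the 0.5 adjustment entirely: Joda's fromJulianDay subtracts 2440587.5 from a Julian Date, so passing julianDay - 0.5 amounts to subtracting 2440588 from the JDN, which is what the sketch does directly.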


> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive, Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Rahul Challapalli
>            Assignee: Karthikeyan Manivannan
>              Labels: doc-impacting
>             Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a hive table on top of the parquet file and use "timestamp" as the column type, drill fails to read the hive table through the hive storage plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
