hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eric Hanson" <eh...@microsoft.com>
Subject Re: Review Request: Vectorized Timestamp functions for long nanosecond based timestamps
Date Thu, 30 May 2013 21:29:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11530/#review21203
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44042>

    Sun Java style convention is to have a blank line before all comments. I don't personally
mind but that's what I've been asked to do :-).



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44039>

    Please use full sentences starting with caps and ending with period.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44038>

    Can you confirm this expression will be constant-folded by the compiler? Otherwise this
should be evaluated by hand in advance. 



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44040>

    If you know in what cases ms can be negative, can you add that to the comment? That seems
unusual. 
    
    In think this because if the time is negative (before the epoch) you could get negative
nanos. So you want to convert before creating the timestamp. Is that right? Please elaborate
a little in the comment.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44041>

    Would it be possible to re-use a timestamp that belongs to the class here, rather than
calling new? If so, please do that to speed this up. I think you can do what you need with
setTime() and setNanos().
    
    Eliminating new() in the inner loop of vector processing tends to speed things up a lot.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java
<https://reviews.apache.org/r/11530/#comment44044>

    Please explain why constant 4 is correct here and what it means.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44049>

    Nice work! Good, compact code to initialize year boundaries. No use of new() or calls
to heavy calendar methods in the inner loop. Awesome! :-)



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44045>

    can you use minYear and maxYear here instead of literals?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44047>

    can you use minYear + 1 and maxYear instead of literals?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44051>

    Given that this function is moderately heavy anyway (with the binary search) I think making
it virtual will not slow things down much. 
    
    But if it gets faster we should seriously consider creating a separate evaluate method
for VectorUDFYearLong and make this a static function to avoid virtual method call overhead.
    



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44050>

    Did you consider calculating approximate year using something like
    
    approxYear = yearBase + (int)(time / nanosPerYear) - 1; 
    
    and then linearly search forward to find year boundary? I wonder if that would be faster
than binary search.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44063>

    Overall this is a good set of unit tests! It's quite comprehensive. Thanks.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44054>

    Can you comment this function to explain how you are using long[] inputs? I think I understand
but a comment would help.
    
    



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44061>

    if inputs.length == 1 (which I think it often does in your tests), then I % 1 is always
0. So you are always loading up input vectors with all 0. Is there a reason for this? If so,
please explain, or if not, consider a wider range of inputs.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44052>

    what does /*begin-macro*/ mean?



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44053>

    I'm trying to standardize on using "batch" instead of "vrg" for batches. vrg is used in
a lot of places but g stands for group not batch so it is confusing.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44060>

    here you know !(vrg.cols[in].noNulls || vrg.cols[in].isNull[i] == false) ==> 
    !noNulls and isNull[I] == true
    
    But you are checking for the value of isNull[I] for "in" in the assert which is a bit
misleading. 
    
    Maybe test against true in the else branch?


- Eric Hanson


On May 29, 2013, 11:36 p.m., Gopal V wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11530/
> -----------------------------------------------------------
> 
> (Updated May 29, 2013, 11:36 p.m.)
> 
> 
> Review request for hive, Jitendra Pandey and Eric Hanson.
> 
> 
> Description
> -------
> 
> Timestamp UDFs for vectorized long nanosecond timestamps - all of them convert timestamp
into a corresponding long/int value (year, month, week-of-year, day-of-month, hour, minute,
second, unix-timestamp).
> 
> 
> This addresses bug HIVE-4608.
>     https://issues.apache.org/jira/browse/HIVE-4608
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFDayOfMonthLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFHourLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMinuteLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFMonthLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFSecondLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFUnixTimeStampLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11530/diff/
> 
> 
> Testing
> -------
> 
> Unit tests included which compare each UDF against its non-vectorized one's output, with
random data and year boundary data (+1,0,-1).
> 
> 
> Thanks,
> 
> Gopal V
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message