Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 185C0E454 for ; Thu, 30 May 2013 21:29:21 +0000 (UTC) Received: (qmail 95160 invoked by uid 500); 30 May 2013 21:29:20 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 95098 invoked by uid 500); 30 May 2013 21:29:20 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 95085 invoked by uid 99); 30 May 2013 21:29:20 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 May 2013 21:29:20 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 526A01CBFE0; Thu, 30 May 2013 21:29:14 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============6076732038657139173==" MIME-Version: 1.0 Subject: Re: Review Request: Vectorized Timestamp functions for long nanosecond based timestamps From: "Eric Hanson" To: "Jitendra Pandey" , "Eric Hanson" Cc: "hive" , "Gopal V" Date: Thu, 30 May 2013 21:29:14 -0000 Message-ID: <20130530212914.1348.40658@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Eric Hanson" X-ReviewGroup: hive X-ReviewRequest-URL: https://reviews.apache.org/r/11530/ X-Sender: "Eric Hanson" References: <20130529233613.1348.26136@reviews.apache.org> In-Reply-To: <20130529233613.1348.26136@reviews.apache.org> Reply-To: "Eric Hanson" --===============6076732038657139173== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11530/#review21203 ----------------------------------------------------------- ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTime= stampFieldLong.java Sun Java style convention is to have a blank line before all comments. = I don't personally mind but that's what I've been asked to do :-). ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTime= stampFieldLong.java Please use full sentences starting with caps and ending with period. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTime= stampFieldLong.java Can you confirm this expression will be constant-folded by the compiler= ? Otherwise this should be evaluated by hand in advance. = ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTime= stampFieldLong.java If you know in what cases ms can be negative, can you add that to the c= omment? That seems unusual. = = In think this because if the time is negative (before the epoch) you co= uld get negative nanos. So you want to convert before creating the timestam= p. Is that right? Please elaborate a little in the comment. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTime= stampFieldLong.java Would it be possible to re-use a timestamp that belongs to the class he= re, rather than calling new? If so, please do that to speed this up. I thin= k you can do what you need with setTime() and setNanos(). = Eliminating new() in the inner loop of vector processing tends to speed= things up a lot. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeek= OfYearLong.java Please explain why constant 4 is correct here and what it means. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYear= Long.java Nice work! Good, compact code to initialize year boundaries. No use of = new() or calls to heavy calendar methods in the inner loop. Awesome! :-) ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYear= Long.java can you use minYear and maxYear here instead of literals? ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYear= Long.java can you use minYear + 1 and maxYear instead of literals? ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYear= Long.java Given that this function is moderately heavy anyway (with the binary se= arch) I think making it virtual will not slow things down much. = = But if it gets faster we should seriously consider creating a separate = evaluate method for VectorUDFYearLong and make this a static function to av= oid virtual method call overhead. = ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYear= Long.java Did you consider calculating approximate year using something like = approxYear =3D yearBase + (int)(time / nanosPerYear) - 1; = = and then linearly search forward to find year boundary? I wonder if tha= t would be faster than binary search. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTim= estampExpressions.java Overall this is a good set of unit tests! It's quite comprehensive. Tha= nks. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTim= estampExpressions.java Can you comment this function to explain how you are using long[] input= s? I think I understand but a comment would help. = = ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTim= estampExpressions.java if inputs.length =3D=3D 1 (which I think it often does in your tests), = then I % 1 is always 0. So you are always loading up input vectors with all= 0. Is there a reason for this? If so, please explain, or if not, consider = a wider range of inputs. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTim= estampExpressions.java what does /*begin-macro*/ mean? ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTim= estampExpressions.java I'm trying to standardize on using "batch" instead of "vrg" for batches= . vrg is used in a lot of places but g stands for group not batch so it is = confusing. ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTim= estampExpressions.java here you know !(vrg.cols[in].noNulls || vrg.cols[in].isNull[i] =3D=3D f= alse) =3D=3D> = !noNulls and isNull[I] =3D=3D true = But you are checking for the value of isNull[I] for "in" in the assert = which is a bit misleading. = = Maybe test against true in the else branch? - Eric Hanson On May 29, 2013, 11:36 p.m., Gopal V wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/11530/ > ----------------------------------------------------------- > = > (Updated May 29, 2013, 11:36 p.m.) > = > = > Review request for hive, Jitendra Pandey and Eric Hanson. > = > = > Description > ------- > = > Timestamp UDFs for vectorized long nanosecond timestamps - all of them co= nvert timestamp into a corresponding long/int value (year, month, week-of-y= ear, day-of-month, hour, minute, second, unix-timestamp). > = > = > This addresses bug HIVE-4608. > https://issues.apache.org/jira/browse/HIVE-4608 > = > = > Diffs > ----- > = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= DayOfMonthLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= HourLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= MinuteLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= MonthLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= SecondLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= TimestampFieldLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= UnixTimeStampLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= WeekOfYearLong.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDF= YearLong.java PRE-CREATION = > ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVecto= rTimestampExpressions.java PRE-CREATION = > = > Diff: https://reviews.apache.org/r/11530/diff/ > = > = > Testing > ------- > = > Unit tests included which compare each UDF against its non-vectorized one= 's output, with random data and year boundary data (+1,0,-1). > = > = > Thanks, > = > Gopal V > = > --===============6076732038657139173==--