hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Johnny Zhang" <xiao...@cloudera.com>
Subject Re: Review Request: implement a udf to keep hive session alive for certain amount of time
Date Fri, 09 Nov 2012 23:08:30 GMT


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 36
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line36>
> >
> >     I don't have a strong opinion about this but is sleep the right name for this
UDF? Sleep is how this UDF keeps the Hive session alive but it might not convey to a user
what this UDF does. How about something like session_keep_alive? I am open to other suggestions
as well. 
> >     
> >     Again, not a deal-breaker:-) However, if you do decide to change the name, don't
forget to change all references of "sleep" in the code (log statements, exception messages,
etc.).

the reason give a name 'sleep' is because Hadoop used to have a similar example job http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-examples/0.20.2-737/org/apache/hadoop/examples/SleepJob.java
which does nothing but keeping running a MR job, which does nothing. 
Let's see what's other people's opinion


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 37
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line37>
> >
> >     Specify in the explain statement what the units of the duration being specified
are (seconds?)

agree, will fix it


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 41
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line41>
> >
> >     Better to use GenericUDFSleep.class as argument

agree, will fix it


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 52
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line52>
> >
> >     1. A better exception to throw here is UDFArgumentLengthException
> >     2. It's always nice to see as a user what was the expected and the actual value
when something goes wrong. Consider printing out the type of the argument received in the
exception message. This type can be retrieved by arguments[0].getTypeName()

agree that need to use 'arguments[0].getTypeName()' to print out what's the argument type
of user input, will fix it.


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 52
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line52>
> >
> >     I am being nitpicky here but a better exception to throw here would be: UDFArgumentTypeException.
Also, when seeing an error message as a user, it's always nice to contrast the actual vs.
expected. Here is the expected type is int but it will nice to print out the type of the argument
that the UDF received. You can retrieve by arguments[i].getTypeName()

agree, will fix it


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 55
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line55>
> >
> >     The UDF is returning a Map<Int, Int> even though you don't really want
to return anything. I think you should use a void object inspector. For details, look at http://svn.apache.org/viewvc/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorFactory.java?view=markup

agree, will change it to return PrimitiveObjectInspectorFactory.writableStringObjectInspector
since will print message in the end


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 62
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line62>
> >
> >     Better to use ObjectInspectorConverter to avoid the string parsing penalty.
> >     
> >     For reference, take a look at how this UDF reads an integer argument:
> >     http://svn.apache.org/viewvc/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java?view=markup

agree, will fix it.


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 71
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line71>
> >
> >     Any particular reason why we don't just Thread.sleep(numLoop * 1000) without
any loops? Is that because we want to log every 4 seconds?

yes, that's the reason. Just want to print something so that people know what's going on,
especially when it sleeps for a while.


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 74
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line74>
> >
> >     In my opinion, InterruptedException should be wrapped into a HiveException and
then thrown.

agree, will fix it


> On Nov. 4, 2012, 4:59 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java,
line 64
> > <https://reviews.apache.org/r/7848/diff/1/?file=185096#file185096line64>
> >
> >     Since the argument is being divided by 2 in line 62, does this mean we are only
sleeping for received_argument/2 seconds? Am I missing something?

the reason I divide it by 2 is because I found it the evaluate() function run before and during
the MR job. As you suggested offline, after adding annotation @UDFType(deterministic = false),
I am be able to make it run only during the MR job, so divide by 2 is not needed anymore.
Thanks.


- Johnny


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7848/#review13082
-----------------------------------------------------------


On Nov. 3, 2012, 2:56 a.m., Johnny Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7848/
> -----------------------------------------------------------
> 
> (Updated Nov. 3, 2012, 2:56 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> To make testing issues like HIVE-3590 convenient, we can implement a UDF to keep hive
session alive for a given time. The patch introduce a new UDF sleep() which does this without
introducing any data/load to cluster.
> 
> 
> This addresses bug HIVE-3666.
>     https://issues.apache.org/jira/browse/HIVE-3666
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
1405251 
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSleep.java
PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7848/diff/
> 
> 
> Testing
> -------
> 
> have tested it with Hive CLI and Hive Server session, and it can keep them alive by the
given seconds
> 
> 
> Thanks,
> 
> Johnny Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message