pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2317) Ruby/Jruby UDFs
Date Sat, 03 Mar 2012 22:28:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221720#comment-13221720
] 

jiraposter@reviews.apache.org commented on PIG-2317:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4091/#review5601
-----------------------------------------------------------


Thanks Jonathan for this contribution. A lot of good stuff in there.
Please add comments when using Ruby specific mechanisms in there (child classes notification,
Class naming after the variable it is affected to, ...) so that less Ruby savvy contributors
can follow.


trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12179>

    maybe you need to separate the top level ruby script to be used outside of pig and the
one used inside using includes?



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12208>

    please add doc on public functions



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12180>

    @function_to_register["GETCLASSFROMOBJECT"] could be replaced by a member @current_get_class_from_object



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12214>

    you could define a class for this



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12209>

    - please add a comment explain the mechanism that names the class after the variable it
is assigned to.



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12210>

    Add a comment to explain that classes inheriting PigUdf will automatically get registered
here.



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12211>

    I would add a separate field for schema func instead of this convention.



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12181>

    I would throw an exception if both @schema and @functions_to_register[id] are set.



trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12182>

    You could call it RegisterDescendentsUDF or a similar name as this is a based class for
UDFs.
    The descendents registration is common to the PigUDF as well. Do you want to unify the
class hierarchy? 



trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12184>

    ruby.getClass("DataBag") could be done only once



trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12183>

    throw an Exception here



trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12185>

    I would override OutputSchema instead. In the case of a bag you are losing the schema
of the Tuple inside the bag.



trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12212>

    If you can add the name of the function in the message that would help with debugging.



trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12186>

    you could factor out the function calling and bag instantiated code.



trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12215>

    some of the ruby calling logic and exception catching could be factored in this class



trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12187>

    Override outputSchema instead like you did for EvalFunc



trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12213>

    private



trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12189>

    check for the varargs case



trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12190>

    you should throw an exception here



trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java
<https://reviews.apache.org/r/4091/#comment12217>

    It is very similar to JrubyEvalFunc. You can factor out more of this code.



trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java
<https://reviews.apache.org/r/4091/#comment12216>

    private



trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
<https://reviews.apache.org/r/4091/#comment12218>

    turn these comments into javadoc



trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
<https://reviews.apache.org/r/4091/#comment12192>

    this will strip the inner schema of bags and tuples. See comments about overriding outputSchema



trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java
<https://reviews.apache.org/r/4091/#comment12219>

    please add javadoc to public methods



trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java
<https://reviews.apache.org/r/4091/#comment12193>

    add javadoc



trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java
<https://reviews.apache.org/r/4091/#comment12194>

    please make it private and at the top 



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12195>

    add javadoc



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12196>

    make it private
    1 is a perfectly good value. it does not need to be unique.
    Changing the serialVersionUID value will make the serialized data format incompatible.



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12197>

    in this case addAll would replace the current content. Is it intended ?



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12220>

    want to remove it ?



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12198>

    internalDB.add(JrubyUtils.rubyToPig((RubyArray)elem));



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12199>

    you could call runtime.getClass("DataBag").getClass("BagIterator") once at the beginning



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12200>

    add spaces around operator and prefer the normalized "if" with { }
    
    if () {
    //
    }



trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12201>

    flattten with 3 Ts? It's going to be really flat.



trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12202>

    make this a javadoc comment



trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12203>

    private



trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12204>

    space after comma



trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12205>

    give meaningful names to parameters



trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb
<https://reviews.apache.org/r/4091/#comment12221>

    What about keeping only the one parameter ouptutSchema and outputSchemaFunctions?
    Users would just use the function right before defining the UDF.
    
      outputSchema "word:chararray"
      def concat input, input2
        return input + input2
      end



trunk/test/org/apache/pig/test/TestUDF.java
<https://reviews.apache.org/r/4091/#comment12222>

    if this value is known you may as well compare each value to the known constant instead
of comparing to the previous value.


- Julien


On 2012-02-28 22:02:52, Jonathan Coveney wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4091/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-28 22:02:52)
bq.  
bq.  
bq.  Review request for pig, Julien Le Dem and Alan Gates.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This is the reviewboard for the following:
bq.  
bq.  https://issues.apache.org/jira/browse/PIG-2317
bq.  
bq.  
bq.  This addresses bug PIG-2317.
bq.      https://issues.apache.org/jira/browse/PIG-2317
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/src/org/apache/pig/AccumulatorEvalFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/AlgebraicEvalFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/ScriptEngine.java 1294756 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java PRE-CREATION

bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java PRE-CREATION 
bq.    trunk/bin/pig 1294756 
bq.    trunk/ivy.xml 1294756 
bq.    trunk/ivy/libraries.properties 1294756 
bq.    trunk/pigudf.rb PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyExampleA.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyExampleB.java PRE-CREATION 
bq.    trunk/test/e2e/pig/build.xml 1294756 
bq.    trunk/test/e2e/pig/conf/default.conf 1294756 
bq.    trunk/test/e2e/pig/drivers/TestDriverPig.pm 1294756 
bq.    trunk/test/e2e/pig/tests/nightly.conf 1294756 
bq.    trunk/test/e2e/pig/udfs/ruby/morerubyudfs.rb PRE-CREATION 
bq.    trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb PRE-CREATION 
bq.    trunk/test/org/apache/pig/test/TestUDF.java 1294756 
bq.    trunk/test/org/apache/pig/test/utils/HelperEvalFuncUtils.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4091/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  e2e tests can be run as follows:
bq.  
bq.  ant -Dharness.hadoop.home=<path_to_hadoop> -Dharness.old.pig=<path_to_old_pig>
test-e2e-deploy-local
bq.  ant -Dharness.hadoop.home=<path_to_hadoop> -Dharness.old.pig=<path_to_old_pig>
-Dtests.to.run="-t RubyUDFs" test-e2e-local
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> Ruby/Jruby UDFs
> ---------------
>
>                 Key: PIG-2317
>                 URL: https://issues.apache.org/jira/browse/PIG-2317
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jacob Perkins
>            Assignee: Jonathan Coveney
>            Priority: Minor
>         Attachments: PIG-2317-8.patch, PigUdf.rb, PigUdf.rb, jruby_scripting.patch, jruby_scripting_2_real.patch,
jruby_scripting_3.patch, jruby_scripting_4.patch, jruby_scripting_5.patch, jruby_scripting_6.patch,
jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb, pigjruby.rb, pigudf.rb
>
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in the same
way as python and javascript UDFs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message