pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2317) Ruby/Jruby UDFs
Date Tue, 20 Mar 2012 20:11:47 GMT

    [ https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233708#comment-13233708
] 

jiraposter@reviews.apache.org commented on PIG-2317:
----------------------------------------------------



bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > Thanks Jonathan for this contribution. A lot of good stuff in there.
bq.  > Please add comments when using Ruby specific mechanisms in there (child classes
notification, Class naming after the variable it is affected to, ...) so that less Ruby savvy
contributors can follow.

I'm going to update the JIRA with some of the broader comments.

My todo at this point is:
- Polish up the Javadocs (I should probably add @params and whatnot)
- Clean up the naming convention in the Ruby API defined in Java
- Add more test coverage (I added a couple big features, they need to be used more in the
e2e tests especially)
- Add varargs support

That said, the tests pass and it seems to work. The API feels quite clean to me.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java, line 41
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86324#file86324line41>
bq.  >
bq.  >     ruby.getClass("DataBag") could be done only once

It could, but there is a caveat: the Ruby runtime HAS to be identical within calls, or you're
going to run into huge issues. The getClass call isn't particularly heinous, and I think it's
cleaner to leave in. For the time being, all of the Ruby runtimes are identical, but in the
future there may be multiple (a different runtime per UDF or per script registered), so this
is safer.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/pigudf.rb, line 110
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line110>
bq.  >
bq.  >     You could call it RegisterDescendentsUDF or a similar name as this is a based
class for UDFs.
bq.  >     The descendents registration is common to the PigUDF as well. Do you want to
unify the class hierarchy?

Agreed, I renamed it


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/pigudf.rb, line 87
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line87>
bq.  >
bq.  >     I would add a separate field for schema func instead of this convention.

I split it out into another class like you suggested


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/pigudf.rb, line 41
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line41>
bq.  >
bq.  >     - please add a comment explain the mechanism that names the class after the
variable it is assigned to.

added a ton of comments


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/pigudf.rb, line 26
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line26>
bq.  >
bq.  >     you could define a class for this

Agreed


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/pigudf.rb, line 6
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line6>
bq.  >
bq.  >     maybe you need to separate the top level ruby script to be used outside of pig
and the one used inside using includes?

I was able to clean this up significantly in multiple ways. Will document how in the JIRA.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java, line 65
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86324#file86324line65>
bq.  >
bq.  >     I would override OutputSchema instead. In the case of a bag you are losing the
schema of the Tuple inside the bag.

Wisdom of the ages. Done :)


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java, line 86
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86325#file86325line86>
bq.  >
bq.  >     some of the ruby calling logic and exception catching could be factored in this
class

Agreed...the win felt small, I will take this into account, but it's not a priority


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java, line 60
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86326#file86326line60>
bq.  >
bq.  >     check for the varargs case

I'll make varargs a top priority for the next round of changes... didn't seem super pressing
atm, but maybe it is.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java, line 15
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86327#file86327line15>
bq.  >
bq.  >     It is very similar to JrubyEvalFunc. You can factor out more of this code.

I suppose this begs the question of why we even have FilterFunc anymore, given that it just
extends EvalFunc<Boolean> now. I treated it separately because I wasn't sure about that
(I could just make FilterFuncs EvalFunc<Boolean>s and be done with it)... it seems like
something we should change in Pig (ie abolish FilterFuncs). But if FilterFunc is treater separately,
I want the two classes... I suppose I could pull out the functionality, and should, but I'd
like to have some resolution on that, because it'll guide how I do so.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java, line 107
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86328#file86328line107>
bq.  >
bq.  >     this will strip the inner schema of bags and tuples. See comments about overriding
outputSchema

Can you avoid this with Algebraic UDFs? Fixed for Accumulators


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java, line 67
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86331#file86331line67>
bq.  >
bq.  >     please make it private and at the top

I don't even ned it, I think


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 40
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line40>
bq.  >
bq.  >     make it private
bq.  >     1 is a perfectly good value. it does not need to be unique.
bq.  >     Changing the serialVersionUID value will make the serialized data format incompatible.

you're correct. I think this was an artifact from a previous implementation and isn't actually
necessary. good to know, though...


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 104
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line104>
bq.  >
bq.  >     in this case addAll would replace the current content. Is it intended ?

Good call. It was a bug. I removed addAll and just made the functionality a part of add.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 149
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line149>
bq.  >
bq.  >     you could call runtime.getClass("DataBag").getClass("BagIterator") once at the
beginning

I eliminated BagIterator, it didn't seem necessary. That said, re: the getClass calls, see
above.


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java, line 98
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86333#file86333line98>
bq.  >
bq.  >     give meaningful names to parameters

the issue with ruby parameters is that a lot of the functions can take multiple types, etc,
and the meaning changes depending. I could handle this a couple of ways.

1) in the case where there is only one option, use that
2) make better use of @param javadoc
3) in the case where there are multiple possibilities, make sure to explicitly create an object
that is named to reflect what it is supposed to be

Would appreciate your thoughts on the matter


bq.  On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq.  > trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb, line 32
bq.  > <https://reviews.apache.org/r/4091/diff/1/?file=86341#file86341line32>
bq.  >
bq.  >     What about keeping only the one parameter ouptutSchema and outputSchemaFunctions?
bq.  >     Users would just use the function right before defining the UDF.
bq.  >     
bq.  >       outputSchema "word:chararray"
bq.  >       def concat input, input2
bq.  >         return input + input2
bq.  >       end

it'd be good to test both, agreed


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4091/#review5601
-----------------------------------------------------------


On 2012-02-28 22:02:52, Jonathan Coveney wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4091/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-28 22:02:52)
bq.  
bq.  
bq.  Review request for pig, Julien Le Dem and Alan Gates.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This is the reviewboard for the following:
bq.  
bq.  https://issues.apache.org/jira/browse/PIG-2317
bq.  
bq.  
bq.  This addresses bug PIG-2317.
bq.      https://issues.apache.org/jira/browse/PIG-2317
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/src/org/apache/pig/AccumulatorEvalFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/AlgebraicEvalFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/ScriptEngine.java 1294756 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java PRE-CREATION

bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java PRE-CREATION 
bq.    trunk/bin/pig 1294756 
bq.    trunk/ivy.xml 1294756 
bq.    trunk/ivy/libraries.properties 1294756 
bq.    trunk/pigudf.rb PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyExampleA.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/scripting/jruby/RubyExampleB.java PRE-CREATION 
bq.    trunk/test/e2e/pig/build.xml 1294756 
bq.    trunk/test/e2e/pig/conf/default.conf 1294756 
bq.    trunk/test/e2e/pig/drivers/TestDriverPig.pm 1294756 
bq.    trunk/test/e2e/pig/tests/nightly.conf 1294756 
bq.    trunk/test/e2e/pig/udfs/ruby/morerubyudfs.rb PRE-CREATION 
bq.    trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb PRE-CREATION 
bq.    trunk/test/org/apache/pig/test/TestUDF.java 1294756 
bq.    trunk/test/org/apache/pig/test/utils/HelperEvalFuncUtils.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4091/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  e2e tests can be run as follows:
bq.  
bq.  ant -Dharness.hadoop.home=<path_to_hadoop> -Dharness.old.pig=<path_to_old_pig>
test-e2e-deploy-local
bq.  ant -Dharness.hadoop.home=<path_to_hadoop> -Dharness.old.pig=<path_to_old_pig>
-Dtests.to.run="-t RubyUDFs" test-e2e-local
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> Ruby/Jruby UDFs
> ---------------
>
>                 Key: PIG-2317
>                 URL: https://issues.apache.org/jira/browse/PIG-2317
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jacob Perkins
>            Assignee: Jonathan Coveney
>            Priority: Minor
>         Attachments: PIG-2317-8.patch, PIG-2317-8_plus.patch, PigUdf.rb, PigUdf.rb, jruby_scripting.patch,
jruby_scripting_2_real.patch, jruby_scripting_3.patch, jruby_scripting_4.patch, jruby_scripting_5.patch,
jruby_scripting_6.patch, jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb, pigjruby.rb, pigudf.rb
>
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in the same
way as python and javascript UDFs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message