pig-dev mailing list archives

From "Jonathan Coveney (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2317) Ruby/Jruby UDFs
Date Tue, 20 Mar 2012 21:03:40 GMT

    [ https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233743#comment-13233743 ]

Jonathan Coveney commented on PIG-2317:

OK! I just uploaded a new diff (which incorporates Daniel's changes). It may be possible to
undo some of that, actually... I'll explain some of the big new changes. First, a todo list:
- Need to add more e2e tests
- Need to add more traditional tests
- Need to make the Javadocs more robust with @params and whatnot
- Need to add varargs support (this is the only feature that is missing, AFAIK)
- I have some TODOs littered about... need to clean those up

In general, there is a LOT more commenting, and I tried to be super explicit on the Ruby side
of things.

I significantly cleaned up and simplified pigudf.rb, taking into account comments from Julien.
I simplified the mechanisms at play as far as I could.

pigudf.rb is in src/main/jruby/

Now, in order to get access to the Pig library, all you have to do is "require 'pig'", which
imho is awesome: you just require pig, and you get everything! It's super clean. The unclean
part is the way it works. If you do "require 'name.jar'", then JRuby looks for a NameService
class in the base of the jar. If you do "require 'path/to/name.jar'", it'll look for path.to.NameService.
Either way, this is the reason why I had to add src/PigService.java. IMHO the win is worth
it, as it is super clean. For JRuby 1.7.0 there is a proposal to use the jar manifest to deal
with this; it's something I've brought up with them, and it is going to happen. 1.7
should also remove the need for the hack described below.
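
To make the lookup rule above concrete, here is a rough sketch in plain Ruby of how a require
path maps to the service class name. The helper service_class_for is hypothetical (it is not
part of JRuby or Pig), and it only covers the two cases described here; JRuby's real resolution
logic may treat other names differently.

```ruby
# Hypothetical helper (not a JRuby API): mirrors the naming rule described
# above for jar-based requires.
def service_class_for(require_path)
  # 'path/to/name.jar' -> ['path', 'to', 'name']
  parts = require_path.sub(/\.jar\z/, '').split('/')
  base = parts.pop
  # Capitalize only the first letter: 'name' -> 'NameService'
  class_name = base[0].upcase + base[1..-1] + 'Service'
  # Directory components become the package prefix
  (parts + [class_name]).join('.')
end

puts service_class_for('name.jar')          # NameService
puts service_class_for('path/to/name.jar')  # path.to.NameService
```

Under that rule, "require 'pig'" would resolve to PigService, which matches the file added in the patch.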

I got rid of the BagIterator, as it didn't make much sense. In this implementation, it makes
more sense just to iterate on the DataBag object in Ruby directly, as it hides the pain (this
pattern is repeated in Schema).

HACK ALERT: for people who know Ruby, the usual convention when you include 'Enumerable' and
implement each is that calling "obj.each" without a block gives you an enumerator object. This
is useful for chaining together functions that enumerate over the object and change it in some
way. JRuby 1.6.7 has a method that provides exactly this functionality... but they forgot to
give it public permissions (it's just static enumeratorize(Blahblahblah)). I worked hard to
try and get around the need for this, but it does the job so cleanly, and doing it any other way
is such a pain (I haven't found a good one), that I used reflection to get around the permissions.
I felt ok doing this because the 1.7.0 branch makes this explicitly public; it was just
an oversight.
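
For anyone less familiar with the Ruby idiom in question, here is a minimal sketch in plain
Ruby. FakeBag is a hypothetical stand-in for the DataBag wrapper, and to_enum is the standard
pure-Ruby way to get this behavior; it is not the JRuby-internal enumeratorize call that the
patch reaches via reflection.

```ruby
# FakeBag is a hypothetical stand-in, not a class from the patch.
class FakeBag
  include Enumerable

  def initialize(*tuples)
    @tuples = tuples
  end

  # With a block: yield each tuple. Without one: return an Enumerator.
  def each
    return to_enum(:each) unless block_given?
    @tuples.each { |t| yield t }
    self
  end
end

bag = FakeBag.new(1, 2, 3)
enum = bag.each                      # no block given, so this is an Enumerator
doubled = enum.with_index.map { |t, i| t * 2 + i }
p doubled                            # [2, 5, 8]
```

Because each hands back an Enumerator, calls such as with_index and map can be chained without
the caller ever touching the underlying iterator.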

Accumulator now uses outputSchema, as it always should have.

One (surprisingly long) addition is a Ruby interface for Schema objects! It protects the user
from the Schema/FieldSchema divide, and makes it really easy to mix String schema declarations
and a Schema object that is input. I will post about this in more depth later, but I think my
time would be better spent fixing the javadocs and the tests atm.

> Ruby/Jruby UDFs
> ---------------
>                 Key: PIG-2317
>                 URL: https://issues.apache.org/jira/browse/PIG-2317
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Jacob Perkins
>            Assignee: Jonathan Coveney
>            Priority: Minor
>         Attachments: PIG-2317-8.patch, PIG-2317-8_plus.patch, PIG-2317-9.patch, PigUdf.rb,
> PigUdf.rb, jruby_scripting.patch, jruby_scripting_2_real.patch, jruby_scripting_3.patch, jruby_scripting_4.patch,
> jruby_scripting_5.patch, jruby_scripting_6.patch, jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb,
> pigjruby.rb, pigudf.rb
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in the same
> way as python and javascript UDFs.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

