hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-928) UDFs in scripting languages
Date Thu, 15 Oct 2009 01:59:31 GMT

    [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765860#action_12765860 ]

Alan Gates commented on PIG-928:

Questions that we need to answer to get this patch ready for commit:

1) How do we do type conversion?  The current patch assumes a single string input and output.
We'll want to be able to do conversions from scripting-language types to Pig types that make sense.
How this can be done is tied up with #2 below.

2) Do we do this using the Bean Scripting Framework or with specific bindings for each language?
This patch shows how to do the specific bindings for Groovy.  It can be done for Jython,
and I'm reasonably sure it can be done for JRuby.  The obvious advantage of using the BSF
is that we get all the languages it supports for free.  We need to understand the performance
costs of each choice.  We should be able to use the existing patch to test the difference
between using the BSF and direct Groovy bindings.  Also, it seems like type conversions will
be much easier to do if we use specific bindings, as we can do explicit type mappings for
each language.  Perhaps this is possible with BSF, but I'm not sure how.

3) Grammar for how to declare these.  I propose that we allow two options:  inlined in define
and file referenced in define.  So these would roughly look like:

define myudf ScriptUDF('groovy', 'return input.get(0).split();');
define myudf ScriptUDF('python', myudf.py);

We could also support inlining in the Pig Latin itself, something like:

B = foreach A generate {'groovy', 'return input.get(0).split();'};

I'm not a fan of this type of inlining, as I think it makes the code hard to read.
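
On question 1, an explicit per-language type mapping could look roughly like the Python sketch below. It is only an illustration, not what the patch does: the table name, the function name pig_type_name, and the particular choices (list to bag, int to long, and so on) are all assumptions made for the example. Values with no entry in the table are rejected rather than guessed at.

```python
# Hypothetical mapping from Python value types to Pig type names.
# The specific pairings are assumptions for illustration only.
PIG_TYPE_BY_PYTHON_TYPE = {
    str: "chararray",
    bytes: "bytearray",
    int: "long",
    float: "double",
    tuple: "tuple",
    list: "bag",
    dict: "map",
}


def pig_type_name(value):
    """Return the Pig type name that a script's return value would map to."""
    try:
        # Look up the exact type so subclasses (e.g. bool, a subclass of int)
        # are not silently mapped to their parent's Pig type.
        return PIG_TYPE_BY_PYTHON_TYPE[type(value)]
    except KeyError:
        raise TypeError("no Pig mapping for " + type(value).__name__)
```

With this table, the result of a split() call (a Python list) would be reported as a bag, and a plain string as a chararray. A generic layer such as the BSF would need some equivalent of this table for each language it exposes.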

> UDFs in scripting languages
> ---------------------------
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>         Attachments: package.zip
> It should be possible to write UDFs in scripting languages such as Python, Ruby, etc.
> This frees users from needing to compile Java, generate a jar, etc.  It also opens Pig to
> programmers who prefer scripting languages over Java.

