hadoop-pig-dev mailing list archives

From "Arnab Nandi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-928) UDFs in scripting languages
Date Mon, 24 May 2010 10:42:28 GMT

     [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arnab Nandi updated PIG-928:
----------------------------

    Attachment: pig.scripting.patch.arnab
                test.zip
                calltrace.png

Building on Julien's and Woody's code, this patch provides pluggable scripting support in
native Pig.

##Syntax:##

register 'test.py' USING org.apache.pig.scripting.jython.JythonScriptEngine;

This makes all functions inside test.py available as Pig functions.
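
A minimal sketch of what such a file might contain. The function names below are hypothetical and
not taken from the patch; the only assumption is that each one is an ordinary top-level Python
function with no imports, in line with the notes further down.

```python
# test.py: hypothetical contents, for illustration only.
# Plain top-level functions with no imports, so the file stays Jython-friendly.

def square(x):
    # Return the square of a number.
    return x * x

def concat(a, b):
    # Join two values as a single string.
    return str(a) + str(b)
```

After the register statement above, each of these would presumably be callable from a Pig script
by its function name.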

##Things in this patch: ##

1. Modifications to parser .jjt file

2. ScriptEngine abstract class and Jython instantiation. 

3. Ability to ship .py files similar to .jars, loaded on demand.

4. Input checking and Schema support.
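
As a rough analogy for how a script engine can expose every function in a file, the sketch below
runs the script text in a fresh namespace and keeps only the functions it defines. This is plain
Python written for illustration: load_functions is a hypothetical helper, and the patch itself
does this work in Java through Jython, so nothing here should be read as its actual code.

```python
import types

def load_functions(source):
    # Execute the script text in a fresh namespace, then keep only
    # the plain functions it defined (constants and builtins are skipped).
    namespace = {}
    exec(source, namespace)
    return dict((name, obj) for name, obj in namespace.items()
                if isinstance(obj, types.FunctionType))

# A tiny stand-in for the contents of a registered script file.
script = "def double(x):\n    return 2 * x\n\nLIMIT = 10\n"
functions = load_functions(script)
```

Here functions holds a single entry, double; the LIMIT constant is ignored because it is not a
function.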


##Things NOT in this patch: ##

1. Inline code support (replacing 'test.py' with `multiline inline code`); I would prefer to
submit this as a separate bug.

2. Scripting engines and examples other than Jython (e.g. BeanShell and Rhino)

3. JUnit-based test harness (test files are provided separately as test.zip)

4. Python<->Pig object transforms are not very efficient (see calltrace.png). I preferred to get
the cleaner implementation in first; non-obvious optimizations such as object reuse can be
introduced as a separate bug.


##Notes: ##

1. I went with "register" instead of "define" since files can contain multiple functions,
similar to .jars. IMHO this makes more sense: using "define" would introduce the concept of
"codeblock aliases", and function names would look like "alias.functionName()". That is possible
but inconsistent, since we cannot also have "alias2.functionName()" (which would require separate
interpreter instances, etc.).

2. This has been tested both locally and in mapred mode.

3. We assume .py files are simply a list of functions. Since the entire file is loaded, functions
in the same file can depend on one another. No effort is made to resolve imports, though.

4. You'll need to add jython.jar to the classpath, or compile it into pig.jar.
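
To illustrate note 3, here is a minimal sketch of a script file in which one function depends on
another defined in the same file. Both names are hypothetical. The file deliberately uses no
imports, since the patch makes no attempt to resolve them.

```python
# Hypothetical script file: a helper plus a function that depends on it.

def _normalize(word):
    # Trim surrounding whitespace and lower-case the word.
    return word.strip().lower()

def same_word(a, b):
    # Compare two words after normalizing both with the helper above.
    return _normalize(a) == _normalize(b)
```

Because the whole file is loaded at once, same_word can call _normalize regardless of the order
in which the two are defined.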


Would love comments and code-followups!


> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab,
>                      pyg.tgz, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, ruby, etc.
> This frees users from needing to compile Java, generate a jar, etc.  It also opens Pig to
> programmers who prefer scripting languages over Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

