hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-928) UDFs in scripting languages
Date Tue, 15 Jun 2010 23:34:29 GMT

    [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879181#action_12879181 ]

Alan Gates commented on PIG-928:
--------------------------------

I propose the following syntax for register:

{code}
REGISTER _filename_ [USING _class_ [AS _namespace_]]
{code}

This is backwards compatible with the current version of register.

_class_ in the USING clause would need to implement a new interface, ScriptEngine (or a similar
name), which would be used to interpret the file.  If no USING clause is given, _filename_ is
assumed to be a jar.  I like this better than the 'lang python' option we had earlier because
it allows users to add new engines without modifying the parser.  We should, however, provide
a pre-defined set of scripting engines and names, so that, for example, python translates to
org.apache.pig.script.jython.JythonScriptingEngine.
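
To illustrate the name lookup described above, here is a rough sketch in Python (Pig itself is Java, so this is not the proposed implementation).  The names {{ENGINE_ALIASES}} and {{resolve_engine}} are made up for this example; only the python entry comes from the proposal.

```python
# Hypothetical table of pre-defined engine names. Only the "python" entry
# is taken from the proposal; a real table would live inside Pig.
ENGINE_ALIASES = {
    "python": "org.apache.pig.script.jython.JythonScriptingEngine",
}


def resolve_engine(using=None):
    """Return the engine class name for a USING clause, or None for a jar."""
    if using is None:
        # No USING clause: the registered file is assumed to be a jar.
        return None
    # A known alias maps to its class. Anything else is treated as a
    # class name supplied by the user, so no parser change is needed.
    return ENGINE_ALIASES.get(using, using)
```

Falling back to the raw value for unknown names is what would let users plug in their own engine classes without touching the parser.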

If the AS clause is not given, the basename of _filename_ defines the namespace for all
functions defined in that file.  This avoids function name clashes between files.  If the AS
clause is given, it defines an alternate namespace.  This avoids clashes between files that
share a basename.  Functions would have to be referenced by their full namespace-qualified
names, though aliases can be given via DEFINE.

Note that the AS clause is a sub-clause of the USING clause and cannot be used alone, so
there is no way to give namespaces to jars.
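
The namespace rules above could be sketched as follows, again in Python purely for illustration.  The function {{namespace_for}} and its parameter names are hypothetical, and stripping the file extension from the basename is an assumption (it is what a reference such as myfuncs.a for a file named myfuncs.py would require).

```python
import posixpath


def namespace_for(filename, using=None, alias=None):
    """Return the namespace for a registered file, or None for a jar."""
    if using is None:
        if alias is not None:
            # AS is a sub-clause of USING and cannot appear alone.
            raise ValueError("AS requires a USING clause")
        # No USING clause: the file is a jar and gets no namespace.
        return None
    if alias is not None:
        # An explicit AS clause overrides the default namespace.
        return alias
    # Default: basename of the file, with the extension dropped
    # (assumption, see the note above).
    base = posixpath.basename(filename)
    return posixpath.splitext(base)[0]
```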

As far as I can tell, there is no need for a SHIP clause in register.  Additional Python
modules that are needed can simply be registered.  As long as Pig searches for functions
lazily and does not automatically find every function in every file we register, this will
work fine.
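
One way to picture the lazy search is the sketch below.  It is only an illustration in Python under my own assumptions: the class {{LazyRegistry}}, its methods, and the loader callable are all hypothetical and do not describe existing Pig code.

```python
class LazyRegistry:
    """Remember registered files, but load one only when a function is used."""

    def __init__(self, loader):
        # loader is a hypothetical callable: path -> dict of name -> function
        self._loader = loader
        self._paths = {}    # namespace -> file path
        self._loaded = {}   # namespace -> functions found in that file

    def register(self, namespace, path):
        # Registration only records the path; the file is not read here.
        self._paths[namespace] = path

    def lookup(self, qualified_name):
        # Split "namespace.function" at the first dot.
        namespace, _, func = qualified_name.partition(".")
        if namespace not in self._loaded:
            # First reference to this namespace: load the file now.
            self._loaded[namespace] = self._loader(self._paths[namespace])
        return self._loaded[namespace][func]
```

With this shape, a file registered only so that it gets moved over is never searched for functions unless a script actually references one.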

So, taken all together, this would look like the following.  Assume we have two Python files,
{{/home/alan/myfuncs.py}}:

{code}
import mymod

def a():
    ...

def b():
    ...
{code}

and {{/home/bob/myfuncs.py}}:

{code}
def a():
    ...

def c():
    ...
{code}

and the following Pig Latin:

{code}
REGISTER /home/alan/myfuncs.py USING python;
REGISTER /home/alan/mymod.py; -- no need for USING since I won't be looking in here for functions,
                              -- it just has to be moved over
REGISTER /home/bob/myfuncs.py  USING python AS hisfuncs;

DEFINE b myfuncs.b();

A = LOAD 'mydata' as (x, y, z);
B = FOREACH A GENERATE myfuncs.a(x), b(y), hisfuncs.a(z);
...
{code}



> UDFs in scripting languages
> ---------------------------
>
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, ruby, etc.
> This frees users from needing to compile Java, generate a jar, etc.  It also opens Pig to
> programmers who prefer scripting languages over Java.


