hadoop-pig-dev mailing list archives

From "Aniket Mokashi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-928) UDFs in scripting languages
Date Thu, 22 Jul 2010 21:53:54 GMT

    [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891357#action_12891357 ]

Aniket Mokashi commented on PIG-928:

bq. I am still not convinced about the changes required in POUserFunc. That logic should really
be a part of pythonToPig(pyObject). If a Python UDF returns byte[], it should be turned
into a DataByteArray before it gets back into Pig's pipeline. And if we do that conversion in
pythonToPig() (which is the right place to do it), we will need no changes in POUserFunc.
I agree that it is better to move the type-checking computation to the JythonFunction side (JythonUtils),
and that we should provide more type safety there to avoid the complexity of user-defined types. But I
would still make the changes in POUserFunc for result.result, for the case described in the example
above (setting aside the byte[] scenario).
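A minimal sketch of the conversion the reviewer proposes, assuming the converter sees script results as plain Java objects: a raw byte[] is wrapped once at the conversion boundary, so downstream operators such as POUserFunc never need a byte[] special case. ByteArrayWrapper (a stand-in for Pig's DataByteArray) and toPigValue (a stand-in for pythonToPig()) are hypothetical names; the sketch does not use any Jython or Pig classes.

```java
import java.util.Arrays;

/**
 * Sketch only: ByteArrayWrapper is a hypothetical stand-in for Pig's DataByteArray,
 * and toPigValue is a hypothetical stand-in for the pythonToPig() conversion step.
 */
public class ByteArrayConversionSketch {

    // Hypothetical stand-in for org.apache.pig.data.DataByteArray.
    static final class ByteArrayWrapper {
        private final byte[] data;

        ByteArrayWrapper(byte[] data) {
            // Defensive copy so later changes to the caller's array do not leak in.
            this.data = Arrays.copyOf(data, data.length);
        }

        int size() {
            return data.length;
        }
    }

    /**
     * Normalizes a value coming back from the script side before it enters the pipeline.
     * A raw byte[] is wrapped here, so downstream operators never see a bare byte[].
     */
    static Object toPigValue(Object scriptResult) {
        if (scriptResult instanceof byte[]) {
            return new ByteArrayWrapper((byte[]) scriptResult);
        }
        // A real converter would have more branches; this sketch passes other values through.
        return scriptResult;
    }

    public static void main(String[] args) {
        Object converted = toPigValue(new byte[] {1, 2, 3});
        System.out.println(converted instanceof ByteArrayWrapper); // true
        System.out.println(((ByteArrayWrapper) converted).size()); // 3
        System.out.println(toPigValue("abc"));                     // abc
    }
}
```

Doing the wrapping in one converter keeps the type knowledge in a single place, which is the argument for putting it in pythonToPig() rather than in POUserFunc.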
bq. Instead of instanceof, doing a class equality test will be a wee bit faster. For example, instead
of (pyObject instanceof PyDictionary) do pyObject.getClass() == PyDictionary.class. Obviously,
it will work only when you know the exact target class, and not for derived ones.
Jython has derived classes for each of the basic Jython types. Although most of them aren't used
yet, future Jython implementations may start returning these derived objects (e.g. PyTupleDerived),
in which case a class equality test would break our code. Also, PyLongDerived objects are already
used inside the Jython code. The __tojava__ function just returns the proxy Java object until
we ask for a specific type of object. I think it's better to use instanceof instead of class
equality here.
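The difference between the two checks can be shown with a small self-contained sketch. BaseTuple and DerivedTuple are hypothetical stand-ins for a Jython type such as PyTuple and a derived variant such as PyTupleDerived; the sketch does not use Jython itself.

```java
/**
 * Sketch only: BaseTuple and DerivedTuple are hypothetical stand-ins for a Jython
 * base type (such as PyTuple) and its subclass (such as PyTupleDerived).
 */
public class TypeCheckSketch {

    // Hypothetical base type, standing in for a Jython built-in type.
    static class BaseTuple { }

    // Hypothetical subclass, standing in for a "...Derived" variant of that type.
    static class DerivedTuple extends BaseTuple { }

    // Exact class test: matches BaseTuple itself only, never a subclass.
    static boolean isTupleByClassEquality(Object o) {
        return o != null && o.getClass() == BaseTuple.class;
    }

    // instanceof test: matches BaseTuple and every subclass; false for null.
    static boolean isTupleByInstanceof(Object o) {
        return o instanceof BaseTuple;
    }

    public static void main(String[] args) {
        Object derived = new DerivedTuple();
        System.out.println(isTupleByClassEquality(derived)); // false: the subclass is missed
        System.out.println(isTupleByInstanceof(derived));    // true
    }
}
```

The exact-class test may be marginally cheaper, but it silently rejects the subclass, which is the kind of breakage that derived Jython types could cause.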
bq. For the register command, we need to test not only for functionality but for regressions as
well. Look at TestGrunt.java in the test package to get an idea of how to write a test for it.
The code path for .jar registration is identical to the old code, except that it doesn't "use" any
engine or namespace.
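A minimal sketch of that dispatch, assuming the statement has already been parsed into a path plus an optional engine and namespace (as in REGISTER 'myudfs.py' USING jython AS myfuncs;). The register method and its string results are hypothetical and only illustrate that the .jar branch ignores the engine and namespace; treating a missing engine as an error for script files is an assumption of this sketch, not Pig's confirmed behaviour.

```java
/**
 * Sketch only: every name here is hypothetical. It illustrates dispatching a REGISTER
 * statement on the file type; it is not Pig's actual parser code.
 */
public class RegisterDispatchSketch {

    /**
     * Describes what a REGISTER statement would do.
     *
     * @param path      the registered file
     * @param engine    scripting engine from a "USING ..." clause, or null if absent
     * @param namespace namespace from an "AS ..." clause, or null if absent
     */
    static String register(String path, String engine, String namespace) {
        if (path.endsWith(".jar")) {
            // Same behaviour as the old jar registration: engine and namespace are not used.
            return "jar:" + path;
        }
        // Assumption for this sketch: a non-jar file must name its scripting engine.
        if (engine == null) {
            throw new IllegalArgumentException("Script file needs a USING clause: " + path);
        }
        // Hypothetical: an absent namespace is recorded as an empty string.
        String ns = (namespace == null) ? "" : namespace;
        return "script:" + engine + ":" + ns + ":" + path;
    }

    public static void main(String[] args) {
        System.out.println(register("myudfs.jar", null, null));         // jar:myudfs.jar
        System.out.println(register("myudfs.py", "jython", "myfuncs")); // script:jython:myfuncs:myudfs.py
    }
}
```

A regression test in the spirit of TestGrunt would check that the .jar branch behaves exactly as before, even when an engine or namespace is supplied.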
bq. Also, what will happen if the user returns a nil Python object (the equivalent of Java null) from
a UDF? It looks to me like that will result in an NPE. Can you add a test for that, and a similar test
case for pigToPython()?
A Java null object will be turned into a PyNone object, but the __tojava__ function will always return
the special object Py.NoConversion if this PyObject cannot be converted to the desired Java

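The null-handling question can be sketched as follows, assuming a pythonToPig()-style helper checks for None first and then compares the result of a __tojava__-style call against a sentinel by identity. SCRIPT_NONE, NO_CONVERSION, toJava and convert are hypothetical stand-ins for Py.None, Py.NoConversion, __tojava__ and pythonToPig(); the sketch does not call Jython.

```java
/**
 * Sketch only: SCRIPT_NONE and NO_CONVERSION are hypothetical stand-ins for Jython's
 * Py.None and Py.NoConversion; convert() plays the role of a pythonToPig()-style helper.
 */
public class NullHandlingSketch {

    // Hypothetical singleton for the script language's "None" value.
    static final Object SCRIPT_NONE = new Object();

    // Hypothetical sentinel returned when a value cannot be converted to the requested type.
    static final Object NO_CONVERSION = new Object();

    // Hypothetical analogue of __tojava__: returns the sentinel instead of throwing.
    static Object toJava(Object scriptValue, Class<?> target) {
        return target.isInstance(scriptValue) ? scriptValue : NO_CONVERSION;
    }

    static Object convert(Object scriptValue, Class<?> target) {
        // Map "None" (and a raw Java null) to Java null before calling anything on it,
        // so callers get a null value rather than a NullPointerException.
        if (scriptValue == null || scriptValue == SCRIPT_NONE) {
            return null;
        }
        Object result = toJava(scriptValue, target);
        // Compare by identity: the sentinel is a single shared object.
        if (result == NO_CONVERSION) {
            throw new IllegalArgumentException(
                "Cannot convert " + scriptValue.getClass().getName() + " to " + target.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convert(SCRIPT_NONE, String.class)); // null
        System.out.println(convert("abc", String.class));       // abc
        try {
            convert(42, String.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```

Tests for the requested cases would cover both directions: None coming back from a UDF, and a Java null passed into pigToPython().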
> UDFs in scripting languages
> ---------------------------
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>         Attachments: calltrace.png, package.zip, PIG-928.patch, pig-greek.tgz, pig.scripting.patch.arnab,
> pyg.tgz, RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch,
> RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch,
> RegisterPythonUDFFinale5.patch, RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch,
> scripting.tgz, scripting.tgz, test.zip
> It should be possible to write UDFs in scripting languages such as Python, Ruby, etc. This frees
> users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who
> prefer scripting languages over Java.

This message is automatically generated by JIRA.
