hadoop-pig-dev mailing list archives

From "Woody Anderson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-928) UDFs in scripting languages
Date Wed, 24 Feb 2010 19:57:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837984#action_12837984 ]

Woody Anderson commented on PIG-928:

Yes, I've looked at both javax.script and BSF; neither is well designed for this
scenario (in my opinion).
This comes mostly from their extreme generality, and from the fact that they do not seem to
provide a way to obtain and then stash a stable reference to a particular function, i.e. a pointer.

This is part of why direct use of the jython interpreter is so fast. Each invocation
uses a function object directly; it does not have to give a name to an 'engine', which then
looks up the function and decides the appropriate call context, object context, etc.
Those things are great, but not if you don't need them.
Perhaps someone can show me how to use those systems much better than I have managed to,
but this approach lets the impl stay agnostic to these frameworks in a way that can
boost performance.
As you may have noticed, the js example uses javax.script, which BSF3 now conforms to. This
impl must populate an engine and then use the function name over and over, which involves
more function name lookups and is less conducive to lambda functions etc.
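
To make the difference concrete, here is a minimal sketch in plain Python. It is only an
analogy: NameLookupEngine, invoke_function and stash_function are hypothetical names, not the
javax.script, BSF or jython APIs. The first style hands a name to an engine on every call; the
second resolves the function once and keeps the object.

```python
# Plain-Python analogy; every name below is hypothetical.
SCRIPT = "def println_chararray(a0):\n    return 'out: ' + a0\n"


class NameLookupEngine:
    """Stand-in for an engine that is handed a function name on every call."""

    def __init__(self, source):
        self.namespace = {}
        exec(source, self.namespace)      # populate the engine once
        self.lookups = 0

    def invoke_function(self, name, *args):
        self.lookups += 1                 # one name lookup per invocation
        return self.namespace[name](*args)


def stash_function(source, name):
    """Evaluate the script once and return the function object itself."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]                # the only lookup happens here


# Style 1: the name is resolved again for each input.
engine = NameLookupEngine(SCRIPT)
by_name = [engine.invoke_function("println_chararray", s) for s in ("a", "b", "c")]

# Style 2: the function object is held and called directly.
func = stash_function(SCRIPT, "println_chararray")
direct = [func(s) for s in ("a", "b", "c")]
```

Both styles produce the same results; the second just skips the per-call lookup, and it works
equally well for a function that was never given a name.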

BSF is also extremely easy to integrate under the hood in the same way, and it has the same perf
costs as javax.script due to the hoop jumping. I tried this out while trying to make perl
work, but the perlengine is 6 years old and I was unable to get it to work; the bsf binding
part worked well enough, though.

The reflection overhead is pretty minimal, and it can be skipped entirely if the user writes the
define by hand (they can simply name the appropriate package directly). Compare the two forms,
first via the top-level Eval and then naming the js package:
define spig_println_Tchararray_P1 org.apache.pig.scripting.Eval('js','println_Tchararray_P1','chararray','var println_Tchararray_P1 = function(a0) { println(a0); };');
define spig_println_Tchararray_P1 org.apache.pig.scripting.js.Eval('println_Tchararray_P1','chararray','var println_Tchararray_P1 = function(a0) { println(a0); };');

The top-level Eval is there simply to allow factory-based performance improvements that
knowledgeable implementers can add.
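
A rough sketch of what such a factory could look like, again in plain Python and with made-up
names (GenericEval, DirectEval, FACTORIES and make_eval are assumptions for illustration, not
classes from the attached patch): the top-level entry point picks a specialised implementation
when one is registered for the language, and otherwise falls back to a generic one.

```python
# Hypothetical sketch; none of these names come from the Pig patch.
class GenericEval:
    """Fallback: looks the function up by name on every call."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name

    def __call__(self, *args):
        return self.namespace[self.name](*args)


class DirectEval:
    """Specialised: resolves the function once and keeps the reference."""

    def __init__(self, namespace, name):
        self.func = namespace[name]

    def __call__(self, *args):
        return self.func(*args)


# An implementer who knows a faster path for a language registers it here.
FACTORIES = {"python": DirectEval}


def make_eval(lang, namespace, name):
    """Top-level entry point: choose an implementation by language name."""
    factory = FACTORIES.get(lang, GenericEval)
    return factory(namespace, name)


functions = {"double": lambda x: x * 2}
fast = make_eval("python", functions, "double")   # registered, so DirectEval
slow = make_eval("other", functions, "double")    # not registered, so GenericEval
```

The caller only ever names the language; which implementation runs behind it is the factory's
decision.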

If the scriptengine frameworks provided nicer access to functions and nicer call patterns,
it would have been nice to use them.

> UDFs in scripting languages
> ---------------------------
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>         Attachments: package.zip, scripting.tgz, scripting.tgz
> It should be possible to write UDFs in scripting languages such as python, ruby, etc.
> This frees users from needing to compile Java, generate a jar, etc.  It also opens Pig to
> programmers who prefer scripting languages over Java.

