pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-928) UDFs in scripting languages
Date Sat, 17 Oct 2009 23:45:31 GMT

    [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766984#action_12766984 ]

Ashutosh Chauhan commented on PIG-928:

I did some quick benchmarking of the BSF approach for UDFs written in Ruby, Python, and Groovy,
compared against a native builtin in Pig. It's the standard wordcount example, where the UDF
tokenizes an input string into words. As input I used the Pig sources (src/org/apache/pig), which
total more than 210K lines. Since I haven't yet figured out type translation, to keep the
experiment consistent I passed the data as a String argument and used Object[] as the return
type in all languages. Following are the numbers I got, averaged over 3 runs:


This shows that the Groovy-BSF combo is super-slow, while Ruby and Python are much better. These
numbers should be seen as an absolute worst case. I believe that handling type translation,
compiling the script in the constructor, and using the compiled version instead of evaluating
the script in every exec() call will give much better performance. Other optimizations may
also exist.
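
To illustrate the compile-once idea, here is a minimal sketch in plain Python. It is not the BSF code used in the benchmark: the class names, the run() method (standing in for a UDF's exec() call), and the one-line tokenizer script are all assumptions made for illustration. It only contrasts re-evaluating script source on every call with compiling it once in the constructor.

```python
import timeit

# Hypothetical script-side UDF: tokenize a line into words (wordcount style).
UDF_SOURCE = "result = line.split()"


class EvalEveryCall:
    """Hands the raw source to the interpreter on every call (worst case)."""

    def __init__(self, source):
        self.source = source

    def run(self, line):
        scope = {"line": line}
        exec(self.source, scope)  # parses, compiles, and runs each time
        return scope["result"]


class CompileOnce:
    """Compiles the source in the constructor and reuses the code object."""

    def __init__(self, source):
        self.code = compile(source, "<udf>", "exec")

    def run(self, line):
        scope = {"line": line}
        exec(self.code, scope)  # runs the precompiled code object
        return scope["result"]


if __name__ == "__main__":
    line = "It should be possible to write UDFs in scripting languages"
    slow = EvalEveryCall(UDF_SOURCE)
    fast = CompileOnce(UDF_SOURCE)
    # Both variants must produce the same tokens; only the cost differs.
    assert slow.run(line) == fast.run(line)
    print("eval every call:", timeit.timeit(lambda: slow.run(line), number=20000))
    print("compile once:   ", timeit.timeit(lambda: fast.run(line), number=20000))
```

The timings depend on the machine and the interpreter, so the sketch prints them rather than asserting any particular ratio. How much a script engine reached through BSF would gain from the same change is not something this sketch measures.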

Sometime next week, I will try to repeat the same experiment with javax.script.

> UDFs in scripting languages
> ---------------------------
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>         Attachments: package.zip
> It should be possible to write UDFs in scripting languages such as Python, Ruby, etc.
> This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to
> programmers who prefer scripting languages over Java.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.