hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-928) UDFs in scripting languages
Date Fri, 16 Oct 2009 22:29:31 GMT

    [ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766746#action_12766746 ]

Alan Gates commented on PIG-928:

I ran some quick and sloppy performance tests on this.  I ran it using both BSF and direct
bindings to Groovy.  I also ran it using the built-in TOKENIZE function in Pig.  Each run read
5000 lines of text.  The Groovy (or TOKENIZE) function splits each line into words, then a
standard group/count counts the words.  I got the following results:

Groovy using BSF:  55.070 seconds
Groovy direct bindings:  58.560 seconds
TOKENIZE:  2.554 seconds
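
For anyone who wants to reproduce the shape of the workload outside Pig, it is a plain
word count.  Here is a minimal Python sketch of the same logic.  The function names are
made up for illustration, and splitting on whitespace is an assumption; Pig's TOKENIZE
may use a different delimiter set, so this is not a like-for-like benchmark.

```python
from collections import Counter


def tokenize(line):
    """Split one line of text into words (assumed rule: split on whitespace)."""
    return line.split()


def word_count(lines):
    """Tokenize every line, then group identical words and count them."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return dict(counts)


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

In the Pig test, only the tokenize step was swapped between Groovy and TOKENIZE; the
group/count step was the same in all three runs.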

So that is a slowdown of roughly 22x (about 21.6x with BSF and 22.9x with direct bindings).
That's pretty painful.  I know string translation between languages can be bad.  I don't know
how much of this is inter-language bindings and how much is Groovy.  When I get a chance I'll
try this in Python and see if I get similar numbers.
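
The slowdown factor can be checked directly from the three timings reported above.  A small
Python sketch follows; the dictionary and function names are made up for illustration, and
only the three numbers come from the test runs.

```python
# Wall-clock timings from the test runs, in seconds.
TIMINGS_SECONDS = {
    "Groovy using BSF": 55.070,
    "Groovy direct bindings": 58.560,
    "TOKENIZE": 2.554,
}


def slowdown(timings, baseline):
    """Return each run's time divided by the baseline run's time."""
    base = timings[baseline]
    return {
        name: seconds / base
        for name, seconds in timings.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, factor in slowdown(TIMINGS_SECONDS, "TOKENIZE").items():
        print(f"{name}: {factor:.1f}x slower than TOKENIZE")
```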

> UDFs in scripting languages
> ---------------------------
>                 Key: PIG-928
>                 URL: https://issues.apache.org/jira/browse/PIG-928
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>         Attachments: package.zip
> It should be possible to write UDFs in scripting languages such as Python, Ruby, etc.
> This frees users from needing to compile Java, generate a jar, etc.  It also opens Pig to
> programmers who prefer scripting languages over Java.

