pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball
Date Mon, 18 Jun 2012 04:28:42 GMT

    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393660#comment-13393660 ]

Cheolsoo Park commented on PIG-2745:

Hi Daniel, thanks for submitting my patch!

I am wondering why you think that the issue with relative paths for Python still exists. On
my YARN cluster, the Scripting_* tests (excluded due to MAPREDUCE-3700) all pass. (Technically,
I am using Hadoop-2.0.0, but that shouldn't make a difference.) I have also verified manually
that it works in the Grunt shell.

My fix shouldn't be Ruby-specific since the problem is with PigServer stuffing any UDF scripts
into the job jar.

Looking at your test code, one thing that I hadn't thought about is "../". That shouldn't
be an issue now, because in the registerCode() method relative paths are always converted to
absolute paths by FileLocalizer.fetchFile(). Nevertheless, handling "../" as well might be a
good idea to make that method more robust.
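
As a rough illustration of what handling "../" could look like, here is a minimal sketch. The class ScriptPaths and the helper toJarEntryName() are hypothetical names made up for this example; they are not part of Pig or of the attached patch. The sketch assumes a Unix-like file system and only shows the idea: make the path absolute, collapse "." and ".." segments, then drop the leading "/" so that the name matches what getResourceAsStream() will look up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ScriptPaths {
    /**
     * Hypothetical helper (not part of Pig): derives the name under which a
     * UDF script could be stored in the job jar.
     */
    static String toJarEntryName(String scriptPath) {
        // Make the path absolute, then collapse "." and ".." segments.
        Path normalized = Paths.get(scriptPath).toAbsolutePath().normalize();
        // Jar entry names use forward slashes regardless of platform.
        String name = normalized.toString().replace('\\', '/');
        // Drop leading slashes so the entry name is classpath-relative.
        while (name.startsWith("/")) {
            name = name.substring(1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(toJarEntryName(
            "/home/cheolsoo/pig-0.10/test/../libexec/ruby/scriptingudfs.rb"));
    }
}
```

Whether the normalization belongs in registerCode() or elsewhere is a design question that this sketch does not try to answer.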

> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball
> (not from an installed Pig; see below for why). Hadoop can be running in either
> pseudo-distributed or fully distributed mode.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ \
>     -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed
> to be read from the job jar, but it is not. The reason is that getResourceAsStream("/x")
> looks for "x" (without the leading "/"), not "/x". Since "scriptingudfs.rb" is stored under
> its absolute path, leading "/" included, getResourceAsStream(scriptPath) does not find it.
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file "+scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test passes if you run in local mode or from an installed Pig. The reason is
> that "scriptingudfs.rb" is found in the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb).
> The fix seems straightforward. Attached is the patch that removes the leading "/" when
> registering UDF scripts so that they are stored without the leading "/" in the job jar, as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Thanks!
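
The lookup mismatch described in the quoted report can be modelled in a few lines. The sketch below is a simplified model, not Pig code: ResourceNameDemo, resolve(), and wouldFind() are hypothetical names, and a plain Set of strings stands in for the job jar's entry list. resolve() follows the name-resolution rule documented for java.lang.Class.getResourceAsStream(): a leading "/" is stripped, and any other name is prefixed with the class's package path.

```java
import java.util.Collections;
import java.util.Set;

class ResourceNameDemo {
    /**
     * Simplified model of the entry name that Class.getResourceAsStream(name)
     * asks the class loader for, per the java.lang.Class Javadoc.
     */
    static String resolve(Class<?> c, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);            // absolute: strip the leading "/"
        }
        String className = c.getName();
        int dot = className.lastIndexOf('.');
        if (dot < 0) {
            return name;                         // class is in the default package
        }
        // relative: prefix with the package path of the class
        return className.substring(0, dot).replace('.', '/') + "/" + name;
    }

    /** True if the modelled lookup matches an entry in the (stand-in) jar listing. */
    static boolean wouldFind(Set<String> jarEntries, String scriptPath) {
        return jarEntries.contains(resolve(ResourceNameDemo.class, scriptPath));
    }

    public static void main(String[] args) {
        String script =
            "/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb";
        // bad.jar stores the entry with the leading "/", good.jar without it.
        Set<String> badJar = Collections.singleton(script);
        Set<String> goodJar = Collections.singleton(script.substring(1));
        System.out.println("bad.jar:  " + wouldFind(badJar, script));
        System.out.println("good.jar: " + wouldFind(goodJar, script));
    }
}
```

In this model the lookup misses the entry stored with a leading "/" and matches the one stored without it, which is the behaviour the report attributes to the two job jars.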
