pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball
Date Sat, 09 Jun 2012 19:25:42 GMT

    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292415#comment-13292415 ]

Cheolsoo Park commented on PIG-2745:

I see the same issue with the e2e Scripting tests, where Jython UDF scripts are not found
on the classpath. Applying the change that I described lets those tests pass as well.
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
> To reproduce the issue, run the e2e test "RubyUDFs_1" in MR mode from the tarball
> (not from an installed Pig; the reason is explained below). Either pseudo-distributed
> or fully distributed Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Now look at the job jar generated by Pig and search for the "scriptingudfs.rb" entry
> that the error complains about.
> To keep the job jar in /tmp, I had to comment out the following line in JobControlCompiler.java:
> {code}
> submitJarFile.deleteOnExit();
> {code}
> It can be seen that the absolute path of the script is stored in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Judging by the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" should
> be found in the jar, but it is not. The reason is that getResourceAsStream("/x") looks for
> the entry "x" (without the leading "/"), not "/x", in the jar. Since "scriptingudfs.rb" is
> stored under its absolute path, including the leading "/", getResourceAsStream(scriptPath)
> never finds it.
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file "+scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
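The lookup behaviour described above can be reproduced outside Pig. The following is a minimal sketch, not Pig code: the class name, helper names, and the script path are hypothetical, and foundInJar() only mimics Class.getResourceAsStream by stripping a leading "/" before asking a class loader for the entry.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class LeadingSlashDemo {
    // Writes a temporary jar containing a single entry with exactly the given name.
    static File writeJar(String entryName) {
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
                out.putNextEntry(new JarEntry(entryName));
                out.write("# placeholder UDF script\n".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mimics Class.getResourceAsStream: a leading "/" marks the name as absolute
    // and is stripped before the class loader searches the jar.
    static boolean foundInJar(File jar, String scriptPath) {
        String name = scriptPath.startsWith("/") ? scriptPath.substring(1) : scriptPath;
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { jar.toURI().toURL() }, null);
             InputStream is = loader.getResourceAsStream(name)) {
            return is != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String script = "/home/u/ruby/scriptingudfs.rb"; // hypothetical path
        File bad = writeJar(script);                // entry keeps the leading "/"
        File good = writeJar(script.substring(1));  // entry stored without it
        System.out.println("entry with leading slash found:    " + foundInJar(bad, script));
        System.out.println("entry without leading slash found: " + foundInJar(good, script));
    }
}
```

The first lookup should report false and the second true, which matches the failure seen with the job jar.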
> In fact, the test passes if you run it in local mode or from an installed Pig. The reason
> is that "scriptingudfs.rb" exists on the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb),
> so it is read directly from the file system.
> The fix on UNIX seems straightforward: when registering UDF scripts, we can simply strip
> the leading "/". For example,
> {code:title=src/org/apache/pig/PigServer.java}
> -        pigContext.addScriptFile(f.getPath());
> +        String key = f.isAbsolute() ? f.getPath().substring(1) : f.getPath();
> +        pigContext.addScriptFile(key, f.getPath());
> {code}
> As a result, the UDF scripts are stored in the job jar without the leading "/",
> as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> But this won't work on Windows or with S3, as their root dir is not "/".
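To make the Windows concern concrete, here is a small sketch that applies the same substring(1) step to plain strings. The class name, helper name, and both paths are hypothetical, and the S3 case is not modelled.

```java
class RootPrefixDemo {
    // Same transformation as the proposed fix, applied to a path that is
    // already known to be absolute.
    static String dropFirstChar(String absolutePath) {
        return absolutePath.substring(1);
    }

    public static void main(String[] args) {
        // UNIX-style path: the result is a usable jar entry name.
        System.out.println(dropFirstChar("/home/u/scriptingudfs.rb"));  // home/u/scriptingudfs.rb
        // Windows-style path: only the drive letter is removed.
        System.out.println(dropFirstChar("C:\\pig\\scriptingudfs.rb")); // :\pig\scriptingudfs.rb
    }
}
```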
> Alternatively, we could store the UDF scripts in the job jar under the file name instead
> of the full absolute path. But that would prevent registering more than one UDF script
> with the same name from different paths.
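The collision in the file-name approach can be sketched as follows. This is an illustration only: the class, the two helpers, and the paths are hypothetical, and a plain map stands in for the set of job jar entries.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

class ScriptKeyDemo {
    // Keys each script by its file name only, as in the alternative above.
    static Map<String, String> registerByFileName(String... paths) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String p : paths) {
            entries.put(new File(p).getName(), p); // a later script overwrites an earlier one
        }
        return entries;
    }

    // Keys each script by its full path minus the leading "/", as in the UNIX fix.
    static Map<String, String> registerByStrippedPath(String... paths) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String p : paths) {
            entries.put(p.startsWith("/") ? p.substring(1) : p, p);
        }
        return entries;
    }

    public static void main(String[] args) {
        String a = "/home/a/udfs.rb"; // hypothetical scripts with the same file name
        String b = "/home/b/udfs.rb";
        System.out.println(registerByFileName(a, b).size());     // 1: only one script survives
        System.out.println(registerByStrippedPath(a, b).size()); // 2: both are kept
    }
}
```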
> I am wondering if anyone has a better suggestion. Thanks!


