hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1697) Document the behavior of -file option in streaming
Date Mon, 31 May 2010 11:44:38 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873625#action_12873625 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1697:

Streaming -info (StreamJob.exitUsage) says
-file     <file>     File/dir to be shipped in the Job jar file
When I tried passing a directory through the -file option, the contents of the directory were
added to the job jar, not the directory itself.
After MAPREDUCE-967, because the contents of the passed directory are not added to the *jar
unpack pattern*, the files/dirs inside the passed directory are not unjarred. Thus they are
not symlinked from the cwd of the task. I raised MAPREDUCE-1826 for this. We can update the
"behavior of passing a directory through -file option" in MAPREDUCE-1826 itself.
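
To illustrate the difference between adding a directory's contents and adding the directory
itself, here is a minimal Python sketch. It is not the StreamJob packaging code; the function
name jar_entries_for_dir and the keep_dir_name flag are hypothetical, and a zip archive built
with the standard zipfile module stands in for the job jar.

```python
import io
import os
import zipfile


def jar_entries_for_dir(dir_path, keep_dir_name):
    """Return the entry names an archive holds after adding dir_path.

    keep_dir_name=False mimics the behavior observed above: only the
    contents of the directory are added. keep_dir_name=True is the
    hypothetical alternative in which the directory itself is preserved.
    """
    dir_path = os.path.abspath(dir_path)
    # Entry names are computed relative to this base directory.
    base = os.path.dirname(dir_path) if keep_dir_name else dir_path
    buf = io.BytesIO()  # in-memory stand-in for job.jar
    with zipfile.ZipFile(buf, "w") as jar:
        for root, _dirs, files in os.walk(dir_path):
            for name in files:
                full = os.path.join(root, name)
                jar.write(full, os.path.relpath(full, base))
    with zipfile.ZipFile(buf) as jar:
        return sorted(jar.namelist())
```

For a directory named scripts that holds a.pl, the first mode yields the entry a.pl and the
second yields scripts/a.pl, which is why no "scripts" entry exists to unpack or symlink.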

Along with the documentation changes, I would like to deprecate the -file option in this JIRA
in favor of MAPREDUCE-574.

> Document the behavior of -file option in streaming
> --------------------------------------------------
>                 Key: MAPREDUCE-1697
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1697
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming, documentation
>    Affects Versions: 0.20.1
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.21.0, 0.22.0
>         Attachments: patch-1697-1.txt, patch-1697.txt
> The behavior of the -file option in streaming is not documented anywhere.
> The behavior of -file is the following:
> 1) All the files passed through the -file option are packaged into job.jar.
> 2) If the -file option is used for .class or .jar files, they are unjarred on the tasktracker
> and placed in ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/classes or /lib, respectively.
> Symlinks to the directories classes and lib are created from the cwd of the task. The names
> of the symlinks are "classes" and "lib", so the file names of .class or .jar files do not appear
> in the cwd of the task.
> Paths to these files are automatically added to the classpath. The tricky part is that the Hadoop
> framework can pick up .class or .jar files using the classpath, but the actual mapper script cannot.
> If you'd like to access these .class or .jar files inside the script, please do something like "java -cp lib/*;classes/*
> 3) If the -file option is used for files other than .class or .jar (e.g., .txt or .pl), these
> files are unjarred into ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/. Symlinks to
> these files are created from the cwd of the task. Names of these symlinks are actually file
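
The placement rules quoted above can be summarized in a short Python sketch. This is only an
illustration of the description, not code from Hadoop: the function unjar_location and the
constant JOB_JARS are hypothetical names. The quoted text is cut off before it states the
symlink names for other files, so the sketch assumes each such symlink carries the file's own
name.

```python
import posixpath

# Placeholder path copied from the description; job_ID is not a real job id.
JOB_JARS = "${mapred.local.dir}/taskTracker/jobcache/job_ID/jars"


def unjar_location(file_name):
    """Return (directory where the file is unjarred, symlink name in the task cwd)."""
    base = posixpath.basename(file_name)
    if base.endswith(".class"):
        # .class files go under jars/classes; only "classes" is linked from cwd.
        return posixpath.join(JOB_JARS, "classes"), "classes"
    if base.endswith(".jar"):
        # .jar files go under jars/lib; only "lib" is linked from cwd.
        return posixpath.join(JOB_JARS, "lib"), "lib"
    # Other files (.txt, .pl, ...) are unjarred directly into jars/.
    # Assumption: the per-file symlink is named after the file itself.
    return JOB_JARS, base
```

Calling unjar_location("mapper.pl") returns the jars/ directory with the symlink name
mapper.pl, while unjar_location("Foo.class") returns jars/classes with the symlink name classes.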

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
