hadoop-common-dev mailing list archives

From "Prachi Gupta (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1853) multiple -cacheFile option in hadoop streaming does not seem to work
Date Thu, 06 Sep 2007 21:44:28 GMT
multiple -cacheFile option in hadoop streaming does not seem to work 
---------------------------------------------------------------------

                 Key: HADOOP-1853
                 URL: https://issues.apache.org/jira/browse/HADOOP-1853
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/streaming
            Reporter: Prachi Gupta


Specifying one -cacheFile option in hadoop streaming works. Specifying more than one gives
a parse error. A patch to fix this, along with a unit test for the fix, is attached to
this bug. The following example shows the bug:

This works:
-----------------------
[hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" -mapper 
"testcache.py abc" -output "/user/parthas/qc/exp2/filterData/subLab/0" 
-file "/home/parthas/proj/qc/bin/testcache.py" -cacheFile 
'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc' 
-jobconf mapred.map.tasks=1 -jobconf 
mapred.job.name="SubByLabel-101-0.ulab.aa" -jobconf numReduceTasks=0
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/home/parthas/proj/qc/bin/testcache.py, 
/export/crawlspace/kryptonite/hod/tmp/hod-1467-tmp/hadoop-unjar56313/] 
[] /tmp/streamjob56314.jar tmpDir=null
07/07/25 16:51:31 INFO mapred.FileInputFormat: Total input paths to 
process : 1
07/07/25 16:51:32 INFO streaming.StreamJob: getLocalDirs(): 
[/export/crawlspace/kryptonite/hod/tmp/hod-1467-tmp/mapred/local]
07/07/25 16:51:32 INFO streaming.StreamJob: Running job: job_0006
07/07/25 16:51:32 INFO streaming.StreamJob: To kill this job, run:
07/07/25 16:51:32 INFO streaming.StreamJob: 
/export/crawlspace/kryptonite/hadoop/mapred/current/bin/../bin/hadoop 
job  -Dmapred.job.tracker=kry1590:50264 -kill job_0006
07/07/25 16:51:32 INFO streaming.StreamJob: Tracking URL: 
http://kry1590.inktomisearch.com:56285/jobdetails.jsp?jobid=job_0006
07/07/25 16:51:33 INFO streaming.StreamJob:  map 0%  reduce 0%
07/07/25 16:51:34 INFO streaming.StreamJob:  map 100%  reduce 0%
07/07/25 16:51:40 INFO streaming.StreamJob:  map 100%  reduce 100%
07/07/25 16:51:40 INFO streaming.StreamJob: Job complete: job_0006
07/07/25 16:51:40 INFO streaming.StreamJob: Output: 
/user/parthas/qc/exp2/filterData/subLab/0
---------------
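
In the command above, the text after '#' in the -cacheFile URI (abc) is the same
name that is passed to testcache.py as its argument, so the fragment acts as the
local link name for the cached file. A minimal Python sketch of splitting such a
URI follows. The helper name split_cache_uri is hypothetical and is not part of
Hadoop:

```python
from urllib.parse import urldefrag


def split_cache_uri(uri):
    """Hypothetical helper: split a -cacheFile URI into (file URI, link name)."""
    # urldefrag separates the '#fragment' part from the rest of the URI.
    base, fragment = urldefrag(uri)
    return base, fragment


if __name__ == "__main__":
    uri = "hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc"
    base, link = split_cache_uri(uri)
    print("file:", base)
    print("link:", link)
```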

This does not:
----------------------
[hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" -mapper 
"testcache.py abc def" -output 
"/user/parthas/qc/exp2/filterData/subLab/0" -file 
"/home/parthas/proj/qc/bin/testcache.py" -cacheFile 
'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc' 
-cacheFile 
'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def' 
-jobconf mapred.map.tasks=1 -jobconf 
mapred.job.name="SubByLabel-101-0.ulab.aa" -jobconf numReduceTasks=0
07/07/25 16:52:17 ERROR streaming.StreamJob: Unexpected 
hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def 
while processing 
-input|-output|-mapper|-combiner|-reducer|-file|-dfs|-jt|-additionalconfspec|-inputformat|-outputformat|-partitioner|-numReduceTasks|-inputreader|||-cacheFile|-cacheArchive|-verbose|-info|-debug|-inputtagged|-help
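
The error above reports the value of the second -cacheFile as unexpected. The
behavior the reporter expects is that a repeated option accumulates its values.
A minimal Python sketch of that behavior follows. It is not StreamJob's actual
parsing code and not the attached patch; the function name collect_options and
the choice of repeatable flags are assumptions for illustration, and the sketch
assumes every flag takes exactly one value:

```python
def collect_options(argv, repeatable=("-cacheFile", "-cacheArchive", "-jobconf", "-file")):
    """Hypothetical parser: gather flag/value pairs, keeping every value of repeatable flags."""
    options = {}
    i = 0
    while i < len(argv):
        flag = argv[i]
        # Assumption: each flag starts with '-' and is followed by one value.
        if not flag.startswith("-") or i + 1 >= len(argv):
            raise ValueError("Unexpected %s" % flag)
        value = argv[i + 1]
        if flag in repeatable:
            # Repeated occurrences append to a list instead of failing.
            options.setdefault(flag, []).append(value)
        elif flag in options:
            raise ValueError("Option %s given more than once" % flag)
        else:
            options[flag] = value
        i += 2
    return options


if __name__ == "__main__":
    args = [
        "-input", "/user/parthas/test/tmp.data",
        "-cacheFile", "hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc",
        "-cacheFile", "hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def",
    ]
    for uri in collect_options(args)["-cacheFile"]:
        print(uri)
```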


Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
	 $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
 -input    <path>     DFS input file(s) for the Map step
 -output   <path>     DFS output directory for the Reduce step
 -mapper   <cmd|JavaClassName>	    The streaming command to run
 -combiner <JavaClassName> Combiner has to be a Java class
 -reducer  <cmd|JavaClassName>	    The streaming command to run
 -file	   <file>     File/dir to be shipped in the Job jar file
 -dfs	 <h:p>|local  Optional. Override DFS configuration
 -jt	 <h:p>|local  Optional. Override JobTracker configuration
 -additionalconfspec specfile  Optional.
 -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
 -outputformat TextOutputFormat(default)|JavaClassName	Optional.
 -partitioner JavaClassName  Optional.
 -numReduceTasks <num>	Optional.
 -inputreader <spec>  Optional.
 -jobconf  <n>=<v>    Optional. Add or override a JobConf property
 -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
 -cacheFile fileNameURI
 -cacheArchive fileNameURI
 -verbose

For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info


 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

