hadoop-mapreduce-user mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: Hadoop streaming command : -file option to pass a directory to jobcache
Date Thu, 18 Mar 2010 08:24:35 GMT
You can archive (zip) the directory and pass the archive instead.
If you pass it with the -file option, you may have to unarchive it yourself. Alternatively,
use the -archives option, which unarchives it for you.
Please see http://hadoop.apache.org/common/docs/r0.20.0/commands_manual.html#Generic+Options
for more details.
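
As a rough illustration of both routes, here is a minimal Python sketch. It is only a sketch under stated assumptions: the function names, the archive name pylib.zip, and the link name "pylib" are hypothetical, and it assumes that generic options such as -archives go before the streaming options and that the "#pylib" suffix names the link under which the unpacked archive appears in the task's working directory. Check both against the manual linked above.

```python
import os
import shutil
import sys


def zip_package_dir(src_dir, archive_base):
    """Zip the contents of src_dir into archive_base + '.zip'; return its path."""
    # root_dir=src_dir stores the files at the top level of the archive,
    # e.g. 'pkg1Cls.py' rather than 'src_dir/pkg1Cls.py'.
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


def streaming_command(archive_path, link_name="pylib"):
    """Build the argument list for the streaming job from the original post."""
    return [
        "bin/hadoop", "jar", "hadoop-0.20.0-streaming.jar",
        # Assumption: generic options come before the streaming options.
        "-archives", "%s#%s" % (archive_path, link_name),
        "-mapper", "pkg2Cls.py",
        "-jobconf", "mapred.map.tasks=5",
        "-jobconf", "mapred.reduce.tasks=0",
        "-input", "/usr/test/linecount",
        "-output", "linecountresults",
        "-file", "pkg2Cls.py",
    ]


def use_shipped_modules(path):
    """Mapper side: make modules under path (a directory or a .zip) importable."""
    # Python can import .py modules straight from a zip file, so an archive
    # shipped with -file does not strictly have to be unpacked first.
    sys.path.insert(0, os.path.abspath(path))
```

With -archives, the mapper would call use_shipped_modules("pylib") before importing its helper modules. With -file pylib.zip, it would call use_shipped_modules("pylib.zip") instead and rely on Python's zip import support.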


On 3/18/10 11:23 AM, "venkata subbarayudu" <avsrit2005@gmail.com> wrote:

Hi All,
       I am new to Hadoop and am using Python to write MapReduce tasks. I run the
streaming job with the following command:

bin/hadoop jar hadoop-0.20.0-streaming.jar -mapper pkg2Cls.py -jobconf mapred.map.tasks=5
-jobconf mapred.reduce.tasks=0 -input /usr/test/linecount  -output linecountresults -file
pkg2Cls.py -file pkg1Cls.py

This works fine. But now I want to pass the entire directory of my Python files
to the -file option, instead of passing each file with its own -file option.

How can I do this?

Thanks for your help in advance.
Subbarayudu Amanchi.
