hadoop-mapreduce-user mailing list archives

From venkata subbarayudu <avsrit2...@gmail.com>
Subject Re: Hadoop streaming command : -file option to pass a directory to jobcache
Date Thu, 18 Mar 2010 12:50:29 GMT
Hi Amareshwari,
          Thanks for your quick reply. I am not sure whether the "-archives"
option can be used with the streaming command when the streaming jar is run
with bin/hadoop jar. Could you please show how it should be used? Would it
be something like this:

  bin/hadoop jar hadoop-0.20.0-streaming.jar -archives /user/test/src.har
-mapper pkg2Cls.py -jobconf mapred.map.tasks=5 -jobconf
mapred.reduce.tasks=0 -input /usr/test/linecount  -output linecountresults
-file pkg2Cls.py -file pkg1Cls.py
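A minimal Python sketch of the "archive/zip the directory" step suggested below, assuming the Python sources live in one local directory (the names src/ and pylib.zip used with it are hypothetical). It packs every .py file into a single zip and keeps paths relative to that directory, so pkg1Cls.py ends up at the top level of the archive. Note that this builds an ordinary zip file, which is not the same thing as a Hadoop archive (.har).

```python
import os
import zipfile
from typing import List


def zip_python_dir(src_dir: str, archive_path: str) -> List[str]:
    """Pack every .py file under src_dir into a zip archive.

    Member names are relative to src_dir: src_dir/pkg1Cls.py is stored as
    "pkg1Cls.py" and src_dir/sub/helper.py as "sub/helper.py".
    Returns the sorted list of member names written to the archive.
    """
    members = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                # Only ship Python sources; skip notes, data files, etc.
                if not name.endswith(".py"):
                    continue
                full_path = os.path.join(root, name)
                # Zip member names always use "/" as the separator.
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
                members.append(arcname)
    return sorted(members)
```

For example, zip_python_dir("src", "pylib.zip") returns the names stored in the archive, which is a quick way to check that the modules landed at the top level before shipping the single file with -file (or with -archives, if you want Hadoop to unpack it).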


On Thu, Mar 18, 2010 at 1:54 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

>  You can archive/zip the directory and pass it.
>  You might have to unarchive it yourself if you use the -file option. You can
> use the -archives option instead, which will unarchive it for you.
> Please see
> http://hadoop.apache.org/common/docs/r0.20.0/commands_manual.html#Generic+Options
> for more details.
>
> -Amareshwari
>
>
> On 3/18/10 11:23 AM, "venkata subbarayudu" <avsrit2005@gmail.com> wrote:
>
> Hi All,
>        I am new to Hadoop and am using Python to write MapReduce tasks. To
> run a streaming job I am using the following command.
>
> bin/hadoop jar hadoop-0.20.0-streaming.jar -mapper pkg2Cls.py -jobconf
> mapred.map.tasks=5 -jobconf mapred.reduce.tasks=0 -input
> /usr/test/linecount  -output linecountresults -file pkg2Cls.py -file
> pkg1Cls.py
>
> This works fine. But now I want to pass the entire directory of my Python
> files with a single -file option, instead of passing each file with its
> own -file option.
>
> How can I do this?
>
>
> Thanks for your help in advance.
> Subbarayudu Amanchi.
>
>
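If the zip is shipped with -file, it arrives in the task's working directory still packed. Rather than unarchiving it by hand, a Python mapper can put the zip itself on sys.path, because Python's built-in zipimport support can import pure-Python modules directly from a .zip file. A minimal sketch, assuming the archive is named pylib.zip (hypothetical) and was built with the modules at its top level; use_shipped_zip is likewise a hypothetical helper name:

```python
import os
import sys


def use_shipped_zip(archive_name: str = "pylib.zip") -> str:
    """Put a zip shipped with -file on sys.path so its modules can be imported.

    Assumes the zip sits in the task's current working directory, which is
    where streaming places files passed with -file.
    Returns the absolute path that was added to sys.path.
    """
    archive_path = os.path.abspath(archive_name)
    if not os.path.isfile(archive_path):
        # Fail loudly: a missing archive usually means the -file option was left out.
        raise FileNotFoundError(archive_path)
    # A .zip entry on sys.path is handled by Python's zipimport machinery,
    # so pure-Python modules inside it import without being unpacked.
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    return archive_path
```

In this setup the mapper script pkg2Cls.py would still be passed with its own -file option, would call use_shipped_zip() before importing pkg1Cls, and pylib.zip would be added with one more -file option. zipimport cannot load compiled extension modules (.so/.pyd), so for those the archive has to be unpacked, which is what -archives does for you.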
