hadoop-common-user mailing list archives

From: "Yoram Arnon" <yar...@yahoo-inc.com>
Subject: RE: HadoopStreaming
Date: Wed, 18 Oct 2006 18:44:09 GMT

You may specify multiple -input "<wildcard>" statements.
Take care to quote the wildcard part to prevent your local shell from
parsing it.
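
A minimal sketch of why the quoting matters, using a throwaway local directory (the directory and file names are made up for illustration, and count_args is a hypothetical helper, not part of Hadoop). Unquoted, the local shell expands the pattern against any matching local files before the streaming launcher sees it; quoted, the pattern is passed through as a single literal word.

```shell
#!/bin/sh
# Hypothetical scratch directory; all names here are illustrative only.
tmp=$(mktemp -d)
mkdir "$tmp/kjv"
touch "$tmp/kjv/part-0" "$tmp/kjv/part-1"
cd "$tmp" || exit 1

# count_args prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args -input kjv/*)    # local shell expands kjv/* first
quoted=$(count_args -input "kjv/*")    # pattern stays one literal word

echo "unquoted: $unquoted args"
echo "quoted:   $quoted args"

# Clean up the scratch directory.
cd / && rm -rf "$tmp"
```

With two local files matching, the unquoted form delivers three arguments and the quoted form delivers two.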
You may specify any property you like using -jobconf. Common uses are
mapred.map.tasks and mapred.reduce.tasks, to override the default number of
map and reduce tasks, but any property is allowed.
Another useful argument is -cmdenv <key>=<value>, which overrides environment
variables. A common use is to ship a dynamic library and set LD_LIBRARY_PATH
to '.', but you can override any variable your program expects.
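
Putting these options together, here is a rough dry-run sketch. It only builds the argument list and prints it; nothing is submitted. The task counts, the relative script paths, and libexample.so are placeholders I made up for illustration; the flags and the hadoop-streaming launcher name are the ones used in this thread.

```shell
#!/bin/sh
# Dry run: build the argument list and print it instead of launching a job.
# Task counts, relative paths, and libexample.so are made-up placeholders.
set -- \
  -input "kjv/*" \
  -output kjvout \
  -mapper "/usr/bin/python mapper.py" \
  -reducer "/usr/bin/python reducer.py" \
  -file mapper.py \
  -file reducer.py \
  -file libexample.so \
  -jobconf mapred.map.tasks=10 \
  -jobconf mapred.reduce.tasks=2 \
  -jobconf mapred.job.name=kjv \
  -cmdenv LD_LIBRARY_PATH=.

# Print one argument per line so the effect of the quoting is visible.
for arg in "$@"; do
  printf '[%s]\n' "$arg"
done

# To actually submit, replace the loop above with: hadoop-streaming "$@"
```

Keeping the arguments in "$@" means the quoted wildcard and the quoted mapper and reducer commands each stay a single word when they are finally handed to the launcher.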

Yoram

> -----Original Message-----
> From: Hairong Kuang [mailto:hairong@yahoo-inc.com] 
> Sent: Tuesday, October 17, 2006 7:11 PM
> To: hadoop-user@lucene.apache.org
> Subject: RE: HadoopStreaming
> 
> Hadoop streaming assumes that inputs are files. If kjv is a directory, you
> may use the option "-input kjv/*".
> 
> Hairong
> 
> -----Original Message-----
> From: Andrew McNabb [mailto:amcnabb@mcnabbs.org] 
> Sent: Tuesday, October 17, 2006 6:22 PM
> To: hadoop-user@lucene.apache.org
> Subject: Re: HadoopStreaming
> 
> On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:
> > Try changing your command to read
> > 
> > hadoop-streaming \
> > -mapper "/usr/bin/python mapper.py" \
> > -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
> > -reducer "/usr/bin/python reducer.py" \
> > -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
> > -input kjv \
> > -output kjvout
> 
> I'll try this first thing in the morning.
> 
> > I assume kjv is a file and kjvout is a directory - they should be.
> 
> Actually, I was doing it the same way as other Hadoop stuff I've done:
> kjv is a directory in DFS.  Does HadoopStreaming do it in a different way
> from most other Hadoop stuff?
> 
> In any case, how do I make it take a directory as input if that's what I
> need?
> 
> > I also assume /usr/bin/python is the path to python *on the cluster 
> > machines*. Otherwise, you can do -mapper "python mapper.py" -file 
> > /usr/bin/python -file /home/amcnabb/svn/mrpso/python/mapper.py
> 
> > I recommend adding -jobconf mapred.job.name="kjv", to make the 
> > jobtracker history more readable.
> > 
> 
> I didn't know about that option.  I'll do that.
> 
> Thanks for all of the tips.
> 
> --
> Andrew McNabb
> http://www.mcnabbs.org/andrew/
> PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
> 
> 

