hadoop-common-user mailing list archives

From "Hairong Kuang" <hair...@yahoo-inc.com>
Subject RE: HadoopStreaming
Date Wed, 18 Oct 2006 02:10:44 GMT
Hadoop streaming assumes that inputs are files. If kjv is a directory, you
may use the option "-input kjv/*".
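For context, a sketch of how that option might slot into the full streaming invocation quoted later in this message (paths and file names are taken from the thread; this is not a tested command):

```shell
# Sketch only: same command as quoted below, but with a glob so that
# every file under the kjv/ directory is used as streaming input.
hadoop-streaming \
  -mapper "/usr/bin/python mapper.py" \
  -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
  -reducer "/usr/bin/python reducer.py" \
  -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
  -input "kjv/*" \
  -output kjvout
```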


-----Original Message-----
From: Andrew McNabb [mailto:amcnabb@mcnabbs.org] 
Sent: Tuesday, October 17, 2006 6:22 PM
To: hadoop-user@lucene.apache.org
Subject: Re: HadoopStreaming

On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:
> Try changing your command to read
> hadoop-streaming \
>   -mapper "/usr/bin/python mapper.py" \
>   -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
>   -reducer "/usr/bin/python reducer.py" \
>   -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
>   -input kjv \
>   -output kjvout
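As background, a streaming mapper and reducer like the mapper.py and reducer.py above conventionally read raw lines on stdin and write tab-separated key/value pairs on stdout. The thread does not show the real scripts, so the word-count logic in this minimal sketch is purely illustrative:

```python
import sys

def map_line(line):
    # Emit a (word, 1) pair for every whitespace-separated token.
    return [(word, 1) for word in line.split()]

def reduce_group(key, counts):
    # Sum the counts that streaming grouped under one sorted key.
    return (key, sum(counts))

if __name__ == "__main__":
    # Run as mapper.py: hadoop-streaming feeds raw input lines on stdin
    # and expects "key<TAB>value" lines on stdout.
    for line in sys.stdin:
        for key, value in map_line(line):
            sys.stdout.write("%s\t%d\n" % (key, value))
```

A reducer script would do the mirror image: read the sorted `key\tvalue` lines from stdin, group consecutive lines by key, and call something like `reduce_group` on each group.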

I'll try this first thing in the morning.

> I assume kjv is a file and kjvout is a directory - they should be.

Actually, I was doing it the same way as other Hadoop stuff I've done:
kjv is a directory in DFS.  Does HadoopStreaming do it in a different way
from most other Hadoop stuff?

In any case, how do I make it take a directory as input, if that's what I
need to do?
> I also assume /usr/bin/python is the path to python *on the cluster 
> machines*. Otherwise, you can do -mapper "python mapper.py" -file 
> /usr/bin/python -file /home/amcnabb/svn/mrpso/python/mapper.py

> I recommend adding -jobconf mapred.job.name="kjv" to make the
> jobtracker history more readable.
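Putting those two suggestions together, the invocation might look roughly like this (a sketch using the paths from the thread, not a tested command):

```shell
# Sketch: ship the interpreter itself with -file, and name the job
# so it is easy to spot in the jobtracker history.
hadoop-streaming \
  -mapper "python mapper.py" \
  -file /usr/bin/python \
  -file /home/amcnabb/svn/mrpso/python/mapper.py \
  -input kjv \
  -output kjvout \
  -jobconf mapred.job.name="kjv"
```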

I didn't know about that option.  I'll do that.

Thanks for all of the tips.

Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
