hadoop-common-user mailing list archives

From Alan Drew <drewsk...@yahoo.com>
Subject hadoop streaming reducer values
Date Wed, 13 May 2009 02:55:13 GMT

Hi,

I have a question about the <key, values> that the reducer gets in Hadoop
Streaming.

I wrote a simple mapper.sh, reducer.sh script files:

mapper.sh : 

#!/bin/bash

while read data
do
  #tokenize the data and output the values <word, 1>
  echo "$data" | awk '{token=0; while(++token<=NF) print $token"\t1"}'
done

reducer.sh :

#!/bin/bash

while read data
do
  echo -e "$data"
done

The mapper tokenizes a line of input and outputs <word, 1> pairs to standard
output.  The reducer just outputs what it gets from standard input.
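(Not part of the original message: a streaming job like this can be sanity-checked locally, since streaming behaves roughly like map | sort | reduce. A sketch with the two scripts above inlined as shell functions; names and sample input are just for illustration.)

```shell
#!/bin/bash
# Local simulation of the streaming job: map, shuffle-sort, reduce.
# The function bodies mirror mapper.sh and reducer.sh above.

mapper() {
  while read data; do
    # tokenize the line and emit <word, 1> pairs, tab-separated
    echo "$data" | awk '{token=0; while(++token<=NF) print $token"\t1"}'
  done
}

reducer() {
  while read data; do
    # pass each line through unchanged (quotes preserve the tab)
    echo -e "$data"
  done
}

# Same two-line input as the example below.
printf 'cat in the hat\nate my mat the\n' | mapper | sort | reducer
```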

I have a simple input file:

cat in the hat
ate my mat the

I was expecting the final output to be something like:

the 1 1 1 
cat 1

etc.

but instead each occurrence of a word comes out on its own line, which makes me
think the reducer is being given individual <key, value> pairs rather than the
grouped <key, values> that a normal (Java) Hadoop reducer gets by default. Is
that right?

the 1
the 1
the 1
cat 1

Is there any way to get <key, values> in the reducer instead of a stream of
individual <key, value> pairs?  I looked into the -reducer aggregate option,
but it doesn't seem to let me customize what the reducer does with the <key,
values> beyond built-in functions like max and min.
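(A sketch of one common workaround, not from the original message: because the streaming framework sorts map output by key, all lines for a given key arrive at the reducer consecutively, so the reducer can detect key boundaries itself and build the grouped line. The function name is hypothetical.)

```shell
#!/bin/bash
# group_values (hypothetical): collapse consecutive "key<TAB>value" lines
# into "key<TAB>v1<TAB>v2...". Relies on the input being sorted by key,
# which the streaming shuffle guarantees.
group_values() {
  awk -F'\t' '
    $1 != prev {                 # key changed: flush the previous line
      if (NR > 1) print out
      prev = $1; out = $1
    }
    { out = out "\t" $2 }        # append this value after the key
    END { if (NR > 0) print out }
  '
}

# Example: simulate part of the sorted stream a reducer would see.
printf 'cat\t1\nhat\t1\nthe\t1\nthe\t1\nthe\t1\n' | group_values
```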

Thanks.
-- 
View this message in context: http://www.nabble.com/hadoop-streaming-reducer-values-tp23514523p23514523.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

