hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: hadoop streaming reducer values
Date Wed, 13 May 2009 14:03:56 GMT
You may wish to set the key/value separator to the string comma-space ', '
for your example.

Chapter 7 of my book goes into this in some detail. About a month ago I
posted a graphic that visually depicts the process and the values; the
original post was titled 'Changing key/value separator in hadoop
streaming', and I have attached the graphic.
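To make the separator advice concrete: by default, streaming treats the text before the first tab on a line as the key and the rest of the line as the value (with no tab, the whole line is the key and the value is empty). Changing the separator only changes which string marks that split. The bash sketch below imitates that first-occurrence rule; `split_kv` is a hypothetical helper for illustration, not a Hadoop command.

```shell
#!/bin/bash
# split_kv LINE SEP: hypothetical helper that imitates how streaming splits a
# line into key and value at the FIRST occurrence of the separator.
split_kv() {
  local line=$1 sep=$2
  if [[ $line == *"$sep"* ]]; then
    # key = text before the first SEP, value = everything after it
    printf 'key=[%s] value=[%s]\n' "${line%%"$sep"*}" "${line#*"$sep"}"
  else
    # no separator: the whole line is the key and the value is empty
    printf 'key=[%s] value=[]\n' "$line"
  fi
}
```

For example, `split_kv $'the\t1' $'\t'` prints `key=[the] value=[1]`, while `split_kv 'the, 1, 1' ', '` prints `key=[the] value=[1, 1]`: everything after the first separator, further separators included, stays in the value.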


On Tue, May 12, 2009 at 7:55 PM, Alan Drew <drewsky77@yahoo.com> wrote:

>
> Hi,
>
> I have a question about the <key, values> that the reducer gets in Hadoop
> Streaming.
>
> I wrote simple mapper.sh and reducer.sh script files:
>
> mapper.sh :
>
> #!/bin/bash
>
> while IFS= read -r data
> do
>  # tokenize the line and output one <word, 1> pair per token
>  echo "$data" | awk '{token=0; while(++token<=NF) print $token"\t1"}'
> done
>
> reducer.sh :
>
> #!/bin/bash
>
> while IFS= read -r data
> do
>  # pass each key<TAB>value line through unchanged
>  printf '%s\n' "$data"
> done
>
> The mapper tokenizes a line of input and writes <word, 1> pairs to
> standard output. The reducer just echoes what it reads from standard input.
>
> I have a simple input file:
>
> cat in the hat
> ate my mat the
>
> I was expecting the final output to be something like:
>
> the 1 1
> cat 1
>
> etc.
>
> but instead each word has its own line, which makes me think that the
> reducer is given individual <key, value> pairs rather than <key, values>,
> which is the default for normal Hadoop (in Java), right?
>
> the 1
> the 1
> cat 1
>
> Is there any way to get <key, values> for the reducer instead of a series
> of <key, value> pairs? I looked into the -reducer aggregate option, but
> there doesn't seem to be a way to customize what the reducer does with the
> <key, values> beyond built-in functions such as max and min.
>
> Thanks.
> --
> View this message in context:
> http://www.nabble.com/hadoop-streaming-reducer-values-tp23514523p23514523.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
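On the question itself: a streaming reducer script is not handed an iterator of values per key. It reads one key<TAB>value line per record on standard input, and the framework sorts that input by key so that lines sharing a key arrive next to each other. Rebuilding <key, values> is therefore the script's job: remember the current key and emit the collected values when the key changes. The following is a minimal bash sketch of that pattern, assuming the default tab separator; `mapper`, `group_reducer`, and the local `sort` standing in for the shuffle are illustrative names, not part of Hadoop.

```shell
#!/bin/bash
# Sketch only: simulates a streaming word count locally.
# Assumes the default streaming format of one "key<TAB>value" per line.

tab=$'\t'

# mapper: emit "<word><TAB>1" for every whitespace-separated token
mapper() {
  awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }'
}

# group_reducer: rebuild <key, values> from key-sorted input by
# watching for the key to change between consecutive lines.
group_reducer() {
  local line key value current="" values="" started=0
  while IFS= read -r line; do
    key=${line%%"$tab"*}              # text before the first tab
    if [[ $line == *"$tab"* ]]; then
      value=${line#*"$tab"}           # text after the first tab
    else
      value=""                        # no tab: the whole line is the key
    fi
    if (( started )) && [[ $key != "$current" ]]; then
      printf '%s%s\n' "$current" "$values"   # key changed: flush previous group
      values=""
    fi
    current=$key
    values+=" $value"
    started=1
  done
  if (( started )); then
    printf '%s%s\n' "$current" "$values"     # flush the final group
  fi
}
```

Running `printf 'cat in the hat\nate my mat the\n' | mapper | LC_ALL=C sort | group_reducer` locally prints one line per distinct word, ending with `the 1 1`. Saved as its own script (with `group_reducer` called at the bottom), the same loop could be passed to streaming through the -reducer option in place of the echo-only reducer.sh.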


-- 
Alpha chapters of my book on Hadoop are available at
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com, a community for Hadoop professionals
