hadoop-common-user mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: Hadoop Streaming
Date Wed, 14 Jul 2010 08:06:15 GMT
In streaming, the values for a key are given to the reducer as individual <key, value> pairs again (one pair per line, sorted by key), so
you don't see a key with a list of values.
I think it is done that way to be symmetrical with the mapper, though I don't know the exact reason.
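In practice, this means a streaming reducer has to do the grouping itself by noticing when the key changes between consecutive lines. Below is a minimal Python sketch of that. It assumes the default tab separator between key and value. `parse`, `group_values` and `run` are hypothetical names, not part of Hadoop:

```python
import sys
from itertools import groupby


def parse(line):
    """Split one streaming line into (key, value) at the first tab."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def group_values(lines):
    """Yield (key, [values]) for each run of adjacent lines with the same key.

    itertools.groupby only merges *adjacent* items, so this relies on the
    framework having sorted the reducer's input by key.
    """
    pairs = (parse(line) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Read sorted key<TAB>value lines and write key<TAB>v1, v2, ... lines."""
    for key, values in group_values(stdin):
        stdout.write(key + "\t" + ", ".join(values) + "\n")
```

In the script passed to `-reducer`, a final call to `run()` would read the sorted lines from standard input and print one line per key. The three Adhikari records would then come out as a single line whose values appear in whatever order Hadoop delivered them.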


On 7/14/10 1:05 PM, "Moritz Krog" <moritzkrog@googlemail.com> wrote:

Hi everyone,

I'm pretty new to Hadoop and generally avoid Java wherever I can, so
I'm getting started with Hadoop Streaming and a Python mapper and reducer.
From what I read in the MapReduce tutorial, the mapper and reducer can be plugged
into Hadoop via the "-mapper" and "-reducer" options at job start. I was
wondering what the input for the reducer would look like, so I ran a Hadoop
job using my own mapper but /bin/cat as the reducer. As you can see, the output
of the job is ordered, but the keys haven't been combined:

{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   107488
{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560
{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95562

I would have expected something like:

{'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560, 95562, 107488

My understanding from the tutorial was that this reduction is part of the
shuffle and sort phase. Or do I need to use a combiner to get that done?
Does Hadoop Streaming even do this, or do I need to use a native Java class?
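Piping the mapper output through a sort and then the reducer locally is a rough way to see this behavior. The Python sketch below assumes the default tab separator, so the key is the text before the first tab. It emulates only what reaches a single reducer: lines are ordered by key but never merged, which matches the `/bin/cat` output above. `shuffle_and_sort` and `cat_reducer` are hypothetical helpers, not Hadoop APIs:

```python
def shuffle_and_sort(mapper_output):
    """Roughly emulate the input of a single streaming reducer.

    Lines are ordered by key (the text before the first tab); nothing is
    merged. Python's sort is stable, so values for one key keep their input
    order here, whereas Hadoop guarantees no value order within a key.
    """
    return sorted(mapper_output, key=lambda line: line.split("\t", 1)[0])


def cat_reducer(lines):
    """Mimic /bin/cat as a reducer: pass every line through unchanged."""
    return list(lines)


# Hypothetical mapper output: two values for key "a", one for key "b".
mapped = ["b\t1\n", "a\t107488\n", "a\t95560\n"]
print("".join(cat_reducer(shuffle_and_sort(mapped))), end="")
```

After the sort, key `a` still occupies two separate lines. Collapsing them into one line is the reducer script's job.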

