hadoop-common-user mailing list archives

From Moritz Krog <moritzk...@googlemail.com>
Subject Re: Hadoop Streaming
Date Wed, 14 Jul 2010 08:17:06 GMT
First of all, thanks for the quick answer :)

Is there any way to configure the job so that I get the key ->
value list? I specifically need exactly this behavior; it's crucial to what
I want to do with Hadoop.
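For reference, here is a minimal sketch of the usual workaround. Streaming hands the reducer its input sorted by key, one "key<TAB>value" line per pair. With the default partitioner, all pairs for a given key go to the same reducer. The reducer script can therefore rebuild the key -> value list itself by collecting consecutive lines that share a key. The file name `reducer.py` and the helper names are hypothetical. The sketch assumes the default tab separator. The order of values within a key is not guaranteed.

```python
#!/usr/bin/env python
# reducer.py (hypothetical name): regroup sorted streaming input into key -> [values].
import sys
from itertools import groupby


def parse(line):
    """Split one streaming line into (key, value) at the first tab.

    Assumes the default streaming separator (tab); a line without a tab
    is treated as a key with an empty value.
    """
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def group_sorted_pairs(lines):
    """Yield (key, [values]) for each run of consecutive lines sharing a key.

    This relies on the framework having sorted the reducer input by key,
    so all lines with the same key arrive next to each other.
    """
    pairs = (parse(line) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def main():
    for key, values in group_sorted_pairs(sys.stdin):
        # One output line per key: the key, a tab, then the comma-joined values.
        print(key + "\t" + ", ".join(values))


if __name__ == "__main__":
    main()
```

The script would be passed as the reducer (`-reducer reducer.py`) and shipped with the job, e.g. via the streaming `-file` option. No combiner is needed for this; a combiner is only an optional map-side optimization.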

On Wed, Jul 14, 2010 at 10:06 AM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> In streaming, the grouped values are given to the reducer as individual <key, value>
> pairs again, so you don't see a key with a list of values.
> I think it is done that way to be symmetrical with the mapper, though I
> don't know the exact reason.
> Thanks
> Amareshwari
> On 7/14/10 1:05 PM, "Moritz Krog" <moritzkrog@googlemail.com> wrote:
> Hi everyone,
> I'm pretty new to Hadoop and generally avoid Java wherever I can, so
> I'm getting started with Hadoop Streaming and a Python mapper and reducer.
> From what I read in the MapReduce tutorial, the mapper and reducer can be
> plugged into Hadoop via the "-mapper" and "-reducer" options at job start.
> I was wondering what the input for the reducer would look like, so I ran a
> Hadoop job using my own mapper but /bin/cat as the reducer. As you can see,
> the output of the job is sorted, but the values for each key haven't been
> combined:
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   107488
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95562
> I would have expected something like:
> {'lastname': 'Adhikari', 'firstnames': 'P', 'suffix': None, 'type': 'person'}   95560, 95562, 107488
> My understanding from the tutorial was that this grouping is part of the
> shuffle and sort phase. Or do I need to use a combiner to get that done?
> Does Hadoop Streaming even do this, or do I need to use a native Java
> class?
> Best,
> Moritz
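The `/bin/cat` output above can be reproduced locally by approximating the shuffle-and-sort step. Streaming scripts are commonly tested this way by piping the mapper through `sort` before the reducer. The sketch below is only a rough local model. The helper names are hypothetical. It sorts lines by the text before the first tab and passes them through unchanged, like `-reducer /bin/cat`. Partitioning across several reducers is not modelled. Because Python's `sorted` is stable, equal keys keep their input order here; Hadoop itself makes no such promise for values.

```python
def streaming_key(line):
    """Return the key part of a streaming line: the text before the first tab."""
    return line.partition("\t")[0]


def simulate_shuffle(mapper_output):
    """Roughly approximate what a single streaming reducer receives.

    Lines are sorted by key only; they are NOT merged into key -> list form,
    so a key with several values still appears on several lines.
    """
    return sorted(mapper_output, key=streaming_key)


def cat_reducer(lines):
    """Behave like `-reducer /bin/cat`: pass every line through unchanged."""
    return list(lines)


if __name__ == "__main__":
    mapper_output = ["b\t1", "a\t2", "b\t3"]
    # Key "b" is printed on two separate lines, as in the job output above.
    for line in cat_reducer(simulate_shuffle(mapper_output)):
        print(line)
```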
