hadoop-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: Join-package combiner number of input and output records the same
Date Tue, 25 Sep 2012 13:37:59 GMT
Out of curiosity: did you change the partitioner or the comparators? And
how did you implement the equals and hashCode methods of your objects?
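
To make the question concrete, here is a minimal sketch of a key type whose equals, hashCode and compareTo agree with each other. JoinKey, its fields and partitionFor are hypothetical names chosen for illustration, not classes from this thread or from Hadoop. The partition arithmetic mirrors what Hadoop's default HashPartitioner does with key.hashCode(), so a key type that inherits Object.hashCode() can send equal keys to different partitions.

```java
import java.util.Objects;

// Hypothetical composite key, for illustration only.
final class JoinKey implements Comparable<JoinKey> {
    private final String table;
    private final long id;

    JoinKey(String table, long id) {
        this.table = table;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JoinKey)) return false;
        JoinKey other = (JoinKey) o;
        return id == other.id && table.equals(other.table);
    }

    // Must be derived from the same fields as equals, so equal keys hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(table, id);
    }

    // Ordering consistent with equals: compareTo == 0 exactly when equals is true.
    @Override
    public int compareTo(JoinKey other) {
        int c = table.compareTo(other.table);
        return c != 0 ? c : Long.compare(id, other.id);
    }

    // Same arithmetic as the default hash partitioning: non-negative hash modulo partitions.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

class JoinKeyDemo {
    public static void main(String[] args) {
        JoinKey a = new JoinKey("users", 42L);
        JoinKey b = new JoinKey("users", 42L);
        System.out.println(a.equals(b));
        System.out.println(a.compareTo(b) == 0);
        System.out.println(JoinKey.partitionFor(a, 4) == JoinKey.partitionFor(b, 4));
    }
}
```

If any of the three checks printed false for two logically identical keys, records for the same key could be partitioned or grouped apart, which would also leave a combiner with nothing to merge.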



On Tue, Sep 25, 2012 at 3:32 PM, Björn-Elmar Macek wrote:

> Hi,
> I had this problem once too. Did you properly override the reduce method
> and mark it with the @Override annotation?
> Does your reduce method use OutputCollector or Context for gathering
> outputs? If you are using the current version, it has to be Context.
> The thing is: if you do NOT override it, the standard reduce function
> (identity) is used, and this of course results in the same number of tuples
> as you read as input.
> Good luck!
> Elmar
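
The pitfall described above can be sketched in plain Java without any Hadoop dependency. MiniReducer is a hypothetical stand-in for a framework base class whose default reduce is the identity. OverloadedSum and OverridingSum are likewise illustrative names. The first subclass declares reduce with an Iterator parameter (the old-API shape) instead of Iterable, so it only overloads the method and the identity default still runs. Adding @Override to that method would turn the mistake into a compile error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a framework base class with an identity reduce.
class MiniReducer {
    protected void reduce(String key, Iterable<Integer> values, List<String> out) {
        for (int v : values) {
            out.add(key + "=" + v); // identity: one output record per input record
        }
    }

    // The "framework" always calls the Iterable-based signature.
    final List<String> run(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        reduce(key, values, out);
        return out;
    }
}

// Wrong parameter type (Iterator): this is an overload, so it is never called by run().
class OverloadedSum extends MiniReducer {
    protected void reduce(String key, Iterator<Integer> values, List<String> out) {
        int sum = 0;
        while (values.hasNext()) sum += values.next();
        out.add(key + "=" + sum);
    }
}

// Matching signature plus @Override: the compiler verifies that this replaces the default.
class OverridingSum extends MiniReducer {
    @Override
    protected void reduce(String key, Iterable<Integer> values, List<String> out) {
        int sum = 0;
        for (int v : values) sum += v;
        out.add(key + "=" + sum);
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(new OverloadedSum().run("k", values)); // [k=1, k=2, k=3]
        System.out.println(new OverridingSum().run("k", values)); // [k=6]
    }
}
```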
> On 25.09.2012 at 11:57, Sigurd Spieckermann
> <sigurd.spieckermann@gmail.com> wrote:
> I think I have tracked down the problem to the point that each split only
> contains one big key-value pair and a combiner is connected to a map task.
> Please correct me if I'm wrong, but I assume each map task takes one split
> and the combiner operates only on the key-value pairs within one split.
> That's why the combiner has no effect in my case. Is there a way to combine
> the mapper outputs of multiple splits before they are sent off to the
> reducer?
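
The assumption in the message above (a combiner only sees the output of its own map task) can be illustrated with a small simulation in plain Java. It is a sketch, not Hadoop code. CombinerScopeDemo and its methods are hypothetical names, and summation is just an example aggregation. When every map task emits a key only once, the per-task combiner emits as many records as it receives, and the merge across tasks happens only in the reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerScopeDemo {
    // Sum the values per key, as a summing combiner or reducer would.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : records) {
            sums.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return sums;
    }

    // Apply the combiner to each map task's output separately; count all emitted records.
    static int combinedRecordCount(List<List<Map.Entry<String, Integer>>> perTaskOutput) {
        int total = 0;
        for (List<Map.Entry<String, Integer>> taskOutput : perTaskOutput) {
            total += sumByKey(taskOutput).size();
        }
        return total;
    }

    static Map.Entry<String, Integer> kv(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    public static void main(String[] args) {
        // Two map tasks, each emitting key "a" once: nothing to combine within a task.
        List<List<Map.Entry<String, Integer>>> onePerTask = Arrays.asList(
                Arrays.asList(kv("a", 1)),
                Arrays.asList(kv("a", 2)));
        System.out.println(combinedRecordCount(onePerTask)); // 2 records in, 2 out

        // One map task emitting key "a" twice: the combiner can merge them.
        List<List<Map.Entry<String, Integer>>> bothInOneTask = Arrays.asList(
                Arrays.asList(kv("a", 1), kv("a", 2)));
        System.out.println(combinedRecordCount(bothInOneTask)); // 2 records in, 1 out
    }
}
```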
> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
>> Maybe one more note: the combiner and the reducer class are the same and
>> in the reduce-phase the values get aggregated correctly. Why is this not
>> happening in the combiner-phase?
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
>>> Hi guys,
>>> I'm experiencing a strange behavior when I use the Hadoop join-package.
>>> After running a job the result statistics show that my combiner has an
>>> input of 100 records and an output of 100 records. From the task I'm
>>> running and the way it's implemented, I know that each key appears multiple
>>> times and the values should be combinable before getting passed to the
>>> reducer. I'm running my tests in pseudo-distributed mode with one or two
>>> map tasks. From using the debugger, I noticed that each key-value pair is
>>> processed by a combiner individually so there's actually no list passed
>>> into the combiner that it could aggregate. Can anyone think of a reason
>>> that causes this undesired behavior?
>>> Thanks
>>> Sigurd

Bertrand Dechoux
