hadoop-user mailing list archives

From Björn-Elmar Macek <ma...@cs.uni-kassel.de>
Subject Re: Join-package combiner number of input and output records the same
Date Tue, 25 Sep 2012 13:32:30 GMT

I had this problem once too. Did you properly override the reduce method and mark it with the
@Override annotation? Does your reduce method use OutputCollector or Context for gathering
outputs? If you are using the current version of the API, it has to be Context.

The thing is: if you do NOT override it, the standard reduce function (identity) is used, and
this of course results in the same number of tuples as you read as input.
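
To make the pitfall concrete, here is a minimal sketch that does not use Hadoop at all. BaseReducer is a hypothetical stand-in for the framework's reducer base class, and all class and method names are made up for illustration. It mimics one common way this goes wrong: declaring reduce with Iterator (old API style) where the base class expects Iterable (new API style), which creates an overload instead of an override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverridePitfall {

    // Hypothetical stand-in for a framework reducer base class:
    // the default reduce is the identity and forwards every value unchanged.
    static class BaseReducer {
        protected void reduce(String key, Iterable<Integer> values, List<String> out) {
            for (int v : values) {
                out.add(key + "=" + v);
            }
        }

        // The "framework" only ever calls the Iterable-based signature.
        final void run(String key, List<Integer> values, List<String> out) {
            reduce(key, values, out);
        }
    }

    // Wrong parameter type (Iterator instead of Iterable): this is an
    // overload, not an override, so the framework never calls it.
    // Adding @Override here would turn the mistake into a compile error.
    static class BrokenSumReducer extends BaseReducer {
        protected void reduce(String key, Iterator<Integer> values, List<String> out) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            out.add(key + "=" + sum);
        }
    }

    // Matching signature plus @Override: the sum is actually computed.
    static class SumReducer extends BaseReducer {
        @Override
        protected void reduce(String key, Iterable<Integer> values, List<String> out) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + "=" + sum);
        }
    }

    // Feeds one key with three values through the given reducer.
    static List<String> runReducer(BaseReducer reducer) {
        List<String> out = new ArrayList<>();
        reducer.run("a", Arrays.asList(1, 2, 3), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runReducer(new BrokenSumReducer())); // [a=1, a=2, a=3]
        System.out.println(runReducer(new SumReducer()));       // [a=6]
    }
}
```

The broken variant emits as many records as it receives, which is the symptom described above. Whether this is what happens in your job depends on your actual reduce signature.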

Good luck!

On 25.09.2012 at 11:57, Sigurd Spieckermann <sigurd.spieckermann@gmail.com> wrote:

> I think I have tracked down the problem to the point that each split only contains one
> big key-value pair and a combiner is connected to a map task. Please correct me if I'm wrong,
> but I assume each map task takes one split and the combiner operates only on the key-value
> pairs within one split. That's why the combiner has no effect in my case. Is there a way to
> combine the mapper outputs of multiple splits before they are sent off to the reducer?
> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
> Maybe one more note: the combiner and the reducer class are the same and in the reduce-phase
> the values get aggregated correctly. Why is this not happening in the combiner-phase?
> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
> Hi guys,
> I'm experiencing a strange behavior when I use the Hadoop join-package. After running
> a job the result statistics show that my combiner has an input of 100 records and an output
> of 100 records. From the task I'm running and the way it's implemented, I know that each key
> appears multiple times and the values should be combinable before getting passed to the reducer.
> I'm running my tests in pseudo-distributed mode with one or two map tasks. From using the
> debugger, I noticed that each key-value pair is processed by a combiner individually so there's
> actually no list passed into the combiner that it could aggregate. Can anyone think of a reason
> that causes this undesired behavior?
> Thanks
> Sigurd
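
The per-split explanation in the quoted message can be illustrated with a small simulation. This is a simplified model, not Hadoop code: the Pair class and the combine and combinedRecordCount helpers are hypothetical, and the real combiner works on a map task's buffered output and may be applied zero or more times. The sketch only shows that combining within each map task separately cannot shrink the output when a task emits each key once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerScope {

    // Hypothetical key-value record emitted by a map task.
    static class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Simplified combiner: sums the values per key within ONE map task's output.
    static Map<String, Integer> combine(List<Pair> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Pair p : mapOutput) {
            combined.merge(p.key, p.value, Integer::sum);
        }
        return combined;
    }

    // Runs the combiner separately per map task and counts the records
    // that would be sent on to the reducers.
    static int combinedRecordCount(List<List<Pair>> mapTaskOutputs) {
        int total = 0;
        for (List<Pair> output : mapTaskOutputs) {
            total += combine(output).size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Three map tasks, each emitting a single record for the same key.
        List<List<Pair>> onePerTask = new ArrayList<>();
        onePerTask.add(Arrays.asList(new Pair("k", 1)));
        onePerTask.add(Arrays.asList(new Pair("k", 2)));
        onePerTask.add(Arrays.asList(new Pair("k", 3)));

        // One map task emitting all three records for that key.
        List<List<Pair>> allInOneTask = new ArrayList<>();
        allInOneTask.add(Arrays.asList(new Pair("k", 1), new Pair("k", 2), new Pair("k", 3)));

        System.out.println(combinedRecordCount(onePerTask));   // 3: nothing to combine
        System.out.println(combinedRecordCount(allInOneTask)); // 1: records are merged
    }
}
```

In this model the first case reproduces the "100 records in, 100 records out" statistic: the combiner runs, but it never sees two values for the same key at once.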
