hadoop-user mailing list archives

From Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
Subject Re: Join-package combiner number of input and output records the same
Date Tue, 25 Sep 2012 09:57:55 GMT
I think I have tracked down the problem: each split contains only one big
key-value pair, and a combiner is attached to a map task. Please correct me
if I'm wrong, but I assume each map task takes one split and the combiner
operates only on the key-value pairs within that split. That's why the
combiner has no effect in my case. Is there a way to combine the mapper
outputs of multiple splits before they are sent off to the reducer?

2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>

> Maybe one more note: the combiner and the reducer are the same class, and
> in the reduce phase the values are aggregated correctly. Why does this not
> happen in the combine phase?
> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
>> Hi guys,
>> I'm experiencing a strange behavior when I use the Hadoop join-package.
>> After running a job the result statistics show that my combiner has an
>> input of 100 records and an output of 100 records. From the task I'm
>> running and the way it's implemented, I know that each key appears multiple
>> times and the values should be combinable before getting passed to the
>> reducer. I'm running my tests in pseudo-distributed mode with one or two
>> map tasks. Using the debugger, I noticed that each key-value pair is
>> processed by the combiner individually, so there is actually no list of
>> values passed into the combiner that it could aggregate. Can anyone think
>> of a reason for this undesired behavior?
>> Thanks
>> Sigurd
