hadoop-user mailing list archives

From Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
Subject Re: Join-package combiner number of input and output records the same
Date Tue, 25 Sep 2012 08:38:42 GMT
One more note: the combiner and the reducer are the same class, and in the
reduce phase the values are aggregated correctly. Why is this not happening
in the combine phase?
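
To make the symptom concrete: if a combiner emits one record per key group, then equal input and output counts mean every group it saw contained exactly one value. The sketch below is a plain-Python simulation, not Hadoop code. The names `run_combiner` and `sort_key` are hypothetical, and `sort_key` only stands in for whatever comparison decides which keys count as equal. It shows one way the counts can match (keys that never compare equal to each other); it does not establish that this is what happens with the join package.

```python
from itertools import groupby


def run_combiner(records, sort_key=lambda k: k):
    """Simulate one combiner pass over a single map task's output.

    records:  list of (key, value) pairs emitted by the mapper
    sort_key: hypothetical stand-in for the key comparison used to
              sort and group keys before combining
    Returns (combined_records, input_count, output_count).
    """
    # Sort by key so that equal keys become adjacent.
    ordered = sorted(records, key=lambda kv: sort_key(kv[0]))
    combined = []
    # Each run of equal keys forms one group; sum its values.
    for _, group in groupby(ordered, key=lambda kv: sort_key(kv[0])):
        group = list(group)
        combined.append((group[0][0], sum(v for _, v in group)))
    return combined, len(records), len(combined)


# Case 1: keys repeat and compare equal, so values are merged.
records = [("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)]
combined, n_in, n_out = run_combiner(records)

# Case 2: the same logical keys, but each key carries extra per-record
# state (here its position), so no two keys ever compare equal.
tagged = [((k, i), v) for i, (k, v) in enumerate(records)]
combined_t, n_in_t, n_out_t = run_combiner(tagged)

print(n_in, n_out)      # counts for case 1
print(n_in_t, n_out_t)  # counts for case 2
```

In the first case five records collapse to two. In the second case five records stay five, which is the same pattern as the 100-in, 100-out counters reported in the quoted message.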

2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>

> Hi guys,
>
> I'm seeing strange behavior when I use the Hadoop join package. After
> running a job, the result statistics show that my combiner has an input
> of 100 records and an output of 100 records. From the task I'm running
> and the way it's implemented, I know that each key appears multiple times
> and that the values should be combinable before they are passed to the
> reducer. I'm running my tests in pseudo-distributed mode with one or two
> map tasks. Using the debugger, I noticed that each key-value pair is
> processed by the combiner individually, so the combiner never receives a
> list of values that it could aggregate. Can anyone think of a reason for
> this behavior?
>
> Thanks
> Sigurd
>
