hadoop-user mailing list archives

From Sigurd Spieckermann <sigurd.spieckerm...@gmail.com>
Subject Join-package combiner number of input and output records the same
Date Tue, 25 Sep 2012 08:32:50 GMT
Hi guys,

I'm seeing strange behavior when I use the Hadoop join package.
After a job finishes, the result statistics show that my combiner has
100 input records and 100 output records. From the task I'm running and
the way it's implemented, I know that each key appears multiple times, so
the values should be combined before they are passed to the reducer. I'm
running my tests in pseudo-distributed mode with one or two map tasks.
Stepping through with the debugger, I noticed that the combiner is invoked
once per key-value pair, so it never receives a list of values that it
could aggregate. Can anyone think of a reason for this behavior?
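
To make the expected versus observed behavior concrete, here is a minimal
plain-Java sketch. It does not use any Hadoop classes, and it assumes a
combiner that sums integer values per key, which is only an illustrative
choice; the class and method names (CombinerSketch, combineGrouped,
combinePerRecord) are hypothetical and not taken from the actual job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {

    // Expected case: all values for a key reach the combiner together,
    // so each distinct key produces one output record (assumed: sum).
    public static Map<String, Integer> combineGrouped(
            List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return out;
    }

    // Observed case: the combiner is called once per record with a
    // single value, so it emits exactly as many records as it receives.
    public static List<Map.Entry<String, Integer>> combinePerRecord(
            List<Map.Entry<String, Integer>> records) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> r : records) {
            int sum = r.getValue(); // only one value to "aggregate"
            out.add(Map.entry(r.getKey(), sum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
                Map.entry("a", 1), Map.entry("b", 2),
                Map.entry("a", 3), Map.entry("b", 4));
        System.out.println("grouped output records: "
                + combineGrouped(records).size());
        System.out.println("per-record output records: "
                + combinePerRecord(records).size());
    }
}
```

With four input records over two keys, the grouped variant emits two
records while the per-record variant emits four, which matches the
"input equals output" counters described above.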

