hadoop-mapreduce-user mailing list archives

From Björn-Elmar Macek <ma...@cs.uni-kassel.de>
Subject Re: Join-package combiner number of input and output records the same
Date Tue, 25 Sep 2012 13:40:44 GMT
Oops, sorry. You are using the standard implementations? Then I don't know what's happening, sorry.
But the fact that your input size equals your output size in a "join" process reminded me too
much of my own problems. Sorry for any confusion I may have caused.

On 25 Sep 2012, at 15:32, Björn-Elmar Macek <macek@cs.uni-kassel.de> wrote:

> Hi,
> i had this problem once too. Did you properly override the reduce method and mark it with the @Override annotation?
> Does your reduce method use OutputCollector or Context for gathering outputs? If you are using the current API, it has to be Context.
> The thing is: if you do NOT override it, the standard reduce function (the identity) is used, and that of course results in the same number of output tuples as input tuples.
> Good luck!
> Elmar
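For reference, the new-API reducer Elmar describes is `org.apache.hadoop.mapreduce.Reducer`, whose method is `reduce(KEYIN key, Iterable<VALUEIN> values, Context context)`; if your method's signature does not match exactly, Java treats it as an unrelated overload and the inherited identity reduce runs instead. The effect can be sketched without any Hadoop dependencies (plain Java, illustrative names only, not the actual Hadoop classes):

```java
import java.util.*;

public class ReduceDemo {
    // Identity "reduce": emits every value unchanged, which is what the
    // base Reducer does when reduce() is not actually overridden.
    static List<Integer> identityReduce(List<Integer> values) {
        return new ArrayList<>(values);
    }

    // A genuine combining reduce: collapses all values for a key into one.
    static List<Integer> summingReduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return Collections.singletonList(sum);
    }

    public static void main(String[] args) {
        List<Integer> valuesForKey = Arrays.asList(1, 1, 1, 1);
        // Identity: 4 records in, 4 records out -- the symptom in this thread.
        System.out.println(identityReduce(valuesForKey).size());
        // Proper override: 4 records in, 1 record out.
        System.out.println(summingReduce(valuesForKey).size());
    }
}
```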
> On 25 Sep 2012, at 11:57, Sigurd Spieckermann <sigurd.spieckermann@gmail.com> wrote:
>> I think I have tracked down the problem to the point that each split only contains
one big key-value pair and a combiner is connected to a map task. Please correct me if I'm
wrong, but I assume each map task takes one split and the combiner operates only on the key-value
pairs within one split. That's why the combiner has no effect in my case. Is there a way to
combine the mapper outputs of multiple splits before they are sent off to the reducer?
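Sigurd's reading matches how Hadoop combiners work: a combiner only sees the output of a single map task, so if each map task emits just one record per key, there is nothing for it to merge, and cross-split aggregation first happens at the reducer. A minimal plain-Java simulation of that scoping (no Hadoop dependencies; the helper names are made up for illustration):

```java
import java.util.*;

public class CombinerScopeDemo {
    // Sum values per key -- standing in for both the combiner and the
    // reducer, which are the same class in this thread.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records)
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        // Two splits, each holding a single record for the same key.
        List<Map.Entry<String, Integer>> split1 = List.of(Map.entry("k", 1));
        List<Map.Entry<String, Integer>> split2 = List.of(Map.entry("k", 1));

        // The combiner runs per map task: each sees one record, so
        // input records == output records and nothing is combined.
        Map<String, Integer> c1 = sumByKey(split1);
        Map<String, Integer> c2 = sumByKey(split2);

        // Only the reducer sees the merged outputs of all map tasks,
        // so aggregation across splits happens there.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        shuffled.addAll(c1.entrySet());
        shuffled.addAll(c2.entrySet());
        System.out.println(sumByKey(shuffled));
    }
}
```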
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
>> Maybe one more note: the combiner and the reducer class are the same and in the reduce-phase
the values get aggregated correctly. Why is this not happening in the combiner-phase?
>> 2012/9/25 Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
>> Hi guys,
>> I'm experiencing a strange behavior when I use the Hadoop join-package. After running
a job the result statistics show that my combiner has an input of 100 records and an output
of 100 records. From the task I'm running and the way it's implemented, I know that each key
appears multiple times and the values should be combinable before getting passed to the reducer.
I'm running my tests in pseudo-distributed mode with one or two map tasks. From using the
debugger, I noticed that each key-value pair is processed by a combiner individually so there's
actually no list passed into the combiner that it could aggregate. Can anyone think of a reason
that causes this undesired behavior?
>> Thanks
>> Sigurd
