hadoop-common-user mailing list archives

From Mathias Herberts <mathias.herbe...@gmail.com>
Subject Combiners
Date Sat, 29 Oct 2011 10:52:45 GMT

I'm designing a 'Hadoop MapReduce Poster', putting all pieces together
so people will easily be able to visualize the full M/R flow.

Concerning the combiners, I have a few points I'd like to have clarified.

If I'm not mistaken, the output of the Mapper is passed to the
Partitioner, which dispatches each <K,V> pair into one of R partitions.
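
To make the dispatch step concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The class and method names (PartitionSketch, partitionFor, dispatch) are made up for illustration; the arithmetic is, to my knowledge, what the default HashPartitioner does, but a job may configure any other Partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: simulates hash partitioning outside of Hadoop.
final class PartitionSketch {

    // Clear the sign bit of the hash code, then take the remainder,
    // so the result always lies in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Dispatch keys into numPartitions buckets, preserving arrival order.
    static List<List<String>> dispatch(List<String> keys, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numPartitions)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Each inner list holds the keys assigned to one partition.
        System.out.println(dispatch(Arrays.asList("a", "b", "c", "d"), 2));
    }
}
```

Every occurrence of a given key lands in the same partition, which is what lets a single reducer see all values for that key.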

The <K,V> pairs of each partition are then sorted using the configured
SortComparatorClass.

If there is a combiner, the sorted output is grouped using the
SortComparatorClass (and not the GroupingComparatorClass, as is the
case in the Reducer) and passed to the combiner prior to being written to
the partition file.

My question is, what happens if the combiner outputs keys different
from the ones it is fed? The output of the combiner will suffer from two
problems:

1. It won't be sorted
2. It might end up in the wrong partition

Since a Combiner is simply a Reducer with no other constraints,
nothing seems to prevent those 2 problems.
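
The two concerns can be shown in isolation with a small plain-Java sketch. It does not use Hadoop and says nothing about what the framework actually does with such output; the class name, the helper methods and the key-rewriting function are all hypothetical, and values are left out to keep the focus on keys. The partition arithmetic assumes hash partitioning as in the default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative only: a "combiner" that rewrites keys of an already
// sorted, already partitioned run.
final class CombinerKeySketch {

    // Assumed hash partitioning: clear the sign bit, take the remainder.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Emit one rewritten key per input key, in input order.
    static List<String> combine(List<String> sortedKeys, UnaryOperator<String> rewrite) {
        List<String> out = new ArrayList<>();
        for (String key : sortedKeys) {
            out.add(rewrite.apply(key));
        }
        return out;
    }

    // True if keys are in non-decreasing natural (String) order.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // True if every key hashes to the given partition.
    static boolean allInPartition(List<String> keys, int partition, int numPartitions) {
        for (String key : keys) {
            if (partitionFor(key, numPartitions) != partition) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // With 2 partitions, "a", "c" and "e" all hash to partition 1.
        List<String> input = Arrays.asList("a", "c", "e");
        // Hypothetical rewrite that mirrors the key: a -> f, c -> d, e -> b.
        List<String> output = combine(input,
                k -> String.valueOf((char) ('a' + 'f' - k.charAt(0))));
        System.out.println(output + " sorted=" + isSorted(output)
                + " samePartition=" + allInPartition(output, 1, 2));
    }
}
```

Here the input run is sorted and sits entirely in partition 1, while the rewritten keys come out in descending order and all hash to partition 0.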

Is my understanding correct?

