hadoop-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: Will all the intermediate output with the same key go to the same reducer?
Date Thu, 20 Sep 2012 13:28:08 GMT

Yes. By contract, all intermediate output with the same key goes to
the same reducer.

In your example, if one of the two keys generated by the mapper goes
to reducer 1 and the other goes to reducer 2, reducer 3 will have no
records to process and will end without producing any output.
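
To make the two-keys, three-reducers case concrete, here is a minimal
sketch in plain Java, with no Hadoop dependency. It assumes the default
hash partitioning, which picks a reducer as
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class name,
the sample keys and the use of String.hashCode() are illustrative
assumptions; real Writable keys such as Text hash differently, and a
custom Partitioner would change the assignment.

```java
import java.util.Arrays;

class IdleReducerSketch {
    // Same formula as the default hash partitioner; String.hashCode()
    // stands in for the key's real hashCode() (an assumption).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many intermediate records each reducer would receive.
    static int[] recordsPerReducer(String[] keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output: five records, only two distinct keys.
        String[] emitted = {"k1", "k2", "k1", "k2", "k1"};
        // Every record with the same key lands in the same slot, so at
        // least one of the three reducers receives nothing.
        System.out.println(Arrays.toString(recordsPerReducer(emitted, 3)));
    }
}
```

Both keys may also hash to the same reducer, in which case two of the
three reducers stay idle.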

If the intermediate key space is very large, a single reducer would
certainly be a bottleneck, as you rightly note. Hence, configuring the
right number of reducers is important.
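
As a rough illustration of the bottleneck, the sketch below spreads a
large set of distinct keys over different reducer counts and reports how
many keys the busiest reducer handles. It is plain Java under the same
hash-partitioning assumption; the class name, the key format and the key
count are hypothetical. In an actual job the reducer count is set on the
job itself, for example with Job.setNumReduceTasks(int).

```java
class ReducerLoadSketch {
    // Default-style hash partitioning; String.hashCode() is a stand-in
    // for the real key type's hashCode() (an assumption).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Largest number of distinct keys assigned to any single reducer.
    static int maxLoad(int distinctKeys, int numReduceTasks) {
        int[] load = new int[numReduceTasks];
        for (int i = 0; i < distinctKeys; i++) {
            load[partitionFor("key-" + i, numReduceTasks)]++;
        }
        int max = 0;
        for (int n : load) {
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        int keys = 100_000; // hypothetical size of the intermediate key space
        for (int reducers : new int[] {1, 4, 16}) {
            System.out.println(reducers + " reducer(s): busiest handles "
                    + maxLoad(keys, reducers) + " keys");
        }
    }
}
```

With one reducer the busiest reducer handles every key; with more
reducers the share drops, although skewed keys can still leave the load
uneven.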


On 9/20/12, Jason Yang <lin.yang.jason@gmail.com> wrote:
> Hi, all
> I have a question: does all the intermediate output with the same key
> go to the same reducer?
> If it does, and only two keys are generated by the mapper while 3
> reducers are running in this job, what would happen?
> If not, how could I do some processing over all the data, like counting? I
> think some would suggest setting the number of reducers to 1, but I think
> this would make the reducer the bottleneck when there is a large
> volume of intermediate output, wouldn't it?
> --
> YANG, Lin
