hadoop-mapreduce-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: Will all the intermediate output with the same key go to the same reducer?
Date Thu, 20 Sep 2012 13:28:08 GMT
Hi,

Yes. By contract, all intermediate output with the same key goes to
the same reducer.
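A minimal standalone sketch of why this holds, assuming default hash partitioning. Hadoop's default HashPartitioner picks a reducer with (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The result depends only on the key and the reducer count, so every record with that key gets the same index. The class and method names below (PartitionSketch, partitionFor) are hypothetical, and plain String keys are used instead of Hadoop's Text. The actual indices therefore differ from what a real job would compute; only the principle carries over.

```java
import java.util.List;

// Hypothetical standalone sketch, NOT Hadoop's HashPartitioner class.
// It reuses the same hash-and-modulo formula on plain String keys.
class PartitionSketch {

    // Maps a key to a reducer index in [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() still yields a non-negative index.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        // The same key appears several times, as if emitted by different map tasks.
        List<String> emitted = List.of("apple", "banana", "apple", "apple", "banana");
        for (String key : emitted) {
            // Every occurrence of a given key prints the same reducer index.
            System.out.println(key + " -> reducer " + partitionFor(key, numReducers));
        }
    }
}
```

A custom Partitioner can change which reducer a key goes to. It still has to return a single index per key, which is what keeps the contract intact.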

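In your example, suppose that of the two keys generated by the mappers,
one goes to reducer 1 and the other goes to reducer 2. Reducer 3 will
then have no records to process and will finish without producing any
output records. (Both keys could also hash to the same reducer, in which
case two reducers would sit idle.)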
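The two-keys/three-reducers case can be simulated with a small sketch, assuming the same hash-and-modulo partitioning on plain String keys. The keys "k1"/"k2" and all class and method names are hypothetical. With only two distinct keys, at most two of the three reducers can receive records, so at least one always gets an empty input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the question's scenario: two distinct keys,
// three reducers, hash-and-modulo partitioning on plain String keys.
class TwoKeysThreeReducers {

    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns, for each reducer index, the records routed to it.
    static List<List<String>> route(List<String> records, int numReducers) {
        List<List<String>> inputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            inputs.add(new ArrayList<>());
        }
        for (String key : records) {
            inputs.get(partitionFor(key, numReducers)).add(key);
        }
        return inputs;
    }

    // Counts the reducers that receive no records at all.
    static long idleReducers(List<List<String>> inputs) {
        return inputs.stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        // Only two distinct keys, several records each.
        List<String> records = List.of("k1", "k2", "k1", "k1", "k2");
        List<List<String>> inputs = route(records, 3);
        for (int i = 0; i < inputs.size(); i++) {
            System.out.println("reducer " + i + ": " + inputs.get(i));
        }
        // Two distinct keys can fill at most two of the three reducers.
        System.out.println("idle reducers: " + idleReducers(inputs));
    }
}
```

With these particular String keys, "k1" hashes to 3366 and "k2" to 3367. The sketch therefore routes k1 to reducer 0 and k2 to reducer 1, and reducer 2 stays idle.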

If the intermediate key space is very large, a single reducer would
certainly be a bottleneck, as you rightly note. Hence, configuring the
right number of reducers is important.
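For per-key processing such as counting, using several reducers does not lose anything, because of the contract above. This is a hypothetical simulation, assuming hash-and-modulo partitioning on String keys; the record layout (50 keys named key0..key49) and all names are made up for illustration. Each simulated reducer counts only the keys routed to it. Since every key lands in exactly one reducer, each reducer's counts are already final, and the work is spread across reducers instead of piling onto one. A single grand total over all keys is a different case this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-key counting split across several "reducers".
// Every record for a key lands in exactly one partition, so each
// reducer's per-key counts are complete without any cross-reducer merge.
class ParallelKeyCount {

    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates the reduce phase: returns one key -> count map per reducer.
    static List<Map<String, Integer>> countPerReducer(List<String> records, int numReducers) {
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            outputs.add(new TreeMap<>());
        }
        for (String key : records) {
            outputs.get(partitionFor(key, numReducers)).merge(key, 1, Integer::sum);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("key" + (i % 50)); // 50 distinct keys, 20 records each
        }
        List<Map<String, Integer>> outputs = countPerReducer(records, 4);
        for (int r = 0; r < outputs.size(); r++) {
            Map<String, Integer> out = outputs.get(r);
            int load = out.values().stream().mapToInt(Integer::intValue).sum();
            System.out.println("reducer " + r + ": " + out.size() + " keys, " + load + " records");
        }
    }
}
```

How evenly the load spreads depends on the key distribution and the hash. A few very hot keys can still overload whichever reducer they map to.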

Thanks
hemanth

On 9/20/12, Jason Yang <lin.yang.jason@gmail.com> wrote:
> Hi, all
>
> I have a question: does all the intermediate output with the same
> key go to the same reducer?
>
> If so, what would happen if only two keys are generated by the mapper
> but there are 3 reducers running in this job?
>
> If not, how could I do some processing over all the data, like counting?
> Some would suggest setting the number of reducers to 1, but I think
> this would make the reducer a bottleneck when there is a large
> volume of intermediate output, wouldn't it?
>
> --
> YANG, Lin
>
