hadoop-user mailing list archives

From Sambit Tripathy <sambi...@gmail.com>
Subject Re: Will all the intermediate output with the same key go to the same reducer?
Date Fri, 21 Sep 2012 07:26:52 GMT
Hi,

Have you considered using the in-mapper combining pattern? That is, inside
your Mapper object you keep a Map holding the intermediate key-value pairs,
whose state is preserved across multiple calls to the map method. The
aggregated values are emitted only periodically, when a certain threshold
is reached (the threshold being a ratio between block size and memory
consumed). You can use a counter to track how many key-value pairs have
been processed. This substantially reduces the problem you mention, of the
reducer becoming a bottleneck when there is a large volume of intermediate
output, because values are aggregated per key in memory and far fewer
intermediate key-value pairs are emitted each time the buffer is flushed.
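
To make the pattern concrete, here is a minimal plain-Java sketch for a word count. It deliberately has no Hadoop dependency so it can be run on its own. The class name InMapperCombiner, the emitted list (standing in for context.write) and the maxBufferedKeys threshold are all assumptions made for illustration; in a real job the same logic would live in the map() and cleanup() methods of a Mapper subclass, and the threshold could be based on memory instead of a key count.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of in-mapper combining for a word count (no Hadoop classes). */
class InMapperCombiner {
    // Hypothetical threshold: flush once this many distinct keys are buffered.
    private final int maxBufferedKeys;

    // State preserved across calls to map().
    private final Map<String, Long> buffer = new HashMap<>();

    // Stand-in for context.write(key, value) in a real Mapper.
    final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    InMapperCombiner(int maxBufferedKeys) {
        this.maxBufferedKeys = maxBufferedKeys;
    }

    /** Called once per input record: aggregate in memory instead of emitting. */
    void map(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            buffer.merge(word, 1L, Long::sum);
        }
        if (buffer.size() >= maxBufferedKeys) {
            flush();
        }
    }

    /** Called once after the last record: emit whatever is still buffered. */
    void cleanup() {
        flush();
    }

    private void flush() {
        for (Map.Entry<String, Long> e : buffer.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        InMapperCombiner mapper = new InMapperCombiner(1000);
        mapper.map("a b a");
        mapper.map("b a c");
        mapper.cleanup();
        // Six words went in, but only one pair per distinct key comes out.
        System.out.println(mapper.emitted.size() + " pairs emitted for 6 input words");
    }
}
```

Note that a key can be emitted more than once if the buffer is flushed several times, so the reducer still has to sum the partial counts for each key.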


Thanks
Sambit Tripathy



On Thu, Sep 20, 2012 at 6:42 PM, Jason Yang <lin.yang.jason@gmail.com> wrote:

> Hi, all
>
> I have a question: does all the intermediate output with the same key
> go to the same reducer or not?
>
> If it does, and only two keys are generated by the mappers but there
> are 3 reducers running in this job, what would happen?
>
> If not, how could I do some processing over all the data, like counting? I
> think some would suggest setting the number of reducers to 1, but I thought
> this would make the reducer the bottleneck when there is a large
> volume of intermediate output, wouldn't it?
>
> --
> YANG, Lin
>
>
