hadoop-mapreduce-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Ideas for dynamic change reducer task number ?
Date Sun, 22 Nov 2009 15:42:25 GMT
Hi all,

In my work, I often run the same MapReduce jobs on data sets of different
sizes. The number of map tasks adjusts automatically to the input data set,
but I have to set the number of reducers by hand for each data set size.

I would rather not have to do that; in my opinion it is inconvenient for
users. It also hurts automation, because an automated system cannot predict
the size of its input data set in advance. So I think Hadoop should have a
more intelligent mechanism that sets the number of reducers according to the
input data.

I suggest adding a new interface named ReduceNumManager with a method
getReduceNum(InputFormat inputFormat). The code snippet is as follows (the
interface still needs to be refined):

    import org.apache.hadoop.mapred.InputFormat;

    public interface ReduceNumManager {
        int getReduceNum(InputFormat<?, ?> inputFormat);
    }

Users could then set their implementation in JobConf via
JobConf.setReduceNumManager, and JobClient would use it to determine the
number of reducers.
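The following is a minimal, self-contained sketch of how this wiring could work. Everything in it is hypothetical and not an existing Hadoop API. SketchJobConf and SketchJobClient stand in for JobConf and JobClient. InputFormatStub stands in for InputFormat so the sketch compiles without Hadoop. The configuration key mapred.reduce.num.manager.class is invented for illustration, and PerGigabyteReduceNumManager is only an example policy. The conf records the manager by class name, similar to how JobConf records other pluggable classes. The client instantiates the manager by reflection at submission time, and when no manager is set it keeps the user's explicit reducer count.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's InputFormat so the sketch compiles on its own. */
interface InputFormatStub {
    long totalInputBytes();
}

/** The proposed interface, taking the stand-in instead of a real InputFormat. */
interface ReduceNumManager {
    int getReduceNum(InputFormatStub inputFormat);
}

/** Example policy (hypothetical): one reducer per GiB of input, at least one. */
class PerGigabyteReduceNumManager implements ReduceNumManager {
    private static final long GIB = 1L << 30;

    @Override
    public int getReduceNum(InputFormatStub inputFormat) {
        long bytes = inputFormat.totalInputBytes();
        return (int) Math.max(1, (bytes + GIB - 1) / GIB); // ceiling division
    }
}

/** Hypothetical miniature JobConf: the manager class is stored by name. */
class SketchJobConf {
    static final String MANAGER_KEY = "mapred.reduce.num.manager.class"; // invented key
    private final Map<String, String> props = new HashMap<>();
    private int numReduceTasks = 1;

    void setReduceNumManager(Class<? extends ReduceNumManager> cls) {
        props.put(MANAGER_KEY, cls.getName());
    }

    String get(String key) {
        return props.get(key);
    }

    void setNumReduceTasks(int n) {
        numReduceTasks = n;
    }

    int getNumReduceTasks() {
        return numReduceTasks;
    }
}

/** What JobClient might do at submission time (hypothetical). */
class SketchJobClient {
    static void applyReduceNumManager(SketchJobConf conf, InputFormatStub input) {
        String className = conf.get(SketchJobConf.MANAGER_KEY);
        if (className == null) {
            return; // no manager configured: keep the user's explicit reducer count
        }
        try {
            // Load the configured class by name and create it via its no-arg constructor.
            ReduceNumManager manager = Class.forName(className)
                    .asSubclass(ReduceNumManager.class)
                    .getDeclaredConstructor()
                    .newInstance();
            conf.setNumReduceTasks(manager.getReduceNum(input));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

With PerGigabyteReduceNumManager configured, a 5 GiB input gets 5 reducers. Without a manager, an explicit setNumReduceTasks(7) is left untouched. Falling back to the explicit value keeps existing jobs behaving as they do today.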

For example, if the InputFormat is a FileInputFormat, we could have a
FileReduceNumManager that implements this interface and computes the number
of reducers from the size of the input files.
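Here is a minimal sketch of the FileReduceNumManager idea. It assumes a simple sizing policy that is not part of the proposal: one reducer per bytesPerReducer bytes of input, rounded up and clamped to the range [1, maxReducers]. Both parameters are hypothetical. So is FileInputFormatStub, which stands in for a FileInputFormat and simply reports the sizes of its input files. A real implementation would read those sizes from the file system.

```java
import java.util.List;

/** Hypothetical stand-in for a FileInputFormat: reports its input file sizes in bytes. */
interface FileInputFormatStub {
    List<Long> inputFileSizes();
}

/** The proposed interface, taking the stand-in instead of a real InputFormat. */
interface ReduceNumManager {
    int getReduceNum(FileInputFormatStub inputFormat);
}

/** Sizes the reduce phase from the total size of the input files. */
class FileReduceNumManager implements ReduceNumManager {
    private final long bytesPerReducer;
    private final int maxReducers;

    FileReduceNumManager(long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        this.bytesPerReducer = bytesPerReducer;
        this.maxReducers = maxReducers;
    }

    @Override
    public int getReduceNum(FileInputFormatStub inputFormat) {
        long totalBytes = 0;
        for (long size : inputFormat.inputFileSizes()) {
            totalBytes += size;
        }
        // Round up so that a partial chunk of input still gets a reducer.
        long wanted = (totalBytes + bytesPerReducer - 1) / bytesPerReducer;
        // At least one reducer, and never more than the configured cap.
        return (int) Math.max(1, Math.min(wanted, maxReducers));
    }
}
```

With 100 bytes per reducer and a cap of 10, a 250-byte input yields 3 reducers, and a 1,200-byte input is capped at 10. The cap keeps a very large input from requesting far more reducers than the cluster has reduce slots.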

I think this work would benefit users, and Pig (and possibly Hive) would
benefit as well, because it is inconvenient for users to set a different
number of reducers every time they run the same script on a data set of a
different size.

If we provide such a mechanism, they would only need to supply their own
customized implementation.

This is my initial idea; I look forward to hearing feedback from the experts.

Thank you

Jeff Zhang
