hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HowManyMapsAndReduces" by JackHebert
Date Tue, 01 May 2007 18:48:31 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JackHebert:

The comment on the change is:
Added comment about how to programmatically set the number of map tasks.

  Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just
a hint to the !InputFormat for the number of maps. The default !InputFormat behavior is to
split the total number of bytes into the right number of fragments. However, the DFS block
size of the input files is treated as an upper bound for input splits. A lower bound on the
split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and
have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger.
+ The number of map tasks can also be raised programmatically using the JobConf's
conf.setNumMapTasks(int num). This can increase the number of map tasks, but it cannot push
the number below what Hadoop determines by splitting the input data.
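
As a rough illustration of the split arithmetic described above, the following Java sketch estimates the number of maps from the total input size, the DFS block size, the map-count hint (mapred.map.tasks or setNumMapTasks) and mapred.min.split.size. The class and method names are hypothetical and not part of Hadoop. The formula is a simplification that assumes the split size is the hinted fragment size, capped by the block size and floored by the minimum split size; it ignores per-file boundaries, so treat the result as an estimate only.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
class MapCountEstimate {

    // Split size: the hinted fragment size, capped at the DFS block size
    // and never smaller than the configured minimum split size.
    static long splitSize(long totalBytes, int mapTasksHint,
                          long blockSize, long minSplitSize) {
        long goalSize = totalBytes / Math.max(1, mapTasksHint);
        long capped = Math.min(goalSize, blockSize);
        return Math.max(Math.max(1L, minSplitSize), capped);
    }

    // Estimated number of maps: total input divided by split size, rounded up.
    static long estimateMaps(long totalBytes, int mapTasksHint,
                             long blockSize, long minSplitSize) {
        long size = splitSize(totalBytes, mapTasksHint, blockSize, minSplitSize);
        return (totalBytes + size - 1) / size;
    }

    public static void main(String[] args) {
        long tenTB = 10L << 40;      // 10TB of input
        long block = 128L << 20;     // 128MB DFS blocks

        // A small hint is overridden by the block-size upper bound.
        System.out.println(estimateMaps(tenTB, 1, block, 1));        // 81920
        // A hint larger than 82k raises the number of maps.
        System.out.println(estimateMaps(tenTB, 200000, block, 1));
        // A larger minimum split size lowers the number of maps.
        System.out.println(estimateMaps(tenTB, 1, block, 256L << 20)); // 40960
    }
}
```

With 10TB of input and 128MB blocks the sketch gives 81920 maps, which matches the "82k maps" figure above. Raising the hint above that value increases the count, while lowering it has no effect.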
  == Number of Reduces ==
  The right number of reduces seems to be between 1.0 and 1.75 times (nodes * mapred.tasktracker.tasks.maximum).
At 1.0 all of the reduces can launch immediately and start transferring map outputs as the
maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch
a second round of reduces, which does a much better job of load balancing.
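
The rule of thumb above can be worked through with a small Java sketch. The class name, method name and the example cluster size (20 nodes with 2 task slots each) are hypothetical; only the 1.0 and 1.75 factors and the nodes * mapred.tasktracker.tasks.maximum product come from the text.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
class ReduceCountEstimate {

    // Suggested reduce count: factor * (nodes * task slots per node), rounded down.
    static int suggestedReduces(int nodes, int tasksPerNode, double factor) {
        int slots = nodes * tasksPerNode;
        return (int) Math.floor(factor * slots);
    }

    public static void main(String[] args) {
        int nodes = 20;        // assumed cluster size
        int tasksPerNode = 2;  // assumed value of mapred.tasktracker.tasks.maximum

        // Factor 1.0: every reduce can start in the first wave.
        System.out.println(suggestedReduces(nodes, tasksPerNode, 1.0));  // 40
        // Factor 1.75: faster nodes pick up a second wave of reduces.
        System.out.println(suggestedReduces(nodes, tasksPerNode, 1.75)); // 70
    }
}
```

For this assumed cluster the suggested range is 40 to 70 reduces. The resulting value could then be passed to conf.setNumReduceTasks(int num).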
@@ -19, +21 @@

  The number of reduces also controls the number of output files in the output directory,
but usually that is not important because the next map/reduce step will split them into even
smaller splits for the maps.
+ The number of reduce tasks can also be set programmatically via JobConf's
conf.setNumReduceTasks(int num).
