hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Hadoop "remembering" old mapred.map.tasks
Date Mon, 21 Apr 2008 00:14:43 GMT

Does Hadoop cache settings from hadoop-*.xml between runs?
I'm using Hadoop 0.16.2 and initially set the number of map and reduce tasks to 8 each.
After running a number of jobs I wanted to increase those numbers (to 23 maps and 11
reduces), so I changed the mapred.map.tasks and mapred.reduce.tasks properties in hadoop-site.xml.
I then stopped everything (stop-all.sh) and copied my modified hadoop-site.xml to all nodes
in the cluster.  I also rebuilt the .job file and pushed that out to all nodes, too.
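For reference, the fragment of my hadoop-site.xml that I changed looks roughly like this (values as described above; property names as they appear in my 0.16.2 config):

```xml
<!-- hadoop-site.xml fragment: the per-job task count properties I edited -->
<property>
  <name>mapred.map.tasks</name>
  <value>23</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>11</value>
</property>
```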

However, when I start everything up again I *still* see Map Task Capacity is equal to 8, and
the same for Reduce Task Capacity.
Am I supposed to do something in addition to the above to make Hadoop "forget" my old settings?
I can't find *any* references to mapred.map.tasks in any of the Hadoop files except for my
hadoop-site.xml, so I can't figure out why Hadoop is still stuck on 8.

Although the max capacity is still shown as 8, when I run my jobs now I *do* see that they get broken
up into 23 maps and 11 reduces (it was 8 before), but only 8 of them run in parallel.  There
are 4 dual-core machines in the cluster, for a total of 8 cores.  Is Hadoop able to figure
this out, and is that why it runs only 8 tasks in parallel despite my higher settings?
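One thing I have *not* touched, and this is only a guess on my part, is the per-TaskTracker slot limit rather than the per-job task count.  If each TaskTracker defaults to 2 map slots and 2 reduce slots, then 4 nodes would give exactly the capacity of 8 I'm seeing.  Something like this in hadoop-site.xml is what I have in mind (property names assumed; values illustrative):

```xml
<!-- Assumed per-TaskTracker slot limits (cluster capacity),
     as opposed to mapred.map.tasks, which is a per-job hint. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>  <!-- e.g. raising from an assumed default of 2 -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

If that's the right knob, I'd expect 4 nodes × 4 slots = 16 concurrent map tasks, but I'd appreciate confirmation before I push this out to the cluster.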

