hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "LimitingTaskSlotUsage" by SomeOtherAccount
Date Fri, 05 Nov 2010 20:49:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "LimitingTaskSlotUsage" page has been changed by SomeOtherAccount.


  The CapacityScheduler in 0.21 has a feature whereby one may use RAM-per-task to limit how
many slots a given task takes.  By careful use of this feature, one may limit how many concurrent
tasks on a given node a job may take. 
+ = Increasing the Number of Slots Used =
+ There are both job and server-level tunables that impact how many tasks are run concurrently.
+ == Increase the number of tasks per node ==
+ There are two server tunables that determine how many tasks a given TaskTracker will run
on a node:
+  * mapred.tasktracker.map.tasks.maximum sets the map slot usage
+  * mapred.tasktracker.reduce.tasks.maximum sets the reduce slot usage
+ These must be set in the mapred-site.xml file on the TaskTracker.  After making the change, the TaskTracker must be restarted for it to take effect.  The new slot counts will then be reflected on the JobTracker main page.  Note that these values are '''not''' set by your job.
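As an illustrative sketch (the values 4 and 2 are assumptions to tune for your hardware, not recommendations), a mapred-site.xml fragment raising both limits might look like:

```xml
<configuration>
  <!-- Maximum number of map task slots this TaskTracker will run concurrently.
       Illustrative value; size to the node's cores and memory. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Maximum number of reduce task slots this TaskTracker will run concurrently. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Remember that this file is per-node: each TaskTracker reads its own copy, so the change must be rolled out (and the daemon restarted) on every node whose slot count should change.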
+ == Increase the number of map tasks ==
+ Typically, the number of map tasks per job is determined by Hadoop from the InputFormat and the block size of the input.  The mapred.min.split.size and mapred.max.split.size settings let a job hint that input splits should be sized differently from the block size, which in turn changes how many map tasks are launched.
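As a sketch, assuming the job's driver uses ToolRunner/GenericOptionsParser so -D properties are honored (the jar name, class name, and paths here are placeholders), the split-size hints could be supplied at submission time:

```
# Cap splits at 64 MB (67108864 bytes) so more map tasks are created;
# the 1 MB floor keeps splits from becoming arbitrarily small.
hadoop jar myjob.jar MyJob \
    -D mapred.min.split.size=1048576 \
    -D mapred.max.split.size=67108864 \
    /user/alice/input /user/alice/output
```

These are hints, not guarantees: the InputFormat still decides the final split boundaries, and formats with unsplittable inputs (e.g. gzipped files) will ignore them.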
+ == Increase the number of reduce tasks ==
+ Currently, the number of reduces is determined by the job.  The job should set mapred.reduce.tasks to the appropriate number of reduces.  When using Pig, use the PARALLEL keyword instead.
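For instance (the jar name, class name, and paths are placeholders, and -D parsing again assumes the driver uses ToolRunner), a job could request ten reduces at submission time:

```
# Run the job with 10 reduce tasks.
hadoop jar myjob.jar MyJob -D mapred.reduce.tasks=10 /user/alice/input /user/alice/output
```

In a Pig script the same effect is achieved per-operator, e.g. `grouped = GROUP records BY key PARALLEL 10;` (relation and field names here are illustrative).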
