hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "LimitingTaskSlotUsage" by SomeOtherAccount
Date Fri, 22 Oct 2010 15:47:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "LimitingTaskSlotUsage" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/LimitingTaskSlotUsage

--------------------------------------------------

New page:


There are many reasons why one wants to limit the number of running tasks. 

* Job is consuming all task slots

The most common reason is because a given job is consuming all of the available task slots,
preventing other jobs from running.   The easiest and best solution is to switch from the
default FIFO scheduler to another scheduler, such as the FairShareScheduler or the CapacityScheduler.
 Both support job tasks limit.

* Job has taken too many reduce slots that are still waiting for maps to finish

There is a job tunable called mapred.reduce.slowstart.completed.maps that sets the percentage
of maps that must be completed before firing off reduce tasks.  By default, this is set to
5% (0.05) which for most shared clusters is likely too low.  Recommended values are closer
to 80% or higher (0.80).  Note that for jobs that have a significant amount of intermediate
data, setting this value higher will cause reduce slots to take more time fetching that data
before performing work.

* Job is referencing an external, limited resource (such as a database)

In Hadoop terms, we call this a 'side-effect'.

One of the general assumptions of the framework is that there are not any side-effects. All
tasks are expected to be restartable and a side-effect typically goes against the grain of
this rule.

If a task absolutely must break the rules, there are a few things one can do:

** Deploy ZooKeeper and use it as a persistent lock to keep track of how many tasks are running
concurrently
** Use a scheduler with a maximum task-per-queue feature and submit the job to that queue

* Job consumes too much RAM/disk IO/etc on a given node

The CapacityScheduler in 0.21 has a feature whereby one may use RAM-per-task to limit how
many slots a given task takes.  By careful use of this feature, one may limit how many concurrent
tasks on a given node a job may take. 

Mime
View raw message