hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "FAQ" by SomeOtherAccount
Date Fri, 22 Oct 2010 15:31:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=79&rev2=80

--------------------------------------------------

  hadoop job -kill JOBID
  }}}
  
+ == How do I limit the total number of concurrent tasks my job may have running at a time? ==
+ 
+ Typically this question is asked because the job references something external to Hadoop that has a limit of its own, such as a database it reads from or writes to. In Hadoop terms, we call this a 'side-effect'.
+ 
+ One of the general assumptions of the framework is that there are no side-effects. All tasks are expected to be restartable, and a side-effect typically goes against the grain of this rule.
+ 
+ If a task absolutely must break the rules, there are a few things one can do:
+ 
+ * Deploy ZooKeeper and use it as a persistent lock to keep track of how many tasks are running concurrently
+ * Use a scheduler with a maximum task-per-queue feature and submit the job to that queue
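The ZooKeeper option above is essentially a distributed counting semaphore: a task takes a permit (in ZooKeeper, typically an ephemeral sequential znode under a well-known path) before touching the external resource and gives it back afterwards. As a minimal local sketch of that gating logic, using `java.util.concurrent.Semaphore` as a stand-in for the ZooKeeper lock (the class name and the limit of 4 are hypothetical):

```java
import java.util.concurrent.Semaphore;

// Local stand-in for a ZooKeeper-based gate: each task acquires a permit
// before touching the rate-limited external system and releases it after.
// With ZooKeeper, the permits would be ephemeral sequential znodes under a
// well-known path, but the gating logic is the same.
public class SideEffectGate {
    // At most 4 tasks may touch the external resource at once (hypothetical limit).
    private static final Semaphore PERMITS = new Semaphore(4);

    static void runTask(int id) throws InterruptedException {
        PERMITS.acquire();          // block until a slot is free
        try {
            // ... talk to the rate-limited external system here ...
            System.out.println("task " + id + " holds a permit");
        } finally {
            PERMITS.release();      // always free the slot, even on failure
        }
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 8; i++) {
            runTask(i);
        }
        System.out.println("available permits: " + PERMITS.availablePermits());
    }
}
```

The `finally` release matters: a task that dies while holding a permit would otherwise leak a slot forever, which is why the ZooKeeper recipe uses ephemeral znodes that vanish when the holder's session expires.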
+ 
+ == How do I limit the number of concurrent tasks my job may have running on a given node at a time? ==
+ 
+ The CapacityScheduler in 0.21 has a feature whereby one may use per-task RAM requirements to control how many slots a given task occupies. By careful use of this feature, one may limit how many of a job's tasks run concurrently on a given node.
+ 
  = HDFS =
  
  == If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes? ==
