Message-ID: <1193262499.1246653047339.JavaMail.jira@brutus>
Date: Fri, 3 Jul 2009 13:30:47 -0700 (PDT)
From: "Matei Zaharia (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5170) Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide

    [ https://issues.apache.org/jira/browse/HADOOP-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727122#action_12727122 ]

Matei Zaharia commented on HADOOP-5170:
---------------------------------------

No, pools are persistent. You submit a job to a particular pool by setting a jobconf property (e.g. set pool.name="my_pool"). Then you'll be able to have caps on the total maps or total reduces running for each pool. For example, you could limit your DB import pool to 10 mappers, and then *all* DB import jobs together will get no more than 10 mappers.

I was planning to keep the per-node limits on a per-job basis, as in the current patch. However, for the cluster-wide limits, it seemed to make more sense to let them apply across multiple jobs by placing them on a pool.
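To make the pool mechanism concrete, here is a rough sketch (illustrative only, not taken from the attached patches) of routing a job into a pool through its jobconf. It assumes the cluster runs the fair scheduler with mapred.fairscheduler.poolnameproperty set to pool.name; the class name and pool name are made up:

{code:java}
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class DbImportJobExample {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(DbImportJobExample.class);
    conf.setJobName("db-import");
    // The fair scheduler reads the pool name from whichever jobconf
    // property mapred.fairscheduler.poolnameproperty names ("pool.name"
    // here); every job setting the same value lands in the same pool.
    conf.set("pool.name", "db_import");
    // Input/output paths and mapper/reducer classes omitted for brevity;
    // any normally configured MR job would be submitted the same way.
    JobClient.runJob(conf);
  }
}
{code}

The per-pool map/reduce caps themselves would live in the scheduler's pool configuration rather than in each job's conf, which is what lets a single limit cover every job submitted to the pool.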
> Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-5170
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5170
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Jonathan Gray
>            Assignee: Matei Zaharia
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5170-tasklimits-v3-0.18.3.patch, tasklimits-v2.patch, tasklimits-v3-0.19.patch, tasklimits-v3.patch, tasklimits-v4-20.patch, tasklimits-v4.patch, tasklimits.patch
>
>
> There are a number of use cases for being able to do this. The focus of this jira should be on finding what would be simplest to implement while satisfying the most use cases.
> This could be implemented as either a per-node maximum or a cluster-wide maximum. It seems that for most uses the former is preferable; however, either would fulfill the requirements of this jira.
> Some of the reasons for allowing this feature (mine and from others on the list):
> - I have some very large CPU-bound jobs. I am forced to keep the max maps per node at 2 or 3 (on a 4-core node) so that I do not starve the DataNode and RegionServer. I have other jobs that are network-latency bound and would like to be able to run high numbers of them concurrently on each node. Though I can thread some jobs, some use cases are difficult to thread (scanning from HBase), and threading adds significant complexity to the job compared with letting Hadoop handle the concurrency.
> - Poor assignment of tasks to nodes creates situations where one node has multiple reducers while other nodes received none. A limit of 1 reducer per node for that job would prevent this. (Only works with a per-node limit.)
> - Poor man's MR job virtualization. Since we can limit a job's resources, this gives much more control in allocating and dividing up the resources of a large cluster. (Makes most sense with a cluster-wide limit.)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.