From: "Vivek Ratan (JIRA)"
To: core-dev@hadoop.apache.org
Date: Mon, 27 Oct 2008 00:56:44 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4523) Enhance how memory-intensive user tasks are handled

[ https://issues.apache.org/jira/browse/HADOOP-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642871#action_12642871 ]

Vivek Ratan commented on HADOOP-4523:
-------------------------------------

HADOOP-3759 provides a configuration value, _mapred.tasktracker.tasks.maxmemory_, which specifies the total VM on a machine available to tasks spawned by the TT. Along with HADOOP-4439, it provides a cluster-wide default for the maximum VM allotted to each task, _mapred.task.default.maxmemory_; individual jobs can override this value. HADOOP-3581 implements a monitoring mechanism that kills tasks if they go over their _maxmemory_ value. Keeping all this in mind, here's a proposal for what we need to additionally do:

If _tasks.maxmemory_ is set, the TT monitors the total memory usage of all tasks it has spawned. If that total goes over _tasks.maxmemory_, the TT needs to kill one or more tasks. It first looks for tasks whose individual memory usage is over their _default.maxmemory_ value, and kills all of them (while ideally you might kill just enough of them to bring total usage back under the limit, it's not obvious which of these violators to choose, so it's probably simpler to kill them all). If no such task is found, or if killing these tasks still leaves us over the memory limit, we need to pick other tasks to kill. There are many ways to do this; probably the easiest is to kill the tasks that started most recently.

Tasks that are killed because they went over their own memory limit should be treated as failed, since they violated their contract. Tasks that are killed because the sum total of memory usage was over a limit should be treated as killed, since it's not really their fault.

Another improvement is to let _mapred.tasktracker.tasks.maxmemory_ be set by an external script, which lets Ops control what this value should be. A slightly less desirable option, as indicated in some offline discussions with Alan W, is to set this value either as an absolute number ("hadoop may use X amount") or as an offset from the total amount of memory on the machine ("hadoop may use all but 4g").
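To make the two-pass selection concrete, here is a minimal sketch of the policy described above. The class and method names (TaskMem, pickVictims) and the use of start time as the "most recent" ordering are illustrative assumptions, not existing Hadoop APIs:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the TT's kill-selection policy; not Hadoop code.
public class MemoryKillPolicy {

    static class TaskMem {
        final String taskId;
        final long vmUsed;     // current VM usage of the task (and its children)
        final long maxMemory;  // per-task limit (default.maxmemory or job override)
        final long startTime;  // launch time, used for "most recently started"

        TaskMem(String taskId, long vmUsed, long maxMemory, long startTime) {
            this.taskId = taskId;
            this.vmUsed = vmUsed;
            this.maxMemory = maxMemory;
            this.startTime = startTime;
        }
    }

    /**
     * Returns the tasks to kill, in order, so total usage drops to
     * tasksMaxMemory or below. Pass 1: kill every task over its own
     * limit (these count as FAILED). Pass 2: if still over, kill the
     * most recently started tasks (these count as KILLED) until the
     * total is under the limit.
     */
    static List<TaskMem> pickVictims(List<TaskMem> tasks, long tasksMaxMemory) {
        long total = tasks.stream().mapToLong(t -> t.vmUsed).sum();
        List<TaskMem> victims = new ArrayList<>();
        if (total <= tasksMaxMemory) {
            return victims;
        }

        // Pass 1: all individual violators are killed, since picking a
        // minimal subset of them is not obviously better (per the
        // discussion above).
        List<TaskMem> remaining = new ArrayList<>();
        for (TaskMem t : tasks) {
            if (t.vmUsed > t.maxMemory) {
                victims.add(t);   // treated as FAILED: violated its contract
                total -= t.vmUsed;
            } else {
                remaining.add(t);
            }
        }

        // Pass 2: most recently started tasks go first.
        remaining.sort(
            Comparator.comparingLong((TaskMem t) -> t.startTime).reversed());
        for (TaskMem t : remaining) {
            if (total <= tasksMaxMemory) {
                break;
            }
            victims.add(t);       // treated as KILLED: not really its fault
            total -= t.vmUsed;
        }
        return victims;
    }
}
```

Note the ordering of the result encodes the failed-vs-killed distinction: pass-1 victims violated their own limit, pass-2 victims are collateral.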
> Enhance how memory-intensive user tasks are handled
> ---------------------------------------------------
>
>                 Key: HADOOP-4523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4523
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Vivek Ratan
>
> HADOOP-3581 monitors each Hadoop task to see if its memory usage (which includes usage of any tasks spawned by it and so on) is within a per-task limit. If the task's memory usage goes over its limit, the task is killed. This, by itself, is not enough to prevent badly behaving jobs from bringing down nodes. What is also needed is the ability to make sure that the sum total of VM usage of all Hadoop tasks does not exceed a certain limit.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.