From: "Vivek Ratan (JIRA)"
To: core-dev@hadoop.apache.org
Date: Mon, 27 Oct 2008 00:56:44 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4523) Enhance how memory-intensive user tasks are handled

[ https://issues.apache.org/jira/browse/HADOOP-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642871#action_12642871 ]

Vivek Ratan commented on HADOOP-4523:
-------------------------------------

HADOOP-3759 provides a configuration value, _mapred.tasktracker.tasks.maxmemory_, which specifies the total VM on a machine available to tasks spawned by the TT. Along with HADOOP-4439, it provides a cluster-wide default for the maximum VM allotted to each task, _mapred.task.default.maxmemory_; individual jobs can override this value. HADOOP-3581 implements a monitoring mechanism that kills tasks if they go over their _maxmemory_ value. Keeping all this in mind, here's a proposal for what we need to additionally do:

If _tasks.maxmemory_ is set, the TT monitors the total memory usage of all tasks it has spawned. If that total goes over _tasks.maxmemory_, the TT needs to kill one or more tasks. It first looks for tasks whose individual memory usage is over their _default.maxmemory_ value, and kills all of them (while ideally you might kill just enough of them to bring total usage back under the limit, it's not obvious which of these violators to choose, so it's probably simpler to kill them all). If no such task is found, or if killing these tasks still leaves us over the memory limit, we need to pick other tasks to kill. There are many ways to do this; probably the easiest is to kill the tasks that started most recently.

Tasks that are killed because they went over their own memory limit should be treated as failed, since they violated their contract. Tasks that are killed because the sum total of memory usage was over a limit should be treated as killed, since it's not really their fault.

Another improvement is to let _mapred.tasktracker.tasks.maxmemory_ be set by an external script, which lets Ops control what this value should be. A slightly less desirable option, as indicated in some offline discussions with Alan W, is to set this value either as an absolute number ("hadoop may use X amount") or as an offset from the total amount of memory on the machine ("hadoop may use all but 4g").
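To make the two-pass selection concrete, here is a minimal sketch of the policy described above. The class and method names (TaskMem, pickVictims) and the use of start time as the "most recent" ordering are illustrative assumptions, not existing Hadoop APIs:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the TT's kill-selection policy; not Hadoop code.
public class MemoryKillPolicy {

    static class TaskMem {
        final String taskId;
        final long vmUsed;     // current VM usage of the task (and its children)
        final long maxMemory;  // per-task limit (default.maxmemory or job override)
        final long startTime;  // launch time, used for "most recently started"

        TaskMem(String taskId, long vmUsed, long maxMemory, long startTime) {
            this.taskId = taskId;
            this.vmUsed = vmUsed;
            this.maxMemory = maxMemory;
            this.startTime = startTime;
        }
    }

    /**
     * Returns the tasks to kill, in order, so total usage drops to
     * tasksMaxMemory or below. Pass 1: kill every task over its own
     * limit (these count as FAILED). Pass 2: if still over, kill the
     * most recently started tasks (these count as KILLED) until the
     * total is under the limit.
     */
    static List<TaskMem> pickVictims(List<TaskMem> tasks, long tasksMaxMemory) {
        long total = tasks.stream().mapToLong(t -> t.vmUsed).sum();
        List<TaskMem> victims = new ArrayList<>();
        if (total <= tasksMaxMemory) {
            return victims;
        }

        // Pass 1: all individual violators are killed, since picking a
        // minimal subset of them is not obviously better (per the
        // discussion above).
        List<TaskMem> remaining = new ArrayList<>();
        for (TaskMem t : tasks) {
            if (t.vmUsed > t.maxMemory) {
                victims.add(t);   // treated as FAILED: violated its contract
                total -= t.vmUsed;
            } else {
                remaining.add(t);
            }
        }

        // Pass 2: most recently started tasks go first.
        remaining.sort(
            Comparator.comparingLong((TaskMem t) -> t.startTime).reversed());
        for (TaskMem t : remaining) {
            if (total <= tasksMaxMemory) {
                break;
            }
            victims.add(t);       // treated as KILLED: not really its fault
            total -= t.vmUsed;
        }
        return victims;
    }
}
```

Note the ordering of the result encodes the failed-vs-killed distinction: pass-1 victims violated their own limit, pass-2 victims are collateral.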
> Enhance how memory-intensive user tasks are handled
> ---------------------------------------------------
>
>                 Key: HADOOP-4523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4523
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Vivek Ratan
>
> HADOOP-3581 monitors each Hadoop task to see if its memory usage (which includes usage of any tasks spawned by it and so on) is within a per-task limit. If the task's memory usage goes over its limit, the task is killed. This, by itself, is not enough to prevent badly behaving jobs from bringing down nodes. What is also needed is the ability to make sure that the sum total of VM usage of all Hadoop tasks does not exceed a certain limit.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.