hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
Date Thu, 25 Feb 2010 00:13:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838107#action_12838107 ]

Hong Tang commented on MAPREDUCE-1221:
--------------------------------------

Chipping in my 2 cents. I think the goal of this jira is to prevent a node from swapping (same
as in MAPREDUCE-257). In this regard, containing the memory usage of individual tasks may not
be sufficient to protect the system from swapping - for instance, there could be foreign processes
or bugs in the TT and DN that "eat up" RAM on the same node and lead to swapping. Or there could
be faulty RAM modules that the OS fails to recognize at boot time, leaving the actual amount of
RAM smaller than what is configured. In my view, this idea is similar to the OOM killer mechanism
in the Linux kernel and serves as a low-level protection against faults that are not easily
preventable from the upper layers.
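
A minimal sketch of what such a node-level guard could look like on Linux (the class, field, and
method names here are hypothetical, not the attached patch): read MemFree from /proc/meminfo and
tell the TT to kill a task when free RAM drops below a configured floor.

    // Illustrative only -- not the MAPREDUCE-1221 patch; names are made up.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class FreeMemoryGuard {
      // Hypothetical threshold: consider killing tasks if free RAM falls below this many bytes.
      private final long minFreeBytes;

      FreeMemoryGuard(long minFreeBytes) {
        this.minFreeBytes = minFreeBytes;
      }

      /** Parse MemFree from /proc/meminfo (Linux only); the value is reported in kB. */
      long freePhysicalMemory() throws IOException {
        BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"));
        try {
          String line;
          while ((line = r.readLine()) != null) {
            if (line.startsWith("MemFree:")) {
              String[] parts = line.trim().split("\\s+");
              return Long.parseLong(parts[1]) * 1024L;  // kB -> bytes
            }
          }
        } finally {
          r.close();
        }
        return Long.MAX_VALUE;  // unknown: do not trigger any kills
      }

      /** True if the node is low enough on physical memory that a task should be killed. */
      boolean shouldKillATask() throws IOException {
        return freePhysicalMemory() < minFreeBytes;
      }
    }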

In terms of which tasks to shut down: when a node goes into swap, there is a high chance that
all tasks would fail (time out). Even worse, when the TT is swapping, it may not even respond
to kill commands from the JT. So killing some tasks and letting others proceed may still be better
than not killing any. That said, I agree that we can fine-tune the policy - e.g., avoid killing
a 4th task attempt, or bias against killing tasks that have already done a lot of work (input
bytes consumed, slot-seconds used).
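
A rough sketch of such a selection policy (TaskStats and its fields are made-up placeholders, not
Hadoop classes): spare attempts that are close to exhausting their retries, and among the rest
kill the task that has done the least work.

    // Illustrative policy sketch; TaskStats is a hypothetical holder, not a Hadoop class.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    class TaskStats {
      String taskAttemptId;
      int attemptNumber;       // 0-based attempt index
      long inputBytesConsumed; // work already done
      long slotSeconds;        // slot-seconds used so far

      TaskStats(String id, int attempt, long inputBytes, long slotSecs) {
        this.taskAttemptId = id;
        this.attemptNumber = attempt;
        this.inputBytesConsumed = inputBytes;
        this.slotSeconds = slotSecs;
      }
    }

    class KillPolicy {
      /** Pick a victim: never a last-chance attempt; otherwise the task with the least work done. */
      static TaskStats chooseVictim(List<TaskStats> running) {
        List<TaskStats> candidates = new ArrayList<TaskStats>();
        for (TaskStats t : running) {
          if (t.attemptNumber < 3) {   // spare the 4th attempt so the job is not failed outright
            candidates.add(t);
          }
        }
        if (candidates.isEmpty()) {
          return null;                 // nothing safe to kill
        }
        Collections.sort(candidates, new Comparator<TaskStats>() {
          public int compare(TaskStats a, TaskStats b) {
            // Prefer killing the task with fewer input bytes consumed, then fewer slot-seconds.
            if (a.inputBytesConsumed != b.inputBytesConsumed) {
              return a.inputBytesConsumed < b.inputBytesConsumed ? -1 : 1;
            }
            return Long.compare(a.slotSeconds, b.slotSeconds);
          }
        });
        return candidates.get(0);
      }
    }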

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set
> of configured thresholds. I would like to extend this feature to enable killing tasks if the
> physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user-space processes start using lots of memory, the
> machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from
> triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits
> mechanism (HADOOP-5883) was designed to address this problem. This works well when most
> map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual
> memory for each task. On the other hand, if each task forks off mappers/reducers written in
> other languages (python/php, etc.), the total virtual memory usage of the process subtree varies
> greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.
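
As a rough sketch of how physical-memory accounting could work on Linux (not the attached patch;
the class name and the subtree-walking step are assumptions): sum the resident set size (VmRSS)
reported in /proc/<pid>/status across the task's process subtree and compare it against the
configured physical-memory limit. The helper below reads the per-process RSS only.

    // Illustrative only: reads VmRSS for one PID from /proc/<pid>/status (Linux).
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class Rss {
      /** Resident set size of a single process in bytes, or -1 if unavailable. */
      static long rssBytes(int pid) throws IOException {
        BufferedReader r = new BufferedReader(new FileReader("/proc/" + pid + "/status"));
        try {
          String line;
          while ((line = r.readLine()) != null) {
            if (line.startsWith("VmRSS:")) {
              String[] parts = line.trim().split("\\s+");
              return Long.parseLong(parts[1]) * 1024L;  // value is in kB
            }
          }
        } finally {
          r.close();
        }
        return -1;
      }
    }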

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

