hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
Date Wed, 24 Feb 2010 22:39:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838039#action_12838039 ]

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Ah, fair point - I missed that detail about rlimit, my bad.

----

bq. However, I think the goal of this patch is different - it's to let the jobs use however much memory they want without declaring it in advance, but fix things when we do overcommit.

I'm trying to parse things chronologically. Please help me understand this.

The original description says: "virtual-memory is inconvenient in some cases, so we'll do it for physical-memory". Fair enough.

However, the current patch seems to reserve *some* physical memory for the TaskTracker... and is the plan then to just kill whichever task's usage is the _highest_ at a given instant?

If so, speaking from experience with HoD, which had pretty much the same feature (albeit for different reasons, i.e. to protect the Linux kernel), this is a bad idea. The problem is that there is *no predictability* whatsoever. Tasks get randomly shot down because *someone* tipped things over the edge.
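
To make the unpredictability concrete, here is a minimal sketch of the node-level policy as I understand it - not the actual patch, and the class/field names are made up for illustration. When free physical memory dips below the reserved threshold, whichever task happens to be largest at that instant gets shot, regardless of who actually tipped the node over:

{code:java}
import java.util.List;

// Illustrative only: models "kill the biggest task when the node runs low".
class NodeMemoryMonitorSketch {
  static class TaskRss {
    final String taskId;
    final long rssBytes;   // physical memory of the task's process tree
    TaskRss(String taskId, long rssBytes) { this.taskId = taskId; this.rssBytes = rssBytes; }
  }

  /** Returns the task that would be killed, or null if the node still has headroom. */
  static TaskRss pickVictim(List<TaskRss> running, long freePhysicalBytes, long reservedBytes) {
    if (freePhysicalBytes >= reservedBytes) {
      return null; // node is fine; nobody is killed
    }
    // The victim is simply whichever task is largest *right now*, independent
    // of which task actually caused the shortfall - hence the unpredictability.
    TaskRss victim = null;
    for (TaskRss t : running) {
      if (victim == null || t.rssBytes > victim.rssBytes) {
        victim = t;
      }
    }
    return victim;
  }
}
{code}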

I'd rather see a simpler limit which the user sets per-task - much like the virtual-memory limit.
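
By contrast, a per-task limit only ever kills a task for its own usage, so the outcome is predictable. A tiny sketch of what I mean - the property name mapred.task.maxpmem below is hypothetical, chosen only to parallel the existing virtual-memory limit, and is not something from the patch:

{code:java}
// Illustrative only: a task is killed iff *it* exceeds the limit it declared.
class PerTaskPmemLimitSketch {
  /** Returns true iff this task, and only this task, should be killed. */
  static boolean overLimit(long taskRssBytes, long maxPmemBytes /* hypothetical mapred.task.maxpmem */) {
    // Predictable: the decision depends only on this task's own usage and its
    // own configured limit, not on what other tasks on the node are doing.
    return maxPmemBytes > 0 && taskRssBytes > maxPmemBytes;
  }
}
{code}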

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

