hadoop-mapreduce-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
Date Thu, 25 Feb 2010 23:42:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838610#action_12838610 ]

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Please allow me to present my use case.

I have users submitting their own jobs to the cluster. These jobs are neither audited nor vetted by any authority before being deployed on the cluster. The mappers for most of these jobs are written in python or php. In these languages, it is easy for code writers to mistakenly use excessive amounts of memory (via a python dictionary or some such thing). We have seen about one such case per month in our cluster. The thing to note is that in every one of these cases, the user had a coding error that erroneously kept inserting elements into his/her dictionary. These are not "valid" jobs, and they are usually killed by the user when he/she realises the coding mistake.
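
To make the failure mode concrete, here is the shape of such a buggy mapper (an illustrative sketch only, not code from any real job): a streaming mapper that aggregates counts in an in-memory dict, where the key space turns out to be unbounded, so the task's resident memory grows with the input until the node starts to swap.

    #!/usr/bin/env python
    # Illustrative only: in-memory aggregation over an unbounded key space.
    import sys

    counts = {}
    for line in sys.stdin:
        key = line.split('\t', 1)[0]
        # Bug: nearly every input line carries a distinct key, so this
        # dict (and the task's resident memory) grows without bound.
        counts[key] = counts.get(key, 0) + 1

    for key, n in counts.items():
        sys.stdout.write('%s\t%d\n' % (key, n))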

The problem we are encountering is that when such a job is let loose in our cluster, many tasks start eating lots of memory, causing excessive swapping and finally making the OS on those nodes hang. This JIRA attempts to prevent that scenario. Once properly configured, it will make it very hard for a user job to bring down nodes in the Hadoop cluster, and it increases the stability and uptime of our cluster to a great extent. I would request all concerned reviewers to consider this JIRA from that perspective.
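
To sketch the mechanism being proposed (the names and the threshold value below are illustrative, not taken from the actual patch): the TaskTracker-side check amounts to reading free physical memory from /proc/meminfo and, when it falls below the configured threshold, killing the task with the largest resident set.

    #!/usr/bin/env python
    # Hypothetical sketch of the proposed TaskTracker behaviour; assumes Linux procfs.
    import os
    import signal

    FREE_MEM_THRESHOLD_KB = 512 * 1024  # illustrative value: 512 MB

    def free_physical_memory_kb():
        """Parse MemFree (in kB) out of /proc/meminfo."""
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemFree:'):
                    return int(line.split()[1])
        raise RuntimeError('MemFree not found in /proc/meminfo')

    def rss_kb(pid):
        """Resident set size of one process, from /proc/<pid>/status."""
        try:
            with open('/proc/%d/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        return int(line.split()[1])
        except IOError:
            pass  # process may have exited
        return 0

    def check_and_kill(task_pids):
        """If free memory is below the threshold, kill the fattest task."""
        if task_pids and free_physical_memory_kb() < FREE_MEM_THRESHOLD_KB:
            victim = max(task_pids, key=rss_kb)
            os.kill(victim, signal.SIGKILL)  # the real TaskTracker would also fail the attempt
            return victim
        return None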





> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, killing-based-on-virtual-memory-limits (HADOOP-5883) was designed to address this problem. It works well when most map-reduce jobs are Java jobs with well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.
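
The virtual-vs-physical distinction above is easy to see on Linux: -Xmx caps only the JVM heap, while forked python/php children accumulate their own memory, visible only by walking the task's process subtree under /proc. A standalone sketch of that bookkeeping follows (illustrative only; inside Hadoop, the ProcfsBasedProcessTree utility does similar accounting):

    #!/usr/bin/env python
    # Sum virtual (VmSize) and resident (VmRSS) memory over a process
    # and all of its descendants, by walking Linux procfs. Sketch only.
    import os

    def status_kb(pid, field):
        """Read one kB-valued field from /proc/<pid>/status, 0 if gone."""
        try:
            with open('/proc/%d/status' % pid) as f:
                for line in f:
                    if line.startswith(field + ':'):
                        return int(line.split()[1])
        except IOError:
            pass
        return 0

    def children(ppid):
        """Find direct children of ppid by scanning /proc/<pid>/stat."""
        kids = []
        for entry in os.listdir('/proc'):
            if not entry.isdigit():
                continue
            try:
                with open('/proc/%s/stat' % entry) as f:
                    data = f.read()
            except IOError:
                continue
            # ppid is the second field after the ')' that closes the comm name
            if int(data.rpartition(')')[2].split()[1]) == ppid:
                kids.append(int(entry))
        return kids

    def subtree_memory_kb(root_pid):
        """Return (total VmSize, total VmRSS) for root_pid and descendants."""
        vsize = rss = 0
        stack = [root_pid]
        while stack:
            pid = stack.pop()
            vsize += status_kb(pid, 'VmSize')
            rss += status_kb(pid, 'VmRSS')
            stack.extend(children(pid))
        return vsize, rss

For a streaming task, the first total (what virtual-memory limits constrain) can legitimately run several times the second (what this JIRA would constrain), which is exactly why virtual-memory thresholds are hard to tune for python/php mappers.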

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

