hadoop-common-dev mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3581) Prevent memory intensive user tasks from taking down nodes
Date Thu, 03 Jul 2008 06:00:52 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HADOOP-3581:
--------------------------------------------

    Attachment: patch_3581_0.1.txt

Attaching a first patch to help move the discussion forward. The patch is still raw and needs
a good deal of further work. Much of it is a proof of concept; enough abstraction is in place
that the actual implementation can be changed easily.

At present,
- the process tracker works only on Linux; it uses the proc file system and the per-process
directories inside it.
- mapred.child.ulimit is used to limit the *total* vmem usage of all the tasks' process trees.
- once it detects that the *total* vmem usage of all tasks has crossed the specified limit,
it calls findOOMTaskstoKill to find tasks to be killed.
- findOOMTaskstoKill returns the list of tasks to be killed. Currently it returns only one
task, the one with the highest memory usage.
- after getting the list of tasks to be killed, it kills each corresponding process tree by
issuing individual 'kill <pid>' commands (SIGTERM). A rough sketch of this monitoring flow
appears after this list.
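
The following is a rough, hypothetical sketch (not the attached patch) of the kind of
/proc-based monitoring described above. All names here (ProcVmemSketch, vmemOf, treeVmem) are
illustrative only: it reads vsize and ppid from /proc/<pid>/stat, builds the parent/child map
from /proc, and sums the virtual memory of the process tree rooted at a task's pid.

{code:java}
// Hypothetical sketch only; not the code in the attached patch.
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class ProcVmemSketch {

  /** Virtual memory (bytes) of one pid, from /proc/<pid>/stat (field 23 = vsize). */
  static long vmemOf(int pid) throws IOException {
    String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
    // The command name (field 2) is parenthesized and may contain spaces, so
    // split only the text after the closing ')'.
    String rest = stat.substring(stat.lastIndexOf(')') + 2);
    return Long.parseLong(rest.split("\\s+")[20]);   // field 23 overall
  }

  /** Parent pid of a process, from the same stat line (field 4 = ppid). */
  static int ppidOf(int pid) throws IOException {
    String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
    String rest = stat.substring(stat.lastIndexOf(')') + 2);
    return Integer.parseInt(rest.split("\\s+")[1]);
  }

  /** Sum vmem over the process tree rooted at rootPid. */
  static long treeVmem(int rootPid) throws IOException {
    // Build a child map from every numeric directory under /proc.
    Map<Integer, List<Integer>> children = new HashMap<>();
    try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get("/proc"))) {
      for (Path p : dir) {
        String name = p.getFileName().toString();
        if (!name.matches("\\d+")) continue;
        int pid = Integer.parseInt(name);
        try {
          children.computeIfAbsent(ppidOf(pid), k -> new ArrayList<>()).add(pid);
        } catch (IOException e) { /* process exited between listing and read */ }
      }
    }
    long total = 0;
    Deque<Integer> stack = new ArrayDeque<>(Collections.singleton(rootPid));
    while (!stack.isEmpty()) {
      int pid = stack.pop();
      try { total += vmemOf(pid); } catch (IOException ignored) { }
      stack.addAll(children.getOrDefault(pid, Collections.emptyList()));
    }
    return total;
  }

  public static void main(String[] args) throws Exception {
    int pid = Integer.parseInt(args[0]);
    System.out.println("process-tree vmem = " + treeVmem(pid) + " bytes");
  }
}
{code}

A monitoring thread could call something like treeVmem() for each running task's root pid on
an interval and compare the total against the configured limit.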

Needs thought/TODO:
- Introduce separate configuration properties for the usage of map tasks and reduce tasks?
Knock out the previous usage of mapred.child.ulimit and its corresponding use to set ulimits?
- May want to monitor whether the kill actually went through and issue a subsequent SIGKILL
as needed (see the escalation sketch after this list). The kill mechanism might change
completely if we wish to start the tasks using job control.
- May want to refactor the code a bit and merge killOOMTasks with killOverflowingTasks, and
later move all of this to a single place when HADOOP-3675 goes in.
- Many code paths are not synchronized yet, so they might result in threading errors/race
conditions.
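
A purely illustrative sketch of the SIGTERM-then-SIGKILL escalation mentioned above (again
hypothetical names; for brevity it signals only the root pid of the tree, whereas a real
implementation would signal every pid in the tree, or use process groups / job control):

{code:java}
// Hypothetical escalation sketch: SIGTERM, wait for a grace period, then SIGKILL
// if the root of the process tree is still alive (/proc/<pid> still exists).
import java.nio.file.Files;
import java.nio.file.Paths;

public class KillEscalationSketch {
  static boolean isAlive(int pid) {
    return Files.exists(Paths.get("/proc/" + pid));
  }

  static void killTree(int rootPid, long graceMillis) throws Exception {
    // SIGTERM first, giving the task a chance to exit cleanly.
    Runtime.getRuntime().exec(new String[] {"kill", Integer.toString(rootPid)}).waitFor();
    long deadline = System.currentTimeMillis() + graceMillis;
    while (isAlive(rootPid) && System.currentTimeMillis() < deadline) {
      Thread.sleep(200);
    }
    if (isAlive(rootPid)) {
      // Still around after the grace period: escalate to SIGKILL.
      Runtime.getRuntime().exec(new String[] {"kill", "-9", Integer.toString(rootPid)}).waitFor();
    }
  }

  public static void main(String[] args) throws Exception {
    killTree(Integer.parseInt(args[0]), 5000);
  }
}
{code}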

We still need a decision on whether we want to 1) limit the aggregate usage over all tasks'
process trees, or 2) limit the usage per task's process tree. I believe both of these can be
implemented with the framework set up in the current patch; a small illustration of the two
options follows.
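
To make the two options concrete, here is a hypothetical illustration assuming the treeVmem()
helper from the earlier sketch and a map from task attempt IDs to the root pids of their
process trees (both are assumptions for the sake of the example, not part of the patch):

{code:java}
// Hypothetical illustration of the two limit policies under discussion.
import java.util.Map;

public class LimitPolicySketch {
  // Option 1: a single limit on the sum over all tasks' process trees.
  static boolean aggregateOverLimit(Map<String, Integer> taskRootPids, long limitBytes)
      throws Exception {
    long total = 0;
    for (int pid : taskRootPids.values()) {
      total += ProcVmemSketch.treeVmem(pid);
    }
    return total > limitBytes;
  }

  // Option 2: an independent limit on each task's own process tree.
  static String firstTaskOverLimit(Map<String, Integer> taskRootPids, long perTaskLimitBytes)
      throws Exception {
    for (Map.Entry<String, Integer> e : taskRootPids.entrySet()) {
      if (ProcVmemSketch.treeVmem(e.getValue()) > perTaskLimitBytes) {
        return e.getKey();                      // candidate to kill
      }
    }
    return null;                                // everyone within limit
  }
}
{code}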

> Prevent memory intensive user tasks from taking down nodes
> ----------------------------------------------------------
>
>                 Key: HADOOP-3581
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3581
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: patch_3581_0.1.txt
>
>
> Sometimes user Map/Reduce applications can get extremely memory intensive, maybe due
> to some inadvertent bugs in the user code, or the amount of data processed. When this happens,
> the user tasks start to interfere with the proper execution of other processes on the node,
> including other Hadoop daemons like the DataNode and TaskTracker. Thus, the node would become
> unusable for any Hadoop tasks. There should be a way to prevent such tasks from bringing down
> the node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

