hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing
Date Sat, 28 Nov 2009 06:55:20 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1119:
-------------------------------------

    Attachment: MAPREDUCE-1119.6.patch

Attaching a new patch. This incorporates the code review suggestions above. The {{DefaultTaskController}}
is tested through a TaskController subclass that counts the number of {{dumpTaskStack()}} calls
and checks that the count is incremented only for the appropriate jobs. The same strategy is
used for testing {{LinuxTaskController}}: {{ClusterWithLinuxTaskController.MyLinuxTaskController}}
now counts SIGQUIT calls, as well as any abnormal exit statuses returned by {{task-controller}}
when it delivers the SIGQUIT to the child. I also improved the documentation in
{{ClusterWithLinuxTaskController}} on how to set up the test case.

All of these tests pass on my local machine. 
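A minimal sketch of the counting test-double strategy described above, for readers unfamiliar with it. Everything here is hypothetical: {{SimpleTaskController}}, {{CountingTaskController}} and the {{checkTask()}} timeout logic are simplified stand-ins written for illustration, not the real Hadoop classes or the code in the patch. The point is only that the test swaps in a subclass that records {{dumpTaskStack()}} calls instead of sending signals, then asserts that the counter moves for a hung task and stays put for a healthy one.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified stand-in for a TaskController; not Hadoop's real API. */
abstract class SimpleTaskController {
    /** Ask the task's JVM to print its stack (e.g. via SIGQUIT). */
    abstract void dumpTaskStack(String taskId);

    /** Forcibly terminate the task. */
    abstract void killTask(String taskId);
}

/** Test double: records requested dumps and kills instead of sending signals. */
class CountingTaskController extends SimpleTaskController {
    final AtomicInteger dumpCalls = new AtomicInteger();
    final AtomicInteger killCalls = new AtomicInteger();

    @Override
    void dumpTaskStack(String taskId) { dumpCalls.incrementAndGet(); }

    @Override
    void killTask(String taskId) { killCalls.incrementAndGet(); }
}

public class CountingControllerDemo {
    /**
     * Hypothetical timeout check: a task that has not reported status within
     * timeoutMillis gets a stack dump first and is then killed.
     * Returns true if the task was treated as hung.
     */
    static boolean checkTask(SimpleTaskController tc, String taskId,
                             long lastReportMillis, long nowMillis, long timeoutMillis) {
        if (nowMillis - lastReportMillis <= timeoutMillis) {
            return false; // task reported recently: no dump, no kill
        }
        tc.dumpTaskStack(taskId); // dump before killing, so the stack is not lost
        tc.killTask(taskId);
        return true;
    }

    public static void main(String[] args) {
        CountingTaskController tc = new CountingTaskController();
        // Reported 1s ago with a 5s timeout: the counters must not move.
        checkTask(tc, "healthy", 9_000L, 10_000L, 5_000L);
        // Silent for 10s with a 5s timeout: exactly one dump and one kill.
        checkTask(tc, "hung", 0L, 10_000L, 5_000L);
        System.out.println("dumps=" + tc.dumpCalls.get() + " kills=" + tc.killCalls.get());
    }
}
```

Running {{main}} prints {{dumps=1 kills=1}}: only the task that exceeded the timeout triggered a dump.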

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, MAPREDUCE-1119.4.patch,
>                      MAPREDUCE-1119.5.patch, MAPREDUCE-1119.6.patch, MAPREDUCE-1119.patch
>
>
> When the TT (TaskTracker) kills tasks that haven't reported status, it should gather a stack
> dump for the task. This could be done either by sending a SIGQUIT (so the dump ends up in
> stdout) or perhaps by using something like JDI to gather the stack directly from Java. This may be
> somewhat tricky, since the child may be running as another user (so the SIGQUIT would have
> to go through LinuxTaskController). This feature would make debugging these kinds of failures
> much easier, especially if we could get the dump into the TaskDiagnostic message.
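A rough sketch of the SIGQUIT approach from the description, assuming a POSIX system with {{kill}} on the PATH and a child JVM running as the same user as the caller (the LinuxTaskController case, where the signal must go through the setuid {{task-controller}} binary, is deliberately not modelled). On receiving SIGQUIT, a HotSpot JVM prints a thread dump to its stdout and keeps running. The class and method names, and the grace period before the final kill, are hypothetical choices for illustration only.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: request a thread dump from a child JVM via SIGQUIT,
 * then kill it. Assumes `kill` is on the PATH and the child runs as the
 * same user; not the code from the patch.
 */
public class SigQuitSketch {

    /** Command that sends SIGQUIT; the JVM writes all thread stacks to its stdout. */
    static List<String> sigQuitCommand(long pid) {
        return Arrays.asList("kill", "-QUIT", Long.toString(pid));
    }

    /** Command that forcibly terminates the process. */
    static List<String> sigKillCommand(long pid) {
        return Arrays.asList("kill", "-KILL", Long.toString(pid));
    }

    /** Runs a command, forwarding its output, and returns its exit status. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    /**
     * Dump-then-kill: send SIGQUIT, wait so the child has time to write its
     * dump, then send SIGKILL. graceMillis is an arbitrary assumption.
     */
    static void dumpThenKill(long pid, long graceMillis)
            throws IOException, InterruptedException {
        int status = run(sigQuitCommand(pid));
        if (status != 0) {
            // A non-zero status is worth recording, much like the patch's test
            // counts abnormal exit statuses from task-controller.
            System.err.println("SIGQUIT failed with exit status " + status);
        }
        Thread.sleep(graceMillis);
        run(sigKillCommand(pid)); // exit status ignored in this sketch
    }

    public static void main(String[] args) {
        // Only prints the command; does not signal any process.
        System.out.println(sigQuitCommand(12345L));
    }
}
```

The pause between the two signals matters because the JVM writes the dump asynchronously after SIGQUIT; killing it immediately could truncate or lose the stack trace.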


