hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing
Date Wed, 11 Nov 2009 19:42:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776606#action_12776606 ]

Todd Lipcon commented on MAPREDUCE-1119:

The new patch looks better. A couple of changes before I think it's ready for commit:
- The "wasFailure" boolean isn't terribly clear. I'd like to see either some more javadoc "@param" tags or a KillReason enum in its place. Basically, I want to be able to fill in this table:
|*Reason*|*wasFailure*|*generates stack*|
|Child threw exception|?|?|
|Job failed due to other tasks failing|?|?|
|Task timed out|?|?|
|Task killed by user (incl preemption)|?|?|
|Job killed by user|?|?|

Perhaps I'm just being dense, but the above isn't easy to understand from the existing code.
If this was a "preexisting condition" (sorry!) then maybe it's out of scope for this JIRA.
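
For illustration only, here is a rough sketch of what a KillReason enum along these lines might look like. Nothing here comes from the attached patches: the enum, its constant names, its accessors, and above all the true/false values are hypothetical placeholders for the "?" cells in the table. Only the stack dump on a timeout is implied by the issue itself.

```java
// Hypothetical sketch; not taken from MAPREDUCE-1119.*.patch.
// Each constant carries the two answers the table above asks for.
// The boolean values are illustrative guesses, NOT confirmed behavior.
enum KillReason {
    CHILD_THREW_EXCEPTION("Child threw exception", true, false),
    JOB_FAILED("Job failed due to other tasks failing", false, false),
    // The issue asks for a stack dump when a task stops reporting status.
    TASK_TIMED_OUT("Task timed out", true, true),
    TASK_KILLED_BY_USER("Task killed by user (incl preemption)", false, false),
    JOB_KILLED_BY_USER("Job killed by user", false, false);

    private final String description;
    private final boolean wasFailure;
    private final boolean generatesStack;

    KillReason(String description, boolean wasFailure, boolean generatesStack) {
        this.description = description;
        this.wasFailure = wasFailure;
        this.generatesStack = generatesStack;
    }

    boolean wasFailure() {
        return wasFailure;
    }

    boolean generatesStack() {
        return generatesStack;
    }

    // Renders one row in the same wiki-table form used in the comment above.
    String toTableRow() {
        return "|" + description + "|" + wasFailure + "|" + generatesStack + "|";
    }
}
```

With something of this shape, callers would pass a named reason instead of a bare boolean, and the table could be generated by looping over KillReason.values() and printing toTableRow() for each.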

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, MAPREDUCE-1119.patch
> When the TT kills tasks that haven't reported status, it should somehow gather a stack
> dump for the task. This could be done either by sending a SIGQUIT (so the dump ends up in
> stdout) or perhaps something like JDI to gather the stack directly from Java. This may be
> somewhat tricky since the child may be running as another user (so the SIGQUIT would have
> to go through LinuxTaskController). This feature would make debugging these kinds of failures
> much easier, especially if we could somehow get it into the TaskDiagnostic message.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
