hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show tasks's stack dump before killing
Date Wed, 11 Nov 2009 22:06:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776687#action_12776687 ]

Todd Lipcon commented on MAPREDUCE-1119:
----------------------------------------

Hey Aaron,

Thanks for doing the investigation to fill in the table.

The behavior as you've described it all seems pretty reasonable - I wish the "maybe" were
"false", but it sounds like changing that would require a pretty significant overhaul of failure
tracking throughout the TT/TaskManager/etc, so I'd be inclined to say it's out of scope.

So, +0.5 from me on the current patch. Full +1 if some of the above explanation could be transformed
into javadoc on the wasFailure parameter - probably just the full explanation where it first
appears, and then a "@see FirstAppearanceClass.someMethodName" for that parameter elsewhere,
so we don't have duplication of the explanation.
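
A minimal sketch of that javadoc layout, for illustration only. The class and method
names are the placeholders from the paragraph above plus a hypothetical SomeOtherClass
and taskId parameter; only the wasFailure parameter name comes from the patch
discussion. Note that javadoc's own syntax for a member reference uses "#" rather
than ".".

```java
/** Hypothetical class where the wasFailure parameter first appears. */
class FirstAppearanceClass {
  /**
   * Hypothetical method; the body only echoes its arguments.
   *
   * @param taskId     hypothetical identifier of the task attempt
   * @param wasFailure placeholder: the full explanation of this flag
   *                   is written once, here, where it first appears
   * @return the arguments as text, so the sketch has visible output
   */
  static String someMethodName(String taskId, boolean wasFailure) {
    return taskId + " wasFailure=" + wasFailure;
  }
}

/** Hypothetical second class that passes the flag through. */
class SomeOtherClass {
  /**
   * @param taskId     hypothetical identifier of the task attempt
   * @param wasFailure see the explanation on the method referenced below
   * @see FirstAppearanceClass#someMethodName(String, boolean)
   */
  static String passThrough(String taskId, boolean wasFailure) {
    return FirstAppearanceClass.someMethodName(taskId, wasFailure);
  }
}
```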

> When tasks fail to report status, show tasks's stack dump before killing
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1119
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, MAPREDUCE-1119.patch
>
>
> When the TT kills tasks that haven't reported status, it should somehow gather a stack
> dump for the task. This could be done either by sending a SIGQUIT (so the dump ends up in
> stdout) or perhaps something like JDI to gather the stack directly from Java. This may be
> somewhat tricky since the child may be running as another user (so the SIGQUIT would have
> to go through LinuxTaskController). This feature would make debugging these kinds of failures
> much easier, especially if we could somehow get it into the TaskDiagnostic message.
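
The SIGQUIT option described above could be sketched roughly as follows. This is not
the code from the attached patches: StackDumpSignal and its methods are hypothetical
names, the sketch assumes a Unix-like host with a "kill" binary on the PATH, and it
ignores the case where the child runs as another user (which would have to go through
LinuxTaskController instead).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; assumes a Unix-like host with a "kill" binary. */
class StackDumpSignal {
  /** Builds the command line that sends SIGQUIT to the given process id. */
  static List<String> sigquitCommand(long pid) {
    return Arrays.asList("kill", "-QUIT", Long.toString(pid));
  }

  /**
   * Runs the command and returns the exit code of "kill".
   * A HotSpot JVM normally responds to SIGQUIT by printing a thread dump
   * to its own stdout and then continues running.
   */
  static int requestStackDump(long pid) throws IOException, InterruptedException {
    Process kill = new ProcessBuilder(sigquitCommand(pid)).inheritIO().start();
    return kill.waitFor();
  }
}
```

A caller would invoke StackDumpSignal.requestStackDump(pid) shortly before killing the
task, leaving the child enough time to write the dump to its stdout log.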

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

