hadoop-mapreduce-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
Date Fri, 12 Feb 2016 13:35:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144582#comment-15144582 ]

Eric Payne commented on MAPREDUCE-5044:
---------------------------------------

Hi [~jira.shegalov]. I would like to see this functionality implemented. We occasionally see
containers time out, and it would be good if users could have direct feedback in the form
of a jstack to help them debug their applications.

I have been coming up to speed on the work that's already been committed in this area under
YARN-445 and its children. IIUC, YARN-445 and its children put in place the infrastructure
for a {{Client -> RM -> NM -> Container}} signal path. On the other hand, this JIRA
(along with YARN-1515) implements an {{AM -> NM -> Container}} signal path and the ability
to send multiple signals per call.

It seems that these pieces could possibly be split into separate JIRAs. Either way, I think
that a lot of what has been done in this JIRA could be used to add the interface to {{ContainerManagementProtocol}}
that would allow the AM to prompt the NM to signal the container to dump its stack prior to
killing the container on a timeout.
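
To make the proposed flow concrete, here is a minimal sketch of the ordering described above: on a timeout, the AM asks the NM to signal the container for a stack dump and only then to kill it, in a single call carrying multiple signals. All names here ({{Signal}}, {{NodeManagerClient}}, {{RecordingNodeManager}}, {{onTaskAttemptTimeout}}) are hypothetical stand-ins for illustration and are not the actual {{ContainerManagementProtocol}} interface or anything taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimeoutSignalSketch {
    /** Hypothetical signal commands; not the actual YARN types. */
    enum Signal { THREAD_DUMP, KILL }

    /** Hypothetical stand-in for the NM end of the AM -> NM -> Container path. */
    interface NodeManagerClient {
        void signalContainer(String containerId, List<Signal> signals);
    }

    /** Fake NM that records signals in arrival order so the flow can be inspected. */
    static class RecordingNodeManager implements NodeManagerClient {
        final List<String> log = new ArrayList<>();

        @Override
        public void signalContainer(String containerId, List<Signal> signals) {
            for (Signal s : signals) {
                log.add(containerId + ":" + s);
            }
        }
    }

    /** AM-side handling of an expired task attempt: request the dump first, then the kill. */
    static void onTaskAttemptTimeout(NodeManagerClient nm, String containerId) {
        nm.signalContainer(containerId, Arrays.asList(Signal.THREAD_DUMP, Signal.KILL));
    }

    public static void main(String[] args) {
        RecordingNodeManager nm = new RecordingNodeManager();
        onTaskAttemptTimeout(nm, "container_01");
        System.out.println(nm.log);
    }
}
```

A real implementation would presumably also need a short delay between the two signals so the JVM has time to write the dump before it is killed; the sketch leaves that out.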

Is there a possibility that this JIRA will move forward? Ideally, we would like it all ported
back to 2.7. Please let me know if there's anything I can do.

> Have AM trigger jstack on task attempts that timeout before killing them
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5044
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5044.v01.patch, MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch,
>                      MAPREDUCE-5044.v04.patch, MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch,
>                      Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack output via
> SIGQUIT before killing the task attempt. This would be invaluable for helping users debug
> their hung tasks, especially if they do not have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
