hadoop-mapreduce-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5044) Have AM trigger jstack on task attempts that timeout before killing them
Date Tue, 17 May 2016 02:29:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285894#comment-15285894 ]

Ming Ma commented on MAPREDUCE-5044:
------------------------------------

[~eepayne], my apologies for the delay.

* There was some discussion about combining signalContainer and stopContainers so that stopContainers
becomes just a special case of signalContainer. To support the "SIGTERM + delay + SIGKILL" sequence
used by stopContainers, we would then need an ordered list of commands, hence signalContainers.
We don't need to deal with that at this point, but it might be useful to rename signalContainer
to signalContainers now so that we don't need to modify the API later. That means a new structure
such as {{SignalContainersRequest}}. What is your take?
* ContainerManagerImpl. It might be cleaner to factor the common signal-container code into
a single function shared by the {{AM -> NM}} and {{RM -> NM}} paths.
* TaskAttemptImpl#PreemptedTransition. Since it is called only when the attempt is preempted,
{{event.getType() == TaskAttemptEventType.TA_TIMED_OUT}} is always false there and can be replaced by {{false}}.
* It would be useful to add a new end-to-end unit test, such as the one in Gera's original
patch.
* Nit: ContainerLauncherImpl. The return value of {{getContainerManagementProtocol().signalContainer}}
isn't used and can be removed.
* Nit: ContainerLauncherEvent has an indentation issue.
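The first two suggestions (an ordered-command signalContainers API, and one shared function behind both the AM -> NM and RM -> NM paths) can be pictured with a small, self-contained Java model. Everything here is hypothetical: {{SignalContainersRequest}} is only the name floated in the review, and Step, dispatch, signalFromAM and signalFromRM are illustrative stand-ins, not the YARN API. The Command values echo YARN's SignalContainerCommand enum but are redeclared locally so the sketch compiles on its own. It is a sketch of the idea under those assumptions, not the patch's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SignalContainersSketch {
    // Hypothetical command set; names echo YARN's SignalContainerCommand
    // values, but this enum is a local stand-in, not the Hadoop type.
    enum Command { OUTPUT_THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

    // One step of an ordered sequence: a command plus the delay to wait
    // before the next step (hypothetical structure).
    record Step(Command command, long delayAfterMs) {}

    // Hypothetical plural request: the same ordered steps for every container.
    record SignalContainersRequest(List<String> containerIds, List<Step> steps) {}

    // stopContainers expressed as a special case of signalContainers:
    // graceful shutdown (SIGTERM), a grace period, then forceful (SIGKILL).
    static SignalContainersRequest stopRequest(List<String> ids, long graceMs) {
        return new SignalContainersRequest(ids, List.of(
                new Step(Command.GRACEFUL_SHUTDOWN, graceMs),
                new Step(Command.FORCEFUL_SHUTDOWN, 0)));
    }

    private final List<String> delivered = new ArrayList<>();

    // Entry point for the AM -> NM path (hypothetical name).
    void signalFromAM(SignalContainersRequest req) { dispatch("AM", req); }

    // Entry point for the RM -> NM path (hypothetical name).
    void signalFromRM(SignalContainersRequest req) { dispatch("RM", req); }

    // Common code shared by both entry points. This sketch only records each
    // command in order; a real node manager would deliver the signal and
    // honor delayAfterMs before moving on to the next step.
    private void dispatch(String source, SignalContainersRequest req) {
        for (String id : req.containerIds()) {
            for (Step step : req.steps()) {
                delivered.add(source + ":" + id + ":" + step.command());
            }
        }
    }

    List<String> delivered() { return delivered; }

    public static void main(String[] args) {
        SignalContainersSketch nm = new SignalContainersSketch();
        nm.signalFromAM(new SignalContainersRequest(List.of("c1"),
                List.of(new Step(Command.OUTPUT_THREAD_DUMP, 0))));
        nm.signalFromRM(stopRequest(List.of("c2"), 250));
        nm.delivered().forEach(System.out::println);
    }
}
```

Running main prints AM:c1:OUTPUT_THREAD_DUMP, then RM:c2:GRACEFUL_SHUTDOWN and RM:c2:FORCEFUL_SHUTDOWN. Because a plain stop is just a two-step list, the request type would not need to change if stopContainers were later folded into signalContainers.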

> Have AM trigger jstack on task attempts that timeout before killing them
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5044
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5044
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5044.008.patch, MAPREDUCE-5044.009.patch, MAPREDUCE-5044.v01.patch,
>                      MAPREDUCE-5044.v02.patch, MAPREDUCE-5044.v03.patch, MAPREDUCE-5044.v04.patch,
>                      MAPREDUCE-5044.v05.patch, MAPREDUCE-5044.v06.patch, MAPREDUCE-5044.v07.local.patch,
>                      Screen Shot 2013-11-12 at 1.05.32 PM.png, Screen Shot 2013-11-12 at 1.06.04 PM.png
>
>
> When an AM expires a task attempt it would be nice if it triggered a jstack output via
> SIGQUIT before killing the task attempt. This would be invaluable for helping users debug
> their hung tasks, especially if they do not have shell access to the nodes.
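The requested behavior (dump thread stacks, then kill) can be sketched as an AM-side handler that, on a timeout, first asks for a thread dump and schedules the kill after a short wait, while preemption kills right away. A HotSpot JVM prints all thread stacks to stdout when it receives SIGQUIT, which is what makes the dump land in the task's logs. The event and action names below are hypothetical stand-ins (TA_TIMED_OUT only echoes the MapReduce event name), and the fixed wait is an assumption, not a value from the patch. Keeping the timed-out path separate also shows why a TA_TIMED_OUT check inside a preemption-only transition can never be true.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimeoutDumpSketch {
    // Hypothetical event types; TA_TIMED_OUT echoes the MapReduce event name.
    enum EventType { TA_TIMED_OUT, TA_PREEMPTED }

    // Hypothetical actions the AM would send toward the container.
    enum Action { THREAD_DUMP, KILL }

    private final List<Action> sent = Collections.synchronizedList(new ArrayList<>());
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    // Timed out: request a thread dump (SIGQUIT for a JVM task), then kill
    // after dumpWaitMs so the dump has time to be written. Preempted: kill now.
    void onEvent(EventType type, long dumpWaitMs) {
        if (type == EventType.TA_TIMED_OUT) {
            sent.add(Action.THREAD_DUMP);
            timer.schedule(() -> { sent.add(Action.KILL); }, dumpWaitMs, TimeUnit.MILLISECONDS);
        } else {
            sent.add(Action.KILL);
        }
    }

    // Waits for any pending delayed kill, then returns the actions in order.
    // Delayed tasks already scheduled still run after shutdown() by default.
    List<Action> shutdownAndGet() {
        timer.shutdown();
        try {
            timer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(sent);
    }

    public static void main(String[] args) {
        TimeoutDumpSketch am = new TimeoutDumpSketch();
        am.onEvent(EventType.TA_TIMED_OUT, 100);
        System.out.println(am.shutdownAndGet()); // [THREAD_DUMP, KILL]
    }
}
```

The wait between the dump and the kill is the design choice that matters: killing immediately after SIGQUIT could terminate the JVM before the stacks are flushed to the task's stdout log.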


