hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump
Date Wed, 14 Feb 2018 17:34:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364487#comment-16364487 ]

Jason Lowe commented on MAPREDUCE-7053:
---------------------------------------

The easiest "fix" for this issue is to have the AM ignore unknown tasks, as it did
before, although that could let unknown tasks linger on the cluster far longer than they
should if a task were somehow to "escape."

> Timed out tasks can fail to produce thread dump
> -----------------------------------------------
>
>                 Key: MAPREDUCE-7053
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Jason Lowe
>            Priority: Major
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically recently.  When
> the AM times out a task, it immediately removes it from the list of known tasks and then
> connects to the NM to request a thread dump followed by a kill.  If the task heartbeats in
> after it has been removed from the list of known tasks but before the thread dump signal
> arrives, then the task can exit with an "org.apache.hadoop.mapred.Task: Parent died."
> message and no thread dump.
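
The ordering problem described above, and the "ignore unknown tasks" option from the
comment, can be illustrated with a small single-threaded model. This is only a sketch:
the class and method names (TimeoutRaceSketch, AppMaster, Task, run) are hypothetical
and are not the actual Hadoop classes or APIs. It assumes a task exits as soon as a
heartbeat reply tells it the AM no longer knows about it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the race; none of these names are real Hadoop classes.
class TimeoutRaceSketch {
    enum Outcome { THREAD_DUMP_THEN_KILLED, EXITED_WITHOUT_DUMP }

    /** Stand-in for the AM's list of known tasks. */
    static class AppMaster {
        private final Set<String> knownTasks = new HashSet<>();
        private final boolean ignoreUnknown; // assumed policy switch, for illustration only

        AppMaster(boolean ignoreUnknown) { this.ignoreUnknown = ignoreUnknown; }

        void register(String taskId) { knownTasks.add(taskId); }

        /** On timeout the task is removed from the known list right away. */
        void timeOut(String taskId) { knownTasks.remove(taskId); }

        /** Returns true if the task may keep running after this heartbeat. */
        boolean heartbeat(String taskId) {
            return knownTasks.contains(taskId) || ignoreUnknown;
        }
    }

    /** Stand-in for the task process. */
    static class Task {
        boolean alive = true;
        boolean dumped = false;

        // Models the "Parent died." exit when the AM reports the task as unknown.
        void onHeartbeatReply(boolean keepRunning) { if (!keepRunning) alive = false; }

        // A thread dump can only be produced while the process is still alive.
        void onThreadDumpSignal() { if (alive) dumped = true; }

        void onKill() { alive = false; }
    }

    /** Replays one ordering of events and reports what happened to the task. */
    static Outcome run(boolean ignoreUnknown, boolean heartbeatBeforeSignal) {
        String id = "attempt_0";
        AppMaster am = new AppMaster(ignoreUnknown);
        Task task = new Task();
        am.register(id);
        am.timeOut(id);                       // task removed from the known list
        if (heartbeatBeforeSignal) {
            task.onHeartbeatReply(am.heartbeat(id));
        }
        task.onThreadDumpSignal();            // dump request, then kill
        task.onKill();
        return task.dumped ? Outcome.THREAD_DUMP_THEN_KILLED : Outcome.EXITED_WITHOUT_DUMP;
    }

    public static void main(String[] args) {
        System.out.println("reject unknown, heartbeat first: " + run(false, true));
        System.out.println("reject unknown, signal first:    " + run(false, false));
        System.out.println("ignore unknown, heartbeat first: " + run(true, true));
    }
}
```

In this model the dump is lost only when unknown tasks are rejected and the heartbeat
lands before the signal. Ignoring unknown tasks avoids that, but the model does not
capture the downside noted in the comment: a task that never receives a kill would keep
running.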




