hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (YARN-2510) RM can drop container completion events
Date Wed, 03 Sep 2014 23:19:51 GMT

     [ https://issues.apache.org/jira/browse/YARN-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved YARN-2510.
------------------------------
    Resolution: Invalid

My apologies, this is an invalid report.  I accidentally grabbed the wrong container ID when
searching the RM log; after looking again, I don't see any sign that the RM received the
container completion event.  The 9 missing completion events on the AM were all from the same
node, so I think this is a case of a poorly handled node failure that led to a MapReduce app hang.

I'll file a separate JIRA to track handling that case better.  That's probably a MapReduce
fix, since the RM can't tell the container is no longer needed unless either the NM reports
it completing (which it failed to do in this case due to a bad node) or the AM explicitly
releases the container.

> RM can drop container completion events
> ---------------------------------------
>
>                 Key: YARN-2510
>                 URL: https://issues.apache.org/jira/browse/YARN-2510
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Priority: Critical
>
> The RM can drop container completion events and fail to report them to the AM.  Details
> in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
