hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1421) Node managers will not receive application finish event where containers ran before RM restart
Date Fri, 22 Nov 2013 21:34:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830331#comment-13830331 ]

Karthik Kambatla commented on YARN-1421:

Reading all the RM apps on every node heartbeat could add significant performance overhead.
It would be nice to use an alternative mechanism to find "recently" finished applications
from among the ones the NM reports.
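
One possible shape for such a mechanism, offered only as an illustration and not as something proposed in this thread: keep a small, bounded set of recently finished application IDs and intersect it with what the NM reports. The class `RecentlyFinishedApps`, its methods, and the use of plain strings as application IDs are all hypothetical. The sketch is not thread-safe, and an in-memory set like this would itself be lost on RM restart unless it were rebuilt from recovered state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of YARN: remembers only the most recently finished apps.
final class RecentlyFinishedApps {
    private final Set<String> recent;

    RecentlyFinishedApps(final int capacity) {
        // Insertion-ordered map that evicts its oldest entry once capacity is exceeded.
        this.recent = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        });
    }

    // Record that an application reached a final state.
    void markFinished(String appId) {
        recent.add(appId);
    }

    // Cost is proportional to the number of apps the NM reports,
    // not to the number of apps the RM knows about.
    List<String> finishedAmong(List<String> reportedByNode) {
        List<String> finished = new ArrayList<>();
        for (String appId : reportedByNode) {
            if (recent.contains(appId)) {
                finished.add(appId);
            }
        }
        return finished;
    }
}
```

With a capacity of 2, marking `a`, `b`, and `c` as finished evicts `a`, so a node reporting `a, b, c, d` would be told only about `b` and `c`. Applications that fall out of the set would need some other fallback, such as a direct lookup.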

> Node managers will not receive application finish event where containers ran before RM restart
> ----------------------------------------------------------------------------------------------
>                 Key: YARN-1421
>                 URL: https://issues.apache.org/jira/browse/YARN-1421
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>            Priority: Critical
> Problem :- Today, for every application, we track the node managers where its containers ran.
> When the application finishes, the RM notifies all those node managers about the application
> finish event (via the node manager heartbeat). However, if the RM restarts, this information
> is lost, so those node managers never receive the application finish event and keep reporting
> finished applications.
> Proposed Solution :- Instead of remembering the node managers where containers ran for
> a particular application, it would be better to rely on the node manager heartbeat to make
> this decision, i.e. when a node manager heartbeats saying it is running applications (app1, app2),
> we should check those applications' status in the RM's memory {code}rmContext.getRMApps(){code}
> and, if they are either not found (very old applications) or in a final state
> (FINISHED, KILLED, FAILED), immediately notify the node manager about the application
> finish event. This reduces the state that we need to store at the RM after restart.
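
The check described in the proposed solution can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual YARN code: the class `FinishedAppCheck`, the method `appsToCleanUp`, the simplified `AppState` enum, and the use of strings as application IDs are all assumptions. A plain map stands in for the result of `rmContext.getRMApps()`; only the three final states are taken from the description.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the RM-side heartbeat check; not the actual YARN code.
final class FinishedAppCheck {
    // Simplified states; RUNNING is a placeholder for any non-final state.
    enum AppState { RUNNING, FINISHED, KILLED, FAILED }

    private static final Set<AppState> FINAL_STATES =
            EnumSet.of(AppState.FINISHED, AppState.KILLED, AppState.FAILED);

    // Returns the reported applications the NM should be told have finished:
    // those unknown to the RM, or known and in a final state.
    static List<String> appsToCleanUp(List<String> reportedByNode,
                                      Map<String, AppState> rmApps) {
        List<String> finished = new ArrayList<>();
        for (String appId : reportedByNode) {
            AppState state = rmApps.get(appId); // one lookup per reported app
            if (state == null || FINAL_STATES.contains(state)) {
                finished.add(appId);
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        Map<String, AppState> rmApps = Map.of(
                "app1", AppState.RUNNING,
                "app2", AppState.FINISHED);
        // app3 is unknown to the RM, e.g. it completed long before the restart.
        System.out.println(appsToCleanUp(List.of("app1", "app2", "app3"), rmApps));
        // prints [app2, app3]
    }
}
```

Note that the sketch looks up only the application IDs the node reports, one map access each, rather than iterating over every application the RM knows about.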

This message was sent by Atlassian JIRA
