hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart
Date Thu, 05 Nov 2015 03:44:27 GMT

    [ https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991069#comment-14991069 ]

Bikas Saha commented on YARN-2047:
----------------------------------

From the description, it seems the original scope was making sure that a lost NM's
containers are marked expired by the RM even across an RM restart. For that, won't it be enough
to save dead/decommissioned NM info in the state store? Upon restart, the RM would repopulate the
decommissioned/dead status from the state store and take appropriate action at that time, e.g.
cancelling an AM's containers on those NMs when the AM re-registers, or asking those NMs to
restart and re-register if they heartbeat again.
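A minimal Java sketch of that proposal, assuming a simple key/value state store. `NodeFate`, `NodeFateStore`, `RecoveringResourceManager`, `HeartbeatReply` and the other names are hypothetical and do not correspond to the real YARN `RMStateStore` or NM protocol classes; the sketch only shows the flow: persist a node's dead/decommissioned status, repopulate it on restart, then consult it when the AM re-registers or the NM heartbeats again.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical terminal states the RM would persist for a node.
enum NodeFate { LOST, DECOMMISSIONED }

// Hypothetical reply to an NM heartbeat (not the real YARN NodeAction type).
enum HeartbeatReply { CONTINUE, RESTART_AND_REREGISTER }

// Hypothetical stand-in for the RM state store; not the real RMStateStore API.
interface NodeFateStore {
    void save(String nodeId, NodeFate fate);
    Map<String, NodeFate> loadAll();
}

// In-memory store used only to simulate data surviving an RM restart.
class InMemoryNodeFateStore implements NodeFateStore {
    private final Map<String, NodeFate> data = new HashMap<>();

    @Override
    public void save(String nodeId, NodeFate fate) { data.put(nodeId, fate); }

    @Override
    public Map<String, NodeFate> loadAll() { return new HashMap<>(data); }
}

class RecoveringResourceManager {
    private final NodeFateStore store;
    // Dead/decommissioned NMs known to this RM instance.
    private final Map<String, NodeFate> deadNodes = new HashMap<>();

    RecoveringResourceManager(NodeFateStore store) { this.store = store; }

    // Record a lost or decommissioned NM in the store before updating memory.
    void markDead(String nodeId, NodeFate fate) {
        store.save(nodeId, fate);
        deadNodes.put(nodeId, fate);
    }

    // On RM restart: repopulate dead/decommissioned status from the store.
    void recover() {
        deadNodes.clear();
        deadNodes.putAll(store.loadAll());
    }

    // An NM remembered as dead that heartbeats again is told to restart and re-register.
    HeartbeatReply onHeartbeat(String nodeId) {
        return deadNodes.containsKey(nodeId)
                ? HeartbeatReply.RESTART_AND_REREGISTER
                : HeartbeatReply.CONTINUE;
    }

    // When an AM re-registers, return (sorted) the IDs of its containers that ran on
    // dead NMs, i.e. the ones the RM should report as finished.
    List<String> containersToFinish(Map<String, String> containerToNode) {
        return containerToNode.entrySet().stream()
                .filter(e -> deadNodes.containsKey(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Constructing a second `RecoveringResourceManager` over the same store and calling `recover()` simulates the restart; after that, a heartbeat from a previously lost NM gets `RESTART_AND_REREGISTER`, and containers the AM reports on that NM come back from `containersToFinish`.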


If this is a required action, then it also implies that saving such nodes would be a
critical state-change operation. So, e.g., a decommission command from the admin should not complete
until the store has been updated. Is that the case?
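If persisting such nodes is a critical state change, the admin path has to wait for the store write instead of firing it off asynchronously. A sketch under that assumption: all class names are hypothetical (not the real RM `AdminService` or `RMAdminCLI`), and a `CompletableFuture` stands in for an asynchronous state-store write. `decommission` blocks on the write and, if the write fails, reports failure without marking the node decommissioned.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical store whose writes complete asynchronously on a background thread.
class AsyncNodeStore {
    private final Map<String, String> durable = new ConcurrentHashMap<>();
    // Daemon thread so the sketch never keeps the JVM alive.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "node-store-writer");
        t.setDaemon(true);
        return t;
    });
    private volatile boolean failWrites = false;

    CompletableFuture<Void> persist(String nodeId, String state) {
        return CompletableFuture.runAsync(() -> {
            if (failWrites) {
                throw new IllegalStateException("state store unavailable");
            }
            durable.put(nodeId, state);
        }, writer);
    }

    String read(String nodeId) { return durable.get(nodeId); }

    // Test hook to simulate a failing store.
    void setFailWrites(boolean fail) { failWrites = fail; }
}

// Hypothetical admin-facing service.
class DecommissionService {
    private final AsyncNodeStore store;
    private final Set<String> decommissioned = ConcurrentHashMap.newKeySet();

    DecommissionService(AsyncNodeStore store) { this.store = store; }

    // Critical state change: report success only after the store write has completed.
    boolean decommission(String nodeId) {
        try {
            store.persist(nodeId, "DECOMMISSIONED").get(); // block until durable
        } catch (ExecutionException e) {
            return false; // write failed: the admin command must not report success
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        decommissioned.add(nodeId);
        return true;
    }

    boolean isDecommissioned(String nodeId) { return decommissioned.contains(nodeId); }
}
```

The cost of this choice is that decommission latency now includes a store round trip, and a store outage makes the command fail rather than letting the status be silently lost on the next restart.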

> RM should honor NM heartbeat expiry after RM restart
> ----------------------------------------------------
>
>                 Key: YARN-2047
>                 URL: https://issues.apache.org/jira/browse/YARN-2047
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>
> After the RM restarts, it forgets about existing NMs (and potentially their decommissioned
> status too). After restart, the RM cannot maintain the contract to the AMs that a lost NM's
> containers will be marked finished within the expiry time.
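The contract in the description can be sketched as an expiry tracker that is re-armed after restart. This is a minimal sketch assuming the RM recovers the list of previously live NMs and uses its own restart time as their last-seen time; `NodeExpiryTracker` and its methods are hypothetical, and times are passed in explicitly rather than read from a clock so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical liveness tracker; not the real YARN NMLivelinessMonitor.
class NodeExpiryTracker {
    private final long expiryMs;
    // NM id -> time (ms) of the last heartbeat the RM knows about.
    private final Map<String, Long> lastSeen = new HashMap<>();

    NodeExpiryTracker(long expiryMs) { this.expiryMs = expiryMs; }

    // After restart: re-arm the expiry clock for every NM recovered as live,
    // using the restart time as its last-seen time.
    void recover(List<String> recoveredNodes, long nowMs) {
        for (String nodeId : recoveredNodes) {
            lastSeen.put(nodeId, nowMs);
        }
    }

    void heartbeat(String nodeId, long nowMs) { lastSeen.put(nodeId, nowMs); }

    // Return (sorted) the NMs whose last heartbeat is older than the expiry interval,
    // and stop tracking them; the RM would then mark their containers finished.
    List<String> expire(long nowMs) {
        List<String> lost = new ArrayList<>();
        lastSeen.entrySet().removeIf(e -> {
            boolean expired = nowMs - e.getValue() > expiryMs;
            if (expired) {
                lost.add(e.getKey());
            }
            return expired;
        });
        Collections.sort(lost);
        return lost;
    }
}
```

Starting the clock at restart time is conservative: an NM that died shortly before the restart is declared lost one full expiry interval after the RM comes back, not one interval after its last real heartbeat. Persisting last-heartbeat times would tighten that bound at the cost of a store write per heartbeat.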



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
