hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart
Date Mon, 09 Nov 2015 13:21:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996513#comment-14996513 ]

Jun Gong commented on YARN-2047:

Sorry for the late reply. 

The issue aims to make sure that a lost NM's containers are marked expired by the RM even
across an RM restart. What I suggested tries to solve that problem in another way. Any thoughts?
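
To make the goal concrete, here is a minimal Python sketch of heartbeat expiry that survives a restart. It is not YARN code: `NodeTracker`, `recover`, and `expiry_secs` are hypothetical names, and it assumes the set of known nodes was saved somewhere before the restart and that each recovered node gets a fresh expiry window starting at recovery time.

```python
class NodeTracker:
    """Hypothetical stand-in for the RM's view of NodeManagers (not YARN code)."""

    def __init__(self, expiry_secs):
        self.expiry_secs = expiry_secs
        self.last_seen = {}  # node id -> time of last heartbeat (or of recovery)
        self.lost = set()

    def recover(self, saved_node_ids, now):
        # Assumption: after a restart, every saved node is tracked again with
        # an expiry window starting now, instead of being forgotten.
        for node_id in saved_node_ids:
            self.last_seen[node_id] = now

    def heartbeat(self, node_id, now):
        self.last_seen[node_id] = now
        self.lost.discard(node_id)

    def expire(self, now):
        # Return the nodes newly declared lost. A caller would then mark
        # the containers of these nodes as finished.
        newly_lost = [n for n, t in self.last_seen.items()
                      if n not in self.lost and now - t > self.expiry_secs]
        self.lost.update(newly_lost)
        return sorted(newly_lost)
```

With this shape, a node that never heartbeats again after the restart is still reported by `expire` once the interval has passed, which is the contract the issue asks for.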

> If this is a required action then it would also imply that saving such nodes would be a
> critical state change operation. So, e.g., a decommission command from the admin should not complete
> until the store has been updated. Is that the case?

Yes, it is. However, the store operation is usually very fast, so it might be acceptable.
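
The ordering discussed here (update the store first, then let the command complete) can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryStore` and `ResourceManagerSketch` are hypothetical names, not YARN classes, and the in-memory set stands in for whatever durable store is actually used.

```python
class InMemoryStore:
    """Hypothetical store; a real one would write to durable storage."""

    def __init__(self):
        self.decommissioned = set()

    def save_decommissioned(self, node_id):
        self.decommissioned.add(node_id)


class ResourceManagerSketch:
    """Hypothetical RM that rebuilds its node state from the store on start."""

    def __init__(self, store):
        self.store = store
        # Recovery: load whatever the store already holds.
        self.decommissioned = set(store.decommissioned)

    def decommission(self, node_id):
        # Persist first; only then update memory and report success, so an
        # acknowledged command is never lost by a restart.
        self.store.save_decommissioned(node_id)
        self.decommissioned.add(node_id)
        return True
```

The cost of this design is that the admin command waits on the store write, which is the latency trade-off mentioned above.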

> RM should honor NM heartbeat expiry after RM restart
> ----------------------------------------------------
>                 Key: YARN-2047
>                 URL: https://issues.apache.org/jira/browse/YARN-2047
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
> After the RM restarts, it forgets about existing NMs (and their potentially decommissioned
> status too). After restart, the RM cannot maintain the contract to the AMs that a lost NM's
> containers will be marked finished within the expiry time.

This message was sent by Atlassian JIRA
