hadoop-yarn-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
Date Wed, 18 Mar 2015 23:21:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368118#comment-14368118 ]

Ming Ma commented on YARN-3212:

bq. Do we want to consider DECOMMISSIONING nodes as not active? There are containers actively
running on them, and in that sense they are participating in the cluster (and contributing
to the overall cluster resource). I think they should still be considered active, but I could
be persuaded otherwise.
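Here is a minimal sketch of the accounting implied by the quoted view, where DECOMMISSIONING nodes still count as active and still add to the cluster's total resource. The types `NodeState`, `Node`, and `ClusterResources` are hypothetical illustrations, not the actual YARN classes. So is the choice of memory in MB as the only resource.

```java
import java.util.List;

// Hypothetical subset of node states, for illustration only.
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED, LOST }

// Hypothetical node record: current state plus advertised memory in MB.
final class Node {
    final NodeState state;
    final long memoryMb;

    Node(NodeState state, long memoryMb) {
        this.state = state;
        this.memoryMb = memoryMb;
    }
}

final class ClusterResources {
    private ClusterResources() {}

    // Under the quoted view, a DECOMMISSIONING node still runs containers,
    // so it counts as active alongside RUNNING nodes.
    static boolean isActive(NodeState state) {
        return state == NodeState.RUNNING || state == NodeState.DECOMMISSIONING;
    }

    // Total memory contributed by active nodes only.
    static long activeMemoryMb(List<Node> nodes) {
        long total = 0;
        for (Node n : nodes) {
            if (isActive(n.state)) {
                total += n.memoryMb;
            }
        }
        return total;
    }
}
```

If DECOMMISSIONING nodes were treated as inactive, you would drop that state from `isActive`. The reported capacity would then shrink while containers are still running on those nodes.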

Do we need to support the scenario where an NM dies while it is being decommissioned?
Say the decommission timeout is 30 minutes longer than the NM liveness timeout. The node drops
out of the cluster for some time and rejoins later, all within the decommission timeout. Will
YARN show the status as just a dead node, or as {dead, decommissioning}? It seems useful for admins
to know about it. If we need that, we can consider two types of NodeState: a liveness
state and an admin state. A node then has some combination of the two.
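Here is a sketch of the two-dimension idea. It assumes hypothetical `LivenessState` and `AdminState` enums and a `NodeStatus` pair, and none of these names come from YARN itself. Losing and regaining liveness only changes one field, so the pending decommission stays visible the whole time.

```java
import java.util.Locale;

// Hypothetical liveness dimension: is the NM heartbeating within the liveness timeout?
enum LivenessState { LIVE, DEAD }

// Hypothetical admin dimension: what the administrator has requested for the node.
enum AdminState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED }

// Hypothetical composite status holding both dimensions independently.
final class NodeStatus {
    final LivenessState liveness;
    final AdminState admin;

    NodeStatus(LivenessState liveness, AdminState admin) {
        this.liveness = liveness;
        this.admin = admin;
    }

    // Changing liveness keeps the admin state, so a node that dies mid-decommission
    // is still known to be decommissioning.
    NodeStatus withLiveness(LivenessState newLiveness) {
        return new NodeStatus(newLiveness, admin);
    }

    // Renders the pair in the form used in the comment, e.g. "{dead, decommissioning}".
    String describe() {
        return "{" + liveness.name().toLowerCase(Locale.ROOT) + ", "
                + admin.name().toLowerCase(Locale.ROOT).replace('_', ' ') + "}";
    }
}
```

For example, a decommissioning node that misses its heartbeats would be reported as {dead, decommissioning} rather than just dead. If it rejoins before the decommission timeout, it goes back to {live, decommissioning}.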

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added. A node can transition
into it from the “running” state when triggered by a new “decommissioning” event.
> From this new state, the node can transition to “decommissioned” on a Resource_Update when
no apps are running on this NM, or when the NM reconnects after a restart. It also transitions
to “decommissioned” when it receives a DECOMMISSIONED event (after the timeout specified via the CLI).
> In addition, it can go back to “running” if the user decides to cancel the previous decommission
by calling recommission on the same node. Its reaction to other events is similar to RUNNING.
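The transitions described above can be sketched as a small transition function. This is only an illustration. It assumes hypothetical `NodeState` and `NodeEvent` enums and a `runningApps` count, and it is not how RMNodeImpl is implemented. The rule "other events behave like RUNNING" is approximated by a single hypothetical STATUS_UPDATE event that leaves the state unchanged.

```java
import java.util.Objects;

// Hypothetical subset of node states, for illustration only.
enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED, LOST }

// Hypothetical events named after those in the proposal; STATUS_UPDATE stands in for "other events".
enum NodeEvent { DECOMMISSIONING, RESOURCE_UPDATE, RECONNECT, DECOMMISSIONED, RECOMMISSION, STATUS_UPDATE }

final class NodeStateMachine {
    private NodeStateMachine() {}

    // Returns the next state. runningApps is the number of apps still running on this NM;
    // it is only consulted for RESOURCE_UPDATE while DECOMMISSIONING.
    static NodeState next(NodeState state, NodeEvent event, int runningApps) {
        Objects.requireNonNull(state, "state");
        Objects.requireNonNull(event, "event");
        switch (state) {
            case RUNNING:
                switch (event) {
                    case DECOMMISSIONING: return NodeState.DECOMMISSIONING;
                    case RESOURCE_UPDATE:
                    case STATUS_UPDATE:   return NodeState.RUNNING;
                    default: break;
                }
                break;
            case DECOMMISSIONING:
                switch (event) {
                    // Finish decommissioning once no apps remain on the node.
                    case RESOURCE_UPDATE:
                        return runningApps == 0 ? NodeState.DECOMMISSIONED : NodeState.DECOMMISSIONING;
                    // NM reconnect after restart, or the CLI timeout firing.
                    case RECONNECT:
                    case DECOMMISSIONED:  return NodeState.DECOMMISSIONED;
                    // Admin cancels the decommission.
                    case RECOMMISSION:    return NodeState.RUNNING;
                    // Other events: stay put, as RUNNING would.
                    case STATUS_UPDATE:   return NodeState.DECOMMISSIONING;
                    default: break;
                }
                break;
            default:
                break;
        }
        throw new IllegalStateException(event + " in state " + state + " is not modeled in this sketch");
    }
}
```

Unmodeled transitions throw instead of silently keeping the current state, which makes gaps in a table like this easier to spot in tests.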

This message was sent by Atlassian JIRA
