hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
Date Wed, 12 Aug 2015 11:48:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693360#comment-14693360 ]

Junping Du commented on YARN-3212:
----------------------------------

Thanks [~sunilg] for the comments! I agree it is not a bad idea to give decommissioning nodes
more chances when they are only UNHEALTHY. However, it would add complexity, such as how many
rounds we should wait (a number of heartbeats or a time period, and as a separate configuration?),
an additional state for a node that is both decommissioning and unhealthy, etc. We should
evaluate whether it is worth it, and we do not yet have hands-on experience with this new feature.
In practice, I have rarely seen nodes return to a healthy state quickly, that is, within the
timeout, unless someone logs in and fixes them immediately.
Thus, I would prefer to keep the current transition: it sounds slightly aggressive, but it is a
good trade-off for simplicity at this moment. I can add a TODO in a later patch (if one is needed
for other outstanding issues raised in the comments) to revisit this once we have more
experience. Makes sense?
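To make the trade-off concrete, here is a minimal Java sketch (hypothetical, not the actual RMNodeImpl code) of how unhealthy heartbeats from a DECOMMISSIONING node could be handled. It assumes, as the comment implies, that the current patch decommissions such a node on its first unhealthy report; `graceHeartbeats` is a hypothetical setting standing in for the "separate configuration" discussed above, and `unhealthyStreak` is the extra per-node state that waiting would require.

```java
// Hypothetical sketch: not YARN code. Compares "decommission at once" (graceHeartbeats == 0)
// with "wait N unhealthy heartbeats" for a node that is already DECOMMISSIONING.
final class UnhealthyDecommissioningPolicy {
    // Hypothetical outcome of one heartbeat from a DECOMMISSIONING node.
    enum Outcome { KEEP_DECOMMISSIONING, DECOMMISSION_NOW }

    private final int graceHeartbeats; // hypothetical config; 0 = the current aggressive behavior
    private int unhealthyStreak = 0;   // extra per-node state that the lenient option needs

    UnhealthyDecommissioningPolicy(int graceHeartbeats) {
        if (graceHeartbeats < 0) {
            throw new IllegalArgumentException("graceHeartbeats must be >= 0");
        }
        this.graceHeartbeats = graceHeartbeats;
    }

    Outcome onHeartbeat(boolean healthy) {
        if (healthy) {
            // Node recovered: reset the streak and keep draining it gracefully.
            unhealthyStreak = 0;
            return Outcome.KEEP_DECOMMISSIONING;
        }
        unhealthyStreak++;
        // With graceHeartbeats == 0 the first unhealthy report decommissions the node.
        return unhealthyStreak > graceHeartbeats
                ? Outcome.DECOMMISSION_NOW
                : Outcome.KEEP_DECOMMISSIONING;
    }

    public static void main(String[] args) {
        UnhealthyDecommissioningPolicy aggressive = new UnhealthyDecommissioningPolicy(0);
        System.out.println(aggressive.onHeartbeat(false));
        UnhealthyDecommissioningPolicy lenient = new UnhealthyDecommissioningPolicy(2);
        for (int i = 0; i < 3; i++) {
            System.out.println(lenient.onHeartbeat(false));
        }
    }
}
```

The aggressive variant needs no counter at all, which is the simplicity argument made above; the lenient one needs both a new setting and state that survives across heartbeats.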

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch,
>                      YARN-3212-v4.1.patch, YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch
>
>
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added; a node can transition
> into it from the “running” state, triggered by a new “decommissioning” event.
> From this new state, the node can transition to “decommissioned” on a Resource_Update when
> no apps are running on this NM, when the NM reconnects after a restart, or when it receives
> a DECOMMISSIONED event (after the timeout given from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the previous
> decommission by calling recommission on the same node. Its reaction to other events is
> similar to that of the RUNNING state.
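The transitions described in the issue can be sketched as a small Java state function. This is a hypothetical simplification for illustration only, not the real RMNodeImpl state machine or YARN's NodeState/RMNodeEventType enums; the `State` and `Event` names and the `runningApps` parameter are assumptions taken from the wording of the issue text.

```java
// Hypothetical, simplified sketch of the DECOMMISSIONING transitions described in the issue;
// not the real RMNodeImpl state machine or YARN's NodeState enum.
final class DecommissioningTransitions {
    enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    // Illustrative event names based on the issue text; OTHER stands for any remaining event.
    enum Event { DECOMMISSIONING, RESOURCE_UPDATE, RECONNECT, DECOMMISSIONED, RECOMMISSION, OTHER }

    private DecommissioningTransitions() {}

    // Returns the node's next state; runningApps only matters for RESOURCE_UPDATE.
    static State next(State current, Event event, int runningApps) {
        switch (current) {
            case RUNNING:
                // A "decommissioning" event starts a graceful decommission.
                return event == Event.DECOMMISSIONING ? State.DECOMMISSIONING : State.RUNNING;
            case DECOMMISSIONING:
                switch (event) {
                    case RESOURCE_UPDATE:
                        // Finish only when no apps are left running on this NM.
                        return runningApps == 0 ? State.DECOMMISSIONED : State.DECOMMISSIONING;
                    case RECONNECT:      // NM reconnected after a restart
                    case DECOMMISSIONED: // timeout given from the CLI expired
                        return State.DECOMMISSIONED;
                    case RECOMMISSION:   // user cancelled the decommission
                        return State.RUNNING;
                    default:
                        // Other events leave the state unchanged here, loosely mirroring RUNNING.
                        return State.DECOMMISSIONING;
                }
            default:
                // DECOMMISSIONED is terminal in this sketch.
                return current;
        }
    }

    public static void main(String[] args) {
        State s = next(State.RUNNING, Event.DECOMMISSIONING, 2);
        s = next(s, Event.RESOURCE_UPDATE, 2); // apps still running: stays DECOMMISSIONING
        System.out.println(s);
        s = next(s, Event.RESOURCE_UPDATE, 0); // last app finished
        System.out.println(s);
    }
}
```

Grouping each state's outgoing edges in one `switch` branch keeps the sketch close to how a transition table reads.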



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
