hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
Date Wed, 18 Mar 2015 22:02:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368012#comment-14368012

Jason Lowe commented on YARN-3212:

Thanks for the patch, Junping.  Some comments on the patch from a quick overview of it:

Do we want to handle the DECOMMISSIONING_WITH_TIMEOUT event when the node is already in the
DECOMMISSIONING state?  Curious if we might get a duplicate decommission event somehow and
want to ignore it or if we know for sure this cannot happen in practice.

Do we want to consider DECOMMISSIONING nodes as not active?  There are containers actively
running on them, and in that sense they are participating in the cluster (and contributing
to the overall cluster resource).  I think they should still be considered active, but I could
be persuaded otherwise.

In the reconnected node transition there is a switch statement that will debug-log an unexpected
state message when in fact the DECOMMISSIONING state is expected for this transition.

Curious why the null check is needed in handleNMContainerStatuses?  What about this change
allows the container statuses to be null?

It would be nice to see some refactoring of the common code between StatusUpdateWhenHealthyTransition,
StatusUpdateWhenUnhealthyTransition, and StatusUpdateWhenDecommissioningTransition.

These change seems unnecessary?
-     .addTransition(NodeState.RUNNING, NodeState.RUNNING,
+     .addTransition(NodeState.RUNNING, EnumSet.of(NodeState.RUNNING),


-     .addTransition(NodeState.UNHEALTHY, NodeState.UNHEALTHY,
+     .addTransition(NodeState.UNHEALTHY, EnumSet.of(NodeState.UNHEALTHY),


-        LOG.info("Node " + rmNode.nodeId + " reported UNHEALTHY with details: "
-            + remoteNodeHealthStatus.getHealthReport());
+        LOG.info("Node " + rmNode.nodeId + " reported UNHEALTHY with details: " +
+            remoteNodeHealthStatus.getHealthReport());

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch
> As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can transition
from “running” state triggered by a new event - “decommissioning”. 
> This new state can be transit to state of “decommissioned” when Resource_Update if
no running apps on this NM or NM reconnect after restart. Or it received DECOMMISSIONED event
(after timeout from CLI).
> In addition, it can back to “running” if user decides to cancel previous decommission
by calling recommission on the same node. The reaction to other events is similar to RUNNING

This message was sent by Atlassian JIRA

View raw message