hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
Date Fri, 20 Mar 2015 00:34:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370446#comment-14370446 ]

Junping Du commented on YARN-3212:
----------------------------------

Thanks [~jlowe] and [~mingma] for review and comments!
bq. Do we want to handle the DECOMMISSIONING_WITH_TIMEOUT event when the node is already in
the DECOMMISSIONING state? Curious if we might get a duplicate decommission event somehow
and want to ignore it or if we know for sure this cannot happen in practice.
This case can happen when the user submits another decommission CLI command while the node is still decommissioning. I think we can just ignore it for now, since nothing needs to be updated if the node is already decommissioning. Per the discussions in YARN-914 and YARN-3225, we will not track or update the timeout on the RM side (it may only be passed to the AM for notification).
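
For illustration, a minimal sketch of how the duplicate event could be swallowed in the RMNodeImpl state machine (the event type name follows this discussion; the exact registration in the patch may differ):
{code}
// Sketch: a self-loop so that a second DECOMMISSIONING_WITH_TIMEOUT event
// on an already-decommissioning node is effectively a no-op.
.addTransition(NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONING,
    RMNodeEventType.DECOMMISSIONING_WITH_TIMEOUT,
    new SingleArcTransition<RMNodeImpl, RMNodeEvent>() {
      @Override
      public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
        // Nothing to update: the node is already draining and the RM does
        // not track the timeout itself (see YARN-914 / YARN-3225).
      }
    })
{code}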

bq. Do we want to consider DECOMMISSIONING nodes as not active? There are containers actively
running on them, and in that sense they are participating in the cluster (and contributing
to the overall cluster resource). I think they should still be considered active, but I could
be persuaded otherwise.
I think we discussed this on YARN-914 before. The conclusion so far is to keep a decommissioning node active (otherwise some services may break; I am not 100% sure on this) and to make the node's resource equal to the resource of its assigned containers at any time. Do we need to change this conclusion?
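
As a rough sketch of that accounting (assuming {{scheduler}}, {{nodeId}} and {{rmNode}} are in scope; the real update would flow through the node resource-update path rather than a direct setter):
{code}
// Sketch: while DECOMMISSIONING, clamp the node's total capability to what
// its running containers currently hold, so nothing new can be placed.
Resource used = Resource.newInstance(0, 0);
for (RMContainer container :
    scheduler.getSchedulerNode(nodeId).getCopiedListOfRunningContainers()) {
  Resources.addTo(used, container.getAllocatedResource());
}
// Hypothetical direct set; the actual patch would update totalCapability
// via the node resource update event instead.
rmNode.setTotalCapability(used);
{code}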

bq.  In the reconnected node transition there is a switch statement that will debug-log an
unexpected state message when in fact the DECOMMISSIONING state is expected for this transition.
That's a good point. Will fix it in v3 patch. Thanks!

bq. Curious why the null check is needed in handleNMContainerStatuses? What about this change
allows the container statuses to be null?
I think so. It looks like the RMNodeReconnectEvent comes from RegisterNodeManagerRequest, and the containerStatuses field (read from the proto) can be null. So there was an NPE bug here even before this patch, which I found through a unit test that creates the event as "new RMNodeReconnectEvent(node.getNodeID(), node, null, null)". Am I missing something here?

bq. It would be nice to see some refactoring of the common code between StatusUpdateWhenHealthyTransition,
StatusUpdateWhenUnhealthyTransition, and StatusUpdateWhenDecommissioningTransition.
Yes, I should have done that earlier. Will do it in the v3 patch.
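
Roughly, the plan is to funnel the three transitions through one shared helper, along these lines (hypothetical name; details per the v3 patch):
{code}
// Sketch: shared heartbeat bookkeeping extracted into one helper; each
// StatusUpdateWhen*Transition delegates here and only differs in how it
// picks the post-state to return.
private static void updateNodeStatusCommon(RMNodeImpl rmNode,
    RMNodeStatusEvent statusEvent) {
  // ... common handling: record the health report, forward container
  // statuses to the scheduler, update the latest heartbeat response ...
}
{code}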

bq. These change seems unnecessary?
These are still necessary, because we changed the transition from a single final state to multiple final states (see the example below), and the state-machine factory only accepts an EnumSet of final states for such transitions.
{code}
   public static class ReconnectNodeTransition implements
-      SingleArcTransition<RMNodeImpl, RMNodeEvent> {
+      MultipleArcTransition<RMNodeImpl, RMNodeEvent, NodeState> {
{code}
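
With that change, the registration also takes an EnumSet of the possible post-states, roughly like this (sketch; the exact state sets are per the patch):
{code}
// Sketch: MultipleArcTransition arcs are registered with the full set of
// states they may end in; the transition hook returns the actual one.
.addTransition(NodeState.RUNNING,
    EnumSet.of(NodeState.RUNNING, NodeState.DECOMMISSIONING),
    RMNodeEventType.RECONNECTED, new ReconnectNodeTransition())
{code}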

bq. Do we need to support the scenario where NM becomes dead when it is being decommissioned?
Say decommission timeout is 30 minutes larger than the NM liveness timeout. The node drops
out of the cluster for some time and rejoin later all within the decommission time out. Will
Yarn show the status as just dead node, or {dead, decommissioning}?
Now the node can become LOST (dead) while it is decommissioning. That is no different from a running node getting lost, except that it cannot rejoin unless the user puts it back through recommission. Make sense?
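
In state-machine terms that is just the usual liveness arc applied to the new state, something like (sketch):
{code}
// Sketch: heartbeat expiry still applies while decommissioning, so the node
// moves to LOST on a missed liveness deadline, just as a RUNNING node would.
.addTransition(NodeState.DECOMMISSIONING, NodeState.LOST,
    RMNodeEventType.EXPIRE, new DeactivateNodeTransition(NodeState.LOST))
{code}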

bq. Seems useful for admins to know about it. If we need that, we can consider two types of
NodeState. One is liveness state, one is admin state. Then you will have different combinations.
We can add the necessary logging to let admins know about it. Are you talking about a scenario like this: an admin puts some nodes into decommissioning with a timeout, an upgrade script does an OS upgrade and finishes with a restart at some arbitrary time that could be shorter than the decommissioning timeout, and the admin wants these nodes to rejoin automatically? But how would YARN know whether the admin wants these nodes back after a restart? An explicit move back to the white list (recommission) may still be necessary.

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch
>
>
> As proposed in YARN-914, a new state, “DECOMMISSIONING”, will be added; a node can transition into it from the “running” state, triggered by a new “decommissioning” event.
> This new state can transition to “decommissioned” on a Resource_Update when no apps are running on the NM, when the NM reconnects after a restart, or when a DECOMMISSIONED event is received (after the timeout from the CLI).
> In addition, the node can go back to “running” if the user decides to cancel the previous decommission by recommissioning the same node. The reaction to other events is similar to the RUNNING state.



