hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
Date Tue, 14 Apr 2015 11:45:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494002#comment-14494002 ]

Junping Du commented on YARN-3212:

Thanks [~rohithsharma] for the comments!
bq. I have gone through the design doc and the approach looks good to me. I will let you know if I need any clarification.
Sounds good. Thanks!

bq. Because the Reconnected event can be triggered only when the node state is RUNNING|UNHEALTHY.
This should change after this patch, because the node (NM daemon) could be shut down and restarted
during the decommissioning stage, and its reconnect to the RM will go through this state transition.

BTW, I would suggest holding off on reviewing this patch for now, as it depends on YARN-3445 (NM heartbeats
to RM with running apps). Also, YARN-3225 is almost ready to go in, so a rebase could be needed.

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, YARN-3212-v2.patch, YARN-3212-v3.patch
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added. A node can transition to it
from the “running” state, triggered by a new event, “decommissioning”.
> This new state can transition to the “decommissioned” state on Resource_Update if
there are no running apps on this NM, when the NM reconnects after a restart, or when it receives a DECOMMISSIONED event
(after a timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the previous decommission
by calling recommission on the same node. The reaction to other events is similar to that in the RUNNING state.
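
The transitions described in the issue summary can be sketched as a small state function. This is only an illustrative sketch under stated assumptions, not the code in the attached patches or in RMNodeImpl: the class, enum, and method names below are hypothetical, the running-app count is passed in as a plain integer, and it assumes one reading of the summary in which an NM reconnect after restart moves the node to the decommissioned state regardless of running apps.

```java
// Hypothetical sketch of the DECOMMISSIONING transitions; names are illustrative only.
final class DecommissioningSketch {

    // Only the states named in the issue summary are modeled here.
    enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    // Event names are assumptions chosen to mirror the wording of the summary.
    enum Event { DECOMMISSIONING, RESOURCE_UPDATE, RECONNECTED, DECOMMISSION, RECOMMISSION }

    // Returns the next state for a node given the current state, an event,
    // and the number of apps still running on that node.
    static State next(State current, Event event, int runningApps) {
        switch (current) {
            case RUNNING:
                // The new "decommissioning" event starts graceful decommission.
                // Other events are out of scope for this sketch and leave the state unchanged.
                return event == Event.DECOMMISSIONING ? State.DECOMMISSIONING : State.RUNNING;
            case DECOMMISSIONING:
                switch (event) {
                    case RESOURCE_UPDATE:
                        // Finish only once no apps are running on the node.
                        return runningApps == 0 ? State.DECOMMISSIONED : State.DECOMMISSIONING;
                    case RECONNECTED:   // NM restarted and reconnected (assumed reading)
                    case DECOMMISSION:  // e.g. after the CLI timeout expires
                        return State.DECOMMISSIONED;
                    case RECOMMISSION:
                        // The user cancelled the previous decommission.
                        return State.RUNNING;
                    default:
                        return State.DECOMMISSIONING;
                }
            default:
                // DECOMMISSIONED is treated as terminal in this sketch.
                return current;
        }
    }
}
```

For example, under these assumptions a node in DECOMMISSIONING that receives RESOURCE_UPDATE with two running apps stays in DECOMMISSIONING, while the same event with zero running apps moves it to DECOMMISSIONED.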

This message was sent by Atlassian JIRA
