hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1071) ResourceManager's decommissioned and lost node count is 0 after restart
Date Thu, 20 Feb 2014 22:47:22 GMT

    [ https://issues.apache.org/jira/browse/YARN-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907654#comment-13907654 ]

Jian He commented on YARN-1071:
-------------------------------

Thanks Zhijie for the review!
bq. HostsFileReader#refresh(2params)
That's hadoop-common code; we should probably not touch it.
bq. Check the ip as well as we do in NodesListManager#isValidNode?
Good catch!
Fixed the other comments as well.

The patch doesn't fix the include-list scenario, or the case where the exclude list changes between RM restarts.
For that, the RM may need to persistently save the decommissioned NM state.
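
To illustrate the two points above, here is a minimal sketch, written in Python for brevity rather than in the RM's Java. All function names are hypothetical and nothing here is taken from the patch. It assumes a node is checked against the include and exclude lists by both hostname and IP, and that the decommissioned count is re-derived from the exclude list alone when the RM starts.

```python
import socket


def normalize_host(host, resolve=socket.gethostbyname):
    """Return the resolved IP for host, or host unchanged if lookup fails."""
    try:
        return resolve(host)
    except OSError:
        return host


def is_valid_node(host, include, exclude, resolve=socket.gethostbyname):
    """Accept a node only if it passes both lists, matching on hostname or IP."""
    ip = normalize_host(host, resolve)
    # An empty include list is treated as "every host is allowed".
    included = not include or host in include or ip in include
    excluded = host in exclude or ip in exclude
    return included and not excluded


def decommissioned_count_on_restart(exclude):
    """Hypothetical: re-derive the decommissioned count from the exclude list."""
    return len(exclude)
```

A count derived this way only reflects the exclude list as it exists at startup. If the list is edited while the RM is down, or a node was rejected because it is missing from the include list, the number can be wrong, which is why persisting the state may be needed.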

> ResourceManager's decommissioned and lost node count is 0 after restart
> -----------------------------------------------------------------------
>
>                 Key: YARN-1071
>                 URL: https://issues.apache.org/jira/browse/YARN-1071
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Srimanth Gunturi
>            Assignee: Jian He
>         Attachments: YARN-1071.1.patch, YARN-1071.2.patch, YARN-1071.3.patch, YARN-1071.4.patch
>
>
> I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count:
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 1,
> "NumLostNMs" : 2,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> After restarting RM, the counts were shown as below in JMX.
> {noformat}
> "NumActiveNMs" : 3,
> "NumDecommissionedNMs" : 0,
> "NumLostNMs" : 0,
> "NumUnhealthyNMs" : 0,
> "NumRebootedNMs" : 0
> {noformat}
> Notice that the lost and decommissioned NM counts are both 0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
