hadoop-yarn-issues mailing list archives

From "Bibin A Chundatt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6178) RM recovery failure on node label mirror load
Date Sat, 11 Feb 2017 14:15:42 GMT

    [ https://issues.apache.org/jira/browse/YARN-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862405#comment-15862405 ]

Bibin A Chundatt commented on YARN-6178:
----------------------------------------

[~varun_saxena]

I looked into the code and did not find a case where this could happen. Even if the label list
is empty, the file size should not be zero.
However, the RM failed to start in my laptop setup because of the node label mirror image
size. I will try to reproduce it.
We could also handle the file-size-0 case so that RM startup does not fail. Thoughts?
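
To make the file-size-0 idea concrete, here is a minimal sketch of the guard using only the JDK. It is not the FileSystemNodeLabelsStore code and not a proposed patch: the class MirrorLoadSketch, the methods hasContent and loadLabels, and the one-label-per-line file format are all hypothetical stand-ins chosen for illustration. The only point is the check on the file size before any parsing is attempted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; names and file format are assumptions,
// not part of Hadoop YARN.
public class MirrorLoadSketch {

    /** Returns true only when the mirror file exists and holds at least one byte. */
    public static boolean hasContent(Path mirror) throws IOException {
        return Files.isRegularFile(mirror) && Files.size(mirror) > 0;
    }

    /**
     * Stand-in for the mirror load step: reads one label per line.
     * A missing or zero-length file yields an empty list instead of a failure.
     */
    public static List<String> loadLabels(Path mirror) throws IOException {
        if (!hasContent(mirror)) {
            // Nothing was ever written to the mirror, so skip the load.
            return Collections.emptyList();
        }
        return Files.readAllLines(mirror, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // createTempFile produces a zero-byte file, like the mirror in the report.
        Path mirror = Files.createTempFile("nodelabel", ".mirror");
        try {
            System.out.println("labels from empty mirror: " + loadLabels(mirror));
            Files.write(mirror, "gpu\nssd\n".getBytes(StandardCharsets.UTF_8));
            System.out.println("labels after write: " + loadLabels(mirror));
        } finally {
            Files.deleteIfExists(mirror);
        }
    }
}
```

In the real store the equivalent check would presumably sit in front of the mirror read in recover/loadFromMirror, but where exactly it belongs is open for discussion.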


> RM recovery failure on node label mirror load
> ---------------------------------------------
>
>                 Key: YARN-6178
>                 URL: https://issues.apache.org/jira/browse/YARN-6178
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>
> The node label feature is enabled. In the file-based state store, the mirror file size is zero.
> {noformat}
> secureuser@vm2:/tmp/hadoop-yarn-yarn/node-labels> l
> total 8
> drwxr-xr-x 2 secureuser hadoop 4096 Feb  6 18:56 ./
> drwxr-xr-x 3 secureuser hadoop 4096 Jan 22 22:07 ../
> -rw-r--r-- 1 secureuser hadoop    0 Feb  6 18:56 nodelabel.editlog
> -rw-r--r-- 1 secureuser hadoop    0 Feb  6 18:56 nodelabel.mirror
> {noformat}
> {noformat}
> 2017-02-11 10:04:59,034 INFO org.apache.hadoop.conf.Configuration: dynamic-resources.xml not found
> 2017-02-11 10:04:59,042 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn     OPERATION=transitionToActive    TARGET=RM       RESULT=FAILURE  DESCRIPTION=Exception transitioning to active   PERMISSIONS=
> 2017-02-11 10:04:59,042 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
>         at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:321)
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
>         ... 4 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.initLocalNodeLabels(AddToClusterNodeLabelsRequestPBImpl.java:117)
>         at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.getNodeLabels(AddToClusterNodeLabelsRequestPBImpl.java:129)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:169)
>         at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:205)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
>         at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:761)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1139)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1179)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1175)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1175)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:316)
>         ... 5 more
> 2017-02-11 10:04:59,042 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> {noformat}
> The load should be skipped if the mirror file size is zero.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

