hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7252) Removing queue then failing over results in exception
Date Tue, 26 Sep 2017 19:49:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181441#comment-16181441
] 

Wangda Tan commented on YARN-7252:
----------------------------------

[~jhung], I'm trying to understand the problem: when root.a deleted by admin, will be this
added to edit log? If yes, we should not check STOPPED state for a delete queue. 

> Removing queue then failing over results in exception
> -----------------------------------------------------
>
>                 Key: YARN-7252
>                 URL: https://issues.apache.org/jira/browse/YARN-7252
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-7252-YARN-5734.001.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 is active.
First, put root.a into STOPPED state, then remove it. Then put rm1 in standby and rm2 in active.
Here's the exception: {noformat}Operation failed: Error on refreshAll during transition to
Active
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
> 	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> 	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation failed
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
> 	... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted from the
new capacity scheduler configuration, but the queue is not yet in stopped state. Current State
: RUNNING
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
> 	... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity scheduler configuration,
but the queue is not yet in stopped state. Current State : RUNNING
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
> 	... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and sees it
is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message