hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4663) DeadLocks in ZKRMStateStore
Date Mon, 01 Feb 2016 09:05:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125969#comment-15125969 ]

Varun Saxena commented on YARN-4663:
------------------------------------

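There seems to be an issue in the ZooKeeper client code: the two client threads take the same pair of locks in opposite order. "main-EventThread" holds the Packet lock (0x00000000ce5c66c0) and waits for the LinkedList lock (0x00000000ce5d0160), while "main-SendThread" holds the LinkedList lock and waits for the Packet lock.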
{panel}
"main-EventThread":
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1468)
	- *waiting to lock <0x00000000ce5d0160>* (a java.util.LinkedList)
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1456)
	at org.apache.zookeeper.ClientCnxn$SendThread.access$2800(ClientCnxn.java:868)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1641)
	- *locked <0x00000000ce5c66c0>* (a org.apache.zookeeper.ClientCnxn$Packet)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1622)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2261)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2291)

"main-SendThread(160-149-0-9:24002)":
	at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:775)
	- *waiting to lock <0x00000000ce5c66c0>* (a org.apache.zookeeper.ClientCnxn$Packet)
	at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:815)
	at org.apache.zookeeper.ClientCnxn.access$2600(ClientCnxn.java:99)
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1469)
	- *locked <0x00000000ce5d0160>* (a java.util.LinkedList)
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1456)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1385)
{panel}
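
The pattern in the two traces above (each thread holding one monitor while waiting for the other) can be reproduced in isolation. What follows is only a minimal sketch, not the ZooKeeper code: the class name, thread names, and the two lock objects are hypothetical stand-ins for the Packet and the LinkedList in the dump. It uses the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call to confirm the cycle, and daemon threads so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration of the lock-order inversion seen in the dump.
class LockInversionSketch {

    /** Returns true if the JVM reports both sketch threads as monitor-deadlocked. */
    static boolean detectInversion() {
        final Object packet = new Object();                   // stand-in for ClientCnxn$Packet
        final LinkedList<Object> queue = new LinkedList<>();  // stand-in for the LinkedList monitor
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Like "main-EventThread": locks the packet first, then wants the list.
        Thread eventLike = new Thread(() -> {
            synchronized (packet) {
                awaitBoth(firstLocksHeld);
                synchronized (queue) { queue.clear(); }
            }
        }, "sketch-EventThread");

        // Like "main-SendThread": locks the list first, then wants the packet.
        Thread sendLike = new Thread(() -> {
            synchronized (queue) {
                awaitBoth(firstLocksHeld);
                synchronized (packet) { queue.clear(); }
            }
        }, "sketch-SendThread");

        // Daemon threads: the deadlocked pair must not keep the JVM alive.
        eventLike.setDaemon(true);
        sendLike.setDaemon(true);
        eventLike.start();
        sendLike.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            // Poll for up to about five seconds until both threads are blocked.
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
                if (ids != null) {
                    boolean sawEvent = false;
                    boolean sawSend = false;
                    for (long id : ids) {
                        if (id == eventLike.getId()) sawEvent = true;
                        if (id == sendLike.getId()) sawSend = true;
                    }
                    if (sawEvent && sawSend) return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Signals that this thread holds its first lock, then waits for the other thread.
    private static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("lock-order inversion detected: " + detectInversion());
    }
}
```

In this sketch, having both threads acquire the two monitors in the same order would remove the cycle. Whether that is the appropriate change for the ZK client itself is not something the traces alone establish.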

> DeadLocks in ZKRMStateStore
> ---------------------------
>
>                 Key: YARN-4663
>                 URL: https://issues.apache.org/jira/browse/YARN-4663
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Bob
>            Priority: Blocker
>
> {code}
> Java stack information for the threads listed above:
> ===================================================
> "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread":
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:963)
> 	- waiting to lock <0x00000000c8470590> (a org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$600(ZKRMStateStore.java:92)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread.run(ZKRMStateStore.java:1113)
> "main-EventThread":
> 	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1468)
> 	- waiting to lock <0x00000000ce5d0160> (a java.util.LinkedList)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1456)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.access$2800(ClientCnxn.java:868)
> 	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1641)
> 	- locked <0x00000000ce5c66c0> (a org.apache.zookeeper.ClientCnxn$Packet)
> 	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1622)
> 	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2261)
> 	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2291)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:1053)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:1050)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1145)
> 	- locked <0x00000000c8470590> (a org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1178)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getChildrenWithRetries(ZKRMStateStore.java:1050)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.loadApplicationAttemptState(ZKRMStateStore.java:606)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.loadRMAppState(ZKRMStateStore.java:595)
> 	- locked <0x00000000c8470590> (a org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore)
> 	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.loadState(ZKRMStateStore.java:464)
> 	- locked <0x00000000c8470590> (a org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:625)
> 	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> 	- locked <0x00000000c824a9f0> (a java.lang.Object)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1033)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1074)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1070)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1675)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1070)
> 	- locked <0x00000000c804f7c0> (a org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
> 	- locked <0x00000000c8465e20> (a org.apache.hadoop.yarn.server.resourcemanager.AdminService)
> 	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
> 	- locked <0x00000000c82e1808> (a org.apache.hadoop.ha.ActiveStandbyElector)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:694)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:566)
> "main-SendThread(160-149-0-9:24002)":
> 	at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:775)
> 	- waiting to lock <0x00000000ce5c66c0> (a org.apache.zookeeper.ClientCnxn$Packet)
> 	at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:815)
> 	at org.apache.zookeeper.ClientCnxn.access$2600(ClientCnxn.java:99)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1469)
> 	- locked <0x00000000ce5d0160> (a java.util.LinkedList)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1456)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1385)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
