curator-dev mailing list archives

From "Amir Gur (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CURATOR-194) Deadlock ConnectionState.checkTimeouts
Date Tue, 24 Mar 2015 09:44:52 GMT
Amir Gur created CURATOR-194:
--------------------------------

             Summary: Deadlock ConnectionState.checkTimeouts
                 Key: CURATOR-194
                 URL: https://issues.apache.org/jira/browse/CURATOR-194
             Project: Apache Curator
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.6.0
            Reporter: Amir Gur


When ConnectionState.checkTimeouts detects a timeout, it calls 'reset', which (via
HandleHolder.closeAndReset and ZooKeeper.close) calls org.apache.zookeeper.ClientCnxn.close.
That method sends a ZooDefs.OpCode.closeSession request and then waits on the request packet
until SendThread calls 'notifyAll' on it. The calling thread still holds the ConnectionState
monitor while it waits.

Meanwhile, SendThread is blocked trying to enter the synchronized method
'ConnectionState.checkTimeouts', so it never notifies the packet and neither thread can
make progress.
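
The lock ordering described above can be reproduced in isolation. The following is only a
minimal sketch of the pattern, not Curator or ZooKeeper code: the classes State and Packet
and all method names are hypothetical stand-ins for ConnectionState and ClientCnxn.Packet,
and the bounded wait is an assumption added so that the sketch terminates (in the reported
case the wait has no timeout and the threads hang).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CheckTimeoutsDeadlockSketch {
    // Hypothetical stand-in for ClientCnxn.Packet: the closer waits on it.
    static final class Packet {
        boolean finished;
    }

    // Hypothetical stand-in for ConnectionState: one monitor guards the timeout check.
    static final class State {
        final Packet closePacket = new Packet();
        final CountDownLatch closerHoldsLock = new CountDownLatch(1);

        // Caller path: keeps the State monitor while waiting for the packet.
        // Returns true only if the packet was finished within waitMillis.
        synchronized boolean checkTimeoutsAndClose(long waitMillis) throws InterruptedException {
            closerHoldsLock.countDown();
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
            synchronized (closePacket) {
                while (!closePacket.finished) {
                    long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remaining <= 0) {
                        return false; // nobody notified us: the deadlock shape
                    }
                    // wait() releases only the packet monitor, not the State monitor.
                    closePacket.wait(remaining);
                }
                return true;
            }
        }

        // Sender path: must enter the State monitor before it can finish the packet.
        void senderPath() throws InterruptedException {
            closerHoldsLock.await();
            synchronized (this) {
                // Blocked here for as long as the closer holds the State monitor.
            }
            synchronized (closePacket) {
                closePacket.finished = true;
                closePacket.notifyAll();
            }
        }
    }

    static boolean closeCompletes(long waitMillis) throws InterruptedException {
        State state = new State();
        Thread sender = new Thread(() -> {
            try {
                state.senderPath();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "SendThread-sketch");
        sender.setDaemon(true);
        sender.start();
        boolean completed = state.checkTimeoutsAndClose(waitMillis);
        sender.join(1000); // the sender proceeds once the State monitor is released
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close completed: " + closeCompletes(300)); // prints "close completed: false"
    }
}
```

The sender can only call notifyAll after the closer has left the synchronized method, so
the wait always runs out first. With an unbounded wait, as in the report, the closer never
leaves and both threads stay stuck.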

Here is the thread dump:

"job-scheduler_Worker-19-CheckHealthTask" prio=10 tid=0x00007f260609c000 nid=0x5a97 in Object.wait() [0x00007f25723e1000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
        - locked <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1314)
        at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677)
        - locked <0x0000000723949c88> (a org.apache.zookeeper.ZooKeeper)
        at org.apache.curator.HandleHolder.internalClose(HandleHolder.java:139)
        at org.apache.curator.HandleHolder.closeAndReset(HandleHolder.java:77)
        at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
        - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:194)
        - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
        at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
        at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
        at com.alu.dal.zooKeeper.ZooKeeperSession.checkHealth(ZooKeeperSession.java:350)
        at com.alu.dal.zooKeeper.ZooKeeperSession.check(ZooKeeperSession.java:86)
        at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkQuorum(ClusterInstanceServiceImpl.java:464)
        at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkHealthState(ClusterInstanceServiceImpl.java:400)
        at com.alu.tasks.health.CheckHealthTaskImpl.doWork(CheckHealthTaskImpl.java:37)
        at com.alu.scheduler.JobSchedulerDetails$QuartzJob.executeInternal(JobSchedulerDetails.java:95)
        at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)


"localhost-startStop-1-SendThread(11.1.1.11:2181)" daemon prio=10 tid=0x00007f257c61a000 nid=0x7c3 waiting for monitor entry [0x00007f2562e65000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:177)
        - waiting to lock <0x000000071651de48> (a org.apache.curator.ConnectionState)
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyncForSuspendedConnection(CuratorFrameworkImpl.java:668)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$800(CuratorFrameworkImpl.java:58)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.retriesExhausted(CuratorFrameworkImpl.java:664)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:683)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
        at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
        at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:478)
        - locked <0x0000000714935b18> (a java.util.concurrent.LinkedBlockingQueue)
        at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:630)
        at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:648)
        at org.apache.zookeeper.ClientCnxn.access$2400(ClientCnxn.java:85)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1194)
        - locked <0x000000071b205bf0> (a java.util.LinkedList)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1122)




