Date: Wed, 25 Mar 2015 17:06:54 +0000 (UTC)
From: "Jordan Zimmerman (JIRA)"
To: dev@curator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Resolved] (CURATOR-194) Deadlock in ConnectionState.checkTimeouts

     [ https://issues.apache.org/jira/browse/CURATOR-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan Zimmerman resolved CURATOR-194.
--------------------------------------
    Resolution: Duplicate

> Deadlock in ConnectionState.checkTimeouts
> -----------------------------------------
>
>                 Key: CURATOR-194
>                 URL: https://issues.apache.org/jira/browse/CURATOR-194
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.6.0
>            Reporter: Amir Gur
>
> When ConnectionState.checkTimeouts actually detects a timeout, it calls 'reset'
> which calls org.apache.zookeeper.ClientCnxn.close, which sends a ZooDefs.OpCode.closeSession request.
> Then it waits on the packet, until SendThread calls 'notifyAll' on the packet.
> At that time, SendThread is blocked because it tries to enter the synchronized method 'ConnectionState.checkTimeouts'.
> So it will never notify the packet.
> Here is the thread dump:
> "job-scheduler_Worker-19-CheckHealthTask" prio=10 tid=0x00007f260609c000 nid=0x5a97 in Object.wait() [0x00007f25723e1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
>         - locked <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1314)
>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677)
>         - locked <0x0000000723949c88> (a org.apache.zookeeper.ZooKeeper)
>         at org.apache.curator.HandleHolder.internalClose(HandleHolder.java:139)
>         at org.apache.curator.HandleHolder.closeAndReset(HandleHolder.java:77)
>         at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
>         - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
>         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:194)
>         - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
>         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
>         at com.alu.dal.zooKeeper.ZooKeeperSession.checkHealth(ZooKeeperSession.java:350)
>         at com.alu.dal.zooKeeper.ZooKeeperSession.check(ZooKeeperSession.java:86)
>         at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkQuorum(ClusterInstanceServiceImpl.java:464)
>         at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkHealthState(ClusterInstanceServiceImpl.java:400)
>         at com.alu.tasks.health.CheckHealthTaskImpl.doWork(CheckHealthTaskImpl.java:37)
>         at com.alu.scheduler.JobSchedulerDetails$QuartzJob.executeInternal(JobSchedulerDetails.java:95)
>         at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
>         at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
> "localhost-startStop-1-SendThread(11.1.1.11:2181)" daemon prio=10 tid=0x00007f257c61a000 nid=0x7c3 waiting for monitor entry [0x00007f2562e65000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:177)
>         - waiting to lock <0x000000071651de48> (a org.apache.curator.ConnectionState)
>         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyncForSuspendedConnection(CuratorFrameworkImpl.java:668)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$800(CuratorFrameworkImpl.java:58)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.retriesExhausted(CuratorFrameworkImpl.java:664)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:683)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
>         at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
>         at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:478)
>         - locked <0x0000000714935b18> (a java.util.concurrent.LinkedBlockingQueue)
>         at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:630)
>         at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:648)
>         at org.apache.zookeeper.ClientCnxn.access$2400(ClientCnxn.java:85)
>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1194)
>         - locked <0x000000071b205bf0> (a java.util.LinkedList)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1122)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
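
The lock pattern described in the report (one thread holds the ConnectionState monitor while waiting on a packet, and the only thread that can notify the packet must first enter that same monitor) can be reproduced in isolation. The following is a minimal sketch and not Curator or ZooKeeper code: `State`, `Packet`, `close`, `finishPacket` and `closeCompletes` are hypothetical stand-ins. It also assumes a bounded wait so that the demo terminates, whereas the wait in the thread dump above has no timeout. The second call in `main` is only a contrast case; it is not a claim about how the issue was actually fixed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorWaitDeadlockSketch {

    /** Hypothetical stand-in for ClientCnxn.Packet: the object a closer waits on. */
    static final class Packet {
        boolean finished;
    }

    /** Hypothetical stand-in for ConnectionState: its monitor guards both code paths. */
    static final class State {
        final Packet packet = new Packet();
        final CountDownLatch closeStarted = new CountDownLatch(1);

        /** Waits for the packet, optionally while holding this object's monitor. */
        boolean close(boolean holdStateMonitor, long waitMillis) {
            if (holdStateMonitor) {
                synchronized (this) {
                    closeStarted.countDown();
                    return awaitPacket(waitMillis);
                }
            }
            closeStarted.countDown();
            return awaitPacket(waitMillis);
        }

        /** Returns true if the packet was finished before the (assumed) timeout. */
        private boolean awaitPacket(long waitMillis) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
            synchronized (packet) {
                while (!packet.finished) {
                    long remainingMillis =
                            TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMillis <= 0) {
                        return false;
                    }
                    try {
                        // wait() releases only the packet monitor, never the State monitor.
                        packet.wait(remainingMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                return true;
            }
        }

        /** Sender path: must enter the State monitor before it can notify the packet. */
        synchronized void finishPacket() {
            synchronized (packet) {
                packet.finished = true;
                packet.notifyAll();
            }
        }
    }

    /** Runs one closer/sender pair and reports whether close() saw the notification. */
    static boolean closeCompletes(boolean holdStateMonitor, long waitMillis) {
        State state = new State();
        Thread sender = new Thread(() -> {
            try {
                state.closeStarted.await();
            } catch (InterruptedException e) {
                return;
            }
            state.finishPacket();
        }, "sender-sketch");
        sender.setDaemon(true);
        sender.start();

        boolean completed = state.close(holdStateMonitor, waitMillis);
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("holding monitor: completed=" + closeCompletes(true, 200));
        System.out.println("monitor released: completed=" + closeCompletes(false, 5000));
    }
}
```

With the monitor held, the sender cannot reach `notifyAll` until the closer gives up, so the first call should report `false`. With the monitor released, the second call should report `true`. With an unbounded wait, as in the dump, the first case would never return.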