Return-Path: X-Original-To: apmail-curator-dev-archive@minotaur.apache.org Delivered-To: apmail-curator-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A47A518378 for ; Wed, 6 May 2015 07:23:00 +0000 (UTC) Received: (qmail 80520 invoked by uid 500); 6 May 2015 07:23:00 -0000 Delivered-To: apmail-curator-dev-archive@curator.apache.org Received: (qmail 80476 invoked by uid 500); 6 May 2015 07:23:00 -0000 Mailing-List: contact dev-help@curator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@curator.apache.org Delivered-To: mailing list dev@curator.apache.org Received: (qmail 80464 invoked by uid 99); 6 May 2015 07:23:00 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 06 May 2015 07:23:00 +0000 Date: Wed, 6 May 2015 07:23:00 +0000 (UTC) From: "Assaf Sinvani (JIRA)" To: dev@curator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CURATOR-172) Deadlock when performing background operation MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CURATOR-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530076#comment-14530076 ] Assaf Sinvani commented on CURATOR-172: --------------------------------------- We have the same issue (CURATOR-194), and we use a no-retry policy (allowRetry returns false) We can't see who will notify() the packet to release CuratorFramework-0 > Deadlock when performing background operation > --------------------------------------------- > > Key: CURATOR-172 > URL: https://issues.apache.org/jira/browse/CURATOR-172 > Project: Apache Curator > Issue Type: Bug > Components: Client > Affects Versions: 2.4.2 > Environment: Linux HOSTNAME-REMOVED 2.6.32-279.19.1.el6.x86_64 #1 SMP Tue Dec 18 15:04:44 PST 2012 x86_64 x86_64 x86_64 GNU/Linux > java version "1.7.0_60" > Java(TM) SE Runtime Environment (build 1.7.0_60-b19) > Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) > Reporter: Tom Byrne > > Had a box get into a state where our ZK connections were all deadlocked, waiting on an object monitor. jstack shows that our background thread that was creating a node was waiting on a lock that was held by the CuratorFramework thread, who was waiting on an object monitor that looks like it couldn't be completed until our other write was finished (packet.finish would never return true.) > We have seen this happen twice, but don't notice it until afterwards, and don't have enough logging to know what's triggering it (possible ZK connections going away?) > Rest of the box is fine, network connections are not flapping, main IO threads continue to accept and process connections, until we get backed up waiting for ZK. > Here are the two stack traces: > "ZooChangeWatcher-BackgroundReader--2-1-SendThread()" daemon prio=10 tid=0x00007fcf64108000 nid=0x88d waiting for monitor entry [0x00007fcbf5d16000] > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:177) > - waiting to lock <0x00000000d526bcc8> (a org.apache.curator.ConnectionState) > at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) > at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:470) > at org.apache.curator.framework.imps.CreateBuilderImpl.pathInBackground(CreateBuilderImpl.java:648) > at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:427) > at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) > at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.createNode(PersistentEphemeralNode.java:340) > at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode.access$000(PersistentEphemeralNode.java:52) > at org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode$4.processResult(PersistentEphemeralNode.java:224) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:686) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:659) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:479) > at org.apache.curator.framework.imps.CreateBuilderImpl.sendBackgroundResponse(CreateBuilderImpl.java:526) > at org.apache.curator.framework.imps.CreateBuilderImpl.access$600(CreateBuilderImpl.java:44) > at org.apache.curator.framework.imps.CreateBuilderImpl$6.processResult(CreateBuilderImpl.java:485) > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602) > at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:475) > - locked <0x00000000fa8e16f8> (a java.util.concurrent.LinkedBlockingQueue) > at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:627) > at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:645) > at org.apache.zookeeper.ClientCnxn.access$2400(ClientCnxn.java:85) > at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1160) > - locked <0x00000000fa8e1380> (a java.util.LinkedList) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1109) > "CuratorFramework-0" daemon prio=10 tid=0x00007fd02cb57800 nid=0x4425 in Object.wait() [0x00007fcfc507e000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309) > - locked <0x00000000fa8e6750> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1281) > at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677) > - locked <0x00000000fa8e0948> (a org.apache.zookeeper.ZooKeeper) > at org.apache.curator.HandleHolder.internalClose(HandleHolder.java:139) > at org.apache.curator.HandleHolder.closeAndReset(HandleHolder.java:77) > at org.apache.curator.ConnectionState.reset(ConnectionState.java:218) > - locked <0x00000000d526bcc8> (a org.apache.curator.ConnectionState) > at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:194) > - locked <0x00000000d526bcc8> (a org.apache.curator.ConnectionState) > at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) > at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:763) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:749) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:56) > at org.apache.curator.framework.imps.CuratorFrameworkImpl$3.call(CuratorFrameworkImpl.java:244) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Help me Obi-Wan Kenobi, you're my only hope. -- This message was sent by Atlassian JIRA (v6.3.4#6332)