Return-Path: X-Original-To: apmail-curator-dev-archive@minotaur.apache.org Delivered-To: apmail-curator-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4BE3317C2C for ; Thu, 30 Oct 2014 17:37:34 +0000 (UTC) Received: (qmail 29533 invoked by uid 500); 30 Oct 2014 17:37:34 -0000 Delivered-To: apmail-curator-dev-archive@curator.apache.org Received: (qmail 29490 invoked by uid 500); 30 Oct 2014 17:37:34 -0000 Mailing-List: contact dev-help@curator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@curator.apache.org Delivered-To: mailing list dev@curator.apache.org Received: (qmail 29478 invoked by uid 99); 30 Oct 2014 17:37:34 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 Oct 2014 17:37:34 +0000 Date: Thu, 30 Oct 2014 17:37:33 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: dev@curator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CURATOR-157) Avoid stack traces closing PathChildrenCache followed by closing CuratorFramework MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CURATOR-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190442#comment-14190442 ] ASF GitHub Bot commented on CURATOR-157: ---------------------------------------- Github user bdumon commented on a diff in the pull request: https://github.com/apache/curator/pull/51#discussion_r19621734 --- Diff: curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/PathChildrenCache.java --- @@ -382,6 +383,11 @@ public void close() throws IOException connectionStateListener = null; childrenWatcher = null; dataWatcher = null; + + synchronized (backgroundTaskMonitor) { --- End diff -- No problem, there might a problem in my reasoning, which is as follows: a refresh might be going on when PathChildrenCache.close() is called. The refresh runs on a background thread, the ZK event thread if I'm not mistaken. Since the processResult will be running within a synchronized block, the PathChildrenCache.close() method will not be able to return until the processResult is finished, since it tries to acquire the same lock. The second synchronized block in refresh(), around the statements "if-closed-return, getChildren()..." is for similar reasons: the if-closed in that block might be called before the closed state is set, while the getChildren might be launched afterwards, but by then zookeeper might already be closed as well. By putting it in a synchronized block, we are sure no ZooKeeper operation is going on after returning from the close() method, and CuratorFramework/ZooKeeper can be closed without causing errors to be logged. > Avoid stack traces closing PathChildrenCache followed by closing CuratorFramework > --------------------------------------------------------------------------------- > > Key: CURATOR-157 > URL: https://issues.apache.org/jira/browse/CURATOR-157 > Project: Apache Curator > Issue Type: Improvement > Reporter: Bruno Dumon > Assignee: Jordan Zimmerman > Attachments: LogProblemIllustration.java > > > When closing PathChildrenCache, and immediately afterwards closing CuratorFramework, some ERROR-level stack traces are logged. > This was previously reported on the mailing list: http://curator.markmail.org/thread/bmfr62ekx5p2vv7f > The cause is that the BackgroundCallback defined in PathChildrenCache.refresh() will, when triggered, perform some more ZooKeeper operations. > Thus one can get in sequences such as: > * operation with BackgroundCallback is submitted > * processResult of the BackgroundCallback is called > * PathChildrenCache is closed > * CuratorFramework is closed > * processResult, which is running on another thread, comes to the point it does operations on ZooKeeper, which fail because ZooKeeper is closed. > There is no real impact on the application, it is just for log-esthetical reasons that I'd like to avoid it. > In the more common case, the processResult will receive an IllegalStateException, which could be easily catched and ignored in PathChildrenCache if the PathChildrenCache is closed: > {noformat} > 14/10/30 11:24:51 ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up > java.lang.IllegalStateException: instance must be started before calling this method > at com.google.common.base.Preconditions.checkState(Preconditions.java:149) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:360) > at org.apache.curator.framework.recipes.cache.PathChildrenCache.getDataAndStat(PathChildrenCache.java:545) > at org.apache.curator.framework.recipes.cache.PathChildrenCache.processChildren(PathChildrenCache.java:668) > at org.apache.curator.framework.recipes.cache.PathChildrenCache.access$200(PathChildrenCache.java:68) > at org.apache.curator.framework.recipes.cache.PathChildrenCache$4.processResult(PathChildrenCache.java:490) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:715) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:502) > at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166) > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:590) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {noformat} > But sometimes it also fails with other async operations deeper down: > {noformat} > 14/10/30 11:24:51 ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up > java.lang.IllegalStateException: Client is not started > at com.google.common.base.Preconditions.checkState(Preconditions.java:149) > at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474) > at org.apache.curator.framework.imps.GetDataBuilderImpl.performBackgroundOperation(GetDataBuilderImpl.java:263) > at org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:65) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:789) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:487) > at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:275) > at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41) > at org.apache.curator.framework.recipes.cache.PathChildrenCache.getDataAndStat(PathChildrenCache.java:545) > at org.apache.curator.framework.recipes.cache.PathChildrenCache.processChildren(PathChildrenCache.java:668) > at org.apache.curator.framework.recipes.cache.PathChildrenCache.access$200(PathChildrenCache.java:68) > at org.apache.curator.framework.recipes.cache.PathChildrenCache$4.processResult(PathChildrenCache.java:490) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:715) > at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:502) > at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:166) > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:590) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {noformat} > Therefore I have created a patch where PathChildrenCache.close() will wait until the possibly running BackgroundCallback is finished. > I will also attach a small class that illustrates the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)