Date: Mon, 23 May 2016 03:25:14 +0000 (UTC)
From: "Jordan Zimmerman (JIRA)"
To: dev@curator.apache.org
Reply-To: dev@curator.apache.org
Subject: [jira] [Resolved] (CURATOR-328) PathChildrenCache fails silently if server is unavailable for sufficient time when client starts

     [ https://issues.apache.org/jira/browse/CURATOR-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan Zimmerman resolved CURATOR-328.
--------------------------------------
    Resolution: Fixed

> PathChildrenCache fails silently if server is unavailable for sufficient time when client starts
> ------------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-328
>                 URL: https://issues.apache.org/jira/browse/CURATOR-328
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 3.1.0, 2.10.0
>            Reporter: Gerd Behrmann
>            Assignee: Jordan Zimmerman
>             Fix For: 2.10.1, 3.1.1
>
>
> When initializing the PathChildrenCache, if the curator client is not yet connected to the ZooKeeper server (e.g. the server is down or the network connection is unavailable), then the internal initialization of the cache will eventually fail silently and the cache stays empty even after the client finally connects to the server and the path is populated with znodes.
> The following unit test demonstrates the problem (the unit test is ugly as the problem depends on timing, but it suffices to demonstrate the issue):
> {code:java}
>     @Test
>     public void pathChildrenCacheTest() throws Exception
>     {
>         TestingServer server = new TestingServer(false);
>         Timing timing = new Timing();
>         CuratorFramework client = CuratorFrameworkFactory.newClient(
>                 server.getConnectString(), timing.session(), timing.connection(), new ExponentialBackoffRetry(1000, 3));
>         try {
>             new Thread() {
>                 @Override
>                 public void run()
>                 {
>                     try {
>                         Thread.sleep(60000);
>                         server.start();
>                     } catch (Exception e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }.start();
>             client.start();
>             PathChildrenCache cache = new PathChildrenCache(client, "/", true);
>             cache.start();
>             client.blockUntilConnected();
>             client.create().creatingParentContainersIfNeeded().forPath("/baz", new byte[] {1,2,3});
>             assertNotNull("/baz does not exist", client.checkExists().forPath("/baz"));
>             /* Ugly hack for this test to ensure the cache got time to update itself. */
>             Thread.sleep(1000);
>             assertNotNull("cache doesn't see /baz", cache.getCurrentData("/baz"));
>         } finally {
>             client.close();
>             server.stop();
>         }
>     }
> {code}
> Here the server startup is delayed until some point after the curator client was started and after the recipe has been created. Eventually the server starts and the path is populated with data - some time is given for the cache to update itself, yet no data is visible: The second assertion fails.
> If the startup time is reduced to - say - 20 seconds, the test passes.
> If the client is allowed to first connect to the server before creating the recipe and then disconnect and reconnect after creating the recipe, then the test passes too.
> I tracked down the problem to the state change listener of the recipe: If the connection to the server is down for long enough, the refresh call during the background initialization will eventually fail (ensurePath throws an exception). This isn't a problem as the recipe has a state change listener, so it gets notified when the client eventually connects to the server. The handleStateChange method however doesn't react to a CONNECTED event - only to a RECONNECTED event. Thus if the client has been connected to the server in the past, everything works, however if this is the first time it connects, the recipe will not react to the event and thus not refresh itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
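
The state-change behaviour described in the report can be modelled without ZooKeeper at all. The sketch below is only an illustration, not Curator's code and not necessarily the fix that was committed: ConnectionState here is a local stand-in enum that borrows the state names from the report, and ModelCache, refreshOn and refreshesAfterFirstConnect are hypothetical names. It assumes the initial refresh has already failed, then delivers the client's very first connection event and counts how many refreshes the handler triggers.

```java
import java.util.EnumSet;
import java.util.Set;

public class StateChangeSketch {
    // Local stand-in for the client's connection states; only the names
    // come from the report, this is not Curator's own type.
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical model of a cache that refreshes itself whenever it is
    // told about one of the states listed in refreshOn.
    static final class ModelCache {
        private final Set<ConnectionState> refreshOn;
        private int refreshCount = 0;

        ModelCache(Set<ConnectionState> refreshOn) {
            this.refreshOn = refreshOn;
        }

        void handleStateChange(ConnectionState newState) {
            if (refreshOn.contains(newState)) {
                refreshCount++; // stands in for re-reading the children
            }
        }

        int refreshCount() {
            return refreshCount;
        }
    }

    // Assumes the initial refresh already failed, then simulates the first
    // successful connection, which is reported as CONNECTED.
    static int refreshesAfterFirstConnect(Set<ConnectionState> refreshOn) {
        ModelCache cache = new ModelCache(refreshOn);
        cache.handleStateChange(ConnectionState.CONNECTED);
        return cache.refreshCount();
    }

    public static void main(String[] args) {
        // Handler that only reacts to RECONNECTED, as described in the report.
        System.out.println(refreshesAfterFirstConnect(
                EnumSet.of(ConnectionState.RECONNECTED)));
        // Handler that also reacts to CONNECTED.
        System.out.println(refreshesAfterFirstConnect(
                EnumSet.of(ConnectionState.CONNECTED, ConnectionState.RECONNECTED)));
    }
}
```

With a handler that reacts only to RECONNECTED the model never refreshes after the first connection, which matches the empty cache seen in the test. Treating CONNECTED the same way as RECONNECTED is one plausible remedy under these assumptions.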