Date: Mon, 10 Oct 2016 18:00:24 +0000 (UTC)
From: "Jordan Zimmerman (JIRA)"
To: dev@curator.apache.org
Subject: [jira] [Comment Edited] (CURATOR-355) Curator client fails when connecting to read-only ensemble

[ https://issues.apache.org/jira/browse/CURATOR-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562989#comment-15562989 ]

Jordan Zimmerman edited comment on CURATOR-355 at 10/10/16 6:00 PM:
--------------------------------------------------------------------

Firstly, the call to {{client.getZookeeperClient().blockUntilConnectedOrTimedOut();}} is unnecessary as Curator does this internally.

Curator 3.0 has better connection timeout behavior than Curator 2.0. In 2.0, the connection timeout is applied for each iteration of the Retry Policy. So, in this case, you'd expect {{getData()}} to wait 15 seconds * 3, plus 5 seconds * 3 for a total of one minute. In my recreation of your test that's exactly what I see:

{code}
System.setProperty("readonlymode.enabled", "true");
TestingCluster cluster = new TestingCluster(3);
cluster.getServers().get(0).stop();
cluster.getServers().get(1).stop();
CuratorFrameworkFactory.Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
    .connectString(cluster.getConnectString())
    .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
    .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
CuratorFramework client = curatorClientBuilder.build();
client.start();
client.getZookeeperClient().blockUntilConnectedOrTimedOut();
System.out.println("Successfully established the connection with ZooKeeper");
client.getData().forPath("/");
System.out.println("Done.");
{code}

With Curator 3.0, the time improves to just 15 seconds * 2 - the connection timeout number twice. Once for the {{blockUntilConnectedOrTimedOut()}} and once for the {{getData()}}.

Note: {{blockUntilConnectedOrTimedOut()}} in all cases would've returned {{false}} implying you should not continue.

was (Author: randgalt):

Firstly, the call to {code}client.getZookeeperClient().blockUntilConnectedOrTimedOut();{code} is unnecessary as Curator does this internally.

Curator 3.0 has better connection timeout behavior than Curator 2.0. In 2.0, the connection timeout is applied for each iteration of the Retry Policy. So, in this case, you'd expect {code}getData(){code} to wait 15 seconds * 3, plus 5 seconds * 3 for a total of one minute.
In my recreation of your test that's exactly what I see:

{code}
System.setProperty("readonlymode.enabled", "true");
TestingCluster cluster = new TestingCluster(3);
cluster.getServers().get(0).stop();
cluster.getServers().get(1).stop();
CuratorFrameworkFactory.Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
    .connectString(cluster.getConnectString())
    .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
    .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
CuratorFramework client = curatorClientBuilder.build();
client.start();
client.getZookeeperClient().blockUntilConnectedOrTimedOut();
System.out.println("Successfully established the connection with ZooKeeper");
client.getData().forPath("/");
System.out.println("Done.");
{code}

With Curator 3.0, the time improves to just 15 seconds * 2 - the connection timeout number twice. Once for the {code}blockUntilConnectedOrTimedOut(){code} and once for the {code}getData(){code}.

Note: {code}blockUntilConnectedOrTimedOut(){code} in all cases would've returned {code}false{code} implying you should not continue.

> Curator client fails when connecting to read-only ensemble
> ----------------------------------------------------------
>
>                 Key: CURATOR-355
>                 URL: https://issues.apache.org/jira/browse/CURATOR-355
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.0
>            Reporter: Benjamin Jaton
>            Priority: Critical
>         Attachments: test2.log
>
>
> ZK is 3.5.1-alpha
> I have a 3-node ZK cluster with read-only mode enabled.
> 2 nodes are down, so the remaining one (QA-E8WIN11) is in read-only mode (verified by using the ZK API manually). All the machines of the ensemble can be pinged from the client.
> I'm using this piece of code:
> {code}
> Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
>     .connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
>     .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
>     .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
> CuratorFramework client = curatorClientBuilder.build();
> client.start();
> client.getZookeeperClient().blockUntilConnectedOrTimedOut();
> System.out.println("Successfully established the connection with ZooKeeper");
>
> client.getData().forPath("/");
> System.out.println("Done.");{code}
> When Curator picks the host that is up first, it goes through very quickly. When it picks the host that is down first (QA-E8WIN12), it seems to be stuck at the getData() call for a very long time, and then eventually fails with a ConnectionLossException (see attached log).
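
For reference, the timeout arithmetic in the comment can be reproduced with a few lines of plain Java. This is only a sketch of the two models as the comment describes them, not Curator's actual retry loop: the class {{TimeoutEstimate}} and both method names are hypothetical, and the only inputs are the values from the reporter's builder (15000 ms connection timeout, {{RetryNTimes(3, 5000)}}) plus the two blocking calls in the test.

```java
import java.time.Duration;

// Hypothetical helper, not part of Curator: it only reproduces the arithmetic
// stated in the comment, using the reporter's configuration values.
public class TimeoutEstimate {

    // Model described for Curator 2.x: every retry iteration pays the full
    // connection timeout plus the retry policy's sleep.
    static Duration perRetryModel(Duration connectionTimeout, Duration sleepBetweenRetries, int retries) {
        return connectionTimeout.plus(sleepBetweenRetries).multipliedBy(retries);
    }

    // Model described for Curator 3.x: each blocking call pays the
    // connection timeout once.
    static Duration perCallModel(Duration connectionTimeout, int blockingCalls) {
        return connectionTimeout.multipliedBy(blockingCalls);
    }

    public static void main(String[] args) {
        Duration connectionTimeout = Duration.ofMillis(15000); // connectionTimeoutMs(15000)
        Duration retrySleep = Duration.ofMillis(5000);         // RetryNTimes(3, 5000)

        // getData() alone under the 2.x model: (15 s + 5 s) * 3
        System.out.println("2.x getData() wait: "
                + perRetryModel(connectionTimeout, retrySleep, 3).getSeconds() + " s");

        // blockUntilConnectedOrTimedOut() + getData() under the 3.x model: 15 s * 2
        System.out.println("3.x total wait: "
                + perCallModel(connectionTimeout, 2).getSeconds() + " s");
    }
}
```

With the reporter's settings this gives 60 seconds for {{getData()}} under the 2.x model and 30 seconds in total under the 3.x model, matching the figures in the comment.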