Mailing-List: contact dev-help@curator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@curator.apache.org
Date: Mon, 10 Oct 2016 18:00:24 +0000 (UTC)
From: "Jordan Zimmerman (JIRA)" <jira@apache.org>
To: dev@curator.apache.org
Message-ID: <JIRA.13011111.1476119370000.784731.1476122424222@Atlassian.JIRA>
In-Reply-To: <JIRA.13011111.1476119370000@Atlassian.JIRA>
References: <JIRA.13011111.1476119370000@Atlassian.JIRA> <JIRA.13011111.1476119370369@arcas>
Subject: [jira] [Comment Edited] (CURATOR-355) Curator client fails when
 connecting to read-only ensemble
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 10 Oct 2016 18:00:27 -0000


    [ https://issues.apache.org/jira/browse/CURATOR-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562989#comment-15562989 ] 

Jordan Zimmerman edited comment on CURATOR-355 at 10/10/16 6:00 PM:
--------------------------------------------------------------------

Firstly, the call to {{client.getZookeeperClient().blockUntilConnectedOrTimedOut();}} is unnecessary as Curator does this internally. 

Curator 3.0 has better connection timeout behavior than Curator 2.0. In 2.0, the connection timeout is applied for each iteration of the Retry Policy. So, in this case, you'd expect {{getData()}} to wait (15 seconds * 3) + (5 seconds * 3), for a total of one minute. In my recreation of your test that's exactly what I see:

{code}
        System.setProperty("readonlymode.enabled", "true");
        TestingCluster cluster = new TestingCluster(3);
        cluster.getServers().get(0).stop();
        cluster.getServers().get(1).stop();

        CuratorFrameworkFactory.Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
            .connectString(cluster.getConnectString())
            .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
            .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);

        CuratorFramework client = curatorClientBuilder.build();
        client.start();
        client.getZookeeperClient().blockUntilConnectedOrTimedOut();
        System.out.println("Successfully established the connection with ZooKeeper");

        client.getData().forPath("/");
        System.out.println("Done.");
{code}

With Curator 3.0, the time improves to just 15 seconds * 2, i.e. the connection timeout applied twice: once for {{blockUntilConnectedOrTimedOut()}} and once for {{getData()}}. Note: in all cases {{blockUntilConnectedOrTimedOut()}} would have returned {{false}}, implying that you should not continue.
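
To make the arithmetic above concrete, here is a minimal sketch that only reproduces the two estimates from the numbers in the test (15000 ms connection timeout, {{RetryNTimes(3, 5000)}}). It is plain arithmetic, not Curator code: the class {{TimeoutEstimate}} and its method names are hypothetical, and the formulas are a simplified model of the behavior described in this comment rather than a statement of how Curator computes anything internally.

```java
// Rough model of the worst-case wait described in this comment.
// Assumption: names and formulas are illustrative only; nothing here calls Curator.
final class TimeoutEstimate {
    private TimeoutEstimate() {}

    // Curator 2.x as described: every retry iteration waits out the full
    // connection timeout, and the retry policy sleeps between iterations.
    static long curator2WorstCaseMs(long connectionTimeoutMs, int retries, long sleepBetweenRetriesMs) {
        return retries * connectionTimeoutMs + retries * sleepBetweenRetriesMs;
    }

    // Curator 3.x as described: the connection timeout is paid once per
    // blocking call (here: blockUntilConnectedOrTimedOut() and getData()).
    static long curator3WorstCaseMs(long connectionTimeoutMs, int blockingCalls) {
        return blockingCalls * connectionTimeoutMs;
    }

    public static void main(String[] args) {
        long v2 = curator2WorstCaseMs(15000, 3, 5000);
        long v3 = curator3WorstCaseMs(15000, 2);
        System.out.println("Curator 2.x estimate: " + (v2 / 1000) + " s"); // 60 s
        System.out.println("Curator 3.x estimate: " + (v3 / 1000) + " s"); // 30 s
    }
}
```

Lowering {{connectionTimeoutMs}} or the retry count shrinks both estimates proportionally, which may be the simplest knob to turn if a long stall against a down host is the main concern.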


was (Author: randgalt):
Firstly, the call to {code}client.getZookeeperClient().blockUntilConnectedOrTimedOut();{code} is unnecessary as Curator does this internally. 

Curator 3.0 has better connection timeout behavior than Curator 2.0. In 2.0, the connection timeout is applied for each iteration of the Retry Policy. So, in this case, you'd expect {code}getData(){code} to wait 15 seconds * 3, plus 5 seconds * 3 for a total of one minute. In my recreation of your test that's exactly what I see:

{code}
        System.setProperty("readonlymode.enabled", "true");
        TestingCluster cluster = new TestingCluster(3);
        cluster.getServers().get(0).stop();
        cluster.getServers().get(1).stop();

        CuratorFrameworkFactory.Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
            .connectString(cluster.getConnectString())
            .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
            .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);

        CuratorFramework client = curatorClientBuilder.build();
        client.start();
        client.getZookeeperClient().blockUntilConnectedOrTimedOut();
        System.out.println("Successfully established the connection with ZooKeeper");

        client.getData().forPath("/");
        System.out.println("Done.");
{code}

With Curator 3.0, the time improves to just 15 seconds * 2 - the connection timeout number twice. Once for the {code}blockUntilConnectedOrTimedOut(){code} and once for the {code}getData(){code}. Note: {code}blockUntilConnectedOrTimedOut(){code} in all cases would've returned {code}false{code} implying you should not continue.

> Curator client fails when connecting to read-only ensemble
> ----------------------------------------------------------
>
>                 Key: CURATOR-355
>                 URL: https://issues.apache.org/jira/browse/CURATOR-355
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.0
>            Reporter: Benjamin Jaton
>            Priority: Critical
>         Attachments: test2.log
>
>
> ZK is 3.5.1-alpha
> I have a 3-node ZK cluster; read-only mode is enabled.
> 2 nodes are down, so the remaining one (QA-E8WIN11) is in read-only mode (verified by using the ZK API manually). All the machines of the ensemble can be pinged from the client.
> I'm using this piece of code:
> {code}
> 		Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
> 				.connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
> 				.sessionTimeoutMs(45000).connectionTimeoutMs(15000)
> 				.retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
> 		CuratorFramework client = curatorClientBuilder.build();
> 		client.start();
> 		client.getZookeeperClient().blockUntilConnectedOrTimedOut();
> 		System.out.println("Successfully established the connection with ZooKeeper");
> 		
> 		client.getData().forPath("/");
> 		System.out.println("Done.");{code}
> When Curator picks the host that is up first, it goes through very quickly. When it picks the host that is down first (QA-E8WIN12), it seems to be stuck at the getData() call for a very long time and then eventually fails with a ConnectionLossException (see attached log).


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)