curator-dev mailing list archives

From "Jordan Zimmerman (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CURATOR-355) Curator client fails when connecting to read-only ensemble
Date Mon, 10 Oct 2016 18:00:24 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562989#comment-15562989 ]

Jordan Zimmerman edited comment on CURATOR-355 at 10/10/16 6:00 PM:
--------------------------------------------------------------------

Firstly, the call to {{client.getZookeeperClient().blockUntilConnectedOrTimedOut();}} is unnecessary,
as Curator already performs this wait internally.

Curator 3.0 handles the connection timeout better than Curator 2.0. In 2.0, the connection
timeout is applied on each iteration of the retry policy. So in this case you'd expect
{{getData()}} to wait 15 seconds * 3 plus 5 seconds * 3, for a total of one minute. In my
recreation of your test, that's exactly what I see:

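{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;
import org.apache.curator.test.TestingCluster;

public class ReadOnlyEnsembleTest {
    public static void main(String[] args) throws Exception {
        // Server-side flag: lets the in-process servers enter read-only mode without quorum
        System.setProperty("readonlymode.enabled", "true");
        TestingCluster cluster = new TestingCluster(3);
        cluster.start();
        // Stop two of the three servers; the remaining one loses quorum
        cluster.getServers().get(0).stop();
        cluster.getServers().get(1).stop();

        CuratorFrameworkFactory.Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
            .connectString(cluster.getConnectString())
            .sessionTimeoutMs(45000).connectionTimeoutMs(15000)
            .retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);

        CuratorFramework client = curatorClientBuilder.build();
        client.start();
        // Redundant (Curator waits internally); its boolean result is ignored, as in the report
        client.getZookeeperClient().blockUntilConnectedOrTimedOut();
        System.out.println("Successfully established the connection with ZooKeeper");

        client.getData().forPath("/");
        System.out.println("Done.");
    }
}
{code}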
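As a rough check of the one-minute figure for 2.0, the arithmetic above can be modeled in plain Java. This is a hypothetical sketch: {{Curator2TimingModel}} and {{worstCaseMs}} are made-up names, and the model simply follows the accounting in this comment (one full connection timeout plus one retry sleep per iteration, three iterations) rather than reproducing Curator's own retry loop.

```java
/**
 * Hypothetical model (not Curator's RetryLoop) of the 2.x accounting described above:
 * each retry iteration can block for the full connection timeout and then
 * sleep for the retry policy's interval before trying again.
 */
final class Curator2TimingModel {
    private Curator2TimingModel() {}

    /** Worst-case milliseconds for one operation when no connection is ever made. */
    static long worstCaseMs(long connectionTimeoutMs, int iterations, long retrySleepMs) {
        long elapsedMs = 0;
        for (int i = 0; i < iterations; i++) {
            elapsedMs += connectionTimeoutMs; // wait for a connection that never arrives
            elapsedMs += retrySleepMs;        // RetryNTimes(3, 5000) sleeps 5 s between tries
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Values from the test: connectionTimeoutMs(15000), RetryNTimes(3, 5000)
        long total = worstCaseMs(15_000, 3, 5_000);
        System.out.println(total / 1000 + " s"); // prints "60 s"
    }
}
```

The point of the model is that the connection timeout is multiplied by the retry count, so raising either the timeout or the number of retries stretches a single failing {{getData()}} call accordingly.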

With Curator 3.0, the time improves to just 15 seconds * 2, i.e. the connection timeout
applied twice: once for {{blockUntilConnectedOrTimedOut()}} and once for {{getData()}}. Note that
{{blockUntilConnectedOrTimedOut()}} would have returned {{false}} in all cases, meaning you
should not continue.
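A similar hypothetical model for the 3.0 case, using a simulated clock instead of real waits, shows both the 30-second total and what honoring the {{false}} result would save. {{Curator3TimingModel}}, {{waitForConnection}}, {{tryOperation}} and {{run}} are invented stand-ins, not Curator API, and the model assumes (as the note above says) that no connection is ever established.

```java
/**
 * Hypothetical model of the 3.x timing described above: each blocking call
 * waits at most one connection timeout. A simulated clock replaces real
 * waiting; nothing here is Curator's API.
 */
final class Curator3TimingModel {
    private final long connectionTimeoutMs;
    private long elapsedMs = 0; // simulated time spent blocking

    Curator3TimingModel(long connectionTimeoutMs) {
        this.connectionTimeoutMs = connectionTimeoutMs;
    }

    /** Stand-in for blockUntilConnectedOrTimedOut(): times out and returns false. */
    boolean waitForConnection() {
        elapsedMs += connectionTimeoutMs;
        return false; // no connection is ever established in this scenario
    }

    /** Stand-in for getData(): one more connection-timeout wait, then failure. */
    boolean tryOperation() {
        elapsedMs += connectionTimeoutMs;
        return false; // the reported failure was a ConnectionLossException
    }

    /** Runs the test's sequence; optionally stops when the wait returns false. */
    static long run(long connectionTimeoutMs, boolean honorWaitResult) {
        Curator3TimingModel m = new Curator3TimingModel(connectionTimeoutMs);
        boolean connected = m.waitForConnection();
        if (connected || !honorWaitResult) {
            m.tryOperation();
        }
        return m.elapsedMs;
    }

    public static void main(String[] args) {
        System.out.println(run(15_000, false)); // 30000: result ignored, as in the test
        System.out.println(run(15_000, true));  // 15000: stop when the wait returns false
    }
}
```

In this model, checking the boolean halves the worst case, because the second timeout is only paid when the caller proceeds after a failed wait.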


> Curator client fails when connecting to read-only ensemble
> ----------------------------------------------------------
>
>                 Key: CURATOR-355
>                 URL: https://issues.apache.org/jira/browse/CURATOR-355
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.0
>            Reporter: Benjamin Jaton
>            Priority: Critical
>         Attachments: test2.log
>
>
> ZK is 3.5.1-alpha.
> I have a 3-node ZK cluster with read-only mode enabled.
> 2 nodes are down, so the remaining one (QA-E8WIN11) is in read-only mode (verified by using
> the ZK API manually). All the machines of the ensemble can be pinged from the client.
> I'm using this piece of code:
> {code}
> 		Builder curatorClientBuilder = CuratorFrameworkFactory.builder()
> 				.connectString("QA-E8WIN11:2181,QA-E8WIN12:2181")
> 				.sessionTimeoutMs(45000).connectionTimeoutMs(15000)
> 				.retryPolicy(new RetryNTimes(3, 5000)).canBeReadOnly(true);
> 		CuratorFramework client = curatorClientBuilder.build();
> 		client.start();
> 		client.getZookeeperClient().blockUntilConnectedOrTimedOut();
> 		System.out.println("Successfully established the connection with ZooKeeper");
>
> 		client.getData().forPath("/");
> 		System.out.println("Done.");
> {code}
> When Curator picks the host that is up first, it goes through very quickly. When it picks
> the host that is down first (QA-E8WIN12), it seems to be stuck at the getData() call for a
> very long time, and then eventually fails with a ConnectionLossException (see attached log).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
