Mailing-List: contact dev-help@curator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@curator.apache.org
Date: Tue, 12 Aug 2014 16:43:11 +0000 (UTC)
From: "Benjamin Jaton (JIRA)" <jira@apache.org>
To: dev@curator.apache.org
Message-ID: <JIRA.12731378.1406917643065.64396.1407861791632@arcas>
In-Reply-To: <JIRA.12731378.1406917643065@arcas>
References: <JIRA.12731378.1406917643065@arcas>
Subject: [jira] [Commented] (CURATOR-134) Curator sends a connection LOST
 event before sessionTimeout
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CURATOR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094292#comment-14094292 ] 

Benjamin Jaton commented on CURATOR-134:
----------------------------------------

Regardless of the sessionTimeout, with a RetryNTimes(3,10000) retry policy I don't think we should get a LOST event 13 seconds after the last RECONNECTED event; it should be at least 30 seconds.
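The 30-second figure is plain arithmetic on the retry policy. RetryNTimes(n, sleepMsBetweenRetries) allows up to n retries with a fixed sleep before each, so the sleeps alone add up to n * sleepMsBetweenRetries. The sketch below (class name hypothetical) does that arithmetic only. It assumes every retry sleeps the full interval and ignores the time the connection attempts themselves take, so it is a lower bound, not a model of Curator's internals.

```java
import java.time.Duration;

// Hypothetical helper: lower bound on how long RetryNTimes(n, sleepMs) keeps retrying,
// counting only the fixed sleeps between retries (connection-attempt time is ignored).
final class RetryBudget {
    static Duration totalSleep(int retries, int sleepMsBetweenRetries) {
        return Duration.ofMillis((long) retries * sleepMsBetweenRetries);
    }

    public static void main(String[] args) {
        Duration budget = totalSleep(3, 10_000);       // RetryNTimes(3, 10000)
        Duration observed = Duration.ofSeconds(13);    // RECONNECTED 11:17:16 -> LOST 11:17:29
        System.out.println("retry sleep budget: " + budget.getSeconds() + " s");
        System.out.println("observed RECONNECTED -> LOST: " + observed.getSeconds() + " s");
        System.out.println("LOST earlier than budget: " + (observed.compareTo(budget) < 0));
    }
}
```

Running it prints a 30 s budget against the 13 s actually observed, which is the gap this comment is pointing at.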

> Curator sends a connection LOST event before sessionTimeout
> -----------------------------------------------------------
>
>                 Key: CURATOR-134
>                 URL: https://issues.apache.org/jira/browse/CURATOR-134
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.6.0
>         Environment: Ubuntu 12.04
>            Reporter: Benjamin Jaton
>            Priority: Critical
>         Attachments: Test.java
>
>
> Created a Curator client with:
> - connection timeout: 10 seconds
> - session timeout: 30 seconds
> - retry policy: RetryNTimes(3, 10000)
> A scenario where the ensemble is lost causes the Curator client to send a LOST event in less than the expected 30 seconds:
> Fri Aug 01 11:17:19 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:29 PDT 2014 - CURATOR STATE: LOST
> The client code is attached, this is the complete output:
> Fri Aug 01 11:16:53 PDT 2014 - CURATOR STATE: CONNECTED
> Fri Aug 01 11:16:54 PDT 2014 - Creating ZK client...
> Fri Aug 01 11:16:54 PDT 2014 - ZK client created...
> Fri Aug 01 11:16:54 PDT 2014 - ZOOKEEPER STATE: SyncConnected
> Fri Aug 01 11:16:58 PDT 2014 - ZOOKEEPER STATE: Disconnected
> Fri Aug 01 11:16:58 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:16 PDT 2014 - CURATOR STATE: RECONNECTED
> Fri Aug 01 11:17:17 PDT 2014 - ZOOKEEPER STATE: SyncConnected
> Fri Aug 01 11:17:19 PDT 2014 - ZOOKEEPER STATE: Disconnected
> Fri Aug 01 11:17:19 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:29 PDT 2014 - CURATOR STATE: LOST
> I think that the LOST event is actually 30 seconds away from the very first SUSPENDED event, whereas it should be 30 seconds away from the last one.
> To reproduce it, I started only 2 ZK servers in a 3-node ensemble, then stopped one of them (-> 1st SUSPENDED), waited 10-20 seconds, then started it and stopped it again.
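The reporter's reading of the log can be checked against the timestamps quoted above. The sketch below (class name hypothetical) recomputes the gaps between events, assuming all of them fall on the same day as the log shows. The first SUSPENDED to LOST gap comes out at 31 s, close to the 30 s session timeout, while the last SUSPENDED to LOST gap is only 10 s. If the 30 s clock restarted at the last SUSPENDED event, LOST would be expected no earlier than 11:17:49.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper: gaps between the Curator state events quoted in the issue.
// Times are copied from the log; all events are on Fri Aug 01 2014.
final class LostTiming {
    static long secondsBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    public static void main(String[] args) {
        String firstSuspended = "11:16:58";
        String reconnected = "11:17:16";
        String lastSuspended = "11:17:19";
        String lost = "11:17:29";
        long sessionTimeoutSec = 30;

        System.out.println("first SUSPENDED -> LOST: " + secondsBetween(firstSuspended, lost) + " s"); // 31 s
        System.out.println("RECONNECTED -> LOST: " + secondsBetween(reconnected, lost) + " s");        // 13 s
        System.out.println("last SUSPENDED -> LOST: " + secondsBetween(lastSuspended, lost) + " s");   // 10 s
        // Earliest LOST if the session-timeout clock restarted at the last SUSPENDED event.
        System.out.println("expected LOST no earlier than: "
                + LocalTime.parse(lastSuspended).plusSeconds(sessionTimeoutSec));                     // 11:17:49
    }
}
```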


--
This message was sent by Atlassian JIRA
(v6.2#6252)