curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jordan Zimmerman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-134) Curator sends a connection LOST event before sessionTimeout
Date Thu, 15 Jan 2015 22:27:35 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279400#comment-14279400
] 

Jordan Zimmerman commented on CURATOR-134:
------------------------------------------

LOST can also be generated from a session expiration which occurs on reconnect. It can happen
for a variety of reasons. The important event is SUSPENDED which occurs on the initial connection
loss. All Curator recipes, however, look for RECONNECTED to reset themselves. This is the
safest way to write a recipe. 

> Curator sends a connection LOST event before sessionTimeout
> -----------------------------------------------------------
>
>                 Key: CURATOR-134
>                 URL: https://issues.apache.org/jira/browse/CURATOR-134
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.6.0
>         Environment: Ubuntu 12.04
>            Reporter: Benjamin Jaton
>            Priority: Critical
>         Attachments: Test.java
>
>
> Created a Curator client with:
> - connection timeout: 10 seconds
> - session timeout: 30 seconds
> - retry policy: RetryNTimes(3, 10000)
> A scenario where the ensemble is lost produces the the curator client to send a LOST
event in less than the expected 30 seconds:
> Fri Aug 01 11:17:19 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:29 PDT 2014 - CURATOR STATE: LOST
> The client code is attached, this is the complete output:
> Fri Aug 01 11:16:53 PDT 2014 - CURATOR STATE: CONNECTED
> Fri Aug 01 11:16:54 PDT 2014 - Creating ZK client...
> Fri Aug 01 11:16:54 PDT 2014 - ZK client created...
> Fri Aug 01 11:16:54 PDT 2014 - ZOOKEEPER STATE: SyncConnected
> Fri Aug 01 11:16:58 PDT 2014 - ZOOKEEPER STATE: Disconnected
> Fri Aug 01 11:16:58 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:16 PDT 2014 - CURATOR STATE: RECONNECTED
> Fri Aug 01 11:17:17 PDT 2014 - ZOOKEEPER STATE: SyncConnected
> Fri Aug 01 11:17:19 PDT 2014 - ZOOKEEPER STATE: Disconnected
> Fri Aug 01 11:17:19 PDT 2014 - CURATOR STATE: SUSPENDED
> Fri Aug 01 11:17:29 PDT 2014 - CURATOR STATE: LOST
> I think that the LOST event is actually 30 seconds away from the very first SUSPENDED
event, whereas is should be 30 seconds away from the last one.
> To reproduce it, I started only 2 ZK servers in a 3 nodes ensembles, then I stopped one
of them (-> 1st SUSPENDED), waited for 10-20 seconds, then started it and stopped it again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message