curator-dev mailing list archives

From "Evaristo (JIRA)" <>
Subject [jira] [Commented] (CURATOR-72) Background operations don't wait for connection timeout
Date Fri, 17 Jan 2014 10:54:28 GMT


Evaristo commented on CURATOR-72:

Hi there:

I have been comparing the CURATOR-72 code (current code) with Curator 2.1.0, and these are my findings:

- In both releases there is a race condition between background/foreground errors and
KeeperState events that can break the state seen by the ConnectionState listener (as
demonstrated by the attached tests).
- In the CURATOR-72 code (I think this started with Curator 2.3.0), the number of retries in
some cases is huge and out of control, which makes the race condition above much more evident
(this can also be seen in the attached tests using LeaderElection).

So there are in fact two issues here: the first one (the race condition) I think was already
there before 2.3, and the second one is new.

Separately, I suggest we agree on a clear definition of the different ConnectionState values
and when each one is triggered, because the implementations in Curator 2.1.0 and CURATOR-72
differ and, in my opinion, follow different specifications (e.g. in how ConnectionLoss
exceptions are handled).

In CURATOR-72 this is the logic:
- CONNECTED and RECONNECTED are clear to me (they only depend on KeeperState events)
- SUSPENDED can be triggered by a KeeperState.Disconnected event or by a ConnectionLoss or
OperationTimeout exception (whichever happens first)
- LOST can be triggered directly by a KeeperState.Expired event or a SessionExpired exception.
In addition, when a ConnectionLoss or OperationTimeout is received, a background operation is
started to check whether it is possible to connect to another ZK server, and as a result of
that check LOST can also be triggered

In Curator 2.1.0 the logic is:
- CONNECTED and RECONNECTED are clear to me (they only depend on KeeperState events)
- SUSPENDED can only be triggered by KeeperState.Disconnected
- LOST is triggered by any ConnectionLoss exception
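To make the comparison concrete, the two rule sets above can be written down as a small lookup. This is only a sketch: the Signal and State enums are hypothetical stand-ins (State mirrors the names of Curator's ConnectionState values but is not the real enum), and any signal the comment does not describe for a given version is left unmapped ("-") rather than guessed.

```java
import java.util.Optional;

public class ConnectionStateSpecs {
    /** Hypothetical input signals; the names echo ZooKeeper events and exceptions. */
    enum Signal {
        SYNC_CONNECTED,    // KeeperState.SyncConnected event
        DISCONNECTED,      // KeeperState.Disconnected event
        EXPIRED,           // KeeperState.Expired event
        CONNECTION_LOSS,   // ConnectionLoss exception from an operation
        OPERATION_TIMEOUT, // OperationTimeout exception from an operation
        SESSION_EXPIRED    // SessionExpired exception from an operation
    }

    /** Local stand-in that mirrors the names of Curator's ConnectionState values. */
    enum State { CONNECTED, RECONNECTED, SUSPENDED, LOST }

    /** Both versions: driven only by KeeperState events. */
    static State onSyncConnected(boolean firstConnection) {
        return firstConnection ? State.CONNECTED : State.RECONNECTED;
    }

    /** CURATOR-72 rules; SUSPENDED is reported once, by whichever signal comes first. */
    static Optional<State> curator72(Signal s, State current) {
        switch (s) {
            case DISCONNECTED:
            case CONNECTION_LOSS:
            case OPERATION_TIMEOUT:
                return current == State.SUSPENDED ? Optional.empty() : Optional.of(State.SUSPENDED);
            case EXPIRED:
            case SESSION_EXPIRED:
                return Optional.of(State.LOST);
            default:
                return Optional.empty(); // SYNC_CONNECTED: see onSyncConnected
        }
    }

    /** CURATOR-72: these errors also start the background connectivity check that can lead to LOST. */
    static boolean startsConnectivityCheck(Signal s) {
        return s == Signal.CONNECTION_LOSS || s == Signal.OPERATION_TIMEOUT;
    }

    /** Curator 2.1.0 rules; signals the comment does not describe stay unmapped. */
    static Optional<State> curator210(Signal s) {
        switch (s) {
            case DISCONNECTED:
                return Optional.of(State.SUSPENDED);
            case CONNECTION_LOSS:
                return Optional.of(State.LOST);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (Signal s : Signal.values()) {
            System.out.printf("%-17s CURATOR-72: %-9s check=%-5b 2.1.0: %s%n",
                    s,
                    curator72(s, State.CONNECTED).map(State::name).orElse("-"),
                    startsConnectivityCheck(s),
                    curator210(s).map(State::name).orElse("-"));
        }
    }
}
```

Printed side by side, the main difference shows up in the CONNECTION_LOSS row: 2.1.0 maps it straight to LOST, while the CURATOR-72 code maps it to SUSPENDED and starts the background check.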

I am trying to provide a patch for the code but I am struggling; however, if we agree on the
definitions, I can help with test cases.

> Background operations don't wait for connection timeout
> -------------------------------------------------------
>                 Key: CURATOR-72
>                 URL:
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.3.0
>            Reporter: Evaristo Camarero
>            Assignee: Jordan Zimmerman
>             Fix For: 2.4.0
>         Attachments:,,,,,
> Background operations don't wait for the configured connection timeout before failing.
> Attached test shows the problem.
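The issue title describes the expected behavior: a background operation should keep trying until the configured connection timeout has elapsed, and only then report failure. A minimal sketch of that idea, assuming a simple retry-until-deadline loop; runWithConnectionTimeout, its parameters, and the simulated attempts are hypothetical and not Curator's actual implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForConnectionSketch {
    /**
     * Hypothetical helper: retries 'attempt' until it succeeds or until
     * connectionTimeoutMs has elapsed. Returns true on success, false once the
     * timeout has been used up (only then is it fair to fail the operation).
     */
    static boolean runWithConnectionTimeout(BooleanSupplier attempt,
                                            long connectionTimeoutMs,
                                            long retrySleepMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(connectionTimeoutMs);
        while (true) {
            if (attempt.getAsBoolean()) {
                return true; // the operation went through
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // connection timeout exhausted: give up now
            }
            // Sleep for retrySleepMs, capped at roughly the time left before the deadline.
            long sleepMs = Math.max(1, Math.min(retrySleepMs,
                    TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            Thread.sleep(sleepMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated operation that fails twice (e.g. ConnectionLoss) and then succeeds.
        boolean ok = runWithConnectionTimeout(() -> ++calls[0] >= 3, 1_000, 10);
        System.out.println("succeeded=" + ok + " after " + calls[0] + " attempts");

        long start = System.nanoTime();
        // Simulated operation that never succeeds: it should fail only after ~200 ms.
        boolean failed = !runWithConnectionTimeout(() -> false, 200, 20);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("failed=" + failed + " after ~" + elapsedMs + " ms");
    }
}
```

The deadline is computed once, before the first attempt, so repeated failures cannot stretch the total wait beyond the configured timeout (plus one final attempt).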

This message was sent by Atlassian JIRA
