curator-user mailing list archives

From Benjamin Jaton <bja...@radiantlogic.com>
Subject Re: Curator connection states
Date Thu, 15 Jan 2015 01:09:44 GMT
Some of the comments in https://issues.apache.org/jira/browse/CURATOR-134
are interesting.

Apparently having a LOST event doesn't mean that the session has timed out.

The doc (http://curator.apache.org/errors.html) says:
"The connection is confirmed to be lost. Close any locks, leaders, etc. and
attempt to re-create them. NOTE: it is possible to get a RECONNECTED state
after this but you should still consider any locks, etc. as dirty/unstable."

But in some cases we recover our previous session after receiving the
LOST event.
If that's the case, then the LOST event isn't as useful as I thought it was.

What I would like is an event on session loss. Is there any way to get
one?
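
One client-side way to approximate this is to time how long the connection
has been down and compare that with the session timeout. The sketch below
makes several assumptions: SessionLossEstimator, its State enum and the
injected clock are hypothetical names, the enum is only a local stand-in for
the state names Curator passes to a ConnectionStateListener, and the result
is an estimate, since only the ZooKeeper server decides when a session
expires.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: estimates session loss from connection state changes. */
public final class SessionLossEstimator {
    /** Local stand-in for Curator's connection state names; not the real enum. */
    public enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final long sessionTimeoutMs;
    private final LongSupplier clockMs;   // injected so the logic can be tested
    private long disconnectedAtMs = -1;   // -1 means "currently connected"

    public SessionLossEstimator(long sessionTimeoutMs, LongSupplier clockMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.clockMs = clockMs;
    }

    /**
     * Feed every state change here. Returns true when the connection has been
     * down for at least the session timeout, i.e. the session has probably
     * expired. LOST on its own is not treated as session loss.
     */
    public synchronized boolean onStateChange(State newState) {
        long now = clockMs.getAsLong();
        switch (newState) {
            case SUSPENDED:
                // Start the timer on the first sign of trouble.
                if (disconnectedAtMs < 0) disconnectedAtMs = now;
                return false;
            case LOST:
                // Retries were exhausted; keep the original start time.
                if (disconnectedAtMs < 0) disconnectedAtMs = now;
                return now - disconnectedAtMs >= sessionTimeoutMs;
            case CONNECTED:
            case RECONNECTED:
                // Back online: report whether the outage outlasted the session.
                boolean expired = disconnectedAtMs >= 0
                        && now - disconnectedAtMs >= sessionTimeoutMs;
                disconnectedAtMs = -1;
                return expired;
            default:
                return false;
        }
    }
}
```

With the 72-second session timeout used in my tests, a LOST event 7 seconds
after SUSPENDED would return false here, and only a reconnect (or LOST) 72
seconds or more after the outage started would return true.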

Also, is there a way to be notified when Curator stops retrying for good?

Thanks,
Ben
On Wed, Jan 14, 2015 at 4:28 PM, Benjamin Jaton <bjaton@radiantlogic.com>
wrote:

> Hello,
>
> I am running some simple tests around the connection state listener
> behavior.
> I use a regular 3-node ensemble with 1 node down, and I start/stop a
> second one to trigger an outage of the ensemble.
>
> I use:
> - connection timeout : 18 seconds
> - session timeout : 72 seconds
> - retry interval : 5 seconds
>
> Case 0: there is no retry:
> - the switch SUSPENDED -> LOST takes less than a second
> - the background retry goes on for 18 seconds
>
> Case 1: there is 1 retry:
> - the switch SUSPENDED -> LOST takes 7 seconds
> - the background retry goes on for 41 seconds
>
> Case 2: there are 2 retries:
> - the switch SUSPENDED -> LOST takes 12 seconds
> - the background retry goes on for 64 seconds
>
> I expected the two durations to match in each case, i.e. I thought that we
> received a LOST event when Curator gave up retrying.
>
> But apparently the duration of the background retries is this:
> *connectionTimeout * nbRetries + retryInterval * max(0, nbRetries-1)*
>
> Why is it linked to the connectionTimeout, since the connection fails
> before that (cases 0, 1 and 2 all go into the LOST state in less than 18
> seconds)?
>
> According to http://curator.apache.org/errors.html , LOST means that "the
> connection is confirmed to be lost."
> So a LOST state is when I lose my ephemeral nodes (for example).
> Is that correct?
>
> Then I am wondering why it would differ depending on whether we have 0, 1
> or 2 retries.
>
> Thanks for your insights,
> Benjamin
>
>
>
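
A quick check of the formula in the quoted message against the measured
background retry durations. The formula reproduces the observed 18, 41 and 64
seconds only if nbRetries is read as the total number of attempts (the
initial attempt plus the configured retries). That reading is an assumption
made for this sketch, as are the class and method names.

```java
/** Hypothetical helper: evaluates the background retry duration formula. */
public final class RetryDuration {
    private RetryDuration() {}

    /**
     * Seconds spent retrying in the background, assuming one initial attempt
     * plus `retries` retries, each waiting a full connection timeout, with
     * one retry interval between consecutive attempts.
     */
    public static int backgroundRetrySeconds(int connectionTimeout,
                                             int retryInterval,
                                             int retries) {
        int attempts = retries + 1; // assumption: the initial attempt counts
        return connectionTimeout * attempts
                + retryInterval * Math.max(0, attempts - 1);
    }

    public static void main(String[] args) {
        // Settings from the tests: 18 s connection timeout, 5 s retry interval.
        for (int retries = 0; retries <= 2; retries++) {
            System.out.println("retries=" + retries + " -> "
                    + backgroundRetrySeconds(18, 5, retries) + " s");
        }
    }
}
```

For 0, 1 and 2 retries this yields 18, 41 and 64 seconds, matching cases 0, 1
and 2. It does not explain the shorter SUSPENDED -> LOST times.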
