From: Benjamin Jaton
To: user@curator.apache.org
Subject: Re: Curator connection states
Date: Wed, 14 Jan 2015 17:09:44 -0800

Some of the comments in https://issues.apache.org/jira/browse/CURATOR-134 are interesting.

Apparently receiving a LOST event doesn't mean that the session has timed out.

The doc (http://curator.apache.org/errors.html) says:
"The connection is confirmed to be lost. Close any locks, leaders, etc. and attempt to re-create them. NOTE: it is possible to get a RECONNECTED state after this but you should still consider any locks, etc. as dirty/unstable."

But then in some cases we are going to recover our previous session after we have received the LOST event.
If that's the case, then the LOST event isn't as useful as I thought it was.

What I would like is an event on session loss. Is there any way to get one?

Also, is there a way to be notified when Curator stops retrying for good?

Thanks,
Ben

On Wed, Jan 14, 2015 at 4:28 PM, Benjamin Jaton wrote:
> Hello,
>
> I am running some simple tests around the connection state listener
> behavior.
> I use a regular 3-node ensemble with 1 node down, and I start/stop a
> second one to trigger an outage of the ensemble.
>
> I use:
> - connection timeout: 18 seconds
> - session timeout: 72 seconds
> - retry interval: 5 seconds
>
> Case 0: there is no retry:
> - the switch SUSPENDED -> LOST takes less than a second
> - the background retry goes on for 18 seconds
>
> Case 1: there is 1 retry:
> - the switch SUSPENDED -> LOST takes 7 seconds
> - the background retry goes on for 41 seconds
>
> Case 2: there are 2 retries:
> - the switch SUSPENDED -> LOST takes 12 seconds
> - the background retry goes on for 64 seconds
>
> I expected to see the same numbers, i.e. I thought that we received a LOST
> event when Curator gave up trying.
>
> But apparently the duration of the background retries is this:
> connectionTimeout * nbRetries + retryInterval * max(0, nbRetries-1)
>
> Why is it linked to the connectionTimeout, since the connection fails
> before that (cases 0, 1 and 2 all go into the LOST state in less than 18
> seconds)?
>
> According to http://curator.apache.org/errors.html , LOST means that "the
> connection is confirmed to be lost."
> So a LOST state is when I lose my ephemeral nodes (for example).
> Is that correct?
>
> Then I am wondering why it would be different whether we have 0, 1 or 2
> retries?
>
> Thanks for your insights,
> Benjamin
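
As a quick arithmetic check of the formula in the quoted message against the three reported timings, here is a small sketch. One assumption to flag: the formula reproduces the observed 18, 41 and 64 seconds only if nbRetries is read as the number of connection attempts (configured retries + 1), so the code makes that reading explicit. The class and method names are hypothetical and none of this is Curator code.

```java
// Sketch only: names are hypothetical and this is not Curator code.
final class RetryDurationSketch {

    /**
     * Evaluates the formula from the message, with nbRetries read as the
     * number of connection attempts (configured retries + 1). That reading
     * is an assumption; it is the one that matches the reported timings.
     */
    static int backgroundRetrySeconds(int connectionTimeout, int retryInterval, int configuredRetries) {
        int attempts = configuredRetries + 1;
        return connectionTimeout * attempts + retryInterval * Math.max(0, attempts - 1);
    }

    public static void main(String[] args) {
        // Values from the test setup: 18 s connection timeout, 5 s retry interval.
        for (int retries = 0; retries <= 2; retries++) {
            System.out.println("retries=" + retries + " -> "
                    + backgroundRetrySeconds(18, 5, retries) + " s");
        }
    }
}
```

With the values from the test setup this gives 18 s, 41 s and 64 s for 0, 1 and 2 configured retries, which matches cases 0, 1 and 2.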
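
On the question of getting an event for session loss: one possible client-side approximation is to time how long the client has been disconnected and compare that with the session timeout. The sketch below illustrates the idea under stated assumptions. The State enum is a local stand-in that only borrows the state names discussed in this thread, it is not Curator's ConnectionState, and SessionLossEstimator is a hypothetical helper. It is only an estimate: the ZooKeeper server decides when a session expires, and the session timeout negotiated with the server may differ from the one requested.

```java
import java.time.Duration;

// Sketch only: State is a local stand-in enum, not Curator's ConnectionState,
// and SessionLossEstimator is a hypothetical helper.
final class SessionLossEstimator {

    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final long sessionTimeoutMillis;
    private long disconnectedSinceMillis = -1; // -1 means "currently connected"

    SessionLossEstimator(Duration sessionTimeout) {
        this.sessionTimeoutMillis = sessionTimeout.toMillis();
    }

    /** Feed every state change here, with the time at which it was observed. */
    void onStateChanged(State newState, long nowMillis) {
        switch (newState) {
            case SUSPENDED:
            case LOST:
                // Keep the earliest disconnect time; LOST after SUSPENDED must not reset it.
                if (disconnectedSinceMillis < 0) {
                    disconnectedSinceMillis = nowMillis;
                }
                break;
            case CONNECTED:
            case RECONNECTED:
                disconnectedSinceMillis = -1;
                break;
        }
    }

    /** True once the client has been disconnected for at least the session timeout. */
    boolean sessionPresumedExpired(long nowMillis) {
        return disconnectedSinceMillis >= 0
                && nowMillis - disconnectedSinceMillis >= sessionTimeoutMillis;
    }
}
```

With the 72 second session timeout from the test setup, a SUSPENDED at t=0 followed by a LOST at t=7 s would not be treated as a session loss until t=72 s, and a RECONNECTED before that point clears the timer.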