Mailing-List: contact user-help@curator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@curator.apache.org
Received-SPF: softfail (nike.apache.org: transitioning domain of
 bjaton@radiantlogic.com does not designate 209.85.218.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CABzk7oo6u67artf4ff9cs6ecMHrHazg2-7zgXPqPpczUpnja4g@mail.gmail.com>
References: 
 <CABzk7oo6u67artf4ff9cs6ecMHrHazg2-7zgXPqPpczUpnja4g@mail.gmail.com>
Date: Wed, 14 Jan 2015 17:09:44 -0800
Message-ID: 
 <CABzk7opoCqW4w-XAsaNQB9heQm-Ppx-WMYRSqLi2DaoPLy8rRA@mail.gmail.com>
Subject: Re: Curator connection states
From: Benjamin Jaton <bjaton@radiantlogic.com>
To: user@curator.apache.org
Content-Type: multipart/alternative; boundary=e89a8fb1fa7ef9674f050ca682ec

--e89a8fb1fa7ef9674f050ca682ec
Content-Type: text/plain; charset=UTF-8

Some of the comments in https://issues.apache.org/jira/browse/CURATOR-134
are interesting.

Apparently having a LOST event doesn't mean that the session has timed out.

The doc (http://curator.apache.org/errors.html) says:
"The connection is confirmed to be lost. Close any locks, leaders, etc. and
attempt to re-create them. NOTE: it is possible to get a RECONNECTED state
after this but you should still consider any locks, etc. as dirty/unstable."

But then, in some cases, we recover our previous session after having
received the LOST event.
If that's the case, then the LOST event isn't as useful as I thought it was.

What I would like is an event on session loss. Is there any way to get
one?

Also, is there a way to be notified when Curator stops retrying for good?

Thanks,
Ben


On Wed, Jan 14, 2015 at 4:28 PM, Benjamin Jaton <bjaton@radiantlogic.com>
wrote:

> Hello,
>
> I am running some simple tests around the connection state listener
> behavior.
> I use a regular 3-node ensemble with 1 node down, and I start/stop a
> second one to trigger an outage of the ensemble.
>
> I use:
> - connection timeout : 18 seconds
> - session timeout : 72 seconds
> - retry interval : 5 seconds
>
> Case 0: there are no retries:
> - the switch SUSPENDED -> LOST takes less than a second
> - the background retry goes on for 18 seconds
>
> Case 1: there is 1 retry:
> - the switch SUSPENDED -> LOST takes 7 seconds
> - the background retry goes on for 41 seconds
>
> Case 2: there are 2 retries:
> - the switch SUSPENDED -> LOST takes 12 seconds
> - the background retry goes on for 64 seconds
>
> I expected the two numbers to match in each case, i.e. I thought that we
> received a LOST event when Curator gave up retrying.
>
> But apparently the duration of the background retries is this:
> connectionTimeout * (nbRetries + 1) + retryInterval * nbRetries
>
> Why is it linked to the connectionTimeout, since the connection fails
> before that (cases 0, 1 and 2 all go into the LOST state in less than 18
> seconds)?
>
> According to http://curator.apache.org/errors.html , LOST means that "the
> connection is confirmed to be lost."
> So a LOST state is when I lose my ephemeral nodes (for example).
> Is that correct?
>
> Then I am wondering why the timing would differ depending on whether we
> have 0, 1 or 2 retries.
>
> Thanks for your insights,
> Benjamin
>
>
>
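
As a quick sanity check of the background-retry durations quoted above, here
is a minimal sketch of the arithmetic. It assumes one initial attempt plus N
retries, each attempt blocking for the full connection timeout, with one retry
interval between consecutive attempts. That is only a guess at what happens
internally, chosen because it reproduces the measured 18, 41 and 64 seconds;
the function and parameter names are made up for illustration and are not
Curator API.

```python
def background_retry_seconds(connection_timeout, retry_interval, retries):
    """Estimated time the background retry keeps going, in seconds.

    Assumption (not taken from Curator's source): one initial attempt plus
    `retries` retries, each blocking for the full connection timeout, with
    one retry interval between consecutive attempts.
    """
    attempts = retries + 1
    return connection_timeout * attempts + retry_interval * (attempts - 1)


# Settings from the tests: 18 s connection timeout, 5 s retry interval.
for retries in (0, 1, 2):
    print(retries, background_retry_seconds(18, 5, retries))
```

The session timeout (72 seconds) does not appear anywhere in this estimate,
which fits the observation that LOST is reported well before the session
could have expired.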

--e89a8fb1fa7ef9674f050ca682ec--