Mailing-List: contact user-help@curator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@curator.apache.org
Delivered-To: mailing list user@curator.apache.org
MIME-Version: 1.0
Reply-To: cammckenzie@apache.org
Date: Thu, 19 Nov 2015 10:23:35 +1100
Subject: Re: Query on SESSION_LOST (3.0.0)
From: Cameron McKenzie
To: Vikrant Singh
Cc: user@curator.apache.org
Content-Type: multipart/alternative; boundary=94eb2c0b73166b480f0524d8eeb6

--94eb2c0b73166b480f0524d8eeb6
Content-Type: text/plain; charset=UTF-8

Not necessarily false alarms. It's just that the LOST event didn't necessarily mean session loss, only that Curator was giving up.

With 3.0.0, the LOST event will occur when Curator is explicitly told by ZooKeeper that a session has expired. If no connection to ZooKeeper is available, Curator will publish a LOST event when it thinks that the session has been lost. This is based on a timer and the session timeout negotiated with ZooKeeper.

On Thu, Nov 19, 2015 at 10:13 AM, Vikrant Singh wrote:

> Thanks a lot for the reply. So if I am understanding it correctly, there were
> false alarms (or mistaken connection lost). With 3.0.0, connection_lost
> events will happen only when there is a true session loss.
>
> On Wed, Nov 18, 2015 at 1:16 PM, Cameron McKenzie wrote:
>
>> Hey Vikrant,
>> The issue was that the LOST event was being published by Curator when it
>> gave up trying to reconnect to ZooKeeper after connection loss, whereas
>> most people were interpreting it to mean that the session was lost.
>>
>> So, the change in Curator 3.0.0 is that the LOST event will be published
>> either when the session has expired and Curator is explicitly told this by
>> ZooKeeper (implying that a connection is present), or when Curator has been
>> disconnected from ZooKeeper for long enough for the session to have expired
>> on the server (this will occur when no connection to ZooKeeper is present).
>>
>> So, I'm not sure how it will help your case. It is just a more reliable
>> way of knowing that the session is gone and that all related ephemeral
>> state on the ZooKeeper server will also be gone.
>>
>> Note that it's also possible to tell Curator to use the legacy way of
>> interpreting the LOST event.
>> cheers
>>
>> On Thu, Nov 19, 2015 at 8:09 AM, Vikrant Singh <
>> vikrant.subscribe@gmail.com> wrote:
>>
>>> Hello All,
>>> I need some guidance on understanding a fix made in the latest
>>> release, 3.0.0. I am talking about the following fix:
>>> https://issues.apache.org/jira/browse/CURATOR-247
>>>
>>> In my project we create some ephemeral nodes and monitor a cluster
>>> through a tree cache. The framework for the tree cache and ephemeral
>>> nodes is created using ExponentialBackoffRetry with a retry interval of
>>> 1 sec and a retry count of 29 (which is MAX_RETRIES_LIMIT). We kill the
>>> process the moment we get a TreeCacheEvent.Type.CONNECTION_LOST event.
>>>
>>> As a process restart is really expensive, I want to understand how I can
>>> benefit from this fix.
>>>
>>> Please help me understand what the issue is and how it may affect
>>> a setup like ours. We are still not on 3.0.0.
>>>
>>> Thanks,
>>> Vikrant

--94eb2c0b73166b480f0524d8eeb6
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--94eb2c0b73166b480f0524d8eeb6--
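
The 3.0.0 rule described in the reply can be modelled in a few lines. This is only a sketch of the decision as stated in the thread, not Curator's actual code: the class `SessionLossModel`, the method `classify`, the `State` enum, and the 30-second timeout are all hypothetical names and values chosen for illustration.

```java
import java.time.Duration;

// Hypothetical model of the rule described in this thread.
// It is NOT Curator's implementation; all names here are invented.
final class SessionLossModel {
    enum State { CONNECTED, SUSPENDED, LOST }

    /**
     * @param connected          whether a connection to ZooKeeper is present
     * @param serverSaysExpired  whether ZooKeeper explicitly reported expiry
     * @param disconnectedFor    time since the connection dropped
     *                           (ignored while connected)
     * @param negotiatedTimeout  session timeout negotiated with the server
     */
    static State classify(boolean connected, boolean serverSaysExpired,
                          Duration disconnectedFor, Duration negotiatedTimeout) {
        if (connected) {
            // With a connection, LOST only follows an explicit expiry notice.
            return serverSaysExpired ? State.LOST : State.CONNECTED;
        }
        // Without a connection, a local timer stands in for the server:
        // once the negotiated timeout has elapsed, assume the session is gone.
        return disconnectedFor.compareTo(negotiatedTimeout) >= 0
                ? State.LOST
                : State.SUSPENDED;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30); // assumed value
        System.out.println(classify(false, false, Duration.ofSeconds(5), timeout));  // SUSPENDED
        System.out.println(classify(false, false, Duration.ofSeconds(30), timeout)); // LOST
        System.out.println(classify(true, true, Duration.ZERO, timeout));            // LOST
    }
}
```

Under this model, a process that restarts only on LOST would ride out a disconnection shorter than the negotiated session timeout, which is the case where the ephemeral nodes should still exist on the server.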