From: Mathias Söderberg
Date: Tue, 27 May 2014 22:11:40 +0200
Subject: Re: LeaderLatch recipe and error handling
To: Jordan Zimmerman
Cc: user
Reply-To: user@curator.apache.org

Right, that’s what I assumed after I actually read the code for the LeaderLatch class.

We’re not using await(), but have a number of LeaderLatches and currently we’re caching the last response of getLeader() (for each LeaderLatch), and we add watches for the election paths and update the cache if we get a NodeChildrenChanged notification.
When we get a LOST event followed by a RECONNECTED event we clear the cache and start over as we have no clue who’s responsible for what. If we get a SUSPENDED event we don’t permit reads from the cache until we get a RECONNECTED event (or rather we return null as we cannot be sure who’s leader).

Perhaps we should clear the cache when we get a SUSPENDED event as well, to be on the safe side.
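
To make the caching scheme described above concrete, here is a minimal sketch. It is not our production code, and it does not use Curator directly: `ConnState` is a hypothetical stand-in for Curator's connection states, and `LeaderCache`, `onStateChanged`, `put`, `get` and the `clearOnSuspended` flag are names made up for illustration. The flag models the open question of whether SUSPENDED should also clear the cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Curator's connection states, so the sketch compiles without Curator.
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Caches the last known leader per election path and hides it while the connection is in doubt.
class LeaderCache {
    private final Map<String, String> leaders = new HashMap<>();
    private final boolean clearOnSuspended;
    private boolean readable = true;

    LeaderCache(boolean clearOnSuspended) {
        this.clearOnSuspended = clearOnSuspended;
    }

    // Intended to be called from a connection state listener.
    synchronized void onStateChanged(ConnState state) {
        switch (state) {
            case SUSPENDED:
                readable = false;
                if (clearOnSuspended) {
                    leaders.clear();
                }
                break;
            case LOST:
                readable = false;
                leaders.clear(); // nothing cached before a LOST can be trusted
                break;
            default: // CONNECTED or RECONNECTED
                readable = true;
                break;
        }
    }

    // Records the result of a fresh leader lookup for one election path.
    synchronized void put(String electionPath, String leaderId) {
        leaders.put(electionPath, leaderId);
    }

    // Returns null when the leader is unknown or the connection is in doubt.
    synchronized String get(String electionPath) {
        return readable ? leaders.get(electionPath) : null;
    }

    public static void main(String[] args) {
        LeaderCache cache = new LeaderCache(false);
        cache.put("/election/a", "node-1");
        cache.onStateChanged(ConnState.SUSPENDED);
        System.out.println(cache.get("/election/a")); // null while suspended
        cache.onStateChanged(ConnState.RECONNECTED);
        System.out.println(cache.get("/election/a")); // node-1 again
    }
}
```

With `clearOnSuspended` set to true, a SUSPENDED followed by RECONNECTED behaves like LOST followed by RECONNECTED: every entry has to be looked up again before it is served.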

But in conclusion there’s no need to actually close and re-create LeaderLatches in case of a connection loss, which is really what I was wondering about.

Best regards,
Mathias


On Tue, May 27, 2014 at 9:41 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
The documentation probably needs updating as this has been refined over time.
  • The LeaderLatch installs its own connection state listener
  • If the connection drops (SUSPENDED or LOST), the LeaderLatch changes its internal state to “leader == false”
  • If the connection goes to RECONNECTED, the LeaderLatch will attempt to regain leadership
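
The three points above can be read as a small state machine. The following is only a toy model of that reading, not the LeaderLatch source: `LatchEvent`, `LatchModel` and the `election` callback are hypothetical names, and the real recipe re-runs its election against ZooKeeper rather than calling a local function.

```java
import java.util.function.BooleanSupplier;

// Hypothetical event names mirroring the three connection states in the list above.
enum LatchEvent { SUSPENDED, LOST, RECONNECTED }

// Toy model of the listed transitions; not Curator's LeaderLatch implementation.
class LatchModel {
    private final BooleanSupplier election; // hypothetical: true if this node wins the re-run election
    private boolean leader;

    LatchModel(boolean initiallyLeader, BooleanSupplier election) {
        this.leader = initiallyLeader;
        this.election = election;
    }

    synchronized void onEvent(LatchEvent event) {
        if (event == LatchEvent.RECONNECTED) {
            leader = election.getAsBoolean(); // attempt to regain leadership
        } else {
            leader = false;                   // SUSPENDED or LOST: "leader == false"
        }
    }

    synchronized boolean hasLeadership() {
        return leader;
    }

    public static void main(String[] args) {
        LatchModel latch = new LatchModel(true, () -> true);
        latch.onEvent(LatchEvent.SUSPENDED);
        System.out.println(latch.hasLeadership()); // false
        latch.onEvent(LatchEvent.RECONNECTED);
        System.out.println(latch.hasLeadership()); // true, the election callback always wins here
    }
}
```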
This has implications for users of LeaderLatch. If, for example, you've called await() on the LeaderLatch your code will assume that it is the leader. However, if the connection drops you may no longer be the leader. So, clients should install their own ConnectionStateListener and notice that the connection has dropped. Also, you can examine LeaderLatch.hasLeadership() before your client code does anything where it assumes it is the leader and then periodically re-check it.
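
One way to apply the "check, then periodically re-check" advice is to split leader-only work into small steps and test leadership before each one. The sketch below assumes that approach; `LeaderGuard` and `runWhileLeader` are hypothetical helper names, and the leadership check is passed in as a plain `BooleanSupplier` so the example runs without Curator on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper: runs leader-only work in steps and re-checks leadership before each step.
class LeaderGuard {
    // Returns how many steps ran before leadership was lost (or all steps finished).
    static int runWhileLeader(BooleanSupplier hasLeadership, List<Runnable> steps) {
        int done = 0;
        for (Runnable step : steps) {
            if (!hasLeadership.getAsBoolean()) {
                break; // no longer the leader: stop before touching anything else
            }
            step.run();
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        List<Runnable> steps = new ArrayList<>();
        steps.add(() -> System.out.println("step 1"));
        steps.add(() -> System.out.println("step 2"));
        // Simulate leadership being lost right after the first check.
        boolean[] leader = { true };
        int done = runWhileLeader(() -> {
            boolean was = leader[0];
            leader[0] = false;
            return was;
        }, steps);
        System.out.println("completed " + done + " of " + steps.size());
    }
}
```

With a real latch the first argument would be `latch::hasLeadership`. The check and the step are not atomic, so leadership can still be lost while a step is running; keeping steps short and idempotent limits the damage.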

I hope this helps.

-JZ


From: Mathias Söderberg <mathias@burtcorp.com>
Reply: user@curator.apache.org <user@curator.apache.org>
Date: May 27, 2014 at 2:33:21 PM
To: user@curator.apache.org <user@curator.apache.org>
Subject: LeaderLatch recipe and error handling

Good evening,

I’m currently working on a project where we’re utilising Curator and more specifically (quite heavily) the LeaderLatch recipe.

The documentation for error handling in “general” states the following for a LOST notification:

The connection is confirmed to be lost. Close any locks, leaders, etc. and attempt to re-create them. NOTE: it is possible to get a RECONNECTED state after this but you should still consider any locks, etc. as dirty/unstable.

And the documentation for the LeaderLatch recipe states the following:

LeaderLatch instances add a ConnectionStateListener to watch for connection problems. If SUSPENDED or LOST is reported, the LeaderLatch that is the leader will report that it is no longer the leader (i.e. there will not be a leader until the connection is re-established). If a LOST connection is RECONNECTED, the LeaderLatch will delete its previous ZNode and create a new one.

Users of LeaderLatch must take account that connection issues can cause leadership to be lost. i.e. hasLeadership() returns true but some time later the connection is SUSPENDED or LOST. At that point hasLeadership() will return false. It is highly recommended that LeaderLatch users register a ConnectionStateListener.

My conclusion from reading these two sections is that we’re supposed to add a ConnectionStateListener and when we’re notified of a LOST event followed by a RECONNECTED event, we’re supposed to close the current LeaderLatches that we’re holding and re-create them?

However, looking through the actual code for the LeaderLatch, it appears that this is actually already handled, i.e. it appears to create a new znode when it encounters a RECONNECTED event, or am I reading this wrong? (The documentation also states this as a fact).

My question is really: do we have to take any particular precaution regarding the LeaderLatch recipe and connection loss scenarios? i.e. do we have to close and re-create the LeaderLatches? Or can we be calm and just carry on with our business as Curator handles this?

If anything is unclear, let me know.

Best regards,

Mathias Söderberg

Software Developer, Burt

www.burtcorp.com
Cell: + 46 762 79 57 55 | Skype: mthssdrbrg
http://twitter.com/mthssdrbrg | http://twitter.com/burtcorp
–––––––––––––––––––––––––––––––––––––––––––

The Analytics Platform for Online Media

