From: Jordan Zimmerman <jordan@jordanzimmerman.com>
Subject: Re: Leader Latch question
Date: Wed, 17 Aug 2016 15:21:55 -0500
To: user@curator.apache.org

No - notLeader() will not get called automatically when there’s a network partition. Please see:

http://curator.apache.org/errors.html
and
http://curator.apache.org/curator-recipes/leader-latch.html - Error Handling
-Jordan
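As a rough sketch of what this means in practice (illustrative only; "client" is assumed to be an already-started CuratorFramework alongside the LeaderLatch, and becomeStandby() is a hypothetical application hook, not Curator API):

// ConnectionStateListener and ConnectionState live in org.apache.curator.framework.state
client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
    @Override
    public void stateChanged(CuratorFramework cf, ConnectionState newState) {
        if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
            // Per the note above, do not wait for notLeader() during a partition;
            // treat SUSPENDED/LOST itself as loss of leadership until RECONNECTED.
            becomeStandby();   // hypothetical application hook
        }
    }
});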

On Aug 17, 2016, at 3:14 PM, Steve Boyle <sboyle@connexity.com> wrote:

I should note that we are using version 2.9.1. I believe we rely on Curator to handle the Lost and Suspended cases; it looks like we’d expect calls to leaderLatchListener.isLeader and leaderLatchListener.notLeader. We’ve never seen long GCs with this app; I’ll start logging that.

Thanks,
Steve

From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
Sent: Wednesday, August 17, 2016 11:23 AM
To: user@curator.apache.org
Subject: Re: Leader Latch question

* How do you handle CONNECTION_SUSPENDED and CONNECTION_LOST?
* Was there possibly a very long gc? See https://cwiki.apache.org/confluence/display/CURATOR/TN10

-Jordan
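A first step toward answering both questions is simply to log every connection state transition; a long silent gap right before a SUSPENDED or LOST entry is a good hint that the JVM was paused (compare against the GC log per TN10). A minimal sketch, assuming "client" is the app's started CuratorFramework and "log" is an SLF4J logger:

client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
    @Override
    public void stateChanged(CuratorFramework cf, ConnectionState newState) {
        // The logging framework timestamps this line; correlate SUSPENDED/LOST
        // entries with pauses in the GC log to confirm or rule out long GCs.
        log.warn("Curator connection state changed to {}", newState);
    }
});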
 
On Aug 17, 2016, at 1:07 PM, Steve Boyle <sboyle@connexity.com> wrote:

I appreciate your response. Any thoughts on how the issue may have occurred in production? Or thoughts on how to reproduce that scenario?

In the production case, there were two instances of the app – both configured for a list of 5 zookeepers.

Thanks,
Steve

From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
Sent: Wednesday, August 17, 2016 11:03 AM
To: user@curator.apache.org
Subject: Re: Leader Latch question

Manual removal of the latch node isn’t supported. It would require the latch to add a watch on its own node and that has performance/runtime overhead. The recommended behavior is to watch for connection loss/suspended events and exit your latch when that happens.

-Jordan
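A sketch of that "exit your latch" approach: relinquish the active role as soon as the connection is suspended or lost, then leave and re-join the election once the connection comes back. Everything here other than the Curator calls (the latch field, relinquishActiveDuties(), appLatchListener, LATCH_PATH, log) is hypothetical application code, and this is only one possible arrangement:

// Assumes a "volatile LeaderLatch latch" field plus a started CuratorFramework "client".
client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
    @Override
    public void stateChanged(CuratorFramework cf, ConnectionState newState) {
        try {
            if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
                relinquishActiveDuties();                 // hypothetical: stop acting as leader immediately
            } else if (newState == ConnectionState.RECONNECTED) {
                latch.close();                            // exit the old latch...
                latch = new LeaderLatch(cf, LATCH_PATH);  // ...and re-enter the election
                latch.addListener(appLatchListener);      // hypothetical LeaderLatchListener instance
                latch.start();
            }
        } catch (Exception e) {
            log.error("failed to restart leader latch", e);  // hypothetical logger
        }
    }
});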
 
On Aug 17, 2016, at 12:43 PM, Steve Boyle <sboyle@connexity.com> wrote:

I’m using the Leader Latch recipe. I can successfully bring up two instances of my app and have one become ‘active’ and one become ‘standby’. Most everything works as expected. We had an issue in production: when adding a zookeeper to our existing quorum, both instances of the app became ‘active’. Unfortunately, the log files rolled over before we could check for exceptions.

I’ve been trying to reproduce this issue in a test environment. In my test environment, I have two instances of my app configured to use a single zookeeper – this zookeeper is part of a 5-node quorum and is not currently the leader. I can trigger both instances of the app to become ‘active’ if I use zkCli and manually delete the latch path from the single zookeeper to which my apps are connected. When I manually delete the latch path, I can see via debug logging that the instance that was previously ‘standby’ gets a notification from zookeeper: “Got WatchedEvent state:SyncConnected type:NodeDeleted”. However, the instance that had already been active gets no notification at all. Is it expected that manually removing the latch path would only generate notifications to some instances of my app?

Thanks,
Steve Boyle
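For reference, a minimal, self-contained sketch of the kind of setup described in this thread (Curator 2.x API; the connect string, latch path, and println placeholders are made up for illustration):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; the thread describes pointing at a single
        // member of a 5-node quorum, but listing all members is the usual setup.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderLatch latch = new LeaderLatch(client, "/myapp/leader-latch");
        latch.addListener(new LeaderLatchListener() {
            @Override
            public void isLeader() {
                System.out.println("became active");   // placeholder for real work
            }

            @Override
            public void notLeader() {
                System.out.println("became standby");  // placeholder for real work
            }
        });
        latch.start();

        Thread.sleep(Long.MAX_VALUE);  // keep the process alive for the demo
    }
}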