From: Zhen Zhang
To: user@helix.apache.org
Subject: RE: Excessive ZooKeeper load
Date: Tue, 3 Feb 2015 03:37:24 +0000

Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback-based, so it reads the entire external view from ZK only when the external view changes. During the rolling upgrade there are lots of live-instance changes, which can lead to many external view changes. One possible way to mitigate the issue is to smooth the traffic by adding some delay between bouncing nodes. We can do a rough estimate of how many external view changes you might see during the upgrade, how many listeners you have, and how large the external views are. Once we have these numbers, we can work out the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.
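For illustration, here is a rough back-of-envelope sketch of that estimate (all of the numbers below are made-up placeholders, not measurements from your cluster):

public class ZkBandwidthEstimate {
  public static void main(String[] args) {
    // Hypothetical inputs -- replace with numbers measured from your cluster.
    long externalViewBytes = 5L * 1024 * 1024;  // bytes read per external view callback
    int listeners = 50;                         // clients running RoutingTableProvider
    int viewChangesDuringUpgrade = 100;         // external view changes over the rolling upgrade
    long upgradeSeconds = 600;                  // length of the upgrade window

    long totalBytes = externalViewBytes * listeners * viewChangesDuringUpgrade;
    double avgMbps = totalBytes * 8.0 / upgradeSeconds / 1_000_000;

    System.out.printf("Total read from ZK: %d MB%n", totalBytes / (1024 * 1024));
    System.out.printf("Average ZK read bandwidth: %.1f Mbps%n", avgMbps);
  }
}

Stretching the upgrade out (longer delays between bounces) keeps the total read volume the same but spreads it over a longer window, which lowers the average bandwidth; adding observers raises the read capacity the ensemble can serve.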
ZK watches are one-shot, so every time a listener receives a callback, it has to re-register its watcher with ZK.
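To make that concrete, here is a minimal sketch of the one-shot watch pattern against the plain ZooKeeper client (the path and error handling are placeholders); the Helix CallbackHandler in the stack trace below is doing the equivalent re-subscription from inside its own callback:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class OneShotWatch {
  private final ZooKeeper zk;
  private final String path;

  public OneShotWatch(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public void watch() throws KeeperException, InterruptedException {
    // exists() registers a watch that fires at most once.
    zk.exists(path, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        // React to the change, then re-register the watch; without this
        // call no further notifications arrive for the path.
        try {
          watch();
        } catch (Exception e) {
          e.printStackTrace();  // placeholder error handling
        }
      }
    });
  }
}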
It's normally unreliable to depend on delta changes instead of reading the entire znode; there are corner cases where you could lose delta changes if you depend on that.

For the ZK connection issue, do you have any logs on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost its connection and is now stampeding ZooKeeper:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <varun@pinterest.com> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is a new watch being established every 3 seconds?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <varun@pinterest.com> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~30K. We just did a rolling restart of the cluster, and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of the cluster. Here is the Helix log on the client after our rolling restart finished - the client is constantly polling ZK. The ZooKeeper node is pushing 300 Mbps right now, and most of the traffic is being pulled by clients. Is this a race condition, and is there an easy way to make the clients not poll so aggressively? We restarted one of the clients and we no longer see these messages. Also, is it possible to propagate just external view diffs instead of the whole big znode?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider