From: Zhen Zhang <zzhang@linkedin.com>
To: user@helix.apache.org
Subject: RE: Excessive ZooKeeper load
Date: Thu, 5 Feb 2015 17:27:51 +0000

Yes. It will get invoked when external views are added or deleted.

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, February 05, 2015 1:27 AM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange call get invoked when a resource gets deleted (and hence its external view znode)?
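For reference, a minimal sketch of the spectator setup under discussion: a client that connects to the cluster and registers a RoutingTableProvider for external view callbacks. The cluster name and ZK host are taken from the logs further down in this thread; the instance and resource names are hypothetical, and this is an illustration rather than anyone's production code.

import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class SpectatorSketch {
  public static void main(String[] args) throws Exception {
    // Connect as a SPECTATOR: it observes cluster state but owns no partitions.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "main_a", "spectator_1", InstanceType.SPECTATOR, "terrapinzk001a:2181");
    manager.connect();

    // RoutingTableProvider implements ExternalViewChangeListener; every external
    // view change (add, update, or delete) triggers a re-read of the views.
    RoutingTableProvider routingTable = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routingTable);

    // Lookups are then served from the provider's in-memory snapshot.
    List<InstanceConfig> masters =
        routingTable.getInstances("someResource", "someResource_0", "MASTER");
  }
}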
On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com]
Sent: Wednesday, February 04, 2015 9:54 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress, and before we added the support to bucketize, compression was used to support a large number of partitions. However, I don't see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Hi Varun, we can certainly add compression and have a config for turning it on/off. We have implemented compression in our own zkclient before. The issues with compression might be:
1) CPU consumption on the controller will increase.
2) It is harder to debug.

Thanks,
Jason

________________________________
From: kishore g [g.kishore@gmail.com]
Sent: Wednesday, February 04, 2015 3:08 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

We do have the ability to compress the data. I am not sure if there is an easy way to turn the compression on/off.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <varun@pinterest.com> wrote:

I am wondering if it's possible to gzip the external view znode - a simple gzip cut the data size by 25x. Is it possible to plug in compression/decompression as zookeeper nodes are read?

Varun
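The pluggable point Varun is asking about would typically be the zkclient serializer. A rough sketch of a gzip-wrapping serializer, assuming the I0Itec ZkSerializer interface that Helix's zkclient is built on; the class name and the delegation to an inner serializer are illustrative, not Helix's actual implementation:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.I0Itec.zkclient.exception.ZkMarshallingError;
import org.I0Itec.zkclient.serialize.ZkSerializer;

// Hypothetical serializer that gzips znode payloads on write and gunzips
// them on read, delegating the actual (de)serialization to an inner
// ZkSerializer (e.g. the one Helix uses for ZNRecords).
public class GzipZkSerializer implements ZkSerializer {
  private final ZkSerializer inner;

  public GzipZkSerializer(ZkSerializer inner) {
    this.inner = inner;
  }

  @Override
  public byte[] serialize(Object data) throws ZkMarshallingError {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
        gzip.write(inner.serialize(data));
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }

  @Override
  public Object deserialize(byte[] bytes) throws ZkMarshallingError {
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = gzip.read(buf)) > 0) {
        out.write(buf, 0, n);
      }
      return inner.deserialize(out.toByteArray());
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }
}

This keeps compression transparent to callers: reads and writes go through the same zkclient, and only the serializer knows the on-disk bytes are gzipped.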
On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g.kishore@gmail.com> wrote:

There are multiple options we can try here.

What if we used a cached data accessor for this use case? Clients would only read if the node has changed. This optimization can benefit all use cases.

What about batching the watch triggers? Not sure which version of Helix has this option.

Another option is to use a poll-based routing table instead of a watch-based one. Coupled with a cached data accessor, this can be very efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <varun@pinterest.com> wrote:

My total external view across all resources is roughly 3 MB in size, and there are 100 clients downloading it twice for every node restart - that's 600 MB of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view will the RoutingTableProvider read the entire external view from ZK. During the rolling upgrade there are lots of live instance changes, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by adding some delay between bouncing nodes. We can do a rough estimation of how many external view changes you might have during the upgrade, how many listeners you have, and how large the external views are. Once we have these numbers, we will know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

A ZK watcher is one-time only, so every time a listener receives a callback, it re-registers its watcher with ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There are corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason
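As background on the one-shot watch semantics Jason mentions, a bare-bones ZooKeeper sketch (the class is illustrative, not Helix code): a watcher must re-arm itself after every event, which is exactly the re-subscription visible in the CallbackHandler stack trace below.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// ZooKeeper watches fire exactly once; a listener that wants continuous
// notifications has to re-register after every event.
public class RewatchSketch implements Watcher {
  private final ZooKeeper zk;
  private final String path;

  public RewatchSketch(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public void watch() throws Exception {
    // Passing 'this' as the watcher arms a one-time watch on the znode,
    // and reads the full payload as a side effect.
    zk.getData(path, this, null);
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      // The watch just fired and is now gone; re-arm it (re-reading the
      // data) or no further changes will be observed.
      watch();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

The cost follows directly: with a large external view, every change means a full re-read per listener just to keep the watch alive.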
________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost its connection and is now stampeding ZK:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <varun@pinterest.com> wrote:

I am wondering what is causing the zk subscription to happen every 2-3 seconds - is a new watch being established every 3 seconds?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <varun@pinterest.com> wrote:

Hi,

We are serving a few different resources whose total # of partitions is ~30K. We just did a rolling restart of the cluster, and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of the cluster. Here is the helix log on the client after our rolling restart finished - the client is constantly polling ZK. The zookeeper node is pushing 300 mbps right now, and most of the traffic is being pulled by clients. Is this a race condition? Also, is there an easy way to make the clients not poll so aggressively? We restarted one of the clients and we don't see these same messages anymore. Also, is it possible to just propagate external view diffs instead of the whole big znode?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
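A sketch of the cached-read idea Kishore raised earlier in the thread: pay only for a cheap exists() Stat check per callback, and re-fetch the multi-megabyte payload only when the znode version has actually advanced. The class is hypothetical and uses the raw ZooKeeper API, not Helix's accessor:

import java.util.HashMap;
import java.util.Map;

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical version-checked cache: a Stat-only exists() call decides
// whether the full payload needs to be re-fetched from ZK.
public class VersionCheckedCache {
  private final ZooKeeper zk;
  private final Map<String, Integer> versions = new HashMap<>();
  private final Map<String, byte[]> payloads = new HashMap<>();

  public VersionCheckedCache(ZooKeeper zk) {
    this.zk = zk;
  }

  public synchronized byte[] read(String path) throws Exception {
    Stat stat = zk.exists(path, false);
    if (stat == null) {
      // Znode was deleted; drop the cached copy.
      versions.remove(path);
      payloads.remove(path);
      return null;
    }
    Integer cachedVersion = versions.get(path);
    if (cachedVersion != null && cachedVersion == stat.getVersion()) {
      return payloads.get(path);  // unchanged: serve from cache, no data read
    }
    byte[] data = zk.getData(path, false, stat);
    versions.put(path, stat.getVersion());
    payloads.put(path, data);
    return data;
  }
}

With 100 clients and an external view that changes far less often than it is read, this turns most of the 3 MB reads into a few bytes of Stat traffic.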