From: Zhen Zhang <zzhang@linkedin.com>
To: user@helix.apache.org
Subject: RE: Excessive ZooKeeper load
Date: Thu, 5 Feb 2015 17:27:51 +0000

Yes. It will get invoked when external views are added or deleted.

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, February 05, 2015 1:27 AM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I had another question - does the RoutingTableProvider onExternalViewChange call get invoked when a resource gets deleted (and hence its external view znode)?
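For reference, a minimal sketch of the spectator setup under discussion: a client that connects to the cluster and registers a RoutingTableProvider for external view callbacks. The cluster name and ZK host are taken from the logs further down in this thread; the instance and resource names are hypothetical, and this is an illustration rather than anyone's production code.

import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class SpectatorSketch {
  public static void main(String[] args) throws Exception {
    // Connect as a SPECTATOR: it observes cluster state but owns no partitions.
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "main_a", "spectator_1", InstanceType.SPECTATOR, "terrapinzk001a:2181");
    manager.connect();

    // RoutingTableProvider implements ExternalViewChangeListener; every external
    // view change (add, update, or delete) triggers a re-read of the views.
    RoutingTableProvider routingTable = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routingTable);

    // Lookups are then served from the provider's in-memory snapshot.
    List<InstanceConfig> masters =
        routingTable.getInstances("someResource", "someResource_0", "MASTER");
  }
}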
On Wed, Feb 4, 2015 at 10:54 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Yes. I think we did this in the incubating stage or even before. It's probably in a separate branch for some performance evaluation.

________________________________
From: kishore g [g.kishore@gmail.com]
Sent: Wednesday, February 04, 2015 9:54 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

Jason, I remember having the ability to compress/decompress, and before we added the support to bucketize, compression was used to support a large number of partitions. However, I don't see the code anywhere. Did we do this on a separate branch?

thanks,
Kishore G

On Wed, Feb 4, 2015 at 3:30 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Hi Varun, we can certainly add compression and have a config for turning it on/off. We have implemented compression in our own zkclient before. The issues with compression might be:
1) CPU consumption on the controller will increase.
2) It is harder to debug.

Thanks,
Jason

________________________________
From: kishore g [g.kishore@gmail.com]
Sent: Wednesday, February 04, 2015 3:08 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

We do have the ability to compress the data. I am not sure if there is an easy way to turn the compression on/off.

On Wed, Feb 4, 2015 at 2:49 PM, Varun Sharma <varun@pinterest.com> wrote:

I am wondering if it's possible to gzip the external view znode - a simple gzip cut the data size by 25x. Is it possible to plug in compression/decompression as zookeeper nodes are read?

Varun
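The pluggable point Varun is asking about would typically be the zkclient serializer. A rough sketch of a gzip-wrapping serializer, assuming the I0Itec ZkSerializer interface that Helix's zkclient is built on; the class name and the delegation to an inner serializer are illustrative, not Helix's actual implementation:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.I0Itec.zkclient.exception.ZkMarshallingError;
import org.I0Itec.zkclient.serialize.ZkSerializer;

// Hypothetical serializer that gzips znode payloads on write and gunzips
// them on read, delegating the actual (de)serialization to an inner
// ZkSerializer (e.g. the one Helix uses for ZNRecords).
public class GzipZkSerializer implements ZkSerializer {
  private final ZkSerializer inner;

  public GzipZkSerializer(ZkSerializer inner) {
    this.inner = inner;
  }

  @Override
  public byte[] serialize(Object data) throws ZkMarshallingError {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
        gzip.write(inner.serialize(data));
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }

  @Override
  public Object deserialize(byte[] bytes) throws ZkMarshallingError {
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = gzip.read(buf)) > 0) {
        out.write(buf, 0, n);
      }
      return inner.deserialize(out.toByteArray());
    } catch (IOException e) {
      throw new ZkMarshallingError(e);
    }
  }
}

This keeps compression transparent to callers: reads and writes go through the same zkclient, and only the serializer knows the on-disk bytes are gzipped.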
On Mon, Feb 2, 2015 at 8:53 PM, kishore g <g.kishore@gmail.com> wrote:

There are multiple options we can try here.

What if we used a cached data accessor for this use case? Clients would only read if the node has changed. This optimization can benefit all use cases.

What about batching the watch triggers? Not sure which version of Helix has this option.

Another option is to use a poll-based routing table instead of a watch-based one. Coupled with a cached data accessor, this can be very efficient.

Thanks,
Kishore G

On Feb 2, 2015 8:17 PM, "Varun Sharma" <varun@pinterest.com> wrote:

My total external view across all resources is roughly 3 MB in size, and there are 100 clients downloading it twice for every node restart - that's 600 MB of data for every restart. So I guess that is causing this issue. We are thinking of doing some tricks to limit the # of clients to 1 from 100. I guess that should help significantly.

Varun

On Mon, Feb 2, 2015 at 7:37 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback based, so only when there is a change in the external view will the RoutingTableProvider read the entire external view from ZK. During the rolling upgrade there are lots of live instance changes, which may lead to a lot of changes in the external view. One possible way to mitigate the issue is to smooth the traffic by adding some delay between bouncing nodes. We can do a rough estimation of how many external view changes you might have during the upgrade, how many listeners you have, and how large the external views are. Once we have these numbers, we will know the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.

A ZK watcher is one-time only, so every time a listener receives a callback, it re-registers its watcher with ZK.

It's normally unreliable to depend on delta changes instead of reading the entire znode. There are corner cases where you would lose delta changes if you depend on that.

For the ZK connection issue, do you have any log on the ZK server side regarding this connection?

Thanks,
Jason
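As background on the one-shot watch semantics Jason mentions, a bare-bones ZooKeeper sketch (the class is illustrative, not Helix code): a watcher must re-arm itself after every event, which is exactly the re-subscription visible in the CallbackHandler stack trace below.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// ZooKeeper watches fire exactly once; a listener that wants continuous
// notifications has to re-register after every event.
public class RewatchSketch implements Watcher {
  private final ZooKeeper zk;
  private final String path;

  public RewatchSketch(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public void watch() throws Exception {
    // Passing 'this' as the watcher arms a one-time watch on the znode,
    // and reads the full payload as a side effect.
    zk.getData(path, this, null);
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      // The watch just fired and is now gone; re-arm it (re-reading the
      // data) or no further changes will be observed.
      watch();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

The cost follows directly: with a large external view, every change means a full re-read per listener just to keep the watch alive.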
________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost its connection and is now stampeding ZK:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <varun@pinterest.com> wrote:

I am wondering what is causing the zk subscription to happen every 2-3 seconds - is a new watch being established every 3 seconds?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <varun@pinterest.com> wrote:

Hi,

We are serving a few different resources whose total # of partitions is ~30K. We just did a rolling restart of the cluster, and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of the cluster. Here is the helix log on the client after our rolling restart finished - the client is constantly polling ZK. The zookeeper node is pushing 300 mbps right now, and most of the traffic is being pulled by clients. Is this a race condition? Also, is there an easy way to make the clients not poll so aggressively? We restarted one of the clients and we don't see these same messages anymore. Also, is it possible to just propagate external view diffs instead of the whole big znode?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
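A sketch of the cached-read idea Kishore raised earlier in the thread: pay only for a cheap exists() Stat check per callback, and re-fetch the multi-megabyte payload only when the znode version has actually advanced. The class is hypothetical and uses the raw ZooKeeper API, not Helix's accessor:

import java.util.HashMap;
import java.util.Map;

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical version-checked cache: a Stat-only exists() call decides
// whether the full payload needs to be re-fetched from ZK.
public class VersionCheckedCache {
  private final ZooKeeper zk;
  private final Map<String, Integer> versions = new HashMap<>();
  private final Map<String, byte[]> payloads = new HashMap<>();

  public VersionCheckedCache(ZooKeeper zk) {
    this.zk = zk;
  }

  public synchronized byte[] read(String path) throws Exception {
    Stat stat = zk.exists(path, false);
    if (stat == null) {
      // Znode was deleted; drop the cached copy.
      versions.remove(path);
      payloads.remove(path);
      return null;
    }
    Integer cachedVersion = versions.get(path);
    if (cachedVersion != null && cachedVersion == stat.getVersion()) {
      return payloads.get(path);  // unchanged: serve from cache, no data read
    }
    byte[] data = zk.getData(path, false, stat);
    versions.put(path, stat.getVersion());
    payloads.put(path, data);
    return data;
  }
}

With 100 clients and an external view that changes far less often than it is read, this turns most of the 3 MB reads into a few bytes of Stat traffic.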