From: Zhen Zhang
To: user@helix.apache.org
Subject: RE: Excessive ZooKeeper load
Date: Tue, 3 Feb 2015 03:37:24 +0000

Hey Varun,

I guess your external view is pretty large, since each external view callback takes ~3s. The RoutingTableProvider is callback-based, so it reads the entire external view from ZK only when the external view changes. During the rolling upgrade there are lots of live-instance changes, which can lead to many external view changes. One possible way to mitigate the issue is to smooth the traffic by adding some delay between bouncing nodes. We can do a rough estimate of how many external view changes you might see during the upgrade, how many listeners you have, and how large the external views are. Once we have these numbers, we can work out the ZK bandwidth requirement. ZK read bandwidth can be scaled by adding ZK observers.
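For illustration, here is a rough back-of-envelope sketch of that estimate (all of the numbers below are made-up placeholders, not measurements from your cluster):

public class ZkBandwidthEstimate {
  public static void main(String[] args) {
    // Hypothetical inputs -- replace with numbers measured from your cluster.
    long externalViewBytes = 5L * 1024 * 1024;  // bytes read per external view callback
    int listeners = 50;                         // clients running RoutingTableProvider
    int viewChangesDuringUpgrade = 100;         // external view changes over the rolling upgrade
    long upgradeSeconds = 600;                  // length of the upgrade window

    long totalBytes = externalViewBytes * listeners * viewChangesDuringUpgrade;
    double avgMbps = totalBytes * 8.0 / upgradeSeconds / 1_000_000;

    System.out.printf("Total read from ZK: %d MB%n", totalBytes / (1024 * 1024));
    System.out.printf("Average ZK read bandwidth: %.1f Mbps%n", avgMbps);
  }
}

Stretching the upgrade out (longer delays between bounces) keeps the total read volume the same but spreads it over a longer window, which lowers the average bandwidth; adding observers raises the read capacity the ensemble can serve.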
ZK watches are one-shot, so every time a listener receives a callback, it has to re-register its watcher with ZK.
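To make that concrete, here is a minimal sketch of the one-shot watch pattern against the plain ZooKeeper client (the path and error handling are placeholders); the Helix CallbackHandler in the stack trace below is doing the equivalent re-subscription from inside its own callback:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class OneShotWatch {
  private final ZooKeeper zk;
  private final String path;

  public OneShotWatch(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  public void watch() throws KeeperException, InterruptedException {
    // exists() registers a watch that fires at most once.
    zk.exists(path, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        // React to the change, then re-register the watch; without this
        // call no further notifications arrive for the path.
        try {
          watch();
        } catch (Exception e) {
          e.printStackTrace();  // placeholder error handling
        }
      }
    });
  }
}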
It's normally unreliable to depend on delta changes instead of reading the entire znode; there are corner cases where you could lose delta changes if you depend on that.

For the ZK connection issue, do you have any logs on the ZK server side regarding this connection?

Thanks,
Jason

________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Monday, February 02, 2015 4:41 PM
To: user@helix.apache.org
Subject: Re: Excessive ZooKeeper load

I believe there is a misbehaving client. Here is a stack trace - it probably lost its connection and is now stampeding ZooKeeper:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)
        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <varun@pinterest.com> wrote:
I am wondering what is causing the zk subscription to happen every 2-3 seconds - is a new watch being established every 3 seconds?

Thanks
Varun

On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <varun@pinterest.com> wrote:
Hi,

We are serving a few different resources whose total # of partitions is ~30K. We just did a rolling restart of the cluster, and the clients which use the RoutingTableProvider are stuck in a bad state where they are constantly subscribing to changes in the external view of the cluster. Here is the Helix log on the client after our rolling restart finished - the client is constantly polling ZK. The ZooKeeper node is pushing 300 Mbps right now, and most of the traffic is being pulled by clients. Is this a race condition, and is there an easy way to make the clients not poll so aggressively? We restarted one of the clients and we no longer see these messages. Also, is it possible to propagate just external view diffs instead of the whole big znode?

15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider