helix-user mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Re: Excessive ZooKeeper load
Date Tue, 03 Feb 2015 00:41:36 GMT
I believe there is a misbehaving client. Here is a stack trace. The client
probably lost its ZooKeeper connection and is now stampeding the server:

"ZkClient-EventThread-104-terrapinzk001a:2181,terrapinzk002b:2181,terrapinzk003e:2181" daemon prio=10 tid=0x00007f534144b800 nid=0x7db5 in Object.wait() [0x00007f52ca9c3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
        - locked <0x00000004fb0d8c38> (a org.apache.zookeeper.ClientCnxn$Packet)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
        at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:823)
*        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)*
*        at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:820)*
*        at org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:136)*
        at org.apache.helix.manager.zk.CallbackHandler.subscribeDataChange(CallbackHandler.java:241)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:287)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        - locked <0x000000056b75a948> (a org.apache.helix.manager.zk.ZKHelixManager)
        at org.apache.helix.manager.zk.CallbackHandler.handleDataChange(CallbackHandler.java:338)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:547)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
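
To put numbers on the loop visible in the quoted log below, here is a minimal
Python sketch. It is not part of Helix; the names LINE_RE, summarize and SAMPLE
are made up for illustration. It assumes the wrapped log lines have been
re-joined onto single lines and that the timestamp format is yy/MM/dd HH:mm:ss.
It extracts the "Took:" durations and the idle gap between each END:INVOKE and
the next START:INVOKE.

```python
import re
from datetime import datetime

# Assumed line shape (one CallbackHandler event per line, after un-wrapping):
# "15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE <path> listener:<class> Took: 3340ms"
LINE_RE = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO zk\.CallbackHandler: "
    r"\d+ (?P<kind>START|END):INVOKE (?P<path>\S+)"
    r"(?:.*?Took: (?P<took>\d+)ms)?"
)

# The six INVOKE lines from the quoted log, re-joined onto single lines.
SAMPLE = """\
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
"""


def summarize(log_text):
    """Return (durations_ms, idle_gaps_s) for the INVOKE events in log_text.

    durations_ms: the "Took:" value of every END:INVOKE line.
    idle_gaps_s:  seconds between an END:INVOKE and the following START:INVOKE.
    """
    durations_ms = []
    idle_gaps_s = []
    last_end = None
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # e.g. the "subscribes child-change" lines
        ts = datetime.strptime(m.group("ts"), "%y/%m/%d %H:%M:%S")
        if m.group("kind") == "END":
            if m.group("took"):
                durations_ms.append(int(m.group("took")))
            last_end = ts
        elif last_end is not None:
            idle_gaps_s.append((ts - last_end).total_seconds())
    return durations_ms, idle_gaps_s


if __name__ == "__main__":
    durations, gaps = summarize(SAMPLE)
    print("callback durations (ms):", durations)
    print("idle gaps END -> START (s):", gaps)
```

On the six INVOKE lines quoted below, each callback takes roughly 3.3 s and the
next one starts within the same second. That suggests the callbacks are running
back to back on the event thread rather than on a timer, though the log only
has one-second resolution.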

On Mon, Feb 2, 2015 at 4:28 PM, Varun Sharma <varun@pinterest.com> wrote:

> I am wondering what is causing the ZK subscription to happen every 2-3
> seconds. Is a new watch being established every 3 seconds?
>
> Thanks
> Varun
>
> On Mon, Feb 2, 2015 at 4:23 PM, Varun Sharma <varun@pinterest.com> wrote:
>
>> Hi,
>>
>> We are serving a few different resources whose total number of partitions
>> is ~30K. We just did a rolling restart of the cluster, and the clients that
>> use the RoutingTableProvider are stuck in a bad state where they are
>> constantly subscribing to changes in the external view of a cluster. Here is
>> the Helix log on the client after our rolling restart finished; the client
>> is constantly polling ZK. The ZooKeeper node is pushing 300 Mbps right now,
>> and most of the traffic is being pulled by clients. Is this a race
>> condition? Also, is there an easy way to make the clients poll less
>> aggressively? We restarted one of the clients and we don't see these same
>> messages anymore. Also, is it possible to propagate just external view
>> diffs instead of the whole big znode?
>>
>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3340ms
>>
>> 15/02/03 00:21:18 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
>>
>> 15/02/03 00:21:18 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
>>
>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3371ms
>>
>> 15/02/03 00:21:22 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
>>
>> 15/02/03 00:21:22 INFO zk.CallbackHandler: pinacle2084 subscribes child-change. path: /main_a/EXTERNALVIEW, listener: org.apache.helix.spectator.RoutingTableProvider@76984879
>>
>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 END:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider Took: 3281ms
>>
>> 15/02/03 00:21:25 INFO zk.CallbackHandler: 104 START:INVOKE /main_a/EXTERNALVIEW listener:org.apache.helix.spectator.RoutingTableProvider
>>
>>
>>
>
