helix-user mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Re: ZooKeeper disconnects on controller
Date Tue, 05 May 2015 05:38:17 GMT
Here is the output:

Latency min/avg/max: 0/0/3721
Received: 62699205
Sent: 64274675
Connections: 114
Outstanding: 0
Zxid: 0x404e59e28
Mode: follower
Node count: 19162

Latency min/avg/max: 0/4/20127
Received: 47836739
Sent: 48119769
Connections: 242
Outstanding: 0
Zxid: 0x404e59e28
Mode: follower
Node count: 19162

Latency min/avg/max: 0/5/7755
Received: 233509265
Sent: 236291553
Connections: 95
Outstanding: 0
Zxid: 0x404e59e28
Mode: leader
Node count: 19162

On Sat, May 2, 2015 at 1:39 PM, Zhen Zhang <nehzgnahz@gmail.com> wrote:

> You may also check the ZooKeeper log to see if there are any error or
> exception messages.
>
> On Sat, May 2, 2015 at 1:08 PM, kishore g <g.kishore@gmail.com> wrote:
>
>> Is the ZooKeeper quorum working fine? Can you run "echo stat | nc zkhost
>> zkPort" against each ZK server and paste the output?
>>  On May 2, 2015 1:02 PM, "Varun Sharma" <varun@pinterest.com> wrote:
>>
>>> We are also seeing that all our machines (participants and controller)
>>> connect to the same ZooKeeper server, which is rather odd; it also makes
>>> it hard to scale up traffic via observers. Is the following the right way
>>> to pass the ZooKeeper connect string (comma-separated):
>>>
>>> zk001:2181, zk002:2181,zk003:2181
>>>
>>> Thanks
>>> Varun
>>>
>>> On Fri, May 1, 2015 at 3:32 PM, Varun Sharma <varun@pinterest.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are seeing ZooKeeper disconnects on the controller, and the
>>>> controller gets into a state from which it cannot reconnect. We see
>>>> messages like the ones below over and over again: it keeps trying to
>>>> re-establish the connection with the same session ID and keeps failing.
>>>> The participants, on the other hand, see a single hiccup in their
>>>> ZooKeeper connection but reconnect gracefully. What would cause the
>>>> controller to keep retrying and failing to connect even after ZooKeeper
>>>> has returned to a healthy state?
>>>>
>>>> 2015-05-01 20:47:02,865 [main-SendThread(terrapinzk001a:2181)]
>>>> (ClientCnxn.java:1061) INFO  Opening socket connection to server
>>>> terrapinzk001a/10.115.59.31:2181
>>>>
>>>> 2015-05-01 20:47:02,866 [main-SendThread(terrapinzk001a:2181)]
>>>> (ClientCnxn.java:950) INFO  Socket connection established to
>>>> terrapinzk001a/10.115.59.31:2181, initiating session
>>>>
>>>> 2015-05-01 20:47:02,880 [main-SendThread(terrapinzk001a:2181)]
>>>> (ClientCnxn.java:739) INFO  Session establishment complete on server
>>>> terrapinzk001a/10.115.59.31:2181, sessionid = 0x14d111892390023,
>>>> negotiated timeout = 30000
>>>>
>>>> 2015-05-01 20:47:02,884 [main-EventThread] (ZkClient.java:449) INFO
>>>> zookeeper state changed (SyncConnected)
>>>>
>>>> 2015-05-01 20:47:02,884 [main-SendThread(terrapinzk001a:2181)]
>>>> (ClientCnxn.java:1186) INFO  Unable to read additional data from server
>>>> sessionid 0x14d111892390023, likely server has closed socket, closing
>>>> socket connection and attempting reconnect
>>>>
>>>> 2015-05-01 20:47:02,988 [main-EventThread] (ZkClient.java:449) INFO
>>>> zookeeper state changed (Disconnected)
>>>>
>>>
>>>
>
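The connect string in the question, zk001:2181, zk002:2181,zk003:2181, has a space after the first comma. The ZooKeeper client Javadoc describes the format as comma-separated host:port pairs, optionally followed by a chroot path, and its examples contain no spaces; whether a particular client version trims the stray space is an assumption worth verifying rather than relying on. A minimal sketch, assuming Python and a hypothetical helper name, that normalizes the string before it is handed to Helix or ZkClient: it strips whitespace around each entry, fills in the conventional port 2181 where an entry omits it, and keeps any chroot suffix. It does not handle IPv6 literals.

```python
DEFAULT_PORT = 2181  # ZooKeeper's conventional client port


def normalize_connect_string(connect_string):
    """Return a canonical 'host:port,host:port[/chroot]' string.

    Strips whitespace around each entry (such as the space after the first
    comma in 'zk001:2181, zk002:2181,zk003:2181') and adds the default port
    to entries that omit one.
    """
    hosts_part, slash, chroot = connect_string.strip().partition("/")
    entries = []
    for raw in hosts_part.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate a trailing or doubled comma
        host, sep, port = entry.rpartition(":")
        if not sep:  # no colon: the entry is a bare host name
            host, port = entry, str(DEFAULT_PORT)
        entries.append(f"{host.strip()}:{int(port.strip())}")
    if not entries:
        raise ValueError("connect string contains no servers")
    # Re-attach the chroot path, if there was one.
    return ",".join(entries) + (slash + chroot if slash else "")
```

Per the same Javadoc, the client picks an arbitrary server from the connect string rather than always the first one, so a clean list would be expected to spread sessions across the ensemble.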
