zookeeper-user mailing list archives

From "Whitney, Adam" <Adam.Whit...@sony.com>
Subject RE: zookeeper client seems to timeout earlier than it should
Date Wed, 02 Nov 2016 22:52:57 GMT
Thanks Ben,

That makes sense.

As for why this timeout keeps happening: I'm wondering if I'm running into a swapping issue,
because ZooKeeper doesn't have a max heap size specified and this host has 10GB of RAM. The
zookeeper process is currently running with 4980MB of RAM, with 104MB resident (according to
top). 4980MB is a bit excessive, as I'm only using zookeeper to support replicated leveldb
in activemq.

How could I tell if swapping is causing my disconnects?
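
One rough way to check on Linux, sketched here under assumptions: sample the kernel's
cumulative swap counters (`pswpin` and `pswpout` in /proc/vmstat) before and after a
disconnect and compare them. The helper names below are hypothetical, and nothing here is
ZooKeeper-specific.

```python
def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout counters from /proc/vmstat text.

    Assumes the Linux "name value" one-pair-per-line format.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Return how many pages were swapped in and out between two samples."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


def read_swap_counters(path="/proc/vmstat"):
    """Read the current counters from the running host (Linux only)."""
    with open(path) as f:
        return parse_swap_counters(f.read())
```

Counters that rise between two samples taken around a disconnect would suggest the host was
swapping at that time. It would not prove that the zookeeper process itself was the one being
swapped out.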

Also, is anyone familiar with using zookeeper to support replicated leveldb in activemq? If
so, is 1GB of heap space enough for zookeeper to support that? That is all we're using this
zookeeper for, so ~5GB of heap seems a bit excessive. For comparison, we ran this setup in
another datacenter where the zookeeper hosts had only 2GB of RAM, and it ran fine there.
Those hosts aren't running anymore, and since we didn't specify the JVM heap size I'm not
sure how much RAM zookeeper was actually using, but I'm guessing it was somewhere near 1GB
(1/2 of RAM)?


-----Original Message-----
From: Benjamin Reed [mailto:breed@apache.org] 
Sent: Wednesday, November 02, 2016 3:02 PM
To: user@zookeeper.apache.org
Subject: Re: zookeeper client seems to timeout earlier than it should

Clients need to make sure they move off of a dead server and on to a new one to keep their
connection alive, so generally if the client hasn't heard from the server in
2/3 * sessionTimeout it will try to connect to another server. If it waited the whole
4 seconds, then by the time it connected to an active server it would be pronounced dead on
arrival.
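
As a rough illustration of the arithmetic above, the sketch below applies the 2/3 rule to the
numbers from this thread. The function names are hypothetical, and the integer rounding is an
assumption, not something confirmed in this message.

```python
def read_timeout_ms(negotiated_timeout_ms):
    """Client-side silence limit: 2/3 of the negotiated session timeout.

    Integer (floor) division is an assumption made for this sketch.
    """
    return negotiated_timeout_ms * 2 // 3


def should_reconnect(idle_ms, negotiated_timeout_ms):
    """True once the client has heard nothing for at least the 2/3 limit."""
    return idle_ms >= read_timeout_ms(negotiated_timeout_ms)


# Numbers from this thread: negotiated timeout 4000 ms, 2953 ms of silence.
limit = read_timeout_ms(4000)
gave_up = should_reconnect(2953, 4000)
```

With a negotiated timeout of 4000 ms the limit works out to about 2.7 seconds, so a client
that has been idle for 2953 ms is already past it, even though the full 4000 ms has not
elapsed.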


On Wed, Nov 2, 2016 at 5:11 PM, Whitney, Adam <Adam.Whitney@sony.com> wrote:
> (Sorry if this is a repost … I got a strange response to my original 
> email so I’m not sure if it went through or not)
> I have a zookeeper cluster with 3 nodes and tick time set to 2s
> When a client connects to the cluster I see a log entry like this:
> INFO  | Session establishment complete on server XXX, sessionid = XXX, 
> negotiated timeout = 4000 | org.apache.zookeeper.ClientCnxn | 
> main-SendThread(XXX:2181)
> Notice the "negotiated timeout = 4000"
> But about once a day I see a log entry like this:
> INFO  | Client session timed out, have not heard from server in 2953ms 
> for sessionid XXX, closing socket connection and attempting reconnect 
> | org.apache.zookeeper.ClientCnxn | main-SendThread(XXX:2181)
> Why would the client (apparently) time out the session after only 2953ms if the negotiated
> timeout was 4000ms?
