zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Timeouts and ping handling
Date Wed, 18 Jan 2012 22:41:12 GMT
Forgot to mention: use "stat" and some of the other four-letter words to
get an idea of what your request latency looks like across servers. In
particular, you can see the "max latency" and correlate that with what
you're seeing on the clients and with GC (etc.) activity.
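
As a rough illustration of the suggestion above, here is a minimal Python sketch that sends "stat" to one server and pulls out the latency line. The function names are made up for this example, and the parser assumes the reply contains a line of the form "Latency min/avg/max: a/b/c", which may differ between ZooKeeper versions.

```python
import re
import socket


def four_letter_word(host, port, word="stat", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_latency(stat_text):
    """Return (min, avg, max) latency as floats, or None if the line is missing.

    Assumes a line like 'Latency min/avg/max: 0/2/1534' in the reply.
    """
    match = re.search(
        r"Latency min/avg/max:\s*(-?[\d.]+)/(-?[\d.]+)/(-?[\d.]+)", stat_text
    )
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())
```

Calling parse_latency(four_letter_word(host, 2181)) against each server in the ensemble (2181 being the usual client port) would give one max-latency figure per server to compare with client-side timeouts and GC logs.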

Patrick

On Wed, Jan 18, 2012 at 2:34 PM, Patrick Hunt <phunt@apache.org> wrote:
> 5 seconds is fairly low. Heartbeats (HBs) are sent by the client every
> 1/3 of the timeout, with the expectation that it will get a response
> within another 1/3 of the timeout. If not, the client session will
> time out.
>
> As a result, any blip of 1.5 sec or more between the client and server
> could cause this to happen: network latency, OS latency, ZK server
> latency, client latency, etc.
>
> I suspect that you are being affected by GC pauses. Have you tuned the
> GC at all, or are you just using the defaults? Monitor the GC in the VM
> during operation and see if this is affecting you. At the very least you
> need to turn on parallel/CMS/incremental GC.
>
> Patrick
>
> On Wed, Jan 18, 2012 at 1:26 PM, Manosiz Bhattacharyya
> <manosizb@gmail.com> wrote:
>> Hello,
>>
>>  We are using Zookeeper-3.3.4 with client session timeouts of 5 seconds,
>> and we see frequent timeouts. We have a cluster of 50 nodes (3 of which are
>> ZK nodes) and each node has 5 client connections (a total of 250 connections
>> to the ensemble). While investigating the zookeeper connections, we found
>> that sometimes pings sent from the zookeeper client do not return from
>> the server within 5 seconds, and the client connection gets disconnected.
>> Digging deeper it seems that pings are enqueued the same way as other
>> requests in the three stage request processing pipeline (prep, sync,
>> finalize) in zkserver. So if there are a lot of write operations from other
>> active sessions in front of a ping from an inactive session in the queues,
>> the inactive session could time out.
>>
>> My question is whether the server can respond to a client's ping request
>> immediately, as the purpose of the ping request seems to be to act as a
>> heartbeat from relatively inactive sessions. If we keep a separate ping
>> queue in the prep phase which forwards pings straight to the finalize
>> phase, requests ahead of the ping that require I/O inside the sync phase
>> would not cause client timeouts. I hope pings do not impose any ordering
>> on the database. I did take a cursory look at the code and thought that
>> could be done. Would really appreciate an opinion regarding this.
>>
>> As an aside, I should mention that increasing the session timeout to 20
>> seconds did reduce the problem significantly. However, as we are using
>> Zookeeper to monitor the health of our components, increasing the timeout
>> means that we only get to know of a component's death 20 seconds later.
>> This is something we would definitely try to avoid, and we would like to
>> go to the 5-second timeout.
>>
>> Regards,
>> Manosiz.
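
The heartbeat arithmetic described in the quoted reply (a ping every 1/3 of the session timeout, with a reply expected within another 1/3) can be sketched in a few lines of Python. This is only a simplified model of that description, not the actual client code, and the function name is made up for the example.

```python
def heartbeat_budget(session_timeout_s):
    """Split a session timeout into the two intervals described in the thread.

    Simplified model: the client pings every 1/3 of the timeout and expects
    the reply within another 1/3 of the timeout.
    """
    ping_interval_s = session_timeout_s / 3.0
    reply_budget_s = session_timeout_s / 3.0
    return ping_interval_s, reply_budget_s


for timeout_s in (5.0, 20.0):
    interval_s, budget_s = heartbeat_budget(timeout_s)
    print(
        f"timeout={timeout_s:.0f}s: ping every {interval_s:.2f}s, "
        f"reply expected within {budget_s:.2f}s"
    )
```

Under this model a 5-second timeout leaves about 1.67 seconds for a ping round trip, which is in line with the roughly 1.5 sec figure mentioned in the reply, while a 20-second timeout leaves about 6.67 seconds.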