zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Timeouts and ping handling
Date Thu, 19 Jan 2012 01:49:25 GMT
On Wed, Jan 18, 2012 at 4:47 PM, Manosiz Bhattacharyya
<manosizb@gmail.com> wrote:
> Thanks Patrick for your answer,

No problem.

> Actually we are in a virtualized environment, and we have a FIO disk for
> transactional logs. It sometimes has some latency during FIO garbage
> collection. We know this could be the potential issue, but we were trying
> to work around that.

Ah, I see. I saw something very similar to this recently with SSDs
used for the datadir. The fdatasync latency was sometimes > 10
seconds. I suspect it happened as a result of disk GC activity.

I was able to identify the problem by running something like this:

sudo strace -r -T -f -p 8066 -e trace=fsync,fdatasync -o trace.txt

and then graphing the results (log scale). You should try running this
against your servers to confirm that it is indeed the problem.
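
If you'd rather summarize the trace than graph it, something along these
lines should work. It is only a rough sketch: it assumes each completed
call in trace.txt ends with the "<seconds>" duration that strace -T
appends, and the 1 second threshold and the function names are just
placeholders I picked for illustration.

```python
import re
import sys

# Matches an fsync/fdatasync line that ends with the "<seconds>" duration
# added by strace -T. Lines marked "<unfinished ...>" carry no duration
# and are skipped; the matching "resumed" line is picked up instead.
SYNC_RE = re.compile(r"\b(fsync|fdatasync)\b.*<(\d+\.\d+)>\s*$")


def sync_durations(lines):
    """Yield (syscall, seconds) for each completed fsync/fdatasync call."""
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            yield match.group(1), float(match.group(2))


def slow_calls(lines, threshold=1.0):
    """Return the calls that took at least `threshold` seconds (assumed cutoff)."""
    return [(name, secs) for name, secs in sync_durations(lines) if secs >= threshold]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sync_report.py trace.txt
    with open(sys.argv[1]) as trace:
        calls = list(sync_durations(trace))
    if calls:
        worst = max(secs for _, secs in calls)
        print(f"{len(calls)} sync calls, slowest {worst:.3f}s")
        for name, secs in slow_calls(f"{n} <{s:.6f}>" for n, s in calls):
            print(f"slow {name}: {secs:.3f}s")
    else:
        print("no completed fsync/fdatasync calls found")
```

Anything that shows up as a multi-second sync here lines up with the
kind of stall described above.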

> We were trying to classify the requests into two types - either HBs or
> normal requests. Isn't it better to reject normal requests if the queue
> fills to, say, a certain threshold, but keep the session alive? That
> way flow control can be achieved with the user's session retrying the
> operation, but the session health would be maintained.

What good is a session (connection) that's not usable? You're better
off disconnecting and re-establishing with a server that can process
your requests in a timely fashion.

ZK looks at availability from a service perspective, not from an
individual session/connection perspective. The whole is more important
than the parts. There already is very sophisticated flow control going
on - e.g. the sessions shut down and stop reading requests when the
number of outstanding requests on a server exceeds some threshold.
Once the server catches up it starts reading again. Again - check out
your "stat" results for insight into this (i.e. "outstanding requests").
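
If you want to poll that number rather than eyeball it, here is a
minimal sketch. It assumes the server answers the "stat" four letter
word on its client port and that the reply contains a line of the form
"Outstanding: N"; the host, port, and function names are placeholders
for illustration.

```python
import socket
import sys


def fetch_stat(host="localhost", port=2181, timeout=5.0):
    """Send the 'stat' four letter word and return the server's reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def outstanding_requests(stat_text):
    """Return the value of the 'Outstanding:' line, or None if it is absent."""
    for line in stat_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Outstanding":
            return int(value.strip())
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python zk_outstanding.py <host> [port]
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 2181
    print(outstanding_requests(fetch_stat(host, port)))
```

A value that stays high while the sync latency is bad would suggest the
server is throttling reads until it catches up.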

