zookeeper-user mailing list archives

From Manosiz Bhattacharyya <manos...@gmail.com>
Subject Re: Timeouts and ping handling
Date Thu, 19 Jan 2012 02:18:17 GMT
I will do as you mention.

We are using the async APIs throughout. Also, we do not write much data
into ZooKeeper; we use it only for leader elections and health
monitoring, which is why we typically see the timeouts on idle ZooKeeper
connections.

We want the sessions to stay alive because of the leader election
algorithm we use from the ZooKeeper recipes. If the leader node's
connection breaks, the ephemeral node that guaranteed its leadership is
lost, and reconnecting creates a new node that does not guarantee
leadership. We then have to elect a new leader, which requires
significant work. The bigger the timeout, the longer the cluster stays
without a master for a particular service, because the old master cannot
keep working once it knows its session is gone, and with it, its
ephemeral node. Since we are trying to build a highly available service
(not internet scale, but at the scale of a storage system, typically
with millisecond latencies), we thought about reducing the timeout while
keeping the session open. Also note that the node that is typically the
master does not write to ZooKeeper very often.
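
To illustrate, the election decision in the recipe boils down to comparing
the sequence suffixes of ephemeral sequential znodes. A minimal sketch of
that logic in Python (illustrative only; the znode names are hypothetical
examples, and real code would go through a ZooKeeper client rather than
plain lists):

```python
# Sketch of the leader-election decision from the ZooKeeper recipe.
# Illustrative only: real code would create and list the znodes through
# a ZooKeeper client; the names here are hypothetical examples of
# ephemeral sequential znodes under an election path.

def sequence(znode):
    """ZooKeeper appends a monotonically increasing numeric suffix to
    sequential znodes; extract it for ordering."""
    return int(znode.rsplit("_", 1)[1])

def elect(children):
    """The candidate whose znode carries the lowest sequence leads."""
    return min(children, key=sequence)

def watch_target(children, me):
    """A non-leader watches only its immediate predecessor, so the
    loss of the leader's ephemeral node does not wake every candidate."""
    ordered = sorted(children, key=sequence)
    i = ordered.index(me)
    return None if i == 0 else ordered[i - 1]

children = ["n_0000000007", "n_0000000003", "n_0000000011"]
assert elect(children) == "n_0000000003"                    # lowest wins
assert watch_target(children, "n_0000000011") == "n_0000000007"
```

This is also why losing the session is so expensive for us: the znode
created after a reconnect gets a higher sequence, so the old leader can
never reclaim its position without a full re-election.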

Thanks,
Manosiz.

On Wed, Jan 18, 2012 at 5:49 PM, Patrick Hunt <phunt@apache.org> wrote:

> On Wed, Jan 18, 2012 at 4:47 PM, Manosiz Bhattacharyya
> <manosizb@gmail.com> wrote:
> > Thanks Patrick for your answer,
>
> No problem.
>
> > Actually we are in a virtualized environment; we have a FIO disk for
> > transactional logs. It sometimes shows latency spikes during FIO
> > garbage collection. We know this could be the underlying issue, but we
> > were trying to work around it.
>
> Ah, I see. I saw something very similar to this recently with SSDs
> used for the datadir. The fdatasync latency was sometimes > 10
> seconds. I suspect it happened as a result of disk GC activity.
>
> I was able to identify the problem by running something like this:
>
> sudo strace -r -T -f -p 8066 -e trace=fsync,fdatasync -o trace.txt
>
> and then graphing the results (log scale). You should try running this
> against your servers to confirm that it is indeed the problem.
>
> > We were trying to classify requests into two types: either heartbeats
> > or normal requests. Isn't it better to reject normal requests once the
> > queue grows past a certain threshold, but keep the session alive? That
> > way flow control is achieved by the user's session retrying the
> > operation, while session health is maintained.
>
> What good is a session (connection) that's not usable? You're better
> off disconnecting and re-establishing with a server that can process
> your requests in a timely fashion.
>
> ZK looks at availability from a service perspective, not from an
> individual session/connection perspective. The whole is more important
> than the parts. There already is very sophisticated flow control going
> on - e.g. a server stops reading requests from sessions when its number
> of outstanding requests exceeds some threshold. Once the server catches
> up, it starts reading again. Again - check your "stat" results for
> insight into this (i.e. "outstanding requests").
>
> Patrick
>
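
As a sketch, the trace.txt produced by the strace command Patrick gives
above can be scanned for slow syncs with something like the following
(assumes the `-r -T` output format, where each call's wall time appears
in angle brackets at the end of the line; the one-second threshold is an
arbitrary choice):

```python
import re

# Sketch: find slow fsync/fdatasync calls in strace `-r -T` output.
# Each line ends with the call's elapsed wall time in angle brackets,
# e.g. "0.000123 fdatasync(8) = 0 <0.002345>".
DURATION = re.compile(r"(fsync|fdatasync)\(\d+\).*<([\d.]+)>\s*$")

def slow_syncs(lines, threshold=1.0):
    """Yield (syscall, seconds) for calls slower than `threshold`."""
    for line in lines:
        m = DURATION.search(line)
        if m and float(m.group(2)) >= threshold:
            yield m.group(1), float(m.group(2))

sample = [
    "     0.000123 fdatasync(8)  = 0 <0.002345>",
    "     0.000456 fdatasync(8)  = 0 <10.481203>",  # a disk-GC stall
]
assert list(slow_syncs(sample)) == [("fdatasync", 10.481203)]
```

Plotting the extracted durations on a log scale, as Patrick describes,
makes the occasional multi-second outliers stand out against the
typically fast syncs.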
