zookeeper-user mailing list archives

From: Manosiz Bhattacharyya <manos...@gmail.com>
Subject: Re: Timeouts and ping handling
Date: Thu, 19 Jan 2012 17:31:56 GMT
I do not think there is a problem with the queue size. I suspect the problem
is more about latency when the Fusion I/O goes into a GC. We are enabling
stats on ZooKeeper and the Fusion I/O to be more precise. Does ZooKeeper
typically do only sequential I/O, or does it do some random I/O too? If it is
sequential, we could move the logs to a dedicated disk (see the sketch below).

Thanks,
Manosiz.
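
For reference, ZooKeeper's write path is almost entirely sequential: the
transaction log is an fsynced, append-only file, and snapshots are written
sequentially as well, so the log benefits from a dedicated device. A minimal
zoo.cfg sketch for splitting the log onto its own disk (the paths here are
placeholders):

    # zoo.cfg
    dataDir=/var/lib/zookeeper/data      # snapshots and myid
    dataLogDir=/disks/txlog/zookeeper    # transaction log on its own spindle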

On Wed, Jan 18, 2012 at 10:18 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> If you aren't pushing much data through ZK, there is almost no way that the
> request queue can fill up without the log or snapshot disks being slow.
>  See what happens if you put the log into a real disk or (heaven help us)
> onto a tmpfs partition.
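
A hedged sketch of that experiment on a Linux host (mount point and size are
arbitrary; tmpfs loses its contents on reboot, so this is strictly a
diagnostic step, not a production configuration):

    sudo mkdir -p /mnt/zk-tmpfs
    sudo mount -t tmpfs -o size=512m tmpfs /mnt/zk-tmpfs
    # point dataLogDir at the tmpfs mount in zoo.cfg and restart the server;
    # if the timeouts disappear, the log device was the bottleneck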
>
> On Thu, Jan 19, 2012 at 2:18 AM, Manosiz Bhattacharyya
> <manosizb@gmail.com> wrote:
>
> > I will do as you mention.
> >
> > We are using the async APIs throughout. Also, we do not write much data
> > into ZooKeeper. We just use it for leadership elections and health
> > monitoring, which is why we typically see the timeouts on idle ZooKeeper
> > connections.
> >
> > The reason we want the sessions to stay alive is the leadership election
> > algorithm that we use from the ZooKeeper recipes. If the leader node's
> > connection is broken, the ephemeral node that guaranteed its leadership
> > is lost, and reconnecting will create a new node which does not
> > guarantee leadership. We then have to re-elect a new leader, which
> > requires significant work. The bigger the timeout, the longer the
> > cluster stays without a master for a particular service, as the old
> > master cannot keep working once it knows its session is gone and, with
> > it, its ephemeral node. As we are trying to run a highly available
> > service (not Internet scale, but at the scale of a storage system with
> > ms latencies typically), we thought about reducing the timeout but
> > keeping the session open. Also note that the node that is typically the
> > master does not write to ZooKeeper very often.
> >
> > Thanks,
> > Manosiz.
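
To make the recipe concrete: in the standard election pattern, each candidate
creates an ephemeral sequential znode under a shared parent, and the lowest
sequence number wins. A rough zkCli.sh sketch (the /election path and data
are illustrative):

    create /election ""
    create -s -e /election/candidate_ "host-a"
    # => Created /election/candidate_0000000007
    ls /election
    # the candidate holding the lowest-numbered znode is the leader; if its
    # session expires, the znode is deleted and the next candidate takes over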
> >
> > On Wed, Jan 18, 2012 at 5:49 PM, Patrick Hunt <phunt@apache.org> wrote:
> >
> > > On Wed, Jan 18, 2012 at 4:47 PM, Manosiz Bhattacharyya
> > > <manosizb@gmail.com> wrote:
> > > > Thanks Patrick for your answer,
> > >
> > > No problem.
> > >
> > > > Actually, we are in a virtualized environment, and we have a FIO disk
> > > > for transaction logs. It does sometimes show latency during FIO
> > > > garbage collection. We know this could be the issue, but we were
> > > > trying to work around it.
> > >
> > > Ah, I see. I saw something very similar to this recently with SSDs
> > > used for the datadir. The fdatasync latency was sometimes > 10
> > > seconds. I suspect it happened as a result of disk GC activity.
> > >
> > > I was able to identify the problem by running something like this:
> > >
> > > sudo strace -r -T -f -p 8066 -e trace=fsync,fdatasync -o trace.txt
> > >
> > > and then graphing the results (log scale). You should try running this
> > > against your servers to confirm that it is indeed the problem.
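
One way to graph those results, assuming the trace.txt produced above (the
-T flag appends each syscall's duration in angle brackets; gnuplot is just
one option):

    # extract fdatasync durations in seconds, then plot them on a log scale
    grep fdatasync trace.txt | sed -n 's/.*<\([0-9.]*\)>$/\1/p' > durations.txt
    gnuplot -p -e "set logscale y; plot 'durations.txt' title 'fdatasync (s)'"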
> > >
> > > > We were trying to classify the requests into two types: heartbeats
> > > > and normal requests. Isn't it better to reject normal requests once
> > > > the queue fills past a certain threshold, but keep the session alive?
> > > > That way flow control can be achieved by the user's session retrying
> > > > the operation, while the session's health is maintained.
> > >
> > > What good is a session (connection) that's not usable? You're better
> > > off disconnecting and re-establishing with a server that can process
> > > your requests in a timely fashion.
> > >
> > > ZK looks at availability from a service perspective, not from an
> > > individual session/connection perspective. The whole is more important
> > > than the parts. There is already very sophisticated flow control going
> > > on - e.g. a server stops reading requests from client connections when
> > > the number of outstanding requests on it exceeds some threshold. Once
> > > the server catches up, it starts reading again. Again, check out your
> > > "stat" results for insight into this (i.e., "outstanding requests").
> > >
> > > Patrick
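
For reference, "stat" here is one of ZooKeeper's four-letter-word admin
commands. A quick way to watch the outstanding-request count on a server
(host and port are placeholders; "mntr" requires 3.4+):

    echo stat | nc localhost 2181 | grep -i outstanding
    echo mntr | nc localhost 2181 | grep zk_outstanding_requests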
> > >
> >
>
