zookeeper-user mailing list archives

From Deepak Jagtap <deepak.jag...@maxta.com>
Subject Re: Issue with monitoring zookeeper server state using four letter word commands
Date Wed, 19 Feb 2014 01:26:26 GMT
Thanks Camille!
Regarding the issue that we hit, the following is one instance where the
quorum falls apart:

It also describes another scenario where leader election completed
successfully but the zookeeper database sync never completed, leading to
continuous leader election in a loop for hours.
Is there any documentation that describes the fields of the 'mntr' 4lw
command and their possible values?
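
For reference, the approach suggested in the quoted message below (run the
check on a regular timer and fail once the server has been out of quorum for
more than N checks) could be sketched roughly as follows. This is only a
minimal sketch, not anything shipped with zookeeper: the helper names
(four_letter_word, parse_mntr, in_quorum, QuorumWatch, monitor), the 10 second
interval, and the default port 2181 are all illustrative assumptions. It also
assumes that 'mntr' prints tab-separated key/value lines including
zk_server_state, and that a server which is not serving requests omits that
field; treating leader, follower and observer as "in quorum" is a judgment
call, and standalone servers are deliberately not handled.

```python
import socket
import time

# Assumption: these zk_server_state values mean the server has joined the quorum.
QUORUM_STATES = {"leader", "follower", "observer"}


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter-word command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse tab-separated 'key<TAB>value' lines into a dict; skip other lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def in_quorum(mntr_text):
    """True if the reply reports a state we treat as 'part of the quorum'."""
    return parse_mntr(mntr_text).get("zk_server_state") in QUORUM_STATES


class QuorumWatch:
    """Counts consecutive out-of-quorum checks (hypothetical helper)."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok):
        """Record one check; return True once more than max_failures failed in a row."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures > self.max_failures


def monitor(host="localhost", port=2181, interval=10.0, max_failures=12):
    """Poll until the server has been out of quorum for too long, then return."""
    watch = QuorumWatch(max_failures)
    while True:
        try:
            ok = in_quorum(four_letter_word(host, port, "mntr"))
        except OSError:
            ok = False  # an unreachable server counts as a failed check
        if watch.record(ok):
            return  # caller decides how to restart the server
        time.sleep(interval)
```

With interval=10 and max_failures=12, monitor() returns after roughly two
minutes of consecutive failed checks, and any single successful check resets
the counter. The restart itself is left to the caller, since how the server is
restarted depends on the environment.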

On Tue, Feb 18, 2014 at 4:02 PM, Camille Fournier <camille@apache.org> wrote:

> I'd certainly like to understand the fundamental problem you're seeing of
> why any server is unable to enter quorum for any period of time without
> being partitioned, etc. Is there a ticket open for this or do you think
> it's just part of your env somehow?
> As for the larger question, why not run the check on a regular timing and
> fail if it isn't in quorum for more than N checks? We could add a 4lw but
> it seems like you should be able to figure this out in other ways.
> C
> On Tue, Feb 18, 2014 at 5:51 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> > Hi All,
> >
> > I came across a couple of instances where one zookeeper server was falling
> > out of the quorum due to some bug/issue with leader election not
> > completing successfully.
> >
> > We are trying to mitigate this problem by monitoring the status of the
> > zookeeper server to check whether it is part of the quorum.
> > If it's not part of the quorum for a very long time, we restart the
> > zookeeper server so that it can join the quorum again.
> >
> > Currently there is no way to check whether a server is part of the quorum:
> > 'ruok' returns 'imok' even if the zookeeper server is running but is not
> > part of the quorum (i.e. it might be continuously running leader election).
> > 'mntr' reports this information, but it doesn't report how long the
> > server has been in that state.
> >
> > I want to restart the zookeeper server only if it has been out of quorum
> > for a certain amount of time (say, 2 minutes).
> > Do I need to add a new four letter word command to report this info or is
> > there any other way I can achieve this?
> >
> > I would be more than happy to add this to zookeeper if it's helpful for
> > other zookeeper users.
> >
> > Thanks & Regards,
> > Deepak
> >
