zookeeper-user mailing list archives

From Alexander Shraer <shra...@yahoo-inc.com>
Subject RE: Possibility / consequences of having multiple elected leaders
Date Thu, 08 Mar 2012 00:07:44 GMT
> Such a commit will be rejected due to an old epoch.

Ted, can you please point me to the place in the code where this check is performed?

Thanks a lot,

> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Wednesday, March 07, 2012 10:59 AM
> To: user@zookeeper.apache.org
> Subject: Re: Possibility / consequences of having multiple elected leaders
> This can be emulated on Linux by simply pausing the process.
> The correct behavior is that the old leader will freeze and if it comes
> back relatively soon, it will still be recognized as leader.
> If the pause is long enough, then the other members of the quorum will
> decide that they have lost contact with the leader and initiate a new
> leader election.  That election will cause the epoch to be incremented.
> When the old leader returns, it may attempt to commit a change.  Such a
> commit will be rejected due to an old epoch.  Alternatively, it will get a
> ping or a commit from the other servers, realize that it is behind, and
> initiate a resynchronization.  Even if the old leader had started a commit
> before being paused, the commit will have either succeeded in becoming
> durable or not.  Neither case will cause any discrepancies, since the leader
> election will cause the remaining quorum to agree on a correct state.
> In any case, the paused server should either survive as leader with the
> assent of a quorum or it should realize it is no longer the leader and
> transparently update itself to the current state of the quorum.
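Ted's point that a brief pause goes unnoticed while a long one triggers a new election comes down to a silence timeout on the followers' side. The sketch below models that with a hypothetical `LeaderSilenceDetector` class and an injected clock, so the behavior is deterministic. It is an illustration under assumptions, not ZooKeeper's implementation: the real server derives its follower timeouts from configuration such as `tickTime` and `syncLimit`, whereas here a single arbitrary `timeoutMillis` stands in for all of that.

```java
/**
 * Hypothetical sketch (not ZooKeeper code): a follower that suspects its
 * leader after a period of silence. Time is passed in explicitly as
 * milliseconds so the example is deterministic.
 */
public class LeaderSilenceDetector {
    private final long timeoutMillis;   // simplified stand-in for real config-derived timeouts
    private long lastHeardMillis;       // last time a ping or proposal arrived from the leader

    public LeaderSilenceDetector(long timeoutMillis, long nowMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = nowMillis;
    }

    /** Record any message (ping, proposal, commit) from the leader. */
    public void onLeaderMessage(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** True once the leader has been silent for longer than the timeout. */
    public boolean shouldStartElection(long nowMillis) {
        return nowMillis - lastHeardMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // An arbitrary 10 s window chosen for illustration.
        LeaderSilenceDetector d = new LeaderSilenceDetector(10_000, 0);
        d.onLeaderMessage(1_000);

        // Short pause: the leader is heard from again at t = 6 s, inside the window.
        System.out.println(d.shouldStartElection(6_000));  // false
        d.onLeaderMessage(6_000);

        // Long pause: nothing from the leader until t = 20 s, so followers give up on it.
        System.out.println(d.shouldStartElection(20_000)); // true
    }
}
```

A paused process cannot send pings, so from the followers' point of view a frozen leader and a crashed one look identical until it resumes; the only difference is whether it resumes before the timeout expires.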
> On Wed, Mar 7, 2012 at 9:48 AM, Scott Lindner <scott.a.lindner@gmail.com> wrote:
> > ...
> > This got us to wondering what would happen if the elected leader were
> > "frozen" in this manner.  There's no guarantee where in the code it would
> > be hung, so we can't know for certain what would happen when it left this
> > state, but could there be any problems where the "frozen" server would
> > come out of this state still thinking it was the leader (since it was
> > stuck) when in fact another server had been elected in the meantime?  I
> > would imagine this should resolve itself fairly quickly, but is there still
> > a possibility that this could lead to bad behavior?  Typically, if a server
> > fails, I would imagine the ZooKeeper instance would die or lose leadership
> > because of an event (failed connection, etc.), but this seems slightly
> > different since the code would be blocked in a random state.
> > ...
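Scott's scenario, a leader that wakes up still believing it leads, is what the epoch check Ted mentions is meant to fence off. ZooKeeper's zxid carries the epoch in its high 32 bits and a per-epoch counter in its low 32 bits; the toy model below uses that layout, but the `EpochFencing`, `Follower`, and `Leader` classes and their methods are hypothetical. It does not show where ZooKeeper actually performs the check (the question Alexander asks above), and it ignores connection handling and the resynchronization itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of epoch fencing; not ZooKeeper's classes or code paths. */
public class EpochFencing {

    // zxid layout assumed to mirror ZooKeeper: high 32 bits = epoch, low 32 bits = counter.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** A follower that remembers the newest epoch it has accepted. */
    static final class Follower {
        private long acceptedEpoch;
        private final List<Long> log = new ArrayList<>();

        Follower(long acceptedEpoch) {
            this.acceptedEpoch = acceptedEpoch;
        }

        /** Called after a new leader is elected; the accepted epoch only moves forward. */
        void acceptNewEpoch(long epoch) {
            if (epoch > acceptedEpoch) {
                acceptedEpoch = epoch;
            }
        }

        /** Returns true if the proposal is logged, false if it is rejected as stale. */
        boolean onProposal(long zxid) {
            if (epochOf(zxid) < acceptedEpoch) {
                return false; // sender is a deposed leader from an older epoch
            }
            log.add(zxid);
            return true;
        }

        List<Long> log() {
            return log;
        }
    }

    /** A leader that stops leading once it sees evidence of a newer epoch. */
    static final class Leader {
        private final long epoch;
        private long counter = 0;
        private boolean leading = true;

        Leader(long epoch) {
            this.epoch = epoch;
        }

        long nextZxid() {
            return makeZxid(epoch, ++counter);
        }

        /** Any peer message carrying a higher epoch means a newer leader exists. */
        void onMessageFromPeer(long peerZxid) {
            if (epochOf(peerZxid) > epoch) {
                leading = false; // a real server would now resync with the new leader
            }
        }

        boolean isLeading() {
            return leading;
        }
    }

    public static void main(String[] args) {
        Leader oldLeader = new Leader(5);
        Follower follower = new Follower(5);

        // Before the pause, the old leader's proposal is accepted.
        System.out.println(follower.onProposal(oldLeader.nextZxid()));

        // While the old leader is frozen, the quorum elects a new leader in epoch 6.
        follower.acceptNewEpoch(6);
        Leader newLeader = new Leader(6);
        System.out.println(follower.onProposal(newLeader.nextZxid()));

        // The old leader wakes up and proposes again: rejected for its stale epoch.
        System.out.println(follower.onProposal(oldLeader.nextZxid()));

        // On hearing from the newer epoch, the old leader gives up leadership.
        oldLeader.onMessageFromPeer(newLeader.nextZxid());
        System.out.println(oldLeader.isLeading());
    }
}
```

Running `main` prints `true`, `true`, `false`, `false`. Because the epoch is compared before anything else, the frozen leader cannot add entries once a quorum has moved on, no matter where in its code it was paused.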