zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Possibility / consequences of having multiple elected leaders
Date Wed, 07 Mar 2012 18:59:28 GMT
This can be emulated on Linux by simply pausing the process.
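
As a rough sketch of what "pausing the process" could look like, the snippet
below sends SIGSTOP to a process, waits, and then sends SIGCONT. The helper
name pause_and_resume and its parameters are made up for illustration, and you
would have to find the pid of the ZooKeeper server yourself. It assumes Linux
(or another POSIX system) and permission to signal that process.

```python
import os
import signal
import time


def pause_and_resume(pid: int, pause_seconds: float) -> None:
    """Freeze a process with SIGSTOP, wait, then let it continue with SIGCONT.

    Hypothetical helper for experiments; `pid` is assumed to be the process
    you want to freeze (for example, the current leader).
    """
    os.kill(pid, signal.SIGSTOP)  # the process stops wherever it happens to be
    try:
        time.sleep(pause_seconds)
    finally:
        # Always resume, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)
```

A short pause should exercise the "comes back relatively soon" case, and a
pause longer than the timeout the other servers use to detect a lost leader
should exercise the re-election case described below.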

The correct behavior is that the old leader will freeze and, if it comes
back relatively soon, it will still be recognized as leader.

If the pause is long enough, then the other members of the quorum will
decide that they have lost contact with the leader and initiate a new
leader election.  That election will cause the epoch to be incremented.
When the old leader returns, it may attempt to commit a change.  Such a
commit will be rejected due to an old epoch.  Alternatively, it will get a
ping or a commit from the other servers and realize that it is behind and
initiate a resynchronization.  Even if the old leader had started a commit
before being paused, the commit will have either succeeded in becoming
durable or not.  Neither case will cause any discrepancies since the leader
election will cause the remaining quorum to agree on a correct state.
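
To make the epoch check concrete, here is a toy model of a follower that
remembers the highest epoch it has seen and refuses proposals stamped with an
older one. This is only an illustration of the idea, not ZooKeeper's actual
code; the class and method names (Follower, on_new_leader, on_proposal) are
invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Toy follower that tracks the highest leader epoch it has seen."""

    current_epoch: int = 0
    log: list = field(default_factory=list)

    def on_new_leader(self, epoch: int) -> None:
        # A completed election hands out a larger epoch than the previous one.
        if epoch > self.current_epoch:
            self.current_epoch = epoch

    def on_proposal(self, epoch: int, change: str) -> bool:
        # A proposal stamped with an older epoch comes from a deposed leader.
        if epoch < self.current_epoch:
            return False
        self.current_epoch = epoch
        self.log.append((epoch, change))
        return True
```

In this model, once the followers have seen on_new_leader(2), a proposal that
the returning old leader sends with epoch 1 gets False back from every
follower and never reaches their logs.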

In any case, the paused server should either survive as leader with the
assent of a quorum or it should realize it is no longer the leader and
transparently update itself to the current state of the quorum.

On Wed, Mar 7, 2012 at 9:48 AM, Scott Lindner <scott.a.lindner@gmail.com> wrote:

> ...
> This got us to wondering what would happen if the elected leader were
> "frozen" in this manner?  There's no guarantees where in the code it would
> be hung to know for certain what would happen when it left this state, but
> could there be any problems where the "frozen" server would come out of
> this state still thinking it was the leader (since it was stuck) when in
> fact another server had been elected in the meantime?  I would imagine this
> should resolve itself fairly quickly but is there still a possibility that
> this could lead to bad behavior?  Typically if a server fails I would
> imagine the zookeeper instance would die or lose leadership because of an
> event (failed connection, etc) but this seems slightly different since the
> code would be blocked in a random state.
> ...
