> Such a commit will be rejected due to an old epoch.

Ted, can you please point me to the place in the code where this check
is performed?

Thanks a lot,
Alex

> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Wednesday, March 07, 2012 10:59 AM
> To: user@zookeeper.apache.org
> Subject: Re: Possibility / consequences of having multiple elected
> leaders
>
> This can be emulated on Linux by simply pausing the process.
>
> The correct behavior is that the old leader will freeze and if it comes
> back relatively soon, it will still be recognized as leader.
>
> If the pause is long enough, then the other members of the quorum will
> decide that they have lost contact with the leader and initiate a new
> leader election. That election will cause the epoch to be incremented.
> When the old leader returns, it may attempt to commit a change. Such a
> commit will be rejected due to an old epoch. Alternately, it will get a
> ping or a commit from the other servers and realize that it is behind
> and initiate a resynchronization. Even if the old leader had started a
> commit before being paused, the commit will have either succeeded in
> becoming durable or not. Neither case will cause any discrepancies
> since the leader election will cause the remaining quorum to agree on a
> correct state.
>
> In any case, the paused server should either survive as leader with the
> assent of a quorum or it should realize it is no longer the leader and
> transparently update itself to the current state of the quorum.
>
> On Wed, Mar 7, 2012 at 9:48 AM, Scott Lindner wrote:
>
> > ...
> > This got us to wondering what would happen if the elected leader were
> > "frozen" in this manner? There's no guarantees where in the code it
> > would be hung to know for certain what would happen when it left this
> > state, but could there be any problems where the "frozen" server
> > would come out of this state still thinking it was the leader (since
> > it was stuck) when in fact another server had been elected in the
> > meantime? I would imagine this should resolve itself fairly quickly
> > but is there still a possibility that this could lead to bad
> > behavior? Typically if a server fails I would imagine the zookeeper
> > instance would die or lose leadership because of an event (failed
> > connection, etc) but this seems slightly different since the code
> > would be blocked in a random state.
> > ...
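
For illustration, the epoch rejection described in the quoted reply can be sketched as a toy model. This is not ZooKeeper's code and it does not show where the real check lives in the source tree. The names `Follower`, `make_zxid`, `epoch_of` and `has_quorum` are hypothetical and exist only for this sketch. The one detail taken from ZooKeeper is the zxid layout: a 64-bit id with the epoch in the high 32 bits and a counter in the low 32 bits.

```python
from dataclasses import dataclass

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into one 64-bit transaction id."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid: int) -> int:
    """Extract the epoch (high 32 bits) from a transaction id."""
    return zxid >> COUNTER_BITS


def has_quorum(acks: int, ensemble_size: int) -> bool:
    """A proposal commits only if a strict majority acknowledged it."""
    return acks > ensemble_size // 2


@dataclass
class Follower:
    """Hypothetical follower that remembers the newest epoch it accepted."""
    accepted_epoch: int = 0
    last_zxid: int = 0

    def on_new_leader(self, epoch: int) -> bool:
        # A leader announcing an epoch older than the accepted one is stale.
        if epoch < self.accepted_epoch:
            return False
        self.accepted_epoch = epoch
        return True

    def on_proposal(self, zxid: int) -> bool:
        # Reject proposals stamped with an epoch older than the accepted one.
        if epoch_of(zxid) < self.accepted_epoch:
            return False
        self.last_zxid = zxid
        return True


def count_acks(followers, zxid: int) -> int:
    """Number of followers that accept the given proposal."""
    return sum(1 for f in followers if f.on_proposal(zxid))


if __name__ == "__main__":
    followers = [Follower(), Follower(), Follower()]

    # The old leader is elected with epoch 1 and commits normally.
    for f in followers:
        f.on_new_leader(1)
    print(has_quorum(count_acks(followers, make_zxid(1, 5)), 3))

    # The old leader is paused; the rest elect a new leader with epoch 2.
    for f in followers:
        f.on_new_leader(2)

    # The old leader wakes up and proposes with its stale epoch.
    print(has_quorum(count_acks(followers, make_zxid(1, 6)), 3))
```

In this model the stale proposal gathers no acknowledgements, so it can never reach a quorum. That mirrors the "rejected due to an old epoch" case, under the assumption that every follower has already moved to the new epoch.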