zookeeper-user mailing list archives

From Jordan Zimmerman <jzimmer...@netflix.com>
Subject RE: Missing session state handling in most Leader Election implementations
Date Fri, 18 Nov 2011 09:07:43 GMT
FYI 

Curator now has a staged connection notification mechanism for dealing with issues like this.
When the Curator managed connection receives a Disconnect, it posts a message to listeners
that the connection is SUSPENDED. If the connection can be re-established (via a background
sync() using the current retry policy), the listeners receive RECONNECTED; otherwise they
receive LOST. Thus, users of the Curator LeaderSelector can tell whether they should pause
their leader activity or stop it altogether.
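
A minimal sketch of how a leader might react to the three staged notifications
described above. It is self-contained on purpose: StagedState and StagedLeader
are hypothetical stand-ins written for illustration and are not Curator's own
classes. The pause/resume/stop policy is one plausible reading of the
description, not necessarily what LeaderSelector does internally.

```java
// Hypothetical stand-in for the three staged notifications; NOT Curator's enum.
enum StagedState { SUSPENDED, RECONNECTED, LOST }

// Hypothetical leader that adjusts its activity as notifications arrive.
class StagedLeader {
    enum Activity { ACTIVE, PAUSED, STOPPED }

    private Activity activity = Activity.ACTIVE;

    // Invoked once per notification posted to the listener.
    void onStateChange(StagedState state) {
        if (activity == Activity.STOPPED) {
            return; // leadership was given up; a new election is needed
        }
        switch (state) {
            case SUSPENDED:
                activity = Activity.PAUSED;   // pause, keep in-memory state
                break;
            case RECONNECTED:
                activity = Activity.ACTIVE;   // connection is back, resume
                break;
            case LOST:
                activity = Activity.STOPPED;  // connection gone, stop leading
                break;
        }
    }

    Activity activity() {
        return activity;
    }
}
```

In this sketch STOPPED is terminal: a late RECONNECTED after LOST does not
resume leadership, on the assumption that the leader must be re-elected first.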

-JZ
________________________________________
From: Ted Dunning [ted.dunning@gmail.com]
Sent: Monday, November 14, 2011 6:24 PM
To: user@zookeeper.apache.org
Subject: Re: Missing session state handling in most Leader Election implementations

On Mon, Nov 14, 2011 at 2:41 PM, Jordan Zimmerman <jzimmerman@netflix.com> wrote:

> It turns out that this is tricky to solve. When the server you're
> connected to goes down, you get a Watcher.Event.KeeperState.Disconnected.
> However, it could be that you are able to reconnect to another server so
> the disconnected event should be ignored.


The event should not be ignored.  The master should pause in being a
master, but not unload any major data structures.  If it reconnects
instantly, then it should continue as if nothing had happened.  You can
also have a time limit for how long you wait before you decide to pause
operation as master.  As you increase that time, you increase the
probability of two masters existing at the same time.  If the reconnect
happens before the timeout, you don't need to bother the master.
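
A rough sketch of the time-limit idea above, assuming the caller supplies the
current time in milliseconds so the logic is easy to test. GracePeriodMaster
and its method names are hypothetical, invented for illustration; nothing here
is ZooKeeper API, and session expiry is deliberately left out.

```java
// Hypothetical helper: decides whether a master may keep acting while disconnected.
class GracePeriodMaster {
    private final long graceMillis;   // how long to wait before pausing
    private long disconnectedAt = -1; // -1 means currently connected

    GracePeriodMaster(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Call when a Disconnected event arrives; repeated calls keep the first timestamp.
    void onDisconnected(long nowMillis) {
        if (disconnectedAt < 0) {
            disconnectedAt = nowMillis;
        }
    }

    // Call when the client reconnects; the master carries on as if nothing happened.
    void onReconnected() {
        disconnectedAt = -1;
    }

    // True while connected, or while still inside the grace period after a disconnect.
    boolean mayActAsMaster(long nowMillis) {
        return disconnectedAt < 0 || nowMillis - disconnectedAt < graceMillis;
    }
}
```

A grace period of 0 pauses the master as soon as the disconnect is seen. Larger
values bother the master less often, at the cost of a higher chance that two
masters are active at once.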
