zookeeper-user mailing list archives

From Jérémie BORDIER <jeremie.bord...@gmail.com>
Subject Re: Missing session state handling in most Leader Election implementations
Date Fri, 25 Nov 2011 14:44:12 GMT
Just a quick post to point out that the leader election example that
was posted on the list earlier today is very clean and handles the
disconnected / expired cases.



On Fri, Nov 18, 2011 at 7:04 PM, Jordan Zimmerman
<jzimmerman@netflix.com> wrote:
> I just did a quickie test. If the cluster goes down you get the Disconnect
> but do not get a session expiration. So, there wouldn't be an opportunity
> to transition from SUSPENDED to LOST (unless the client makes another ZK
> call). So, this brings me back to doing the background sync().
> -JZ
> On 11/18/11 9:52 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>>Is the background sync even necessary?  The ZK client itself will
>>re-establish connection if it can.
>>I think that LOST should only be sent on session expiration.
>>On Fri, Nov 18, 2011 at 1:07 AM, Jordan Zimmerman
>>> FYI
>>> Curator now has a staged connection notification mechanism for dealing
>>> with issues like this. When the Curator managed connection receives a
>>> Disconnect, it posts a message to listeners that the connection is
>>> SUSPENDED. If the connection can be re-established (via a background
>>> sync using the current retry policy) the listeners receive RECONNECTED;
>>> otherwise they receive LOST. Thus, users of the Curator LeaderSelector can know if
>>> they should pause their leader activity and/or stop leader activity.
>>> -JZ
>>> ________________________________________
>>> From: Ted Dunning [ted.dunning@gmail.com]
>>> Sent: Monday, November 14, 2011 6:24 PM
>>> To: user@zookeeper.apache.org
>>> Subject: Re: Missing session state handling in most Leader Election
>>> implementations
>>> On Mon, Nov 14, 2011 at 2:41 PM, Jordan Zimmerman
>>> >wrote:
>>> > It turns out that this is tricky to solve. When the server you're
>>> > connected to goes down, you get a Disconnected event.
>>> > However, it could be that you are able to reconnect to another server,
>>> > in which case the disconnected event should be ignored.
>>> The event should not be ignored.  The master should pause in being a
>>> master, but not unload any major data structures.  If it reconnects
>>> instantly, then it should continue as if nothing had happened.  You can
>>> also have a time limit for how long you wait before you decide to pause
>>> operation as master.  As you increase that time, you increase the
>>> probability of two masters existing at the same time.  If the reconnect
>>> happens before the timeout, you don't need to bother the master.
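
The handling discussed in the quoted thread (pause on SUSPENDED, carry on after RECONNECTED, give up leadership on LOST, with a time limit before pausing) could be sketched roughly as below. This is only an illustration in plain Python, not Curator's API: the names ConnState, LeaderGuard, grace_seconds and may_act are all hypothetical, and the sketch assumes the process has already won the election.

```python
from enum import Enum


class ConnState(Enum):
    # State names taken from the thread; this enum itself is hypothetical.
    SUSPENDED = "SUSPENDED"
    RECONNECTED = "RECONNECTED"
    LOST = "LOST"


class LeaderGuard:
    """Decides whether a process that holds leadership may keep acting as leader."""

    def __init__(self, grace_seconds):
        # How long a disconnect is tolerated before leader work is paused.
        # A larger value raises the chance of two masters at the same time.
        self.grace_seconds = grace_seconds
        self.is_leader = True       # assumption: election already won
        self.suspended_at = None    # time of the first unanswered disconnect

    def on_state(self, state, now):
        if state is ConnState.SUSPENDED:
            # Remember when the disconnect started; keep data structures loaded.
            if self.suspended_at is None:
                self.suspended_at = now
        elif state is ConnState.RECONNECTED:
            # Same session, so leadership is still held: continue as before.
            self.suspended_at = None
        elif state is ConnState.LOST:
            # Session is gone: stop leader activity until re-elected.
            self.is_leader = False
            self.suspended_at = None

    def may_act(self, now):
        if not self.is_leader:
            return False
        if self.suspended_at is None:
            return True
        # Still inside the grace period after a disconnect.
        return (now - self.suspended_at) < self.grace_seconds


guard = LeaderGuard(grace_seconds=2.0)
guard.on_state(ConnState.SUSPENDED, now=10.0)
print(guard.may_act(now=11.0))   # within the grace period
print(guard.may_act(now=13.0))   # grace period exceeded, leader work paused
guard.on_state(ConnState.RECONNECTED, now=14.0)
print(guard.may_act(now=14.0))   # resumed
guard.on_state(ConnState.LOST, now=20.0)
print(guard.may_act(now=20.0))   # leadership given up
```

Timestamps are passed in explicitly so the behaviour is easy to follow; a real client would more likely read a monotonic clock and re-enter the election after LOST.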

Jérémie 'ahFeel' BORDIER
