zookeeper-user mailing list archives

From Jérémie BORDIER <jeremie.bord...@gmail.com>
Subject Re: Missing session state handling in most Leader Election implementations
Date Fri, 25 Nov 2011 14:44:12 GMT
Just a quick post to point out that the leader election example that
was posted on the list earlier today is very clean and handles the
disconnected / expired cases.

https://github.com/cyberroadie/zookeeper-leader/

Jérémie

On Fri, Nov 18, 2011 at 7:04 PM, Jordan Zimmerman
<jzimmerman@netflix.com> wrote:
> I just did a quickie test. If the cluster goes down you get the Disconnect
> but do not get a session expiration. So, there wouldn't be an opportunity
> to transition from SUSPENDED to LOST (unless the client makes another ZK
> call). So, this brings me back to doing the background sync().
>
> -JZ
>
> On 11/18/11 9:52 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
>>Is the background sync even necessary?  The ZK client itself will
>>re-establish connection if it can.
>>
>>I think that LOST should only be sent on session expiration.
>>
>>On Fri, Nov 18, 2011 at 1:07 AM, Jordan Zimmerman
>><jzimmerman@netflix.com> wrote:
>>
>>> FYI
>>>
>>> Curator now has a staged connection notification mechanism for dealing
>>> with issues like this. When the Curator managed connection receives a
>>> Disconnect, it posts a message to listeners that the connection is
>>> SUSPENDED. If the connection can be re-established (via a background
>>> sync() using the current retry policy) the listeners receive RECONNECTED
>>> otherwise they receive LOST. Thus, users of the Curator LeaderSelector
>>> can know if they should pause their leader activity and/or stop leader
>>> activity.
>>>
>>> -JZ
>>> ________________________________________
>>> From: Ted Dunning [ted.dunning@gmail.com]
>>> Sent: Monday, November 14, 2011 6:24 PM
>>> To: user@zookeeper.apache.org
>>> Subject: Re: Missing session state handling in most Leader Election
>>> implementations
>>>
>>> On Mon, Nov 14, 2011 at 2:41 PM, Jordan Zimmerman
>>> <jzimmerman@netflix.com> wrote:
>>>
>>> > It turns out that this is tricky to solve. When the server you're
>>> > connected to goes down, you get a
>>> > Watcher.Event.KeeperState.Disconnected.
>>> > However, it could be that you are able to reconnect to another server
>>> > so the disconnected event should be ignored.
>>>
>>>
>>> The event should not be ignored.  The master should pause in being a
>>> master, but not unload any major data structures.  If it reconnects
>>> instantly, then it should continue as if nothing had happened.  You can
>>> also have a time limit for how long you wait before you decide to pause
>>> operation as master.  As you increase that time, you increase the
>>> probability of two masters existing at the same time.  If the reconnect
>>> happens before the timeout, you don't need to bother the master.
>>>
>
>
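
The handling discussed in the quoted thread (pause on Disconnected, resume
on reconnect, stop on expiry, and give up locally if the cluster stays
unreachable) could be sketched in plain Java roughly as follows. This is
only an illustration, not code from Curator or from the linked repository:
the class LeaderStateSketch, its Event and Mode enums, and the graceMillis
timeout are all hypothetical names, and the local timeout is an assumed
stand-in for whatever mechanism (such as a background sync()) actually
decides that the session is lost.

```java
// Hypothetical sketch: none of these names come from ZooKeeper or Curator.
final class LeaderStateSketch {

    /** Connection events as the client sees them (loosely mirroring KeeperState). */
    enum Event { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    /** What the elected leader is currently allowed to do. */
    enum Mode { ACTIVE, PAUSED, STOPPED }

    private final long graceMillis;          // assumed local time limit while disconnected
    private long disconnectedAtMillis = -1;  // -1 means "currently connected"
    private Mode mode = Mode.ACTIVE;

    LeaderStateSketch(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Feed a connection event; returns the mode the leader should now be in. */
    Mode onEvent(Event event, long nowMillis) {
        if (mode == Mode.STOPPED) {
            return mode; // terminal: leadership must be re-acquired through a new election
        }
        switch (event) {
            case DISCONNECTED:
                // Do not ignore the event: pause leader work but keep data structures.
                if (disconnectedAtMillis < 0) {
                    disconnectedAtMillis = nowMillis;
                }
                mode = Mode.PAUSED;
                break;
            case SYNC_CONNECTED:
                // Reconnected in time: continue as if nothing had happened.
                disconnectedAtMillis = -1;
                mode = Mode.ACTIVE;
                break;
            case EXPIRED:
                // Session expired: another node may already be the leader.
                mode = Mode.STOPPED;
                break;
        }
        return mode;
    }

    /**
     * Call periodically. If the whole cluster is down the client may see
     * Disconnected but never Expired, so give up locally after graceMillis.
     */
    Mode onTick(long nowMillis) {
        if (mode == Mode.PAUSED && nowMillis - disconnectedAtMillis >= graceMillis) {
            mode = Mode.STOPPED;
        }
        return mode;
    }
}
```

In this sketch a caller would feed each watcher event into onEvent() and
call onTick() from a timer. A longer graceMillis tolerates longer outages,
at the cost of a higher chance that two leaders exist at the same time.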



-- 
Jérémie 'ahFeel' BORDIER
