zookeeper-user mailing list archives

From Jordan Zimmerman <jzimmer...@netflix.com>
Subject Re: Missing session state handling in most Leader Election implementations
Date Fri, 18 Nov 2011 17:58:51 GMT
The sync is necessary to allow the configured retryPolicy to be applied.
However, you're right that I could just wait for the next connection or
the session expiration. I'll give that some thought.

-JZ
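
For illustration, the staged notification described in the quoted message below, combined with Ted's suggestion that LOST be sent only on session expiration, could be modeled roughly as follows. This is a minimal Python sketch under stated assumptions, not Curator's API: ConnectionStateTracker, on_keeper_state and the CONNECTED state are hypothetical names introduced here, and the background sync() and retry policy are deliberately left out.

```python
from enum import Enum


class KeeperState(Enum):
    # Names mirror ZooKeeper's Watcher.Event.KeeperState values.
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class ConnectionState(Enum):
    CONNECTED = "CONNECTED"      # hypothetical initial state, not from the thread
    SUSPENDED = "SUSPENDED"
    RECONNECTED = "RECONNECTED"
    LOST = "LOST"


class ConnectionStateTracker:
    """Hypothetical helper: turns raw session events into staged states."""

    def __init__(self):
        self.state = None
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def on_keeper_state(self, keeper_state):
        new_state = self._next_state(keeper_state)
        # Notify listeners only on an actual transition.
        if new_state is not None and new_state is not self.state:
            self.state = new_state
            for listener in self.listeners:
                listener(new_state)
        return self.state

    def _next_state(self, keeper_state):
        if keeper_state is KeeperState.DISCONNECTED:
            # A disconnect only suspends; it never reports LOST on its own.
            if self.state in (ConnectionState.CONNECTED,
                              ConnectionState.RECONNECTED):
                return ConnectionState.SUSPENDED
            return None
        if keeper_state is KeeperState.SYNC_CONNECTED:
            if self.state is None:
                return ConnectionState.CONNECTED
            if self.state in (ConnectionState.SUSPENDED, ConnectionState.LOST):
                return ConnectionState.RECONNECTED
            return None
        if keeper_state is KeeperState.EXPIRED:
            # Assumption taken from the thread: only expiration means LOST.
            return ConnectionState.LOST
        return None
```

A listener registered with add_listener would pause leader activity on SUSPENDED, resume on RECONNECTED, and give up leadership on LOST.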

On 11/18/11 9:52 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:

>Is the background sync even necessary?  The ZK client itself will
>re-establish connection if it can.
>
>I think that LOST should only be sent on session expiration.
>
>On Fri, Nov 18, 2011 at 1:07 AM, Jordan Zimmerman
><jzimmerman@netflix.com> wrote:
>
>> FYI
>>
>> Curator now has a staged connection notification mechanism for dealing
>> with issues like this. When the Curator-managed connection receives a
>> Disconnect, it posts a message to listeners that the connection is
>> SUSPENDED. If the connection can be re-established (via a background
>> sync() using the current retry policy), the listeners receive
>> RECONNECTED; otherwise they receive LOST. Thus, users of the Curator
>> LeaderSelector can know whether they should pause their leader activity
>> and/or stop it.
>>
>> -JZ
>> ________________________________________
>> From: Ted Dunning [ted.dunning@gmail.com]
>> Sent: Monday, November 14, 2011 6:24 PM
>> To: user@zookeeper.apache.org
>> Subject: Re: Missing session state handling in most Leader Election
>> implementations
>>
>> On Mon, Nov 14, 2011 at 2:41 PM, Jordan Zimmerman
>> <jzimmerman@netflix.com> wrote:
>>
>> > It turns out that this is tricky to solve. When the server you're
>> > connected to goes down, you get a
>> > Watcher.Event.KeeperState.Disconnected. However, it could be that you
>> > are able to reconnect to another server, so the disconnected event
>> > should be ignored.
>>
>>
>> The event should not be ignored.  The master should pause in being a
>> master, but not unload any major data structures.  If it reconnects
>> instantly, then it should continue as if nothing had happened.  You can
>> also have a time limit for how long you wait before you decide to pause
>> operation as master.  As you increase that time, you increase the
>> probability of two masters existing at the same time.  If the reconnect
>> happens before the timeout, you don't need to bother the master.
>>
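
The pause-with-a-time-limit behavior Ted describes above could be sketched as follows. This is a minimal Python model, not taken from any ZooKeeper or Curator code: PausableLeader, grace_seconds and the callback names are hypothetical, and the injectable clock exists only to make the timing easy to exercise.

```python
import time


class PausableLeader:
    """Hypothetical leader wrapper that pauses, rather than quits, on disconnect."""

    def __init__(self, grace_seconds, clock=time.monotonic):
        self.grace_seconds = grace_seconds  # how long to keep acting while disconnected
        self.clock = clock
        self.is_leader = True
        self.disconnected_at = None
        self.data = {"loaded": True}  # stands in for the major data structures

    def on_disconnected(self):
        # Remember when the disconnect started; keep all state in memory.
        if self.disconnected_at is None:
            self.disconnected_at = self.clock()

    def on_reconnected(self):
        # Reconnected within the session: continue as if nothing had happened.
        self.disconnected_at = None

    def on_session_expired(self):
        # Leadership is really gone: stop and unload.
        self.is_leader = False
        self.disconnected_at = None
        self.data = None

    def may_act_as_leader(self):
        if not self.is_leader:
            return False
        if self.disconnected_at is None:
            return True
        # While disconnected, act only until the grace period runs out.
        return self.clock() - self.disconnected_at < self.grace_seconds
```

A larger grace_seconds means fewer needless pauses but, as noted above, a higher probability of two masters existing at the same time; setting it to 0 pauses immediately on every disconnect.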

