zookeeper-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: leader election, scheduled tasks, losing leadership
Date Sun, 09 Dec 2012 05:04:35 GMT
This is why you need a ConnectionStateListener. You'll get a notice that the connection has
been suspended and you should assume all locks/leaders are invalid.

-JZ
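
A minimal sketch of the pattern described above, written without any Curator dependency so it can be run on its own. ConnState and LeadershipGuard are hypothetical names invented for this illustration. In Curator itself the callback is ConnectionStateListener.stateChanged(client, newState), and the state enum also has other values not modelled here. The assumption is that a scheduled task can be split into steps and can check a flag between them.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the connection states a ZooKeeper client reports. */
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

/** Hypothetical guard that tracks whether this process may still act as leader. */
final class LeadershipGuard {
    private final AtomicBoolean leader = new AtomicBoolean(false);

    /** Call once the leader recipe reports that leadership was acquired. */
    void leadershipAcquired() {
        leader.set(true);
    }

    /** Call from the connection-state callback of the client library. */
    void stateChanged(ConnState newState) {
        // On SUSPENDED or LOST, assume every lock and leadership is invalid.
        if (newState == ConnState.SUSPENDED || newState == ConnState.LOST) {
            leader.set(false);
        }
        // RECONNECTED deliberately does not restore leadership:
        // it has to be re-acquired through the leader recipe.
    }

    boolean isLeader() {
        return leader.get();
    }

    /**
     * Runs the steps in order, re-checking leadership before each one.
     * Returns the number of steps that actually ran.
     */
    int runWhileLeader(Runnable... steps) {
        int done = 0;
        for (Runnable step : steps) {
            if (!isLeader()) {
                break; // leadership may have moved elsewhere, so stop here
            }
            step.run();
            done++;
        }
        return done;
    }
}
```

With this shape, a task that is suspended midway stops before its next step, and a later reconnect does not silently resume it. A step that is already running is not interrupted, so individual steps should be short or safe to repeat.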

On Dec 8, 2012, at 9:02 PM, Henry Robinson <henry@cloudera.com> wrote:

> What about a network disconnection? Presumably leadership is revoked when
> the leader appears to have failed, which can be for more reasons than a VM
> crash (VM running slow, network event, GC pause etc).
> 
> Henry
> 
> On 8 December 2012 21:00, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> 
>> The leader latch lock is the equivalent of task in progress. I assume the
>> task is running in the same VM as the leader lock. The only reason the VM
>> would lose leadership is if it crashes in which case the process would die
>> anyway.
>> 
>> -JZ
>> 
>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <ericacm@gmail.com> wrote:
>> 
>>> If I recall correctly it was Henry Robinson that gave me the advice to
>>> have a "task in progress" check.
>>> 
>>> 
>>> -- Eric
>>> 
>>> 
>>> 
>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <ericacm@gmail.com> wrote:
>>> 
>>>> I am using Curator LeaderLatch :)
>>>> 
>>>> 
>>>> -- Eric
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> You might check your leader implementation. Writing a correct leader
>>>>> recipe is actually quite challenging due to edge cases. Have a look at
>>>>> Curator (disclosure: I wrote it) for an example.
>>>>> 
>>>>> -JZ
>>>>> 
>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <ericacm@gmail.com> wrote:
>>>>> 
>>>>>> Actually I had the same thought and didn't consider having to do this
>>>>>> until I talked about my project at a Zookeeper User Group a month or so
>>>>>> ago and I was given this advice.
>>>>>> 
>>>>>> I know that I do see leadership being lost/transferred when one of the
>>>>>> ZK servers is restarted (not the whole ensemble). And it seems like
>>>>>> I've seen it happen even when the ensemble stays totally stable (though
>>>>>> I am not 100% sure as it's been a while since I have worked on this
>>>>>> particular application).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- Eric
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>> 
>>>>>>> Why would it lose leadership? The only reason I can think of is if the
>>>>>>> ZK cluster goes down. In normal use, the ZK cluster won't go down (I
>>>>>>> assume you're running 3 or 5 instances).
>>>>>>> 
>>>>>>> -JZ
>>>>>>> 
>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <ericacm@gmail.com> wrote:
>>>>>>> 
>>>>>>>> During the time the task is running a cluster member could lose its
>>>>>>>> leadership.
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
> 
> 
> -- 
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679

