curator-user mailing list archives

From chao chu <chuchao...@gmail.com>
Subject Re: Leader Latch recovery after suspended state
Date Mon, 10 Mar 2014 14:42:14 GMT
+ user@curator.apache.org

The original mail thread was updated in the incubator age :)


On Mon, Mar 10, 2014 at 10:38 PM, chao chu <chuchao333@gmail.com> wrote:

> Hi,
>
> Just want to see if there is any progress on this?
>
> I also have a related question. Beyond re-using the znode, IMHO it would
> be great if LeaderLatch could survive a temporary ConnectionLossException
> (e.g., due to a transient network issue).
>
> I guess in most cases the context switch caused by a leader re-election is
> quite expensive, and we might not want to pay for it just because of some
> transient issue. If the current leader can reconnect within the session
> timeout, it should still hold the leadership, and no leader change should
> happen in between. The rationale is similar to the difference between
> ConnectionLossException (which is recoverable) and SessionExpiredException
> (which is not recoverable).
>
> What are your thoughts on this? Thanks a lot!
>
> Regards,
>
>
> On Wed, Aug 21, 2013 at 2:05 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> Yes, I was suggesting how to patch Curator.
>>
>> On Aug 20, 2013, at 10:59 AM, Calvin Jia <jia.calvin@gmail.com> wrote:
>>
>> Currently this is not supported in the Curator library, but the Curator
>> library (specifically leader latch's reset method) is the correct/logical
>> place to add this feature if I want it?
>>
>>
>> On Tue, Aug 20, 2013 at 10:34 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>>
>>> On reset() it could check to see if its node still exists. It would make
>>> the code a lot more complicated though.
>>>
>>> -JZ
>>>
>>> On Aug 20, 2013, at 10:25 AM, Calvin Jia <jia.calvin@gmail.com> wrote:
>>>
>>> A leader latch enters the suspended state after failing to receive a
>>> response from the first ZK machine it heartbeats to (this takes two thirds
>>> of the timeout). For the last third, it tries to contact another ZK
>>> machine. If it is successful, it enters the reconnected state.
>>>
>>> However, on reconnect, despite the fact the original node it created in
>>> ZK is still there, it will create another ephemeral-sequential node (the
>>> reset method is called). This means it will relinquish leadership, if there
>>> is another machine with a latch in the same path.
>>>
>>> Is there any way to reconnect and reuse the original ZK node?
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>>
>
>
> --
> ChuChao
>



-- 
ChuChao
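
The behaviour discussed in this thread (on reconnect, check whether the latch's node still exists and reuse it, and only create a new node when it is gone) can be sketched roughly as below. This is a minimal in-memory model, not Curator code: `FakeLatchPath` and `SketchLatch` are hypothetical names invented for illustration, and the local `ConnectionState` enum only borrows the names of Curator's connection states. It assumes that the node with the lowest sequence number holds leadership and that an expired session is what removes an ephemeral node.

```java
import java.util.TreeMap;

public class LatchReconnectSketch {
    // Local stand-in; only the names are borrowed from Curator's connection states.
    public enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical in-memory model of the latch path: sequence number -> owner id.
    public static final class FakeLatchPath {
        private final TreeMap<Integer, String> nodes = new TreeMap<>();
        private int nextSeq = 0;

        public int createSequential(String owner) {
            nodes.put(nextSeq, owner);
            return nextSeq++;
        }

        public boolean exists(int seq) { return nodes.containsKey(seq); }

        // Models the server deleting an ephemeral node when its session expires.
        public void delete(int seq) { nodes.remove(seq); }

        // Assumption: the lowest sequence number is the leader.
        public boolean isLeader(int seq) {
            return !nodes.isEmpty() && nodes.firstKey() == seq;
        }
    }

    // Hypothetical latch that reuses its node after a reconnect when possible.
    public static final class SketchLatch {
        private final FakeLatchPath path;
        private final String id;
        private int ourSeq;

        public SketchLatch(FakeLatchPath path, String id) {
            this.path = path;
            this.id = id;
            this.ourSeq = path.createSequential(id);
        }

        public void stateChanged(ConnectionState newState) {
            if (newState == ConnectionState.RECONNECTED && !path.exists(ourSeq)) {
                // Our node is gone (the session expired), so rejoin at the back.
                ourSeq = path.createSequential(id);
            }
            // SUSPENDED and LOST: nothing to do in this model. If the node
            // survived, RECONNECTED keeps it and with it the queue position.
        }

        public int seq() { return ourSeq; }

        public boolean hasLeadership() { return path.isLeader(ourSeq); }
    }

    public static void main(String[] args) {
        FakeLatchPath path = new FakeLatchPath();
        SketchLatch a = new SketchLatch(path, "a");
        SketchLatch b = new SketchLatch(path, "b");

        // Transient connection loss: the node survives, so "a" stays leader.
        a.stateChanged(ConnectionState.SUSPENDED);
        a.stateChanged(ConnectionState.RECONNECTED);
        System.out.println(a.hasLeadership()); // true

        // Session expiry: the node is deleted, so "b" takes over.
        path.delete(a.seq());
        a.stateChanged(ConnectionState.LOST);
        a.stateChanged(ConnectionState.RECONNECTED);
        System.out.println(a.hasLeadership() + " " + b.hasLeadership()); // false true
    }
}
```

The sketch leaves open what a suspended leader should report while it is disconnected, since it cannot know during that window whether its session is still alive. A real change to LeaderLatch would also have to deal with watches and concurrent callers, which is presumably part of the added complexity mentioned earlier in the thread.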