curator-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: Problem with LeaderSelector 2.7.1
Date Thu, 14 Jul 2016 15:10:49 GMT
When you get SUSPENDED/LOST, you should exit your leader selector handler’s takeLeadership()
method. But, there’s no reason to close the leader selector instance. Once the connection
is re-established the clients will contend to be the leader again.

In Curator, as a general rule, only close objects when you are completely done with them.
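
A minimal sketch of that lifecycle, using only the JDK so it runs without a ZooKeeper ensemble. Every name here (`LeaderSketch`, `Selector`, `State`, `requeue`, `demo`) is a hypothetical stand-in, not Curator's API. The point is the shape: leadership work runs until it is interrupted on SUSPENDED/LOST, and the same selector object contends again after the connection returns instead of being closed and rebuilt. The toy re-contends explicitly on RECONNECTED to stay deterministic. In Curator itself the equivalent behavior should come from extending `LeaderSelectorListenerAdapter` (whose `stateChanged()` throws `CancelLeadershipException` on SUSPENDED/LOST, which interrupts `takeLeadership()`) and calling `LeaderSelector.autoRequeue()` before `start()`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class LeaderSketch {
    // Hypothetical subset of connection states, named after Curator's.
    enum State { SUSPENDED, LOST, RECONNECTED }

    // Hypothetical stand-in for a long-lived leader selector. It is never closed.
    static final class Selector {
        final List<String> log = new CopyOnWriteArrayList<>();
        private Thread leader; // non-null while this instance holds leadership

        // Contend for leadership. In this toy model we always win immediately.
        synchronized void requeue() {
            CountDownLatch acquired = new CountDownLatch(1);
            leader = new Thread(() -> takeLeadership(acquired));
            leader.start();
            try {
                acquired.await(); // wait until the leader thread is running
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }

        // Leadership lasts until the thread is interrupted, then the method returns.
        private void takeLeadership(CountDownLatch acquired) {
            log.add("leader");
            acquired.countDown();
            try {
                Thread.sleep(Long.MAX_VALUE); // placeholder for real leader work
            } catch (InterruptedException e) {
                log.add("released");
            }
        }

        synchronized void stateChanged(State newState) {
            if (newState == State.RECONNECTED) {
                if (leader == null) {
                    requeue(); // same selector instance contends again
                }
                return;
            }
            // SUSPENDED or LOST: give up leadership, but keep the selector.
            if (leader != null) {
                leader.interrupt();
                try {
                    leader.join();
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
                leader = null;
            }
        }
    }

    static List<String> demo() {
        Selector selector = new Selector();
        selector.requeue();
        selector.stateChanged(State.SUSPENDED);   // exits takeLeadership
        selector.stateChanged(State.LOST);        // nothing held, nothing to do
        selector.stateChanged(State.RECONNECTED); // contends again, no new selector
        selector.stateChanged(State.LOST);
        return selector.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that nothing in the sketch disposes of the selector when the connection drops; only the leadership work is cancelled.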

-Jordan

> On Jul 14, 2016, at 7:38 AM, Cantrell, Curtis <Curtis.Cantrell@bkfs.com> wrote:
> 
> When I wrote this code over a year ago, my understanding of proper error handling was to
> suspend the leaders, locks, etc. when the connection was SUSPENDED and to rebuild them if
> the connection had been LOST. I believe I have been getting connection LOST when the session
> was really still alive. When my code then created a new LeaderSelector upon RECONNECT, this
> caused a new zNode to be added (queued) to the leader path. Clearly, this is not the correct
> error handling.
>  
> Today, I am upgrading to Curator 3.x and ZooKeeper 3.5. You imply that I should not be
> closing the LeaderSelector on LOST. What is the correct handling, assuming I am using the
> 3.x branch of Curator?
>  
> Thank you,
> Curtis
>  
> From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com] 
> Sent: Wednesday, July 13, 2016 4:26 PM
> To: user@curator.apache.org
> Subject: Re: Problem with LeaderSelector 2.7.1
>  
> I quickly looked at your code and don’t understand why you close the leader selector
on connection LOST. Does your network partition often?  Also, are you really creating a new
Curator instance for every leader selector? You should create one Curator instance for your
entire application.
>  
> -JZ 
>  
> On Jul 13, 2016, at 1:41 PM, Cantrell, Curtis <Curtis.Cantrell@bkfs.com> wrote:
>  
> It looks like there may be two fixes that affect my problem: CURATOR-264 and CURATOR-247.
> Has CURATOR-247 been merged to the 2.x branch, or do I need to update my ZooKeeper to 3.5
> in order to get the fix?
> Leader election: Duplicate ephemeral nodes with same owner id
> https://issues.apache.org/jira/browse/CURATOR-264
>  
> We sometimes experience failure in our leader-election functionality when we have network
issues. When this situation occurs we see that there are two ephemeral nodes in the zookeeper
cluster for the same session but there is no active leader.
> Extend Curator's connection state to support SESSION_LOST
> https://issues.apache.org/jira/browse/CURATOR-247
>  
> Curator has a connection state for LOST that confuses users. It does not mean that the
> session is lost. Instead, it means that the retry policy has given up retrying.
>  
>  

