curator-user mailing list archives

From "Cantrell, Curtis" <Curtis.Cantr...@bkfs.com>
Subject RE: Problem with LeaderSelector 2.7.1
Date Thu, 14 Jul 2016 12:38:02 GMT
When I wrote this code over a year ago, my understanding of proper error handling was to
suspend the leaders, locks, etc. when the connection was SUSPENDED and to rebuild the
leaders, locks, etc. if the connection had been LOST. I believe I have been getting
connection LOST when the session was really still alive. When my code then, upon RECONNECTED,
created a new LeaderSelector, this caused a new zNode to be added (queued) to the leader
path. Clearly, this is not the correct error handling.
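
To make the symptom concrete, here is a toy model of it in plain Java. It makes no Curator or ZooKeeper calls, and every name in it is hypothetical. It assumes the session stays alive across the outage and that the old selector's ephemeral node was never deleted (for example, because the close ran while disconnected), so each rebuilt selector queues one more node behind it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: no Curator or ZooKeeper calls. All names are hypothetical. */
class LeaderQueueModel {
    /** Ephemeral participant nodes queued under the leader path for one live session. */
    private final List<String> queuedNodes = new ArrayList<>();
    private int nextSeq = 0;

    /** Stands in for starting a selector: it enqueues one sequential node. */
    String startSelector() {
        String node = "participant-" + nextSeq++;
        queuedNodes.add(node);
        return node;
    }

    int queued() {
        return queuedNodes.size();
    }

    /** Handling described above: a new selector is built after every reconnect. */
    static int rebuildOnReconnect(int reconnects) {
        LeaderQueueModel path = new LeaderQueueModel();
        path.startSelector();
        for (int i = 0; i < reconnects; i++) {
            // Assumption: the session never expired, so the old node is still
            // queued and the new selector adds another node behind it.
            path.startSelector();
        }
        return path.queued();
    }

    /** Alternative: keep the one selector and do not rebuild it on reconnect. */
    static int reuseSelector(int reconnects) {
        LeaderQueueModel path = new LeaderQueueModel();
        path.startSelector();
        for (int i = 0; i < reconnects; i++) {
            // Nothing to do: the existing node is still valid for the live session.
        }
        return path.queued();
    }

    public static void main(String[] args) {
        System.out.println("rebuild on reconnect: " + rebuildOnReconnect(3) + " nodes");
        System.out.println("reuse one selector:   " + reuseSelector(3) + " nodes");
    }
}
```

With three reconnects, the rebuild path leaves four nodes queued for the same session, while the reuse path leaves one.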

Today, I am upgrading to Curator 3.x and ZooKeeper 3.5. You imply that I should not be
closing the LeaderSelector on LOST. What is the correct handling, assuming I am using
the 3.x branch of Curator?

Thank you,
Curtis

From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
Sent: Wednesday, July 13, 2016 4:26 PM
To: user@curator.apache.org
Subject: Re: Problem with LeaderSelector 2.7.1

I quickly looked at your code and don’t understand why you close the leader selector on
connection LOST. Does your network partition often?  Also, are you really creating a new Curator
instance for every leader selector? You should create one Curator instance for your entire
application.

-JZ

On Jul 13, 2016, at 1:41 PM, Cantrell, Curtis <Curtis.Cantrell@bkfs.com> wrote:

It looks like there may be two fixes that affect my problem: CURATOR-264 and CURATOR-247.
Has CURATOR-247 been merged to the 2.x branch, or do I need to update my ZooKeeper to 3.5
in order to get the fix?

Leader election: Duplicate ephemeral nodes with same owner id
https://issues.apache.org/jira/browse/CURATOR-264

We sometimes experience failure in our leader-election functionality when we have network
issues. When this situation occurs we see that there are two ephemeral nodes in the zookeeper
cluster for the same session but there is no active leader.

Extend Curator's connection state to support SESSION_LOST
https://issues.apache.org/jira/browse/CURATOR-247

Curator has a connection state for LOST that confuses users. It does not mean that the session
is lost. Instead it means that the retry policy has given up retrying.
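
A small arithmetic sketch of the distinction drawn in that description, again in plain Java with no Curator or ZooKeeper calls. The method names and the millisecond figures are hypothetical, and treating the session as expired exactly when the outage reaches the session timeout is a simplification. The point is only that the two conditions are independent, so an outage can exhaust the retry policy while the session is still alive.

```java
/** Toy model only: no Curator or ZooKeeper calls. Names and numbers are hypothetical. */
class LostVersusExpired {
    /** LOST as described in CURATOR-247: the retry policy has used up its budget. */
    static boolean retryPolicyGaveUp(long disconnectedMs, long retryBudgetMs) {
        return disconnectedMs >= retryBudgetMs;
    }

    /** Simplification: the session counts as expired once the outage reaches the timeout. */
    static boolean sessionExpired(long disconnectedMs, long sessionTimeoutMs) {
        return disconnectedMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long retryBudgetMs = 7_000;     // assumed total time the retry policy keeps trying
        long sessionTimeoutMs = 60_000; // assumed negotiated session timeout
        long outageMs = 10_000;         // assumed length of the network outage

        System.out.println("retry policy gave up: " + retryPolicyGaveUp(outageMs, retryBudgetMs));
        System.out.println("session expired:      " + sessionExpired(outageMs, sessionTimeoutMs));
    }
}
```

Under these assumed numbers the retry policy has given up but the session has not expired, which is the case where ephemeral nodes created by that session would still exist.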

