curator-user mailing list archives

From Cameron McKenzie
Subject Re: Query on SESSION_LOST (3.0.0)
Date Wed, 18 Nov 2015 23:23:35 GMT
Not necessarily false alarms. The LOST event didn't necessarily mean
session loss, only that Curator was giving up on reconnecting.

With 3.0.0, the LOST event occurs in two cases: when ZooKeeper explicitly
tells Curator that the session has expired, or, if no connection to
ZooKeeper is available, when Curator concludes that the session has been
lost. The latter is based on a timer and the session timeout negotiated
with ZooKeeper.
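
To make the two cases concrete, here is a rough sketch of the rule as a toy
state tracker. It is not Curator's actual code: the class SessionLossTracker
and its method names are made up for illustration, and it assumes LOST is
declared once the time spent disconnected reaches the full negotiated
session timeout.

```java
/** Toy model of the 3.0.0 LOST rule; hypothetical, not part of Curator. */
final class SessionLossTracker {
    enum State { CONNECTED, SUSPENDED, LOST }

    private final long sessionTimeoutMs; // session timeout negotiated with ZooKeeper
    private long disconnectedAtMs = -1;  // -1 while connected
    private State state = State.CONNECTED;

    SessionLossTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Connection dropped; the session may still be alive on the server. */
    void onDisconnected(long nowMs) {
        if (state == State.CONNECTED) {
            disconnectedAtMs = nowMs;
            state = State.SUSPENDED;
        }
    }

    /** Connection is back (with a new session if the old one was lost). */
    void onReconnected() {
        disconnectedAtMs = -1;
        state = State.CONNECTED;
    }

    /** Case 1: the server explicitly reports that the session expired. */
    void onExpiredByServer() {
        state = State.LOST;
    }

    /** Case 2: timer check while disconnected, against the session timeout. */
    State poll(long nowMs) {
        if (state == State.SUSPENDED
                && nowMs - disconnectedAtMs >= sessionTimeoutMs) {
            state = State.LOST;
        }
        return state;
    }
}
```

In this model a short disconnection only ever reaches SUSPENDED, so an
application that reacts to LOST alone would not restart for it.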

On Thu, Nov 19, 2015 at 10:13 AM, Vikrant Singh wrote:

> Thanks a lot for the reply. So if I understand correctly, there were
> false alarms (or mistaken connection-lost events). With 3.0.0,
> connection_lost events will happen only when the session is truly lost.
> On Wed, Nov 18, 2015 at 1:16 PM, Cameron McKenzie wrote:
>> Hey Vikrant,
>> The issue was that the LOST event was being published by Curator when it
>> gave up trying to reconnect to ZooKeeper after connection loss, whereas
>> most people were interpreting it to mean that the session was lost.
>> So, the change in CURATOR-3.0 is that the LOST event will be published
>> either when the session has expired and Curator is explicitly told so by
>> ZooKeeper (implying that a connection is present), or when Curator has
>> been disconnected from ZooKeeper for long enough for the session to have
>> expired on the server (this occurs when no connection to ZooKeeper is
>> present).
>> So, I'm not sure how it will help your case. It is just a more reliable
>> way of knowing that the session is gone and that all related ephemeral
>> state on the ZooKeeper server is gone as well.
>> Note that it's also possible to tell Curator to use the legacy way of
>> interpreting the LOST event.
>> cheers
>> On Thu, Nov 19, 2015 at 8:09 AM, Vikrant Singh wrote:
>>> Hello All,
>>> I need some guidance on understanding a fix made in the latest
>>> release, 3.0.0. I am talking about the following fix -
>>> .
>>> In my project we create some ephemeral nodes and monitor a cluster
>>> through a TreeCache. The framework for the TreeCache and the ephemeral
>>> nodes is created using ExponentialBackoffRetry with a retry interval of
>>> 1 sec and a retry count of 29 (which is MAX_RETRIES_LIMIT). We kill the
>>> process the moment we get a TreeCacheEvent.Type.CONNECTION_LOST event.
>>> As a process restart is really expensive, I want to understand how I
>>> can benefit from this fix.
>>> Please help me understand what the issue is and how it may affect a
>>> setup like ours. We are not on 3.0.0 yet.
>>> Thanks,
>>> Vikrant
