curator-dev mailing list archives

From "Jordan Zimmerman (JIRA)" <>
Subject [jira] [Commented] (CURATOR-246) Parent task for adding a SESSION_LOST connection state, etc.
Date Fri, 21 Aug 2015 23:04:45 GMT


Jordan Zimmerman commented on CURATOR-246:

Implementation notes so far:

* It makes more sense to alter the meaning of the current LOST state than to add a new state
* Now is a good time to fix a very old problem. Every API call bottlenecks through RetryLoop.callWithRetry().
The first thing this method does is call client.internalBlockUntilConnectedOrTimedOut(). If the
connection doesn't succeed, the actual API call fails and the retry policy signals
a retry, which again calls client.internalBlockUntilConnectedOrTimedOut(). This is not reasonable
behavior and makes having a true LOST session event more difficult. So, if the new behavior
is enabled, a timeout during connection will immediately throw KeeperException.ConnectionLossException
without retrying.
* ConnectionStateManager has been altered so that the event poller will post a LOST state
if the configured session timeout elapses
* When the new behavior is enabled, the background sync() call is no longer made when the
Disconnect is received. It is no longer necessary as the ConnectionStateManager is now watching
for session timeout.
* The base testing class now runs each test twice: once in the pre-3.0 mode and once with
enableSessionExpiredState set to true
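
To make the ConnectionStateManager note above concrete, here is a minimal, self-contained sketch of the timing rule it describes: a disconnect moves the state to SUSPENDED, and a periodic poll posts LOST only once the configured session timeout has elapsed without a reconnect. This is not Curator's code. The class SessionStateTracker, its methods, and the explicit clock argument are all hypothetical names chosen for illustration, and the real state machine has more states (RECONNECTED, READ_ONLY) than shown here.

```java
import java.util.Optional;

/**
 * Hypothetical illustration only; NOT Curator's ConnectionStateManager.
 * The caller supplies the current time so the rule is easy to exercise.
 */
final class SessionStateTracker {
    enum State { CONNECTED, SUSPENDED, LOST }

    private final long sessionTimeoutMs;
    private State state = State.CONNECTED;
    private long suspendedAtMs;

    SessionStateTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Called when the client reports a disconnect: the session may still be alive. */
    void onDisconnected(long nowMs) {
        if (state == State.CONNECTED) {
            state = State.SUSPENDED;
            suspendedAtMs = nowMs;
        }
    }

    /** Called when the client reconnects (simplified: goes straight back to CONNECTED). */
    void onReconnected() {
        state = State.CONNECTED;
    }

    /**
     * Called periodically by an event poller. Returns LOST exactly once, at the
     * first poll where the session timeout has elapsed while still SUSPENDED.
     */
    Optional<State> poll(long nowMs) {
        if (state == State.SUSPENDED && nowMs - suspendedAtMs >= sessionTimeoutMs) {
            state = State.LOST;
            return Optional.of(State.LOST);
        }
        return Optional.empty();
    }

    State current() {
        return state;
    }
}
```

With a 1000 ms timeout and a disconnect at time 0, a poll at 999 ms posts nothing and a poll at 1000 ms posts LOST. A listener that reacts only to LOST, and not to SUSPENDED, would therefore ride out a brief network blip without starting a new leader election.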

> Parent task for adding a SESSION_LOST connection state, etc.
> ------------------------------------------------------------
>                 Key: CURATOR-246
>                 URL:
>             Project: Apache Curator
>          Issue Type: New Feature
>          Components: Framework, Recipes
>            Reporter: Dong Lei
> Spark now leverages Curator to help manage its connections to ZK and to do leader election.
>
> Currently, whenever a ZK session gets disconnected, the ConnectionStateManager becomes
> aware of it and marks the state as SUSPENDED, and a new leader election is triggered,
> even though the ZK session may be able to reconnect to another machine very soon.
> I wonder if we can tolerate such transient network instability and not trigger a leader
> election, because the upper-layer application's (like Spark's) reaction to a new leader can be
> very costly.

This message was sent by Atlassian JIRA
