incubator-s4-dev mailing list archives

From "Matthieu Morel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (S4-85) Improve handling of ZK connection changes
Date Thu, 19 Jul 2012 09:43:34 GMT

    [ https://issues.apache.org/jira/browse/S4-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418184#comment-13418184 ]

Matthieu Morel commented on S4-85:
----------------------------------

Indeed, we should never try to reconnect ourselves. According to the ZooKeeper documentation,
when the client reaches the Disconnected state it automatically tries to reconnect,
possibly to a different server in the ZooKeeper ensemble. So we shouldn't do anything in that case.

If the session has expired, it means the ZooKeeper server considers the node lost:
it removes the related ephemeral nodes and notifies the related watchers. In that case, reconnection
logic would be quite complex, and it is safer to fail fast and kill the
current node. Its partition will then be picked up automatically by a standby S4 node.
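
As a rough illustration of that policy, here is a minimal sketch and not the actual S4 patch. To stay
self-contained it uses a local KeeperState enum as a stand-in for ZooKeeper's
Watcher.Event.KeeperState, and the names ZkStateHandlingSketch, FailFastStateHandler and shutdownAction
are hypothetical. The shutdown is injected as a Runnable so the behaviour can be checked without
really killing the JVM; in a real node it would presumably stop the process.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ZkStateHandlingSketch {

    /** Local stand-in (assumption) for a subset of ZooKeeper's Watcher.Event.KeeperState. */
    public enum KeeperState { SyncConnected, Disconnected, Expired }

    /** Hypothetical handler: never reconnects by itself, fails fast on session expiry. */
    public static class FailFastStateHandler {
        private final Runnable shutdownAction; // assumption: stops the current node
        private volatile KeeperState state;

        public FailFastStateHandler(Runnable shutdownAction) {
            this.shutdownAction = shutdownAction;
        }

        public void handleStateChanged(KeeperState newState) {
            this.state = newState;
            switch (newState) {
                case SyncConnected:
                    // Connected (again): nothing to do.
                    break;
                case Disconnected:
                    // The ZooKeeper client retries on its own, possibly against
                    // another server of the ensemble, so we do nothing here.
                    break;
                case Expired:
                    // Ephemeral nodes are gone: fail fast instead of rebuilding state.
                    shutdownAction.run();
                    break;
            }
        }

        public KeeperState currentState() {
            return state;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean killed = new AtomicBoolean(false);
        FailFastStateHandler handler = new FailFastStateHandler(() -> killed.set(true));

        handler.handleStateChanged(KeeperState.Disconnected);
        System.out.println("killed after Disconnected: " + killed.get()); // false

        handler.handleStateChanged(KeeperState.Expired);
        System.out.println("killed after Expired: " + killed.get()); // true
    }
}
```

With this shape, both AssignmentFromZK and ClusterFromZK could share the same handling instead of
each carrying its own reconnection code.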
                
> Improve handling of ZK connection changes 
> ------------------------------------------
>
>                 Key: S4-85
>                 URL: https://issues.apache.org/jira/browse/S4-85
>             Project: Apache S4
>          Issue Type: Improvement
>    Affects Versions: 0.5
>            Reporter: Daniel Gómez Ferro
>             Fix For: 0.5
>
>
> Currently we are handling ZK state changes a bit differently in two places.
> org.apache.s4.comm.topology.AssignmentFromZK:
> {code}
>     @Override
>     public void handleStateChanged(KeeperState state) throws Exception {
>         this.state = state;
>         if (!state.equals(KeeperState.SyncConnected)) {
>             logger.warn("Session not connected for cluster [{}]: [{}]. Trying to reconnect",
>                     clusterName, state.name());
>             zkClient.close();
>             zkClient.connect(connectionTimeout, null);
>             handleNewSession();
>         }
>     }
> {code}
> org.apache.s4.comm.topology.ClusterFromZK:
> {code}
>     @Override
>     public void handleStateChanged(KeeperState state) throws Exception {
>         this.state = state;
>         if (!state.equals(KeeperState.SyncConnected)) {
>             logger.warn("Session not connected for cluster [{}]: [{}]. Trying to reconnect",
>                     clusterName, state.name());
>             zkClient.connect(connectionTimeout, null);
>             handleNewSession();
>         }
>     }
> {code}
> In the first case we explicitly close the connection before trying to reconnect. Furthermore,
> I think we should only try to reconnect when the state is equal to {{KeeperState.Expired}},
> since now we are closing the connection on a {{Disconnected}} event too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
