zookeeper-user mailing list archives

From Yuriy Lopotun <yuriy.lopo...@gmail.com>
Subject Zookeeper-Zoodiscovery auto reconnect issue
Date Wed, 15 Apr 2015 17:46:21 GMT
Hi guys,


In our client-server OSGi application we use the ECF Zoodiscovery provider
for remote service discovery, which uses Zookeeper (v.3.3.3) under the
hood. While testing the application's resiliency, we noticed that after
unplugging the network cable and plugging it back in, the client in some
cases does not get the remote OSGi services back from the server.


I started debugging this use case and found that, in case of a session
timeout, Zookeeper internally and Zoodiscovery both try to reconnect at
the same time:

1) Zookeeper internally:

In ClientCnxn.SendThread.run(), on a SessionTimeoutException, it closes the
socket connection in cleanup(), sends the disconnect event to the watchers
and reconnects in startConnect().

2) Zoodiscovery:

The watcher receives the disconnect event from Zookeeper, closes the
current connection and opens a new one:

    // discard the current stale reader
    this.readKeeper.close();
    // try reconnecting
    this.readKeeper = new ZooKeeper(this.ip, 3000, this);


This results in a connect-disconnect-connect sequence instead of a single
connect, since Zoodiscovery closes the connection that Zookeeper has just
reopened and creates a new one. Moreover, it sometimes leaves the client in
an inconsistent state: the connection is eventually re-established, but the
client doesn't ask the server for the remote services.


I think the issue in this case is on Zoodiscovery's side: it should not
trigger a hard disconnect/reconnect when Zookeeper already reconnects
internally. However, I'm not sure how it could distinguish these cases,
because Zookeeper sends an identical disconnect event regardless of whether
or not it is going to reconnect internally:

    eventThread.queueEvent(new WatchedEvent(
            Event.EventType.None,
            Event.KeeperState.Disconnected,
            null));

This call appears both in the catch block inside the while loop of
ClientCnxn.SendThread and just after the loop.


So I wanted to ask for your suggestions on how to handle the disconnect
cases better, so that we avoid double reconnects and initiate a hard
reconnect from Zoodiscovery only when Zookeeper doesn't do it internally.
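
To make the question concrete, here is a minimal sketch of the kind of decision logic the watcher might use. It is not the current Zoodiscovery code. It assumes that Zookeeper keeps retrying on its own after a Disconnected event and reports an unrecoverable session with a separate Expired state, which is how the Zookeeper documentation describes the session lifecycle. The State enum is only a stand-in for Watcher.Event.KeeperState so that the sketch compiles without the Zookeeper jar, and Action and ReconnectPolicy are hypothetical names introduced for illustration.

```java
// Stand-in for org.apache.zookeeper.Watcher.Event.KeeperState, reduced to the
// three states relevant here (assumption: the real enum has matching names).
enum State { Disconnected, SyncConnected, Expired }

// Hypothetical actions a discovery layer could take; not part of any real API.
enum Action {
    WAIT_FOR_CLIENT, // do nothing, the client library is reconnecting itself
    REFRESH_STATE,   // connection is back, re-read the remote services
    NEW_HANDLE       // session is gone, close and create a new ZooKeeper
}

final class ReconnectPolicy {
    private ReconnectPolicy() { }

    // Map a connection-state event to the action the watcher should take.
    static Action decide(State state) {
        switch (state) {
            case Disconnected:
                // Closing the handle here is what causes the double reconnect.
                return Action.WAIT_FOR_CLIENT;
            case SyncConnected:
                return Action.REFRESH_STATE;
            case Expired:
                // Only now is a hard reconnect needed.
                return Action.NEW_HANDLE;
            default:
                throw new IllegalArgumentException("Unhandled state: " + state);
        }
    }
}
```

With something like this, the close()/new ZooKeeper(...) pair shown earlier would run only for NEW_HANDLE, and the query for remote services would be repeated on REFRESH_STATE.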


Thanks,

Yuriy
