zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: Zookeeper-Zoodiscovery auto reconnect issue
Date Wed, 15 Apr 2015 20:15:41 GMT
So we have the notion of state that you can check.
zooKeeper.getState().isAlive() will tell you if the client is actually
alive or not.

Looking through the code I'm not 100% sure why we are sending the
Disconnected state change after the while loop, or if the code ever would,
since the state should not be alive at that point (or else it wouldn't have
left the while loop).

In general though it sounds like a bug in the discovery side as you said. A
check for the state liveness (are we closed/auth_failed or just
disconnected) should fix this, I think.
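
A minimal sketch of that check, under stated assumptions. ClientState here is
a local stand-in for ZooKeeper.States (the 3.3.x values) so the snippet
compiles without the ZooKeeper jar, and ReconnectDecision and
shouldHardReconnect are hypothetical names, not part of ZooKeeper or
Zoodiscovery.

```java
// Sketch only: ClientState is a local stand-in for
// org.apache.zookeeper.ZooKeeper.States, not the real class.
final class ReconnectDecision {

    /** Stand-in for ZooKeeper.States as of the 3.3.x line. */
    enum ClientState {
        CONNECTING, ASSOCIATING, CONNECTED, CLOSED, AUTH_FAILED;

        /** Alive means any state except the two terminal ones. */
        boolean isAlive() {
            return this != CLOSED && this != AUTH_FAILED;
        }
    }

    private ReconnectDecision() {
    }

    /**
     * Intended to be called when the watcher receives a Disconnected event.
     * Returns true only when the client handle is dead and therefore will
     * not reconnect on its own.
     */
    static boolean shouldHardReconnect(ClientState state) {
        return !state.isAlive();
    }
}
```

In the Zoodiscovery watcher this would presumably amount to wrapping the
existing close()/new ZooKeeper(...) pair in
if (!this.readKeeper.getState().isAlive()) { ... }, so that a plain
disconnect is left to the client's own reconnect logic.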

C

On Wed, Apr 15, 2015 at 1:46 PM, Yuriy Lopotun <yuriy.lopotun@gmail.com>
wrote:

> Hi guys,
>
>
> In our client-server OSGI application we are using ECF Zoodiscovery
> provider for remote services discovery which uses Zookeeper (v.3.3.3) under
> the hood. When testing the application resiliency, we noticed that when
> unplugging/plugging back the network cable, the client in some cases
> doesn’t get back remote OSGI services from the server.
>
>
> I started debugging this use case and found out that in case of session
> timeout both Zookeeper internally and Zoodiscovery try reconnecting
> simultaneously:
>
> 1) Zookeeper internally:
>
> In ClientCnxn.SendThread.run(), on a SessionTimeoutException, it closes the
> socket connection in cleanup(), sends the disconnect event to watchers, and
> reconnects in startConnect().
>
> 2) Zoodiscovery:
>
> Watcher receives the disconnect event from Zookeeper and closes/reopens a
> new connection by:
>
> // discard the current stale reader
> this.readKeeper.close();
> // try reconnecting
> this.readKeeper = new ZooKeeper(this.ip, 3000, this);
>
>
>
> This results in a connect-disconnect-connect sequence instead of a single
> connect, since Zoodiscovery closes the connection that Zookeeper has just
> reopened and creates a new one. Moreover, it sometimes leaves the client in
> an inconsistent state: the connection finally gets re-established, but the
> client doesn’t ask the server for the remote services.
>
>
> I think that the issue in this case is on the Zoodiscovery’s side – it
> should not trigger hard disconnect/reconnect in cases when Zookeeper does
> it internally. However, I’m not sure how it could distinguish these cases,
> because Zookeeper sends an identical disconnect event regardless of whether
> or not it’s going to re-connect internally:
>
> eventThread.queueEvent(new WatchedEvent(
>                        Event.EventType.None,
>                        Event.KeeperState.Disconnected,
>                        null));
>
> appears both in the catch block within the while loop of
> ClientCnxn.SendThread and just after the loop.
>
>
> So, I wanted to ask for your suggestions on how to better handle the
> disconnect cases, to avoid double reconnects and initiate a hard reconnect
> from Zoodiscovery only when Zookeeper doesn’t do it internally.
>
>
> Thanks,
>
> Yuriy
>
