zookeeper-user mailing list archives

From Yuriy Lopotun <yuriy.lopo...@gmail.com>
Subject Re: Zookeeper-Zoodiscovery auto reconnect issue
Date Wed, 15 Apr 2015 20:56:15 GMT
Ah, maybe I didn't understand your suggestion correctly.

If you meant that zooKeeper.state.isAlive() should be checked on the
Zoodiscovery side before triggering a reconnect, then this should indeed
fix the issue.
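Below is a minimal sketch of the liveness check discussed in this thread. It assumes a Zoodiscovery-style watcher that receives every Disconnected event. To keep the example self-contained, ClientState, SimulatedClient and DiscoveryWatcher are hypothetical stand-ins rather than ZooKeeper or ECF classes. ClientState only mirrors the distinction Camille describes: closed/auth_failed versus merely disconnected. In real code the check would be zooKeeper.getState().isAlive() on the actual handle.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for ZooKeeper.States: only isAlive() matters here.
// CLOSED and AUTH_FAILED are terminal; every other state means "still alive".
enum ClientState {
    CONNECTING, ASSOCIATING, CONNECTED, CLOSED, AUTH_FAILED;

    boolean isAlive() {
        return this != CLOSED && this != AUTH_FAILED;
    }
}

// Hypothetical stand-in for a ZooKeeper client handle.
final class SimulatedClient {
    ClientState state = ClientState.CONNECTING;

    void close() {
        state = ClientState.CLOSED;
    }
}

// Hypothetical discovery-side watcher that rebuilds the handle only when
// the client has reached a terminal state and will not reconnect by itself.
final class DiscoveryWatcher {
    private final Supplier<SimulatedClient> factory;
    private SimulatedClient client;
    private int rebuilds = 0;

    DiscoveryWatcher(Supplier<SimulatedClient> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    // Invoked for every Disconnected event the client library delivers.
    void onDisconnected() {
        if (client.state.isAlive()) {
            // Still CONNECTING/ASSOCIATING/CONNECTED: the library is already
            // reconnecting, so closing the handle here would produce the
            // connect-disconnect-connect sequence described in the thread.
            return;
        }
        // CLOSED or AUTH_FAILED: no internal reconnect will happen,
        // so discard the stale handle and create a new one.
        client.close();
        client = factory.get();
        rebuilds++;
    }

    SimulatedClient client() {
        return client;
    }

    int rebuilds() {
        return rebuilds;
    }
}
```

Because this watcher never closes a handle that is still alive, it cannot race with the client's own startConnect() call, which is the scenario described in the quoted messages.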

Thanks,
Yuriy

2015-04-15 16:49 GMT-04:00 Yuriy Lopotun <yuriy.lopotun@gmail.com>:

> Thanks for your reply.
> I agree that zooKeeper.getState().isAlive() is a good way to check the
> state.
>
> But notice that after sending the Disconnected event (inside the while
> loop), the send thread almost immediately proceeds to the next loop
> iteration. So "while (zooKeeper.state.isAlive())" has a high chance of
> still evaluating to true at that moment, because Zoodiscovery is at the
> same time triggering a chain of method invocations:
> ZooKeeper.close() -> ClientCnxn.close() -> disconnect() ->
> sendThread.close() -> zooKeeper.state = States.CLOSED
> which is likely to take longer to execute than a single condition
> evaluation.
>
> So ZooKeeper will invoke startConnect() at least once, which triggers a
> reconnect. Meanwhile, as I mentioned, Zoodiscovery has triggered
> ZooKeeper.close(), which will try to close the new ZooKeeper connection.
> I'm trying to find a way to avoid this situation...
>
> Yuriy
>
> 2015-04-15 16:15 GMT-04:00 Camille Fournier <camille@apache.org>:
>
>> So we have the notion of state that you can check.
>> zooKeeper.getState().isAlive() will tell you whether the client is
>> actually alive or not.
>>
>> Looking through the code, I'm not 100% sure why we send the Disconnected
>> state change after the while loop, or whether the code ever would, since
>> the state should not be alive at that point (otherwise it wouldn't have
>> left the while loop).
>>
>> In general, though, it sounds like a bug on the discovery side, as you
>> said. A check for state liveness (are we closed/auth_failed, or just
>> disconnected?) should fix this, I think.
>>
>> C
>>
>> On Wed, Apr 15, 2015 at 1:46 PM, Yuriy Lopotun <yuriy.lopotun@gmail.com>
>> wrote:
>>
>> > Hi guys,
>> >
>> > In our client-server OSGi application we use the ECF Zoodiscovery
>> > provider for remote services discovery, which uses ZooKeeper (v3.3.3)
>> > under the hood. When testing the application's resiliency, we noticed
>> > that after unplugging and plugging back the network cable, the client
>> > in some cases doesn't get the remote OSGi services back from the server.
>> >
>> > I started debugging this use case and found that, on session timeout,
>> > both ZooKeeper internally and Zoodiscovery try to reconnect
>> > simultaneously:
>> >
>> > 1) ZooKeeper internally:
>> >
>> > In ClientCnxn.SendThread.run(), on a SessionTimeoutException it closes
>> > the socket connection in cleanup(), sends the Disconnected event to
>> > watchers and reconnects in startConnect().
>> >
>> > 2) Zoodiscovery:
>> >
>> > The watcher receives the Disconnected event from ZooKeeper, closes the
>> > current connection and opens a new one:
>> >
>> > // discard the current stale reader
>> > this.readKeeper.close();
>> > // try reconnecting
>> > this.readKeeper = new ZooKeeper(this.ip, 3000, this);
>> >
>> > This results in a connect-disconnect-connect sequence (since
>> > Zoodiscovery closes the connection ZooKeeper has just reopened and
>> > creates a new one) instead of a single connect. Moreover, this
>> > sometimes leaves the client in an inconsistent state: the connection
>> > is eventually re-established, but the client doesn't ask the server
>> > for the remote services.
>> >
>> > I think the issue here is on Zoodiscovery's side: it should not
>> > trigger a hard disconnect/reconnect when ZooKeeper reconnects
>> > internally. However, I'm not sure how it could distinguish these
>> > cases, because ZooKeeper sends an identical Disconnected event
>> > regardless of whether it is going to reconnect internally:
>> >
>> > eventThread.queueEvent(new WatchedEvent(
>> >                        Event.EventType.None,
>> >                        Event.KeeperState.Disconnected,
>> >                        null));
>> >
>> > This call appears both in the ClientCnxn.SendThread catch block inside
>> > the while loop and just after the loop.
>> >
>> > So I wanted to ask for your suggestions on how best to handle the
>> > disconnect cases, so as to avoid double reconnects and initiate a hard
>> > reconnect from Zoodiscovery only when ZooKeeper doesn't reconnect
>> > internally.
>> >
>> > Thanks,
>> > Yuriy
>> >
>>
>
>
