zookeeper-user mailing list archives

From Yuriy Lopotun <yuriy.lopo...@gmail.com>
Subject Re: Zookeeper-based discovery provider: infinite re-connect loop after server restart
Date Thu, 23 Apr 2015 17:34:41 GMT
It looks like there is an open bug for the issue described:
https://issues.apache.org/jira/browse/ZOOKEEPER-832

There was some discussion in the comments, but it looks like no agreed
solution has been found yet.

Yuriy

2015-04-22 18:55 GMT-04:00 Yuriy Lopotun <yuriy.lopotun@gmail.com>:

> Hi guys,
>
>
>
> In our client-server OSGi application we are using the ECF Zookeeper-based
> discovery provider for remote services discovery (based on Zookeeper
> v.3.3.6).
>
> In standalone mode the plugin opens a dedicated Zookeeper connection
> from the client to each of the servers.
>
>
> When testing the application's resiliency, we noticed that when we restart
> the server, the connection is never re-established. In the server logs I
> found the following:
>
> 2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO
> org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /
> 10.36.64.250:53022
>
> 2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG
> org.apac.zook.serv.NIOServerCnxn - Session establishment request from
> client /10.36.64.250:53022 client's lastZxid is 0x8
>
> 2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO
> org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /
> 10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client
> must try another server
>
> 2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO
> org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /
> 10.36.64.250:53022 (no session established for client)
>
>
>
> As far as I understand, this is expected behaviour, since the server
> (due to the restart) cleaned up its DB and reset the transaction id.
>
>
> The problem in this case is that the client session keeps trying to
> reconnect to this one and only server, which results in an infinite loop:
>
> 2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(
> ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn -
> Opening socket connection to server
> ca-rd-mbernard.miranda.com/10.36.64.250:2001
>
> 2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(
> ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Socket
> connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001,
> initiating session
>
> 2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(
> ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn -
> Session establishment request sent on
> ca-rd-mbernard.miranda.com/10.36.64.250:2001
>
> 2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(
> ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Unable
> to read additional data from server sessionid 0x14ce32e178c0002, likely
> server has closed socket, closing socket connection and attempting reconnect
>
>
>
> Again, I think this is correct behaviour when there are several servers.
> But in our case there is always just one.
>
> So I wanted to ask for a suggestion: what do you think we can do in this
> case to achieve automatic reconnect?
>
> I thought maybe we could close the connection on such an error when there
> is only one server, instead of retrying? Maybe this enhancement has
> already been made in more recent versions and could be back-ported?
>
>
>
> Thanks,
>
> Yuriy
>
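
A possible client-side workaround for the single-server case, along the
lines of the "close instead of retrying" idea in the quoted message, is to
give up on the old session once it has been disconnected for too long and
open a new handle. As far as I know, a freshly created ZooKeeper client
handle starts with a last-seen zxid of 0, so the restarted server would have
no reason to refuse it. The sketch below is only an illustration in plain
Java: Handle, HandleFactory, SessionResetGuard and the timeout are
hypothetical names, not part of ZooKeeper or ECF. In a real client, Handle
would wrap org.apache.zookeeper.ZooKeeper, and onDisconnected/onConnected
would be driven from the Watcher's Disconnected and SyncConnected events.

```java
// Hypothetical abstraction over a client handle. In a real client this would
// wrap org.apache.zookeeper.ZooKeeper, whose close() ends the session.
interface Handle {
    void close();
}

// Hypothetical factory: each call is assumed to open a brand-new session.
interface HandleFactory {
    Handle create();
}

// Tracks how long the current handle has been disconnected and, past a
// threshold, discards it and creates a new one instead of retrying forever.
final class SessionResetGuard {
    private final HandleFactory factory;
    private final long maxDisconnectedMillis;
    private Handle current;
    private long disconnectedSince = -1; // -1 means "currently connected"
    private int resets = 0;

    SessionResetGuard(HandleFactory factory, long maxDisconnectedMillis) {
        this.factory = factory;
        this.maxDisconnectedMillis = maxDisconnectedMillis;
        this.current = factory.create();
    }

    // Call when the client reports that it lost the connection.
    synchronized void onDisconnected(long nowMillis) {
        if (disconnectedSince < 0) {
            disconnectedSince = nowMillis;
        }
    }

    // Call when the client reports that it is connected again.
    synchronized void onConnected() {
        disconnectedSince = -1;
    }

    // Call periodically. Returns true if the old handle was closed and a
    // new one was created because the disconnect lasted too long.
    synchronized boolean checkAndReset(long nowMillis) {
        if (disconnectedSince < 0
                || nowMillis - disconnectedSince < maxDisconnectedMillis) {
            return false;
        }
        current.close();
        current = factory.create();
        // The new handle counts as disconnected until it reports otherwise,
        // so another reset happens if it cannot connect either.
        disconnectedSince = nowMillis;
        resets++;
        return true;
    }

    synchronized Handle current() {
        return current;
    }

    synchronized int resets() {
        return resets;
    }
}
```

The guard would be polled from a timer, for example with
guard.checkAndReset(System.currentTimeMillis()). Anything that was
registered through the old handle (ephemeral nodes, watches) would have to
be re-created by the application after a reset.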
