zookeeper-user mailing list archives

From Yuriy Lopotun <yuriy.lopo...@gmail.com>
Subject Zookeeper-based discovery provider: infinite re-connect loop after server restart
Date Wed, 22 Apr 2015 22:55:38 GMT
Hi guys,



In our client-server OSGi application we use the ECF ZooKeeper-based
discovery provider for remote service discovery (based on ZooKeeper
v3.3.6).

In standalone mode the plugin opens a dedicated ZooKeeper connection from
the client to each of the servers.


While testing the application's resiliency, we noticed that after we restart
the server, the connection is never re-established. In the server logs I
found the following:

2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /10.36.64.250:53022

2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG org.apac.zook.serv.NIOServerCnxn - Session establishment request from client /10.36.64.250:53022 client's lastZxid is 0x8

2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client must try another server

2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /10.36.64.250:53022 (no session established for client)



As far as I understand, this is expected behaviour: because of the restart,
the server cleaned up its DB and reset its transaction ID, so its last zxid
(0x7) is now behind the one the client has already seen (0x8).
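
For illustration, the comparison implied by the "Refusing session request"
log line can be sketched roughly as below. This is only a reading of the log
message; the class and method names are hypothetical and are not taken from
the ZooKeeper source.

```java
// Illustrative sketch only: ZxidCheck and shouldRefuse are hypothetical names,
// not part of the ZooKeeper API.
final class ZxidCheck {
    /**
     * Returns true when the client has seen a newer transaction id than the
     * server knows about, which is the condition the log message describes.
     */
    static boolean shouldRefuse(long clientLastZxid, long serverLastZxid) {
        return clientLastZxid > serverLastZxid;
    }
}
```

With the values from the log (client 0x8, server 0x7) the check refuses the
session, and a client starting a brand-new session (last zxid 0) would pass.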


The problem in this case is that the client session keeps trying to
reconnect to this one and only server, which results in an infinite loop:

2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Opening socket connection to server ca-rd-mbernard.miranda.com/10.36.64.250:2001

2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Socket connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001, initiating session

2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn - Session establishment request sent on ca-rd-mbernard.miranda.com/10.36.64.250:2001

2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO  org.apac.zook.ClientCnxn - Unable to read additional data from server sessionid 0x14ce32e178c0002, likely server has closed socket, closing socket connection and attempting reconnect



Again, I think this is correct behaviour when there are several servers.
But in our case there is always exactly one.

So I wanted to ask for a suggestion: what do you think we can do in this
case to achieve an automatic reconnect?

I thought that, when there is only one server, we could perhaps close the
connection on such an exception instead of retrying? Or maybe this
enhancement has already been made in a more recent version and could be
back-ported?
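
One possible client-side workaround, sketched under assumptions and not
verified against the ECF provider: track how long the handle has been
disconnected and, past a threshold, close it and open a fresh one, so the new
session does not carry the old last-seen zxid. Handle, ReconnectGuard and the
threshold are hypothetical names for illustration. With the real client,
Handle would wrap org.apache.zookeeper.ZooKeeper, and onConnected and
onDisconnected would be driven from Watcher.process on
KeeperState.SyncConnected and KeeperState.Disconnected.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a client handle; a real one would wrap
// org.apache.zookeeper.ZooKeeper and delegate close() to it.
interface Handle {
    void close();
}

// Hypothetical helper: replaces the handle after a long disconnect.
final class ReconnectGuard {
    private final Supplier<Handle> factory;      // creates a brand-new handle/session
    private final long maxDisconnectedMillis;    // assumed threshold, tune as needed
    private Handle handle;
    private long disconnectedSince = -1;         // -1 means "currently connected"

    ReconnectGuard(Supplier<Handle> factory, long maxDisconnectedMillis) {
        this.factory = factory;
        this.maxDisconnectedMillis = maxDisconnectedMillis;
        this.handle = factory.get();
    }

    // Call when the watcher reports a connected state.
    synchronized void onConnected() {
        disconnectedSince = -1;
    }

    // Call when the watcher reports a disconnected state.
    synchronized void onDisconnected(long nowMillis) {
        if (disconnectedSince < 0) {
            disconnectedSince = nowMillis;
        }
    }

    // Call periodically; returns true if the handle was replaced.
    synchronized boolean check(long nowMillis) {
        if (disconnectedSince < 0
                || nowMillis - disconnectedSince < maxDisconnectedMillis) {
            return false;
        }
        handle.close();
        handle = factory.get();
        // Restart the timer so a still-failing handle is replaced again later.
        disconnectedSince = nowMillis;
        return true;
    }

    synchronized Handle current() {
        return handle;
    }
}
```

Any watches and ephemeral nodes belonging to the old session would be lost,
so the application would have to register them again after the replacement.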



Thanks,

Yuriy
