zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: zk keeps disconnecting and reconnecting
Date Tue, 23 Aug 2011 23:07:23 GMT
Jun,

I don't see what your connection loss problem is, but I do have a question
about why you are using 4 ZK servers.  This doesn't increase availability or
durability over having just three ZK servers.

On Tue, Aug 23, 2011 at 2:58 PM, Jun Rao <junrao@gmail.com> wrote:

> I have a ZK server cluster running on 4 nodes (version 3.3.3) and a few ZK
> clients (version 3.3.0). After the clients have been running for a while,
> each of them starts to constantly disconnect and reconnect to the ZK
> server.
> On the client, I saw lots of entries like the following:
> 2011/08/23 14:42:06.579 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app28.stg/172.18.98.101:12913
> 2011/08/23 14:42:06.579 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
> established to esv4-app28.stg/172.18.98.101:12913, initiating session
> 2011/08/23 14:42:06.581 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Session establishment
> complete on server esv4-app28.stg/172.18.98.101:12913, sessionid =
> 0x331f77a1ed80004, negotiated timeout = 6000
> 2011/08/23 14:42:06.581 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> state changed (SyncConnected)
> 2011/08/23 14:42:06.583 WARN [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Session 0x331f77a1ed80004
> for server esv4-app28.stg/172.18.98.101:12913, unexpected error, closing
> socket connection and attempting reconnect
> java.lang.StringIndexOutOfBoundsException: String index out of range:
> -3        at java.lang.String.substring(String.java:1937)
>        at java.lang.String.substring(String.java:1904)
>        at
>
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
> 2011/08/23 14:42:06.683 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> state changed (Disconnected)
> 2011/08/23 14:42:07.510 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app29.stg/172.18.98.89:12913
> 2011/08/23 14:42:07.511 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
> established to esv4-app29.stg/172.18.98.89:12913, initiating session
> 2011/08/23 14:42:07.512 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment
> complete on server esv4-app29.stg/172.18.98.89:12913, sessionid = 0x331f
> 77a1ed80004, negotiated timeout = 6000
> 2011/08/23 14:42:07.513 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> state changed (SyncConnected)
> 2011/08/23 14:42:07.552 WARN [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x331f77a1ed80004
> for server esv4-app29.stg/172.18.98.89:12913, unexpected error, clos
> ing socket connection and attempting reconnect
> java.lang.StringIndexOutOfBoundsException: String index out of range:
> -3        at java.lang.String.substring(String.java:1937)
>        at java.lang.String.substring(String.java:1904)
>        at
>
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
> 2011/08/23 14:42:07.653 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> state changed (Disconnected)
>
> On the ZK server, I saw lots of entries like these:
> 2011-08-23 14:34:28,802 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /172.18.98.95:345072011-08-23 14:34:28,803 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
> attempting to renew session 0x231db9b5f3f010e at /172.18.98.95:34507
> 2011-08-23 14:34:28,803 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
> 1581489222990563982011-08-23 14:34:28,803 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established
> session
> 0x231db9b5f3f010e with negotiated timeout 6000 for client /
> 172.18.98.95:345
> 072011-08-23 14:34:28,805 - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
> read additional data from client sessionid 0x231db9b5f3
> f010e, likely client has closed socket
> 2011-08-23 14:34:28,805 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
> client /172.18.98.95:34507 which had sessionid 0x231db9b5
> f3f010e
> 2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /172.18.98.94:588812011-08-23 14:34:29,512 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
> attempting to renew session 0x231db9b5f3f010c at /172.18.98.94:58881
> 2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
> 1581489222990563962011-08-23 14:34:29,513 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established
> session
> 0x231db9b5f3f010c with negotiated timeout 6000 for client /
> 172.18.98.94:588
> 812011-08-23 14:34:29,514 - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
> read additional data from client sessionid 0x231db9b5f3
> f010c, likely client has closed socket
> 2011-08-23 14:34:29,514 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
> client /172.18.98.94:58881 which had sessionid 0x231db9b5
> f3f010c
>
> Anybody knows what could be the problem? A few other things that may help
> understand the problem: each ZK client is also connecting to another
> cluster
> of ZK servers in the same JVM and all the ZK connections use the ZK
> namespace (chroot).
>
> Thanks,
>
> Jun
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message