zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mahadev Konar <maha...@hortonworks.com>
Subject Re: zk keeps disconnecting and reconnecting
Date Wed, 24 Aug 2011 03:42:42 GMT
This looks very strange to me. Is it possible that there is version
issue with ZK server and client? Mostly looks like the client is
getting garbled data from the server somehow. I cant imagine how that
would happen. Is it possible to run one of the servers on tracemode
and see the tracelog on whats going on?

mahadev

On Tue, Aug 23, 2011 at 2:58 PM, Jun Rao <junrao@gmail.com> wrote:
> I have a ZK server cluster running on 4 nodes (version 3.3.3) and a few ZK
> clients (version 3.3.0). After the clients have been running for a while,
> each of them starts to constantly disconnect and reconnect to the ZK server.
> On the client, I saw lots of entries like the following:
> 2011/08/23 14:42:06.579 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to
> server esv4-app28.stg/172.18.98.101:12913
> 2011/08/23 14:42:06.579 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
> established to esv4-app28.stg/172.18.98.101:12913, initiating session
> 2011/08/23 14:42:06.581 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Session establishment
> complete on server esv4-app28.stg/172.18.98.101:12913, sessionid =
> 0x331f77a1ed80004, negotiated timeout = 6000
> 2011/08/23 14:42:06.581 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
> state changed (SyncConnected)
> 2011/08/23 14:42:06.583 WARN [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Session 0x331f77a1ed80004
> for server esv4-app28.stg/172.18.98.101:12913, unexpected error, closing
> socket connection and attempting reconnect
> java.lang.StringIndexOutOfBoundsException: String index out of range:
> -3        at java.lang.String.substring(String.java:1937)
>        at java.lang.String.substring(String.java:1904)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
> 2011/08/23 14:42:06.683 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
> state changed (Disconnected)
> 2011/08/23 14:42:07.510 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to
> server esv4-app29.stg/172.18.98.89:12913
> 2011/08/23 14:42:07.511 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
> established to esv4-app29.stg/172.18.98.89:12913, initiating session
> 2011/08/23 14:42:07.512 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment
> complete on server esv4-app29.stg/172.18.98.89:12913, sessionid = 0x331f
> 77a1ed80004, negotiated timeout = 6000
> 2011/08/23 14:42:07.513 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
> state changed (SyncConnected)
> 2011/08/23 14:42:07.552 WARN [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x331f77a1ed80004
> for server esv4-app29.stg/172.18.98.89:12913, unexpected error, clos
> ing socket connection and attempting reconnect
> java.lang.StringIndexOutOfBoundsException: String index out of range:
> -3        at java.lang.String.substring(String.java:1937)
>        at java.lang.String.substring(String.java:1904)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
> 2011/08/23 14:42:07.653 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
> state changed (Disconnected)
>
> On the ZK server, I saw lots of entries like these:
> 2011-08-23 14:34:28,802 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /172.18.98.95:345072011-08-23 14:34:28,803 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
> attempting to renew session 0x231db9b5f3f010e at /172.18.98.95:34507
> 2011-08-23 14:34:28,803 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
> 1581489222990563982011-08-23 14:34:28,803 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session
> 0x231db9b5f3f010e with negotiated timeout 6000 for client /172.18.98.95:345
> 072011-08-23 14:34:28,805 - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
> read additional data from client sessionid 0x231db9b5f3
> f010e, likely client has closed socket
> 2011-08-23 14:34:28,805 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
> client /172.18.98.95:34507 which had sessionid 0x231db9b5
> f3f010e
> 2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /172.18.98.94:588812011-08-23 14:34:29,512 - INFO
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
> attempting to renew session 0x231db9b5f3f010c at /172.18.98.94:58881
> 2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
> 1581489222990563962011-08-23 14:34:29,513 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session
> 0x231db9b5f3f010c with negotiated timeout 6000 for client /172.18.98.94:588
> 812011-08-23 14:34:29,514 - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to
> read additional data from client sessionid 0x231db9b5f3
> f010c, likely client has closed socket
> 2011-08-23 14:34:29,514 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
> client /172.18.98.94:58881 which had sessionid 0x231db9b5
> f3f010c
>
> Anybody knows what could be the problem? A few other things that may help
> understand the problem: each ZK client is also connecting to another cluster
> of ZK servers in the same JVM and all the ZK connections use the ZK
> namespace (chroot).
>
> Thanks,
>
> Jun
>

Mime
View raw message