zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jun Rao <jun...@gmail.com>
Subject Re: zk keeps disconnecting and reconnecting
Date Mon, 29 Aug 2011 04:34:17 GMT
We cleaned up all ZK server data and restarted both the servers and the
clients. We also upgraded the client to 3.3.3. After running for a day and a
half, the same weird reconnect issue showed up in one of the clients. Our ZK
connection string is
esv4-app27.stg:12913,esv4-app28.stg:12913,esv4-app29.stg:12913,esv4-app30.stg:12913/kafka-tracking.
We are on java 1_6_0_21 on RedHat Linux. Note that our ZK client has been
running fine until we upgraded the client code recently. The new version
makes one extra ZK connection to the same ZK cluster. Here are the log
entries and ZK client keeps connecting and disconnecting from each of the 4
ZK servers.

2011/08/26 10:58:39.864 INFO [ClientCnxn]
[main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to
server esv4-app27.stg/172.18.98.88:12913
2011/08/26 10:58:39.865 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Socket connection
established to esv4-app27.stg/172.18.98.88:12913, initiating session
2011/08/26 10:58:39.867 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Session establishment
complete on server esv4-app27.stg/172.18.98.88:12913, sessionid =
0x131fddd84bc0006, negotiated timeout = 6000
2011/08/26 10:58:39.867 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (SyncConnected)
2011/08/26 10:58:39.868 WARN [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Session 0x131fddd84bc0006
for server esv4-app27.stg/172.18.98.88:12913, unexpected error, closing
socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range: -3
        at java.lang.String.substring(String.java:1937)
        at java.lang.String.substring(String.java:1904)
        at
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:794)
        at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:881)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
2011/08/26 10:58:39.969 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)
2011/08/26 10:58:40.276 INFO [ClientCnxn]
[main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to
server esv4-app29.stg/172.18.98.89:12913
2011/08/26 10:58:40.276 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
established to esv4-app29.stg/172.18.98.89:12913, initiating session
2011/08/26 10:58:40.278 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment
complete on server esv4-app29.stg/172.18.98.89:12913, sessionid =
0x131fddd84bc0006, negotiated timeout = 6000
2011/08/26 10:58:40.278 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (SyncConnected)
2011/08/26 10:58:40.279 WARN [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x131fddd84bc0006
for server esv4-app29.stg/172.18.98.89:12913, unexpected error, closing
socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range: -3
        at java.lang.String.substring(String.java:1937)
        at java.lang.String.substring(String.java:1904)
        at
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:794)
        at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:881)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
2011/08/26 10:58:40.380 INFO [ZkClient] [main-EventThread] [kafka] zookeeper
state changed (Disconnected)
2011/08/26 10:58:40.515 INFO [ClientCnxn]
[main-SendThread(esv4-app29.stg:12913)] [kafka] Opening socket connection to
server esv4-app30.stg/172.18.98.90:12913

Thanks,

Jun

On Thu, Aug 25, 2011 at 9:34 AM, Patrick Hunt <phunt@apache.org> wrote:

> The client seeing the problem in this case is 3.3.0, I see this based
> on the line number in the stack trace not matching up with 3.3.3, with
> 3.3.0 it's this line:
>
>           event.setPath(serverPath.substring(chrootPath.length()));
>
> so for some reason your chroot path is negative in length? That's just
> not possible (string.length() should never return negative).
>
> What JVM are you using? What's your client connect string look like?
>
> Patrick
>
> On Tue, Aug 23, 2011 at 2:58 PM, Jun Rao <junrao@gmail.com> wrote:
> > I have a ZK server cluster running on 4 nodes (version 3.3.3) and a few
> ZK
> > clients (version 3.3.0). After the clients have been running for a while,
> > each of them starts to constantly disconnect and reconnect to the ZK
> server.
> > On the client, I saw lots of entries like the following:
> > 2011/08/23 14:42:06.579 INFO [ClientCnxn]
> > [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection
> to
> > server esv4-app28.stg/172.18.98.101:12913
> > 2011/08/23 14:42:06.579 INFO [ClientCnxn]
> > [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
> > established to esv4-app28.stg/172.18.98.101:12913, initiating session
> > 2011/08/23 14:42:06.581 INFO [ClientCnxn]
> > [main-SendThread(esv4-app28.stg:12913)] [kafka] Session establishment
> > complete on server esv4-app28.stg/172.18.98.101:12913, sessionid =
> > 0x331f77a1ed80004, negotiated timeout = 6000
> > 2011/08/23 14:42:06.581 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> > state changed (SyncConnected)
> > 2011/08/23 14:42:06.583 WARN [ClientCnxn]
> > [main-SendThread(esv4-app28.stg:12913)] [kafka] Session 0x331f77a1ed80004
> > for server esv4-app28.stg/172.18.98.101:12913, unexpected error, closing
> > socket connection and attempting reconnect
> > java.lang.StringIndexOutOfBoundsException: String index out of range:
> > -3        at java.lang.String.substring(String.java:1937)
> >        at java.lang.String.substring(String.java:1904)
> >        at
> >
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
> > at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
> >        at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
> > 2011/08/23 14:42:06.683 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> > state changed (Disconnected)
> > 2011/08/23 14:42:07.510 INFO [ClientCnxn]
> > [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection
> to
> > server esv4-app29.stg/172.18.98.89:12913
> > 2011/08/23 14:42:07.511 INFO [ClientCnxn]
> > [main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
> > established to esv4-app29.stg/172.18.98.89:12913, initiating session
> > 2011/08/23 14:42:07.512 INFO [ClientCnxn]
> > [main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment
> > complete on server esv4-app29.stg/172.18.98.89:12913, sessionid = 0x331f
> > 77a1ed80004, negotiated timeout = 6000
> > 2011/08/23 14:42:07.513 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> > state changed (SyncConnected)
> > 2011/08/23 14:42:07.552 WARN [ClientCnxn]
> > [main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x331f77a1ed80004
> > for server esv4-app29.stg/172.18.98.89:12913, unexpected error, clos
> > ing socket connection and attempting reconnect
> > java.lang.StringIndexOutOfBoundsException: String index out of range:
> > -3        at java.lang.String.substring(String.java:1937)
> >        at java.lang.String.substring(String.java:1904)
> >        at
> >
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753)
> > at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840)
> >        at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089)
> > 2011/08/23 14:42:07.653 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> > state changed (Disconnected)
> >
> > On the ZK server, I saw lots of entries like these:
> > 2011-08-23 14:34:28,802 - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
> > connection from /172.18.98.95:345072011-08-23 14:34:28,803 - INFO
> > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
> > attempting to renew session 0x231db9b5f3f010e at /172.18.98.95:34507
> > 2011-08-23 14:34:28,803 - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
> > 1581489222990563982011-08-23 14:34:28,803 - INFO
> > [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established
> session
> > 0x231db9b5f3f010e with negotiated timeout 6000 for client /
> 172.18.98.95:345
> > 072011-08-23 14:34:28,805 - WARN  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable
> to
> > read additional data from client sessionid 0x231db9b5f3
> > f010e, likely client has closed socket
> > 2011-08-23 14:34:28,805 - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
> > client /172.18.98.95:34507 which had sessionid 0x231db9b5
> > f3f010e
> > 2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket
> > connection from /172.18.98.94:588812011-08-23 14:34:29,512 - INFO
> > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client
> > attempting to renew session 0x231db9b5f3f010c at /172.18.98.94:58881
> > 2011-08-23 14:34:29,512 - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client:
> > 1581489222990563962011-08-23 14:34:29,513 - INFO
> > [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established
> session
> > 0x231db9b5f3f010c with negotiated timeout 6000 for client /
> 172.18.98.94:588
> > 812011-08-23 14:34:29,514 - WARN  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable
> to
> > read additional data from client sessionid 0x231db9b5f3
> > f010c, likely client has closed socket
> > 2011-08-23 14:34:29,514 - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for
> > client /172.18.98.94:58881 which had sessionid 0x231db9b5
> > f3f010c
> >
> > Anybody knows what could be the problem? A few other things that may help
> > understand the problem: each ZK client is also connecting to another
> cluster
> > of ZK servers in the same JVM and all the ZK connections use the ZK
> > namespace (chroot).
> >
> > Thanks,
> >
> > Jun
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message