Return-Path: X-Original-To: apmail-zookeeper-user-archive@www.apache.org Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5F8D97313 for ; Tue, 23 Aug 2011 21:59:27 +0000 (UTC) Received: (qmail 53101 invoked by uid 500); 23 Aug 2011 21:59:27 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 52989 invoked by uid 500); 23 Aug 2011 21:59:26 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 52981 invoked by uid 500); 23 Aug 2011 21:59:26 -0000 Delivered-To: apmail-hadoop-zookeeper-user@hadoop.apache.org Received: (qmail 52974 invoked by uid 99); 23 Aug 2011 21:59:25 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 23 Aug 2011 21:59:25 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of junrao@gmail.com designates 74.125.82.48 as permitted sender) Received: from [74.125.82.48] (HELO mail-ww0-f48.google.com) (74.125.82.48) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 23 Aug 2011 21:59:18 +0000 Received: by wwj26 with SMTP id 26so642742wwj.29 for ; Tue, 23 Aug 2011 14:58:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=tVZgLc0XkhW44ZFc9mWoEopUdjQ68V1fMxsCjybuV3o=; b=hb78inpnjV39JiVrBYC2qvS1eTa6L3lPFVKacfBZTOb+cRryvUTXDp0paJG/B4E7lc xfbgszaOEa6k/BRSwoolujWetj8otSS/JjYCwzjFX5ubb4AsG3bbfr4VX4YUqGO/5zex TQqsIz/wiiDKVythXsZu91mQva7pCbZ6PYHEw= MIME-Version: 1.0 Received: by 10.216.65.203 with SMTP id f53mr1195619wed.54.1314136737609; Tue, 23 Aug 2011 14:58:57 -0700 (PDT) Received: by 10.216.53.193 with HTTP; Tue, 23 Aug 2011 14:58:57 -0700 (PDT) Date: Tue, 23 Aug 2011 14:58:57 -0700 Message-ID: Subject: zk keeps disconnecting and reconnecting From: Jun Rao To: zookeeper-user@hadoop.apache.org Content-Type: multipart/alternative; boundary=000e0ce0b4ea6e77eb04ab334d99 X-Virus-Checked: Checked by ClamAV on apache.org --000e0ce0b4ea6e77eb04ab334d99 Content-Type: text/plain; charset=ISO-8859-1 I have a ZK server cluster running on 4 nodes (version 3.3.3) and a few ZK clients (version 3.3.0). After the clients have been running for a while, each of them starts to constantly disconnect and reconnect to the ZK server. On the client, I saw lots of entries like the following: 2011/08/23 14:42:06.579 INFO [ClientCnxn] [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection to server esv4-app28.stg/172.18.98.101:12913 2011/08/23 14:42:06.579 INFO [ClientCnxn] [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection established to esv4-app28.stg/172.18.98.101:12913, initiating session 2011/08/23 14:42:06.581 INFO [ClientCnxn] [main-SendThread(esv4-app28.stg:12913)] [kafka] Session establishment complete on server esv4-app28.stg/172.18.98.101:12913, sessionid = 0x331f77a1ed80004, negotiated timeout = 6000 2011/08/23 14:42:06.581 INFO [ZkClient] [main-EventThread] [kafka] zookeeper state changed (SyncConnected) 2011/08/23 14:42:06.583 WARN [ClientCnxn] [main-SendThread(esv4-app28.stg:12913)] [kafka] Session 0x331f77a1ed80004 for server esv4-app28.stg/172.18.98.101:12913, unexpected error, closing socket connection and attempting reconnect java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1937) at java.lang.String.substring(String.java:1904) at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753) at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089) 2011/08/23 14:42:06.683 INFO [ZkClient] [main-EventThread] [kafka] zookeeper state changed (Disconnected) 2011/08/23 14:42:07.510 INFO [ClientCnxn] [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection to server esv4-app29.stg/172.18.98.89:12913 2011/08/23 14:42:07.511 INFO [ClientCnxn] [main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection established to esv4-app29.stg/172.18.98.89:12913, initiating session 2011/08/23 14:42:07.512 INFO [ClientCnxn] [main-SendThread(esv4-app29.stg:12913)] [kafka] Session establishment complete on server esv4-app29.stg/172.18.98.89:12913, sessionid = 0x331f 77a1ed80004, negotiated timeout = 6000 2011/08/23 14:42:07.513 INFO [ZkClient] [main-EventThread] [kafka] zookeeper state changed (SyncConnected) 2011/08/23 14:42:07.552 WARN [ClientCnxn] [main-SendThread(esv4-app29.stg:12913)] [kafka] Session 0x331f77a1ed80004 for server esv4-app29.stg/172.18.98.89:12913, unexpected error, clos ing socket connection and attempting reconnect java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1937) at java.lang.String.substring(String.java:1904) at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:753) at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:840) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1089) 2011/08/23 14:42:07.653 INFO [ZkClient] [main-EventThread] [kafka] zookeeper state changed (Disconnected) On the ZK server, I saw lots of entries like these: 2011-08-23 14:34:28,802 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket connection from /172.18.98.95:345072011-08-23 14:34:28,803 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x231db9b5f3f010e at /172.18.98.95:34507 2011-08-23 14:34:28,803 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client: 1581489222990563982011-08-23 14:34:28,803 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session 0x231db9b5f3f010e with negotiated timeout 6000 for client /172.18.98.95:345 072011-08-23 14:34:28,805 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x231db9b5f3 f010e, likely client has closed socket 2011-08-23 14:34:28,805 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.18.98.95:34507 which had sessionid 0x231db9b5 f3f010e 2011-08-23 14:34:29,512 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:NIOServerCnxn$Factory@251] - Accepted socket connection from /172.18.98.94:588812011-08-23 14:34:29,512 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x231db9b5f3f010c at /172.18.98.94:58881 2011-08-23 14:34:29,512 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:Learner@103] - Revalidating client: 1581489222990563962011-08-23 14:34:29,513 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:12913:NIOServerCnxn@1580] - Established session 0x231db9b5f3f010c with negotiated timeout 6000 for client /172.18.98.94:588 812011-08-23 14:34:29,514 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x231db9b5f3 f010c, likely client has closed socket 2011-08-23 14:34:29,514 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.18.98.94:58881 which had sessionid 0x231db9b5 f3f010c Anybody knows what could be the problem? A few other things that may help understand the problem: each ZK client is also connecting to another cluster of ZK servers in the same JVM and all the ZK connections use the ZK namespace (chroot). Thanks, Jun --000e0ce0b4ea6e77eb04ab334d99--