Date: Tue, 18 Jun 2013 20:10:20 +0000 (UTC)
From: "Eric Tschetter (JIRA)"
To: dev@curator.incubator.apache.org
Subject: [jira] [Created] (CURATOR-36) Bad session, infinite connection loop from Curator

Eric Tschetter created CURATOR-36:
-------------------------------------

             Summary: Bad session, infinite connection loop from Curator
                 Key: CURATOR-36
                 URL: https://issues.apache.org/jira/browse/CURATOR-36
             Project: Apache Curator
          Issue Type: Bug
          Components: Framework
    Affects Versions: 2.0.1-incubating
            Reporter: Eric Tschetter

On the ZK clients that I am running Curator on, we sometimes see reconnect loops like the following. These are infinite and happen until the process is restarted.

2013-06-18 19:57:28,660 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
2013-06-18 19:57:28,660 WARN [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - ConnectionStateManager queue full - dropping events to make room
2013-06-18 19:57:28,786 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2013-06-18 19:57:28,786 WARN [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - ConnectionStateManager queue full - dropping events to make room
2013-06-18 19:57:29,048 INFO [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10/10.:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-06-18 19:57:29,049 INFO [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10/10.:2181, initiating session
2013-06-18 19:57:29,160 WARN [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2013-06-18 19:57:29,160 INFO [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10/10.:2181, sessionid = 0x63f5865925e0010, negotiated timeout = 30000
2013-06-18 19:57:29,177 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED

Looking on the ZK side, it looks like:

2013-06-18 20:07:31,215 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session 0x63f5865925e0010 with negotiated timeout 30000 for client /10.:56263
2013-06-18 20:07:31,324 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@639] - Exception causing close of session 0x63f5865925e0010 due to java.io.IOException: Len error 6736057
2013-06-18 20:07:31,325 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.:56263 which had sessionid 0x63f5865925e0010

So, there appears to be some issue with trying to recover the session. I don't know exactly what is causing that issue recovering the session, but it would be awesome if Curator were able to notice that it's failing at getting its session back and just try to make a brand new connection. It appears like this might be doable in reaction to the ConnectionStateManager queue filling up?
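
One way an application might approximate the behaviour requested above, sketched under stated assumptions: count RECONNECTED events in a sliding time window and treat too many of them as a reconnect loop. The class `ReconnectLoopDetector`, its method names, and the threshold and window values are hypothetical and are not part of Curator; the sketch uses only the Java standard library and does not touch the ConnectionStateManager queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not a Curator class): flags a probable reconnect loop
 * when more than maxReconnects RECONNECTED events fall inside a sliding window.
 */
final class ReconnectLoopDetector {
    private final int maxReconnects;
    private final long windowMillis;
    // Timestamps (in milliseconds) of recent RECONNECTED events, oldest first.
    private final Deque<Long> reconnectTimes = new ArrayDeque<>();

    ReconnectLoopDetector(int maxReconnects, long windowMillis) {
        if (maxReconnects < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("maxReconnects and windowMillis must be positive");
        }
        this.maxReconnects = maxReconnects;
        this.windowMillis = windowMillis;
    }

    /**
     * Records one RECONNECTED event observed at nowMillis.
     * Returns true if the number of events still inside the window,
     * including this one, exceeds maxReconnects.
     */
    synchronized boolean recordReconnect(long nowMillis) {
        reconnectTimes.addLast(nowMillis);
        // Evict events that are windowMillis or more older than the current one.
        while (!reconnectTimes.isEmpty()
                && nowMillis - reconnectTimes.peekFirst() >= windowMillis) {
            reconnectTimes.removeFirst();
        }
        return reconnectTimes.size() > maxReconnects;
    }

    /** Forgets all recorded events, e.g. after the client has been rebuilt. */
    synchronized void reset() {
        reconnectTimes.clear();
    }
}
```

In an application, `recordReconnect(System.currentTimeMillis())` could be called from a Curator `ConnectionStateListener` whenever `stateChanged` reports `ConnectionState.RECONNECTED`. When it returns true, the application could close the existing `CuratorFramework` instance and build a new one, which would start a fresh ZooKeeper session. That wiring is an assumption and has not been verified against the failure reported here.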