Date: Fri, 18 Dec 2015 14:37:46 +0000 (UTC)
From: "David Hay (JIRA)"
To: dev@curator.apache.org
Subject: [jira] [Commented] (CURATOR-36) Bad session, infinite connection loop from Curator

    [ https://issues.apache.org/jira/browse/CURATOR-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064007#comment-15064007 ]

David Hay commented on CURATOR-36:
----------------------------------

Is there a Zookeeper issue number you could reference here? We're seeing this issue on one of our applications as well and would like to see if a later version of ZK fixes the issue.

> Bad session, infinite connection loop from Curator
> --------------------------------------------------
>
>                 Key: CURATOR-36
>                 URL: https://issues.apache.org/jira/browse/CURATOR-36
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.0.1-incubating
>            Reporter: Eric Tschetter
>            Assignee: Eric Tschetter
>             Fix For: awaiting-response
>
>
> On the ZK clients that I am running Curator on, we sometimes see reconnect loops like the following. These are infinite and happen until the process is restarted.
> 2013-06-18 19:57:28,660 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
> 2013-06-18 19:57:28,660 WARN [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - ConnectionStateManager queue full - dropping events to make room
> 2013-06-18 19:57:28,786 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
> 2013-06-18 19:57:28,786 WARN [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - ConnectionStateManager queue full - dropping events to make room
> 2013-06-18 19:57:29,048 INFO [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10/10.:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-06-18 19:57:29,049 INFO [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10/10.:2181, initiating session
> 2013-06-18 19:57:29,160 WARN [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
> 2013-06-18 19:57:29,160 INFO [main-SendThread(ip-10:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10/10.:2181, sessionid = 0x63f5865925e0010, negotiated timeout = 30000
> 2013-06-18 19:57:29,177 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
> Looking on the ZK side, it looks like
> 2013-06-18 20:07:31,215 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session 0x63f5865925e0010 with negotiated timeout 30000 for client /10.:56263
> 2013-06-18 20:07:31,324 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@639] - Exception causing close of session 0x63f5865925e0010 due to java.io.IOException: Len error 6736057
> 2013-06-18 20:07:31,325 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.:56263 which had sessionid 0x63f5865925e0010
> So, there appears to be some issue with trying to recover the session. I don't know exactly what is causing that issue recovering the session, but it would be awesome if Curator were able to notice that it's failing at getting its session back and just try to make a brand new connection.
> It appears like this might be doable in reaction to the ConnectionStateManager queue filling up?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
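
The description above asks for a way to notice that session recovery keeps failing. As a rough application-side sketch of that idea (not a Curator feature and not the fix for this issue), the loop could be detected by counting SUSPENDED transitions inside a sliding time window. `FlapDetector`, its thresholds, and its method names are hypothetical and exist only for illustration. The sketch assumes it would be driven from a `ConnectionStateListener.stateChanged` callback whenever the new state is `ConnectionState.SUSPENDED`; what to do once it trips (for example, closing the client and building a new one) is left to the application.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper, not part of Curator: flags a reconnect loop when
 * too many SUSPENDED transitions occur within a sliding time window.
 */
final class FlapDetector {
    private final int maxSuspensions;   // trip once this many events are in the window
    private final long windowMillis;    // length of the sliding window
    private final Deque<Long> suspensions = new ArrayDeque<>();

    FlapDetector(int maxSuspensions, long windowMillis) {
        if (maxSuspensions < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("thresholds must be positive");
        }
        this.maxSuspensions = maxSuspensions;
        this.windowMillis = windowMillis;
    }

    /**
     * Records one SUSPENDED event observed at nowMillis.
     * Returns true when the window now holds at least maxSuspensions events.
     */
    synchronized boolean recordSuspended(long nowMillis) {
        suspensions.addLast(nowMillis);
        // Drop events that have fallen out of the window.
        while (!suspensions.isEmpty()
                && nowMillis - suspensions.peekFirst() > windowMillis) {
            suspensions.removeFirst();
        }
        return suspensions.size() >= maxSuspensions;
    }

    /** Clears the history, e.g. after the application has replaced its client. */
    synchronized void reset() {
        suspensions.clear();
    }
}
```

With the timestamps from the log above (SUSPENDED and RECONNECTED alternating within well under a second), a detector such as `new FlapDetector(10, 60_000L)` would trip after the tenth suspension inside one minute, while occasional, widely spaced suspensions would not.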