Date: Mon, 2 May 2011 17:22:32 -0700
Subject: Re: c client connection issue question
From: Mahadev Konar
To: user@zookeeper.apache.org

Woody,

That seems to be a bug. Can you please open a JIRA for this? Is it
reproducible on a Linux box? I'll try it out on a Linux box to see if I can
duplicate this, though a 5 min timeout seems a little high.

thanks
mahadev

On Wed, Apr 27, 2011 at 11:20 PM, Woody Anderson wrote:
> Hello, I'm a contributor for the node.js zookeeper module:
> https://github.com/yfinkelstein/node-zookeeper
> I'm using zk 3.3.3 for the purposes of this issue.
>
> I'm having an issue when trying to connect when one of my zookeeper servers
> is offline. If the first server attempted is online, all is good.
>
> If the offline server is attempted first, then the client is never able to
> connect to _any_ server. Inside zookeeper.c a connection loss (-4) is
> received, the socket is closed and buffers are cleaned up. It then attempts
> the next server in the list and creates a new socket (which gets the same fd
> as the previously closed socket), but connecting fails, and it continues to
> fail seemingly forever. The nature of this "fail" is not that it gets -4
> connection loss errors, but that zookeeper_interest doesn't find anything
> going on on the socket before the user-provided timeout kicks things out. I
> don't want to have to wait 5 minutes, even if I could make myself.
>
> This is the message that follows the connection loss:
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection timed out (exceeded timeout by 3ms)
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest returned error: -7 - operation timeout
>
> While investigating, I decided to comment out close(zh->fd) in handle_error
> (zookeeper.c#1153). Now everything works (obviously I'm leaking an fd): the
> connection to the second host works immediately.
> This is the behavior I'm looking for, though I clearly don't want to leak
> the fd, so I'm wondering why the fd reuse is causing this issue.
> close() is not returning an error (I checked, even though the current code
> assumes success).
>
> I'm on OS X 10.6.7.
> I tried adding a setsockopt SO_LINGER (though I didn't want that to be a
> solution); it didn't work.
>
> I'm stumped. Thoughts?
> There's full debug trace info here:
> https://github.com/yfinkelstein/node-zookeeper/issues/6
> -w