zookeeper-user mailing list archives

From Mahadev Konar <maha...@apache.org>
Subject Re: c client connection issue question
Date Tue, 03 May 2011 00:22:32 GMT
Woody,
  That seems to be a bug. Can you please open a JIRA for this? Is it
reproducible on a Linux box? I'll try it out on a Linux box to see if I
can duplicate this, though a 5 min timeout seems a little high.

thanks
mahadev
On Wed, Apr 27, 2011 at 11:20 PM, Woody Anderson
<woody.anderson@gmail.com> wrote:
> Hello, I'm a contributor for the node.js zookeeper module:
> https://github.com/yfinkelstein/node-zookeeper
> i'm using zk 3.3.3 for the purposes of this issue:
>
> i'm having an issue when trying to connect when one of my zookeeper servers
> is offline.
> if the first server attempted is online, all is good.
>
> if the offline server is attempted first, then the client is never able to
> connect to _any_ server.
> inside zookeeper.c a connection loss (-4) is received, the socket is closed
> and buffers are cleaned up, it then attempts the next server in the list,
> creates a new socket (which gets the same fd as the previously closed
> socket) and connecting fails, and it continues to fail seemingly forever.
> The nature of this "fail" is not that it gets -4 connection loss errors, but
> that zookeeper_interest doesn't find anything going on on the socket before
> the user provided timeout kicks things out. I don't want to have to wait 5
> minutes, even if i could make myself.
>
> this is the message that follows the connection loss:
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket
> [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection
> timed out (exceeded timeout by 3ms)
> 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest
> returned error: -7 - operation timeout
>
>
> While investigating, i decided to comment out close(zh->fd) in handle_error
> (zookeeper.c#1153)
> now everything works (obviously i'm leaking an fd). Connecting to the
> second host works immediately.
> this is the behavior i'm looking for, though i clearly don't want to leak
> the fd, so i'm wondering why the fd re-use is causing this issue.
> close() is not returning an error (i checked even though current code
> assumes success).
>
> i'm on osx 10.6.7
> i tried adding a setsockopt so_linger (though i didn't want that to be a
> solution), it didn't work.
>
> i'm stumped. thoughts?
> there's full debug trace info here:
> https://github.com/yfinkelstein/node-zookeeper/issues/6
> -w
>
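
On the fd number being the same after close(): that part by itself is
expected, since POSIX hands out the lowest-numbered free descriptor. A
minimal sketch to confirm it on a given box, assuming a single-threaded
process (reopen_socket is a made-up helper name, not anything in
zookeeper.c):

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Open a TCP socket, close it, then open a second one.
 * Stores both descriptor numbers so the caller can compare them.
 * Returns 0 on success, -1 if any call fails. */
int reopen_socket(int *first_fd, int *second_fd)
{
    int a = socket(AF_INET, SOCK_STREAM, 0);
    if (a < 0)
        return -1;
    if (close(a) != 0)
        return -1;

    /* With nothing else opened in between, the kernel gives the
     * lowest free descriptor number back, i.e. the one just closed. */
    int b = socket(AF_INET, SOCK_STREAM, 0);
    if (b < 0)
        return -1;

    *first_fd = a;
    *second_fd = b;
    close(b);
    return 0;
}
```

If the two numbers match here, the reuse is normal, and anything outside
the client that still tracks the old fd number (an event loop watcher,
for example) would be worth checking. That last part is a guess, not a
confirmed cause.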

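For comparison, the behavior being asked for (give up on a dead server
quickly and move to the next one in the list) can be sketched outside
the client with a plain non-blocking connect() and poll(). This is not
the logic in zookeeper.c; connect_one, connect_first_available,
loopback_addr and close_keep_errno are hypothetical names used only for
this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build a 127.0.0.1 address for the given port (host byte order). */
struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in a = {0};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    return a;
}

/* Close fd without losing the errno that describes the failure. */
static int close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* One non-blocking connect attempt, waiting at most timeout_ms.
 * Returns a connected (still non-blocking) fd, or -1 with errno set.
 * An interrupted poll() is treated as a failure in this sketch. */
static int connect_one(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_keep_errno(fd);

    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
        return fd;                       /* connected immediately */
    if (errno != EINPROGRESS)
        return close_keep_errno(fd);     /* e.g. ECONNREFUSED */

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0)
        errno = ETIMEDOUT;               /* nothing happened in time */
    if (rc <= 0)
        return close_keep_errno(fd);

    /* Writable does not mean connected: read the pending error. */
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return close_keep_errno(fd);
    if (err != 0) {
        errno = err;
        return close_keep_errno(fd);
    }
    return fd;
}

/* Try each address in order; return the first connected fd, or -1. */
int connect_first_available(const struct sockaddr_in *addrs, size_t n,
                            int timeout_ms)
{
    for (size_t i = 0; i < n; i++) {
        int fd = connect_one(&addrs[i], timeout_ms);
        if (fd >= 0)
            return fd;
    }
    return -1;
}
```

Called with an array such as { loopback_addr(5020), loopback_addr(p) },
where p is whatever port the live server listens on, a refused or silent
first server costs at most timeout_ms before the second is tried. The
per-attempt timeout is the design choice that keeps one bad server from
using up the whole session timeout.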


-- 
thanks
mahadev
@mahadevkonar