zookeeper-user mailing list archives

From Woody Anderson <woody.ander...@gmail.com>
Subject Re: c client connection issue question
Date Tue, 03 May 2011 01:20:30 GMT
done. the issue was initially reported on linux/ubuntu, i reproduced on mac 10.6.7
https://issues.apache.org/jira/browse/ZOOKEEPER-1057

On Mon, May 2, 2011 at 5:22 PM, Mahadev Konar <mahadev@apache.org> wrote:

> Woody,
>  That seems to be a bug. Can you please open a jira for this? Is it
> reproducible on a linux box? I'll try it out on a linux box to see if I
> can duplicate this, though a 5 min timeout seems a little high.
>
> thanks
> mahadev
> On Wed, Apr 27, 2011 at 11:20 PM, Woody Anderson
> <woody.anderson@gmail.com> wrote:
> > Hello, I'm a contributor for the node.js zookeeper module:
> > https://github.com/yfinkelstein/node-zookeeper
> > i'm using zk 3.3.3 for the purposes of this issue:
> >
> > i'm having an issue when trying to connect when one of my zookeeper servers
> > is offline.
> > if the first server attempted is online, all is good.
> >
> > if the offline server is attempted first, then the client is never able to
> > connect to _any_ server.
> > inside zookeeper.c a connection loss (-4) is received, the socket is closed
> > and buffers are cleaned up, it then attempts the next server in the list,
> > creates a new socket (which gets the same fd as the previously closed
> > socket) and connecting fails, and it continues to fail seemingly forever.
> > The nature of this "fail" is not that it gets -4 connection loss errors, but
> > that zookeeper_interest doesn't find anything going on on the socket before
> > the user provided timeout kicks things out. I don't want to have to wait 5
> > minutes, even if i could make myself.
> >
> > this is the message that follows the connection loss:
> > 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection timed out (exceeded timeout by 3ms)
> > 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest returned error: -7 - operation timeout
> >
> >
> > While investigating, i decided to comment out close(zh->fd) in handle_error
> > (zookeeper.c#1153)
> > now everything works (obviously i'm leaking an fd). Connecting to the
> > second host works immediately.
> > this is the behavior i'm looking for, though i clearly don't want to leak
> > the fd, so i'm wondering why the fd re-use is causing this issue.
> > close() is not returning an error (i checked even though current code
> > assumes success).
> >
> > i'm on osx 10.6.7
> > i tried adding a setsockopt so_linger (though i didn't want that to be a
> > solution), it didn't work.
> >
> > i'm stumped. thoughts?
> > there's full debug trace info here:
> > https://github.com/yfinkelstein/node-zookeeper/issues/6
> > -w
> >
>
>
>
> --
> thanks
> mahadev
> @mahadevkonar
>
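
The fd re-use described in the quoted message is ordinary POSIX behavior on its own: socket() returns the lowest-numbered descriptor not currently open, so a socket created right after close() normally gets the same number back. A minimal C sketch of just that behavior, with no ZooKeeper code involved (reopen_socket is a hypothetical helper name, not something from zookeeper.c):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Opens a TCP socket, closes it, opens a second one, and reports the
 * descriptor numbers of both. Returns 0 on success, -1 on any failure.
 */
int reopen_socket(int *first_fd, int *second_fd)
{
    assert(first_fd != NULL && second_fd != NULL);

    int a = socket(AF_INET, SOCK_STREAM, 0);
    if (a < 0)
        return -1;

    /* Check the result of close() rather than assuming success. */
    if (close(a) != 0)
        return -1;

    /* The kernel hands out the lowest free descriptor number. */
    int b = socket(AF_INET, SOCK_STREAM, 0);
    if (b < 0)
        return -1;

    *first_fd = a;
    *second_fd = b;

    close(b);
    return 0;
}
```

In a single-threaded process with no other descriptors opened in between, the two numbers come back equal. This only shows that the number is recycled; it does not by itself explain the timeout tracked in ZOOKEEPER-1057.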
