zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kishore g <g.kish...@gmail.com>
Subject Re: zk keeps disconnecting and reconnecting
Date Wed, 31 Aug 2011 17:02:27 GMT
Here is a simple test case that reproduces this error.
 public void testChroot() throws Exception
    {
        Watcher watcher = new Watcher()
        {
            @Override
            public void process(WatchedEvent event)
            {
                System.out.println("Event:" + event);
            }
        };
        ZooKeeper zk = new ZooKeeper("localhost:2181/foo", 6000, watcher);
        //uncommenting this line will not cause infinite connect/disconnect
        //zk.create("/", new byte[0], Ids.OPEN_ACL_UNSAFE,
CreateMode.PERSISTENT);

        zk.exists("/", true);
        System.out.println("Stop the server and restart it when you see this
message");
        Thread.currentThread().join();
    }

As pointed out earlier in this thread, setting an watch on a non-existent
path triggers this.

Is some one working on a patch for 961 and the issues described in this
thread. Any pointers on what needs to be fixed for both the issues? I can
take a look and submit a patch if I can fix it.

thanks,
Kishore G

On Tue, Aug 30, 2011 at 6:48 PM, Jun Rao <junrao@gmail.com> wrote:

> I was also wondering why our clients get disconnected in the first place
> since the ZK servers are all up. The following are the logs when the first
> disconnect happens. Does anyone know why the client can't seem to connect
> to
> most servers? Also, is  "Session 0x1320765b0ac002e for server nulll"
> normal?
> Thanks,
>
> 2011/08/29 07:33:51.824 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Unable to read additional
> data from server sessionid 0x1320765b0ac002f, likely server h
> as closed socket, closing socket connection and attempting
> reconnect2011/08/29 07:33:51.824 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Unable to read additional
> data from server sessionid 0x1320765b0ac002e, likely server h
> as closed socket, closing socket connection and attempting reconnect
> 2011/08/29 07:33:51.990 INFO [ZkClient] [main-EventThread] [kafka]
> zookeeper
> state changed (Disconnected)2011/08/29 07:33:52.019 INFO [ZkClient]
> [main-EventThread] [kafka] zookeeper state changed (Disconnected)
> 2011/08/29 07:33:52.092 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app29.stg/172.18.98.89:12913
> 2011/08/29 07:33:52.093 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Socket connection
> established to esv4-app29.stg/172.18.98.89:12913, initiating session
> 2011/08/29 07:33:52.094 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Unable to read additional
> data from server sessionid 0x1320765b0ac002f, likely server h
> as closed socket, closing socket connection and attempting reconnect
> 2011/08/29 07:33:52.652 INFO [ClientCnxn]
> [main-SendThread(esv4-app27.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app28.stg/172.18.98.101:12913
> 2011/08/29 07:33:52.652 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
> established to esv4-app28.stg/172.18.98.101:12913, initiating
> session2011/08/29 07:33:53.075 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Unable to read additional
> data from server sessionid 0x1320765b0ac002e, likely server h
> as closed socket, closing socket connection and attempting reconnect
> 2011/08/29 07:33:53.108 INFO [ClientCnxn]
> [main-SendThread(esv4-app29.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app28.stg/172.18.98.101:12913
> 2011/08/29 07:33:53.108 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Socket connection
> established to esv4-app28.stg/172.18.98.101:12913, initiating session
> 2011/08/29 07:33:53.109 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Unable to read additional
> data from server sessionid 0x1320765b0ac002f, likely server h
> as closed socket, closing socket connection and attempting reconnect
> 2011/08/29 07:33:53.577 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app30.stg/172.18.98.90:12913
> 2011/08/29 07:33:53.578 WARN [ClientCnxn]
> [main-SendThread(esv4-app30.stg:12913)] [kafka] Session 0x1320765b0ac002e
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 2011/08/29 07:33:53.683 INFO [ClientCnxn]
> [main-SendThread(esv4-app28.stg:12913)] [kafka] Opening socket connection
> to
> server esv4-app30.stg/172.18.98.90:12913
> 2011/08/29 07:33:53.683 WARN [ClientCnxn]
> [main-SendThread(esv4-app30.stg:12913)] [kafka] Session 0x1320765b0ac002f
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
>
> Jun
>
> On Mon, Aug 29, 2011 at 10:50 AM, Fournier, Camille F. <
> Camille.Fournier@gs.com> wrote:
>
> > Well, it causes the problem you are seeing. If you set any watchers with
> a
> > chroot and then your client gets disconnected with these watches
> > outstanding, when you reconnect you will try to reset them and they are
> > probably on paths that don't exist (if you are creating everything under
> > path /kafka-tracking). So you get a notification about the watches
> > immediately after resetting them, which causes the string out of bounds
> > exception.
> >
> > The only fix is to disable auto watch reset, and then have your own
> client
> > reset watches when it gets a reconnected event. I suspect it would be
> easier
> > for you to take a shot at fixing the bug than to rewrite your client to
> > handle this. Thomas provided a patch with tests that presumably show the
> > error, so all you need is a fix to make them pass.
> >
> >
> > C
> >
> > -----Original Message-----
> > From: Jun Rao [mailto:junrao@gmail.com]
> > Sent: Monday, August 29, 2011 12:39 PM
> > To: user@zookeeper.apache.org; thomas@koch.ro
> > Subject: Re: zk keeps disconnecting and reconnecting
> >
> > What's the impact of ZOOKEEPER-961? If it shows up, does that mean the
> > client won't get any watcher events afterwards? If so, this sounds like a
> > blocker for 3.4 release to me. What's the temporary solution for 3.3.3?
> >
> > Also, for the very first time that the ZK client gets disconnected, I saw
> > the following entry in the log. It seems that the client can't ping the
> > server for 4 seconds. The ZK server was up at that time and the load was
> > minimal. What could cause the time out? Client GC pauses?
> >
> > 2011/08/26 10:58:22.306 INFO [ClientCnxn]
> > [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out,
> > have not heard from server in 4001ms for sessionid 0x131f
> > ddd84bc0006, closing socket connection and attempting reconnect
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch <thomas@koch.ro> wrote:
> >
> > > Fournier, Camille F.:
> > > > Did anyone ever check resetting watches at client reconnect on a
> client
> > > > with a chroot? Looking at the code, we store the watches associated
> > with
> > > > the non-chroot path, but they are set by the original request
> > prepending
> > > > chroot to the request. However, it looks like the SetWatches request
> on
> > > > reconnect just calls get on the various watch lists from ZooKeeper,
> > which
> > > > don't have the prepended chroot.
> > > >
> > > > I haven't written a test but I would bet dollars to donuts this is
> the
> > > > problem.
> > > >
> > > > C
> > > seems to be this:
> > > ZOOKEEPER-961, ZOOKEEPER-1091
> > >
> > > Regards,
> > >
> > > Thomas Koch, http://www.koch.ro
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message