zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Rose <ianr...@fullstory.com>
Subject Re: Heartbeats not being received / responded to?
Date Mon, 29 Sep 2014 13:24:38 GMT
Perhaps I'm simply misunderstanding what the expected behavior would be.
Why would my client by disconnected?  Does zookeeper drop idle clients?
And note that this isn't a spurious disconnect; my client is *always* dropped
at that time.

On Friday zkCli seemed to be working just fine for me, but now I am getting
disconnections similar to my kazoo-based client.

Here is a session showing `ls /' failing.  This behavior is reproducible
for me currently.

$ ./bin/zkCli.sh
> Connecting to localhost:2181
> Welcome to ZooKeeper!
> JLine support is enabled
> [zk: localhost:2181(CONNECTING) 0]
> WATCHER::
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2181(CONNECTED) 0]
> [zk: localhost:2181(CONNECTED) 0] ls /
> WATCHER::
> WatchedEvent state:Disconnected type:None path:null
> Exception in thread "main"
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
> at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:720)
> at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:588)
> at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:360)
> at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
> at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)


And here is a session where after connecting I just sit and wait for a bit
(without sending any commands), whereupon I get disconnected and then
immediately reconnected.  Interestingly, after the reconnecting I am able
to issue multiple commands without any issues.  This behavior is
reproducible for me currently.

$ ./bin/zkCli.sh
> Connecting to localhost:2181
> Welcome to ZooKeeper!
> JLine support is enabled
> [zk: localhost:2181(CONNECTING) 0]
> [zk: localhost:2181(CONNECTING) 0]
> WATCHER::
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2181(CONNECTED) 0]
> WATCHER::
> WatchedEvent state:Disconnected type:None path:null
> WATCHER::
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2181(CONNECTED) 0]
> [zk: localhost:2181(CONNECTED) 0] ls /
> [zookeeper]
> [zk: localhost:2181(CONNECTED) 1]
> [zk: localhost:2181(CONNECTED) 1]
> [zk: localhost:2181(CONNECTED) 1] ls /zookeeper
> [quota]
> [zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota
> []



Thanks,
- Ian


On Sun, Sep 28, 2014 at 2:22 PM, Camille Fournier <camille@apache.org>
wrote:

> Sorry but with what you've sent us I don't really see what the problem is.
> It does look like you connect and then nothing happens for 20s and then the
> connection is dropped. If you use the zkCli script to connect via the
> command line do you see the same problem?
>
> C
>
> On Fri, Sep 26, 2014 at 12:48 PM, Ian Rose <ianrose@fullstory.com> wrote:
>
> > Hi all -
> >
> > I've just gotten started using SolrCloud, which uses Zookeeper for
> > coordination; I am otherwise completely new to Zookeeper.  Now I am
> trying
> > to query Zookeeper directly for some simple information.  I am finding,
> > however, that although my clients are able to connect they very
> frequently
> > receive timeouts.  It almost seems like the server isn't receiving the
> > heartbeat messages at all (or isn't responding to them).  I've seen
> similar
> > behavior both when using the go-zookeeper
> > <https://godoc.org/github.com/samuel/go-zookeeper/zk> and kazoo
> > <https://github.com/python-zk/kazoo> (python) client libraries (I wanted
> > to
> > try >1 to ensure that it wasn't a client lib problem).
> >
> > My config is very simple: I am running the client and a single Zookeeper
> > node on the same machine (my laptop).  There are no other clients of the
> > Zookeeper node while I am running these tests, so there is no practical
> > possibility that the JVM is overloaded or GCing.
> >
> > Here is a very basic kazoo client that I am using.  Obviously it isn't
> > doing any "real" work right now - this is just to demonstrate the
> > disconnects.
> >
> > #!/usr/bin/env python2.7
> >
> > from kazoo.client import KazooClient, KazooState
> > import logging
> > import time
> >
> > def my_listener(state):
> >   if state == KazooState.LOST:
> >     # Register somewhere that the session was lost
> >     logging.warning('handle lost')
> >   elif state == KazooState.SUSPENDED:
> >     # Handle being disconnected from Zookeeper
> >     logging.debug('handle being disconnected')
> >   else:
> >     # Handle being connected/reconnected to Zookeeper
> >     logging.debug('handle being (re)connected')
> >
> >
> > if __name__ == '__main__':
> >   logging.basicConfig(format='%(asctime)-15s %(levelname)s %(message)s',
> > level=logging.DEBUG)
> >
> >   logging.debug('starting...')
> >   zk = KazooClient(hosts='127.0.0.1:2181', timeout=30)
> >   zk.start()
> >   zk.add_listener(my_listener)
> >
> >   time.sleep(35)
> >
> >   zk.stop()
> >
> > --------------------
> >
> > Here is my zookeeper config:
> >
> > # The number of milliseconds of each tick
> > tickTime=2000
> >
> > # The number of ticks that the initial
> > # synchronization phase can take
> > initLimit=10
> >
> > # The number of ticks that can pass between
> > # sending a request and getting an acknowledgement
> > syncLimit=5
> >
> > # the directory where the snapshot is stored.
> > dataDir=/Users/ianrose/Code/zookeeper/var/data
> >
> > # the port at which the clients will connect
> > clientPort=2181
> >
> > # The number of snapshots to retain in dataDir
> > autopurge.snapRetainCount=5
> >
> > # Purge task interval in hours
> > # Set to "0" to disable auto purge feature
> > autopurge.purgeInterval=1
> >
> >
> > --------------------
> >
> >
> > Here is the output I get on the client:
> >
> > 2014-09-26 12:43:20,603 DEBUG starting...
> > 2014-09-26 12:43:20,604 INFO Connecting to 127.0.0.1:2181
> > 2014-09-26 12:43:20,605 DEBUG Sending request(xid=None):
> > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
> session_id=0,
> >
> passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
> > read_only=None)
> > 2014-09-26 12:43:20,609 INFO Zookeeper connection established, state:
> > CONNECTED
> > 2014-09-26 12:43:40,222 WARNING Connection dropped: outstanding heartbeat
> > ping not received
> > 2014-09-26 12:43:40,222 WARNING Transition to CONNECTING
> > 2014-09-26 12:43:40,222 INFO Zookeeper connection lost
> > 2014-09-26 12:43:40,222 DEBUG handle being disconnected
> > 2014-09-26 12:43:40,747 INFO Connecting to 127.0.0.1:2181
> > 2014-09-26 12:43:40,748 DEBUG Sending request(xid=None):
> > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
> > session_id=92520388231233540,
> > passwd='\xd1M\xb9\xb3\xae\xab\xa1!@x\x06nv\xb7\xe3*', read_only=None)
> > 2014-09-26 12:43:40,750 INFO Zookeeper connection established, state:
> > CONNECTED
> > 2014-09-26 12:43:40,751 DEBUG handle being (re)connected
> > 2014-09-26 12:43:55,611 DEBUG Sending request(xid=1): Close()
> > 2014-09-26 12:43:55,614 INFO Closing connection to 127.0.0.1:2181
> > 2014-09-26 12:43:55,615 INFO Zookeeper session lost, state: CLOSED
> > 2014-09-26 12:43:55,615 WARNING handle lost
> >
> >
> > And here is trace-level logging from the server side:
> >
> > 2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
> > connection from /127.0.0.1:58959
> > 2014-09-26 12:43:20,605 [myid:] - DEBUG [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment
> > request
> > from client /127.0.0.1:58959 client's lastZxid is 0x0
> > 2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868][] - Client attempting to
> > establish new session at /127.0.0.1:58959
> > 2014-09-26 12:43:20,605 [myid:] - TRACE [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - SessionTrackerImpl --- Adding
> > session
> > 0x148b2cd8b010004 30000
> > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
> > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
> > type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown
> > reqpath:n/a
> > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
> > cport:-1)::ZooTrace@71][] - SessionTrackerImpl --- Existing session
> > 0x148b2cd8b010004 30000
> > 2014-09-26 12:43:20,608 [myid:] - DEBUG
> > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > sessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
> > txntype:-10 reqpath:n/a
> > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] -
> > :Esessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
> > txntype:-10 reqpath:n/a
> > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] -
> > SessionTrackerImpl --- Existing session 0x148b2cd8b010004 30000
> > 2014-09-26 12:43:20,608 [myid:] - DEBUG
> > [SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004
> > type:createSession cxid:0x0 zxid:0x78f txntype:-10 reqpath:n/a
> > 2014-09-26 12:43:20,609 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617
> > ][]
> > - Established session 0x148b2cd8b010004 with negotiated timeout 30000 for
> > client /127.0.0.1:58959
> > 2014-09-26 12:43:40,748 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
> > connection from /127.0.0.1:58960
> > 2014-09-26 12:43:40,749 [myid:] - DEBUG [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment
> > request
> > from client /127.0.0.1:58960 client's lastZxid is 0x0
> > 2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861][] - Client attempting to renew
> > session 0x148b2cd8b010004 at /127.0.0.1:58960
> > 2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection
> for
> > client /127.0.0.1:58959 which had sessionid 0x148b2cd8b010004
> > 2014-09-26 12:43:40,750 [myid:] - TRACE [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - Session 0x148b2cd8b010004 is
> valid:
> > true
> > 2014-09-26 12:43:40,750 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617][] - Established session
> > 0x148b2cd8b010004 with negotiated timeout 30000 for client /
> > 127.0.0.1:58960
> > 2014-09-26 12:43:50,742 [myid:] - DEBUG
> > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > sessionid:0x148b2cd8b010004 type:ping cxid:0xfffffffffffffffe
> > zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> > 2014-09-26 12:43:50,742 [myid:] - DEBUG
> > [SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004
> > type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown
> > reqpath:n/a
> > 2014-09-26 12:43:55,612 [myid:] - TRACE [ProcessThread(sid:0
> > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
> > type:closeSession
> > cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> > 2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
> > cport:-1)::SessionTrackerImpl@198][] - Session closing:
> 0x148b2cd8b010004
> > 2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
> > cport:-1)::PrepRequestProcessor@494][] - Processed session termination
> for
> > sessionid: 0x148b2cd8b010004
> > 2014-09-26 12:43:55,613 [myid:] - DEBUG
> > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > sessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
> > txntype:-11 reqpath:n/a
> > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] -
> > :Esessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
> > txntype:-11 reqpath:n/a
> > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] -
> > SessionTrackerImpl --- Removing session 0x148b2cd8b010004
> > 2014-09-26 12:43:55,614 [myid:] - DEBUG
> > [SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004
> > type:closeSession cxid:0x1 zxid:0x790 txntype:-11 reqpath:n/a
> > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:NIOServerCnxn@168
> ][]
> > - Add a buffer to outgoingBuffers, sk
> sun.nio.ch.SelectionKeyImpl@28cfdeb5
> > is valid: true
> > 2014-09-26 12:43:55,615 [myid:] - INFO  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection
> for
> > client /127.0.0.1:58960 which had sessionid 0x148b2cd8b010004
> >
> >
> > I'm assuming that this is due to a misunderstanding on my part.  Can
> anyone
> > point me in the right direction?
> >
> > Much appreciated!
> > - Ian
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message