zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Rose <ianr...@fullstory.com>
Subject Re: Heartbeats not being received / responded to?
Date Mon, 29 Sep 2014 20:51:42 GMT
Well unfortunately although I am able to repro this problem with a
completely new, clean ZK install on my laptop, I *cannot* repro with the
same on my coworkers laptop.  So unfortunately I am forced to conclude that
there is something strange going on locally.

Thanks for the help anyhow!

- Ian


On Mon, Sep 29, 2014 at 9:40 AM, Camille Fournier <camille@apache.org>
wrote:

> No that is not expected. Odd that you get disconnected once and then
> reconnect fine. Does the same thing happen in your kazoo clients, one
> disconnect but then the second connect is ok?
> Which version of ZK are you running? Are you running this with some sort of
> auth or password to the zk server?
>
> Thanks,
> C
>
> On Mon, Sep 29, 2014 at 9:24 AM, Ian Rose <ianrose@fullstory.com> wrote:
>
> > Perhaps I'm simply misunderstanding what the expected behavior would be.
> > Why would my client by disconnected?  Does zookeeper drop idle clients?
> > And note that this isn't a spurious disconnect; my client is *always*
> > dropped
> > at that time.
> >
> > On Friday zkCli seemed to be working just fine for me, but now I am
> getting
> > disconnections similar to my kazoo-based client.
> >
> > Here is a session showing `ls /' failing.  This behavior is reproducible
> > for me currently.
> >
> > $ ./bin/zkCli.sh
> > > Connecting to localhost:2181
> > > Welcome to ZooKeeper!
> > > JLine support is enabled
> > > [zk: localhost:2181(CONNECTING) 0]
> > > WATCHER::
> > > WatchedEvent state:SyncConnected type:None path:null
> > > [zk: localhost:2181(CONNECTED) 0]
> > > [zk: localhost:2181(CONNECTED) 0] ls /
> > > WATCHER::
> > > WatchedEvent state:Disconnected type:None path:null
> > > Exception in thread "main"
> > > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > KeeperErrorCode = ConnectionLoss for /
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> > > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
> > > at
> > org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:720)
> > > at
> org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:588)
> > > at
> org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:360)
> > > at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
> > > at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
> >
> >
> > And here is a session where after connecting I just sit and wait for a
> bit
> > (without sending any commands), whereupon I get disconnected and then
> > immediately reconnected.  Interestingly, after the reconnecting I am able
> > to issue multiple commands without any issues.  This behavior is
> > reproducible for me currently.
> >
> > $ ./bin/zkCli.sh
> > > Connecting to localhost:2181
> > > Welcome to ZooKeeper!
> > > JLine support is enabled
> > > [zk: localhost:2181(CONNECTING) 0]
> > > [zk: localhost:2181(CONNECTING) 0]
> > > WATCHER::
> > > WatchedEvent state:SyncConnected type:None path:null
> > > [zk: localhost:2181(CONNECTED) 0]
> > > WATCHER::
> > > WatchedEvent state:Disconnected type:None path:null
> > > WATCHER::
> > > WatchedEvent state:SyncConnected type:None path:null
> > > [zk: localhost:2181(CONNECTED) 0]
> > > [zk: localhost:2181(CONNECTED) 0] ls /
> > > [zookeeper]
> > > [zk: localhost:2181(CONNECTED) 1]
> > > [zk: localhost:2181(CONNECTED) 1]
> > > [zk: localhost:2181(CONNECTED) 1] ls /zookeeper
> > > [quota]
> > > [zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota
> > > []
> >
> >
> >
> > Thanks,
> > - Ian
> >
> >
> > On Sun, Sep 28, 2014 at 2:22 PM, Camille Fournier <camille@apache.org>
> > wrote:
> >
> > > Sorry but with what you've sent us I don't really see what the problem
> > is.
> > > It does look like you connect and then nothing happens for 20s and then
> > the
> > > connection is dropped. If you use the zkCli script to connect via the
> > > command line do you see the same problem?
> > >
> > > C
> > >
> > > On Fri, Sep 26, 2014 at 12:48 PM, Ian Rose <ianrose@fullstory.com>
> > wrote:
> > >
> > > > Hi all -
> > > >
> > > > I've just gotten started using SolrCloud, which uses Zookeeper for
> > > > coordination; I am otherwise completely new to Zookeeper.  Now I am
> > > trying
> > > > to query Zookeeper directly for some simple information.  I am
> finding,
> > > > however, that although my clients are able to connect they very
> > > frequently
> > > > receive timeouts.  It almost seems like the server isn't receiving
> the
> > > > heartbeat messages at all (or isn't responding to them).  I've seen
> > > similar
> > > > behavior both when using the go-zookeeper
> > > > <https://godoc.org/github.com/samuel/go-zookeeper/zk> and kazoo
> > > > <https://github.com/python-zk/kazoo> (python) client libraries (I
> > wanted
> > > > to
> > > > try >1 to ensure that it wasn't a client lib problem).
> > > >
> > > > My config is very simple: I am running the client and a single
> > Zookeeper
> > > > node on the same machine (my laptop).  There are no other clients of
> > the
> > > > Zookeeper node while I am running these tests, so there is no
> practical
> > > > possibility that the JVM is overloaded or GCing.
> > > >
> > > > Here is a very basic kazoo client that I am using.  Obviously it
> isn't
> > > > doing any "real" work right now - this is just to demonstrate the
> > > > disconnects.
> > > >
> > > > #!/usr/bin/env python2.7
> > > >
> > > > from kazoo.client import KazooClient, KazooState
> > > > import logging
> > > > import time
> > > >
> > > > def my_listener(state):
> > > >   if state == KazooState.LOST:
> > > >     # Register somewhere that the session was lost
> > > >     logging.warning('handle lost')
> > > >   elif state == KazooState.SUSPENDED:
> > > >     # Handle being disconnected from Zookeeper
> > > >     logging.debug('handle being disconnected')
> > > >   else:
> > > >     # Handle being connected/reconnected to Zookeeper
> > > >     logging.debug('handle being (re)connected')
> > > >
> > > >
> > > > if __name__ == '__main__':
> > > >   logging.basicConfig(format='%(asctime)-15s %(levelname)s
> > %(message)s',
> > > > level=logging.DEBUG)
> > > >
> > > >   logging.debug('starting...')
> > > >   zk = KazooClient(hosts='127.0.0.1:2181', timeout=30)
> > > >   zk.start()
> > > >   zk.add_listener(my_listener)
> > > >
> > > >   time.sleep(35)
> > > >
> > > >   zk.stop()
> > > >
> > > > --------------------
> > > >
> > > > Here is my zookeeper config:
> > > >
> > > > # The number of milliseconds of each tick
> > > > tickTime=2000
> > > >
> > > > # The number of ticks that the initial
> > > > # synchronization phase can take
> > > > initLimit=10
> > > >
> > > > # The number of ticks that can pass between
> > > > # sending a request and getting an acknowledgement
> > > > syncLimit=5
> > > >
> > > > # the directory where the snapshot is stored.
> > > > dataDir=/Users/ianrose/Code/zookeeper/var/data
> > > >
> > > > # the port at which the clients will connect
> > > > clientPort=2181
> > > >
> > > > # The number of snapshots to retain in dataDir
> > > > autopurge.snapRetainCount=5
> > > >
> > > > # Purge task interval in hours
> > > > # Set to "0" to disable auto purge feature
> > > > autopurge.purgeInterval=1
> > > >
> > > >
> > > > --------------------
> > > >
> > > >
> > > > Here is the output I get on the client:
> > > >
> > > > 2014-09-26 12:43:20,603 DEBUG starting...
> > > > 2014-09-26 12:43:20,604 INFO Connecting to 127.0.0.1:2181
> > > > 2014-09-26 12:43:20,605 DEBUG Sending request(xid=None):
> > > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
> > > session_id=0,
> > > >
> > >
> >
> passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
> > > > read_only=None)
> > > > 2014-09-26 12:43:20,609 INFO Zookeeper connection established, state:
> > > > CONNECTED
> > > > 2014-09-26 12:43:40,222 WARNING Connection dropped: outstanding
> > heartbeat
> > > > ping not received
> > > > 2014-09-26 12:43:40,222 WARNING Transition to CONNECTING
> > > > 2014-09-26 12:43:40,222 INFO Zookeeper connection lost
> > > > 2014-09-26 12:43:40,222 DEBUG handle being disconnected
> > > > 2014-09-26 12:43:40,747 INFO Connecting to 127.0.0.1:2181
> > > > 2014-09-26 12:43:40,748 DEBUG Sending request(xid=None):
> > > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
> > > > session_id=92520388231233540,
> > > > passwd='\xd1M\xb9\xb3\xae\xab\xa1!@x\x06nv\xb7\xe3*', read_only=None)
> > > > 2014-09-26 12:43:40,750 INFO Zookeeper connection established, state:
> > > > CONNECTED
> > > > 2014-09-26 12:43:40,751 DEBUG handle being (re)connected
> > > > 2014-09-26 12:43:55,611 DEBUG Sending request(xid=1): Close()
> > > > 2014-09-26 12:43:55,614 INFO Closing connection to 127.0.0.1:2181
> > > > 2014-09-26 12:43:55,615 INFO Zookeeper session lost, state: CLOSED
> > > > 2014-09-26 12:43:55,615 WARNING handle lost
> > > >
> > > >
> > > > And here is trace-level logging from the server side:
> > > >
> > > > 2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
> > > > connection from /127.0.0.1:58959
> > > > 2014-09-26 12:43:20,605 [myid:] - DEBUG [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment
> > > > request
> > > > from client /127.0.0.1:58959 client's lastZxid is 0x0
> > > > 2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868][] - Client attempting to
> > > > establish new session at /127.0.0.1:58959
> > > > 2014-09-26 12:43:20,605 [myid:] - TRACE [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - SessionTrackerImpl --- Adding
> > > > session
> > > > 0x148b2cd8b010004 30000
> > > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
> > > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
> > > > type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown
> > > > reqpath:n/a
> > > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
> > > > cport:-1)::ZooTrace@71][] - SessionTrackerImpl --- Existing session
> > > > 0x148b2cd8b010004 30000
> > > > 2014-09-26 12:43:20,608 [myid:] - DEBUG
> > > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > > > sessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
> > > > txntype:-10 reqpath:n/a
> > > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@90][]
> -
> > > > :Esessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
> > > > txntype:-10 reqpath:n/a
> > > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@71][]
> -
> > > > SessionTrackerImpl --- Existing session 0x148b2cd8b010004 30000
> > > > 2014-09-26 12:43:20,608 [myid:] - DEBUG
> > > > [SyncThread:0:FinalRequestProcessor@160][] -
> > sessionid:0x148b2cd8b010004
> > > > type:createSession cxid:0x0 zxid:0x78f txntype:-10 reqpath:n/a
> > > > 2014-09-26 12:43:20,609 [myid:] - INFO
> > [SyncThread:0:ZooKeeperServer@617
> > > > ][]
> > > > - Established session 0x148b2cd8b010004 with negotiated timeout 30000
> > for
> > > > client /127.0.0.1:58959
> > > > 2014-09-26 12:43:40,748 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
> > > > connection from /127.0.0.1:58960
> > > > 2014-09-26 12:43:40,749 [myid:] - DEBUG [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment
> > > > request
> > > > from client /127.0.0.1:58960 client's lastZxid is 0x0
> > > > 2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861][] - Client attempting to
> > renew
> > > > session 0x148b2cd8b010004 at /127.0.0.1:58960
> > > > 2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket
> connection
> > > for
> > > > client /127.0.0.1:58959 which had sessionid 0x148b2cd8b010004
> > > > 2014-09-26 12:43:40,750 [myid:] - TRACE [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - Session 0x148b2cd8b010004 is
> > > valid:
> > > > true
> > > > 2014-09-26 12:43:40,750 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617][] - Established session
> > > > 0x148b2cd8b010004 with negotiated timeout 30000 for client /
> > > > 127.0.0.1:58960
> > > > 2014-09-26 12:43:50,742 [myid:] - DEBUG
> > > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > > > sessionid:0x148b2cd8b010004 type:ping cxid:0xfffffffffffffffe
> > > > zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> > > > 2014-09-26 12:43:50,742 [myid:] - DEBUG
> > > > [SyncThread:0:FinalRequestProcessor@160][] -
> > sessionid:0x148b2cd8b010004
> > > > type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
> > txntype:unknown
> > > > reqpath:n/a
> > > > 2014-09-26 12:43:55,612 [myid:] - TRACE [ProcessThread(sid:0
> > > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
> > > > type:closeSession
> > > > cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> > > > 2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
> > > > cport:-1)::SessionTrackerImpl@198][] - Session closing:
> > > 0x148b2cd8b010004
> > > > 2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
> > > > cport:-1)::PrepRequestProcessor@494][] - Processed session
> termination
> > > for
> > > > sessionid: 0x148b2cd8b010004
> > > > 2014-09-26 12:43:55,613 [myid:] - DEBUG
> > > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > > > sessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
> > > > txntype:-11 reqpath:n/a
> > > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@90][]
> -
> > > > :Esessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
> > > > txntype:-11 reqpath:n/a
> > > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@71][]
> -
> > > > SessionTrackerImpl --- Removing session 0x148b2cd8b010004
> > > > 2014-09-26 12:43:55,614 [myid:] - DEBUG
> > > > [SyncThread:0:FinalRequestProcessor@160][] -
> > sessionid:0x148b2cd8b010004
> > > > type:closeSession cxid:0x1 zxid:0x790 txntype:-11 reqpath:n/a
> > > > 2014-09-26 12:43:55,614 [myid:] - TRACE
> [SyncThread:0:NIOServerCnxn@168
> > > ][]
> > > > - Add a buffer to outgoingBuffers, sk
> > > sun.nio.ch.SelectionKeyImpl@28cfdeb5
> > > > is valid: true
> > > > 2014-09-26 12:43:55,615 [myid:] - INFO  [NIOServerCxn.Factory:
> > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket
> connection
> > > for
> > > > client /127.0.0.1:58960 which had sessionid 0x148b2cd8b010004
> > > >
> > > >
> > > > I'm assuming that this is due to a misunderstanding on my part.  Can
> > > anyone
> > > > point me in the right direction?
> > > >
> > > > Much appreciated!
> > > > - Ian
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message