zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Camille Fournier <cami...@apache.org>
Subject Re: Heartbeats not being received / responded to?
Date Mon, 29 Sep 2014 13:40:32 GMT
No that is not expected. Odd that you get disconnected once and then
reconnect fine. Does the same thing happen in your kazoo clients, one
disconnect but then the second connect is ok?
Which version of ZK are you running? Are you running this with some sort of
auth or password to the zk server?

Thanks,
C

On Mon, Sep 29, 2014 at 9:24 AM, Ian Rose <ianrose@fullstory.com> wrote:

> Perhaps I'm simply misunderstanding what the expected behavior would be.
> Why would my client by disconnected?  Does zookeeper drop idle clients?
> And note that this isn't a spurious disconnect; my client is *always*
> dropped
> at that time.
>
> On Friday zkCli seemed to be working just fine for me, but now I am getting
> disconnections similar to my kazoo-based client.
>
> Here is a session showing `ls /' failing.  This behavior is reproducible
> for me currently.
>
> $ ./bin/zkCli.sh
> > Connecting to localhost:2181
> > Welcome to ZooKeeper!
> > JLine support is enabled
> > [zk: localhost:2181(CONNECTING) 0]
> > WATCHER::
> > WatchedEvent state:SyncConnected type:None path:null
> > [zk: localhost:2181(CONNECTED) 0]
> > [zk: localhost:2181(CONNECTED) 0] ls /
> > WATCHER::
> > WatchedEvent state:Disconnected type:None path:null
> > Exception in thread "main"
> > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > KeeperErrorCode = ConnectionLoss for /
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
> > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
> > at
> org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:720)
> > at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:588)
> > at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:360)
> > at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
> > at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
>
>
> And here is a session where after connecting I just sit and wait for a bit
> (without sending any commands), whereupon I get disconnected and then
> immediately reconnected.  Interestingly, after the reconnecting I am able
> to issue multiple commands without any issues.  This behavior is
> reproducible for me currently.
>
> $ ./bin/zkCli.sh
> > Connecting to localhost:2181
> > Welcome to ZooKeeper!
> > JLine support is enabled
> > [zk: localhost:2181(CONNECTING) 0]
> > [zk: localhost:2181(CONNECTING) 0]
> > WATCHER::
> > WatchedEvent state:SyncConnected type:None path:null
> > [zk: localhost:2181(CONNECTED) 0]
> > WATCHER::
> > WatchedEvent state:Disconnected type:None path:null
> > WATCHER::
> > WatchedEvent state:SyncConnected type:None path:null
> > [zk: localhost:2181(CONNECTED) 0]
> > [zk: localhost:2181(CONNECTED) 0] ls /
> > [zookeeper]
> > [zk: localhost:2181(CONNECTED) 1]
> > [zk: localhost:2181(CONNECTED) 1]
> > [zk: localhost:2181(CONNECTED) 1] ls /zookeeper
> > [quota]
> > [zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota
> > []
>
>
>
> Thanks,
> - Ian
>
>
> On Sun, Sep 28, 2014 at 2:22 PM, Camille Fournier <camille@apache.org>
> wrote:
>
> > Sorry but with what you've sent us I don't really see what the problem
> is.
> > It does look like you connect and then nothing happens for 20s and then
> the
> > connection is dropped. If you use the zkCli script to connect via the
> > command line do you see the same problem?
> >
> > C
> >
> > On Fri, Sep 26, 2014 at 12:48 PM, Ian Rose <ianrose@fullstory.com>
> wrote:
> >
> > > Hi all -
> > >
> > > I've just gotten started using SolrCloud, which uses Zookeeper for
> > > coordination; I am otherwise completely new to Zookeeper.  Now I am
> > trying
> > > to query Zookeeper directly for some simple information.  I am finding,
> > > however, that although my clients are able to connect they very
> > frequently
> > > receive timeouts.  It almost seems like the server isn't receiving the
> > > heartbeat messages at all (or isn't responding to them).  I've seen
> > similar
> > > behavior both when using the go-zookeeper
> > > <https://godoc.org/github.com/samuel/go-zookeeper/zk> and kazoo
> > > <https://github.com/python-zk/kazoo> (python) client libraries (I
> wanted
> > > to
> > > try >1 to ensure that it wasn't a client lib problem).
> > >
> > > My config is very simple: I am running the client and a single
> Zookeeper
> > > node on the same machine (my laptop).  There are no other clients of
> the
> > > Zookeeper node while I am running these tests, so there is no practical
> > > possibility that the JVM is overloaded or GCing.
> > >
> > > Here is a very basic kazoo client that I am using.  Obviously it isn't
> > > doing any "real" work right now - this is just to demonstrate the
> > > disconnects.
> > >
> > > #!/usr/bin/env python2.7
> > >
> > > from kazoo.client import KazooClient, KazooState
> > > import logging
> > > import time
> > >
> > > def my_listener(state):
> > >   if state == KazooState.LOST:
> > >     # Register somewhere that the session was lost
> > >     logging.warning('handle lost')
> > >   elif state == KazooState.SUSPENDED:
> > >     # Handle being disconnected from Zookeeper
> > >     logging.debug('handle being disconnected')
> > >   else:
> > >     # Handle being connected/reconnected to Zookeeper
> > >     logging.debug('handle being (re)connected')
> > >
> > >
> > > if __name__ == '__main__':
> > >   logging.basicConfig(format='%(asctime)-15s %(levelname)s
> %(message)s',
> > > level=logging.DEBUG)
> > >
> > >   logging.debug('starting...')
> > >   zk = KazooClient(hosts='127.0.0.1:2181', timeout=30)
> > >   zk.start()
> > >   zk.add_listener(my_listener)
> > >
> > >   time.sleep(35)
> > >
> > >   zk.stop()
> > >
> > > --------------------
> > >
> > > Here is my zookeeper config:
> > >
> > > # The number of milliseconds of each tick
> > > tickTime=2000
> > >
> > > # The number of ticks that the initial
> > > # synchronization phase can take
> > > initLimit=10
> > >
> > > # The number of ticks that can pass between
> > > # sending a request and getting an acknowledgement
> > > syncLimit=5
> > >
> > > # the directory where the snapshot is stored.
> > > dataDir=/Users/ianrose/Code/zookeeper/var/data
> > >
> > > # the port at which the clients will connect
> > > clientPort=2181
> > >
> > > # The number of snapshots to retain in dataDir
> > > autopurge.snapRetainCount=5
> > >
> > > # Purge task interval in hours
> > > # Set to "0" to disable auto purge feature
> > > autopurge.purgeInterval=1
> > >
> > >
> > > --------------------
> > >
> > >
> > > Here is the output I get on the client:
> > >
> > > 2014-09-26 12:43:20,603 DEBUG starting...
> > > 2014-09-26 12:43:20,604 INFO Connecting to 127.0.0.1:2181
> > > 2014-09-26 12:43:20,605 DEBUG Sending request(xid=None):
> > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
> > session_id=0,
> > >
> >
> passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
> > > read_only=None)
> > > 2014-09-26 12:43:20,609 INFO Zookeeper connection established, state:
> > > CONNECTED
> > > 2014-09-26 12:43:40,222 WARNING Connection dropped: outstanding
> heartbeat
> > > ping not received
> > > 2014-09-26 12:43:40,222 WARNING Transition to CONNECTING
> > > 2014-09-26 12:43:40,222 INFO Zookeeper connection lost
> > > 2014-09-26 12:43:40,222 DEBUG handle being disconnected
> > > 2014-09-26 12:43:40,747 INFO Connecting to 127.0.0.1:2181
> > > 2014-09-26 12:43:40,748 DEBUG Sending request(xid=None):
> > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
> > > session_id=92520388231233540,
> > > passwd='\xd1M\xb9\xb3\xae\xab\xa1!@x\x06nv\xb7\xe3*', read_only=None)
> > > 2014-09-26 12:43:40,750 INFO Zookeeper connection established, state:
> > > CONNECTED
> > > 2014-09-26 12:43:40,751 DEBUG handle being (re)connected
> > > 2014-09-26 12:43:55,611 DEBUG Sending request(xid=1): Close()
> > > 2014-09-26 12:43:55,614 INFO Closing connection to 127.0.0.1:2181
> > > 2014-09-26 12:43:55,615 INFO Zookeeper session lost, state: CLOSED
> > > 2014-09-26 12:43:55,615 WARNING handle lost
> > >
> > >
> > > And here is trace-level logging from the server side:
> > >
> > > 2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
> > > connection from /127.0.0.1:58959
> > > 2014-09-26 12:43:20,605 [myid:] - DEBUG [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment
> > > request
> > > from client /127.0.0.1:58959 client's lastZxid is 0x0
> > > 2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868][] - Client attempting to
> > > establish new session at /127.0.0.1:58959
> > > 2014-09-26 12:43:20,605 [myid:] - TRACE [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - SessionTrackerImpl --- Adding
> > > session
> > > 0x148b2cd8b010004 30000
> > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
> > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
> > > type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown
> > > reqpath:n/a
> > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
> > > cport:-1)::ZooTrace@71][] - SessionTrackerImpl --- Existing session
> > > 0x148b2cd8b010004 30000
> > > 2014-09-26 12:43:20,608 [myid:] - DEBUG
> > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > > sessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
> > > txntype:-10 reqpath:n/a
> > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] -
> > > :Esessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
> > > txntype:-10 reqpath:n/a
> > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] -
> > > SessionTrackerImpl --- Existing session 0x148b2cd8b010004 30000
> > > 2014-09-26 12:43:20,608 [myid:] - DEBUG
> > > [SyncThread:0:FinalRequestProcessor@160][] -
> sessionid:0x148b2cd8b010004
> > > type:createSession cxid:0x0 zxid:0x78f txntype:-10 reqpath:n/a
> > > 2014-09-26 12:43:20,609 [myid:] - INFO
> [SyncThread:0:ZooKeeperServer@617
> > > ][]
> > > - Established session 0x148b2cd8b010004 with negotiated timeout 30000
> for
> > > client /127.0.0.1:58959
> > > 2014-09-26 12:43:40,748 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
> > > connection from /127.0.0.1:58960
> > > 2014-09-26 12:43:40,749 [myid:] - DEBUG [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment
> > > request
> > > from client /127.0.0.1:58960 client's lastZxid is 0x0
> > > 2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861][] - Client attempting to
> renew
> > > session 0x148b2cd8b010004 at /127.0.0.1:58960
> > > 2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection
> > for
> > > client /127.0.0.1:58959 which had sessionid 0x148b2cd8b010004
> > > 2014-09-26 12:43:40,750 [myid:] - TRACE [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - Session 0x148b2cd8b010004 is
> > valid:
> > > true
> > > 2014-09-26 12:43:40,750 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617][] - Established session
> > > 0x148b2cd8b010004 with negotiated timeout 30000 for client /
> > > 127.0.0.1:58960
> > > 2014-09-26 12:43:50,742 [myid:] - DEBUG
> > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > > sessionid:0x148b2cd8b010004 type:ping cxid:0xfffffffffffffffe
> > > zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> > > 2014-09-26 12:43:50,742 [myid:] - DEBUG
> > > [SyncThread:0:FinalRequestProcessor@160][] -
> sessionid:0x148b2cd8b010004
> > > type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe
> txntype:unknown
> > > reqpath:n/a
> > > 2014-09-26 12:43:55,612 [myid:] - TRACE [ProcessThread(sid:0
> > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
> > > type:closeSession
> > > cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
> > > 2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
> > > cport:-1)::SessionTrackerImpl@198][] - Session closing:
> > 0x148b2cd8b010004
> > > 2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
> > > cport:-1)::PrepRequestProcessor@494][] - Processed session termination
> > for
> > > sessionid: 0x148b2cd8b010004
> > > 2014-09-26 12:43:55,613 [myid:] - DEBUG
> > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request::
> > > sessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
> > > txntype:-11 reqpath:n/a
> > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] -
> > > :Esessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
> > > txntype:-11 reqpath:n/a
> > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] -
> > > SessionTrackerImpl --- Removing session 0x148b2cd8b010004
> > > 2014-09-26 12:43:55,614 [myid:] - DEBUG
> > > [SyncThread:0:FinalRequestProcessor@160][] -
> sessionid:0x148b2cd8b010004
> > > type:closeSession cxid:0x1 zxid:0x790 txntype:-11 reqpath:n/a
> > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:NIOServerCnxn@168
> > ][]
> > > - Add a buffer to outgoingBuffers, sk
> > sun.nio.ch.SelectionKeyImpl@28cfdeb5
> > > is valid: true
> > > 2014-09-26 12:43:55,615 [myid:] - INFO  [NIOServerCxn.Factory:
> > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection
> > for
> > > client /127.0.0.1:58960 which had sessionid 0x148b2cd8b010004
> > >
> > >
> > > I'm assuming that this is due to a misunderstanding on my part.  Can
> > anyone
> > > point me in the right direction?
> > >
> > > Much appreciated!
> > > - Ian
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message