zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Rose <ianr...@fullstory.com>
Subject Heartbeats not being received / responded to?
Date Fri, 26 Sep 2014 16:48:35 GMT
Hi all -

I've just gotten started using SolrCloud, which uses Zookeeper for
coordination; I am otherwise completely new to Zookeeper.  Now I am trying
to query Zookeeper directly for some simple information.  I am finding,
however, that although my clients are able to connect they very frequently
receive timeouts.  It almost seems like the server isn't receiving the
heartbeat messages at all (or isn't responding to them).  I've seen similar
behavior both when using the go-zookeeper
<https://godoc.org/github.com/samuel/go-zookeeper/zk> and kazoo
<https://github.com/python-zk/kazoo> (python) client libraries (I wanted to
try >1 to ensure that it wasn't a client lib problem).

My config is very simple: I am running the client and a single Zookeeper
node on the same machine (my laptop).  There are no other clients of the
Zookeeper node while I am running these tests, so there is no practical
possibility that the JVM is overloaded or GCing.

Here is a very basic kazoo client that I am using.  Obviously it isn't
doing any "real" work right now - this is just to demonstrate the
disconnects.

#!/usr/bin/env python2.7

from kazoo.client import KazooClient, KazooState
import logging
import time

def my_listener(state):
  if state == KazooState.LOST:
    # Register somewhere that the session was lost
    logging.warning('handle lost')
  elif state == KazooState.SUSPENDED:
    # Handle being disconnected from Zookeeper
    logging.debug('handle being disconnected')
  else:
    # Handle being connected/reconnected to Zookeeper
    logging.debug('handle being (re)connected')


if __name__ == '__main__':
  logging.basicConfig(format='%(asctime)-15s %(levelname)s %(message)s',
level=logging.DEBUG)

  logging.debug('starting...')
  zk = KazooClient(hosts='127.0.0.1:2181', timeout=30)
  zk.start()
  zk.add_listener(my_listener)

  time.sleep(35)

  zk.stop()

--------------------

Here is my zookeeper config:

# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial
# synchronization phase can take
initLimit=10

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5

# the directory where the snapshot is stored.
dataDir=/Users/ianrose/Code/zookeeper/var/data

# the port at which the clients will connect
clientPort=2181

# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=5

# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1


--------------------


Here is the output I get on the client:

2014-09-26 12:43:20,603 DEBUG starting...
2014-09-26 12:43:20,604 INFO Connecting to 127.0.0.1:2181
2014-09-26 12:43:20,605 DEBUG Sending request(xid=None):
Connect(protocol_version=0, last_zxid_seen=0, time_out=30000, session_id=0,
passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
read_only=None)
2014-09-26 12:43:20,609 INFO Zookeeper connection established, state:
CONNECTED
2014-09-26 12:43:40,222 WARNING Connection dropped: outstanding heartbeat
ping not received
2014-09-26 12:43:40,222 WARNING Transition to CONNECTING
2014-09-26 12:43:40,222 INFO Zookeeper connection lost
2014-09-26 12:43:40,222 DEBUG handle being disconnected
2014-09-26 12:43:40,747 INFO Connecting to 127.0.0.1:2181
2014-09-26 12:43:40,748 DEBUG Sending request(xid=None):
Connect(protocol_version=0, last_zxid_seen=0, time_out=30000,
session_id=92520388231233540,
passwd='\xd1M\xb9\xb3\xae\xab\xa1!@x\x06nv\xb7\xe3*', read_only=None)
2014-09-26 12:43:40,750 INFO Zookeeper connection established, state:
CONNECTED
2014-09-26 12:43:40,751 DEBUG handle being (re)connected
2014-09-26 12:43:55,611 DEBUG Sending request(xid=1): Close()
2014-09-26 12:43:55,614 INFO Closing connection to 127.0.0.1:2181
2014-09-26 12:43:55,615 INFO Zookeeper session lost, state: CLOSED
2014-09-26 12:43:55,615 WARNING handle lost


And here is trace-level logging from the server side:

2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
connection from /127.0.0.1:58959
2014-09-26 12:43:20,605 [myid:] - DEBUG [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment request
from client /127.0.0.1:58959 client's lastZxid is 0x0
2014-09-26 12:43:20,605 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868][] - Client attempting to
establish new session at /127.0.0.1:58959
2014-09-26 12:43:20,605 [myid:] - TRACE [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - SessionTrackerImpl --- Adding session
0x148b2cd8b010004 30000
2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004
type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown
reqpath:n/a
2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0
cport:-1)::ZooTrace@71][] - SessionTrackerImpl --- Existing session
0x148b2cd8b010004 30000
2014-09-26 12:43:20,608 [myid:] - DEBUG
[SyncThread:0:FinalRequestProcessor@88][] - Processing request::
sessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
txntype:-10 reqpath:n/a
2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] -
:Esessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f
txntype:-10 reqpath:n/a
2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] -
SessionTrackerImpl --- Existing session 0x148b2cd8b010004 30000
2014-09-26 12:43:20,608 [myid:] - DEBUG
[SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004
type:createSession cxid:0x0 zxid:0x78f txntype:-10 reqpath:n/a
2014-09-26 12:43:20,609 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617][]
- Established session 0x148b2cd8b010004 with negotiated timeout 30000 for
client /127.0.0.1:58959
2014-09-26 12:43:40,748 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket
connection from /127.0.0.1:58960
2014-09-26 12:43:40,749 [myid:] - DEBUG [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment request
from client /127.0.0.1:58960 client's lastZxid is 0x0
2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861][] - Client attempting to renew
session 0x148b2cd8b010004 at /127.0.0.1:58960
2014-09-26 12:43:40,749 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection for
client /127.0.0.1:58959 which had sessionid 0x148b2cd8b010004
2014-09-26 12:43:40,750 [myid:] - TRACE [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - Session 0x148b2cd8b010004 is valid:
true
2014-09-26 12:43:40,750 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617][] - Established session
0x148b2cd8b010004 with negotiated timeout 30000 for client /127.0.0.1:58960
2014-09-26 12:43:50,742 [myid:] - DEBUG
[SyncThread:0:FinalRequestProcessor@88][] - Processing request::
sessionid:0x148b2cd8b010004 type:ping cxid:0xfffffffffffffffe
zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2014-09-26 12:43:50,742 [myid:] - DEBUG
[SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004
type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown
reqpath:n/a
2014-09-26 12:43:55,612 [myid:] - TRACE [ProcessThread(sid:0
cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004 type:closeSession
cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a
2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::SessionTrackerImpl@198][] - Session closing: 0x148b2cd8b010004
2014-09-26 12:43:55,612 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@494][] - Processed session termination for
sessionid: 0x148b2cd8b010004
2014-09-26 12:43:55,613 [myid:] - DEBUG
[SyncThread:0:FinalRequestProcessor@88][] - Processing request::
sessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
txntype:-11 reqpath:n/a
2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] -
:Esessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790
txntype:-11 reqpath:n/a
2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] -
SessionTrackerImpl --- Removing session 0x148b2cd8b010004
2014-09-26 12:43:55,614 [myid:] - DEBUG
[SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004
type:closeSession cxid:0x1 zxid:0x790 txntype:-11 reqpath:n/a
2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:NIOServerCnxn@168][]
- Add a buffer to outgoingBuffers, sk sun.nio.ch.SelectionKeyImpl@28cfdeb5
is valid: true
2014-09-26 12:43:55,615 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection for
client /127.0.0.1:58960 which had sessionid 0x148b2cd8b010004


I'm assuming that this is due to a misunderstanding on my part.  Can anyone
point me in the right direction?

Much appreciated!
- Ian

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message