zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Tyree <tyree...@gmail.com>
Subject Random unexpected exception in followers every so often
Date Thu, 25 Jul 2013 17:58:52 GMT
Hello all,

In our development Zookeeper cluster, about once a week we've started
seeing two of the followers have an Unexpected Exception with Zookeeper
3.4.5. Looking at the metrics from the mntr four letter word, there is no
particular amount of load or number of sessions that leads to this
behavior, it just seems totally out of the blue. Here is a snippet of INFO
level logs from that period of time (IP addresses have been scrubbed):

2013-07-25 04:19:20,066 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket
connection from /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,066 - WARN  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@793] - Connection request
from old client /xxx.xxx.xxx.xxx:xxxxx; will be dropped if server is in r-o
mode
2013-07-25 04:19:20,067 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@839] - Client attempting to
establish new session at /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,068 - INFO  [CommitProcessor:6:ZooKeeperServer@595] -
Established session 0x63ffea596ee912b with negotiated timeout 30000 for
client /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,068 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@863] - got auth packet
/xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,068 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@897] - auth success
/xxx.xxx.xxx.xxx:xxxxx
(--------lots of normal activity----------)
2013-07-25 04:19:20,317 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@839] - Client attempting to
establish new session at /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,318 - INFO  [CommitProcessor:6:ZooKeeperServer@595] -
Established session 0x63ffea596ee912c with negotiated timeout 10000 for
client /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,318 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@863] - got auth packet
/xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,318 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:ZooKeeperServer@897] - auth success
/xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,319 - WARN
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
2013-07-25 04:19:20,319 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
2013-07-25 04:19:20,319 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1001] - Closed
socket connection for client /xxx.xxx.xxx.xxx:xxxxx which had sessionid
0x63ffea596ee8bab
(-----------One of these for each active session it seems-----------------))
2013-07-25 04:19:20,375 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1001] - Closed
socket connection for client /xxx.xxx.xxx.xxx:xxxxx which had sessionid
0x83fdf0b85832d61
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] -
Shutting down
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting
down
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] -
Shutting down
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting
down
2013-07-25 04:19:20,376 - INFO
 [FollowerRequestProcessor:6:FollowerRequestProcessor@95] -
FollowerRequestProcessor exited loop!
2013-07-25 04:19:20,376 - INFO  [CommitProcessor:6:CommitProcessor@150] -
CommitProcessor exited loop!
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] -
shutdown of request processor complete
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@175] -
Shutting down
2013-07-25 04:19:20,376 - INFO  [SyncThread:6:SyncRequestProcessor@155] -
SyncRequestProcessor exited!
2013-07-25 04:19:20,376 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2013-07-25 04:19:20,377 - INFO
 [QuorumPeer[myid=6]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot
/var/lib/zookeeper/data/version-2/snapshot.4270965c70
2013-07-25 04:19:20,707 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket
connection from /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,707 - WARN  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@354] - Exception causing close
of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-07-25 04:19:20,707 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@1001] - Closed socket
connection for client /xxx.xxx.xxx.xxx:xxxxx (no session established for
client)
2013-07-25 04:19:20,708 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket
connection from /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,708 - WARN  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@354] - Exception causing close
of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-07-25 04:19:20,708 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@1001] - Closed socket
connection for client /xxx.xxx.xxx.xxx:xxxxx (no session established for
client)
2013-07-25 04:19:20,709 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxnFactory@197] - Accepted socket
connection from /xxx.xxx.xxx.xxx:xxxxx
2013-07-25 04:19:20,710 - WARN  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@354] - Exception causing close
of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-07-25 04:19:20,710 - INFO  [NIOServerCxn.Factory:
0.0.0.0/xxx.xxx.xxx.xxx:xxxxx:NIOServerCnxn@1001] - Closed socket
connection for client /xxx.xxx.xxx.xxx:xxxxx (no session established for
client)

Whenever this event happens on one of the followers, none of the other
Zookeeper servers (there are 5 in the cluster) seem to notice. Does anyone
have any idea what might be the issue here? Not sure where to start with
this one.

Thanks,
Steve

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message