zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Swaminathan Gnanaskandan <gsw...@gmail.com>
Subject Observers taking a long time to recover after network outage
Date Fri, 03 Feb 2017 14:48:21 GMT
Hi,
We have a 5 node ZK-3.4.9-cluster running in DC1 and a couple of observers
across WAN in DC2. There was a network outage between DC1 and DC2 for about
1 minute after which the observers took almost 15 minutes to connect to the
leader in DC1.
How can I ensure that observer recovers faster? My expectation was that it
would reconnect once the network healed.

zookeeper-leader.log
----------------------
zookeeper.log.3:2017-02-02 05:37:22,045 - ERROR
[LearnerHandler-/10.110.11.16:55076:LearnerHandler@631] - Unexpected
exception causing shutdown while sock still open
zookeeper.log.3:2017-02-02 05:37:22,045 - WARN
[LearnerHandler-/10.110.11.16:55076:LearnerHandler@644] - ******* GOODBYE /
10.110.11.16:55076 ********
zookeeper.log.3:2017-02-02 05:54:05,069 - INFO  [zk-02/10.100.104.82:3888
:QuorumCnxManager$Listener@541] - Received connection request /
10.110.11.16:34573
zookeeper.log.3:2017-02-02 05:54:05,249 - INFO
[LearnerHandler-/10.110.11.16:40257:LearnerHandler@329] - Follower sid: 17
: info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@35bc2f85
zookeeper.log.3:2017-02-02 05:54:05,320 - INFO
[LearnerHandler-/10.110.11.16:40257:LearnerHandler@384] - Synchronizing
with Follower sid: 17 maxCommittedLog=0x104c66a22
minCommittedLog=0x104c6682e peerLastZxid=0x100919196
zookeeper.log.3:2017-02-02 05:54:05,320 - WARN
[LearnerHandler-/10.110.11.16:40257:LearnerHandler@451] - Unhandled
proposal scenario
zookeeper.log.3:2017-02-02 05:54:05,321 - INFO
[LearnerHandler-/10.110.11.16:40257:LearnerHandler@458] - Sending SNAP
zookeeper.log.3:2017-02-02 05:54:05,321 - INFO
[LearnerHandler-/10.110.11.16:40257:LearnerHandler@482] - Sending snapshot
last zxid of peer is 0x100919196  zxid of leader is 0x104c66a22sent zxid of
db as 0x104c66a22
zookeeper.log.3:2017-02-02 05:54:06,208 - INFO
[LearnerHandler-/10.110.11.16:40257:LearnerHandler@518] - Received
NEWLEADER-ACK message from 17


zookeeper-observer.log
----------------------
zookeeper.log.2:2017-02-02 05:37:22,637 - WARN
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:Observer@77] - Exception when
observing the leader
zookeeper.log.2: at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:833)
zookeeper.log.2:2017-02-02 05:37:22,638 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:Observer@135] - shutdown called
zookeeper.log.2: at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:837)
zookeeper.log.2:2017-02-02 05:37:22,672 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@497] - shutting
down
zookeeper.log.2:2017-02-02 05:37:22,672 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:ObserverRequestProcessor@121] -
Shutting down
zookeeper.log.2:2017-02-02 05:37:22,672 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
down
zookeeper.log.2:2017-02-02 05:37:22,672 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
shutdown of request processor complete
zookeeper.log.2:2017-02-02 05:37:22,672 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:QuorumPeer@774] - LOOKING
zookeeper.log.2:2017-02-02 05:37:22,673 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
election. My id =  17, proposed zxid=0x8000000000000000
zookeeper.log.2:2017-02-02 05:37:22,874 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 400
zookeeper.log.2:2017-02-02 05:37:23,274 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 800
zookeeper.log.2:2017-02-02 05:37:24,075 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 1600
zookeeper.log.2:2017-02-02 05:37:25,675 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 3200
zookeeper.log.2:2017-02-02 05:37:28,876 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 6400
zookeeper.log.2:2017-02-02 05:37:35,276 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 12800
zookeeper.log.2:2017-02-02 05:37:48,076 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 25600
zookeeper.log.2:2017-02-02 05:38:13,677 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 51200
zookeeper.log.2:2017-02-02 05:39:04,877 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.2:2017-02-02 05:40:04,878 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.2:2017-02-02 05:41:04,878 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.2:2017-02-02 05:42:04,879 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.2:2017-02-02 05:43:04,879 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.2:2017-02-02 05:44:04,879 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:45:04,880 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:46:04,880 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:47:04,881 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:48:04,881 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:49:04,881 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:50:04,882 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:51:04,882 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:52:04,882 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:53:04,883 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:54:04,883 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
Notification time out: 60000
zookeeper.log.1:2017-02-02 05:54:05,103 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:QuorumPeer@831] - OBSERVING
zookeeper.log.1:2017-02-02 05:54:05,103 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@173] - Created
server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000
datadir /var/lib/zookeeper/data/version-2 snapdir
/var/lib/zookeeper/data/version-2
zookeeper.log.1:2017-02-02 05:54:05,103 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:ObserverZooKeeperServer@64] -
syncEnabled =false
zookeeper.log.1:2017-02-02 05:54:05,108 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@149] -
Resolved hostname: zk-02 to address: zk-02/10.100.104.82
zookeeper.log.1:2017-02-02 05:54:05,108 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:Observer@65] - Observing zk-02/
10.100.104.82:2888
zookeeper.log.1:2017-02-02 05:54:05,182 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot
/var/lib/zookeeper/data/version-2/snapshot.100919196
zookeeper.log.1:2017-02-02 05:54:05,353 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:Learner@329] - Getting a snapshot
from leader
zookeeper.log.1:2017-02-02 05:54:06,109 - INFO
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] -
Snapshotting: 0x104c66a22 to
/var/lib/zookeeper/data/version-2/snapshot.104c66a22
zookeeper.log.1:2017-02-02 05:54:06,169 - WARN
[QuorumPeer[myid=17]/0:0:0:0:0:0:0:0:2181:Learner@408] - Got zxid
0x104c66a23 expected 0x1

Should I have some monitor process on the observer that checks if
"Zookeeper is not serving requests" and restart it?

Regards,
Swami

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message