zookeeper-user mailing list archives

From elastic search <elastic....@gmail.com>
Subject Observers Unusable
Date Wed, 14 Oct 2015 00:51:50 GMT
Hello Experts,

We have 2 Observers running in AWS that connect to a local ZK ensemble in
our own data center.

There have been instances where the network between the two sites drops
for about a minute.
However, the Observers take around 15 minutes to recover, even when the
network outage lasts only a minute.

From the logs:
java.net.SocketTimeoutException: Read timed out
2015-10-13 22:26:03,927 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
2015-10-13 22:26:04,328 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
2015-10-13 22:26:05,129 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
2015-10-13 22:26:06,730 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200
2015-10-13 22:26:09,931 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 6400
2015-10-13 22:26:16,332 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 12800
2015-10-13 22:26:29,133 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
2015-10-13 22:26:54,734 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 51200
2015-10-13 22:27:45,935 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:28:45,936 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:29:45,937 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:30:45,938 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:31:45,939 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:32:45,940 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:33:45,941 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:34:45,942 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:35:45,943 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:36:45,944 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:37:45,945 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:38:45,946 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:39:45,947 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:40:45,948 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2015-10-13 22:41:45,949 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000

The observer then finally exits the QuorumCnxManager run loop with the
following message:
WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@780] - Connection broken for id 2

How can we ensure the observer does not go out of service for such a long
duration?
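
The timestamps in the log above are consistent with a notification timeout
that doubles on each retry and is capped at 60000 ms. Below is a minimal
Python sketch of that arithmetic, assuming exactly that pattern. The function
name and its parameters are illustrative only and are not ZooKeeper APIs; the
values 400, 60000 and 23 are read off the log excerpt.

```python
def notification_timeouts(first_ms=400, cap_ms=60000, count=23):
    """Return `count` timeouts that double each round until they hit cap_ms.

    Hypothetical model of the values printed in the log excerpt; it is not
    taken from the ZooKeeper source.
    """
    timeouts = []
    timeout = first_ms
    for _ in range(count):
        timeouts.append(timeout)
        timeout = min(timeout * 2, cap_ms)
    return timeouts


timeouts = notification_timeouts()

# Each logged value is the wait before the next log line, so the time between
# the first and the last logged line is the sum of all values but the last.
elapsed_ms = sum(timeouts[:-1])

print(timeouts[:9])
print("elapsed between first and last log line:", elapsed_ms / 1000, "seconds")
```

In this model the waits between the first and last logged lines add up to
942 seconds, which matches the gap between 22:26:03 and 22:41:45 in the log.
Once the timeout reaches the cap, each further wait is a full minute.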

I have attached the full logs.

Please help
Thanks
