zookeeper-user mailing list archives

From Flavio Junqueira <...@apache.org>
Subject Re: Observers Unusable
Date Wed, 14 Oct 2015 08:28:00 GMT
Can you tell why the server wasn't responding to the notifications from the observer? The log
file is from the observer, and it appears to be able to send messages out, but it
isn't clear why the server isn't responding.

-Flavio

> On 14 Oct 2015, at 01:51, elastic search <elastic.l.k@gmail.com> wrote:
> 
> 
> Hello Experts
> 
> We have 2 Observers running in AWS, connecting to a local ZK ensemble in our own data center.
> 
> There have been instances where the network between the two sites drops for a minute.
> However, the Observers take around 15 minutes to recover even if the network outage
> lasts only a minute.
> 
> From the logs
> java.net.SocketTimeoutException: Read timed out
> 2015-10-13 22:26:03,927 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
> 2015-10-13 22:26:04,328 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
> 2015-10-13 22:26:05,129 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
> 2015-10-13 22:26:06,730 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200
> 2015-10-13 22:26:09,931 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 6400
> 2015-10-13 22:26:16,332 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 12800
> 2015-10-13 22:26:29,133 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
> 2015-10-13 22:26:54,734 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 51200
> 2015-10-13 22:27:45,935 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:28:45,936 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:29:45,937 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:30:45,938 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:31:45,939 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:32:45,940 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:33:45,941 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:34:45,942 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:35:45,943 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:36:45,944 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:37:45,945 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:38:45,946 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:39:45,947 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:40:45,948 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 2015-10-13 22:41:45,949 [myid:4] - INFO  [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
> 
> It then finally exits the QuorumCnxManager run loop with the following message:
> WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@780] - Connection broken for id 2
> 
> How can we ensure the observer does not go out of service for such a long duration?
> 
> Attached the full logs 
> 
> Please help
> Thanks
> 
> <zookeeper.log.zip>
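
The "Notification time out" values in the quoted log double from 400 ms and then stay at 60000 ms. The following Python sketch reproduces that pattern as a way to read the timestamps. It is inferred from the log lines alone, not taken from the ZooKeeper source; the function name and its default arguments are hypothetical.

```python
from itertools import islice


def notification_timeouts(first_ms=400, cap_ms=60_000):
    """Yield successive wait intervals: double each round, never exceed cap_ms.

    first_ms and cap_ms are read off the quoted log (first and last values),
    not looked up in any configuration file.
    """
    timeout = first_ms
    while True:
        yield timeout
        timeout = min(timeout * 2, cap_ms)


# First twelve intervals, enough to show the ramp-up and the plateau.
schedule = list(islice(notification_timeouts(), 12))

# Total time spent waiting before the interval reaches the cap.
ramp_up_ms = sum(t for t in schedule if t < 60_000)

if __name__ == "__main__":
    print(schedule)
    print(ramp_up_ms)
```

Under these assumptions the ramp-up to the cap takes 102000 ms, which lines up with the gap between the 22:26:03 and 22:27:45 entries. From that point on each round waits a full minute, as the remaining timestamps show.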

