zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Wong <...@airtime.com>
Subject Re: ZooKeeper Cluster Failed (and did not recover)
Date Thu, 30 Nov 2017 18:15:28 GMT
Abe,

Thanks for responding. We are currently on ZooKeeper 3.4.10; do you think going to 3.4.11
would make a difference?

We can certainly turn on debug logs, but because the problem has only occurred twice in the
last several (probably 8-9) months and we’re unable to reproduce it intentionally, it might
take a while.

Is there anything else you suggest we do to troubleshoot in the meantime?


> On Nov 29, 2017, at 4:52 PM, Abraham Fine <afine@apache.org> wrote:
> 
> Hi Jim-
> 
> What version of ZooKeeper are you using?
> 
> It would be great to get a better understanding of what is causing:
> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> Not sufficient followers synced, only synced with sids: [ 3,4 ]
> Nov 29 19:32:12 localhost     at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> Nov 29 19:32:12 localhost     at
> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
> 
> This failure appears to align with:
> Nov 29 19:32:12 localhost zookeeper: WARN 
> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> following the leader
> 
> Is there any possibility debug logs could be enabled?
> 
> Thanks,
> Abe
> 
> Would you be able to turn on debug logging, I think that might give us
> much more visibility?
> On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote:
>> Hey, folks.  We’re running a 5-node ZooKeeper cluster to manage server
>> configuration and discovery.  This morning, we experienced an outage in
>> which the entire cluster went offline for approximately 7 minutes. 
>> Restarting the ZooKeeper daemons on each node in the cluster got things
>> working again, so there doesn’t appear to have been an underlying network
>> or hardware fault that would explain the cluster’s inability to recover.
>> 
>> This is actually the second time this has happened in the last several
>> months, and all of our attempts to reproduce the behavior by
>> intentionally doing bad things to the cluster or the leader have failed.
>> 
>> With all that in mind, we were wondering if anyone had any thoughts as to
>> what might be going on, as we’re out of ideas.  If anyone can provide any
>> hints or guidance, we’d be very grateful.
>> 
>> I’ve included brief excerpts from logs from some of the nodes; the
>> problem started around 17:32:12 and wasn’t resolved until 17:39, after
>> we’d restarted ZooKeeper.  This was was originally the leader:
>> 
>> Nov 29 19:31:44 localhost zookeeper: INFO  [ProcessThread(sid:4
>> cport:-1)::PrepRequestProcessor@486] - Processed session termination for
>> sessionid: 0x15fdd1d1e0c0313
>> Nov 29 19:32:11 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:11 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:11 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: ERROR
>> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected
>> exception causing shutdown while sock still open
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE
>> /10.1.29.76:49198 ********
>> Nov 29 19:32:12 localhost zookeeper: ERROR
>> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected
>> exception causing shutdown while sock still open
>> Nov 29 19:32:12 localhost zookeeper: ERROR
>> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected
>> exception causing shutdown while sock still open
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE
>> /10.1.0.252:50932 ********
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE
>> /10.1.48.67:44882 ********
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called
>> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
>> Not sufficient followers synced, only synced with sids: [ 3,4 ]
>> Nov 29 19:32:12 localhost     at
>> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
>> Nov 29 19:32:12 localhost     at
>> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331]
>> - exception while shutting down acceptor: java.net.SocketException:
>> Socket closed
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
>> socket connection for client /10.1.10.8:42938 which had sessionid
>> 0x45f3a424dd20208
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO  [ProcessThread(sid:4
>> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676]
>> - Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
>> shutdown of request processor complete
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited!
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE
>> /10.1.37.117:52956 ********
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring
>> unexpected exception
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
>> election. My id =  4, proposed zxid=0x2b0260b5a8
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to
>> 10.1.64.1 port 67 (xid=0x79592ad4)
>> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1
>> (xid=0x79592ad4)
>> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 --
>> renewal in 1608 seconds.
>> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get
>> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s
>> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases
>> of eth0
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 400
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:13 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:13 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 800
>> Nov 29 19:32:13 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:13 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:13 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:14 localhost zookeeper: INFO 
>> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
>> Nov 29 19:32:14 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 1600
>> Nov 29 19:32:14 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:15 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 3200
>> Nov 29 19:32:15 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:18 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 6400
>> Nov 29 19:32:18 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0,
>> interval 125280ms.
>> Nov 29 19:32:25 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 12800
>> Nov 29 19:32:25 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:38 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 25600
>> Nov 29 19:32:38 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:03 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 51200
>> Nov 29 19:33:03 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:33:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:34:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:34:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:35:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:35:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:36:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:36:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:37:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:37:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:38:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:38:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: WARN 
>> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for
>> id 5, my id = 4, error = 
>> Nov 29 19:39:45 localhost zookeeper: WARN 
>> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker
>> Nov 29 19:39:45 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times
>> Nov 29 19:39:45 localhost zookeeper: WARN 
>> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using
>> channel: for id 5 my id = 4 error = java.net.SocketException: Socket
>> closed
>> Nov 29 19:39:45 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times
>> Nov 29 19:39:45 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 localhost zookeeper: WARN 
>> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving
>> thread
>> Nov 29 19:39:47 localhost zookeeper: INFO 
>> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746]
>> - Received connection request /10.1.48.67:58398
>> Nov 29 19:39:47 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:47 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service
>> 'zookeeper-server' on user request
>> 
>> 
>> And here are logs from one of the followers:
>> 
>> Nov 29 19:32:11 localhost zookeeper: INFO 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
>> client: 0x15fc2e9562e0187
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>> following the leader
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
>> socket connection for client /10.1.29.33:57934 which had sessionid
>> 0x35f3a424dc802a2
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
>> down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] -
>> FollowerRequestProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
>> shutdown of request processor complete
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop!
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
>> Shutting down
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited!
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
>> election. My id =  5, proposed zxid=0x2b0260b5a8
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 400
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 800
>> Nov 29 19:32:12 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:13 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 1600
>> Nov 29 19:32:13 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:15 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 3200
>> Nov 29 19:32:15 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:18 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 6400
>> Nov 29 19:32:18 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:24 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 12800
>> Nov 29 19:32:24 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:32:37 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 25600
>> Nov 29 19:32:37 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:03 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 51200
>> Nov 29 19:33:03 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:33:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:33:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:34:42 localhost zookeeper: WARN 
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
>> end of stream exception
>> Nov 29 19:34:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:34:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:35:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:35:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:36:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:36:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:37:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:37:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:38:54 localhost zookeeper: INFO 
>> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
>> Notification time out: 60000
>> Nov 29 19:38:54 localhost zookeeper: INFO 
>> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
>> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
>> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
>> state)
>> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service
>> 'zookeeper-server' on user request
>> 
>> 


Mime
View raw message