zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abraham Fine <af...@apache.org>
Subject Re: ZooKeeper Cluster Failed (and did not recover)
Date Thu, 30 Nov 2017 18:58:05 GMT
Hi Jim-

I don't think moving to 3.4.11 should make too much of a difference, it
just helps to know the version so I can map the log messages to code.

There are a couple things you could look at in the meantime. I would
start with heap usage to make sure this is not the result of GC pauses.
You may also want to reevaluate the initLimit if you are storing a large
amount of data in ZK.

Hopefully this helps.

Abe

On Thu, Nov 30, 2017, at 10:15, Jim Wong wrote:
> Abe,
> 
> Thanks for responding. We are currently on ZooKeeper 3.4.10; do you think
> going to 3.4.11 would make a difference?
> 
> We can certainly turn on debug logs, but because the problem has only
> occurred twice in the last several (probably 8-9) months and we’re unable
> to reproduce it intentionally, it might take a while.
> 
> Is there anything else you suggest we do to troubleshoot in the meantime?
> 
> 
> > On Nov 29, 2017, at 4:52 PM, Abraham Fine <afine@apache.org> wrote:
> > 
> > Hi Jim-
> > 
> > What version of ZooKeeper are you using?
> > 
> > It would be great to get a better understanding of what is causing:
> > Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> > Not sufficient followers synced, only synced with sids: [ 3,4 ]
> > Nov 29 19:32:12 localhost     at
> > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> > Nov 29 19:32:12 localhost     at
> > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
> > 
> > This failure appears to align with:
> > Nov 29 19:32:12 localhost zookeeper: WARN 
> > [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> > following the leader
> > 
> > Is there any possibility debug logs could be enabled?
> > 
> > Thanks,
> > Abe
> > 
> > Would you be able to turn on debug logging, I think that might give us
> > much more visibility?
> > On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote:
> >> Hey, folks.  We’re running a 5-node ZooKeeper cluster to manage server
> >> configuration and discovery.  This morning, we experienced an outage in
> >> which the entire cluster went offline for approximately 7 minutes. 
> >> Restarting the ZooKeeper daemons on each node in the cluster got things
> >> working again, so there doesn’t appear to have been an underlying network
> >> or hardware fault that would explain the cluster’s inability to recover.
> >> 
> >> This is actually the second time this has happened in the last several
> >> months, and all of our attempts to reproduce the behavior by
> >> intentionally doing bad things to the cluster or the leader have failed.
> >> 
> >> With all that in mind, we were wondering if anyone had any thoughts as to
> >> what might be going on, as we’re out of ideas.  If anyone can provide any
> >> hints or guidance, we’d be very grateful.
> >> 
> >> I’ve included brief excerpts from logs from some of the nodes; the
> >> problem started around 17:32:12 and wasn’t resolved until 17:39, after
> >> we’d restarted ZooKeeper.  This was was originally the leader:
> >> 
> >> Nov 29 19:31:44 localhost zookeeper: INFO  [ProcessThread(sid:4
> >> cport:-1)::PrepRequestProcessor@486] - Processed session termination for
> >> sessionid: 0x15fdd1d1e0c0313
> >> Nov 29 19:32:11 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:11 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:11 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: ERROR
> >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected
> >> exception causing shutdown while sock still open
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.29.76:49198 ********
> >> Nov 29 19:32:12 localhost zookeeper: ERROR
> >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected
> >> exception causing shutdown while sock still open
> >> Nov 29 19:32:12 localhost zookeeper: ERROR
> >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected
> >> exception causing shutdown while sock still open
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.0.252:50932 ********
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.48.67:44882 ********
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called
> >> Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason:
> >> Not sufficient followers synced, only synced with sids: [ 3,4 ]
> >> Nov 29 19:32:12 localhost     at
> >> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517)
> >> Nov 29 19:32:12 localhost     at
> >> org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492)
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888:Leader$LearnerCnxAcceptor@331]
> >> - exception while shutting down acceptor: java.net.SocketException:
> >> Socket closed
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
> >> socket connection for client /10.1.10.8:42938 which had sessionid
> >> 0x45f3a424dd20208
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO  [ProcessThread(sid:4
> >> cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@676]
> >> - Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
> >> shutdown of request processor complete
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited!
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE
> >> /10.1.37.117:52956 ********
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring
> >> unexpected exception
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
> >> election. My id =  4, proposed zxid=0x2b0260b5a8
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to
> >> 10.1.64.1 port 67 (xid=0x79592ad4)
> >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1
> >> (xid=0x79592ad4)
> >> Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 --
> >> renewal in 1608 seconds.
> >> Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get
> >> http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e2:a6:b2/local-ipv4s
> >> Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases
> >> of eth0
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 400
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:13 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:13 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 800
> >> Nov 29 19:32:13 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:13 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:13 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:14 localhost zookeeper: INFO 
> >> [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop!
> >> Nov 29 19:32:14 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 1600
> >> Nov 29 19:32:14 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:15 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 3200
> >> Nov 29 19:32:15 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:18 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 6400
> >> Nov 29 19:32:18 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0,
> >> interval 125280ms.
> >> Nov 29 19:32:25 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 12800
> >> Nov 29 19:32:25 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:38 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 25600
> >> Nov 29 19:32:38 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:03 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 51200
> >> Nov 29 19:33:03 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:33:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:34:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:34:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:35:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:35:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:36:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:36:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:37:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:37:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:38:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:38:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: WARN 
> >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for
> >> id 5, my id = 4, error = 
> >> Nov 29 19:39:45 localhost zookeeper: WARN 
> >> [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker
> >> Nov 29 19:39:45 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times
> >> Nov 29 19:39:45 localhost zookeeper: WARN 
> >> [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using
> >> channel: for id 5 my id = 4 error = java.net.SocketException: Socket
> >> closed
> >> Nov 29 19:39:45 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times
> >> Nov 29 19:39:45 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 localhost zookeeper: WARN 
> >> [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving
> >> thread
> >> Nov 29 19:39:47 localhost zookeeper: INFO 
> >> [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$Listener@746]
> >> - Received connection request /10.1.48.67:58398
> >> Nov 29 19:39:47 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:47 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=4]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service
> >> 'zookeeper-server' on user request
> >> 
> >> 
> >> And here are logs from one of the followers:
> >> 
> >> Nov 29 19:32:11 localhost zookeeper: INFO 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating
> >> client: 0x15fc2e9562e0187
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> >> following the leader
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed
> >> socket connection for client /10.1.29.33:57934 which had sessionid
> >> 0x35f3a424dc802a2
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting
> >> down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [FollowerRequestProcessor:5:FollowerRequestProcessor@97] -
> >> FollowerRequestProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] -
> >> shutdown of request processor complete
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop!
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] -
> >> Shutting down
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited!
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New
> >> election. My id =  5, proposed zxid=0x2b0260b5a8
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 400
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 800
> >> Nov 29 19:32:12 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:13 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 1600
> >> Nov 29 19:32:13 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:15 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 3200
> >> Nov 29 19:32:15 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:18 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 6400
> >> Nov 29 19:32:18 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:24 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 12800
> >> Nov 29 19:32:24 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:32:37 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 25600
> >> Nov 29 19:32:37 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:03 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 51200
> >> Nov 29 19:33:03 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:33:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:33:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:34:42 localhost zookeeper: WARN 
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught
> >> end of stream exception
> >> Nov 29 19:34:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:34:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:35:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:35:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:36:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:36:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:37:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:37:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:38:54 localhost zookeeper: INFO 
> >> [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] -
> >> Notification time out: 60000
> >> Nov 29 19:38:54 localhost zookeeper: INFO 
> >> [WorkerReceiver[myid=5]:FastLeaderElection@600] - Notification: 1
> >> (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40
> >> (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my
> >> state)
> >> Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service
> >> 'zookeeper-server' on user request
> >> 
> >> 
> 

Mime
View raw message