Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 64A43200D5A for ; Thu, 30 Nov 2017 01:52:40 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 62F9C160C16; Thu, 30 Nov 2017 00:52:40 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 0B568160C04 for ; Thu, 30 Nov 2017 01:52:38 +0100 (CET) Received: (qmail 96102 invoked by uid 500); 30 Nov 2017 00:52:37 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 96091 invoked by uid 99); 30 Nov 2017 00:52:37 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 Nov 2017 00:52:37 +0000 Received: from auth2-smtp.messagingengine.com (auth2-smtp.messagingengine.com [66.111.4.228]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id EEB981A008C for ; Thu, 30 Nov 2017 00:52:36 +0000 (UTC) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailauth.nyi.internal (Postfix) with ESMTP id 6E41A20A79 for ; Wed, 29 Nov 2017 19:52:34 -0500 (EST) Received: from web5 ([10.202.2.215]) by compute6.internal (MEProxy); Wed, 29 Nov 2017 19:52:34 -0500 X-ME-Sender: Received: by mailuser.nyi.internal (Postfix, from userid 99) id 3B9F89E0C4; Wed, 29 Nov 2017 19:52:34 -0500 (EST) Message-Id: <1512003154.382502.1188840616.0573410E@webmail.messagingengine.com> From: Abraham Fine To: user@zookeeper.apache.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" X-Mailer: MessagingEngine.com Webmail Interface - ajax-a169161c Date: Wed, 29 Nov 2017 19:52:34 -0500 References: <30C79B66-AF60-4A0B-8BC5-769C5963DE51@airtime.com> Subject: Re: ZooKeeper Cluster Failed (and did not recover) In-Reply-To: <30C79B66-AF60-4A0B-8BC5-769C5963DE51@airtime.com> archived-at: Thu, 30 Nov 2017 00:52:40 -0000 Hi Jim- What version of ZooKeeper are you using? It would be great to get a better understanding of what is causing: Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 3,4 ] Nov 29 19:32:12 localhost at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) Nov 29 19:32:12 localhost at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) This failure appears to align with: Nov 29 19:32:12 localhost zookeeper: WARN=20 [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader Is there any possibility debug logs could be enabled? Thanks, Abe Would you be able to turn on debug logging, I think that might give us much more visibility? On Wed, Nov 29, 2017, at 18:06, Jim Wong wrote: > Hey, folks. We=E2=80=99re running a 5-node ZooKeeper cluster to manage s= erver > configuration and discovery. This morning, we experienced an outage in > which the entire cluster went offline for approximately 7 minutes.=20 > Restarting the ZooKeeper daemons on each node in the cluster got things > working again, so there doesn=E2=80=99t appear to have been an underlying= network > or hardware fault that would explain the cluster=E2=80=99s inability to r= ecover. >=20 > This is actually the second time this has happened in the last several > months, and all of our attempts to reproduce the behavior by > intentionally doing bad things to the cluster or the leader have failed. >=20 > With all that in mind, we were wondering if anyone had any thoughts as to > what might be going on, as we=E2=80=99re out of ideas. If anyone can pro= vide any > hints or guidance, we=E2=80=99d be very grateful. >=20 > I=E2=80=99ve included brief excerpts from logs from some of the nodes; the > problem started around 17:32:12 and wasn=E2=80=99t resolved until 17:39, = after > we=E2=80=99d restarted ZooKeeper. This was was originally the leader: >=20 > Nov 29 19:31:44 localhost zookeeper: INFO [ProcessThread(sid:4 > cport:-1)::PrepRequestProcessor@486] - Processed session termination for > sessionid: 0x15fdd1d1e0c0313 > Nov 29 19:32:11 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:11 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:11 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:12 localhost zookeeper: ERROR > [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected > exception causing shutdown while sock still open > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE > /10.1.29.76:49198 ******** > Nov 29 19:32:12 localhost zookeeper: ERROR > [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected > exception causing shutdown while sock still open > Nov 29 19:32:12 localhost zookeeper: ERROR > [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected > exception causing shutdown while sock still open > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE > /10.1.0.252:50932 ******** > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE > /10.1.48.67:44882 ******** > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my > state) > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my > state) > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called > Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: > Not sufficient followers synced, only synced with sids: [ 3,4 ] > Nov 29 19:32:12 localhost at > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) > Nov 29 19:32:12 localhost at > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:288= 8:Leader$LearnerCnxAcceptor@331] > - exception while shutting down acceptor: java.net.SocketException: > Socket closed > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed > socket connection for client /10.1.10.8:42938 which had sessionid > 0x45f3a424dd20208 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting > down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO [ProcessThread(sid:4 > cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting > down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProce= ssor@676] > - Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - > shutdown of request processor complete > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited! > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop! > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE > /10.1.37.117:52956 ******** > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring > unexpected exception > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New > election. My id =3D 4, proposed zxid=3D0x2b0260b5a8 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to > 10.1.64.1 port 67 (xid=3D0x79592ad4) > Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1 > (xid=3D0x79592ad4) > Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 -- > renewal in 1608 seconds. > Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get > http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:= e2:a6:b2/local-ipv4s > Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases > of eth0 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 400 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:13 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:13 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 800 > Nov 29 19:32:13 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:13 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:13 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:14 localhost zookeeper: INFO=20 > [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited loop! > Nov 29 19:32:14 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 1600 > Nov 29 19:32:14 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:15 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 3200 > Nov 29 19:32:15 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:18 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 6400 > Nov 29 19:32:18 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0, > interval 125280ms. > Nov 29 19:32:25 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 12800 > Nov 29 19:32:25 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:38 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 25600 > Nov 29 19:32:38 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:33:03 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 51200 > Nov 29 19:33:03 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:33:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:33:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:34:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:34:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:35:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:35:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:36:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:36:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:37:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:37:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:38:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:38:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 localhost zookeeper: WARN=20 > [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for > id 5, my id =3D 4, error =3D=20 > Nov 29 19:39:45 localhost zookeeper: WARN=20 > [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting SendWorker > Nov 29 19:39:45 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times > Nov 29 19:39:45 localhost zookeeper: WARN=20 > [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using > channel: for id 5 my id =3D 4 error =3D java.net.SocketException: Socket > closed > Nov 29 19:39:45 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times > Nov 29 19:39:45 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 localhost zookeeper: WARN=20 > [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving > thread > Nov 29 19:39:47 localhost zookeeper: INFO=20 > [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$= Listener@746] > - Received connection request /10.1.48.67:58398 > Nov 29 19:39:47 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:47 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:47 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:47 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:47 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service > 'zookeeper-server' on user request >=20 >=20 > And here are logs from one of the followers: >=20 > Nov 29 19:32:11 localhost zookeeper: INFO=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating > client: 0x15fc2e9562e0187 > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when > following the leader > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed > socket connection for client /10.1.29.33:57934 which had sessionid > 0x35f3a424dc802a2 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - shutting > down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - Shutting > down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [FollowerRequestProcessor:5:FollowerRequestProcessor@97] - > FollowerRequestProcessor exited loop! > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - > shutdown of request processor complete > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop! > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - > Shutting down > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited! > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New > election. My id =3D 5, proposed zxid=3D0x2b0260b5a8 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:12 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 400 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 800 > Nov 29 19:32:12 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:13 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 1600 > Nov 29 19:32:13 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:15 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 3200 > Nov 29 19:32:15 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:18 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 6400 > Nov 29 19:32:18 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:24 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 12800 > Nov 29 19:32:24 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:32:37 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 25600 > Nov 29 19:32:37 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:33:03 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 51200 > Nov 29 19:33:03 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:33:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:33:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:34:42 localhost zookeeper: WARN=20 > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught > end of stream exception > Nov 29 19:34:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:34:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:35:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:35:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:36:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:36:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:37:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:37:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:38:54 localhost zookeeper: INFO=20 > [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - > Notification time out: 60000 > Nov 29 19:38:54 localhost zookeeper: INFO=20 > [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 > (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 > (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my > state) > Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service > 'zookeeper-server' on user request >=20 >=20