Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 8CB2B200D5A for ; Thu, 30 Nov 2017 00:06:23 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 89377160C16; Wed, 29 Nov 2017 23:06:23 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 346B4160C04 for ; Thu, 30 Nov 2017 00:06:22 +0100 (CET) Received: (qmail 24164 invoked by uid 500); 29 Nov 2017 23:06:21 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 24152 invoked by uid 99); 29 Nov 2017 23:06:20 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Nov 2017 23:06:20 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 3F81DC3D7A for ; Wed, 29 Nov 2017 23:06:20 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.29 X-Spam-Level: ** X-Spam-Status: No, score=2.29 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, KAM_BADIPHTTP=2, KB_WAM_FROM_NAME_SINGLEWORD=0.2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, T_DKIM_INVALID=0.01, URI_TRY_3LD=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=fail (1024-bit key) reason="fail (bad RSA signature)" header.d=airtime.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id Gu2IqjCLjJYe for ; Wed, 29 Nov 2017 23:06:17 +0000 (UTC) Received: from mail-pl0-f51.google.com (mail-pl0-f51.google.com [209.85.160.51]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id A1A185F4AC for ; Wed, 29 Nov 2017 23:06:16 +0000 (UTC) Received: by mail-pl0-f51.google.com with SMTP id d21so3014949pll.1 for ; Wed, 29 Nov 2017 15:06:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=airtime.com; s=airtime; h=from:content-transfer-encoding:mime-version:subject:message-id:date :to; bh=+FCk+xp3J3dteYdFJ2uY72ifY3CNGfwjZiSigSRYmGg=; b=LhimFu4OeqlQg82TTvcHVRZstvL/+Ex76LLn6j/bkYOk14+Ruj5Ot7JhPPEBVrXbXT DMPZfGBjdvAe2pKIx/z2tLYzQAIblh3hZXSt0M6bjVqh4Fjpyw37W0hx+dncgp2vGL1E +b2GgHrqf579LgGogCPHMjoLranqp/xK2VeKE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:message-id:date:to; bh=+FCk+xp3J3dteYdFJ2uY72ifY3CNGfwjZiSigSRYmGg=; b=Hb1pc/6KY7RRV+mkPIk10FpUHjFylkCTWABha/hLsNUwzAA9VaM0tRQNBc8sWXqy7n WNb79IDdtQk1l/jzBOplndy7jnPR2YuAqN/ccdpNpBidTtjTxsnKFBT3KjWQ3j0WeYp6 GuPD9P29C2w9F9Bs3oIKMidYejfM129UPIfXkij4Kxq6aJTuRg/jJkVsJ8nDXCsqdmcX BmOYOWCLI6tAH7N03aYwlF6l4Oqu9l+CtP2kX27XPS0yU2cflCJbtF2C+dPMJE13aIGF 3Kx0DkvLbhe0FW2NpoSBL5E0FWr9p5Yo1h5LtEn3irnIz4k8/lyymh8F4W3i0ThCXd+L vLnQ== X-Gm-Message-State: AJaThX6VSI0IWIJGVlbJnkAhUPv+suMOJIio18SdKq75zOZivimKmcI5 c/cCccPHqWWVGJwHORi0mF52cC+ETjk= X-Google-Smtp-Source: AGs4zMZ3A6b2ZH9nR450HIdgTOJ2ZmHbpZhYilbTwFka51sSaU7/BO1+Gg59g/vu+mEM6LJSnDVrTw== X-Received: by 10.84.130.104 with SMTP id 95mr472639plc.151.1511996769915; Wed, 29 Nov 2017 15:06:09 -0800 (PST) Received: from [192.168.128.39] ([66.201.54.106]) by smtp.gmail.com with ESMTPSA id j62sm4261478pgc.35.2017.11.29.15.06.09 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 29 Nov 2017 15:06:09 -0800 (PST) From: Jim Wong Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 11.1 \(3445.4.7\)) Subject: ZooKeeper Cluster Failed (and did not recover) Message-Id: <30C79B66-AF60-4A0B-8BC5-769C5963DE51@airtime.com> Date: Wed, 29 Nov 2017 15:06:08 -0800 To: user@zookeeper.apache.org X-Mailer: Apple Mail (2.3445.4.7) archived-at: Wed, 29 Nov 2017 23:06:23 -0000 Hey, folks. We=E2=80=99re running a 5-node ZooKeeper cluster to manage = server configuration and discovery. This morning, we experienced an = outage in which the entire cluster went offline for approximately 7 = minutes. Restarting the ZooKeeper daemons on each node in the cluster = got things working again, so there doesn=E2=80=99t appear to have been = an underlying network or hardware fault that would explain the = cluster=E2=80=99s inability to recover. This is actually the second time this has happened in the last several = months, and all of our attempts to reproduce the behavior by = intentionally doing bad things to the cluster or the leader have failed. With all that in mind, we were wondering if anyone had any thoughts as = to what might be going on, as we=E2=80=99re out of ideas. If anyone can = provide any hints or guidance, we=E2=80=99d be very grateful. I=E2=80=99ve included brief excerpts from logs from some of the nodes; = the problem started around 17:32:12 and wasn=E2=80=99t resolved until = 17:39, after we=E2=80=99d restarted ZooKeeper. This was was originally = the leader: Nov 29 19:31:44 localhost zookeeper: INFO [ProcessThread(sid:4 = cport:-1)::PrepRequestProcessor@486] - Processed session termination for = sessionid: 0x15fdd1d1e0c0313 Nov 29 19:32:11 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:11 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:11 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:12 localhost zookeeper: ERROR = [LearnerHandler-/10.1.29.76:49198:LearnerHandler@648] - Unexpected = exception causing shutdown while sock still open Nov 29 19:32:12 localhost zookeeper: WARN = [LearnerHandler-/10.1.29.76:49198:LearnerHandler@661] - ******* GOODBYE = /10.1.29.76:49198 ******** Nov 29 19:32:12 localhost zookeeper: ERROR = [LearnerHandler-/10.1.48.67:44882:LearnerHandler@648] - Unexpected = exception causing shutdown while sock still open Nov 29 19:32:12 localhost zookeeper: ERROR = [LearnerHandler-/10.1.0.252:50932:LearnerHandler@648] - Unexpected = exception causing shutdown while sock still open Nov 29 19:32:12 localhost zookeeper: WARN = [LearnerHandler-/10.1.0.252:50932:LearnerHandler@661] - ******* GOODBYE = /10.1.0.252:50932 ******** Nov 29 19:32:12 localhost zookeeper: WARN = [LearnerHandler-/10.1.48.67:44882:LearnerHandler@661] - ******* GOODBYE = /10.1.48.67:44882 ******** Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my = state) Nov 29 19:32:12 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:12 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:12 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LEADING (my = state) Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:Leader@511] - Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:Leader@517] - Shutdown called Nov 29 19:32:12 localhost java.lang.Exception: shutdown Leader! reason: = Not sufficient followers synced, only synced with sids: [ 3,4 ] Nov 29 19:32:12 localhost at = org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:517) Nov 29 19:32:12 localhost at = org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:492) Nov 29 19:32:12 localhost zookeeper: INFO = [LearnerCnxAcceptor-zookeeper-4-internal.prod.airtime.com/10.1.72.145:2888= :Leader$LearnerCnxAcceptor@331] - exception while shutting down = acceptor: java.net.SocketException: Socket closed Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed = socket connection for client /10.1.10.8:42938 which had sessionid = 0x45f3a424dd20208 Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - = shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] - = Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@764] - = Shutting down Nov 29 19:32:12 localhost zookeeper: INFO [ProcessThread(sid:4 = cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited loop! Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88] = - Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - = Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProces= sor@676] - Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - = shutdown of request processor complete Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - = Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [SyncThread:4:SyncRequestProcessor@186] - SyncRequestProcessor exited! Nov 29 19:32:12 localhost zookeeper: INFO = [CommitProcessor:4:CommitProcessor@153] - CommitProcessor exited loop! Nov 29 19:32:12 localhost zookeeper: WARN = [LearnerHandler-/10.1.37.117:52956:LearnerHandler@661] - ******* GOODBYE = /10.1.37.117:52956 ******** Nov 29 19:32:12 localhost zookeeper: WARN = [LearnerHandler-/10.1.37.117:52956:LearnerHandler@673] - Ignoring = unexpected exception Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New = election. My id =3D 4, proposed zxid=3D0x2b0260b5a8 Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPREQUEST on eth0 to = 10.1.64.1 port 67 (xid=3D0x79592ad4) Nov 29 19:32:12 zookeeper-4 dhclient[2170]: DHCPACK from 10.1.64.1 = (xid=3D0x79592ad4) Nov 29 19:32:12 zookeeper-4 dhclient[2170]: bound to 10.1.72.145 -- = renewal in 1608 seconds. Nov 29 19:32:12 zookeeper-4 ec2net: [get_meta] Trying to get = http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:c0:01:e= 2:a6:b2/local-ipv4s Nov 29 19:32:12 zookeeper-4 ec2net: [rewrite_aliases] Rewriting aliases = of eth0 Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 400 Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:13 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:13 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 800 Nov 29 19:32:13 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:13 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:13 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:14 localhost zookeeper: INFO = [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited = loop! Nov 29 19:32:14 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 1600 Nov 29 19:32:14 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:15 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 3200 Nov 29 19:32:15 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:18 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 6400 Nov 29 19:32:18 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:23 zookeeper-4 dhclient[2295]: XMT: Solicit on eth0, = interval 125280ms. Nov 29 19:32:25 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 12800 Nov 29 19:32:25 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:38 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 25600 Nov 29 19:32:38 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:33:03 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 51200 Nov 29 19:33:03 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:33:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:33:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:34:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:34:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:35:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:35:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:36:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:36:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:37:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:37:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:38:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:38:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 4 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 localhost zookeeper: WARN = [RecvWorker:5:QuorumCnxManager$RecvWorker@1028] - Connection broken for = id 5, my id =3D 4, error =3D=20 Nov 29 19:39:45 localhost zookeeper: WARN = [RecvWorker:5:QuorumCnxManager$RecvWorker@1031] - Interrupting = SendWorker Nov 29 19:39:45 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 localhost zookeeper: last message repeated 7 times Nov 29 19:39:45 localhost zookeeper: WARN = [SendWorker:5:QuorumCnxManager$SendWorker@949] - Exception when using = channel: for id 5 my id =3D 4 error =3D java.net.SocketException: Socket = closed Nov 29 19:39:45 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 localhost zookeeper: last message repeated 2 times Nov 29 19:39:45 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 localhost zookeeper: WARN = [SendWorker:5:QuorumCnxManager$SendWorker@954] - Send worker leaving = thread Nov 29 19:39:47 localhost zookeeper: INFO = [zookeeper-4-internal.prod.airtime.com/10.1.72.145:3888:QuorumCnxManager$L= istener@746] - Received connection request /10.1.48.67:58398 Nov 29 19:39:47 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x1 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:47 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:47 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 = (n.round), LOOKING (n.state), 4 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:47 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:47 localhost zookeeper: INFO = [WorkerReceiver[myid=3D4]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x41 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:51 zookeeper-4 monit[7081]: restart service = 'zookeeper-server' on user request And here are logs from one of the followers: Nov 29 19:32:11 localhost zookeeper: INFO = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating = client: 0x15fc2e9562e0187 Nov 29 19:32:12 localhost zookeeper: WARN = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when = following the leader Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown = called Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1044] - Closed = socket connection for client /10.1.29.33:57934 which had sessionid = 0x35f3a424dc802a2 Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@140] = - Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@505] - = shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@107] = - Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:CommitProcessor@184] - = Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [FollowerRequestProcessor:5:FollowerRequestProcessor@97] - = FollowerRequestProcessor exited loop! Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@402] - = shutdown of request processor complete Nov 29 19:32:12 localhost zookeeper: INFO = [CommitProcessor:5:CommitProcessor@153] - CommitProcessor exited loop! Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@208] - = Shutting down Nov 29 19:32:12 localhost zookeeper: INFO = [SyncThread:5:SyncRequestProcessor@186] - SyncRequestProcessor exited! Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:QuorumPeer@865] - LOOKING Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@818] - New = election. My id =3D 5, proposed zxid=3D0x2b0260b5a8 Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:12 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 400 Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:12 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 800 Nov 29 19:32:12 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:13 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 1600 Nov 29 19:32:13 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:15 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 3200 Nov 29 19:32:15 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:18 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 6400 Nov 29 19:32:18 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:24 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 12800 Nov 29 19:32:24 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:32:37 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 25600 Nov 29 19:32:37 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:33:03 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 51200 Nov 29 19:33:03 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:33:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:33:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:34:42 localhost zookeeper: WARN = [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught = end of stream exception Nov 29 19:34:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:34:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:35:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:35:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:36:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:36:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:37:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:37:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:38:54 localhost zookeeper: INFO = [QuorumPeer[myid=3D5]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - = Notification time out: 60000 Nov 29 19:38:54 localhost zookeeper: INFO = [WorkerReceiver[myid=3D5]:FastLeaderElection@600] - Notification: 1 = (message format version), 5 (n.leader), 0x2b0260b5a8 (n.zxid), 0x40 = (n.round), LOOKING (n.state), 5 (n.sid), 0x2b (n.peerEpoch) LOOKING (my = state) Nov 29 19:39:45 zookeeper-5 monit[7157]: restart service = 'zookeeper-server' on user request