Reply-To: user@zookeeper.apache.org
From: Anand Parthasarathy
Date: Wed, 4 Oct 2017 11:17:37 -0700
Subject: Zookeeper quorum goes down for no apparent reason in 3.4.5
To: UserZooKeeper

Hi,

We have an issue with a 3-node ZooKeeper ensemble where the quorum goes down for no apparent reason every once in a while. Here is what I see in the log of the ZK leader:

2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:Leader@493] - Shutting down
2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:Leader@499] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 3 ]
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)

I have attached the logs from the 3 nodes around this time. Could you please help me understand what the issue could be here? The only thing I see shortly before this timestamp is that all of them ran a PurgeTask at pretty much the same time.

Thanks,
Anand.
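To check how closely other entries (such as the PurgeTask ones) precede the leader shutdown on each node, the log lines can be parsed and filtered by time. Below is a minimal Python sketch. The regular expression is based only on the two log lines quoted above, and the names `parse_line` and `entries_before` and the 120-second default window are illustrative assumptions, not anything provided by ZooKeeper.

```python
import re
from datetime import datetime

# Layout inferred from the two log lines quoted in this message (an assumption;
# adjust if your log4j pattern differs):
# "<date> <time>,<ms> [myid:N] - LEVEL  [thread:Class@line] - message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[myid:(?P<myid>\d*)\] - (?P<level>\w+)\s+"
    r"\[(?P<source>.*?)\] - (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, myid, source, message), or None for lines that do
    not start a log record (stack-trace frames, for example)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("myid"), m.group("source"), m.group("msg")


def entries_before(lines, anchor_text, window_seconds=120):
    """Return parsed records logged at most window_seconds before the first
    record whose message contains anchor_text (the anchor itself included)."""
    records = [r for r in map(parse_line, lines) if r is not None]
    records.sort(key=lambda r: r[0])
    anchor = next((r for r in records if anchor_text in r[3]), None)
    if anchor is None:
        return []
    return [
        r for r in records
        if 0 <= (anchor[0] - r[0]).total_seconds() <= window_seconds
    ]
```

Passing the combined lines of all three node logs to `entries_before(lines, "Shutting down")` would list what each node (distinguished by its `myid` field) logged in the two minutes before the leader gave up, which makes it easier to see whether purge activity actually lines up with the shutdown.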