From: Jörn Franke
Date: Wed, 2 Oct 2019 21:29:02 +0200
Subject: Re: One node crashing in 3.4.11 triggered a full ensemble restart
To: user@zookeeper.apache.org

Have you tried stopping the node, deleting the data and log directories, upgrading to 3.5.5, starting the node, and waiting until it is synchronized?

> On 02.10.2019 at 20:14, Jerry Hebert wrote:
>
> Hi all,
>
> My first post here! I'm hoping you all might be able to offer some guidance
> or redirect me to an existing ticket. We have a five node ensemble on
> 3.4.11 that we're currently in the process of upgrading to 3.5.5. We
> recently saw some bizarre behavior in our ensemble that I was hoping to
> find some sort of pre-existing ticket or discussion about but I was having
> difficulty finding hits for this in Jira.
>
> The behavior that we saw from our metrics is that one of our nodes (not
> sure if it was a follower or a leader) started to demonstrate
> instability (high CPU, high RAM) and it crashed. Not a big deal, but as
> soon as it crashed, the other four nodes all immediately restarted,
> resulting in a short outage. One node crashing should never cause an
> ensemble restart of course, so I assumed that this must be a bug in ZK. The
> nodes that restarted had no indication of errors in their logs, they just
> simply restarted. Does this sound familiar to any of you?
>
> Also, we are using Exhibitor on that ensemble so it's also possible that
> the restart was caused by Exhibitor.
>
> My hope is that this issue will be behind us once the 3.5.5 upgrade is
> complete but I'd ideally like to find some concrete evidence of this.
>
> Thanks!
> Jerry
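
For the "wait until it is synchronized" step suggested in the reply, one possible check is to poll the restarted node with ZooKeeper's `srvr` four-letter command and look at the `Mode:` line. The sketch below is only an illustration under assumptions: the host name is hypothetical, 2181 is assumed to be the client port, and a reported mode of leader/follower/observer is treated as a rough proxy for the node having rejoined the quorum, not as proof that every transaction has been replicated. On 3.5.x, four-letter commands are restricted by `4lw.commands.whitelist`, so `srvr` must be allowed there.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line in a reply, or None if absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def is_serving(reply):
    """True if the reply reports a mode, i.e. the node is serving requests."""
    return parse_mode(reply) in {"leader", "follower", "observer", "standalone"}


if __name__ == "__main__":
    # Hypothetical host name; replace with the node that was restarted.
    reply = four_letter_word("zk-node-1.example.internal")
    print("mode:", parse_mode(reply), "serving:", is_serving(reply))
```

Running this in a loop with a short sleep until `is_serving` returns true is one way to gate the upgrade of the next node in a rolling restart.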