From: Kathryn Hogg <kathryn.hogg@oati.net>
To: user@zookeeper.apache.org
Subject: RE: About ZooKeeper Dynamic Reconfiguration
Date: Wed, 21 Aug 2019 18:34:35 +0000

At my organization we solve that by running a 3rd site, as mentioned in another email. We run a 5-node ensemble with 2 nodes in each primary data center and 1 node in the co-location facility. We try to minimize usage of the 5th node, so we explicitly exclude it from our clients' connection string.

This way, if there is a network partition between the datacenters, whichever one can still talk to the node at the 3rd datacenter will maintain quorum.

Ideally, if it were possible, we'd somehow like the node at the third datacenter to never be elected as the leader, and even better if there were some way for it to be a voting member only and not bear any data (similar to mongodb's arbiter).
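For illustration, a client just lists the four primary-site nodes and leaves the 5th out of the connection string entirely; the host names and ports below are placeholders (borrowing zk1..zk5 from the config later in this thread), shown here with the Python kazoo client, though the same idea applies to any client library:

from kazoo.client import KazooClient

# zk1/zk2 in data center A, zk3/zk4 in data center B; the 3rd-site
# tie-breaker (zk5) is deliberately omitted so clients never connect to it.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181,zk4:2181")
zk.start()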
-----Original Message-----
From: Cee Tee [mailto:c.turksema@gmail.com]
Sent: Wednesday, August 21, 2019 1:27 PM
To: Alexander Shraer
Cc: user@zookeeper.apache.org
Subject: Re: About ZooKeeper Dynamic Reconfiguration

Yes, one side loses quorum and the other remains active. However, we actively control which side that is, because our main application is active/passive with 2 datacenters. We need Zookeeper to remain active in the application's active datacenter.

On 21 August 2019 17:22:00 Alexander Shraer wrote:
> That's great! Thanks for sharing.
>
>> An added benefit is that we can also control which data center gets the
>> quorum in case of a network outage between the two.
>
> Can you explain how this works? In case of a network outage between
> two DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this
> time, since they can't get quorum. No?
>
> Thanks,
> Alex
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee wrote:
>
> We have solved this by implementing a 'zookeeper cluster balancer': it
> calls the admin server API of each zookeeper to get the current status
> and issues dynamic reconfigure commands to change dead servers into
> observers so the quorum is not in danger. Once the dead servers
> reconnect, they take the observer role and are then reconfigured into
> participants again.
>
> An added benefit is that we can also control which data center gets the
> quorum in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer wrote:
>
>> Hi,
>>
>> Reconfiguration, as implemented, is not automatic. In your case, when
>> failures happen, this doesn't change the ensemble membership.
>> When 2 of 5 fail, this is still a minority, so everything should work
>> normally; you just won't be able to handle an additional failure. If
>> you'd like to remove them from the ensemble, you need to issue an
>> explicit reconfiguration command to do so.
>>
>> Please see details in the manual:
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
>>
>> Alex
>>
>> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei wrote:
>>
>>> Hi,
>>> I encounter a problem which blocks my development of load balancing
>>> using ZooKeeper 3.5.5.
>>> I have a ZooKeeper cluster which comprises five zk servers, and the
>>> dynamic configuration file is as follows:
>>>
>>> server.1=zk1:2888:3888:participant;0.0.0.0:2181
>>> server.2=zk2:2888:3888:participant;0.0.0.0:2181
>>> server.3=zk3:2888:3888:participant;0.0.0.0:2181
>>> server.4=zk4:2888:3888:participant;0.0.0.0:2181
>>> server.5=zk5:2888:3888:participant;0.0.0.0:2181
>>>
>>> The zk cluster works fine while every member is healthy. However, if,
>>> say, two of them suddenly go down without prior notice, the dynamic
>>> configuration file shown above is not updated automatically, which
>>> leads to the zk cluster failing to work normally.
>>> I think this is a very common case which may happen at any time.
>>> If so, how can we resolve it?
>>> Really looking forward to hearing from you!
>>> Thanks
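For anyone curious, here is a rough sketch of the kind of "cluster balancer" Chris describes, not his actual implementation: it polls each server's AdminServer (assumed to be ZooKeeper 3.5+ with the AdminServer on its default port 8080) and, when participants stop responding, re-adds them as observers via an incremental reconfig using the Python kazoo client. Host names, server ids, and ports are made up for the example, and note that since 3.5.3 reconfig is disabled by default (reconfigEnabled must be set) and requires an authorized user.

import requests
from kazoo.client import KazooClient

# Example ensemble layout (server id -> host); purely illustrative.
ENSEMBLE = {1: "zk1", 2: "zk2", 3: "zk3", 4: "zk4", 5: "zk5"}
ADMIN_PORT = 8080    # AdminServer default in 3.5+
CLIENT_PORT = 2181

def is_alive(host):
    # Probe the AdminServer's "ruok" command; treat any failure as dead.
    try:
        r = requests.get(f"http://{host}:{ADMIN_PORT}/commands/ruok", timeout=2)
        return r.status_code == 200 and not r.json().get("error")
    except requests.RequestException:
        return False

def demote_dead_servers():
    # Turn unreachable participants into observers so they no longer count
    # toward the quorum; the reverse operation re-adds them as participants.
    dead = [sid for sid, host in ENSEMBLE.items() if not is_alive(host)]
    live = [host for sid, host in ENSEMBLE.items() if sid not in dead]
    if not dead or not live:
        return
    zk = KazooClient(hosts=",".join(f"{h}:{CLIENT_PORT}" for h in live))
    zk.start()
    try:
        # Incremental reconfig: re-adding an existing id with the observer
        # role changes that server's role in the ensemble.
        joining = ",".join(
            f"server.{sid}={ENSEMBLE[sid]}:2888:3888:observer;0.0.0.0:{CLIENT_PORT}"
            for sid in dead
        )
        zk.reconfig(joining=joining, leaving=None, new_members=None)
    finally:
        zk.stop()

if __name__ == "__main__":
    demote_dead_servers()

Run periodically (cron or a small daemon), this keeps quorum over the reachable participants; the symmetric step, re-adding recovered servers as participants, uses the same reconfig call with the participant role.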