From: Michael Borokhovich
Date: Mon, 10 Dec 2018 22:29:52 -0800
Subject: Re: Leader election
To: dev@zookeeper.apache.org

Yes, I agree, our system should be able to tolerate two leaders for a short
and bounded period of time.
Thank you for the help!

On Thu, Dec 6, 2018 at 11:09 AM Jordan Zimmerman wrote:

> > it seems like the
> > inconsistency may be caused by the partition of the Zookeeper cluster
> > itself
>
> Yes - there are many ways in which you can end up with 2 leaders. However,
> if properly tuned and configured, it will be for a few seconds at most.
> During a GC pause no work is being done anyway. But, this stuff is very
> tricky. Requiring an atomically unique leader is actually a design smell
> and you should reconsider your architecture.
>
> > Maybe we can use a more
> > lightweight Hazelcast for example?
>
> There is no distributed system that can guarantee a single leader. Instead
> you need to adjust your design and algorithms to deal with this (using
> optimistic locking, etc.).
>
> -Jordan
>
> > On Dec 6, 2018, at 1:52 PM, Michael Borokhovich wrote:
> >
> > Thanks Jordan,
> >
> > Yes, I will try Curator.
> > Also, beyond the problem described in the Tech Note, it seems like the
> > inconsistency may be caused by the partition of the Zookeeper cluster
> > itself. E.g., if a "leader" client is connected to the partitioned ZK
> > node, it may be notified not at the same time as the other clients
> > connected to the other ZK nodes. So, another client may take leadership
> > while the current leader still unaware of the change. Is it true?
> >
> > Another follow up question. If Zookeeper can guarantee a single leader,
> > is it worth using it just for leader election? Maybe we can use a more
> > lightweight Hazelcast for example?
> >
> > Michael.
> >
> >
> > On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman <jordan@jordanzimmerman.com>
> > wrote:
> >
> >> It is not possible to achieve the level of consistency you're after in an
> >> eventually consistent system such as ZooKeeper.
> >> There will always be an edge case where two ZooKeeper clients will
> >> believe they are leaders (though for a short period of time). In terms
> >> of how it affects Apache Curator, we have this Tech Note on the subject:
> >> https://cwiki.apache.org/confluence/display/CURATOR/TN10 (the
> >> description is true for any ZooKeeper client, not just Curator clients).
> >> If you do still intend to use a ZooKeeper lock/leader I suggest you try
> >> Apache Curator as writing these "recipes" is not trivial and have many
> >> gotchas that aren't obvious.
> >>
> >> -Jordan
> >>
> >> http://curator.apache.org
> >>
> >>
> >>> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich wrote:
> >>>
> >>> Hello,
> >>>
> >>> We have a service that runs on 3 hosts for high availability. However,
> >>> at any given time, exactly one instance must be active. So, we are
> >>> thinking to use Leader election using Zookeeper.
> >>> To this goal, on each service host we also start a ZK server, so we
> >>> have a 3-nodes ZK cluster and each service instance is a client to its
> >>> dedicated ZK server.
> >>> Then, we implement a leader election on top of Zookeeper using a basic
> >>> recipe:
> >>> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection .
> >>>
> >>> I have the following questions doubts regarding the approach:
> >>>
> >>> 1. It seems like we can run into inconsistency issues when network
> >>> partition occurs. Zookeeper documentation says that the inconsistency
> >>> period may last “tens of seconds”. Am I understanding correctly that
> >>> during this time we may have 0 or 2 leaders?
> >>> 2. Is it possible to reduce this inconsistency time (let's say to 3
> >>> seconds) by tweaking tickTime and syncLimit parameters?
> >>> 3. Is there a way to guarantee exactly one leader all the time?
> >>> Should we implement a more complex leader election algorithm than the
> >>> one suggested in the recipe (using ephemeral_sequential nodes)?
> >>>
> >>> Thanks,
> >>> Michael.
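
The optimistic locking mentioned in the thread can be sketched roughly as follows. This is a minimal illustration and not code from the thread: `VersionedStore`, `compare_and_set`, and `try_update` are hypothetical names, and the store is an in-memory stand-in for whatever shared state the active instance writes to. Each write carries the version the writer last read, so if two instances briefly both believe they are leader, the one acting on stale state is rejected instead of silently overwriting the other.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a store with versioned writes."""

    def __init__(self, value=None):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        """Return the current value together with its version."""
        with self._lock:
            return self._value, self._version

    def compare_and_set(self, new_value, expected_version):
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            if expected_version != self._version:
                raise VersionConflict(
                    f"expected version {expected_version}, "
                    f"found {self._version}"
                )
            self._value = new_value
            self._version += 1
            return self._version


def try_update(store, leader_name, seen_version):
    """Attempt a write based on the version this instance last read."""
    try:
        store.compare_and_set(f"written by {leader_name}", seen_version)
        return True
    except VersionConflict:
        return False


if __name__ == "__main__":
    store = VersionedStore("initial")
    # Both instances read the same state while each believes it is leader.
    _, version_a = store.read()
    _, version_b = store.read()
    print(try_update(store, "A", version_a))  # first write is accepted
    print(try_update(store, "B", version_b))  # stale write is rejected
```

ZooKeeper's own `setData` call takes an expected version and fails on a mismatch, so the same pattern could probably be applied to a znode; that mapping is a suggestion, not something the thread spells out.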