From: Ted Dunning
Date: Fri, 7 Dec 2018 16:03:13 +0900
Subject: Re: Leader election
To: dev@zookeeper.apache.org

ZK is able to guarantee that there is only one leader for the purposes of
updating ZK data. That is because all commits have to originate with the
current quorum leader and then be acknowledged by a quorum of the current
cluster. If the leader can't get enough acks, then it has de facto lost
leadership.

The problem comes when there is another system that depends on ZK data.
Such data might record which node is the leader for some other purpose. A
new leader will only assume that it has become leader if it succeeds in
writing to ZK, but if there is a partition, the old leader may not be
notified that another leader has established itself until some time after
it has happened. Of course, if the erstwhile leader tried to validate its
position with a write to ZK, that write would fail, but you can't spend
100% of your time doing that.

It all comes down to the fact that a ZK client determines that it is
connected to a ZK cluster member by pinging, and that cluster member sees
heartbeats from the leader. The fact is, though, that you can't tune these
pings to be faster than some level because you start to see lots of false
positives for loss of connection. Remember that it isn't just loss of
connection that is the point here. Any kind of delay would have the same
effect. Getting these ping intervals below one second makes for a very
twitchy system.

On Fri, Dec 7, 2018 at 11:03 AM Michael Borokhovich wrote:

> We are planning to run Zookeeper nodes embedded with the client nodes.
> I.e., each client runs also a ZK node. So, network partition will
> disconnect a ZK node and not only the client.
> My concern is about the following statement from the ZK documentation:
>
> "Timeliness: The clients view of the system is guaranteed to be up-to-date
> within a certain time bound.
> (*On the order of tens of seconds.*) Either
> system changes will be seen by a client within this bound, or the client
> will detect a service outage."
>
> What are these "*tens of seconds*"? Can we reduce this time by configuring
> "syncLimit" and "tickTime" to let's say 5 seconds? Can we have a strong
> guarantee on this time bound?
>
>
> On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <jordan@jordanzimmerman.com>
> wrote:
>
> > > Old service leader will detect network partition max 15 seconds after it
> > > happened.
> >
> > If the old service leader is in a very long GC it will not detect the
> > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > leaders for a short period of time.
> >
> > -JZ