Date: Fri, 29 Dec 2017 07:55:00 +0000 (UTC)
From: "xiangyq000 (JIRA)"
To: dev@zookeeper.apache.org
Subject: [jira] [Updated] (ZOOKEEPER-2959) ignore epoch proposal and ack from observers when a newly elected leader computes new epoch

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyq000 updated ZOOKEEPER-2959:
----------------------------------
    Description:
Once the ZooKeeper cluster finishes the election of a new leader, all learners report their accepted epoch to the leader for the computation of the new cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
    private final HashSet<Long> connectingFollowers = new HashSet<Long>();
    public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) &&
                    verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            } else {
                long start = Time.currentElapsedTime();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = Time.currentElapsedTime();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
{code}

The computation produces an outcome once:
# The leader has called the method "getEpochToPropose".
# The number of reporters is greater than half the number of participants.

The problem is that an observer also sends its accepted epoch to the leader, and this procedure treats observers as participants.

Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, and that the leader and the observer have reported their accepted epochs while neither of the followers has. The connectingFollowers set then contains two elements, and its size of 2 is greater than half of the 3 participants. QuorumVerifier#containsQuorum therefore returns true, because it does not check whether the elements of its argument are participants.

  was:
Once the ZooKeeper cluster finishes the election of a new leader, all learners report their accepted epoch to the leader for the computation of the new cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
    private final HashSet<Long> connectingFollowers = new HashSet<Long>();
    public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) &&
                    verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            } else {
                long start = Time.currentElapsedTime();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = Time.currentElapsedTime();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
{code}

The computation produces an outcome once:
# The leader has called the method.
# The number of reporters is greater than half the quorum, i.e., half of the PARTICIPANTS.

The problem is that an observer is not a PARTICIPANT, yet this procedure treats observers as participants.

Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, and that the leader and the observer have reported their epochs while neither of the followers has. The connectingFollowers set then contains two elements, and its size of 2 is greater than half of the 3 participants. So the if condition is met.

This procedure can be confusing.
# The connectingFollowers set can contain the SIDs of observers. (In fact, it must at least contain the SID of the leader.)
# The intent of QuorumVerifier#containsQuorum is to check whether a set of PARTICIPANTS makes a quorum. Here, however, it simply treats a set of peers as a set of participants.

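The scenario above can be reproduced in isolation. The following is a minimal, self-contained sketch and not ZooKeeper code: the class EpochQuorumSketch, both method names and the SID values are hypothetical, and the size-only check is only assumed to mirror what a simple majority verifier does. It contrasts a check that looks at the size of the reporting set with one that first drops non-participants.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in ZooKeeper.
public class EpochQuorumSketch {

    // Size-only majority check: assumes every reported SID is a participant.
    static boolean containsQuorum(Set<Long> reported, int participantCount) {
        int half = participantCount / 2;
        return reported.size() > half;
    }

    // Participant-aware variant: SIDs that are not participants are ignored.
    static boolean containsQuorumOfParticipants(Set<Long> reported, Set<Long> participants) {
        Set<Long> voters = new HashSet<>(reported);
        voters.retainAll(participants); // drop observers
        return voters.size() > participants.size() / 2;
    }

    public static void main(String[] args) {
        // 1 leader (sid 1), 2 followers (sid 2, 3), 1 observer (sid 4).
        Set<Long> participants = new HashSet<>();
        participants.add(1L);
        participants.add(2L);
        participants.add(3L);
        long observerSid = 4L;

        // Only the leader and the observer have reported so far.
        Set<Long> connecting = new HashSet<>();
        connecting.add(1L);
        connecting.add(observerSid);

        // 2 > 3 / 2, so the size-only check passes although only one participant reported.
        System.out.println(containsQuorum(connecting, participants.size()));        // prints "true"
        // After dropping the observer only the leader remains, and 1 > 1 is false.
        System.out.println(containsQuorumOfParticipants(connecting, participants)); // prints "false"
    }
}
```

With the leader and the observer as the only reporters, the size-only check accepts the set while the participant-aware variant rejects it, which is the discrepancy described above.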
There are 2 candidate solutions.
# Ignore epochs reported by observers.
# Require (number_of_reported_peers > number_of_all_peers / 2) instead of the existing (number_of_reported_peers > number_of_all_participants / 2).

A similar confusion exists in the subsequent procedure, where the leader counts the ACKs for the new epoch from learners.


> ignore epoch proposal and ack from observers when a newly elected leader computes new epoch
> -------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2959
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.11
>            Reporter: xiangyq000
>
> Once the ZooKeeper cluster finishes the election of a new leader, all learners report their accepted epoch to the leader for the computation of the new cluster epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
>     private final HashSet<Long> connectingFollowers = new HashSet<Long>();
>     public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
>         synchronized(connectingFollowers) {
>             if (!waitingForNewEpoch) {
>                 return epoch;
>             }
>             if (lastAcceptedEpoch >= epoch) {
>                 epoch = lastAcceptedEpoch+1;
>             }
>             connectingFollowers.add(sid);
>             QuorumVerifier verifier = self.getQuorumVerifier();
>             if (connectingFollowers.contains(self.getId()) &&
>                     verifier.containsQuorum(connectingFollowers)) {
>                 waitingForNewEpoch = false;
>                 self.setAcceptedEpoch(epoch);
>                 connectingFollowers.notifyAll();
>             } else {
>                 long start = Time.currentElapsedTime();
>                 long cur = start;
>                 long end = start + self.getInitLimit()*self.getTickTime();
>                 while(waitingForNewEpoch && cur < end) {
>                     connectingFollowers.wait(end - cur);
>                     cur = Time.currentElapsedTime();
>                 }
>                 if (waitingForNewEpoch) {
>                     throw new InterruptedException("Timeout while waiting for epoch from quorum");
>                 }
>             }
>             return epoch;
>         }
>     }
> {code}
> The computation produces an outcome once:
> # The leader has called the method "getEpochToPropose".
> # The number of reporters is greater than half the number of participants.
> The problem is that an observer also sends its accepted epoch to the leader, and this procedure treats observers as participants.
> Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, and that the leader and the observer have reported their accepted epochs while neither of the followers has. The connectingFollowers set then contains two elements, and its size of 2 is greater than half of the 3 participants. QuorumVerifier#containsQuorum therefore returns true, because it does not check whether the elements of its argument are participants.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)