Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@zookeeper.apache.org
Date: Fri, 29 Dec 2017 08:23:00 +0000 (UTC)
From: "xiangyq000 (JIRA)" <jira@apache.org>
To: dev@zookeeper.apache.org
Message-ID: <JIRA.13126179.1513752398000.545587.1514535780045@Atlassian.JIRA>
In-Reply-To: <JIRA.13126179.1513752398000@Atlassian.JIRA>
References: <JIRA.13126179.1513752398000@Atlassian.JIRA> <JIRA.13126179.1513752398490@jira-lw-us.apache.org>
Subject: [jira] [Updated] (ZOOKEEPER-2959) ignore epoch proposal and ack
 from observers when a newly elected leader computes new epoch
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Fri, 29 Dec 2017 08:23:10 -0000


     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyq000 updated ZOOKEEPER-2959:
----------------------------------
    Description: 
Once a ZooKeeper cluster finishes electing a new leader, all learners report their accepted epochs to the leader, which uses them to compute the new cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
    private final HashSet<Long> connectingFollowers = new HashSet<Long>();
    public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) &&
                                            verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            } else {
                long start = Time.currentElapsedTime();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = Time.currentElapsedTime();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
{code}

The computation completes once both of the following hold:
# The leader has called the method "getEpochToPropose".
# The number of reporters is greater than half the number of participants.

The problem is that an observer also sends its accepted epoch to the leader, and this procedure treats observers as participants.

Suppose the cluster consists of 1 leader, 2 followers, and 1 observer, and the leader and the observer have reported their accepted epochs while neither follower has. The connectingFollowers set then contains two elements, and a size of 2 is greater than half of the 3 participants. QuorumVerifier#containsQuorum therefore returns true, because it does not check whether the elements of its argument are participants.

The same flaw exists in org.apache.zookeeper.server.quorum.Leader#waitForEpochAck
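
The scenario above can be reproduced with a small standalone sketch. It does not use ZooKeeper classes: containsQuorumBySize is a simplified stand-in that assumes a majority verifier comparing only the set size against half the number of voting members, and containsQuorumOfParticipants is a hypothetical illustration of filtering out observers before counting, not the actual patch. The server ids (1 = leader, 2 and 3 = followers, 4 = observer) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class EpochQuorumSketch {

    // Simplified stand-in for a size-only majority check (assumption):
    // it never looks at which server ids are in the set.
    static boolean containsQuorumBySize(Set<Long> reported, int votingMembers) {
        return reported.size() > votingMembers / 2;
    }

    // Hypothetical participant-aware check: ids that are not voting
    // members (for example observers) are dropped before counting.
    static boolean containsQuorumOfParticipants(Set<Long> reported, Set<Long> participants) {
        Set<Long> voters = new HashSet<>(reported);
        voters.retainAll(participants);
        return voters.size() > participants.size() / 2;
    }

    public static void main(String[] args) {
        // Assumed ids: 1 = leader, 2 and 3 = followers, 4 = observer.
        Set<Long> participants = new HashSet<>(Arrays.asList(1L, 2L, 3L));

        // Only the leader and the observer have reported so far.
        Set<Long> reported = new HashSet<>(Arrays.asList(1L, 4L));

        // prints "size-only check: true"
        System.out.println("size-only check: "
                + containsQuorumBySize(reported, participants.size()));

        // prints "participant-aware check: false"
        System.out.println("participant-aware check: "
                + containsQuorumOfParticipants(reported, participants));
    }
}
```

With the size-only check the leader would stop waiting even though only one voting member (itself) has reported, while the participant-aware variant keeps waiting until a follower reports.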


> ignore epoch proposal and ack from observers when a newly elected leader computes new epoch
> -------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2959
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.10, 3.5.3
>            Reporter: xiangyq000
>
> Once a ZooKeeper cluster finishes electing a new leader, all learners report their accepted epochs to the leader, which uses them to compute the new cluster epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
>     private final HashSet<Long> connectingFollowers = new HashSet<Long>();
>     public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
>         synchronized(connectingFollowers) {
>             if (!waitingForNewEpoch) {
>                 return epoch;
>             }
>             if (lastAcceptedEpoch >= epoch) {
>                 epoch = lastAcceptedEpoch+1;
>             }
>             connectingFollowers.add(sid);
>             QuorumVerifier verifier = self.getQuorumVerifier();
>             if (connectingFollowers.contains(self.getId()) &&
>                                             verifier.containsQuorum(connectingFollowers)) {
>                 waitingForNewEpoch = false;
>                 self.setAcceptedEpoch(epoch);
>                 connectingFollowers.notifyAll();
>             } else {
>                 long start = Time.currentElapsedTime();
>                 long cur = start;
>                 long end = start + self.getInitLimit()*self.getTickTime();
>                 while(waitingForNewEpoch && cur < end) {
>                     connectingFollowers.wait(end - cur);
>                     cur = Time.currentElapsedTime();
>                 }
>                 if (waitingForNewEpoch) {
>                     throw new InterruptedException("Timeout while waiting for epoch from quorum");
>                 }
>             }
>             return epoch;
>         }
>     }
> {code}
> The computation completes once both of the following hold:
> # The leader has called the method "getEpochToPropose".
> # The number of reporters is greater than half the number of participants.
> The problem is that an observer also sends its accepted epoch to the leader, and this procedure treats observers as participants.
> Suppose the cluster consists of 1 leader, 2 followers, and 1 observer, and the leader and the observer have reported their accepted epochs while neither follower has. The connectingFollowers set then contains two elements, and a size of 2 is greater than half of the 3 participants. QuorumVerifier#containsQuorum therefore returns true, because it does not check whether the elements of its argument are participants.
> The same flaw exists in org.apache.zookeeper.server.quorum.Leader#waitForEpochAck


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)