zookeeper-dev mailing list archives

From "xiangyq000 (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-2959) New epoch computation logic confusion
Date Wed, 20 Dec 2017 07:26:00 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyq000 updated ZOOKEEPER-2959:
----------------------------------
    Description: 
Once a ZooKeeper cluster finishes electing a new leader, every learner reports its
accepted epoch to the leader, which uses these values to compute the new cluster epoch.

org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}

    private HashSet<Long> connectingFollowers = new HashSet<Long>();
    public long getEpochToPropose(long sid, long lastAcceptedEpoch)
            throws InterruptedException, IOException {
        synchronized(connectingFollowers) {
            if (!waitingForNewEpoch) {
                return epoch;
            }
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch+1;
            }
            connectingFollowers.add(sid);
            QuorumVerifier verifier = self.getQuorumVerifier();
            if (connectingFollowers.contains(self.getId()) &&
                    verifier.containsQuorum(connectingFollowers)) {
                waitingForNewEpoch = false;
                self.setAcceptedEpoch(epoch);
                connectingFollowers.notifyAll();
            } else {
                long start = Time.currentElapsedTime();
                long cur = start;
                long end = start + self.getInitLimit()*self.getTickTime();
                while(waitingForNewEpoch && cur < end) {
                    connectingFollowers.wait(end - cur);
                    cur = Time.currentElapsedTime();
                }
                if (waitingForNewEpoch) {
                    throw new InterruptedException("Timeout while waiting for epoch from quorum");
                }
            }
            return epoch;
        }
    }
{code}

The computation completes once both of the following hold:
# The leader has reported its epoch.
# The number of reporters is greater than half the number of PARTICIPANTS, i.e., the reporters form a quorum.

The problem is that an observer is not a PARTICIPANT, yet this procedure treats observers
as participants.

Suppose that the cluster consists of 1 leader, 2 followers, and 1 observer, and that the leader
and the observer have reported their epochs while neither follower has. The connectingFollowers
set then contains two elements. Its size, 2, is greater than half the number of participants
(3 / 2 = 1.5), so the if condition is met even though only one participant, the leader itself,
has reported.
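
The arithmetic in this scenario can be checked with a small standalone sketch. It does not use ZooKeeper's classes: MajoritySketch, its containsQuorum method, and the SID values are hypothetical, and the majority rule (set size strictly greater than half the participant count) is an assumption taken from the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Standalone sketch; none of these names are ZooKeeper APIs.
final class MajoritySketch {

    // Assumed majority rule: strictly more than half of the voting participants.
    // The set is not checked for membership, which mirrors the reported problem.
    static boolean containsQuorum(Set<Long> reported, int participantCount) {
        return reported.size() > participantCount / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical SIDs: 1 = leader, 2 and 3 = followers, 4 = observer.
        int participantCount = 3; // leader + 2 followers; the observer does not vote

        Set<Long> reported = new HashSet<>();
        reported.add(1L); // the leader has reported
        reported.add(4L); // the observer has reported

        // Size 2 > 1.5, so the check passes although only one participant reported.
        System.out.println(containsQuorum(reported, participantCount)); // prints true
    }
}
```

Because the check only looks at the size of the set, the observer's SID counts toward the majority exactly as a follower's would.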

This procedure can be confusing.
# The connectingFollowers set can contain the SIDs of observers. (In fact, despite its name,
it must contain at least the SID of the leader.)
# The intent of QuorumVerifier#containsQuorum is to check whether a set of PARTICIPANTS forms
a quorum. Here, however, it is given a set of arbitrary peers and treats them all as participants.

There are 2 candidate solutions.
# Ignore epochs reported by observers.
# Require number_of_reported_peers > number_of_all_peers / 2, rather than number_of_all_participants / 2.
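
As a rough illustration of the first candidate, the sketch below ignores reports from SIDs that are not in the participant set, both for the quorum count and for the epoch computation. It is one possible reading of the proposal, not a patch against Leader#getEpochToPropose: ParticipantEpochTracker, its constructor arguments, and the report method are hypothetical names, and the locking and timeout handling of the original method are left out.

```java
import java.util.HashSet;
import java.util.Set;

// Standalone sketch of candidate solution 1; not ZooKeeper code.
final class ParticipantEpochTracker {
    private final Set<Long> participants;              // SIDs of voting members (assumed input)
    private final long leaderSid;                      // SID of the leader itself
    private final Set<Long> reported = new HashSet<>();
    private long epoch;

    ParticipantEpochTracker(Set<Long> participants, long leaderSid, long initialEpoch) {
        this.participants = new HashSet<>(participants);
        this.leaderSid = leaderSid;
        this.epoch = initialEpoch;
    }

    // Records one report and returns true once the leader plus a majority
    // of participants have reported. Reports from non-participants are ignored.
    boolean report(long sid, long lastAcceptedEpoch) {
        if (participants.contains(sid)) {
            if (lastAcceptedEpoch >= epoch) {
                epoch = lastAcceptedEpoch + 1;
            }
            reported.add(sid);
        }
        return reported.contains(leaderSid)
                && reported.size() > participants.size() / 2.0;
    }

    long epoch() {
        return epoch;
    }

    public static void main(String[] args) {
        // Hypothetical SIDs: 1 = leader, 2 and 3 = followers, 4 = observer.
        Set<Long> voters = new HashSet<>();
        voters.add(1L);
        voters.add(2L);
        voters.add(3L);

        ParticipantEpochTracker tracker = new ParticipantEpochTracker(voters, 1L, 0L);
        System.out.println(tracker.report(1L, 5L)); // leader only: no quorum yet
        System.out.println(tracker.report(4L, 9L)); // observer is ignored: still no quorum
        System.out.println(tracker.report(2L, 5L)); // leader + one follower: quorum
        System.out.println(tracker.epoch());
    }
}
```

The second candidate would instead keep every reporter in the set and compare its size against half of all peers, observers included.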

A similar confusion exists in the subsequent step, in which the leader counts the ACKs for
the new epoch from learners.


> New epoch computation logic confusion
> -------------------------------------
>
>                 Key: ZOOKEEPER-2959
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.11
>            Reporter: xiangyq000



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
