hadoop-zookeeper-dev mailing list archives

From "Henry Robinson (JIRA)" <j...@apache.org>
Subject [jira] Updated: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election
Date Wed, 03 Feb 2010 23:31:27 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-569:

    Attachment: zookeeper-569.patch

Here's a patch with tests that appears to fix the issue (test fails without fix, test succeeds
with). All tests pass for me with this patch on my laptop. 

I have replaced one kludge with another here. QuorumPeer.electionAlg is set to null when electionType==0
until the election is actually run. This causes problems if you want to retrieve the electionAlg
object via getElectionAlg() beforehand for tests. 

I've set it up so that makeLEStrategy always creates a new LeaderElection if electionType
== 0, but also that createElectionAlgorithm sets electionAlg=new LeaderElection(this) instead
of null, so that as long as startLeaderElection has been called, getElectionAlg() won't return null.

I've checked to see if this will cause any obvious problems for the call sites of getElectionAlg
and couldn't find anything that expected null. It seems more consistent to me this way. The
question I have is over why LeaderElection needs re-instantiating each time when FLE does not.

If this sounds confusing, it's because the code really is! The interaction of createElectionAlgorithm,
startLeaderElection and makeLEStrategy is hard to discern. 
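
To make the intended behaviour a little easier to follow, here is a rough toy model of it. It is only a sketch: the method names mirror the ones mentioned above, but ToyPeer, ToyElection and the method bodies are assumptions made for illustration and are not the QuorumPeer source or the patch itself.

```java
// Toy model only: names mirror those in the message, bodies are assumptions,
// not the actual ZooKeeper source or the attached patch.
interface ToyElection {
    String name();
}

class ToyLeaderElection implements ToyElection {
    public String name() { return "LeaderElection"; }
}

class ToyFastLeaderElection implements ToyElection {
    public String name() { return "FastLeaderElection"; }
}

class ToyPeer {
    private final int electionType;
    private ToyElection electionAlg; // stays null until startLeaderElection() runs

    ToyPeer(int electionType) {
        this.electionType = electionType;
    }

    // Type 0 now yields an instance instead of null.
    ToyElection createElectionAlgorithm(int type) {
        if (type == 0) {
            return new ToyLeaderElection();
        }
        return new ToyFastLeaderElection();
    }

    void startLeaderElection() {
        electionAlg = createElectionAlgorithm(electionType);
    }

    ToyElection getElectionAlg() {
        return electionAlg;
    }

    // Type 0 gets a fresh instance per election; other types reuse the stored one.
    ToyElection makeLEStrategy() {
        if (electionType == 0) {
            electionAlg = new ToyLeaderElection();
        }
        return electionAlg;
    }
}
```

In this model, once startLeaderElection() has been called on a peer with electionType 0, getElectionAlg() returns a non-null object, and each call to makeLEStrategy() hands back a new instance.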

> Failure of elected leader can lead to never-ending leader election
> ------------------------------------------------------------------
>                 Key: ZOOKEEPER-569
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
>             Project: Zookeeper
>          Issue Type: Bug
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 3.3.0
>         Attachments: zookeeper-569.patch, zookeeper-569.patch, zookeeper-569.patch
> It is possible for basic LeaderElection to enter a situation where it never terminates.
> As an example, consider a three node cluster A, B and C.
> 1. In the first round, A votes for A, B votes for B and C votes for C
> 2. Since C > B > A, all nodes resolve to vote for C in the second round as there is no first round winner
> 3. A, B vote for C, but C fails.
> 4. C is not elected because neither A nor B hear from it, and so votes for it are discarded
> 5. A and B never reset their votes, despite not hearing from C, so continue to vote for it ad infinitum.
> Step 5 is the bug. If A and B reset their votes to themselves in the case where the heard-from vote set is empty, leader election will continue.
> I do not know if this affects running ZK clusters, as it is possible that the out-of-band failure detection protocols may cause leader election to be restarted anyhow, but I've certainly seen this in tests.
> I have a trivial patch which fixes it, but it needs a test (and tests for race conditions are hard to write!)
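
The scenario in steps 1 to 5 can be sketched as a small simulation. This is a toy model under stated assumptions, not the ZooKeeper LeaderElection code: ToyElectionRounds, run and the resetOnEmpty flag are hypothetical names, every live node is assumed to see the same set of responses, and a quorum is assumed to be a strict majority of the cluster size.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the scenario above; not the ZooKeeper LeaderElection code.
class ToyElectionRounds {

    /**
     * Runs up to maxRounds voting rounds among the live nodes.
     * votes maps each live node id to the id it currently votes for.
     * Returns the elected id, or -1 if no leader emerged within maxRounds.
     */
    static int run(Map<Integer, Integer> votes, int clusterSize,
                   boolean resetOnEmpty, int maxRounds) {
        Map<Integer, Integer> current = new TreeMap<>(votes);
        int quorum = clusterSize / 2 + 1; // assumption: strict majority
        for (int round = 0; round < maxRounds; round++) {
            Set<Integer> heardFrom = current.keySet(); // only live nodes answer
            // Tally votes, discarding votes for nodes that did not answer.
            Map<Integer, Integer> tally = new HashMap<>();
            for (int vote : current.values()) {
                if (heardFrom.contains(vote)) {
                    tally.merge(vote, 1, Integer::sum);
                }
            }
            int best = -1;
            for (Map.Entry<Integer, Integer> e : tally.entrySet()) {
                if (e.getValue() >= quorum) {
                    return e.getKey(); // a candidate reached quorum
                }
                best = Math.max(best, e.getKey());
            }
            for (Map.Entry<Integer, Integer> e : current.entrySet()) {
                if (!tally.isEmpty()) {
                    e.setValue(best);       // no winner: back the highest id heard
                } else if (resetOnEmpty) {
                    e.setValue(e.getKey()); // proposed fix: vote for self again
                }
                // otherwise (the bug): the stale vote for the dead node is kept
            }
        }
        return -1;
    }
}
```

With A=1, B=2 and C=3, starting from the state after step 3 (votes {1=3, 2=3}, C not responding), run(votes, 3, false, 10) returns -1 because the votes never change, while run(votes, 3, true, 10) returns 2: A and B fall back to themselves, then both back B, and B reaches quorum.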

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
