hadoop-zookeeper-dev mailing list archives

From "Flavio Paiva Junqueira (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-569) Failure of elected leader can lead to never-ending leader election
Date Sat, 06 Feb 2010 15:36:27 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830521#action_12830521 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-569:
--------------------------------------------------

Thanks, Henry, it looks good. I agree with your comment on the confusion between LE
(LeaderElection) being instantiated every time it is used and FLE (FastLeaderElection)
behaving differently. We should really just have one model.

One comment on the patch: I don't think you need to instantiate QuorumCnxManager in
mockServer() in the new test. The conditional block that checks the listener can also be removed.

> Failure of elected leader can lead to never-ending leader election
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-569
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-569
>             Project: Zookeeper
>          Issue Type: Bug
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 3.3.0
>
>         Attachments: zookeeper-569.patch, zookeeper-569.patch, zookeeper-569.patch
>
>
> It is possible for basic LeaderElection to enter a situation where it never terminates.
>
> As an example, consider a three-node cluster of A, B and C.
> 1. In the first round, A votes for A, B votes for B and C votes for C.
> 2. Since C > B > A and there is no first-round winner, all nodes resolve to vote for C
>    in the second round.
> 3. A and B vote for C, but C fails.
> 4. C is not elected because neither A nor B hears from it, so votes for it are discarded.
> 5. A and B never reset their votes, despite not hearing from C, and so continue to vote
>    for it ad infinitum.
> Step 5 is the bug. If A and B reset their votes to themselves when the heard-from vote
> set is empty, leader election will continue to make progress.
> I do not know if this affects running ZK clusters, as it is possible that the out-of-band
> failure detection protocols may cause leader election to be restarted anyhow, but I've
> certainly seen this in tests.
> I have a trivial patch which fixes it, but it needs a test (and tests for race conditions
> are hard to write!)
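
The scenario and the proposed fix can be illustrated with a small Java sketch. It is a hypothetical toy model, not ZooKeeper's LeaderElection code and not the attached patch: the class VoteResetSketch and its methods tally, nextVote and runElection are illustrative names. It assumes that a vote is discarded whenever its candidate has not been heard from, that a candidate wins with a strict majority of the ensemble, and that a server with no winner moves its vote to the highest-id live candidate it heard.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the ZOOKEEPER-569 scenario (not ZooKeeper code).
public class VoteResetSketch {

    // Counts votes, discarding votes for servers we have not heard from (step 4).
    // Returns the winner's id, or -1 if no candidate has a strict majority.
    static long tally(Map<Long, Long> votes, Set<Long> live, int ensembleSize) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long candidate : votes.values()) {
            if (live.contains(candidate)) {
                counts.merge(candidate, 1, Integer::sum);
            }
        }
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() > ensembleSize / 2) {
                return e.getKey();
            }
        }
        return -1;
    }

    // Chooses a server's vote for the next round from the votes it heard.
    static long nextVote(long myId, long currentVote, Map<Long, Long> votes,
                         Set<Long> live, boolean applyFix) {
        long best = -1;
        for (long candidate : votes.values()) {
            if (live.contains(candidate)) {
                best = Math.max(best, candidate); // highest-id live candidate
            }
        }
        if (best == -1) {
            // Heard-from vote set is empty: the fix resets the vote to self,
            // while the buggy behaviour keeps the stale vote (step 5).
            return applyFix ? myId : currentVote;
        }
        return best;
    }

    // Runs up to maxRounds; returns the elected leader, or -1 if none emerged.
    static long runElection(Set<Long> live, Map<Long, Long> initialVotes,
                            int ensembleSize, boolean applyFix, int maxRounds) {
        Map<Long, Long> votes = new HashMap<>(initialVotes);
        for (int round = 0; round < maxRounds; round++) {
            // Only votes from live servers are received.
            Map<Long, Long> received = new HashMap<>();
            for (long id : live) {
                received.put(id, votes.get(id));
            }
            long leader = tally(received, live, ensembleSize);
            if (leader != -1) {
                return leader;
            }
            Map<Long, Long> next = new HashMap<>();
            for (long id : live) {
                next.put(id, nextVote(id, votes.get(id), received, live, applyFix));
            }
            votes = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A = 1, B = 2 are alive; C = 3 has failed after A and B voted for it.
        Set<Long> live = Set.of(1L, 2L);
        Map<Long, Long> votesForC = Map.of(1L, 3L, 2L, 3L);
        System.out.println("without fix: " + runElection(live, votesForC, 3, false, 10));
        System.out.println("with fix:    " + runElection(live, votesForC, 3, true, 10));
    }
}
```

Running main prints "without fix: -1" (A and B keep voting for the dead C until the round limit) and "with fix:    2" (after resetting to themselves, A and B converge on B).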

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

