Date: Sun, 3 Nov 2013 19:15:18 +0000 (UTC)
From: Germán Blanco (JIRA)
To: dev@zookeeper.apache.org
Subject: [jira] [Commented] (ZOOKEEPER-1805) "Don't care" value in ZooKeeper election breaks rolling upgrades

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812455#comment-13812455 ]

Germán Blanco commented on ZOOKEEPER-1805:
------------------------------------------

As far as I can see, there is never a mix of messages with and without don't care values.
The don't care values never get sent over the network ... or at least that was not intentional.
I have noticed that the current value (-1) happens to be the same one that was being used by default in Vote.java for some of the incomplete constructors, and this is why the value does appear in the traces sent by Raúl for the epoch (note that the epoch was not set to the don't care value in this case). But it has nothing to do with the patch for ZOOKEEPER-1732. You can see that, e.g., zxid does not have a don't care value in these traces.

What your change does is this: if there is a don't care value, it checks whether the epoch is greater or equal between the vote with the don't care value and the other. All votes in the outofelection collection have don't care values, so the result is that the comparison ignores the value of the epochs in all cases. The epoch may be greater or equal, or smaller or equal, for the comparison to be successful when both votes being compared have don't care values.

The same result would have been achieved by setting the epoch to the don't care value when inserting the vote in the outofelection collection (and in the call to termPredicate) and not making any changes at all in the comparisons in Vote.java. In that case, the changes in learner.java, leader.java and QuorumPeer.java would no longer serve any purpose, since all they do is set the epoch to a common value in Learners and Leader, and that value is going to be ignored. That is the approach I would take to implement your proposal. For a test case, it would be enough to modify the test case added in ZOOKEEPER-1732 and just set the peerEpoch to any value, so that it is clear that this value is also ignored in the comparison. But as far as I can see, the current patch has the same behaviour, and the final decision on how to code this is yours, so both solutions to this problem are fine for me.

If the decision were mine, I would go for setting epoch to newEpoch-1.
That might be (arguably) a bit hacky, but the hackery only covers the upgrade case and has no effect in other cases. Ignoring the epoch applies to all cases in which a new server joins an established ensemble, and it might have (at least) the problem that votes from ensembles established with different epochs are taken into account as if they belonged to the same ensemble. I don't like that too much, but failures don't seem likely and they might not cause problems, since even if the new server joins the wrong leader, this leader will not process any transaction unless it has acks from sufficient followers. So the potential problem seems to be only a small possibility of a delay when joining the right ensemble. That means both (newEpoch-1 and ignoring the epoch) look to me like working solutions.

Sorry if that was too long, but I think it summarises all corners of my personal view of this issue. The short summary is "I am ok with this solution". If you want a patch with my alternative implementation of the option of ignoring the epoch, I can also prepare that.

> "Don't care" value in ZooKeeper election breaks rolling upgrades
> ----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1805
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1805
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Flavio Junqueira
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: ZOOKEEPER-1805-b3.4.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch, ZOOKEEPER-1805.patch
>
>
> This is an issue that has been originally reported in ZOOKEEPER-1732.
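
For illustration only, here is a minimal sketch of the alternative described in the comment: store out-of-election votes with the epoch set to the don't care value and leave the comparison itself unchanged. It is not taken from Vote.java or from any attached patch. The class and method names (DontCareVoteSketch, SketchVote, votesMatch, withEpochIgnored) are hypothetical, and the comparison rule (a field matches when the values are equal or either side is the don't care value) is an assumption made for the example.

```java
/**
 * Hypothetical sketch; none of these names come from ZooKeeper's Vote.java.
 */
final class DontCareVoteSketch {
    /** Wildcard marker; -1 is the don't care value mentioned in the comment. */
    static final long DONT_CARE = -1L;

    /** Minimal stand-in for a vote: proposed leader id plus peer epoch. */
    static final class SketchVote {
        final long leaderId;
        final long peerEpoch;

        SketchVote(long leaderId, long peerEpoch) {
            this.leaderId = leaderId;
            this.peerEpoch = peerEpoch;
        }
    }

    /** Assumed rule: values match if equal or if either side is the wildcard. */
    static boolean fieldMatches(long a, long b) {
        return a == b || a == DONT_CARE || b == DONT_CARE;
    }

    /** Leader ids must be equal; the epoch is compared with the wildcard rule. */
    static boolean votesMatch(SketchVote a, SketchVote b) {
        return a.leaderId == b.leaderId && fieldMatches(a.peerEpoch, b.peerEpoch);
    }

    /** Copy of a vote with the epoch replaced by the wildcard before storing it. */
    static SketchVote withEpochIgnored(SketchVote v) {
        return new SketchVote(v.leaderId, DONT_CARE);
    }

    public static void main(String[] args) {
        // Vote as it would be stored for a server that is out of election.
        SketchVote stored = withEpochIgnored(new SketchVote(3L, 7L));

        // The epoch of the other vote no longer influences the result.
        System.out.println(votesMatch(stored, new SketchVote(3L, 5L))); // true
        System.out.println(votesMatch(stored, new SketchVote(3L, 9L))); // true

        // A different leader id still fails the comparison.
        System.out.println(votesMatch(stored, new SketchVote(2L, 7L))); // false
    }
}
```

With the wildcard applied at insertion time, any epoch carried by the other vote is accepted, which is the behaviour the comment attributes to both the current patch and the alternative.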