From: Alexander Shraer
Date: Sat, 28 Jul 2012 10:02:05 -0700
Subject: Re: Dynamic reconfiguration
To: Jared Cantwell
Cc: "user@zookeeper.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8ff253620c7d8d04c5e6cbc9

--e89a8ff253620c7d8d04c5e6cbc9
Content-Type: text/plain; charset=ISO-8859-1

Hi Jared,

Figuring out what happened and how to recover is part of the
reconfiguration protocol. I don't think this is something you as a
user should do, unless I misunderstand what you're trying to do.
ZooKeeper should handle this the way it handles other failures,
without admin intervention.

In your scenario, D-F come up and one of them is elected leader (since
you said they know about the commit), so they start running the new
config normally. When A-C come up, several things may happen:

1. During the preliminary FastLeaderElection, A-C will try to connect
to D and E, and in fact they'll also try to connect to the members of
the new config that they know was proposed. So most likely someone in
the new config will send them the new config file, and they'll store
it and act accordingly (connect as non-voting followers in the new
config). To make this happen, I changed FastLeaderElection to talk
with proposed configs (if known) and to piggyback the last active
config a server knows of on all messages.

2. It's possible that somehow A-C complete FastLeaderElection without
talking to D-F. But since a reconfiguration was committed, it was
acked by a quorum of the old config (and a quorum of the new one).
Therefore, whoever is "elected" in the old config knows about the
reconfig proposal (this is guaranteed by normal ZooKeeper leader
recovery). Before doing anything else, the new leader among A-C will
try to complete the reconfiguration, which involves getting acks from
a quorum of the new config. But in your scenario the servers in the
new config will not connect to it because they have moved on, so the
candidate leader will just give up and go back to (1) above.

3. In the remote chance that someone who heard about the reconfig
commit connects to a candidate leader who didn't hear about it, the
first thing it does is tell that candidate leader that it's not up to
date. The candidate leader then just updates its config file, gives up
on being a leader and returns to (1). This was done by changing the
first message that a follower/observer sends to a leader it is
connecting to, even before the synchronization starts.

Alex

On Sat, Jul 28, 2012 at 8:43 AM, Jared Cantwell wrote:

> So I'm working through some failure scenarios and I want to make sure I
> fully understand the way that dynamic membership changes previous behavior,
> so are my expectations correct in this situation:
>
> As in my previous example, let's say that the current membership of voting
> participants is {A,B,C,D,E} and we're looking to change membership to
> {D,E,F,G,H}.
> 1. Reconfiguration to {D,E,F,G,H} completes internally
> 2. D-F update their local configuration files, but A-C do not yet.
> 3. Power loss to all nodes
>
> Now what happens if A, B, and C come up with configuration files that still
> say {A,B,C,D,E}, but no other servers start up yet? Can A, B and C form a
> quorum and elect a leader since they all agree on the same state? What
> then happens when the new membership of D-H starts up?
>
> We're trying to automatically handle node failures during reconfiguration
> situations, but it seems like without being able to query all nodes to make
> sure you know of the latest membership list there is no safe way to do
> this. I'm wondering if only doing single node additions/removals would
> create less complicated failure scenarios. What are your thoughts and best
> practices around this?
>
> Thanks!
> Jared
>
> On Fri, Jul 27, 2012 at 8:57 PM, Jared Cantwell wrote:
>
>> We are trying to remove the need for all admin intervention so that is
>> one failure scenario that is interesting to us.
>>
>> Jared
>>
>>
>> On Jul 27, 2012, at 7:42 PM, Alexander Shraer wrote:
>>
>> Yes, this entry will be deleted. I don't like this either - if a new
>> follower reboots before being added to the config it will not be able to
>> boot up without manual help from an admin. That's why I'm considering
>> removing the check that a participant must always initially be in its own
>> config, but for now it's there.
>>
>> Alex
>>
>> On Fri, Jul 27, 2012 at 6:34 PM, Jared Cantwell wrote:
>>
>>> Sorry for the confusion in terminology, I was unfamiliar with the exact
>>> leader/follower semantics previously.
>>>
>>> So if all connected servers update their config file, does that mean
>>> that non-voting followers who aren't part of the new ensemble will lose the
>>> entry specific to them in their config file? I can test this myself, but
>>> getting an inside perspective is very helpful.
>>>
>>> Thanks again for the help!
>>> Jared
>>>
>>>
>>> On Jul 27, 2012, at 6:55 PM, Alexander Shraer wrote:
>>>
>>> Yes, any number of followers which are not in the configuration can just
>>> connect and listen in. This has always been the case, also in 3.4; I just
>>> made use of this for the purpose of adding members during reconfiguration.
>>> Moreover, in 3.4 there is this bug, ZOOKEEPER-1113, where the leader
>>> actually counts the votes of anyone connected, regardless of config
>>> membership :) This is fixed in ZK-107, so they are really non-voting
>>> followers.
>>>
>>> > I am assuming that's the case, and that it is a follower (and not
>>> > participant) by virtue of not being in the official configuration
>>> > stored in zookeeper itself.
>>>
>>> Follower and participant types of servers is not something that was
>>> defined in ZK-107. In ZooKeeper every follower/leader is a "participant".
>>> It's just that the votes of participants that are not in the configuration
>>> are not counted, which is why we call them non-voting followers. BTW,
>>> obviously a non-voting follower cannot become leader (like ZK-1113, this
>>> was also not enforced before ZK-107).
>>>
>>> > And a followup... does zookeeper only overwrite the dynamic
>>> > configuration file for nodes that are voting participants? Such that
>>> > if I started a follower and then left it running through some
>>> > reconfigurations, its file would not get updated if it was never added
>>> > as part of those reconfigurations?
>>>
>>> No, as soon as it connects to the current leader, its dynamic config
>>> file is overwritten with the current configuration as part of the
>>> synchronization with the leader. Every time a new configuration is
>>> committed, all connected servers (voting, non-voting, observers) will
>>> update their dynamic config file, whether or not they're in the config.
>>>
>>> Alex
>>>
>>> On Fri, Jul 27, 2012 at 5:35 PM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>
>>>> So does just having the server started and pointing to the existing
>>>> ensemble automatically make it a "non participating follower"? In other
>>>> words, there is no need to inform the existing nodes that this new node is
>>>> joining as a follower?
>>>> And to extend that, there could be any number of
>>>> followers that are simply listening in on the event stream? I am assuming
>>>> that's the case, and that it is a follower (and not participant) by virtue
>>>> of not being in the official configuration stored in zookeeper itself.
>>>>
>>>> On Fri, Jul 27, 2012 at 6:29 PM, Alexander Shraer wrote:
>>>>
>>>>> there are just two supported types - participant and observer.
>>>>> (participant can act as either follower or leader).
>>>>>
>>>>> So you can either write participant or leave it unspecified (which
>>>>> means participant by default). Also, since the ip is the same for all your
>>>>> ports you don't have to write it twice. All of these should work in the
>>>>> same way:
>>>>>
>>>>> server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181
>>>>> server.5=10.10.5.17:2182:2183:participant;2181
>>>>> server.5=10.10.5.17:2182:2183;10.10.5.17:2181
>>>>> server.5=10.10.5.17:2182:2183;2181
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 27, 2012 at 5:25 PM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>>>
>>>>>> Thanks Alex for the response. Our current lines in the configuration
>>>>>> look like this:
>>>>>>
>>>>>> server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181
>>>>>>
>>>>>> For the new servers is it ok for their entry to have "participant"?
>>>>>> Or should that be something different (e.g. "follower")?
>>>>>>
>>>>>> ~Jared
>>>>>>
>>>>>> On Fri, Jul 27, 2012 at 6:20 PM, Alexander Shraer wrote:
>>>>>>
>>>>>>> Hi Jared,
>>>>>>>
>>>>>>> Thanks for experimenting with this feature.
>>>>>>>
>>>>>>> The idea is that new servers join as "non voting followers". Which
>>>>>>> means that they act as normal followers but the leader ignores their votes
>>>>>>> since they are not part of the current configuration. The leader only
>>>>>>> counts their votes during the reconfiguration itself (to make sure a quorum
>>>>>>> of the new config is ready before the new config can be
>>>>>>> committed/activated).
>>>>>>> Defining them as observers is not a good idea. For example, in your
>>>>>>> scenario, if they were observers they wouldn't be able to participate
>>>>>>> in the reconfiguration protocol (which is similar to the protocol for
>>>>>>> committing any other operation, in which observers don't participate),
>>>>>>> and since we wouldn't have a quorum of followers in the new config
>>>>>>> that can ack, reconfiguration would throw an exception (of
>>>>>>> KeeperException.NEWCONFIGNOQUORUM type).
>>>>>>> Of course if you intend them to be observers in the new config you
>>>>>>> can define them as observers since their votes are not needed during
>>>>>>> reconfig anyway.
>>>>>>>
>>>>>>> You're right, the new servers must be able to connect to the old
>>>>>>> quorum. At minimum, their file should contain the current leader, but
>>>>>>> you can also copy the current configuration file to the new members
>>>>>>> if you wish.
>>>>>>>
>>>>>>> In addition, you should add a line for the member itself, so that
>>>>>>> server F appears in F's config file. (It's not important that the other
>>>>>>> new servers appear in F's file, but it won't hurt either, so you can do
>>>>>>> a union of old and new if you wish.) The constructor of QuorumPeer
>>>>>>> checks that the server itself is in the configuration it's started
>>>>>>> with; otherwise it's not going to run. This check has always been
>>>>>>> there, but I'm thinking of possibly changing it in the future.
>>>>>>>
>>>>>>> As soon as F connects to the leader, its config file will be
>>>>>>> overwritten with the current config file as part of the synchronization
>>>>>>> process.
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 27, 2012 at 10:06 AM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are testing integration with 3.5.0 and dynamic membership and I
>>>>>>>> have a question.
>>>>>>>> If I have a current set of servers in my ensemble {A,B,C,D,E}
>>>>>>>> and I want to reconfigure the ensemble to {D,E,F,G,H}, how should the
>>>>>>>> dynamic config file on servers F,G,H be configured on startup?
>>>>>>>> Should they have the old ensemble, the new ensemble, or the union of
>>>>>>>> both ensembles? It seems like these new servers need to know about
>>>>>>>> the old quorum, but since they aren't part of it yet it's not clear
>>>>>>>> to me how they should be configured. Should there be an intermediate
>>>>>>>> configuration with F,G, and H as simply Observers?
>>>>>>>>
>>>>>>>> I can't find much documentation on this so I want to make sure I
>>>>>>>> understand things correctly.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> ~Jared

--e89a8ff253620c7d8d04c5e6cbc9--
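
The quorum argument in point 2 of the reply at the top of this thread can be checked with a small Python sketch. It assumes plain majority quorums (ZooKeeper's default; hierarchical quorums are not modelled), and the helper names `is_quorum`, `majorities` and `reconfig_committable` are illustrative only, not ZooKeeper APIs. It uses the {A,B,C,D,E} to {D,E,F,G,H} example from the thread, with A-C restarting first.

```python
from itertools import combinations


def is_quorum(servers, config):
    """True if `servers` contains a strict majority of `config`."""
    return len(set(servers) & set(config)) > len(config) // 2


def majorities(config):
    """Yield every subset of `config` that forms a majority quorum."""
    members = sorted(config)
    for size in range(len(members) // 2 + 1, len(members) + 1):
        for subset in combinations(members, size):
            yield frozenset(subset)


def reconfig_committable(ackers, old_config, new_config):
    """Assumed rule from the thread: acks from a quorum of BOTH configs."""
    return is_quorum(ackers, old_config) and is_quorum(ackers, new_config)


old_config = {"A", "B", "C", "D", "E"}
new_config = {"D", "E", "F", "G", "H"}
restarted = {"A", "B", "C"}  # the servers that come back first

# A-C are a majority of the old config, so they can elect a leader there.
can_elect_in_old = is_quorum(restarted, old_config)

# Every old-config quorum that could have acked the reconfig overlaps A-C,
# so at least one of A-C has seen the reconfig proposal.
someone_saw_proposal = all(q & restarted for q in majorities(old_config))

# A-C hold no seats in the new config, so they cannot finish the reconfig alone.
can_finish_alone = reconfig_committable(restarted, old_config, new_config)

print(can_elect_in_old, someone_saw_proposal, can_finish_alone)
```

Under these assumptions, the sketch matches the reasoning in the reply: A-C can elect a leader among themselves, that leader is guaranteed to know of the reconfig proposal, and it cannot complete the reconfiguration without servers from the new config.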