zookeeper-user mailing list archives

From Jared Cantwell <jared.cantw...@gmail.com>
Subject Re: Dynamic reconfiguration
Date Sat, 28 Jul 2012 17:17:45 GMT
Thanks Alex for the detailed explanations. They really help fill in my
understanding of the implementation left open by the papers/presentations
I've read (without having to read the code yet :-) ).  #2 is what I was
unsure of, but it makes perfect sense.

Obviously committing the new configuration to the internal database is a
prerequisite to committing on a server, but is writing the new *configuration
file* to disk also a prerequisite for committing the new configuration?
 I'm curious about this so I can match it with my observations, since
reading the configuration file is much easier than getting the database


On Sat, Jul 28, 2012 at 11:02 AM, Alexander Shraer <shralex@gmail.com> wrote:

> Hi Jared,
> figuring out what happened and how to recover is part of the
> reconfiguration protocol. I don't think that this is something you as a
> user should do, unless I misunderstand what you're trying to do. This
> should be handled by ZooKeeper just like it handles other failures without
> admin intervention.
> In your scenario, D-F come up and one of them is elected leader (since you
> said they know about the commit), so they start running the new config
> normally. When A-C come up, several things may happen:
> 1. During the preliminary FastLeaderElection, A-C will try to connect to D
> and E, and in fact they'll also try to connect to the new config members
> that they know were proposed. So chances are that someone in the new
> config will send them the new config file and they'll store it and act
> accordingly (connect as non-voting followers in the new config). To make
> this happen, I changed FastLeaderElection to talk with proposed configs (if
> known) and to piggyback the last active config you know of on all messages.
> 2. It's possible that somehow A-C complete FastLeaderElection without
> talking to D-F. But since a reconfiguration was committed, it was acked by
> a quorum of the old config (and a quorum of the new one). Therefore,
> whoever is "elected" in the old config knows about the reconfig proposal
> (this is guaranteed by normal ZooKeeper leader recovery). Before doing
> anything else, the new leader among A-C will try to complete the
> reconfiguration, which involves getting acks from a quorum of the
> new config. But in your scenario the servers in the new config will not
> connect to it because they moved on, so the candidate-leader will just give
> up and go back to (1) above.
> 3. In the remote chance that someone who heard about the reconfig commit
> connects to a candidate-leader who didn't hear about it, the first thing it
> does is tell that candidate-leader that it's not up to date, and the
> leader just updates its config file, gives up on being a leader and returns
> to (1). This was done by changing the first message that a
> follower/observer sends to a leader it is connecting to, even before the
> synchronization starts.
> Alex
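The quorum argument in case (2) can be illustrated with a small sketch. It is not ZooKeeper code: it assumes simple majority quorums, and the names `has_majority` and `can_commit_reconfig` are hypothetical helpers introduced only for this example.

```python
def has_majority(acks, members):
    """True if strictly more than half of `members` appear in `acks`."""
    return len(set(acks) & set(members)) > len(members) // 2


def can_commit_reconfig(acks, old_config, new_config):
    """Assumption from the thread: a reconfig needs acks from a quorum of
    the old config and from a quorum of the new config."""
    return has_majority(acks, old_config) and has_majority(acks, new_config)


old_config = {"A", "B", "C", "D", "E"}
new_config = {"D", "E", "F", "G", "H"}

# Only A, B and C are reachable: a majority of the old config,
# but no member of the new config at all.
reachable = {"A", "B", "C"}
print(has_majority(reachable, old_config))
print(can_commit_reconfig(reachable, old_config, new_config))
```

Under these assumptions A-C can elect a candidate among themselves, but that candidate cannot gather a quorum of the new config, which matches the "give up and go back to (1)" outcome described above.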
> On Sat, Jul 28, 2012 at 8:43 AM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>> So I'm working through some failure scenarios and I want to make sure I
>> fully understand how dynamic membership changes previous behavior.
>> Are my expectations correct in this situation?
>> As in my previous example, let's say that the current membership of voting
>> participants is {A,B,C,D,E} and we're looking to change membership to
>> {D,E,F,G,H}.
>> 1. Reconfiguration to {D,E,F,G,H} completes internally
>> 2. D-F update their local configuration files, but A-C do not yet.
>> 3. Power loss to all nodes
>> Now what happens if A, B, and C come up with configuration files that
>> still say {A,B,C,D,E}, but no other servers start up yet?  Can A, B, and C
>> form a quorum and elect a leader since they all agree on the same state?
>> What then happens when the new membership of D-H starts up?
>> We're trying to automatically handle node failures during reconfiguration
>> situations, but it seems like without being able to query all nodes to make
>> sure you know of the latest membership list there is no safe way to do
>> this.  I'm wondering if only doing single node additions/removals would
>> create less complicated failure scenarios.  What are your thoughts and best
>> practices around this?
>> Thanks!
>> Jared
>> On Fri, Jul 27, 2012 at 8:57 PM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>> We are trying to remove the need for all admin intervention so that is
>>> one failure scenario that is interesting to us.
>>> Jared
>>> On Jul 27, 2012, at 7:42 PM, Alexander Shraer <shralex@gmail.com> wrote:
>>> Yes, this entry will be deleted. I don't like this either - if a new
>>> follower reboots before being added to the config, it will not be able to
>>> boot up without manual help from an admin. That's why I'm considering
>>> removing the check that a participant must always initially be in its own
>>> config, but for now it's there.
>>> Alex
>>> On Fri, Jul 27, 2012 at 6:34 PM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>> Sorry for the confusion in terminology, I was unfamiliar with the exact
>>>> leader/follower semantics previously.
>>>> So if all connected servers update their config file, does that mean
>>>> that non-voting followers who aren't part of the new ensemble will lose the
>>>> entry specific to them in their config file?  I can test this myself, but
>>>> getting an inside perspective is very helpful.
>>>> Thanks again for the help!
>>>> Jared
>>>> On Jul 27, 2012, at 6:55 PM, Alexander Shraer <shralex@gmail.com> wrote:
>>>> Yes, any number of followers which are not in the configuration can
>>>> just connect and listen in. This has always been the case, also in 3.4; I
>>>> just made use of this for the purpose of adding members during
>>>> reconfiguration. Moreover, in 3.4 there is this bug, ZOOKEEPER-1113
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1113>, where the leader
>>>> actually counts the votes of anyone connected, regardless of config
>>>> membership :) This is fixed in ZK-107, so they are really non-voting
>>>> followers.
>>>> >   I am assuming that's the case, and that it is a follower (and not
>>>> > participant) by virtue of not being in the official configuration
>>>> > stored in zookeeper itself.
>>>> Follower and participant types of servers are not something that was
>>>> defined in ZK-107. In ZooKeeper every follower/leader is a "participant".
>>>> It's just that the votes of participants that are not in the configuration
>>>> are not counted, which is why we call them non-voting followers. BTW,
>>>> obviously a non-voting follower cannot become leader (like ZK-1113, this
>>>> was also not enforced before ZK-107).
>>>> > And a followup... does zookeeper only overwrite the dynamic
>>>> > configuration file for nodes that are voting participants?  Such that
>>>> > if I started a follower and then left it running through some
>>>> > reconfigurations, its file would not get updated if it was never
>>>> > added as part of those reconfigurations?
>>>> No, as soon as it connects to the current leader, its dynamic config
>>>> file is overwritten with the current configuration as part of the
>>>> synchronization with the leader. Every time a new configuration is
>>>> committed, all connected servers (voting, non-voting, observers) will
>>>> update their dynamic config file, whether or not they're in the config.
>>>> Alex
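The difference between counting every connected server's ack and counting only configuration members can be sketched as follows. This is a simplified illustration, not the actual leader code; it assumes majority quorums, and `naive_quorum` and `member_quorum` are hypothetical names.

```python
def naive_quorum(acks, config):
    """Behaviour described for the 3.4 bug: any connected server's ack counts."""
    return len(set(acks)) > len(config) // 2


def member_quorum(acks, config):
    """Behaviour described after the fix: only acks from servers that are
    in the configuration count."""
    return len(set(acks) & set(config)) > len(config) // 2


config = {"A", "B", "C", "D", "E"}

# A and B are voting members; F and G are connected but not in the config.
acks = {"A", "B", "F", "G"}
print(naive_quorum(acks, config))
print(member_quorum(acks, config))
```

With these inputs the naive count reaches a quorum using the acks of F and G, while the membership-aware count does not, which is why F and G behave as non-voting followers.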
>>>> On Fri, Jul 27, 2012 at 5:35 PM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>>> So does just having the server started and pointing to the existing
>>>>> ensemble automatically make it a "non participating follower"?  In other
>>>>> words, there is no need to inform the existing nodes that this new node
>>>>> is joining as a follower?  And to extend that, there could be any number
>>>>> of followers that are simply listening in on the event stream?  I am
>>>>> assuming that's the case, and that it is a follower (and not participant)
>>>>> by virtue of not being in the official configuration stored in zookeeper
>>>>> itself.
>>>>> On Fri, Jul 27, 2012 at 6:29 PM, Alexander Shraer <shralex@gmail.com> wrote:
>>>>>> there are just two supported types - participant and observer.
>>>>>> (participant can act as either follower or leader).
>>>>>> So you can either write participant or leave it unspecified (which
>>>>>> means participant by default). Also, since the ip is the same for all
>>>>>> your ports you don't have to write it twice.  All of these should work
>>>>>> in the same way:
>>>>>> server.5=;
>>>>>> server.5=;2181<>
>>>>>> server.5=;
>>>>>> server.5=;2181 <>
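The addresses in the archived lines above did not survive, so the following sketch uses placeholder values to show how an optional role and an optional client address combine in a server line. It assumes the `server.<id>=<host>:<port>:<port>[:role];[<client host>:]<client port>` layout documented for released 3.5 versions; the host `10.0.0.5`, the ports `2888` and `3888`, and the helper `server_line` are all hypothetical, and the output is not a reconstruction of the original four lines.

```python
def server_line(server_id, host, quorum_port, election_port, client_port,
                role=None, client_host=None):
    """Format one dynamic-config server entry (assumed layout, see above)."""
    spec = f"{host}:{quorum_port}:{election_port}"
    if role is not None:
        spec += f":{role}"  # omitted role is described as defaulting to participant
    if client_host is not None:
        client = f"{client_host}:{client_port}"
    else:
        client = str(client_port)  # client address may be left out
    return f"server.{server_id}={spec};{client}"


# Four combinations of explicit/implicit role and client address.
for role in ("participant", None):
    for client_host in ("10.0.0.5", None):
        print(server_line(5, "10.0.0.5", 2888, 3888, 2181,
                          role=role, client_host=client_host))
```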
>>>>>> On Fri, Jul 27, 2012 at 5:25 PM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>>>>> Thanks Alex for the response.  Our current lines in the
>>>>>>> configuration look like this:
>>>>>>> server.5=;
>>>>>>> For the new servers is it ok for their entry to have "participant"?
>>>>>>>  Or should that be something different (e.g. "follower")?
>>>>>>> ~Jared
>>>>>>> On Fri, Jul 27, 2012 at 6:20 PM, Alexander Shraer <shralex@gmail.com> wrote:
>>>>>>>> Hi Jared,
>>>>>>>> Thanks for experimenting with this feature.
>>>>>>>> The idea is that new servers join as "non-voting followers". This
>>>>>>>> means that they act as normal followers but the leader ignores their
>>>>>>>> votes since they are not part of the current configuration. The
>>>>>>>> leader only counts their votes during the reconfiguration itself (to
>>>>>>>> make sure a quorum of the new config is ready before the new config
>>>>>>>> can be committed/activated). Defining them as observers is not a good
>>>>>>>> idea: for example, in your scenario, if they were observers they
>>>>>>>> wouldn't be able to participate in the reconfiguration protocol
>>>>>>>> (which is similar to the protocol for committing any other operation,
>>>>>>>> in which observers don't participate), and since we don't have a
>>>>>>>> quorum of followers in the new config that can ack, reconfiguration
>>>>>>>> would throw an exception (of KeeperException.NEWCONFIGNOQUORUM type).
>>>>>>>> Of course, if you intend them to be observers in the new config, you
>>>>>>>> can define them as observers, since their votes are not needed for
>>>>>>>> reconfig anyway.
>>>>>>>> You're right, the new servers must be able to connect to the old
>>>>>>>> quorum. At minimum, their file should contain the current leader, but
>>>>>>>> you can also copy the current configuration file to the new servers
>>>>>>>> if you wish.
>>>>>>>> In addition, you should add a line for the member itself, so that
>>>>>>>> server F appears in F's config file (it's not important that the
>>>>>>>> other new servers appear in F's file, but it won't hurt either, so
>>>>>>>> you can do a union of old and new if you wish). The constructor of
>>>>>>>> QuorumPeer checks that the server itself is in the configuration it's
>>>>>>>> started with, otherwise it's not going to run. This check has always
>>>>>>>> been there, but I'm thinking of possibly changing it in the future.
>>>>>>>> As soon as F connects to the leader, its config file will be
>>>>>>>> overwritten with the current config file as part of the
>>>>>>>> synchronization process.
>>>>>>>> Alex
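The advice above for a joining server's initial file can be sketched on plain sets of server names. This is an illustration only: `initial_members` and `check_self_in_config` are hypothetical helpers, and the check merely imitates the QuorumPeer constructor behaviour as it is described in the message, not its real code.

```python
def initial_members(current_config, self_id):
    """Members to list in a joining server's initial config file:
    the current ensemble plus the joining server itself."""
    return set(current_config) | {self_id}


def check_self_in_config(members, self_id):
    """Imitates the described startup check: a server refuses to run
    if it does not appear in the configuration it is started with."""
    if self_id not in members:
        raise ValueError(f"server {self_id} is not in its own configuration")


current = {"A", "B", "C", "D", "E"}

members_for_f = initial_members(current, "F")
check_self_in_config(members_for_f, "F")  # passes: F lists itself
print(sorted(members_for_f))

try:
    check_self_in_config(current, "F")    # F started with only the old config
except ValueError as err:
    print(err)
```

Listing G and H as well would also pass the check; per the message, the file is overwritten with the current configuration once F connects to the leader.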
>>>>>>>> On Fri, Jul 27, 2012 at 10:06 AM, Jared Cantwell <jared.cantwell@gmail.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>> We are testing integration with 3.5.0 and dynamic membership and I
>>>>>>>>> have a question.  If I have a current set of servers in my ensemble
>>>>>>>>> {A,B,C,D,E} and I want to reconfigure the ensemble to {D,E,F,G,H},
>>>>>>>>> how should the dynamic config file on servers F,G,H be configured on
>>>>>>>>>  Should they have the old ensemble, the new ensemble, or the union
>>>>>>>>> of both ensembles?  It seems like these new servers need to know
>>>>>>>>> about the old quorum, but since they aren't part of it yet it's not
>>>>>>>>> clear to me how they should be configured.  Should there be an
>>>>>>>>> intermediate configuration with F,G, and H as simply Observers?
>>>>>>>>> I can't find much documentation on this so I want to make sure I
>>>>>>>>> understand things correctly.
>>>>>>>>> Thanks!
>>>>>>>>> ~Jared
