zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Rolling config change considered harmful?
Date Sat, 15 Jun 2013 05:01:05 GMT
Hi German,

During normal operation ZK guarantees that a quorum (any majority) of
the ZK ensemble has all operations that may have been committed.

So without dynamic reconfiguration you should ensure that when you're
changing the ensemble, any possible quorum of the new ensemble
necessarily intersects with any quorum of the 'old' ensemble.

If you add D and E right away this property may not be guaranteed,
since a new quorum (3 out of 5) can be, for example, C, D, E, whereas
it's possible that only A and B have the latest state, so it would be
lost.
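
To make the failure case concrete, here is a minimal Python sketch that
enumerates the majority quorums of the old and new ensembles and lists
the pairs that share no server. It is only an illustration of the
counting argument; the helper names (majorities, disjoint_quorum_pairs)
are made up for this example and are not part of ZooKeeper.

```python
from itertools import combinations


def majorities(ensemble):
    """Return every majority quorum (as a frozenset) of the given ensemble."""
    members = sorted(ensemble)
    smallest = len(members) // 2 + 1  # strict majority
    return [frozenset(c)
            for size in range(smallest, len(members) + 1)
            for c in combinations(members, size)]


def disjoint_quorum_pairs(old, new):
    """List (old_quorum, new_quorum) pairs that have no server in common."""
    return [(q_old, q_new)
            for q_old in majorities(old)
            for q_new in majorities(new)
            if not (q_old & q_new)]


old = {"A", "B", "C"}
new = {"A", "B", "C", "D", "E"}

# Each pair printed here is a way for a committed txn to be lost:
# the old quorum holds it, the new quorum never saw it.
bad = disjoint_quorum_pairs(old, new)
for q_old, q_new in bad:
    print(sorted(q_old), "vs", sorted(q_new))
```

For the 3 to 5 jump this reports three disjoint pairs, one of which is
A, B against C, D, E, i.e. the case described above.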

To ensure this when going from 3 to 5 servers you should probably do 2
transitions. First add server D. Any majority here, i.e., any 3 servers
out of 4, will necessarily contain 2 servers from the original ensemble
(A, B, C). Since the latest state is held by at least 2 of those 3
servers, at least one server in any new quorum actually has the latest
state.

Then add E. A quorum here is any 3 out of 5 servers, so even if some
quorum includes E and the server from (A, B, C, D) that
didn't have the latest state, we still have 1 server in the quorum
that does have the latest state, so we're fine...
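
The two-step argument can be checked mechanically. The sketch below is
an illustration only (the function names and the "plan" representation
are assumptions made for this example, nothing here comes from
ZooKeeper itself): it walks a list of successive ensembles and reports,
for each transition, whether every old majority intersects every new
majority.

```python
from itertools import combinations


def majorities(ensemble):
    """Return every majority quorum of the ensemble as a set."""
    members = sorted(ensemble)
    smallest = len(members) // 2 + 1
    return [set(c)
            for size in range(smallest, len(members) + 1)
            for c in combinations(members, size)]


def quorums_intersect(old, new):
    """True if every majority of `old` shares a server with every majority of `new`."""
    return all(q_old & q_new
               for q_old in majorities(old)
               for q_new in majorities(new))


def check_plan(plan):
    """Return (old, new, safe) for each consecutive pair of ensembles in the plan."""
    return [("".join(sorted(old)), "".join(sorted(new)), quorums_intersect(old, new))
            for old, new in zip(plan, plan[1:])]


one_at_a_time = [set("ABC"), set("ABCD"), set("ABCDE")]
all_at_once = [set("ABC"), set("ABCDE")]

for name, plan in (("one at a time", one_at_a_time), ("all at once", all_at_once)):
    for old, new, safe in check_plan(plan):
        print(f"{name}: {old} -> {new}: {'safe' if safe else 'NOT safe'}")
```

Both steps of the one-at-a-time plan come out safe, while the direct
3 to 5 change does not. Note this only checks the intersection
property; it says nothing about waiting for leader election between
steps.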

As long as you add servers one by one and wait for leader election to
complete in every stage, or preserve the quorum intersection property
in some other way, it should be safe. But with dynamic reconfig you
don't need to do that, and no reboots are necessary, of course.
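
If you want to check some other transition (replacing a server, for
instance) without enumerating quorums, a counting argument is enough
for plain majority quorums. The sketch below is a hedged illustration
with a made-up function name: an old quorum can draw on servers that
exist only in the old ensemble, a new quorum on servers that exist only
in the new one, and whatever each still needs must come from the shared
servers. If the shared servers can cover both needs at once, two
disjoint quorums exist and the change is unsafe.

```python
def majority(n):
    """Size of the smallest majority of n servers."""
    return n // 2 + 1


def disjoint_quorums_possible(old, new):
    """True if some majority of `old` and some majority of `new` share no server.

    Assumes plain majority quorums (no weights or hierarchical groups).
    """
    shared = old & new
    old_only = old - new
    new_only = new - old
    # Servers each side must still take from the shared pool after using
    # up the servers that only it knows about.
    need_old = max(0, majority(len(old)) - len(old_only))
    need_new = max(0, majority(len(new)) - len(new_only))
    return need_old + need_new <= len(shared)


# Flavio's example quoted below in this thread: {A, B, C} -> {B, C, D}.
print(disjoint_quorums_possible(set("ABC"), set("BCD")))    # unsafe case
# Adding a single server to a 3-node ensemble.
print(disjoint_quorums_possible(set("ABC"), set("ABCD")))   # safe case
```

For {A, B, C} to {B, C, D} each side needs only one shared server and
two are available, so A, B and C, D can both be quorums, which matches
the lost-txn example further down the thread.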

Alex

On Fri, Jun 14, 2013 at 9:14 PM, German Blanco
<german.blanco.blanco@gmail.com> wrote:
> Hello,
>
> Could you please clarify if this thread is about a rolling start in an
> ensemble without the dynamic reconfiguration support?
> And when you say "Create a 5 node ensemble", that means quorum is 5. But
> then you give server lists with only 3 servers in each node?
> If the server list has 3 servers, then quorum is actually 3 and what is
> described may happen in that scenario.
> In that case C follows B, E follows D and A follows either B or D and there
> are two working ensembles.
> It should be possible to create problems, even with more standard
> configuration changes:
> If we want to change a quorum of three to a quorum of five {A,B,C} to
> {A,B,C,D,E}:
> - First the configuration is changed in all the nodes, but they are not
> restarted. Only A, B and C are running.
> - One of them is stopped (e.g. A).
> - At this point, if A, D and E are started with the new configuration, they
> may elect a leader before any of them is aware of either B or C, form an
> ensemble and start serving txns.
> - However, if A is started, we wait until it connects to the leader of B
> and C, and then D and E are started and then B and C are restarted,
> everything should be ok. The fact that this depends on the human ability to
> start D and E while A,B and C are connected to the ensemble seems a bit
> risky though.
> I have found a presentation on the topic:
> http://www.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper
>
> If anybody knows of a safer way to change a quorum of 3 to a quorum of 5
> with e.g. zookeeper 3.4.5, please point it out.
>
> Regards,
>
> Germán.
>
>
> On Fri, Jun 14, 2013 at 11:46 PM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> I got the test cluster into the state described with 2 leaders. I then
>> allocated 100 Curator clients to write nodes "/n" where n is the index
>> (i.e. "/0", "/1", …). The idea was that the nodes would be distributed
>> around the cluster instances. I then allocated a single Curator instance
>> dedicated to one of the server instances, did a sync, and did an exists()
>> to verify that each cluster instance had all the nodes. For the 2 leader
>> cluster, this fails.
>>
>> -JZ
>>
>> On Jun 14, 2013, at 1:54 PM, "FPJ" <fpjunqueira@yahoo.com> wrote:
>>
>> > I messed up the last sentence, here is what I was trying to say:
>> >
>> > It is ok to have two servers thinking they are leaders as long as only one
>> > is able to commit txns at a time by having a quorum of supporters. Each
>> > server is going to follow a single leader, so I don't see a problem in your
>> > scenario with the information you provided. Now if you tell me that when
>> > you keep sending new transactions to those leaders, both keep committing
>> > new transactions (not the same txns), then we have a problem. I don't see
>> > how this can happen, though.
>> >
>> > Also, one of the leaders should eventually time out and go back to leader
>> > election.
>> >
>> >> -----Original Message-----
>> >> From: FPJ [mailto:fpjunqueira@yahoo.com]
>> >> Sent: 14 June 2013 21:44
>> >> To: user@zookeeper.apache.org
>> >> Subject: RE: Rolling config change considered harmful?
>> >>
>> >> It is ok to have two servers thinking they are leaders as long as only
>> >> one is able to commit txns at a time by having a quorum of supporters.
>> >> Each server is going to follow a single leader, so I don't see a problem
>> >> in your scenario with the information you provided. Now if you tell me
>> >> that when you keep sending new transactions to those leaders and they keep
>> >> committing them forever, both keep committing new transactions, then we
>> >> have a problem. I don't see how this can happen, though.
>> >>
>> >> Also, one of the leaders should eventually time out and go back to leader
>> >> election.
>> >>
>> >> -Flavio
>> >>
>> >>> -----Original Message-----
>> >>> From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
>> >>> Sent: 14 June 2013 21:10
>> >>> To: user@zookeeper.apache.org
>> >>> Subject: Re: Rolling config change considered harmful?
>> >>>
>> >>> More on this.
>> >>>
>> >>> I just did some testing with wholly contrived scenarios and I was able
>> >>> to get a cluster in a state where it had two leaders. NOTE: all of this
>> >>> was done with Curator's TestingCluster.
>> >>>
>> >>> * Create a 5 node ensemble
>> >>> * Save the list of instances, directories etc.
>> >>> * Wait for quorum
>> >>> * Shut down the cluster
>> >>> * Restart the ensemble with the same ports and directories. However,
>> >>> this time, give different server lists to each instance:
>> >>>     * Instance A -> A D E
>> >>>     * Instance B -> A B C
>> >>>     * Instance C -> A B C
>> >>>     * Instance D -> A D E
>> >>>     * Instance E -> A D E
>> >>>
>> >>> There is at least one common server amongst all of them. When I
>> >>> restarted the cluster with this configuration I ended up with two
>> >>> leaders. This state stays consistent after leader election (i.e. it
>> >>> doesn't try to re-elect).
>> >>>
>> >>> A: following
>> >>> B: leading
>> >>> C: following
>> >>> D: leading
>> >>> E: following
>> >>>
>> >>> This may be the correct behavior. i.e. it may be that ZooKeeper cannot
>> >>> realistically run in this scenario. What it means to me is that rolling
>> >>> config changes, if too lax, can create chaos.
>> >>>
>> >>> -Jordan
>> >>>
>> >>> On Jun 14, 2013, at 12:27 PM, "FPJ" <fpjunqueira@yahoo.com> wrote:
>> >>>
>> >>>> In the case I described, the txn is not reflected in the zookeeper state.
>> >>>> Say T is a create txn. Once C is elected, it determines the initial
>> >>>> history of txns for the new epoch that is starting and this initial
>> >>>> history is not going to include T.
>> >>>>
>> >>>> In the example below, I was ignoring the client that triggered T,
>> >>>> but since it has been acked by a quorum, the client might as well
>> >>>> have received the confirmation of the operation and think that the
>> >>>> znode has been created.
>> >>>>
>> >>>> -Flavio
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
>> >>>>> Sent: 14 June 2013 20:16
>> >>>>> To: user@zookeeper.apache.org
>> >>>>> Subject: Re: Rolling config change considered harmful?
>> >>>>>
>> >>>>> Yes - save that I'm not sure what happens with a client when a
>> >>>>> transaction is lost. What is the error to the client? Or are you
>> >>>>> referring to internal transactions as part of the leader election?
>> >>>>>
>> >>>>> -JZ
>> >>>>>
>> >>>>> On Jun 14, 2013, at 12:07 PM, "FPJ" <fpjunqueira@yahoo.com> wrote:
>> >>>>>
>> >>>>>> Not sure if this helps but here is an example:
>> >>>>>>
>> >>>>>> - Txn T is acknowledged by A and B (ensemble is {A, B, C})
>> >>>>>> - Ensemble changes to {B, C, D}
>> >>>>>> - C and D form a quorum and elect C because it has the highest zxid.
>> >>>>>>
>> >>>>>> C won't have T, so the txn gets lost.
>> >>>>>>
>> >>>>>> Does it make sense?
>> >>>>>>
>> >>>>>> -Flavio
>> >>>>
>> >>>>
>> >>
>> >
>> >
>>
>>
