zookeeper-user mailing list archives

From German Blanco <german.blanco.bla...@gmail.com>
Subject Re: Rolling config change considered harmful?
Date Sat, 15 Jun 2013 07:25:24 GMT
Thanks a lot Alex!
Excellent explanation :-)


On Sat, Jun 15, 2013 at 7:01 AM, Alexander Shraer <shralex@gmail.com> wrote:

> Hi German,
>
> During normal operation ZK guarantees that a quorum (any majority) of
> the ZK ensemble has all operations that may have been committed.
>
> So without dynamic reconfiguration you should ensure that when you're
> changing the ensemble, any possible quorum of the new ensemble
> necessarily intersects with any quorum of the 'old' ensemble.
>
> If you add D and E right away this property may not be guaranteed,
> since a new quorum (3 out of 5) can be, for example, C, D, E, whereas
> it's possible that only A and B have the latest state, so it will be
> lost.
>
> To ensure this when going from 3 to 5 servers you should probably do 2
> transitions. First add server D. Any majority here, i.e., any 3 servers
> out of 4, will necessarily contain 2 servers from the original ensemble
> (A, B, C). Since a quorum of the original ensemble is any 2 of those 3
> servers, at least one server in any new quorum actually has the latest
> state.
>
> Then add E. A quorum here is any 3 out of 5 servers, so even if some
> quorum includes E and the one server from (A, B, C, D) that may not
> have the latest state, we still have 1 server in the quorum that does
> have the latest state, so we're fine...
>
> As long as you add servers one by one and wait for leader election to
> complete in every stage, or preserve the quorum intersection property
> in some other way, it should be safe. But with dynamic reconfig you
> don't need to do that, and no reboots are necessary, of course.
>
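
The quorum intersection argument above can be checked by brute force. The following is a minimal sketch, assuming plain majority quorums and treating servers as single-letter labels; the helper names (`minimal_quorums`, `disjoint_quorum_pairs`, `transition_is_safe`) are made up for illustration and are not part of ZooKeeper.

```python
from itertools import combinations


def minimal_quorums(ensemble):
    """Return every smallest majority of the ensemble as a set of servers."""
    servers = sorted(set(ensemble))
    size = len(servers) // 2 + 1  # plain majority, e.g. 2 of 3 or 3 of 5
    return [set(q) for q in combinations(servers, size)]


def disjoint_quorum_pairs(old, new):
    """List (old quorum, new quorum) pairs that share no server."""
    return [
        (o, n)
        for o in minimal_quorums(old)
        for n in minimal_quorums(new)
        if not (o & n)
    ]


def transition_is_safe(old, new):
    """True if every quorum of `new` intersects every quorum of `old`."""
    # Checking minimal quorums is enough: larger quorums are supersets of them.
    return not disjoint_quorum_pairs(old, new)


if __name__ == "__main__":
    print(transition_is_safe("ABC", "ABCDE"))   # direct jump from 3 to 5
    print(transition_is_safe("ABC", "ABCD"))    # first step: add D
    print(transition_is_safe("ABCD", "ABCDE"))  # second step: add E
    print(disjoint_quorum_pairs("ABC", "ABCDE"))
```

Under these assumptions the direct jump is reported as unsafe (for instance {A, B} and {C, D, E} share no server), while each one-server step is reported as safe. The check says nothing about the operational side, such as waiting for leader election to complete between steps.
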
> Alex
>
> On Fri, Jun 14, 2013 at 9:14 PM, German Blanco
> <german.blanco.blanco@gmail.com> wrote:
> > Hello,
> >
> > Could you please clarify if this thread is about a rolling start in an
> > ensemble without the dynamic reconfiguration support?
> > And when you say "Create a 5 node ensemble", that means quorum is 5. But
> > then you give server lists with only 3 servers in each node?
> > If the server list has 3 servers, then quorum is actually 3 and what is
> > described may happen in that scenario.
> > In that case C follows B, E follows D and A follows either B or D, and
> > there are two working ensembles.
> > It should be possible to create problems, even with more standard
> > configuration changes:
> > If we want to change a quorum of three to a quorum of five {A,B,C} to
> > {A,B,C,D,E}:
> > - First the configuration is changed in all the nodes, but they are not
> > restarted. Only A, B and C are running.
> > - One of them is stopped (e.g. A).
> > - At this point, if A, D and E are started with the new configuration,
> > they may elect a leader before any of them is aware of either B or C,
> > form an ensemble and start serving txns.
> > - However, if A is started, we wait until it connects to the leader of B
> > and C, and then D and E are started and then B and C are restarted,
> > everything should be ok. The fact that this depends on the human ability
> > to start D and E while A, B and C are connected to the ensemble seems a
> > bit risky though.
> > I have found a presentation on the topic:
> >
> > http://www.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper
> >
> > If anybody knows of a safer way to change a quorum of 3 to a quorum of 5
> > with e.g. zookeeper 3.4.5, please point it out.
> >
> > Regards,
> >
> > Germán.
> >
> >
> > On Fri, Jun 14, 2013 at 11:46 PM, Jordan Zimmerman <
> > jordan@jordanzimmerman.com> wrote:
> >
> >> I got the test cluster into the state described with 2 leaders. I then
> >> allocated 100 Curator clients to write nodes "/n" where n is the index
> >> (i.e. "/0", "/1", …). The idea was that the nodes would be distributed
> >> around the cluster instances. I then allocated a single Curator instance
> >> dedicated to one of the server instances, did a sync, and did an exists()
> >> to verify that each cluster instance had all the nodes. For the 2 leader
> >> cluster, this fails.
> >>
> >> -JZ
> >>
> >> On Jun 14, 2013, at 1:54 PM, "FPJ" <fpjunqueira@yahoo.com> wrote:
> >>
> >> > I messed up the last sentence, here is what I was trying to say:
> >> >
> >> > It is ok to have two servers thinking they are leaders as long as only
> >> > one is able to commit txns at a time by having a quorum of supporters.
> >> > Each server is going to follow a single leader, so I don't see a problem
> >> > in your scenario with the information you provided. Now if you tell me
> >> > that when you keep sending new transactions to those leaders, both keep
> >> > committing new transactions (not the same txns), then we have a problem.
> >> > I don't see how this can happen, though.
> >> >
> >> > Also, one of the leaders should eventually time out and go back to
> >> > leader election.
> >> >
> >> >> -----Original Message-----
> >> >> From: FPJ [mailto:fpjunqueira@yahoo.com]
> >> >> Sent: 14 June 2013 21:44
> >> >> To: user@zookeeper.apache.org
> >> >> Subject: RE: Rolling config change considered harmful?
> >> >>
> >> >> It is ok to have two servers thinking they are leaders as long as only
> >> >> one is able to commit txns at a time by having a quorum of supporters.
> >> >> Each server is going to follow a single leader, so I don't see a problem
> >> >> in your scenario with the information you provided. Now if you tell me
> >> >> that when you keep sending new transactions to those leaders and they
> >> >> keep committing them forever, both keep committing new transactions,
> >> >> then we have a problem. I don't see how this can happen, though.
> >> >>
> >> >> Also, one of the leaders should eventually time out and go back to
> >> >> leader election.
> >> >>
> >> >> -Flavio
> >> >>
> >> >>> -----Original Message-----
> >> >>> From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
> >> >>> Sent: 14 June 2013 21:10
> >> >>> To: user@zookeeper.apache.org
> >> >>> Subject: Re: Rolling config change considered harmful?
> >> >>>
> >> >>> More on this.
> >> >>>
> >> >>> I just did some testing with wholly contrived scenarios and I was able
> >> >>> to get a cluster in a state where it had two leaders. NOTE: all of this
> >> >>> was done with Curator's TestingCluster.
> >> >>>
> >> >>> * Create a 5 node ensemble
> >> >>> * Save the list of instances, directories etc.
> >> >>> * Wait for quorum
> >> >>> * Shut down the cluster
> >> >>> * Restart the ensemble with the same ports and directories. However,
> >> >>> this time, give different server lists to each instance:
> >> >>>     * Instance A -> A D E
> >> >>>     * Instance B -> A B C
> >> >>>     * Instance C -> A B C
> >> >>>     * Instance D -> A D E
> >> >>>     * Instance E -> A D E
> >> >>>
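
One way to see why these server lists can yield two leaders is to group the instances by the list they were given. This is a deliberately simplified sketch, not ZooKeeper's actual leader election: it assumes an instance only cooperates with instances that received the same list, and the function name `self_sufficient_groups` is hypothetical.

```python
def self_sufficient_groups(views):
    """Group servers by the server list they were given and keep the groups
    that hold a majority of their own list (simplified model, see lead-in)."""
    groups = {}
    for server, view in views.items():
        members = frozenset(view)
        if server in members:
            groups.setdefault(members, set()).add(server)
    return {
        view: servers
        for view, servers in groups.items()
        if len(servers) >= len(view) // 2 + 1
    }


# Server lists from the test above: instance -> servers it was told about.
VIEWS = {"A": "ADE", "B": "ABC", "C": "ABC", "D": "ADE", "E": "ADE"}

if __name__ == "__main__":
    for view, servers in self_sufficient_groups(VIEWS).items():
        print(sorted(view), "has a majority among", sorted(servers))
```

In this model B and C already form a majority of {A, B, C}, and A, D and E form a majority of {A, D, E}, so each group can settle on a leader without the other.
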
> >> >>> There is at least one common server amongst all of them. When I
> >> >>> restarted the cluster with this configuration I ended up with two
> >> >>> leaders. This state stays consistent after leader election (i.e. it
> >> >>> doesn't try to re-elect).
> >> >>>
> >> >>> A: following
> >> >>> B: leading
> >> >>> C: following
> >> >>> D: leading
> >> >>> E: following
> >> >>>
> >> >>> This may be the correct behavior, i.e. it may be that ZooKeeper cannot
> >> >>> realistically run in this scenario. What it means to me is that rolling
> >> >>> config changes, if too lax, can create chaos.
> >> >>>
> >> >>> -Jordan
> >> >>>
> >> >>> On Jun 14, 2013, at 12:27 PM, "FPJ" <fpjunqueira@yahoo.com> wrote:
> >> >>>
> >> >>>> In the case I described, the txn is not reflected in the zookeeper
> >> >>>> state. Say T is a create txn. Once C is elected, it determines the
> >> >>>> initial history of txns for the new epoch that is starting and this
> >> >>>> initial history is not going to include T.
> >> >>>>
> >> >>>> In the example below, I was ignoring the client that triggered T,
> >> >>>> but since it has been acked by a quorum, the client might as well
> >> >>>> have received the confirmation of the operation and think that the
> >> >>>> znode has been created.
> >> >>>>
> >> >>>> -Flavio
> >> >>>>
> >> >>>>> -----Original Message-----
> >> >>>>> From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
> >> >>>>> Sent: 14 June 2013 20:16
> >> >>>>> To: user@zookeeper.apache.org
> >> >>>>> Subject: Re: Rolling config change considered harmful?
> >> >>>>>
> >> >>>>> Yes - save that I'm not sure what happens with a client when a
> >> >>>>> transaction is lost. What is the error to the client? Or are you
> >> >>>>> referring to internal transactions as part of the leader election?
> >> >>>>>
> >> >>>>> -JZ
> >> >>>>>
> >> >>>>> On Jun 14, 2013, at 12:07 PM, "FPJ" <fpjunqueira@yahoo.com> wrote:
> >> >>>>>
> >> >>>>>> Not sure if this helps but here is an example:
> >> >>>>>>
> >> >>>>>> - Txn T is acknowledged by A and B (ensemble is {A, B, C})
> >> >>>>>> - Ensemble changes to {B, C, D}
> >> >>>>>> - C and D form a quorum and elect C because it has the highest zxid.
> >> >>>>>>
> >> >>>>>> C won't have T, so the txn gets lost.
> >> >>>>>>
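
The lost-transaction example can be reproduced with a few lines of set arithmetic. This is a rough sketch assuming plain majority quorums; `quorums_missing_txn` is a hypothetical helper, not a ZooKeeper API.

```python
from itertools import combinations


def quorums_missing_txn(acked_by, new_ensemble):
    """Return the minimal majorities of new_ensemble in which no server acked the txn."""
    servers = sorted(set(new_ensemble))
    size = len(servers) // 2 + 1
    return [
        set(q) for q in combinations(servers, size) if not (set(q) & set(acked_by))
    ]


if __name__ == "__main__":
    # T was acked by A and B; the ensemble then changes to {B, C, D}.
    print(quorums_missing_txn({"A", "B"}, {"B", "C", "D"}))
    # With the original ensemble {A, B, C} there is no such quorum.
    print(quorums_missing_txn({"A", "B"}, {"A", "B", "C"}))
```

For the changed ensemble the only quorum that has never seen T is {C, D}, which is exactly the pair that elects C in the example.
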
> >> >>>>>> Does it make sense?
> >> >>>>>>
> >> >>>>>> -Flavio
> >> >>>>
> >> >>>>
> >> >>
> >> >
> >> >
> >>
> >>
>
