zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: zookeeper deployment strategy for multi data centers
Date Fri, 03 Jun 2016 21:16:22 GMT
> Is there any settings to override the quorum rule? Would you know the
> rationale behind it?

The rule comes from a theoretical impossibility result: any algorithm that
solves consensus while being able to handle periods of asynchrony
(unbounded message delays, processing times, etc.) needs n > 2f replicas
to tolerate f failures. The earliest proof is probably in this paper:
<https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf>.
ZooKeeper assumes this model, so the bound applies to it.
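
As a concrete illustration (a minimal sketch of my own, not part of the
original mail), the bound works out as follows for a strict-majority
quorum:

    # Python sketch: with a strict-majority quorum, n servers tolerate
    # f = n - quorum failures, i.e. (n - 1) // 2.
    for n in range(1, 8):
        quorum = n // 2 + 1   # smallest strict majority of n
        f = n - quorum        # servers you can lose and still reach quorum
        print(f"n={n}: quorum={quorum}, tolerates f={f} failure(s)")

    # n=2 tolerates no failures, just like n=1, and n=4 tolerates only one,
    # just like n=3 -- which is why ensembles are usually sized odd.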

The intuition is what's called a 'partition argument'. If 2f replicas were
sufficient, you could divide them into two sets of f replicas each and
create a situation where each set must go on independently, without
coordinating with the other set (split brain), whenever the links between
the two sets are slow (i.e., a network partition). Each set has to proceed
because, for all it knows, the other set could simply be down (the
algorithm tolerates f failures), and it cannot distinguish the two
situations. When n > 2f this is avoided, since at most one of the two sets
can hold a majority.
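
To make the split-brain case concrete (hypothetical numbers, not from the
thread), consider an ensemble spread over two data centers with the link
between them partitioned:

    # Python sketch: which side of a two-data-center partition keeps quorum?
    def has_quorum(alive, total):
        # ZooKeeper requires a strict majority of the configured ensemble.
        return alive > total // 2

    for dc1, dc2 in [(3, 3), (4, 3)]:
        total = dc1 + dc2
        print(f"{dc1}+{dc2}: DC1 alone -> {has_quorum(dc1, total)}, "
              f"DC2 alone -> {has_quorum(dc2, total)}")

    # 3+3: neither side holds a majority of 6, so neither makes progress.
    # 4+3: the 4-server site keeps quorum, but losing that site stops service.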

The key here is that the links between the two data centers can
arbitrarily delay messages, so an automatic 'fail-over' in which one data
center decides that the other is down is usually considered unsafe. If in
your system you have a reliable way to know that the other data center
really is down (this is a synchrony assumption), you could do as Camille
suggested and reconfigure the ensemble to include only the remaining data
center. This would still be very tricky to do, since the reconfiguration
would involve manually changing configuration files and rebooting servers,
while somehow making sure that you're not losing committed state. So not
recommended.
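
For concreteness only (hostnames and layout are made up, and the advice
above still stands), the kind of manual change being described is shrinking
the static ensemble definition in zoo.cfg on each surviving server to the
remaining site, then restarting:

    # zoo.cfg before: 3 servers in dc1, 2 in dc2 (hypothetical layout)
    server.1=zk1.dc1.example.com:2888:3888
    server.2=zk2.dc1.example.com:2888:3888
    server.3=zk3.dc1.example.com:2888:3888
    server.4=zk1.dc2.example.com:2888:3888
    server.5=zk2.dc2.example.com:2888:3888

    # zoo.cfg after dc2 is known (out of band) to be down: only dc1 remains,
    # so quorum becomes a majority of these 3 servers.
    server.1=zk1.dc1.example.com:2888:3888
    server.2=zk2.dc1.example.com:2888:3888
    server.3=zk3.dc1.example.com:2888:3888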



On Fri, Jun 3, 2016 at 11:30 PM, Camille Fournier <camille@apache.org>
wrote:

> 2 servers is the same as 1 server wrt fault tolerance, so yes, you are
> correct. If they want fault tolerance, they have to run 3 (or more).
>
> On Fri, Jun 3, 2016 at 4:25 PM, Shawn Heisey <apache@elyograg.org> wrote:
>
> > On 6/3/2016 1:44 PM, Nomar Morado wrote:
> > > Is there any settings to override the quorum rule? Would you know the
> > > rationale behind it? Ideally, you will want to operate the application
> > > even if at least one data center is up.
> >
> > I do not know if the quorum rule can be overridden, or whether your
> > application can tell the difference between a loss of quorum and
> > zookeeper going down entirely.  I really don't know anything about
> > zookeeper client code or zookeeper internals.
> >
> > From what I understand, majority quorum is the only way to be
> > *completely* sure that cluster software like SolrCloud or your
> > application can handle write operations with confidence that they are
> > applied correctly.  If you lose quorum, which will happen if only one DC
> > is operational, then your application should go read-only.  This is what
> > SolrCloud does.
> >
> > I am a committer on the Apache Solr project, and Solr uses zookeeper
> > when it is running in SolrCloud mode.  The cloud code is handled by
> > other people -- I don't know much about it.
> >
> > I joined this list because I wanted to have the ZK devs include a
> > clarification in zookeeper documentation -- oddly enough, related to the
> > very thing we are discussing.  I wanted to be sure that the
> > documentation explicitly mentioned that three servers are required for a
> > fault-tolerant setup.  Some SolrCloud users don't want to accept this as
> > a fact, and believe that two servers should be enough.
> >
> > Thanks,
> > Shawn
> >
> >
>
