zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: zookeeper deployment strategy for multi data centers
Date Fri, 03 Jun 2016 22:23:53 GMT
You can't solve this with weights.
On Jun 3, 2016 6:03 PM, "Michael Han" <hanm@cloudera.com> wrote:

> ZK supports more than just the majority quorum rule; there are also weight-
> and hierarchical-group-based quorums [1]. So one could probably assign more
> weight to one of the two data centers, which could then form a weight-based
> quorum even if the other DC is failing?
>
> Another idea is, instead of forming a single ZK ensemble across DCs, to form
> multiple ZK ensembles with one ensemble per DC. This might be applicable to
> heavy-read / light-write workloads while providing a certain degree of fault
> tolerance. Some relevant discussions are at [2].
>
> [1] https://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html (
> weight.x=nnnnn, group.x=nnnnn[:nnnnn]).
> [2]
>
> https://www.quora.com/Has-anyone-deployed-a-ZooKeeper-ensemble-across-data-centers
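
A minimal zoo.cfg sketch of the weight.x / group.x syntax referenced in [1],
using a single group so that weighted votes decide the quorum. Hostnames,
server ids, and weights below are hypothetical, not from this thread:

    # Hypothetical 4-server ensemble, two voters per data center.
    server.1=dc1-zk1:2888:3888
    server.2=dc1-zk2:2888:3888
    server.3=dc2-zk1:2888:3888
    server.4=dc2-zk2:2888:3888

    # One group holding every voter; a quorum is then any set holding a
    # strict majority of the total weight (here, more than 3 of 6).
    group.1=1:2:3:4

    # Giving DC1's servers extra weight lets DC1 (weight 4) form a quorum on
    # its own if DC2 is unreachable. As Camille notes above, this does not buy
    # tolerance to the loss of either DC: if DC1 goes down, DC2's weight (2)
    # is not enough.
    weight.1=2
    weight.2=2
    weight.3=1
    weight.4=1
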
>
> On Fri, Jun 3, 2016 at 2:16 PM, Alexander Shraer <shralex@gmail.com>
> wrote:
>
> > > Are there any settings to override the quorum rule? Would you know the
> > > rationale behind it?
> >
> > The rule comes from a theoretical impossibility saying that you must have
> > n > 2f replicas to tolerate f failures, for any algorithm trying to solve
> > consensus while being able to handle periods of asynchrony (unbounded
> > message delays, processing times, etc.). The earliest proof is probably
> > in this paper:
> > <https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf>.
> > ZooKeeper assumes this model, so the bound applies to it.
> >
> > The intuition is what's called a 'partition argument'. Essentially, if
> > only 2f replicas were sufficient, you could arbitrarily divide them into
> > 2 sets of f replicas and create a situation where each set of f must go
> > on independently without coordinating with the other set (split brain)
> > when the links between the two sets are slow (i.e., a network partition),
> > simply because the other set could also be down (the algorithm tolerates
> > f failures) and the first set can't distinguish the two situations. When
> > n > 2f this can be avoided, since one of the sets will have a majority
> > while the other set won't.
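
As a concrete illustration of the n > 2f bound (the numbers below are chosen
for illustration, not taken from the thread), a small Java sketch that checks
whether the reachable servers form a majority:

    // Illustration only: why n > 2f matters for a two-data-center deployment.
    public class QuorumMath {
        // A majority quorum needs strictly more than half of the ensemble.
        static boolean hasQuorum(int reachable, int ensembleSize) {
            return reachable > ensembleSize / 2;
        }

        public static void main(String[] args) {
            // n = 4 = 2f (f = 2): a 2/2 split across DCs leaves neither side
            // with a majority, so both sides stall instead of split-braining.
            System.out.println(hasQuorum(2, 4)); // false
            // n = 5 > 2f (f = 2): a 3/2 split leaves exactly one side (the
            // 3-server DC) able to continue.
            System.out.println(hasQuorum(3, 5)); // true
            System.out.println(hasQuorum(2, 5)); // false
        }
    }
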
> >
> > The key here is that the links between the two data centers can
> > arbitrarily delay messages, so an automatic 'fail-over' where one data
> > center decides that the other one is down is usually considered unsafe.
> > If in your system you have a reliable way to know that the other data
> > center is really in fact down (this is a synchrony assumption), you could
> > do as Camille suggested and reconfigure the system to only include the
> > remaining data center. This would still be very tricky to do, since the
> > reconfiguration would involve manually changing configuration files and
> > rebooting servers, while somehow making sure that you're not losing
> > committed state. So it is not recommended.
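
For concreteness, the kind of manual change being described might look roughly
like the following edit to each surviving server's configuration (hostnames
are hypothetical; as noted above, doing this safely is tricky and not
recommended):

    # Before: five voters, two in DC1 and three in DC2.
    server.1=dc1-zk1:2888:3888
    server.2=dc1-zk2:2888:3888
    server.3=dc2-zk1:2888:3888
    server.4=dc2-zk2:2888:3888
    server.5=dc2-zk3:2888:3888

    # After DC2 is known to be down through some reliable, out-of-band means
    # (not just a timeout): shrink the voter list to the surviving DC and
    # restart its servers so the remaining two form a majority of the new
    # 2-server ensemble. If DC2 was merely partitioned, or if some committed
    # writes reached only DC2's majority, this risks split brain and losing
    # committed state.
    server.1=dc1-zk1:2888:3888
    server.2=dc1-zk2:2888:3888
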
> >
> >
> >
> > On Fri, Jun 3, 2016 at 11:30 PM, Camille Fournier <camille@apache.org>
> > wrote:
> >
> > > 2 servers is the same as 1 server wrt fault tolerance, so yes, you are
> > > correct. If they want fault tolerance, they have to run 3 (or more).
> > >
> > > On Fri, Jun 3, 2016 at 4:25 PM, Shawn Heisey <apache@elyograg.org>
> > > wrote:
> > >
> > > > On 6/3/2016 1:44 PM, Nomar Morado wrote:
> > > > > Are there any settings to override the quorum rule? Would you know
> > > > > the rationale behind it? Ideally, you would want to operate the
> > > > > application as long as at least one data center is up.
> > > >
> > > > I do not know if the quorum rule can be overridden, or whether your
> > > > application can tell the difference between a loss of quorum and
> > > > zookeeper going down entirely.  I really don't know anything about
> > > > zookeeper client code or zookeeper internals.
> > > >
> > > > From what I understand, majority quorum is the only way to be
> > > > *completely* sure that cluster software like SolrCloud or your
> > > > application can handle write operations with confidence that they are
> > > > applied correctly.  If you lose quorum, which can happen when only one
> > > > DC is operational, then your application should go read-only.  This is
> > > > what SolrCloud does.
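
On the client side, an application cannot tell a quorum loss from ZooKeeper
being down entirely; it only sees its connection state change. A minimal
sketch (hypothetical code using the standard org.apache.zookeeper client API;
the connect string and timeout are made up) of reacting to that by going
read-only:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: flip an application-level read-only flag based on session state.
    public class ReadOnlyGuard implements Watcher {
        private volatile boolean readOnly = true;

        @Override
        public void process(WatchedEvent event) {
            switch (event.getState()) {
                case SyncConnected:
                    readOnly = false;  // connected to a server serving a quorum
                    break;
                case Disconnected:
                    readOnly = true;   // could be quorum loss or a full outage
                    break;
                case Expired:
                    readOnly = true;   // session gone; a new handle is needed
                    break;
                default:
                    break;
            }
        }

        public boolean isReadOnly() {
            return readOnly;
        }

        public static void main(String[] args) throws Exception {
            ReadOnlyGuard guard = new ReadOnlyGuard();
            ZooKeeper zk = new ZooKeeper(
                "dc1-zk1:2181,dc1-zk2:2181,dc2-zk1:2181", 15000, guard);
            System.out.println("read-only? " + guard.isReadOnly());
            zk.close();
        }
    }
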
> > > >
> > > > I am a committer on the Apache Solr project, and Solr uses zookeeper
> > > > when it is running in SolrCloud mode.  The cloud code is handled by
> > > > other people -- I don't know much about it.
> > > >
> > > > I joined this list because I wanted to have the ZK devs include a
> > > > clarification in zookeeper documentation -- oddly enough, related to
> > > > the very thing we are discussing.  I wanted to be sure that the
> > > > documentation explicitly mentioned that three servers are required for
> > > > a fault-tolerant setup.  Some SolrCloud users don't want to accept
> > > > this as a fact, and believe that two servers should be enough.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
>
>
>
> --
> Cheers
> Michael.
>
