zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: zookeeper deployment strategy for multi data centers
Date Fri, 03 Jun 2016 22:54:43 GMT
You don't need weights to run this cluster successfully across 2
datacenters, unless you want to run with 4 live read/write nodes which
isn't really a recommended setup (we advise odd numbers because running
with even numbers doesn't generally buy you anything).

I would probably run 3 voting members, 1 observer if you want to run 4
nodes. In that setup you can lose any one voting node, and of course the
observer, and be fine. If you lose 2 voting nodes, whether in the same DC
or x-DC, you will not be able to continue. But votes only need to be acked
by any 2 servers to be committed.
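
A minimal sketch of what that 3-voting-plus-1-observer layout could look
like in zoo.cfg (hostnames, ports and server ids are placeholders, not
anything from this thread):

    # on every node: three voting participants plus one observer
    server.1=zk1.dc-a.example:2888:3888
    server.2=zk2.dc-a.example:2888:3888
    server.3=zk3.dc-b.example:2888:3888
    server.4=zk4.dc-b.example:2888:3888:observer

    # additionally, in the config of server 4 only
    peerType=observer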

In the case of weights and 4 servers, you will need acks from either both
of the servers in the weighted datacenter, or the 2 in the unweighted DC
plus one in the weighted DC.
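
To make the arithmetic concrete, assuming the (2, 2, 1, 1) weighting from
Dan's mail below, with DC-A holding the two weight-2 servers, and assuming
a single quorum group (so a quorum is any set holding strictly more than
half of the total weight):

    total weight           = 2 + 2 + 1 + 1 = 6
    quorum threshold       : weight > 3
    both DC-A servers      : 2 + 2     = 4  -> quorum
    both DC-B + one DC-A   : 1 + 1 + 2 = 4  -> quorum
    one DC-A + one DC-B    : 2 + 1     = 3  -> no quorum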

I've actually yet to see the killer app for using hierarchy and weights
although I'd be interested in hearing about it if someone has an example.
It's not clear that there's a huge value here unless the observer is
significantly less effective than a full r/w quorum member which would be
surprising.

C


On Fri, Jun 3, 2016 at 6:33 PM, Dan Benediktson <dbenediktson@twitter.com.invalid> wrote:

> Weights will at least let you do better: if you weight it, you can make it
> so that datacenter A will survive even if datacenter B goes down, but not
> the other way around. While not ideal, it's probably better than the
> non-weighted alternative. (2, 2, 1, 1) weights might work fairly well - as
> long as any three machines are up, or both machines in the preferred
> datacenter, quorum can be achieved.
>
> On Fri, Jun 3, 2016 at 3:23 PM, Camille Fournier <camille@apache.org>
> wrote:
>
> > You can't solve this with weights.
> > On Jun 3, 2016 6:03 PM, "Michael Han" <hanm@cloudera.com> wrote:
> >
> > > ZK supports more than just the majority quorum rule; there are also
> > > weights / hierarchy-of-groups based quorums [1]. So probably one can
> > > assign more weight to one of the two data centers, which can then form
> > > a weight-based quorum even if the other DC is failing?
> > >
> > > Another idea is, instead of forming a single ZK ensemble across DCs, to
> > > form multiple ZK ensembles with one ensemble per DC. This solution might
> > > be applicable for a heavy-read / light-write workload while providing a
> > > certain degree of fault tolerance. Some relevant discussion is in [2].
> > >
> > > [1] https://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
> > >     (weight.x=nnnnn, group.x=nnnnn[:nnnnn])
> > > [2] https://www.quora.com/Has-anyone-deployed-a-ZooKeeper-ensemble-across-data-centers
> > >
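
A sketch of how the weight-based variant might be written down, combining
the weight.x / group.x keys from [1] with Dan's (2, 2, 1, 1) suggestion
above (server ids and the single-group layout are illustrative assumptions,
not something anyone in the thread reports running):

    # zoo.cfg fragment: one group holding all four servers,
    # with the servers in the preferred DC weighted higher
    group.1=1:2:3:4
    weight.1=2
    weight.2=2
    weight.3=1
    weight.4=1
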
> > > On Fri, Jun 3, 2016 at 2:16 PM, Alexander Shraer <shralex@gmail.com>
> > > wrote:
> > >
> > > > > Are there any settings to override the quorum rule? Would you know
> > > > > the rationale behind it?
> > > >
> > > > The rule comes from a theoretical impossibility saying that you must
> > > > have n > 2f replicas to tolerate f failures, for any algorithm trying
> > > > to solve consensus while being able to handle periods of asynchrony
> > > > (unbounded message delays, processing times, etc). The earliest proof
> > > > is probably this paper:
> > > > https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf
> > > > ZooKeeper assumes this model, so the bound applies to it.
> > > >
> > > > The intuition is what's called a 'partition argument'. Essentially, if
> > > > only 2f replicas were sufficient, you could arbitrarily divide them
> > > > into 2 sets of f replicas, and create a situation where each set of f
> > > > must go on independently without coordinating with the other set
> > > > (split brain) when the links between the two sets are slow (i.e., a
> > > > network partition), simply because the other set could also be down
> > > > (the algorithm tolerates f failures) and it can't distinguish the two
> > > > situations. When n > 2f this can be avoided, since one of the sets
> > > > will have a majority while the other set won't.
> > > >
> > > > The key here is that the links between the two data centers can
> > > > arbitrarily delay messages, so an automatic 'fail-over' where one data
> > > > center decides that the other one is down is usually considered
> > > > unsafe. If in your system you have a reliable way to know that the
> > > > other data center is really in fact down (this is a synchrony
> > > > assumption), you could do as Camille suggested and reconfigure the
> > > > system to only include the remaining data center. This would still be
> > > > very tricky to do, since this reconfiguration would have to involve
> > > > manually changing configuration files and rebooting servers, while
> > > > somehow making sure that you're not losing committed state. So not
> > > > recommended.
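
For concreteness, the kind of manual change being described would look
roughly like the following (placeholder hostnames), which is exactly the
error-prone step being warned against:

    # before: 4-server ensemble spanning DC-A and DC-B
    server.1=zk1.dc-a.example:2888:3888
    server.2=zk2.dc-a.example:2888:3888
    server.3=zk3.dc-b.example:2888:3888
    server.4=zk4.dc-b.example:2888:3888

    # after DC-B is known -- out of band -- to be truly down:
    # shrink zoo.cfg on the surviving servers to DC-A only and restart
    # them, while verifying that no committed state is lost in the process
    server.1=zk1.dc-a.example:2888:3888
    server.2=zk2.dc-a.example:2888:3888
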
> > > >
> > > >
> > > >
> > > > On Fri, Jun 3, 2016 at 11:30 PM, Camille Fournier <camille@apache.org>
> > > > wrote:
> > > >
> > > > > 2 servers is the same as 1 server wrt fault tolerance, so yes, you
> > > > > are correct. If they want fault tolerance, they have to run 3 (or
> > > > > more).
> > > > >
> > > > > On Fri, Jun 3, 2016 at 4:25 PM, Shawn Heisey <apache@elyograg.org>
> > > > > wrote:
> > > > >
> > > > > > On 6/3/2016 1:44 PM, Nomar Morado wrote:
> > > > > > > Are there any settings to override the quorum rule? Would you
> > > > > > > know the rationale behind it? Ideally, you will want to operate
> > > > > > > the application even if at least one data center is up.
> > > > > >
> > > > > > I do not know if the quorum rule can be overridden, or whether
> > > > > > your application can tell the difference between a loss of quorum
> > > > > > and zookeeper going down entirely.  I really don't know anything
> > > > > > about zookeeper client code or zookeeper internals.
> > > > > >
> > > > > > From what I understand, majority quorum is the only way to be
> > > > > > *completely* sure that cluster software like SolrCloud or your
> > > > > > application can handle write operations with confidence that they
> > > > > > are applied correctly.  If you lose quorum, which will happen if
> > > > > > only one DC is operational, then your application should go
> > > > > > read-only.  This is what SolrCloud does.
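
Not how SolrCloud actually does it, but a minimal sketch of how a client
application might flip itself read-only based on the connection state the
ZooKeeper Java client reports (class and field names here are made up for
the example):

    import java.io.IOException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class QuorumAwareClient implements Watcher {
        private final ZooKeeper zk;
        private volatile boolean readOnly = true;

        public QuorumAwareClient(String connectString) throws IOException {
            // this object receives connection-state events for the session
            this.zk = new ZooKeeper(connectString, 30000, this);
        }

        @Override
        public void process(WatchedEvent event) {
            switch (event.getState()) {
                case SyncConnected:
                    // connected to a server that is serving, i.e. part of a quorum
                    readOnly = false;
                    break;
                case Disconnected:
                case Expired:
                    // the server we were attached to stopped serving, e.g.
                    // because it lost quorum; stop writing until reconnected
                    readOnly = true;
                    break;
                default:
                    break;
            }
        }

        public boolean isReadOnly() {
            return readOnly;
        }
    }

From a single client's point of view, a lost quorum and a fully-down
ensemble both surface as Disconnected, so the safe reaction (go read-only)
is the same in either case.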
> > > > > >
> > > > > > I am a committer on the Apache Solr project, and Solr uses
> > > > > > zookeeper when it is running in SolrCloud mode.  The cloud code is
> > > > > > handled by other people -- I don't know much about it.
> > > > > >
> > > > > > I joined this list because I wanted to have the ZK devs include a
> > > > > > clarification in the zookeeper documentation -- oddly enough,
> > > > > > related to the very thing we are discussing.  I wanted to be sure
> > > > > > that the documentation explicitly mentioned that three servers are
> > > > > > required for a fault-tolerant setup.  Some SolrCloud users don't
> > > > > > want to accept this as a fact, and believe that two servers should
> > > > > > be enough.
> > > > > >
> > > > > > Thanks,
> > > > > > Shawn
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Cheers
> > > Michael.
> > >
> >
>
