zookeeper-user mailing list archives

From Steven Bower <smb-apa...@alcyon.net>
Subject Re: Multi-facility Ensemble
Date Tue, 27 May 2014 03:16:50 GMT
This is basically what I've seen.. I set up a test with 3 VMs and then used
iptables to block/unblock traffic between the VMs.. it seems that the state
after a split is highly dependent on the state before the split..

Is there a document somewhere that describes all of the conditions required
to elect a leader, e.g. what needs to be connected to what? I couldn't find
this on the web..



On Fri, May 23, 2014 at 12:52 PM, Camille Fournier <camille@apache.org> wrote:

> Well, if A can't talk to C but B can talk to both, it kind of depends on
> what the state was before the partition, and then what happens after the
> partition.
> If the leader is in A, all of the members of C will go into disconnected
> state, but may also try to become leader since they can talk to B. You
> might see some weird thrashing of election state etc.
> If the leader is in B you might be fine but honestly I've never tested that
> so far as I can recall. Really, if one site loses contact with one or more
> others, you probably just want to kill all the connections in that site
> until connectivity comes back.
> Best thing to do if faced with this question is to actually run a test that
> simulates it since these things always have a ton of nuance; it is unlikely
> that you will lose any data (the basic rules of the protocol account for
> this fairly well), but the performance might degrade in an unexpected way.
> I think data loss could happen in a very bad case, where quorum is made with
> A to B, then flips to C to B due to some network change, and data gets truncated.
> I would put pretty aggressive monitoring around this if I were implementing
> such a situation and kill one of the partitions if it happened, given the
> byzantine nature of the edge cases.
> C
> On Wed, May 21, 2014 at 11:36 PM, Steven Bower <smb-apache@alcyon.net> wrote:
> > I am contemplating setting up a ZooKeeper ensemble across multiple
> > facilities. I know the docs warn against multi-facility ensembles, but for
> > the sake of discussion can we assume that all are connected with the same
> > reliability/performance you'd expect if they were all in the same LAN?
> >
> > Imagine an ensemble with three facilities (A, B and C). Within each facility
> > there are 3 instances of ZooKeeper. So total 9 members of the ensemble
> > which gives us quorum at 5 instances. All facilities are connected with
> > point-to-point connections between each other (by point-to-point I'm
> > implying that if the connection between A and C went down, A could not
> > talk to C via B).
> >
> >
> > With this environment what behaviors would I see if for example the link
> > between A and B went down?
> >
> > Any other recommendations?
> >
> > thanks,
> >
> > steve
> >
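
The quorum arithmetic in the thread (9 members, quorum at 5) and the
point-to-point link topology can be sketched as a small reachability model.
This is only an illustration under stated assumptions: the names SITES,
quorum_size, reachable_servers and sites_with_quorum are hypothetical, links
are modeled per facility pair, and the check is a necessary condition only
(a server can reach a majority). It does not model ZooKeeper's actual leader
election, which, as noted above, also depends on the state before the split.

```python
# Hypothetical model of the ensemble described in the thread:
# three facilities with three servers each.
SITES = {"A": 3, "B": 3, "C": 3}


def quorum_size(n):
    """Smallest strict majority of an n-member ensemble."""
    return n // 2 + 1


def reachable_servers(site, down_links):
    """Count the servers that a server in `site` can talk to, itself included.

    `down_links` is a set of frozensets, each naming a pair of facilities
    whose point-to-point link is down. There is no routing through a third
    facility, matching the topology in the original question.
    """
    total = 0
    for other, count in SITES.items():
        if other == site or frozenset((site, other)) not in down_links:
            total += count
    return total


def sites_with_quorum(down_links):
    """Map each facility to whether its servers can reach a majority."""
    need = quorum_size(sum(SITES.values()))
    return {s: reachable_servers(s, down_links) >= need for s in SITES}


if __name__ == "__main__":
    # The A-B link is down: A sees A+C, B sees B+C, C sees everyone.
    print(sites_with_quorum({frozenset(("A", "B"))}))
```

With only the A-B link down, every facility can still reach at least 5
servers, but A and B reach different majorities. That overlap through C is
the situation where the election thrashing described above can occur, which
is why a test against a real ensemble is still the better guide.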
