zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: Multi-facility Ensemble
Date Fri, 23 May 2014 16:52:58 GMT
Well, if A can't talk to C but B can talk to both, it kind of depends on
what the state was before the partition, and then what happens after the
partition.
If the leader is in A, all of the members of C will go into disconnected
state, but may also try to become leader since they can talk to B. You
might see some weird thrashing of election state etc.
If the leader is in B you might be fine but honestly I've never tested that
so far as I can recall. Really, if one site loses contact with one or more
others, you probably just want to kill all the connections in that site
until connectivity comes back.
Best thing to do if faced with this question is to actually run a test that
simulates it since these things always have a ton of nuance; it is unlikely
that you will lose any data (the basic rules of the protocol account for
this fairly well), but the performance might degrade in an unexpected way.
I think data loss could happen in a very bad case, where quorum is made
between A and B, then flips to C and B because of some network change, and
data gets truncated.
I would put pretty aggressive monitoring around this if I were implementing
such a situation and kill one of the partitions if it happened, given the
byzantine nature of the edge cases.
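
To make the A/C partition case concrete, here is a rough sketch of the vote counting only. It is a toy model, not ZooKeeper's actual election logic: it assumes the default majority quorum, the 3x3 layout from the original question, and that a site can only count votes from sites it has a direct link to. The function names are made up for illustration.

```python
# Toy model: which sites could gather a majority when some inter-site links are down.
# Assumption: 3 servers in each of A, B, C, as described in the thread.
FACILITIES = {"A": 3, "B": 3, "C": 3}


def quorum_size(total):
    """Strict majority of the ensemble."""
    return total // 2 + 1


def reachable_votes(leader_site, down_links):
    """Count servers a leader in leader_site can talk to directly."""
    votes = 0
    for site, count in FACILITIES.items():
        link = frozenset((leader_site, site))
        if site == leader_site or link not in down_links:
            votes += count
    return votes


def sites_able_to_lead(down_links):
    """Sites whose directly reachable servers add up to a quorum."""
    need = quorum_size(sum(FACILITIES.values()))
    return sorted(s for s in FACILITIES if reachable_votes(s, down_links) >= need)


if __name__ == "__main__":
    a_c_down = {frozenset(("A", "C"))}
    for site in FACILITIES:
        print(site, reachable_votes(site, a_c_down))
    print(sites_able_to_lead(a_c_down))
```

With only the A-C link down, every site can still reach at least 6 of 9 servers, so A and C can each form a quorum through B. That is the situation where the election state may thrash.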

C


On Wed, May 21, 2014 at 11:36 PM, Steven Bower <smb-apache@alcyon.net> wrote:

> I am contemplating setting up a zookeeper ensemble across multiple
> facilities. I know the docs warn against multi-facility ensembles, but for
> the sake of discussion can we assume that all are connected with the same
> reliability/performance you'd expect if they were all in the same LAN?
>
> Imagine an ensemble with three facilities (A, B and C). Within each facility
> there are 3 instances of zookeeper. So there are 9 members of the ensemble
> in total, which gives us quorum at 5 instances. All facilities are connected
> with point-to-point connections between each other (by point-to-point I
> mean that if the connection between A and C went down, A could not
> talk to C via B).
>
>
> With this environment what behaviors would I see if for example the link
> between A and B went down?
>
> Any other recommendations?
>
> thanks,
>
> steve
>
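
The quorum arithmetic in the quoted message can be checked with a short sketch. It assumes the default majority quorum (no weights or groups configured) and the 3x3 layout described above. The helper names are hypothetical.

```python
def majority(ensemble_size):
    """Smallest number of servers that is a strict majority."""
    return ensemble_size // 2 + 1


def survives_site_loss(servers_per_site, lost_sites):
    """True if the servers outside lost_sites still form a majority."""
    total = sum(servers_per_site.values())
    remaining = sum(
        count for site, count in servers_per_site.items() if site not in lost_sites
    )
    return remaining >= majority(total)


if __name__ == "__main__":
    layout = {"A": 3, "B": 3, "C": 3}  # assumption: layout from the question
    print(majority(9))
    print(survives_site_loss(layout, {"A"}))
    print(survives_site_loss(layout, {"A", "B"}))
```

Under these assumptions the ensemble tolerates the loss of any one whole facility (6 of 9 remain) but not two (3 of 9 remain).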
