zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Reconfig without quorum
Date Thu, 18 Sep 2014 04:56:48 GMT
Hi Martin,

Yes, reconfig, like other ZooKeeper operations, works only when there's a
quorum. Although you're saying that zone 1 failed, it may be that only the
link between zone 1 and zone 2 failed and the zones themselves are fine. In
that case, if we allowed each zone to process commands such as reconfig, we
would end up with split-brain and lose consistency.
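
To make the quorum argument concrete, here is a minimal sketch of the
majority arithmetic. It is plain Python, not ZooKeeper code; the server
names A-F and the 3 + 3 zone layout are borrowed from the example later in
this message, and majority quorums are assumed.

```python
from itertools import combinations

def majority_quorums(servers):
    """Return every subset of `servers` that is a strict majority."""
    need = len(servers) // 2 + 1
    names = sorted(servers)
    return [set(c)
            for k in range(need, len(names) + 1)
            for c in combinations(names, k)]

zone1 = {"A", "B", "C"}
zone2 = {"D", "E", "F"}
quorums = majority_quorums(zone1 | zone2)

# Any two majorities share at least one server, so two groups can never
# commit conflicting updates independently.
all_intersect = all(q1 & q2 for q1, q2 in combinations(quorums, 2))

# Each zone holds only 3 of 6 servers, so neither is a majority by itself.
zone1_is_quorum = zone1 in quorums
zone2_is_quorum = zone2 in quorums

print(all_intersect, zone1_is_quorum, zone2_is_quorum)
```

If both zones were allowed to proceed on their own, they would be two
disjoint groups accepting writes, which is exactly the intersection
property that the quorum requirement protects.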

If you're sure that zone 1 is down, you could shut down the servers in zone
2, change the configuration files to exclude zone 1, and restart. Note that
when you restart you should bring the servers up in an order that doesn't
allow a quorum to form without a server that has the latest state. Otherwise
you'll lose data.
Example: zone 1 has participant replicas A, B, C; zone 2 has participants D,
E, F. The latest state is on A, B, C, D. Zone 1 fails and you restart the
zone 2 servers, but E and F come up first. In this case you're likely to
lose the latest updates.
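
The restart-order rule in this example can be checked with a small sketch.
This is a simplified model, not ZooKeeper's actual election logic: it only
asks whether the first group of servers large enough to form a majority of
the new ensemble includes one with the latest state. The function names
are made up for illustration.

```python
from itertools import combinations

def unsafe_quorums(new_ensemble, up_to_date):
    """Minimal majorities of the new ensemble with no up-to-date server."""
    need = len(new_ensemble) // 2 + 1
    return [set(c)
            for c in combinations(sorted(new_ensemble), need)
            if not set(c) & up_to_date]

def is_safe_start_order(order, new_ensemble, up_to_date):
    """True if the first majority to come up includes an up-to-date server."""
    need = len(new_ensemble) // 2 + 1
    started = set()
    for server in order:
        started.add(server)
        if len(started) >= need:
            return bool(started & up_to_date)
    return False  # a majority never came up

new_ensemble = {"D", "E", "F"}        # zone 2 only, after editing the config
up_to_date = {"A", "B", "C", "D"}     # servers holding the latest state

print(unsafe_quorums(new_ensemble, up_to_date))
print(is_safe_start_order(["E", "F", "D"], new_ensemble, up_to_date))
print(is_safe_start_order(["D", "E", "F"], new_ensemble, up_to_date))
```

In this model the only dangerous majority is {E, F}, so starting D before
the second of E and F comes up avoids it.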

Perhaps others can suggest a better solution, but you could consider having
a tie-breaker replica somewhere in a third location. Or, if you don't need
consistency between the zones, you could run 2 separate ZooKeeper ensembles.
Does your application require consistency between zones 1 and 2?
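
As a rough illustration of why a tie breaker helps, the sketch below
compares the 3 + 3 layout with a 3 + 3 + 1 layout under majority quorums.
The server name T and the exact zone sizes are assumptions for the example,
not part of your setup.

```python
def has_quorum(alive, ensemble):
    """True if the alive members form a strict majority of the ensemble."""
    return 2 * len(alive & ensemble) > len(ensemble)

zone1 = {"A", "B", "C"}
zone2 = {"D", "E", "F"}
tie_breaker = {"T"}  # hypothetical single replica in a third location

two_zones = zone1 | zone2
three_sites = zone1 | zone2 | tie_breaker

# Losing zone 1 in the two-zone layout leaves 3 of 6: no quorum.
two_zones_survive = has_quorum(two_zones - zone1, two_zones)

# With the tie breaker, losing zone 1 leaves 4 of 7: still a quorum.
three_sites_survive = has_quorum(three_sites - zone1, three_sites)

# Losing only the tie breaker leaves 6 of 7: still a quorum.
without_tie_breaker = has_quorum(three_sites - tie_breaker, three_sites)

print(two_zones_survive, three_sites_survive, without_tie_breaker)
```

With the odd-sized ensemble, any single location can fail and the remaining
servers still form a majority.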


On Wed, Sep 17, 2014 at 1:19 PM, Martin Grotzke <
martin.grotzke@googlemail.com> wrote:

> Hi,
> is it true, that the reconfig command that's available since 3.5.0 can only
> be used if there's a quorum?
> Our situation is that we have 2 datacenters (actually only 2 zones within
> the same DC) which will be provisioned equally, so that we'll have an even
> number of ZK nodes (true, not optimal). When 1 zone fails, there won't be a
> quorum any more and ZK will be unavailable - that's my understanding. Is it
> possible to add new nodes to the ZK cluster and achieve a quorum again
> while the failed zone is still unavailable?
> What would you recommend how to handle this situation?
> We're using (going to use) SolrCloud as clients.
> Thanks && cheers,
> Martin
