zookeeper-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Multi DC ( DC-1 and DC-2) zookeeper setup
Date Wed, 09 Mar 2016 00:10:14 GMT
On 3/8/2016 3:40 PM, s influxdb wrote:
> How does the client failover to the DC2 if DC1 is down ? Does the services
> registered on DC1 for example with ephemeral nodes have to re-register
> DC2 ?

Even though Flavio and Camille have both said this, I'm not sure whether
the posters on this thread are hearing it:

If you only have two datacenters, you cannot set up a reliable zookeeper
ensemble.  It's simply not possible.  There are NO combinations of
servers that will achieve fault tolerance with only two datacenters.

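The reason this won't work is the same reason that you cannot set up a
reliable ensemble with only two servers.  If the servers are split
evenly, losing either data center takes out half of your ZK nodes, and
the surviving half is not a majority, so it cannot achieve quorum.  If
the split is uneven, the data center holding the majority becomes a
single point of failure.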
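A brute-force check of the majority arithmetic, written as a hypothetical Python helper (not part of ZooKeeper).  It assumes every server is a voting participant and that quorum is a strict majority of the voters:

```python
def quorum(voters: int) -> int:
    """Smallest strict majority of a voting ensemble of this size."""
    return voters // 2 + 1


def survives_losing_either_dc(dc1: int, dc2: int) -> bool:
    """True only if quorum holds no matter which of the two DCs goes down."""
    needed = quorum(dc1 + dc2)
    # Losing DC-1 leaves dc2 voters; losing DC-2 leaves dc1 voters.
    return dc2 >= needed and dc1 >= needed


# Brute-force every split of 1 to 15 voting servers across two data centers.
tolerant_splits = []
for total in range(1, 16):
    for dc1 in range(total + 1):
        dc2 = total - dc1
        if survives_losing_either_dc(dc1, dc2):
            tolerant_splits.append((dc1, dc2))

print(tolerant_splits)  # prints [] : no split survives both failure cases
```

Surviving either failure would require each DC to hold a majority by itself, and two disjoint majorities cannot exist in one ensemble.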

When you have three datacenters that are all capable of directly
reaching each other, you only need one ZK node in each location.  If any
single DC goes down, the other two will be able to keep the ensemble
running.
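A minimal Python sketch of the three-DC case.  The helper and the layout representation (a list of voting-server counts, one entry per data center) are hypothetical, and it assumes every server votes and quorum is a strict majority:

```python
def quorum(voters: int) -> int:
    """Smallest strict majority of a voting ensemble of this size."""
    return voters // 2 + 1


def survives_any_single_dc_loss(layout: list[int]) -> bool:
    """layout[i] is the number of voting servers in data center i."""
    total = sum(layout)
    # Every DC must be individually losable while leaving a majority behind.
    return all(total - lost >= quorum(total) for lost in layout)


print(survives_any_single_dc_loss([1, 1, 1]))  # True: 2 of 3 voters remain
print(survives_any_single_dc_loss([2, 2, 1]))  # True: at least 3 of 5 remain
print(survives_any_single_dc_loss([2, 1]))     # False: losing the 2-server DC leaves 1 of 3
```

The check only requires that no single DC holds half or more of the voters, which is why one server per DC is already enough with three DCs.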

Data is replicated among the DCs in exactly the same way as it is when
all the servers are in one place.  I don't know enough about ZK's
internal operation to comment further.


Some TL;DR information to follow:

If you want to be able to take a node down for maintenance in a multi-DC
situation and *still* survive an entire DC going down, you need three
nodes in each of three data centers -- nine total.  This ensemble is
able to survive any four servers going down, so you can take down a node
in one DC for maintenance, and if one of the other DCs fails entirely,
there will be five functioning servers that can maintain quorum.
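The arithmetic above can be checked with a small hypothetical Python helper.  It assumes all servers vote, and it models the scenario described: one server down for maintenance in one DC while a *different* DC fails entirely:

```python
def quorum(voters: int) -> int:
    """Smallest strict majority of a voting ensemble of this size."""
    return voters // 2 + 1


def survives_maintenance_plus_dc_loss(layout: list[int]) -> bool:
    """layout[i] is the number of voting servers in data center i.

    Tries every combination of one server under maintenance in one DC
    and a complete failure of a different DC.
    """
    total = sum(layout)
    for maint in range(len(layout)):
        for failed in range(len(layout)):
            if maint == failed or layout[maint] == 0:
                continue
            if total - 1 - layout[failed] < quorum(total):
                return False
    return True


for layout in ([1, 1, 1], [2, 2, 2], [3, 3, 3]):
    print(layout, survives_maintenance_plus_dc_loss(layout))
# [1, 1, 1] False
# [2, 2, 2] False
# [3, 3, 3] True
```

Two servers per DC is not enough: six voters need four for quorum, and maintenance plus a lost DC leaves only three.  With nine voters, quorum is five, and 9 - 1 - 3 = 5 servers remain.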

Detailed information for the specific situation outlined by Kaushal:

DC-1 1 Leader 2 Followers
DC-2 1 Follower 2 Observers.

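Observers do not vote, so this is really a four-voter ensemble (the
leader and three followers) that needs three voters for quorum.  It can
survive losing DC-2, but if DC-1 fails, only one voter is left, which is
not enough.  If all six servers were voting members instead, the
ensemble would need at least four operational nodes to maintain quorum,
and losing either data center would leave only three.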
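A sketch that models the roles listed above in Python.  It assumes standard ZooKeeper semantics, where observers replicate data but do not vote, so only the leader and followers count toward quorum.  The `layout` dict and the helper names are hypothetical; also note the leader is elected, not fixed, so "leader" here just means whichever voter currently leads:

```python
def quorum(voters: int) -> int:
    """Smallest strict majority of a voting ensemble of this size."""
    return voters // 2 + 1


# Hypothetical model of the proposed layout: role names per data center.
layout = {
    "DC-1": ["leader", "follower", "follower"],
    "DC-2": ["follower", "observer", "observer"],
}


def voters_in(roles: list[str]) -> int:
    """Observers never count toward quorum; every other role votes."""
    return sum(1 for role in roles if role != "observer")


total_voters = sum(voters_in(roles) for roles in layout.values())
for failed_dc in layout:
    remaining = total_voters - voters_in(layout[failed_dc])
    status = "keeps quorum" if remaining >= quorum(total_voters) else "loses quorum"
    print(f"{failed_dc} down: {remaining} of {total_voters} voters left, {status}")
# DC-1 down: 1 of 4 voters left, loses quorum
# DC-2 down: 3 of 4 voters left, keeps quorum
```

Adding observers increases read capacity in DC-2 but does nothing for fault tolerance, because they are invisible to the quorum calculation.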

