zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Running Zookeeper in 2 machines
Date Tue, 05 Nov 2013 22:17:28 GMT
I don't think reconfiguration will help you here as it requires a
quorum of the old and a quorum of the new ensembles, and here you're
missing a quorum of the old one.
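A minimal sketch of why reconfiguration is stuck here. It assumes simple majority quorums over server ids and the 1 + 2 layout from the quoted message below (instance 1 on machine A, instances 2 and 3 on machine B). The helper names `is_majority` and `can_reconfigure` are hypothetical and not part of ZooKeeper's API.

```python
def is_majority(live, ensemble):
    """True if the live servers form a strict majority of the ensemble."""
    return len(set(live) & set(ensemble)) * 2 > len(ensemble)


def can_reconfigure(live, old_ensemble, new_ensemble):
    """Simplified model of the rule above: a reconfiguration needs a
    quorum of the old ensemble AND a quorum of the new ensemble."""
    return is_majority(live, old_ensemble) and is_majority(live, new_ensemble)


old = {1, 2, 3}  # instance 1 on machine A, instances 2 and 3 on machine B
new = {1}        # try to shrink to the survivor on machine A
live = {1}       # machine B is down

print(can_reconfigure(live, old, new))  # False: {1} is not a majority of {1, 2, 3}
```

The new configuration on its own would be fine ({1} is a majority of {1}); it is the missing quorum of the old ensemble that blocks the change.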

The problem is that you may have some committed operations on the B
servers that A doesn't know about (writes are done to a quorum).
Moreover, B may just be slow and still operational.
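A toy model of how B can hold committed writes that A never saw. It is not ZooKeeper's actual Zab implementation; it assumes a write is committed once a strict majority acknowledges it, and the log contents are hypothetical.

```python
ENSEMBLE = {1, 2, 3}  # instance 1 on machine A, instances 2 and 3 on machine B


def committed(acks, ensemble=ENSEMBLE):
    """A write counts as committed once a strict majority has acknowledged it."""
    return len(acks & ensemble) * 2 > len(ensemble)


# Hypothetical history: the last write reached only instances 2 and 3
# (both on machine B) before machine A lost contact with B.
log = {1: ["x=1"], 2: ["x=1", "x=2"], 3: ["x=1", "x=2"]}
acks_for_x2 = {srv for srv, entries in log.items() if "x=2" in entries}

print(committed(acks_for_x2))  # True: {2, 3} is a majority of {1, 2, 3}
print("x=2" in log[1])         # False: instance 1 on machine A never saw it
```

If A were simply promoted to run alone at this point, it would serve a history that is missing a write the ensemble already acknowledged to clients.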

To solve the problem here, I think you need either a tie breaker, a
reliable failure detection mechanism (such as doing this manually
because you're sure that B is down), or some kind of stronger
synchrony assumption (e.g., if A hasn't heard from B for 3 sec, B must
have crashed). ZK deliberately avoids such timing assumptions so that
it stays robust to network delays.
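A sketch of the timeout-based synchrony assumption mentioned above, to show why it is risky. The class `TimeoutFailureDetector` and the injected `now` timestamps are hypothetical; ZooKeeper does not promote a server this way.

```python
class TimeoutFailureDetector:
    """Hypothetical detector that declares a peer crashed after `timeout`
    seconds of silence. A slow or partitioned peer is indistinguishable
    from a crashed one, which is why ZooKeeper avoids this assumption."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_heard = {}  # peer -> timestamp of the last message

    def heard_from(self, peer, now):
        self.last_heard[peer] = now

    def suspects(self, peer, now):
        last = self.last_heard.get(peer)
        return last is None or now - last > self.timeout


fd = TimeoutFailureDetector(timeout=3.0)
fd.heard_from("B", now=10.0)
print(fd.suspects("B", now=12.0))  # False: heard from B 2 s ago
print(fd.suspects("B", now=13.5))  # True: 3.5 s of silence, though B may only be slow
```

If A acted on that `True` and started accepting writes alone while B was merely delayed, both sides could diverge.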

Since this scenario seems very common, it may be interesting to
implement some kind of tie-breaker quorum system in ZooKeeper.
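One possible shape for such a tie-breaker quorum system, sketched under the assumption of an even-sized ensemble with one designated tie-breaker server. This is not an existing ZooKeeper feature; `is_quorum` and the `tie_breaker` parameter are hypothetical.

```python
def is_quorum(live, ensemble, tie_breaker):
    """Hypothetical tie-breaker quorum rule: a strict majority always
    qualifies; an exact half qualifies only if it contains the
    designated tie-breaker server."""
    present = set(live) & set(ensemble)
    if len(present) * 2 > len(ensemble):
        return True
    return len(present) * 2 == len(ensemble) and tie_breaker in present


# Two-machine ensemble with machine A as the (hypothetical) tie breaker.
pair = {"A", "B"}
print(is_quorum({"A"}, pair, tie_breaker="A"))  # True: half, and it holds the tie breaker
print(is_quorum({"B"}, pair, tie_breaker="A"))  # False: half, without the tie breaker
```

Any two quorums still intersect (two majorities overlap, a majority and a half overlap by counting, and two halves share the tie breaker), so safety is preserved. The trade-off remains, though: if the tie-breaker machine dies, the other half cannot proceed, much like the asymmetric 2 + 1 layout in the quoted message.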


On Tue, Nov 5, 2013 at 12:44 PM, Cameron McKenzie
<mckenzie.cam@gmail.com> wrote:
> I have a similar problem to you. I have more than 2 machines, but only 2
> geographically redundant sites.
> In your situation, you could get some redundancy by running 2 instances on
> one host, and 1 instance on the other host. This would protect you from
> temporary network glitches (because the machine with 2 instances can still
> form a quorum), and will protect you from failure of the machine with the
> single instance. It will not help you if the machine with 2 instances
> crashes.
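The 2 + 1 placement described above can be checked with a short sketch, assuming plain majority quorums over three instances. `PLACEMENT`, `has_quorum`, and `survivors_after` are hypothetical names used only for illustration.

```python
PLACEMENT = {1: "A", 2: "B", 3: "B"}  # ZK instance -> machine


def has_quorum(up_instances, ensemble_size=len(PLACEMENT)):
    """Strict majority of the 3-instance ensemble."""
    return len(up_instances) * 2 > ensemble_size


def survivors_after(failed_machines):
    return {i for i, m in PLACEMENT.items() if m not in failed_machines}


# Machine failures.
for failed in [set(), {"A"}, {"B"}]:
    up = survivors_after(failed)
    print(sorted(failed), sorted(up), has_quorum(up))
# [] [1, 2, 3] True
# ['A'] [2, 3] True
# ['B'] [1] False

# Network glitch between the machines: each side sees only its own instances.
for side in ("A", "B"):
    up = {i for i, m in PLACEMENT.items() if m == side}
    print(side, has_quorum(up))
# A False
# B True
```

So the ensemble survives losing machine A or a partition (machine B keeps serving), but not losing machine B.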
> In this situation, where the 2-instance machine dies, you can temporarily
> configure the 1-instance machine to be a single-instance cluster, and then
> when the 2-instance machine is recovered, you can reconfigure the
> single-instance machine to be part of the 3-instance cluster again. This
> process is manual, and slightly dangerous, because if you restart nodes in
> the wrong order, you can lose data. This is the approach that I have
> tested and it seems to work, but I'd recommend testing it yourself as well.
> Machine A has ZK instance 1
> Machine B has ZK instances 2 and 3
> Machine B dies
> Reconfigure ZK instance 1 so that it only has itself in the cluster. This
> means that there is no redundancy at this point, but it can form a quorum
> as it's the only instance in the cluster.
> Restart ZK instance 1 to pick up config changes
> Fix up Machine B
> Reconfigure ZK instance 1 to have ZK instances 2 and 3 in its configuration
> Restart ZK instance 1 to pick up config changes
> Start ZK instance 2 on Machine B.
> Wait for ZK instance 1 on Machine A and ZK instance 2 on Machine B to form a
> quorum. This is vitally important. If you start instance 3 before a quorum
> is formed it is possible that instances 2 and 3 will form a quorum. This
> will cause any updates that have occurred via instance 1 during the outage
> of Machine B to be lost.
> Start ZK instance 3 on Machine B
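The start-order warning in the steps above can be illustrated with a toy model. It is not real Zab leader election; it assumes that the first servers to reach a strict majority form the quorum, that the member with the longest log becomes leader, and that later joiners adopt the leader's log. The function `bring_up` and the log entries are hypothetical.

```python
def bring_up(start_order, logs, ensemble_size=3):
    """Toy model of the restart hazard: return the log the whole
    ensemble ends up with, or None if no quorum ever forms."""
    started = []
    for server in start_order:
        started.append(server)
        if len(started) * 2 > ensemble_size:
            # First strict majority wins; its longest log becomes the leader's.
            leader = max(started, key=lambda s: len(logs[s]))
            return list(logs[leader])
    return None


# Instance 1 took writes while running alone during Machine B's outage.
logs = {1: ["pre", "outage-1", "outage-2"], 2: ["pre"], 3: ["pre"]}

print(bring_up([1, 2, 3], logs))  # ['pre', 'outage-1', 'outage-2']: outage writes kept
print(bring_up([2, 3, 1], logs))  # ['pre']: 2 and 3 formed the quorum, outage writes lost
```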
> This process should become easier once dynamic reconfiguration is
> implemented (in ZK 3.5 I believe?) because restarts won't be required.
> cheers
> Cam
> On Tue, Nov 5, 2013 at 6:05 PM, erolagnab <trung.n.k@gmail.com> wrote:
>> Thanks, I got the idea now. So is it fair to say that it is not possible to
>> create a ZK cluster providing some redundancy with 2 physical machines? If
>> so, is there a way to make it happen?
