zookeeper-user mailing list archives

From Sergei Babovich <sbabov...@demandware.com>
Subject Re: Achieving quorum with only half of the nodes
Date Thu, 15 Jul 2010 17:30:24 GMT
Thanks, Flavio, I appreciate your feedback.
Three power sources would obviously solve the problem. Unfortunately, 
that does not seem feasible at the moment (we would need to rebuild the 
whole existing infrastructure). This is the main reason why I am 
exploring possible alternatives (besides the fact that ZK otherwise fits 
our needs ideally). EC2 is also theoretically possible but, as you 
noted, a very shaky solution at best.
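
To make the arithmetic behind the two-feed versus three-feed comparison concrete, here is a minimal sketch. It assumes plain majority quorums and equal-weight servers; the function name and the server counts are made up for illustration and are not taken from ZooKeeper.

```python
def survives_feed_loss(servers_per_feed):
    """Return True if a strict majority of the ensemble is still up
    after losing any single power feed (and every server on it)."""
    total = sum(servers_per_feed)
    majority = total // 2 + 1  # strict majority of all voters
    return all(total - lost >= majority for lost in servers_per_feed)


# Hypothetical layouts: six servers split over two feeds or over three.
print(survives_feed_loss([3, 3]))     # two feeds
print(survives_feed_loss([2, 2, 2]))  # three feeds
```

With two feeds, losing one leaves 3 of 6 servers, short of the 4 needed, so the first call prints False. With three feeds, losing one leaves 4 of 6, so the second call prints True.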
The other possibility I see that might work is to dynamically adjust 
the quorum rules when a failure is detected. Say we detect that half 
of the servers have failed (manually or automatically): we could then 
notify the live nodes to adjust the quorum policy by excluding the dead 
nodes' votes (of course, we would need to make sure that the dead nodes 
really are dead; we can kill the processes). Basically, it means 
reconfiguring the cluster on the fly. Obviously, it also complicates 
recovery. Any opinions, input, or ideas on this approach?
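
The caveat about making sure the dead nodes are really dead can be illustrated with a small sketch. It assumes plain majority quorums; the server names and helper functions are hypothetical, and it does not model ZooKeeper's actual QuorumVerifier or any reconfiguration mechanism. It only checks whether two quorum rules can ever produce disjoint quorums.

```python
from itertools import combinations


def majority_quorums(voters):
    """All subsets of `voters` that form a strict majority."""
    voters = sorted(voters)
    need = len(voters) // 2 + 1
    return [set(c)
            for k in range(need, len(voters) + 1)
            for c in combinations(voters, k)]


def always_intersect(quorums_a, quorums_b):
    """True if every quorum under one rule shares at least one server
    with every quorum under the other rule."""
    return all(qa & qb for qa in quorums_a for qb in quorums_b)


# Hypothetical ensemble: three servers on each power feed.
feed_a = {"a1", "a2", "a3"}
feed_b = {"b1", "b2", "b3"}

full = majority_quorums(feed_a | feed_b)
only_a = majority_quorums(feed_a)  # feed A has excluded feed B's votes
only_b = majority_quorums(feed_b)  # feed B has excluded feed A's votes

print(always_intersect(full, full))
print(always_intersect(only_a, only_b))
```

The first check prints True: any two majorities of the full six-server ensemble overlap. The second prints False: if both halves are alive and each excludes the other, both can form a quorum independently, which is why the excluded nodes must be confirmed dead before the rules are changed.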
Please do not think that I am being stubborn in looking for a solution 
here. What I would hate most is to give up on ZK (which is otherwise 
ideal for us) just because of these limitations.

On 07/15/2010 12:26 PM, Flavio Junqueira wrote:
> Your EC2 suggestion sounds reasonable. If your deployment is able to 
> form a local quorum most of the time, then you would be able to get a 
> quorum of acks most of the time.
>
> One concern is that the EC2 replica might lag behind badly, which may 
> force the leader either to slow down or to drop the connection to the 
> EC2 follower, assuming that the EC2 server is not the leader itself.
>
> It might not be a possibility for you, but ideally, you could have 
> three power sources, and have three sets of servers. We could then 
> tolerate the failure of one power source with the mechanisms we have 
> currently implemented.
>
> -Flavio
>
> On Jul 14, 2010, at 11:16 PM, Sergei Babovich wrote:
>
>> Thanks, Flavio,
>> Yep... I see. This is a problem. Any better ideas?
>> As an alternative, we could probably consider running a single ZK
>> node on EC2, only to handle this specific case. Does that make
>> sense to you? Is it feasible? Would it have a considerable
>> performance impact due to network latency? I hope that, at least in
>> theory, the performance impact might be manageable, since a quorum can
>> be reached without an ack from the EC2 node.
>>
>> Regards,
>> Sergei
>>
>> On 07/14/2010 04:52 PM, Flavio Junqueira wrote:
>>> Hi Sergei, I'm not sure what the implementation of QuorumVerifier you
>>> have in mind would look like to make your setting work. Even if you
>>> don't have partitions, variation in message delays can cause
>>> inconsistencies in your ZooKeeper cluster. Keep in mind that we make
>>> the assumption that quorums intersect.
>>>
>>> -Flavio
>>>
>>> On Jul 14, 2010, at 9:43 PM, Sergei Babovich wrote:
>>>
>>>> Hi,
>>>> We are currently evaluating the use of ZK in our infrastructure. In our
>>>> setup we have a set of servers running from two different power feeds.
>>>> If one power feed goes away, so does half of the servers. This makes it
>>>> problematic to configure a ZK ensemble that would tolerate such an outage.
>>>> Network partitioning is not an issue in our case. The only solution
>>>> I have come up with so far is to provide a custom QuorumVerifier that adds
>>>> a small premium when all servers in the quorum set are from the
>>>> same group. Basically, if we have only half of the votes but all of them
>>>> belong to the same group, then we decide that we have a quorum.
>>>> Any ideas or better solutions would be much appreciated. Sorry if this has
>>>> already been discussed or answered.
>>>>
>>>> Regards,
>>>> Sergei
>>>> This e-mail message and all attachments transmitted with it may
>>>> contain privileged and/or confidential information intended solely
>>>> for the use of the addressee(s). If the reader of this message is not
>>>> the intended recipient, you are hereby notified that any reading,
>>>> dissemination, distribution, copying, forwarding or other use of this
>>>> message or its attachments is strictly prohibited. If you have
>>>> received this message in error, please notify the sender immediately
>>>> and delete this message, all attachments and all copies and backups
>>>> thereof.
>>>>
>>>
>>> *flavio*
>>> *junqueira*
>>>
>>> research scientist
>>>
>>> fpj@yahoo-inc.com <mailto:fpj@yahoo-inc.com>
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300 fax (408) 349 3301
>>>
>>>
>>
>>
>>


