incubator-cassandra-user mailing list archives

From Timo Nentwig <timo.nent...@toptarif.de>
Subject Re: Quorum: killing 1 out of 3 server kills the cluster (?)
Date Thu, 09 Dec 2010 17:26:23 GMT

On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:

>> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3),
>> N1 will still remain on another node. Only if both fail, I actually lose data.
>> But apparently this is not how it works...
> 
> Sure, the data that N1 holds is also on another node and you won't
> lose it by only losing N1.
> But when you do a quorum query, you are saying to Cassandra "Please
> please would you fail my request if you can't get a response from
> 2 nodes". So if only 1 node holding the data is up at the moment of
> the query, then Cassandra, which is very polite software, does what
> you asked and fails.
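
For illustration, the quorum arithmetic discussed in this thread (RF/2+1 replicas must answer) can be sketched as below. This is a minimal sketch and not Cassandra code; the function names `quorum` and `tolerated_replica_failures` are made up for the example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: RF/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """How many replicas of a given key may be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    # The cluster size does not appear anywhere: only the replication factor matters.
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerated failures={tolerated_replica_failures(rf)}")
```

With RF=2 the quorum is 2, so no replica of a key may be down; with RF=3 the quorum is still 2, so one replica may be down.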

And my application would fall back to ONE. Quorum writes will also fail, so I would use
ONE for those too so that the app stays up. What would I have to do to make the data
redistribute once the broken node is up again? Simply call nodetool repair on it?
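
The fallback described above could look roughly like this. It is a sketch under assumptions, not a real client API: `Unavailable` is a hypothetical stand-in for Cassandra's UnavailableException, and `make_operation` only simulates a request against a key whose replicas are partly down.

```python
class Unavailable(Exception):
    """Hypothetical stand-in for Cassandra's UnavailableException."""


def with_fallback(operation, levels=("QUORUM", "ONE")):
    """Try the operation at each consistency level in order; return (level, result)."""
    last_error = None
    for level in levels:
        try:
            return level, operation(level)
        except Unavailable as err:
            last_error = err  # not enough live replicas, try the next (weaker) level
    if last_error is None:
        raise ValueError("no consistency levels given")
    raise last_error


def make_operation(live_replicas, rf=2):
    """Simulated request: fails unless enough replicas of the key are alive."""
    required = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}

    def operation(level):
        if live_replicas < required[level]:
            raise Unavailable(f"{level} needs {required[level]}, have {live_replicas}")
        return "ok"

    return operation


if __name__ == "__main__":
    print(with_fallback(make_operation(live_replicas=2)))  # served at QUORUM
    print(with_fallback(make_operation(live_replicas=1)))  # falls back to ONE
```

Falling back this way trades consistency for availability: a request served at ONE has been acknowledged by only a single replica.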

> If you want Cassandra to send you an answer with only one node up, use
> CL=ONE (as said by David).
> 
>> 
>>> On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
>>> It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have
>>> 2 replicas. And since quorum is also 2 with that replication factor,
>>> you cannot lose a node, otherwise some queries will end in an
>>> UnavailableException.
>>> 
>>> Again, this is not related to the total number of nodes. Even with 200
>>> nodes, if you use RF=2, you will have some queries that fail (although
>>> far fewer than what you are probably seeing).
>>> 
>>> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:
>>>> 
>>>> On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
>>>> 
>>>>> Quorum is really only useful when RF > 2, since for a quorum to
>>>>> succeed, RF/2+1 replicas must be available.
>>>> 
>>>> 2/2+1==2 and I killed 1 of 3, so... don't get it.
>>>> 
>>>>> This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
>>>>> 
>>>>> /d
>>>>> 
>>>>> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:
>>>>>> Hi!
>>>>>> 
>>>>>> I've 3 servers running (0.7rc1) with a replication_factor of 2 and use
>>>>>> quorum for writes. But when I shut down one of them, UnavailableExceptions
>>>>>> are thrown. Why is that? Isn't the point of quorum and a fault-tolerant DB
>>>>>> that it continues with the remaining 2 nodes and redistributes the data to
>>>>>> the broken one as soon as it's up again?
>>>>>> 
>>>>>> What may I be doing wrong?
>>>>>> 
>>>>>> thx
>>>>>> tcn
>>>> 
>>>> 
>>> 
>> 
>> 

