incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: Quorum: killing 1 out of 3 server kills the cluster (?)
Date Thu, 09 Dec 2010 17:46:39 GMT
> And my application would fall back to ONE. Quorum writes will also fail, so
> I would also use ONE so that the app stays up. What would I have to do to
> make the data redistribute when the broken node is up again? Simply call
> nodetool repair on it?

There are three mechanisms for that:
  - hinted handoff: basically, when the node is back up, the other
nodes will send it the writes it missed.
  - read repair: whenever you read data and an inconsistency is
detected (because one replica is not up to date), it gets repaired.
  - calling nodetool repair

The first two are automatic; you have nothing to do.
Nodetool repair is usually only run periodically (say, once a week) to
repair cold data that was not dealt with by
the first two mechanisms.
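
To make the quorum arithmetic from the quoted thread below concrete, here is
a minimal Python sketch. The function names quorum and tolerated_failures are
made up for illustration and are not part of any Cassandra API; the only thing
taken from the thread is the RF/2+1 rule, read as integer division.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: RF/2 + 1, integer division."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas of a given key that may be down while QUORUM still succeeds."""
    return rf - quorum(rf)


if __name__ == "__main__":
    # Hypothetical replication factors, chosen only to show the pattern.
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"replicas that may be down={tolerated_failures(rf)}")
```

For RF=2 this gives a quorum of 2 and no tolerated failures, which is why
losing one node leads to UnavailableException for the keys it holds. For RF=3
the quorum is still 2, so one replica of each key can be down.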

--
Sylvain

>
>> If you want Cassandra to send you an answer with only one node up, use
>> CL=ONE (as said by David).
>>
>>>
>>>> On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
>>>> It's 2 out of the number of replicas, not the number of nodes. At RF=2,
>>>> you have 2 replicas. And since quorum is also 2 with that replication
>>>> factor, you cannot lose a node; otherwise some queries will fail with
>>>> an UnavailableException.
>>>>
>>>> Again, this is not related to the total number of nodes. Even with 200
>>>> nodes, if you use RF=2, you will have some queries that fail (although
>>>> far fewer than you are probably seeing).
>>>>
>>>> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:
>>>>>
>>>>> On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
>>>>>
>>>>>> Quorum is really only useful when RF > 2, since for a quorum to
>>>>>> succeed, RF/2+1 replicas must be available.
>>>>>
>>>>> 2/2+1==2 and I killed 1 of 3, so... don't get it.
>>>>>
>>>>>> This means for RF = 2, consistency levels QUORUM and ALL yield the
>>>>>> same result.
>>>>>>
>>>>>> /d
>>>>>>
>>>>>> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I have 3 servers running (0.7rc1) with a replication_factor of 2
>>>>>>> and use quorum for writes. But when I shut down one of them,
>>>>>>> UnavailableExceptions are thrown. Why is that? Isn't the point of
>>>>>>> quorum and a fault-tolerant DB that it continues with the remaining
>>>>>>> 2 nodes and redistributes the data to the broken one as soon as it
>>>>>>> is up again?
>>>>>>>
>>>>>>> What may I be doing wrong?
>>>>>>>
>>>>>>> thx
>>>>>>> tcn
>>>>>
>>>>>
>>>>
>>>
>>>
>
>
