cassandra-user mailing list archives

From: Riyad Kalla <rka...@gmail.com>
Subject: Re: increased RF and repair, not working?
Date: Fri, 27 Jul 2012 16:22:34 GMT
Ah!

Yan, I think you want your writes to use QUORUM and your reads to hit just a
single node, right?

If you need/want the read repair, then I suppose you would need more nodes
up (or deployed in your cluster), but if you are keeping 3 machines, an RF of
2 with a write consistency of 2 and a read consistency of 1 should give you
good behavior with what you have now.
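
For example, a minimal pycassa sketch of that setup (the keyspace, column
family, and server address here are only placeholders):

    from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel

    pool = ConnectionPool('comments', server_list=['192.168.1.50:9160'])

    # With RF=2, a QUORUM write waits for both replicas; a ONE read is
    # answered by any single live replica.
    cf = ColumnFamily(pool, 'comments',
                      write_consistency_level=ConsistencyLevel.QUORUM,
                      read_consistency_level=ConsistencyLevel.ONE)

    cf.insert('some-row-key', {'col': 'value'})
    print cf.get('some-row-key')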

On Fri, Jul 27, 2012 at 8:56 AM, Yan Chunlu <springrider@gmail.com> wrote:

> I think Dave is right. I have read this article again:
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>
> I have data on two nodes, and a "QUORUM read" means it needs to read from
> both of them.
>
> I guess I need to increase the RF to 3 so the system can tolerate one
> node's failure.
>
> Thanks for all of the kind help!
>
> On Fri, Jul 27, 2012 at 8:08 PM, Riyad Kalla <rkalla@gmail.com> wrote:
>
>> Dave, per my understanding of Yan's description, he has 3 nodes and took
>> one down manually to test; that should have worked, no?
>>
>>
>> On Thu, Jul 26, 2012 at 11:00 PM, Dave Brosius <dbrosius@mebigfatguy.com> wrote:
>>
>>>  Quorum is defined as
>>>
>>> (replication_factor / 2) + 1
>>>
>>> therefore quorum when RF = 2 is 2! So in your case, both nodes must be up.
>>>
>>> Really, using QUORUM only starts making sense as a 'quorum' when RF = 3.
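>>>
>>> Worked out for a few RF values (the division is integer division):
>>>
>>>   RF=1: (1 / 2) + 1 = 1   (the single replica must be up)
>>>   RF=2: (2 / 2) + 1 = 2   (every replica must be up)
>>>   RF=3: (3 / 2) + 1 = 2   (one replica can be down)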
>>>
>>> On 07/26/2012 10:38 PM, Yan Chunlu wrote:
>>>
>>> I am using Cassandra 1.0.2 and have a 3-node cluster. The consistency
>>> levels of both reads and writes are QUORUM.
>>>
>>>  At first RF was 1, and I figured that one node going down would make
>>> the cluster unusable, so I changed RF to 2 and ran nodetool repair on
>>> every node (actually I did it twice).
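>>>
>>> (For reference, roughly the commands involved; cassandra-cli
>>> strategy_options syntax differs a little between versions:)
>>>
>>>   # in cassandra-cli:
>>>   update keyspace comments with strategy_options = {replication_factor:2};
>>>   # then, on every node:
>>>   nodetool -h <host> repair comments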
>>>
>>>  After that operation I think my data should be on at least two nodes,
>>> and it would be okay if one of them went down.
>>>
>>> But when I tried to simulate a failure by disabling gossip on one node
>>> (the cluster does see that node as down) and then accessed data from the
>>> cluster, it returned a "MaximumRetryException" (pycassa). In my experience
>>> this is caused by an "UnavailableException", which means the data being
>>> requested is on a node that is down.
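>>>
>>> (For illustration, roughly what the failing read looks like; the keyspace,
>>> column family, and row key here are only placeholders:)
>>>
>>>   from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel
>>>   from pycassa.pool import MaximumRetryException
>>>
>>>   pool = ConnectionPool('comments', server_list=['192.168.1.50:9160'])
>>>   cf = ColumnFamily(pool, 'comments',
>>>                     read_consistency_level=ConsistencyLevel.QUORUM)
>>>   try:
>>>       cf.get('some-row-key')
>>>   except MaximumRetryException:
>>>       # pycassa retries on UnavailableException, then gives up when a
>>>       # replica required for QUORUM stays down
>>>       pass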
>>>
>>>  So I suspect my data might not be replicated right; what should I do?
>>> Thanks for the help!
>>>
>>>  Here is the keyspace info:
>>>
>>> Keyspace: comments:
>>>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>>   Durable Writes: true
>>>     Options: [replication_factor:2]
>>>
>>>
>>>
>>>  The schema version is okay:
>>>
>>> [default@unknown] describe cluster;
>>> Cluster Information:
>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>    Schema versions:
>>>      f67d0d50-b923-11e1-0000-4f7cf9240aef: [192.168.1.129, 192.168.1.40, 192.168.1.50]
>>>
>>>
>>>
>>>  The loads are as below:
>>>
>>> nodetool -h localhost ring
>>> Address        DC          Rack   Status  State   Load      Owns    Token
>>>                                                                     113427455640312821154458202477256070484
>>> 192.168.1.50   datacenter1 rack1  Up      Normal  28.77 GB  33.33%  0
>>> 192.168.1.40   datacenter1 rack1  Up      Normal  26.67 GB  33.33%  56713727820156410577229101238628035242
>>> 192.168.1.129  datacenter1 rack1  Up      Normal  33.25 GB  33.33%  113427455640312821154458202477256070484
>>>
>>
>
