cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: RE: UnavailableException with 3 nodes and RF=2
Date Tue, 14 Sep 2010 20:36:04 GMT
For background have a read of http://wiki.apache.org/cassandra/HintedHandoff

As the doc (the one above and Martin :) ) says, CL ONE, QUORUM and ALL only count writes to
nodes that are responsible for the key. Then HH is used to eventually deliver that write to
any nodes that were not available. 

CL.ANY is a lot less consistent and will ack when only a HH is recorded. 
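To make the difference between the consistency levels concrete, here is a minimal sketch of the availability check described above. It is an illustrative model only, not Cassandra's actual code: the function names and the `any_node_up` flag are made up for this example, and it assumes a hint can be recorded whenever at least one node in the cluster is reachable.

```python
def required_acks(cl: str, rf: int) -> int:
    """Replica acknowledgements a write needs at a given consistency level."""
    if cl in ("ANY", "ONE"):
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {cl}")


def write_is_available(cl: str, rf: int, live_replicas: int,
                       any_node_up: bool = True) -> bool:
    """live_replicas counts only live nodes that are responsible for the key."""
    if cl == "ANY":
        # Assumption: a recorded hint is enough, even with no live replica.
        return live_replicas >= 1 or any_node_up
    # ONE, QUORUM and ALL ignore hints and count real replicas only.
    return live_replicas >= required_acks(cl, rf)


if __name__ == "__main__":
    # RF=2 with one of the key's two replicas down:
    for cl in ("ANY", "ONE", "QUORUM", "ALL"):
        print(cl, write_is_available(cl, rf=2, live_replicas=1))
```

With RF=2 and one replica down, this model accepts ANY and ONE but rejects QUORUM and ALL, which matches the behaviour reported in the original question.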

AFAIK you are right that upping the RF to 5 will mean you can lose two nodes *responsible
for the key* and still run a QUORUM write. 

Aaron

On 14 Sep 2010, at 11:36 PM, Chris Jansen <chris.jansen@cognitomobile.com> wrote:

Thank you Martin, this has cleared things up for me. I had assumed that a replica would always
be stored on the node I was connecting to; placement by key instead explains why the load on
each node is equally balanced.
 
So I could sustain quorum with two node failures if I have a RF=5 or greater.
 
Thanks again.
 
Chris
 
From: Dr. Martin Grabmüller [mailto:Martin.Grabmueller@eleven.de] 
Sent: 14 September 2010 09:54
To: user@cassandra.apache.org
Subject: RE: UnavailableException with 3 nodes and RF=2
 
When you write with QUORUM, RF/2+1 of the nodes that Cassandra *wants to write to*
have to be up.  In your case, RF/2+1 = 2, which means the two nodes responsible
for the key have to be up, not just any two nodes.  Every write whose replicas are
the node with token 78502309573904554351249603414557542595 and one other node
will fail.
 
QUORUM consistency only gives you more availability when you have a RF of 3 or higher.
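The arithmetic behind this can be checked with a few lines of Python. This is only a sketch of the RF/2+1 rule quoted above (with integer division); the helper names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Replicas that must acknowledge a QUORUM write: RF/2 + 1 (integer division)."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas of a single key that can be down while QUORUM still succeeds."""
    return rf - quorum(rf)


if __name__ == "__main__":
    print("RF  quorum  replicas that may be down")
    for rf in range(1, 6):
        print(f"{rf:>2}  {quorum(rf):>6}  {tolerated_failures(rf):>25}")
```

For RF=2 the quorum is 2, so no replica of a key may be down; RF=3 tolerates one, and RF=5 tolerates two.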
 
Martin
From: Chris Jansen [mailto:chris.jansen@cognitomobile.com] 
Sent: Tuesday, September 14, 2010 10:44 AM
To: user@cassandra.apache.org
Subject: UnavailableException with 3 nodes and RF=2

Hi All,
 
I’m a newbie to Cassandra, so I could have a configuration issue here. I am using the latest
stable release, 0.6.0.
 
I have created a cluster of 3 nodes, a keyspace with RF=2 and a rack unaware replication strategy.
When I write with CL=QUORUM and all 3 nodes up, the data is committed fine, but when I write with the
same CL with one of the nodes down I see an UnavailableException thrown. Surely if one of
the nodes in the cluster is down another should acknowledge the writes and maintain the quorum,
or is there something that I have misunderstood? From what I understand, in this case with
a RF=2 for the quorum writes to succeed I need two nodes to acknowledge the write (RF/2+1),
which I have.
 
Here is how the cluster looks when quorum writes succeed:
 
192.168.245.2 Up         477.33 KB     78502309573904554351249603414557542595     |<--|
192.168.245.4 Up         426.74 KB     139625953069891725539207365034742863768    |   |
192.168.245.1 Up         496.67 KB     163572901304139170217093255272499595459    |-->|
 
This is how it looks with one node down and quorum writes fail (I am writing to 192.168.245.1):
 
192.168.245.2 Down       423.58 KB     78502309573904554351249603414557542595     |<--|
192.168.245.4 Up         426.74 KB     139625953069891725539207365034742863768    |   |
192.168.245.1 Up         496.67 KB     163572901304139170217093255272499595459    |-->|
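The ring listing above is enough to work out which writes can still reach a quorum. The sketch below is a simplified model, not Cassandra's code. It assumes the rack-unaware strategy places the first replica on the node whose token is the first one at or after the key's token (wrapping around the ring) and the remaining replicas on the next nodes clockwise. Hashing a row key into a token is left out, so `key_token` is taken as given, and the function names are made up for this example.

```python
from bisect import bisect_left

# Tokens and addresses copied from the ring listing above, sorted by token.
RING = sorted([
    (78502309573904554351249603414557542595, "192.168.245.2"),
    (139625953069891725539207365034742863768, "192.168.245.4"),
    (163572901304139170217093255272499595459, "192.168.245.1"),
])
TOKENS = [token for token, _ in RING]


def replicas_for(key_token: int, rf: int) -> list:
    """Nodes responsible for a key: the token owner plus the next rf-1 nodes."""
    start = bisect_left(TOKENS, key_token) % len(RING)  # wrap past the last token
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def quorum_write_possible(key_token: int, rf: int, down: set) -> bool:
    """True if enough of the key's own replicas are up for a QUORUM write."""
    live = [node for node in replicas_for(key_token, rf) if node not in down]
    return len(live) >= rf // 2 + 1


if __name__ == "__main__":
    down = {"192.168.245.2"}
    # One sample token from each of the three ranges.
    for key_token in (0, 10**38, 150 * 10**36):
        print(replicas_for(key_token, 2), quorum_write_possible(key_token, 2, down))
```

Under these assumptions, with RF=2 and 192.168.245.2 down, only keys whose replicas are 192.168.245.4 and 192.168.245.1 can be written at QUORUM. Keys in the other two ranges have 192.168.245.2 as one of their two replicas, so those writes are rejected no matter which node the client connects to.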
 
Here is the exception that is thrown:
 
Cannot write: 9e48b039-7687-4b14-9b40-0096b15fd7b0 RETRYING
UnavailableException()
                at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:12303)
                at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:675)
                at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:648)
                at cassandraclient.Main.writeReadDelete(Main.java:101)
                at cassandraclient.Main.run(Main.java:188)
                at java.lang.Thread.run(Thread.java:619)
 
If I switch to CL=ONE the writes succeed, but I don’t know if the data is being replicated.
 
Any help would be greatly appreciated, thanks.
 
Chris Jansen


