incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject Re: Data not fully replicated with 2 nodes and replication factor 2
Date Thu, 20 Jun 2013 16:53:52 GMT
I don't think you can fully trust hinted handoff; it's more like "we are trying our best to
deliver it" with no guarantee. Even if the hints were guaranteed to be delivered, there would
still be a delay, which is part of the "eventual consistency" paradigm. If you want to enforce
real consistency, change your consistency level. Or do a repair. 
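
To illustrate the consistency-level option, here is a minimal sketch of the usual overlap
rule: a read is guaranteed to see the latest acknowledged write when the replicas required
for the write plus the replicas required for the read exceed the replication factor. The
function names are made up for illustration and are not part of any Cassandra API; only the
ONE, QUORUM and ALL levels are modelled.

```python
def quorum(rf):
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1


# Hypothetical helper table: consistency level -> replicas that must respond.
LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def replicas_required(level, rf):
    return LEVELS[level](rf)


def read_sees_latest_write(write_level, read_level, rf):
    """True when the write set and read set must share at least one replica (W + R > RF)."""
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf


if __name__ == "__main__":
    rf = 2  # replication factor from the thread subject
    for w, r in [("ONE", "ONE"), ("ONE", "ALL"), ("ALL", "ONE"), ("QUORUM", "QUORUM")]:
        print(w, r, read_sees_latest_write(w, r, rf))
```

With a replication factor of 2, QUORUM needs both replicas, so it behaves like ALL. Writing
and reading at ONE gives no overlap guarantee, and a read can then land on the replica that
has not yet received the write.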

Thanks. 
-Wei 

----- Original Message -----

From: "James Lee" <James.Lee@metaswitch.com> 
To: user@cassandra.apache.org, "Wei Zhu" <wz1975@yahoo.com>, rcoli@eventbrite.com 
Sent: Thursday, June 20, 2013 3:21:30 AM 
Subject: RE: Data not fully replicated with 2 nodes and replication factor 2 

Rob, Wei, thank you both for your responses - from what Rob says below my test is a valid
one. 

I've run some additional tests and observed the following: 
-- I mentioned before that some of the initial writes failed and then succeeded when
the test tool retries them. I've checked that there's no correlation between the keys for
writes which required a retry and the keys for the failed reads (i.e. the reads are failing
for keys that were written fine at the first attempt). 
-- I've rerun this test with the rate of initial writes limited to a much lower value
(from 8000/s down to 2000/s). This makes the problem go away completely: no more read failures. 
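
For anyone wanting to reproduce the rate-limited run, a minimal sketch of pacing writes at a
fixed rate might look like the following. This is not the test tool used above: `paced_writes`
and the `write` callback are hypothetical names, and the clock and sleep functions are
injectable only so the pacing can be checked without real delays.

```python
import time


def paced_writes(keys, rate_per_sec, write, clock=time.monotonic, sleep=time.sleep):
    """Call write(key) for each key, issuing no more than rate_per_sec calls per second."""
    interval = 1.0 / rate_per_sec
    next_slot = clock()  # earliest time the next write may be issued
    for key in keys:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)  # wait until this write's slot arrives
        write(key)
        next_slot += interval


if __name__ == "__main__":
    # Stand-in for a real client call; replace with the actual write to the cluster.
    paced_writes(range(10), rate_per_sec=2000, write=lambda key: None)
```

Scheduling against fixed slots rather than sleeping a constant interval after each write
keeps the average rate at the target even when individual writes are slow.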

So it seems like I have exposed a genuine bug in Cassandra replication which manifests under
high write load. What's the best next step - should I be filing a bug report, and if so what
diagnostics are likely to be useful? 

Thanks, 
James Lee 


-----Original Message----- 
From: Robert Coli [mailto:rcoli@eventbrite.com] 
Sent: 19 June 2013 20:59 
To: user@cassandra.apache.org; Wei Zhu 
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2 

On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu <wz1975@yahoo.com> wrote: 
> I think hints are only stored when the other node is down, not on the 
> dropped mutations. (Correct me if I am wrong, actually it's not a bad 
> idea to store hints for dropped mutations and replay them later?) 

This used to be the way it worked pre-1.0... 

https://issues.apache.org/jira/browse/CASSANDRA-2034 

In modern Cassandra, anything but a successful ack from a coordinated write results in a hint
on the coordinator. 

> To solve your issue, as I mentioned, either do nodetool repair, or 
> increase your consistency level. By the way, you probably write 
> faster than your cluster can handle if you see that many dropped mutations. 

If his hints are ultimately delivered, OP should not "need" repair to be consistent. 

=Rob 

