incubator-cassandra-user mailing list archives

From James Lee <James....@metaswitch.com>
Subject RE: Data not fully replicated with 2 nodes and replication factor 2
Date Thu, 20 Jun 2013 10:21:30 GMT
Rob, Wei, thank you both for your responses; from what Rob says below, my test is a valid one.

I've run some additional tests and observed the following:
-- I mentioned before that some of the initial writes failed at first and then succeeded when
the test tool retried them.  I've checked that there's no correlation between the keys for
writes that required a retry and the keys for the failed reads (i.e. the reads are failing
for keys that were written successfully on the first attempt).
-- I've rerun this test with the rate of initial writes limited to a much lower value (from
8000/s down to 2000/s).  This makes the problem go away completely: no more read failures.
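For anyone trying to reproduce the throttled run, here is a minimal Python sketch of pacing writes at a fixed rate. It assumes a hypothetical `write_fn(key)` that stands in for whatever client call the test tool actually makes. The tool itself isn't described in this thread, so treat this as illustrative only.

```python
import time


def paced_writes(keys, write_fn, rate_per_sec):
    """Call write_fn(key) for each key, issuing at most rate_per_sec calls per second.

    write_fn is a hypothetical stand-in for the test tool's real client call.
    Returns the number of writes issued.
    """
    interval = 1.0 / rate_per_sec
    next_slot = time.monotonic()
    issued = 0
    for key in keys:
        now = time.monotonic()
        if now < next_slot:
            # Ahead of schedule: wait until this write's time slot.
            time.sleep(next_slot - now)
        write_fn(key)
        issued += 1
        # Schedule the next write one interval later; if we fell behind,
        # restart from the current time instead of bursting to catch up.
        next_slot = max(next_slot, now) + interval
    return issued


if __name__ == "__main__":
    written = []
    count = paced_writes(range(10), written.append, rate_per_sec=2000)
    print(count, written)
```

Restarting the schedule from the current time when a write runs long keeps the tool from issuing a catch-up burst. A burst of that kind would reintroduce exactly the load spike the lower rate is meant to avoid.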

So it seems like I have exposed a genuine bug in Cassandra replication which manifests under
high write load.  What's the best next step - should I be filing a bug report, and if so what
diagnostics are likely to be useful?

Thanks,
James Lee


-----Original Message-----
From: Robert Coli [mailto:rcoli@eventbrite.com] 
Sent: 19 June 2013 20:59
To: user@cassandra.apache.org; Wei Zhu
Subject: Re: Data not fully replicated with 2 nodes and replication factor 2

On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu <wz1975@yahoo.com> wrote:
> I think hints are only stored when the other node is down, not on the 
> dropped mutations. (Correct me if I am wrong, actually it's not a bad 
> idea to store hints for dropped mutations and replay them later?)

This used to be the way it worked pre-1.0...

https://issues.apache.org/jira/browse/CASSANDRA-2034

In modern Cassandra, anything other than a successful ack for a coordinated write results in a
hint being stored on the coordinator.
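A toy Python model of the coordinator-side behaviour described above: each replica that fails to acknowledge a write gets a hint, and the hint is replayed later. This is a simplified sketch, not Cassandra's implementation. `Replica`, `Coordinator`, the `overloaded` flag and `replay_hints()` are hypothetical names, and real hint storage, timeouts and delivery scheduling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    overloaded: bool = False  # hypothetical flag: simulates a replica dropping mutations
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        """Return True (an ack) if the mutation was applied, False if it was dropped."""
        if self.overloaded:
            return False
        self.data[key] = value
        return True


@dataclass
class Coordinator:
    replicas: list
    hints: list = field(default_factory=list)  # (replica, key, value) awaiting replay

    def write(self, key, value):
        """Send the mutation to every replica; return the number of acks received."""
        acks = 0
        for replica in self.replicas:
            if replica.apply(key, value):
                acks += 1
            else:
                # Anything but a successful ack: keep a hint for later replay.
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Try to deliver stored hints; return how many are still undelivered."""
        remaining = []
        for replica, key, value in self.hints:
            if not replica.apply(key, value):
                remaining.append((replica, key, value))
        self.hints = remaining
        return len(remaining)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b", overloaded=True)
    coord = Coordinator([a, b])
    print(coord.write("k1", "v1"))  # only replica "a" acks
    b.overloaded = False
    print(coord.replay_hints())     # hint delivered to "b"; none left
    print(b.data)
```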

> To solve your issue, as I mentioned, either do nodetool repair, or 
> increase your consistency level.  By the way, you probably write 
> faster than your cluster can handle if you see that many dropped mutations.

If his hints are ultimately delivered, OP should not "need" repair to be consistent.

=Rob
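Wei's alternative of raising the consistency level relies on the usual overlap rule. Let W be the number of replicas that must acknowledge a write and R the number that must answer a read. If R + W exceeds the replication factor, every read touches at least one replica that holds the write. Below is a small Python sketch of that arithmetic for a single data centre. It assumes the standard meanings of ONE, QUORUM (floor(RF/2) + 1) and ALL, and the function names are illustrative.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level (single data centre)."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def read_sees_write(write_cl: str, read_cl: str, rf: int) -> bool:
    """True if every read replica set must overlap every write replica set (R + W > RF)."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


if __name__ == "__main__":
    for w, r in [("ONE", "ONE"), ("QUORUM", "ONE"), ("ONE", "ALL")]:
        print(w, r, read_sees_write(w, r, rf=2))
```

With RF = 2, QUORUM already means both replicas. ONE/ONE therefore gives no overlap guarantee, while QUORUM or ALL on either the write or the read side does.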
