cassandra-user mailing list archives

From Robert Wille
Subject Re: Written data is lost and no exception thrown back to the client
Date Fri, 21 Aug 2015 11:04:19 GMT
RF=1 with QUORUM consistency. I know QUORUM is weird with RF=1, but it should be the same as
ONE. It’s QUORUM instead of ONE because production has RF=3, and I was running this against
my test cluster with RF=1.
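
For reference, a quorum is floor(RF / 2) + 1 replicas, which is why QUORUM should behave like ONE at RF=1 and require two replicas at RF=3. A minimal sketch of that arithmetic follows; the class and method names are illustrative only and are not part of Cassandra or the driver.

```java
final class QuorumSketch {
    // Number of replicas that must acknowledge a QUORUM request:
    // floor(replicationFactor / 2) + 1 (integer division floors for positive values).
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println("RF=1 -> " + quorum(1)); // test cluster
        System.out.println("RF=3 -> " + quorum(3)); // production
    }
}
```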

On Aug 20, 2015, at 7:28 PM, Jason wrote:

What consistency level were the writes?
From: Robert Wille
Sent: 8/20/2015 18:25
Subject: Written data is lost and no exception thrown back to the client

I wrote a data migration application which I was testing, and I pushed it too hard and the
FlushWriter thread pool blocked, and I ended up with dropped mutation messages. I compared
the source data against what is in my cluster, and as expected I have missing records. The
strange thing is that my application didn’t error out. I’ve been doing some forensics,
and there’s a lot about this that makes no sense and makes me feel very uneasy.

I use a lot of asynchronous queries, and I thought it was possible that I had bad error handling,
so I checked for errors in other, independent ways.

I have a retry policy that on the first failure logs the error and then requests a retry.
On the second failure it logs the error and then rethrows. A few retryable errors appeared
in my logs, but no fatal errors. In theory, I should have a fatal error in my logs for any
error that gets reported back to the client.
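
To make the intended behavior concrete, here is a rough sketch of the "log and retry once, then log and rethrow" logic in plain Java. It is not the DataStax driver's RetryPolicy interface and not my actual code; `RetryOnceSketch` and `callWithOneRetry` are hypothetical names used only to illustrate the control flow.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

final class RetryOnceSketch {
    private static final Logger LOG = Logger.getLogger("RetryOnceSketch");

    // Runs the operation. The first failure is logged and the operation is retried;
    // a second failure is logged as fatal and rethrown to the caller.
    static <T> T callWithOneRetry(Supplier<T> operation) {
        try {
            return operation.get();
        } catch (RuntimeException first) {
            LOG.warning("First failure, retrying: " + first);
        }
        try {
            return operation.get();
        } catch (RuntimeException second) {
            LOG.severe("Second failure, giving up: " + second);
            throw second;
        }
    }
}
```

With logic of this shape, any error that reaches the caller has necessarily been logged twice, once as retryable and once as fatal.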

I wrap my Session object, and all queries go through this wrapper. This wrapper logs all query
errors. Synchronous queries are wrapped in a try/catch which logs and rethrows. Asynchronous
queries use a FutureCallback to log any onFailure invocations.
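
The wrapper pattern looks roughly like the sketch below. It uses the JDK's CompletableFuture as a stand-in for the future type the driver returns, so it runs without the driver on the classpath; `LoggingWrapperSketch`, `executeSync`, `executeAsync`, and the `failures` counter are hypothetical names for illustration, not my real wrapper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import java.util.logging.Logger;

final class LoggingWrapperSketch {
    private static final Logger LOG = Logger.getLogger("LoggingWrapperSketch");

    // Counts every failure seen by the wrapper, as an independent check on the logs.
    final AtomicLong failures = new AtomicLong();

    // Synchronous path: log the error, then rethrow it unchanged.
    <T> T executeSync(Supplier<T> query) {
        try {
            return query.get();
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            LOG.severe("Synchronous query failed: " + e);
            throw e;
        }
    }

    // Asynchronous path: attach a completion callback that logs any failure.
    // The returned future still completes exceptionally, so callers see the error too.
    <T> CompletableFuture<T> executeAsync(Supplier<CompletableFuture<T>> query) {
        return query.get().whenComplete((result, error) -> {
            if (error != null) {
                failures.incrementAndGet();
                LOG.severe("Asynchronous query failed: " + error);
            }
        });
    }
}
```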

My logs indicate that no errors whatsoever were reported back to me. I do not understand how
I can get dropped mutation messages and not know about it. I am running Cassandra 2.0.16 with the
DataStax Java driver 2.0.8 on a three-node cluster with RF=1. If someone could help me understand how this
can occur, I would greatly appreciate it. A database that errors out is one thing. A database
that errors out and makes you think everything was fine is quite another.


