cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Node goes AWOL briefly; failed replication does not report error to client, though consistency=ALL
Date Wed, 08 Dec 2010 15:30:21 GMT
On Tue, Dec 7, 2010 at 4:00 PM, Reverend Chip <> wrote:
> On 12/7/2010 1:10 PM, Jonathan Ellis wrote:
>> I'm inclined to think there's a bug in your client, then.
> That doesn't pass the smell test.  The very same client has logged
> timeout and unavailable exceptions on other occasions, e.g. when there
> are too many clients or (in a previous configuration) when the JVMs had
> insufficient memory.  It's too much of a coincidence to believe that the
> client's exception reporting happens to fail only at the same time that
> a server experiences unexplained and problematic gossip failures.

You're probably right.

>>   DEBUG-level
>> logs could confirm or refute this by logging for each insert how many
>> replicas are being blocked for, which nodes it got responses from, and
>> whether a TimedOutException from not getting ALL replies was returned
>> to the client.
> Full DEBUG level logs would be a space problem; I'm loading at least 1T
> per node (after 3x replication), and these events are rare.  Can the
> DEBUG logs be limited to the specific modules helpful for this diagnosis
> of the gossip problem and, secondarily, the failure to report
> replication failure?

The gossip problem is almost certainly due to a GC pause.  You can
check that by enabling verbose GC logging (uncomment the relevant
lines in conf/cassandra-env.sh and restart).
The replication failure is what we want DEBUG logs for, and
restricting it to the right modules isn't going to help since when
you're stress-testing writes, the write modules are going to be 99% of
the log volume anyway.

Maybe a script that constantly throws away all but the most recent log
file until you see the WARN line would be a sufficient workaround?
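A minimal sketch of that workaround in shell (the log directory, the file glob, and the WARN pattern are all assumptions; match them to your log4j file-appender settings):

```shell
#!/bin/sh
# Workaround sketch: keep only the newest rolled log file until the
# interesting WARN line appears, so full-DEBUG logging stays bounded.

# prune_logs DIR GLOB: delete every matching log file except the newest.
prune_logs() {
    dir=$1; glob=$2
    # ls -t sorts newest-first; tail -n +2 selects everything older.
    ls -t "$dir"/$glob 2>/dev/null | tail -n +2 | xargs rm -f 2>/dev/null
}

# watch_logs DIR GLOB PATTERN: prune until PATTERN shows up in a
# surviving log file, then stop so the evidence is preserved.
watch_logs() {
    dir=$1; glob=$2; pattern=$3
    until grep -q "$pattern" "$dir"/$glob 2>/dev/null; do
        prune_logs "$dir" "$glob"
        sleep 60
    done
    echo "pattern '$pattern' found; pruning stopped"
}

# Example invocation (paths and pattern are illustrative):
# watch_logs /var/log/cassandra 'system.log*' 'WARN'
```

This trades history for disk space: everything before the most recent roll is lost, but the file containing the WARN line and anything logged after it survives for diagnosis.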

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
