cassandra-commits mailing list archives

From "Peter Schuller (Created) (JIRA)"
Subject [jira] [Created] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge
Date Thu, 15 Dec 2011 20:11:30 GMT
inconsistent/corrupt counters w/ broken shards never converge

                 Key: CASSANDRA-3641
             Project: Cassandra
          Issue Type: Bug
            Reporter: Peter Schuller

We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that
were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist
shards with the *same* node_id, *same* clock id, but *different* counts.

The counter column diffing and reconciliation code assumes that this never happens, and ignores
the count. The problem with this is that if there is an inconsistency, the result of a reconciliation
will depend on the order of the shards.
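The order dependence can be illustrated with a minimal sketch. This is not Cassandra's actual code: the Shard tuple and naive_reconcile function are hypothetical names, and the tie-breaking rule (keep the left shard when clocks are equal) is an assumption chosen only to show the effect of ignoring the count.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    """Hypothetical, simplified model of one counter shard."""
    node_id: str
    clock: int
    count: int


def naive_reconcile(left: Shard, right: Shard) -> Shard:
    """Pick the shard with the higher clock.

    On a clock tie the count is never inspected and `left` is kept,
    mirroring the assumption that equal clocks imply equal counts.
    """
    if left.node_id != right.node_id:
        raise ValueError("shards must belong to the same node_id")
    return right if right.clock > left.clock else left


# Two corrupt shards: same node_id, same clock, different counts.
a = Shard("node-1", 7, 10)
b = Shard("node-1", 7, 12)

result_ab = naive_reconcile(a, b)  # keeps a
result_ba = naive_reconcile(b, a)  # keeps b
```

With the two shards above, the merged value depends only on which replica's shard happens to be seen first, which is consistent with a CL.ALL read fluctuating between values.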

In our case, for example, we would see the value of the counter randomly fluctuating on a CL.ALL
read, but we would get a consistent value (whatever that node had) on CL.ONE (submitted to one of the
nodes in the replica set for the key).

In addition, read repair would not work despite digest mismatches because the diffing algorithm
also did not care about the counts when determining the differences to send.

I'm attaching patches that fix this. The first patch is against our 0.8 branch, which is
not terribly useful to people, but I include it because it is the well-tested version that
we have used on the production cluster which was subject to this corruption.

The other patch is against trunk, and contains the same change.

What the patch does is:

* On diffing, treat as DISJOINT if there is a count discrepancy.
* On reconciliation, look at the count and *deterministically* pick the higher one, and:
** log the fact that we detected a corrupt counter
** increment a JMX observable counter for monitoring purposes
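The behavior listed above can be sketched roughly as follows. This is a simplified per-shard illustration, not the attached patch: Shard, Relationship, diff, reconcile, and the stats dictionary (standing in for the JMX-observable counter) are all hypothetical names, and real counter contexts hold many shards rather than one.

```python
import logging
from enum import Enum
from typing import NamedTuple

log = logging.getLogger("counter_sketch")

# Stand-in for the JMX-observable counter (an assumption for illustration).
stats = {"corrupt_shards": 0}


class Shard(NamedTuple):
    """Hypothetical, simplified model of one counter shard."""
    node_id: str
    clock: int
    count: int


class Relationship(Enum):
    EQUAL = "equal"
    GREATER_THAN = "greater_than"
    LESS_THAN = "less_than"
    DISJOINT = "disjoint"


def diff(left: Shard, right: Shard) -> Relationship:
    """Compare two shards of the same node_id, taking the count into account."""
    if left.clock > right.clock:
        return Relationship.GREATER_THAN
    if left.clock < right.clock:
        return Relationship.LESS_THAN
    # Same clock but different counts: report DISJOINT so both sides get repaired.
    if left.count != right.count:
        return Relationship.DISJOINT
    return Relationship.EQUAL


def reconcile(left: Shard, right: Shard) -> Shard:
    """Merge two shards of the same node_id deterministically."""
    if left.clock != right.clock:
        return left if left.clock > right.clock else right
    if left.count != right.count:
        # Corruption detected: log it, count it, and pick the higher count.
        stats["corrupt_shards"] += 1
        log.warning("corrupt counter shard for %s at clock %d: %d vs %d",
                    left.node_id, left.clock, left.count, right.count)
        return left if left.count > right.count else right
    return left
```

Because the higher count wins regardless of argument order, every replica that sees both shards settles on the same value, and the DISJOINT result gives read repair something to send.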

A cluster that is subject to such corruption and has this patch will fix itself with an
AES + compact (or just repeated compactions assuming the replicate-on-compact is able to deliver
