cassandra-commits mailing list archives

From "Peter Schuller (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge
Date Mon, 19 Dec 2011 17:37:30 GMT


Peter Schuller commented on CASSANDRA-3641:

I'll fix the comment (it was written before I understood fully the role of deltas).

As for the JMX counter: I kind of see your concern, but at the same time - most people that
have monitoring of Cassandra at all will have setups to easily monitor/graph/alert on JMX
exposed values and I really think it's a shame if we can't add additional instrumentation
because it would confuse users.

How about putting it somewhere else, where it's clearly nothing you need to worry about normally?
I actually had an original patch before I submitted upstream where I had created a separate
MBean I called "RedFlags" because I found no good place to put the counter. The idea was that
it felt completely overkill to have a dedicated MBean for the purpose, but at the same time
I really wanted it accounted for. RedFlags was intended as a place to put counters that you
essentially always expect to be exactly 0 during healthy production use.

I could see putting more stuff there like exception counts in places where any exception indicates
a severe problem, or a count of out of disk space conditions preventing or affecting (different
bucket) compaction, or a count of GC pauses above a certain threshold, etc.
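
To make the "RedFlags" idea concrete, here is a minimal sketch of what such an MBean could look like. All names here (RedFlagsSketch, getCorruptCounterShards, the object name passed to register) are assumptions made up for illustration and are not taken from the attached patches; the only real API used is the standard javax.management one.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: none of these names come from the Cassandra code base.
class RedFlagsSketch {
    // Standard MBean convention: the interface is public and named <Class>MBean.
    public interface RedFlagsMBean {
        long getCorruptCounterShards();
    }

    public static final class RedFlags implements RedFlagsMBean {
        // Expected to stay at exactly 0 on a healthy node.
        private final AtomicLong corruptCounterShards = new AtomicLong();

        // Called from whatever code path detects the anomaly.
        public void corruptCounterShardDetected() {
            corruptCounterShards.incrementAndGet();
        }

        @Override
        public long getCorruptCounterShards() {
            return corruptCounterShards.get();
        }
    }

    // Registers the bean on the platform MBean server under a caller-supplied name.
    static ObjectName register(RedFlags flags, String name) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(flags, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("could not register " + name, e);
        }
    }
}
```

With something like this in place, an alert on "any attribute of this bean is non-zero" would cover every counter added later without touching the monitoring configuration again.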

If you agree I'll volunteer to go through and add some things I can think of, along with this

Else I can certainly re-submit without the JMX counter. Or just submit a separate JIRA for
it (but that's only worth it if you might be okay with a RedFlags style approach and it's
not just this one counter).

> inconsistent/corrupt counters w/ broken shards never converge
> -------------------------------------------------------------
>                 Key: CASSANDRA-3641
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt
> We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters
that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist
shards with the *same* node_id, *same* clock id, but *different* counts.
> The counter column diffing and reconciliation code assumes that this never happens, and
ignores the count. The problem with this is that if there is an inconsistency, the result
of a reconciliation will depend on the order of the shards.
> In our case for example, we would see the value of the counter randomly fluctuating on
a CL.ALL read, but we would get a consistent value (whatever the node had) on CL.ONE (submitted to
one of the nodes in the replica set for the key).
> In addition, read repair would not work despite digest mismatches because the diffing
algorithm also did not care about the counts when determining the differences to send.
> I'm attaching patches that fix this. The first patch is against our 0.8 branch, which
is not terribly useful to people, but I include it because it is the well-tested version that
we have used on the production cluster which was subject to this corruption.
> The other patch is against trunk, and contains the same change.
> What the patch does is:
> * On diffing, treat as DISJOINT if there is a count discrepancy.
> * On reconciliation, look at the count and *deterministically* pick the higher one, and:
> ** log the fact that we detected a corrupt counter
> ** increment a JMX observable counter for monitoring purposes
> A cluster which is subject to such corruption and has this patch, will fix itself with
an AES + compact (or just repeated compactions assuming the replicate-on-compact is able
to deliver correctly).
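
The diffing and reconciliation rules described above can be sketched roughly as follows. This is a simplified model for illustration only, not the code in the attached patches: the Shard class, the Relationship enum and the static counter are assumptions, and a real counter context holds many shards rather than a single pair.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model for illustration; not the actual Cassandra classes.
class CounterShardSketch {
    enum Relationship { EQUAL, GREATER_THAN, LESS_THAN, DISJOINT }

    // One shard of a counter: (node id, logical clock, count).
    static final class Shard {
        final String nodeId;
        final long clock;
        final long count;

        Shard(String nodeId, long clock, long count) {
            this.nodeId = nodeId;
            this.clock = clock;
            this.count = count;
        }
    }

    // Stand-in for the JMX-observable counter mentioned in the description.
    static final AtomicLong corruptShardsDetected = new AtomicLong();

    // Compares two shards that belong to the same node id.
    static Relationship diff(Shard left, Shard right) {
        if (left.clock != right.clock) {
            return left.clock > right.clock ? Relationship.GREATER_THAN : Relationship.LESS_THAN;
        }
        // Same clock: a count discrepancy means neither side can be trusted to subsume the other.
        return left.count == right.count ? Relationship.EQUAL : Relationship.DISJOINT;
    }

    // Picks a winner that does not depend on argument order.
    static Shard reconcile(Shard left, Shard right) {
        if (left.clock != right.clock) {
            return left.clock > right.clock ? left : right;
        }
        if (left.count != right.count) {
            corruptShardsDetected.incrementAndGet();
            System.err.println("corrupt counter shard detected for node " + left.nodeId);
            return left.count > right.count ? left : right;
        }
        return left;
    }
}
```

For two shards with the same node id and clock but counts 10 and 12, reconcile returns the shard with count 12 whichever one is passed first, which is the property that stops the value from depending on shard order.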
