I have a very reliable repro case on our cluster involving nodetool repair. I posted a summary in a comment on the issue. Let me know if more details are needed.


On Fri, Sep 7, 2012 at 8:35 AM, Sylvain Lebresne <sylvain@datastax.com> wrote:
> Is there a way to fix this error ? What is its impact on my data ?

The fact that the message shows up means that Cassandra has attempted to
"repair" the problem, so there isn't much to do. However, the fact that
you get the messages in the first place means that there is a bug
somewhere that generates them.
Now, as Peter said, we don't know what bug generates this problem.

> What is its impact on my data ?

The problem is that, as Peter said, we actually don't know what is
causing this. What the message says, though, is that two
different values have been found for a given counter (it's two
different values for a sub-part of the counter, but that's a technical
detail). What the code does to "repair" in that case is to pick
the higher of the two values it has. But honestly that's arbitrary:
there's a 50/50 chance that it will pick the right value.
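
To make the "pick the higher value" behaviour concrete, here is a minimal
sketch in Python. It is not the actual Cassandra code: the Shard type, its
field names, and the reconcile function are hypothetical, and the sketch
assumes that each sub-part of a counter carries a logical clock, so that a
conflict is two sub-parts with the same clock but different counts.

```python
import logging
from typing import NamedTuple


class Shard(NamedTuple):
    """Hypothetical model of one sub-part of a counter."""
    owner: str   # node that owns this sub-part (assumed field)
    clock: int   # logical clock of the last update (assumed field)
    count: int   # value accumulated by that node


def reconcile(a: Shard, b: Shard) -> Shard:
    """Merge two copies of the same sub-part of a counter."""
    if a.owner != b.owner:
        raise ValueError("shards belong to different owners")
    if a.clock != b.clock:
        # Normal case: the copy with the newer clock wins.
        return a if a.clock > b.clock else b
    if a.count != b.count:
        # Conflict: same clock, different values. Neither copy is known
        # to be correct, so keeping the higher count is only a guess.
        logging.warning("conflicting counter values %d and %d; keeping the higher",
                        a.count, b.count)
        return a if a.count > b.count else b
    return a  # identical copies


merged = reconcile(Shard("node1", 5, 10), Shard("node1", 5, 12))
```

In this sketch merged.count ends up as 12, whether or not 12 is the value
the counter should really have had.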

The main problem is that I have no clue how to reproduce this easily,
which makes it really hard to track down. If someone finds a way to
reproduce it, please do share by all means (typically on
https://issues.apache.org/jira/browse/CASSANDRA-4417). What
I can suggest is that if you have a log with multiple instances of
said log message, you attach it to the ticket. I can have a look to
see if there is some pattern between the different occurrences that
suggests a reason why this happens. But to be honest I have some doubts
that it will help much short of having a way to reproduce.

I will also note that we did fix a bug that was affecting counters
in 1.1.3 (https://issues.apache.org/jira/browse/CASSANDRA-4436). I
don't really think this could be the cause of what you are seeing, but
there is a slim chance that I'm wrong on that. So it's probably worth
upgrading to be sure.