cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-4436) Counters in columns don't preserve correct values after cluster restart
Date Wed, 18 Jul 2012 18:33:34 GMT


Sylvain Lebresne updated CASSANDRA-4436:

    Attachment: 4436-1.1.txt

Thanks a lot, Peter, for helping to reproduce this issue.

The problem is that when a node stops just as a compaction is finishing, some of the files
that were compacted may not have -Compacted components even though the resulting compacted
file is no longer temporary. (The same holds when a node is drained: we don't wait for all
compactions to end during drain, as this could mean waiting for a very long time, at least
with SizeTieredCompaction.) In other words, when the node is restarted, it may load both the
compacted file and some of the files that were compacted to produce it. While this is harmless
(though inefficient) for normal column families, it means overcounting for counters.
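
To illustrate why the duplicate load only hurts counters, here is a deliberately simplified
sketch. It is not Cassandra code: the function names are hypothetical, a regular column is
modelled as (timestamp, value) pairs resolved by highest timestamp, and a counter is modelled
as plain per-file deltas that are summed on read.

```python
def merge_regular(cells):
    """Regular column (simplified): the cell with the highest timestamp wins."""
    return max(cells, key=lambda cell: cell[0])


def merge_counter(deltas):
    """Counter (simplified assumption): deltas from every loaded file are summed."""
    return sum(deltas)


# Two source files and the file obtained by compacting them.
regular_sources = [(1, "old"), (2, "new")]
regular_compacted = merge_regular(regular_sources)

counter_sources = [3, 4]
counter_compacted = merge_counter(counter_sources)

# After a badly timed restart the node loads the sources AND the compacted file.
regular_after_restart = merge_regular(regular_sources + [regular_compacted])
counter_after_restart = merge_counter(counter_sources + [counter_compacted])

print(regular_after_restart)                     # same winning cell as before
print(counter_compacted, counter_after_restart)  # correct total vs. inflated total
```

In this model the regular column still resolves to the same cell, while the counter total
includes the source deltas twice.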

Note that even though I can't reproduce the counter bug on 1.1 with the test case above,
that is just "luck": 1.1 is affected as well.

What we need to guarantee is that we never use both a compacted file and one of its
ancestors. One way to ensure that is to keep, in the metadata of the compacted file, the list
of its ancestors (we only need to keep the generations). Then, when a node starts, it can gather
all the ancestors of all the sstables in the data dir and delete every sstable that is
in this ancestor set. Since we don't want to keep an ever-growing list of ancestors, however, a
newly compacted sstable only needs to keep the list of its still-live ancestors (which 99%
of the time means keeping only the generations of the files that were compacted to obtain it).
Note that if we do that, we don't need to generate -Compacted components.
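
The scheme described above can be sketched roughly as follows. This is an illustrative model
only, not the attached patch: the SSTable class, compact() and sstables_to_load() are
hypothetical names, and an sstable is reduced to its generation number plus the set of
ancestor generations stored in its metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    generation: int
    ancestors: frozenset = frozenset()  # generations recorded in the metadata


def compact(inputs, new_generation, live_generations):
    """Metadata for a newly compacted sstable.

    Records the generation of every input, plus any ancestor of an input that
    is still live on disk, so the list does not grow forever.
    """
    ancestors = set()
    for sstable in inputs:
        ancestors.add(sstable.generation)
        ancestors.update(a for a in sstable.ancestors if a in live_generations)
    return SSTable(new_generation, frozenset(ancestors))


def sstables_to_load(on_disk):
    """At startup: skip (and, in a real system, delete) every sstable whose
    generation appears in the ancestor set of some other sstable."""
    all_ancestors = set()
    for sstable in on_disk:
        all_ancestors |= sstable.ancestors
    return [s for s in on_disk if s.generation not in all_ancestors]


s1, s2 = SSTable(1), SSTable(2)
s3 = compact([s1, s2], new_generation=3, live_generations={1, 2})

# The node stopped before s1 and s2 were removed: all three files are on disk.
loaded = sstables_to_load([s1, s2, s3])

# Later, s1 and s2 are gone; compacting s3 with a new s4 keeps only live ancestors.
s4 = SSTable(4)
s5 = compact([s3, s4], new_generation=5, live_generations={3, 4})
```

Here `loaded` contains only `s3`, and `s5` records generations 3 and 4 but not the already
deleted 1 and 2.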

Attaching patches that implement this, one for 1.0 and one for 1.1 (they aren't very different).
I wrote the 1.0 version because that is the version on which I knew how to reproduce the counter
bug reliably, and I've checked that this patch does fix the issue. However, this patch doesn't
only affect counter code and is not trivial per se, so I don't know how I feel about risking
breaking things on 1.0 for non-counter users at this point. I think it might be wiser to
put this in 1.1.3 only and say that counter users should either apply the attached patch at
their own risk or upgrade to 1.1.3.

> Counters in columns don't preserve correct values after cluster restart
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-4436
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.10
>            Reporter: Peter Velas
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.3
>         Attachments: 4436-1.0.txt, 4436-1.1.txt, increments.cql.gz
> Similar to #3821, but affecting normal columns.
> Set up a 2-node cluster with rf=2.
> 1. Create a counter column family and increment 100 keys in a loop 5000 times.
> 2. Then do a rolling restart of the cluster.
> 3. Again increment another 5000 times.
> 4. Do a rolling restart of the cluster.
> 5. Again increment another 5000 times.
> 6. Do a rolling restart of the cluster.
> After step 6 we were able to reproduce the bug with bad counter values.
> Expected values were 15000. Values returned from the cluster are higher: 15000 plus some
> random number.
> Rolling restarts are done with nodetool drain, always waiting until the second node discovers
> the first is down, then killing the java process.


