Hello,

I have a cluster running Cassandra 1.2.12. On one node I'm getting exceptions about corruption detected in one of the DB files. The exceptions occurred when I ran the nodetool upgradesstables command, and after the exception upgradesstables could not continue.
I then ran nodetool scrub on the corrupted column family. It failed with the same exception. Next I tried offline scrubbing with the sstablescrub utility. It complained about the same corrupted file, although I had expected it to throw away the corrupted rows and rebuild the file. Below are some log messages from sstablescrub:

Scrubbing SSTableReader(path='/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db')
WARNING: Non-fatal error reading row (stacktrace follows)
WARNING: Row at 83742783839 is unreadable; skipping to next
Error scrubbing SSTableReader(path='/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db'): org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db): corruption detected, chunk at 71488440767 of length 55562.


After sstablescrub had finished, the corrupted file did not appear to have been modified. All files had been recreated except for the corrupted one.
I tried to run upgradesstables anyway, this time in offline mode with the sstableupgrade utility.
It failed with a similar exception:

Found 1 sstables that need upgrading.
Upgrading SSTableReader(path='/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db')
Error upgrading SSTableReader(path='/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db'): org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db): corruption detected, chunk at 71488440767 of length 55562.
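
To check that scrub and upgrade are failing on the same compressed chunk, the two error lines can be compared mechanically. This is only a minimal sketch: the regular expression is written against the message format seen in the two log lines above, and the function name parse_corrupt_block is made up for illustration.

```python
import re

# Pattern written against the CorruptBlockException message format in the logs above.
CORRUPT_RE = re.compile(
    r"CorruptBlockException: \((?P<path>[^)]+)\): corruption detected, "
    r"chunk at (?P<offset>\d+) of length (?P<length>\d+)"
)

def parse_corrupt_block(line):
    """Return (path, offset, length) from a log line, or None if it does not match."""
    m = CORRUPT_RE.search(line)
    if m is None:
        return None
    return m.group("path"), int(m.group("offset")), int(m.group("length"))

# The two error lines, copied from the sstablescrub and sstableupgrade output.
scrub_line = (
    "Error scrubbing SSTableReader(path='/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db'): "
    "org.apache.cassandra.io.compress.CorruptBlockException: "
    "(/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db): "
    "corruption detected, chunk at 71488440767 of length 55562."
)
upgrade_line = (
    "Error upgrading SSTableReader(path='/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db'): "
    "org.apache.cassandra.io.compress.CorruptBlockException: "
    "(/var/lib/cassandra/data/Foo/Bar/Foo-Bar-hf-6247-Data.db): "
    "corruption detected, chunk at 71488440767 of length 55562."
)

# True when both tools report the same file, chunk offset and chunk length.
same_chunk = parse_corrupt_block(scrub_line) == parse_corrupt_block(upgrade_line)
```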


The cluster has RF = 3, so it should be safe to just remove this file and run repair. But I would prefer to fix this file rather than remove it. Is that possible?
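
For the removal option, a minimal sketch of moving the sstable aside instead of deleting it outright. It assumes Cassandra is stopped on that node while the files are moved, and that every component of the sstable shares the Foo-Bar-hf-6247- file name prefix seen in the logs above. The function names and the quarantine directory are hypothetical.

```python
import shutil
from pathlib import Path

def sstable_components(cf_dir, prefix):
    """List every file in cf_dir whose name starts with the sstable prefix."""
    return sorted(
        p for p in Path(cf_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )

def quarantine(cf_dir, prefix, dest_dir):
    """Move all components of one sstable out of the data directory.

    Returns the moved file names so they can be checked before running repair.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sstable_components(cf_dir, prefix):
        shutil.move(str(src), str(dest / src.name))
        moved.append(src.name)
    return moved

# Example call (the destination directory is a placeholder, not a real path on my node):
# quarantine("/var/lib/cassandra/data/Foo/Bar", "Foo-Bar-hf-6247-", "/var/tmp/quarantine")
```

The trailing dash in the prefix matters: without it, a prefix such as Foo-Bar-hf-624 would also match other generations. Keeping the files in a separate directory leaves the option of restoring them if repair does not bring the data back.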

I've compared this CF on each node with cfstats, and on the node with the corrupted data the CF is about twice the size it is on the other nodes. The corruption has been there for a long time, so all repairs have been failing with this exception. I'm a bit worried about whether this CF is properly replicated.
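
The size comparison can be sketched as a small check that flags any node whose CF size is far above the median. The node names and sizes below are made-up placeholders, not my actual cfstats output, and the 1.5 threshold is an arbitrary assumption.

```python
from statistics import median

def size_outliers(sizes_by_node, factor=1.5):
    """Return the nodes whose CF size exceeds `factor` times the median size."""
    mid = median(sizes_by_node.values())
    return sorted(node for node, size in sizes_by_node.items() if size > factor * mid)

# Hypothetical sizes (any consistent unit), with one node at about double the others.
sizes = {"node1": 100, "node2": 105, "node3": 210}
outliers = size_outliers(sizes)
```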

Do you have any suggestions on how to safely recover this CF?

Thank you,
Michal