First I need to vent.


<rant>

One of my Cassandra clusters is a dual data center setup, with DC1 acting as primary and DC2 acting as a hot backup.


Well, guess what? I am pretty sure that DC2 falls behind on replication, so I am told I need to run repair.


I run repair (with -pr) on DC2. The first time I run it, it gets *stuck* (i.e. frozen) within the first 30 seconds, with no error or message of any sort. I then run it again, and it completes in seconds on each node, even though each node holds about 50 GB of data.


That seems suspicious, so I do some research.


I am told on IRC that running repair -pr will only repair "100" tokens (the token offset from DC1 to DC2)… Seriously???
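
To sanity-check that claim, here is a minimal sketch of the arithmetic. It assumes one token per node (no vnodes), RandomPartitioner's 0 to 2**127 token space, and that a node's primary range runs from the previous token on the combined ring up to its own token. The node names and token values are made up for illustration and are not taken from the real cluster.

```python
# Sketch only: hypothetical 3-nodes-per-DC ring, single token per node,
# DC2 tokens equal to the DC1 tokens plus an offset of 100.
RING_SIZE = 2**127  # RandomPartitioner token space (assumed)


def primary_ranges(tokens_by_node):
    """Map each node to (previous_token, own_token, width) of its primary range."""
    ring = sorted(tokens_by_node.items(), key=lambda kv: kv[1])
    result = {}
    for i, (node, token) in enumerate(ring):
        prev_token = ring[i - 1][1]  # index -1 wraps around to the last node
        width = (token - prev_token) % RING_SIZE
        result[node] = (prev_token, token, width)
    return result


dc1 = {f"dc1-n{i}": i * RING_SIZE // 3 for i in range(3)}
dc2 = {f"dc2-n{i}": i * RING_SIZE // 3 + 100 for i in range(3)}
ranges = primary_ranges({**dc1, **dc2})

# Total number of tokens covered by the primary ranges of the DC2 nodes.
dc2_total = sum(w for n, (_, _, w) in ranges.items() if n.startswith("dc2"))

for node, (start, end, width) in sorted(ranges.items()):
    print(f"{node}: ({start}, {end}] width={width}")
```

If this model is right, every DC2 node owns a primary range only 100 tokens wide, so repair -pr run on DC2 alone touches almost none of the ring. Covering the whole ring with -pr would then mean running it on every node in both data centers.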


The repair process is, indeed, a joke: https://issues.apache.org/jira/browse/CASSANDRA-5396 . Repair is the worst thing you can do to your cluster: it consumes enormous resources and can leave your cluster in an inconsistent state. Oh, and by the way, you must run it every week… Whoever invented that process must not live in the real world, with real applications.

</rant>


No… let's have a constructive conversation.


How do I know, with certainty, that my DC2 cluster is up to date on replication? I have a few options:


1) I set read repair chance to 100% on critical column families and write a tool to scan every CF, every column of every row. This strikes me as very silly.

Q1: Do I need to scan every column, or is reading a single column enough to trigger a read repair?


2) Can someone explain to me how repair works, so that I don't totally trash my cluster or spill into the work week?


Is there any improvement or added clarity in 1.2? How about 2.0?




-- 

Regards,

Oleg Dulin

http://www.olegdulin.com