On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche <me@carllerche.com> wrote:
I read in the docs that `nodetool repair` should be run regularly unless no delete is ever performed. In my app, I never delete, but I heavily use the TTL feature. Should repair still be run regularly? Also, does repair take less time if it is run regularly? If not, is there a way to run it incrementally? It seems that when I do run repair, it takes a long time and causes high CPU usage and iowait.

TTL is effectively DELETE; you need to run a repair at least once every gc_grace_seconds. If you don't, data might un-delete itself. Even if you don't care about data un-deleting itself, you still need to run repair occasionally to ensure overall consistency. Hinted handoff and read repair are only optimizations and have no official responsibility for providing consistency.
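To make the "once every gc_grace_seconds" rule concrete, here is a minimal sketch of the check a scheduler could make. The function name and the idea of tracking the last repair time as a Unix timestamp are my assumptions for illustration; Cassandra does not provide this helper.

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # the 10-day default


def repair_overdue(last_repair_epoch, now_epoch,
                   gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if a full gc_grace_seconds window has passed since the last repair.

    Hypothetical helper: both timestamps are Unix seconds that you would
    have to record yourself, e.g. when your repair job finishes.
    """
    return (now_epoch - last_repair_epoch) >= gc_grace_seconds
```

In practice you would want to start the next repair well before this returns True, since repair itself takes time to complete.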

If you struggle with the overhead of repair, one way to reduce the pain is to increase gc_grace_seconds, which lets you run repair less often. The default of 10 days (864000 seconds) is arbitrary and IMO too low. Something more like 30 days means you pay the fixed, very high cost of repair a third as often, at the cost of keeping tombstones around for 3x as long.
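The arithmetic behind the 10-day versus 30-day comparison, as a small sketch. The keyspace and table names are placeholders I made up, and the generated statement assumes you are setting gc_grace_seconds per table through CQL3.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(days):
    """Convert a gc_grace period in days to the seconds value Cassandra expects."""
    return days * SECONDS_PER_DAY


def alter_statement(keyspace, table, days):
    """Build a CQL3 statement that sets gc_grace_seconds on one table.

    keyspace and table are hypothetical names supplied by the caller.
    """
    return "ALTER TABLE {}.{} WITH gc_grace_seconds = {};".format(
        keyspace, table, days_to_seconds(days))


default_seconds = days_to_seconds(10)   # 864000, the default
proposed_seconds = days_to_seconds(30)  # 2592000
tombstone_factor = proposed_seconds // default_seconds  # 3x tombstone retention

print(alter_statement("my_keyspace", "my_table", 30))
```

The same factor of 3 applies both ways: repair is required a third as often, and tombstones live three times as long before compaction can drop them.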

If you are running a version below 1.2.6, and especially below 1.2.0, combining TTL with repair can lead to insane over-repair.

https://issues.apache.org/jira/browse/CASSANDRA-4905
https://issues.apache.org/jira/browse/CASSANDRA-5398

There is also a mechanism for incremental (manually managed) repair, where you repair one token subrange at a time.

https://issues.apache.org/jira/browse/CASSANDRA-3912
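The "manually managed" part means something outside Cassandra has to decide which subranges to repair and in what order. A rough sketch of that bookkeeping follows, assuming a non-wrapping token range and Murmur3Partitioner bounds in the usage example. The function name is mine, and how you hand each subrange to repair depends on your version, so that step is deliberately left out.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into up to `parts` contiguous subranges.

    Hypothetical helper; assumes start < end, i.e. the range does not wrap
    around the ring. Returns a list of (sub_start, sub_end) tuples.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if start >= end:
        raise ValueError("start must be less than end (no wrap-around)")
    width = end - start
    # Integer arithmetic avoids float rounding on 64-bit or 127-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    # Drop empty subranges, which appear when parts exceeds the range width.
    return [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if a < b]


# Usage: the full Murmur3Partitioner token space in 8 pieces.
MURMUR3_MIN = -2**63
MURMUR3_MAX = 2**63 - 1
for sub_start, sub_end in split_range(MURMUR3_MIN, MURMUR3_MAX, 8):
    print(sub_start, sub_end)
```

Repairing one small subrange at a time spreads the CPU and iowait cost out, and lets you pause or resume between subranges. In a real cluster you would split each node's own primary ranges rather than the whole ring.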

=Rob