On Tue, Nov 6, 2012 at 8:27 AM, horschi <horschi@gmail.com> wrote:
it is a big itch for my use case.  Repair ends up streaming tens of gigabytes of data whose TTL has expired and which has been compacted away on some nodes but not yet on others.  The wasted work is bad enough, and it also drives up the memory usage (for bloom filters, indexes, etc.) of all nodes, since there are many more rows to track than planned.  Disabling the periodic repair lowered the per-node load by 100 GB, all of it dead data in my case.

What is the issue with your setup? Do you use TTLs, or do you think it's due to DeletedColumns?  Was your intention to propose removing localDeletionTime from DeletedColumn.updateDigest?

I don't know enough about the code-level implementation to comment on the validity of the fix.  My main issue is that we use a lot of TTL columns, and in many cases all columns have a TTL that is less than gc_grace.  The problem arises when the columns are gc-able and have been compacted away on one node but not on all replicas: the periodic repair process then ends up copying all the garbage columns and rows back to the other replicas.  It consumes a lot of repair resources and makes rows stick around for much longer than they really should, which consumes even more cluster resources.
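
To illustrate the updateDigest question above, here is a minimal sketch of why including localDeletionTime in a column's repair digest could cause spurious streaming. It is not Cassandra's actual code: the digest method, field choices, and values below are simplified stand-ins. The idea is that localDeletionTime is recorded independently on each replica (the second at which that node observed the deletion/expiration), so two replicas holding the same logical tombstone can hash to different digests, which the Merkle tree comparison then treats as a mismatch to repair.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class DigestSketch {
    // Hypothetical stand-in for DeletedColumn.updateDigest: mixes the column
    // name and write timestamp into an MD5 digest, and optionally the
    // node-local deletion time as well.
    static byte[] digest(String name, long timestamp, int localDeletionTime,
                         boolean includeLocalDeletionTime) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(name.getBytes(StandardCharsets.UTF_8));
        md.update(ByteBuffer.allocate(8).putLong(timestamp).array());
        if (includeLocalDeletionTime)
            md.update(ByteBuffer.allocate(4).putInt(localDeletionTime).array());
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Same logical tombstone; each replica recorded its own local
        // deletion time, one second apart (illustrative values).
        byte[] replicaA = digest("col", 1352200000000L, 1352203600, true);
        byte[] replicaB = digest("col", 1352200000000L, 1352203601, true);
        // Digests differ, so repair would stream data that is semantically identical.
        System.out.println("with localDeletionTime, digests match: "
                           + Arrays.equals(replicaA, replicaB)); // false

        byte[] a2 = digest("col", 1352200000000L, 1352203600, false);
        byte[] b2 = digest("col", 1352200000000L, 1352203601, false);
        // Excluding the node-local field makes the replicas agree.
        System.out.println("without localDeletionTime, digests match: "
                           + Arrays.equals(a2, b2)); // true
    }
}
```

Note this only covers the digest-mismatch case; the compacted-away-on-one-node-only case described above would cause streaming regardless, since one replica simply no longer has the row at all.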