As the OP of this thread, I can say it is a big itch for my use case. Repair ends up streaming tens of gigabytes of data whose TTL has expired and which has been compacted away on some nodes but not yet on others. The wasted work is not nice, and it also drives up the memory usage (for bloom filters, indexes, etc.) of all nodes, since there are many more rows to track than planned. Disabling the periodic repair lowered the per-node load by 100GB, all of which was dead data in my case.
Created CASSANDRA-4917. I changed the example implementation to use (localExpirationTime-timeToLive) for the tombstone. I agree this is not the biggest itch to scratch. But it might save a few seeks here and there :-)
That's true, we could just create an already gcable tombstone. It's a bit of an abuse of the localDeletionTime but why not. Honestly, a good part of the reason we haven't done anything yet is that we never really had a case where tombstones of expired columns were a big pain point. Again, feel free to open a ticket (but what we should do is retrieve the ttl from the localExpirationTime when creating the tombstone rather than use the creation time, partly because that creation time is a user-provided timestamp so we can't use it, and because we must still keep tombstones if the ttl < gcGrace).
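To make the arithmetic concrete, here is a rough Python model of the idea (not Cassandra code). The names `ExpiringColumn`, `tombstone_deletion_time` and `is_gcable` are made up for illustration, and the purge rule (`localDeletionTime < now - gcGrace`) is my assumption about how the gc check behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpiringColumn:
    """Hypothetical stand-in for an expiring column; times are in seconds."""
    local_expiration_time: int  # node-local clock time at which the TTL runs out
    time_to_live: int           # TTL the column was written with


def tombstone_deletion_time(col: ExpiringColumn) -> int:
    """Back-date the tombstone to when the column was written locally.

    This is the (localExpirationTime - timeToLive) proposal: the value is
    derived from node-local fields only, never from the user timestamp.
    """
    return col.local_expiration_time - col.time_to_live


def is_gcable(local_deletion_time: int, now: int, gc_grace: int) -> bool:
    """Assumed purge rule: a tombstone can go once it is older than gc_grace."""
    return local_deletion_time < now - gc_grace


GC_GRACE = 10 * 24 * 3600  # 10 days

# TTL (30 days) longer than gc_grace: gcable as soon as the column expires.
long_ttl = ExpiringColumn(local_expiration_time=30 * 24 * 3600,
                          time_to_live=30 * 24 * 3600)

# TTL (1 hour) shorter than gc_grace: the tombstone must still be kept.
short_ttl = ExpiringColumn(local_expiration_time=3600, time_to_live=3600)
```

With the back-dated value, `long_ttl` yields a tombstone that is purgeable the moment the column expires, while `short_ttl` still has to wait out the rest of gc_grace, which is the ttl < gcGrace case mentioned above.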
Did you also have a look at DeletedColumn? It uses the updateDigest implementation from its parent class, which also applies the value to the digest. Unfortunately the value is the localDeletionTime, which is generated on each node individually, right? (in RowMutation.delete)
The resolution of the time is low, so there is a good chance the timestamps will match on all nodes, but that is nothing to rely on.
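A small Python sketch of why this matters (again a model, not the real updateDigest). The function name `deleted_column_digest` and the exact field layout (name, then a 4-byte value holding the local deletion time, then an 8-byte timestamp) are assumptions for illustration only.

```python
import hashlib
import struct


def deleted_column_digest(name: bytes, local_deletion_time: int, timestamp: int) -> str:
    """Digest a tombstone the way a generic column digest would:
    name, value and timestamp all go in. For a deleted column the
    'value' is the node-local deletion time (assumed 4-byte big-endian int).
    """
    md = hashlib.md5()
    md.update(name)
    md.update(struct.pack(">i", local_deletion_time))  # node-local, in seconds
    md.update(struct.pack(">q", timestamp))            # client timestamp, same on every node
    return md.hexdigest()


# Same delete, applied on two replicas within the same second: digests agree.
node_a = deleted_column_digest(b"col", 1352000000, 1352000000123000)
node_b = deleted_column_digest(b"col", 1352000000, 1352000000123000)

# Same delete, but the third replica applied it one second later: digest differs.
node_c = deleted_column_digest(b"col", 1352000001, 1352000000123000)
```

Here `node_a == node_b` but `node_a != node_c`, even though all three replicas hold logically the same tombstone. In this model that is enough to make a digest comparison report a mismatch.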