cassandra-user mailing list archives

From horschi <>
Subject Re: repair, compaction, and tombstone rows
Date Tue, 06 Nov 2012 16:27:00 GMT
Hi Bryan,

> As the OP of this thread,

sorry for stealing your thread btw ;-)

> it is a big itch for my use case.  Repair ends up streaming tens of
> gigabytes of data which has expired TTL and has been compacted away on some
> nodes but not yet on others.  The wasted work is not nice plus it drives up
> the memory usage (for bloom filters, indexes, etc) of all nodes since there
> are many more rows to track than planned.  Disabling the periodic repair
> lowered the per-node load by 100GB which was all dead data in my case.

What is the issue with your setup? Do you use TTLs, or do you think it's due
to DeletedColumns? Was your intention to push the idea of removing
localDeletionTime from DeletedColumn.updateDigest?
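
To make the DeletedColumn concern concrete, here is a minimal Python sketch. It is
not Cassandra's actual code: column_digest, its parameters, and the byte layout
are assumptions made purely for illustration. It only shows that hashing a
per-node localDeletionTime can make two replicas disagree on the digest of the
same delete, while leaving it out makes the digests match.

```python
import hashlib
import struct


def column_digest(name: bytes, timestamp: int, local_deletion_time: int,
                  include_local_deletion_time: bool) -> str:
    """Hypothetical stand-in for a tombstone's contribution to a row digest."""
    md = hashlib.md5()
    md.update(name)
    if include_local_deletion_time:
        # Assumption: the tombstone's "value" is its local deletion time
        # in seconds, serialized as a 4-byte big-endian integer.
        md.update(struct.pack(">i", local_deletion_time))
    # The client-supplied timestamp is identical on every replica.
    md.update(struct.pack(">q", timestamp))
    return md.hexdigest()


# The same delete, applied one second apart on two replicas (made-up values).
name = b"col1"
client_timestamp = 1352219220000000
replica_a_deletion_time = 1352219220
replica_b_deletion_time = 1352219221

with_ldt_a = column_digest(name, client_timestamp, replica_a_deletion_time, True)
with_ldt_b = column_digest(name, client_timestamp, replica_b_deletion_time, True)
without_ldt_a = column_digest(name, client_timestamp, replica_a_deletion_time, False)
without_ldt_b = column_digest(name, client_timestamp, replica_b_deletion_time, False)

print("digests match with localDeletionTime:   ", with_ldt_a == with_ldt_b)
print("digests match without localDeletionTime:", without_ldt_a == without_ldt_b)
```

If the two nodes happen to record the delete within the same second, the first
comparison would match as well, which is why the problem may go unnoticed.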


> On Mon, Nov 5, 2012 at 5:12 PM, horschi <> wrote:
>>> That's true, we could just create an already gcable tombstone. It's a bit
>>> of an abuse of the localDeletionTime but why not. Honestly a good part of
>>> the reason we haven't done anything yet is because we never really had
>>> anything for which tombstones of expired columns were a big pain point.
>>> Again, feel free to open a ticket (but what we should do is retrieve the
>>> ttl from the localExpirationTime when creating the tombstone, not using the
>>> creation time (partly because that creation time is a user provided
>>> timestamp so we can't use it, and because we must still keep tombstones if
>>> the ttl < gcGrace)).
>> Created CASSANDRA-4917. I changed the example implementation to use
>> (localExpirationTime-timeToLive) for the tombstone. I agree this is not the
>> biggest itch to scratch. But it might save a few seeks here and there :-)
>> Did you also have a look at DeletedColumn? It uses the updateDigest
>> implementation from its parent class, which also applies the value to the
>> digest. Unfortunately the value is the localDeletionTime, which is being
>> generated on each node individually, right? (at RowMutation.delete)
>> The resolution of the time is low, so there is a good chance the
>> timestamps will match on all nodes, but that should be nothing to rely on.
>> cheers,
>> Christian
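
The tombstone timing discussed in the quoted exchange can be sketched as
follows. This is a hedged illustration of the idea only, not the patch attached
to CASSANDRA-4917: the function names are hypothetical, and the purge check
simply assumes a tombstone becomes collectable once its local deletion time plus
gcGrace lies in the past.

```python
def tombstone_deletion_time(local_expiration_time: int, time_to_live: int) -> int:
    """Back-date the tombstone to the node-local write time of the column."""
    return local_expiration_time - time_to_live


def is_gcable(local_deletion_time: int, gc_grace_seconds: int, now: int) -> bool:
    """Assumed purge rule: collectable once gcGrace has fully elapsed."""
    return local_deletion_time + gc_grace_seconds < now


GC_GRACE = 864000  # ten days in seconds, used here only as an example value

# Case 1: ttl >= gcGrace. The tombstone is already gcable when the column expires.
long_ttl = 1000000
expires_at = 1000 + long_ttl  # column written locally at t=1000
ldt_long = tombstone_deletion_time(expires_at, long_ttl)
gcable_at_expiry_long = is_gcable(ldt_long, GC_GRACE, now=expires_at)

# Case 2: ttl < gcGrace. The tombstone must be kept for the rest of gcGrace.
short_ttl = 100
expires_at_short = 1000 + short_ttl
ldt_short = tombstone_deletion_time(expires_at_short, short_ttl)
gcable_at_expiry_short = is_gcable(ldt_short, GC_GRACE, now=expires_at_short)
gcable_after_grace = is_gcable(ldt_short, GC_GRACE, now=1000 + GC_GRACE + 1)

print(gcable_at_expiry_long, gcable_at_expiry_short, gcable_after_grace)
```

Because the deletion time is derived from localExpirationTime rather than from
the user-provided timestamp, the result does not depend on what clients put in
the column timestamp.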
