cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Tombstone removal optimization and question
Date Tue, 06 Nov 2018 13:20:22 GMT
Thanks for the confirmation Kurt

On 6 Nov 2018 11:59, "kurt greaves" <kurt@instaclustr.com> wrote:

> Yes it does. If it didn't, and you kept writing to the same partition,
> you'd never be able to remove any tombstones for that partition.
>
> On Tue., 6 Nov. 2018, 19:40 DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>> Hello all
>>
>> I have tried to sum up all rules related to tombstone removal:
>>
>> ----------------------------------------------------------------------------------
>>
>> Consider a tombstone written at timestamp (t) for a partition key (P) in
>> SSTable (S1). This tombstone will be removed only when all of the
>> following hold:
>>
>> 1) the gc_grace_seconds period has passed
>> 2) SSTable S1 is selected in a compaction round (not at all guaranteed,
>> because compaction is not deterministic)
>> 3) the partition key (P) is not present in any other SSTable that is
>> NOT picked by that round of compaction
>> Rule 3) is quite complex to understand so here is the detailed
>> explanation:
>>
>> If partition key (P) also exists in another SSTable (S2) that is NOT
>> compacted together with SSTable (S1), and we remove the tombstone, some
>> data in S2 that the tombstone shadows may be resurrected.
>>
>> Precisely, at compaction time, Cassandra does not have ANY detail about
>> Partition (P) that stays in S2 so it cannot remove the tombstone right away.
>>
>> Now, for each SSTable, we have some metadata, namely minTimestamp and
>> maxTimestamp.
>>
>> I wonder whether compaction currently leverages this metadata for
>> tombstone removal. Indeed, if we know that the tombstone timestamp
>> (t) < minTimestamp of S2, everything in S2 is newer than the tombstone,
>> so nothing there is shadowed by it and it can be safely removed.
>>
>> Does anyone have the info?
>>
>> Regards
>>
>>
>>
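
The rules in the quoted message, together with the minTimestamp check that Kurt confirms, can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not Cassandra's actual code: the names SSTable, Tombstone and can_purge are hypothetical, partition membership is modelled as an exact set, and rule 2 is implicit because the function is only meant to be called for a tombstone whose SSTable is part of the current compaction.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable and its metadata."""
    name: str
    min_timestamp: int               # smallest write timestamp in the SSTable
    partition_keys: FrozenSet[str]   # assumption: exact membership is known


@dataclass(frozen=True)
class Tombstone:
    """Hypothetical stand-in for a tombstone on one partition."""
    partition_key: str
    timestamp: int       # write timestamp (t) of the delete
    deletion_time: int   # wall-clock seconds at which the delete was issued


def can_purge(tombstone: Tombstone,
              now: int,
              gc_grace_seconds: int,
              not_compacting: Iterable[SSTable]) -> bool:
    """Return True if the tombstone may be dropped in this compaction round.

    `not_compacting` holds the SSTables that are NOT part of the round.
    """
    # Rule 1: gc_grace_seconds must have elapsed since the delete.
    if tombstone.deletion_time + gc_grace_seconds > now:
        return False

    # Rule 3, refined with minTimestamp: an SSTable outside the compaction
    # only blocks the purge if it contains partition P AND may hold data
    # written at or before the tombstone's timestamp.
    for sstable in not_compacting:
        holds_partition = tombstone.partition_key in sstable.partition_keys
        may_hold_older_data = sstable.min_timestamp <= tombstone.timestamp
        if holds_partition and may_hold_older_data:
            return False

    return True
```

For example, a tombstone with timestamp 100 is blocked by an outside SSTable that contains P with min_timestamp 50, but not by one that contains P with min_timestamp 200, since everything in the latter was written after the delete.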
