cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefano Ortolani <ostef...@gmail.com>
Subject Re: LCS, range tombstones, and eviction
Date Tue, 16 May 2017 11:06:51 GMT
That makes sense.
I see however some unexpected performance data on my test, but I will start
another thread for that.

Thanks again!

On Fri, May 12, 2017 at 6:56 PM, Blake Eggleston <beggleston@apple.com>
wrote:

> The start and end points of a range tombstone are basically stored as
> special purpose rows alongside the normal data in an sstable. As part of a
> read, they're reconciled with the data from the other sstables into a
> single partition, just like the other rows. The only difference is that
> they don't contain any 'real' data, and, of course, they prevent 'deleted'
> data from being returned in the read. It's a bit more complicated than
> that, but that's the general idea.
>
>
> On May 12, 2017 at 6:23:01 AM, Stefano Ortolani (ostefano@gmail.com)
> wrote:
>
> Thanks a lot Blake, that definitely helps!
>
> I actually found a ticket re range tombstones and how they are accounted
> for: https://issues.apache.org/jira/browse/CASSANDRA-8527
>
> I am wondering now what happens when a node receives a read request. Are
> the range tombstones read before scanning the SStables? More interestingly,
> given that a single partition might be split across different levels, and
> that some range tombstones might be in L0 while all the rest of the data in
> L1, are all the tombstones prefetched from _all_ the involved SStables
> before doing any table scan?
>
> Regards,
> Stefano
>
> On Thu, May 11, 2017 at 7:58 PM, Blake Eggleston <beggleston@apple.com>
> wrote:
>
>> Hi Stefano,
>>
>> Based on what I understood reading the docs, if the ratio of garbage
>> collectable tomstones exceeds the "tombstone_threshold", C* should start
>> compacting and evicting.
>>
>>
>> If there are no other normal compaction tasks to be run, LCS will attempt
>> to compact the sstables it estimates it will be able to drop the most
>> tombstones from. It does this by estimating the number of tombstones an
>> sstable has that have passed the gc grace period. Whether or not a
>> tombstone will actually be evicted is more complicated. Even if a tombstone
>> has passed gc grace, it can't be dropped if the data it's deleting still
>> exists in another sstable, otherwise the data would appear to return. So, a
>> tombstone won't be dropped if there is data for the same partition in other
>> sstables that is older than the tombstone being evaluated for eviction.
>>
>> I am quite puzzled however by what might happen when dealing with range
>> tombstones. In that case a single tombstone might actually stand for an
>> arbitrary number of normal tombstones. In other words, do range
>> tombstones
>> contribute to the "tombstone_threshold"? If so, how?
>>
>>
>> From what I can tell, each end of the range tombstone is counted as a
>> single tombstone tombstone. So a range tombstone effectively contributes
>> '2' to the count of tombstones for an sstable. I'm not 100% sure, but I
>> haven't seen any sstable writing logic that tracks open tombstones and
>> counts covered cells as tombstones. So, it's likely that the effect of
>> range tombstones covering many rows are under represented in the droppable
>> tombstone estimate.
>>
>> I am also a bit confused by the "tombstone_compaction_interval". If I am
>> dealing with a big partition in LCS which is receiving new records every
>> day,
>> and a weekly incremental repair job continously anticompacting the data
>> and
>> thus creating SStables, what is the likelhood of the default interval
>> (10 days) to be actually hit?
>>
>>
>> It will be hit, but probably only in the repaired data. Once the data is
>> marked repaired, it shouldn't be anticompacted again, and should get old
>> enough to pass the compaction interval. That shouldn't be an issue though,
>> because you should be running repair often enough that data is repaired
>> before it can ever get past the gc grace period. Otherwise you'll have
>> other problems. Also, keep in mind that tombstone eviction is a part of all
>> compactions, it's just that occasionally a compaction is run specifically
>> for that purpose. Finally, you probably shouldn't run incremental repair on
>> data that is deleted. There is a design flaw in the incremental repair used
>> in pre-4.0 of cassandra that can cause consistency issues. It can also
>> cause a *lot* of over streaming, so you might want to take a look at how
>> much streaming your cluster is doing with full repairs, and incremental
>> repairs. It might actually be more efficient to run full repairs.
>>
>> Hope that helps,
>>
>> Blake
>>
>> On May 11, 2017 at 7:16:26 AM, Stefano Ortolani (ostefano@gmail.com)
>> wrote:
>>
>> Hi all,
>>
>> I am trying to wrap my head around how C* evicts tombstones when using
>> LCS.
>> Based on what I understood reading the docs, if the ratio of garbage
>> collectable tomstones exceeds the "tombstone_threshold", C* should start
>> compacting and evicting.
>>
>> I am quite puzzled however by what might happen when dealing with range
>> tombstones. In that case a single tombstone might actually stand for an
>> arbitrary number of normal tombstones. In other words, do range
>> tombstones
>> contribute to the "tombstone_threshold"? If so, how?
>>
>> I am also a bit confused by the "tombstone_compaction_interval". If I am
>> dealing with a big partition in LCS which is receiving new records every
>> day,
>> and a weekly incremental repair job continously anticompacting the data
>> and
>> thus creating SStables, what is the likelhood of the default interval
>> (10 days) to be actually hit?
>>
>> Hopefully somebody will be able to shed some lights here!
>>
>> Thanks in advance!
>> Stefano
>>
>>
>

Mime
View raw message